Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
A web application which you can search, buy or sell shares with current prices which provided by IEX.

CS50 - Stock Exchange A web application which you can search, buy or sell shares with current prices which provided by IEX. Table of Contents Setup St

1 May 28, 2022
Random pass word generator made with python. PyQt5 module is used to design GUI.

Differences in this GUI program : Default titlebar removed Custom Minimize,Maximize and Close Buttons Drag & move window from any point Program work l

Dimuth De Zoysa 1 Jan 26, 2022
Virtual Assistant Using Python

-Virtual-Assistant-Using-Python Virtual desktop assistant is an awesome thing. If you want your machine to run on your command like Jarvis did for Ton

Bade om 1 Nov 13, 2021
The goal of this program was to find the most common color in my living room.

The goal of this program was to find the most common color in my living room. I found a dataset online with colors names and their corr

1 Nov 09, 2021
inverted pendulum fuzzy control python code (python 2.7.18)

inverted-pendulum-fuzzy-control- inverted pendulum fuzzy control python code (python 2.7.18) We have 3 general functions for 3 main steps: fuzzificati

arian mottaghi 4 May 23, 2022
Python Control Systems Library

The Python Control Systems Library is a Python module that implements basic operations for analysis and design of feedback control systems.

Control Systems Library for Python 1.3k Jan 06, 2023
dta Convert Dict To Attributes!

dta (Dict to Attributes) dta is very small dict (or json) to attributes converter. It is only have 1 files and applied to every python versions.

Rukchad Wongprayoon 0 Dec 31, 2021
OpenTable Reservation Maker For Python

OpenTable-Reservation-Maker The code that corresponds with this blog post on writing a script to make reservations for me on opentable Getting started

JonLuca De Caro 36 Nov 10, 2022
A library for pattern matching on symbolic expressions in Python.

MatchPy is a library for pattern matching on symbolic expressions in Python. Work in progress Installation MatchPy is available via PyPI, and

High-Performance and Automatic Computing 151 Dec 24, 2022
A lightweight solution for local Particle development.

neopo A lightweight solution for local Particle development. Features Builds Particle projects locally without any overhead. Compatible with Particle

Nathan Robinson 19 Jan 01, 2023
A short course on Julia and open-source software development

Advanced Scientific Computing: producing better code This course is taught as a 6-session "nanocourse" at Washington University in St. Louis. See the

Tim Holy 230 Jan 07, 2023
The official Repository wherein newbies into Open Source can Contribute during the Hacktoberfest 2021

Hacktoberfest 2021 Get Started With your first Contrinution/Pull Request : Fork/Copy the repo by clicking the right most button on top of the page. Go

HacOkars 25 Aug 20, 2022
This is a Saleae Logic custom high level analyzer that allows you to search and mark specific packets.

SaleaePacketParser This is a Saleae Logic custom high level analyzer that allows you to search and mark specific packets. Field "Search For" is used f

1 Dec 16, 2021
Better GitHub statistics images for your profile, with stats from private and public repos

Better GitHub statistics images for your profile, with stats from private and public repos

Jacob Strieb 2k Dec 30, 2022
PORTSCANNING-IN-PYTHON - A python threaded portscanner to scan websites and ipaddresses

PORTSCANNING-IN-PYTHON This is a python threaded portscanner to scan websites an

1 Feb 16, 2022
When should you berserk in lichess arena tournament games?

When should you berserk in a lichess arena tournament game? 1+0 arena tournament 3+0 arena tournament Explanation For details on how I arrived at the

18 Aug 03, 2022
Logo DYS (Doküman Yönetim Sitemi) API Python Implementation

dys-connector Logo DYS (Dokuman Yonetim Sistemi) API Python Implementation Python Package: https://pypi.org/project/dys-connector Quick Start from dys

Logo Group 8 Mar 19, 2022
Extended functionality for Namebase past their web UI

Namebase Extended Extended functionality for Namebase past their web UI.

RunDavidMC 12 Sep 02, 2022
A Brainfuck interpreter written in Python.

A Brainfuck interpreter written in Python.

Ethan Evans 1 Dec 05, 2021
Thumbor-bootcamp - learning and contribution experience with ❤️ and 🤗 from the thumbor team

Thumbor-bootcamp - learning and contribution experience with ❤️ and 🤗 from the thumbor team

Thumbor (by @globocom) 9 Jul 11, 2022