Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
Chemical Analysis Calculator, with full solution display.

Chemicology Chemical Analysis Calculator, to solve problems efficiently by displaying whole solution. Go to releases for downloading .exe, .dmg, Linux

Muhammad Moazzam 2 Aug 06, 2022
0CD - BinaryNinja plugin to introduce some quality of life utilities for obsessive compulsive CTF enthusiasts

0CD Author: b0bb Quality of life utilities for obsessive compulsive CTF enthusia

12 Sep 14, 2022
OnTime is a small python that you set a time and on that time, app will send you notification and also play an alarm.

OnTime Always be OnTime! What is OnTime? OnTime is a small python that you set a time and on that time, app will send you notification and also play a

AmirHossein Mohammadi 11 Jan 16, 2022
Extract continuous and discrete relaxation spectra from G(t)

pyReSpect-time Extract continuous and discrete relaxation spectra from stress relaxation modulus G(t). The papers which describe the method and test c

3 Nov 03, 2022
ใ€Œ๐Ÿ“–ใ€Tool created to extract metadata from a domain

Metafind is an OSINT tool created with the aim of automating the search for metadata of a particular domain from the search engine known as Google.

9 Dec 28, 2022
Binjago - Set of tools aiding in analysis of stripped Golang binaries with Binary Ninja

Binjago ๐Ÿฅท Set of tools aiding in analysis of stripped Golang binaries with Bina

W3ndige 2 Jul 23, 2022
Collection of Beginner to Intermediate level Python scripts contributed by members and participants.

Hacktoberfest2021-Python Hello there! This repository contains a 'Collection of Beginner to Intermediate level Python projects', created specially for

12 May 25, 2022
Traits for Python3

Do you like Python, but think that multiple inheritance is a bit too flexible? Are you looking for a more constrained way to define interfaces and re-use code?

121 Nov 15, 2022
Ssma is a tool that helps you collect your badges in a satr platform

satr-statistics-maker ssma is a tool that helps you collect your badges in a satr platform ๐ŸŽ–๏ธ Requirements python = 3.7 Installation first clone the

TheAwiteb 3 Jan 04, 2022
Convert-Decimal-to-Binary-Octal-and-Hexadecimal

Convert-Decimal-to-Binary-Octal-and-Hexadecimal We have a number in a decimal number, and we have to convert it into a binary, octal, and hexadecimal

Maanyu M 2 Oct 08, 2021
A module to develop and apply old-style links

Old-Linkage-Dev (OLD) Old Linkage Development is a module to develop and apply old-style links. Old-style links stand for some traditional or conventi

Tarcadia 2 Dec 04, 2021
SmartGrid - Een poging tot een optimale SmartGrid oplossing, door Dirk Kuiper & Lars Zwaan

SmartGrid - Een poging tot een optimale SmartGrid oplossing, door Dirk Kuiper & Lars Zwaan

1 Jan 12, 2022
Forward RSS feeds to your email address, community maintained

Getting Started With rss2email We highly recommend that you watch the rss2email project on GitHub so you can keep up to date with the latest version,

248 Dec 28, 2022
Final project for ENGG 5402 Advanced Robotics in CUHK

Final project Final project Update Foundations Ubuntu virtual machine Ubuntu How to use Github to keep tracking the change of code version? Docker Set

Junjia Liu 8 Aug 01, 2022
Here is my Senior Design Project that I implemented to graduate from Computer Engineering.

Here is my Senior Design Project that I implemented to graduate from Computer Engineering. It is a chatbot made in RASA and helps the user to plan their vacation in the Turkish language. In order to

Ezgi SubaลŸฤฑ 25 May 31, 2022
importlib_resources is a backport of Python standard library importlib.resources module for older Pythons.

importlib_resources is a backport of Python standard library importlib.resources module for older Pythons. The key goal of this module is to replace p

Python 36 Dec 13, 2022
1. ๋„ค์ด๋ฒ„ ์นดํŽ˜ ๋Œ“๊ธ€์„ ๋นจ๋ฆฌ ๋‹ค๋Š” ๊ธฐ๋Šฅ

naver_autoprogram ๊ธฐ๋Šฅ ์„ค๋ช… ๋„ค์ด๋ฒ„ ์นดํŽ˜ ๋Œ“๊ธ€์„ ๋นจ๋ฆฌ ๋‹ค๋Š” ๊ธฐ๋Šฅ ๋„ค์ด๋ฒ„ ์นดํŽ˜ ์ž๋™ ์ถœ์„ ์ฒดํฌ ๊ธฐ๋Šฅ ๋™์ž‘ ๋ฐฉ์‹ ์นดํŽ˜ ๋Œ“๊ธ€ ๊ธฐ๋Šฅ ๊ธฐ๋ณธ ๋™์ž‘์€ ์ฃผ๊ธฐ์ ์ธ ์Šค์ผ€์ฅด ๋™์ž‘์œผ๋กœ ํ•ด๋‹น ์นดํŽ˜ ID ์™€ ํŠน์ • API ์ฃผ์†Œ๋กœ ๋Œ€์ƒ์ด ์ƒˆ๊ธ€์„ ์ž‘์„ฑํ–ˆ๋Š”์ง€ ์ฒดํฌ. ํ•ด๋‹น ๋Œ€์ƒ์ด ์ƒˆ๊ธ€ ๋“ฑ

1 Dec 22, 2021
An extended version of the hotkeys demo code using action classes

An extended version of the hotkeys application using action classes. In adafruit's Hotkeys code, a macro is using a series of integers, assumed to be

Neradoc 5 May 01, 2022
This is a modified variation of abhiTronix's vidgear. In this variation, it is possible to write the output file anywhere regardless the permissions.

Info In order to download this package: Windows 10: Press Windows+S, Type PowerShell (cmd in older versions) and hit enter, Type pip install vidgear_n

Ege Akman 3 Jan 30, 2022
Create N Share is a No Code solution which gives users the ability to create any type of feature rich survey forms with ease.

create n share Note : The Project Scaffold will be pushed soon. Create N Share is a No Code solution which gives users the ability to create any type

Chiraag Kakar 11 Dec 03, 2022