Datapane is an API-driven product that provides client libraries / command-line applications that talk to an API server to handle and process datasets.
This challenge involves building a small API server and corresponding command-line client, both in Python, that allow uploading and processing CSV files, such as those included in the repo.
For this task we'll be building a very simple Python 3-based REST API server and command-line client.
You may use any Python libraries and technologies of your choice, such as FastAPI, to build the API server.
The API server should have a single root endpoint, /datasets/, that allows list and CRUD operations over a dataset object via the following API / HTTP verbs:
- GET /datasets/ - list the uploaded datasets
- POST /datasets/ - create a dataset. This endpoint takes a CSV file as input and stores it on the server as a pandas dataframe (where and how it is stored is up to you). It returns a reference id for the created object.
- GET /datasets/<id>/ - return the file name and size of the dataset object
- DELETE /datasets/<id>/ - delete the dataset object
- GET /datasets/<id>/excel/ - export the dataset as an Excel file
- GET /datasets/<id>/stats/ - return the stats generated by running df.describe() on the pandas dataframe as a JSON object
- GET /datasets/<id>/plot/ - generate and return a PDF containing histograms of all the numerical columns in the dataset
The client app should be a fully standalone command-line Python application that is easily installable and runnable. It should provide command-line arguments that correspond to, and support, each of the API actions above; how you structure the command-line arguments and what you call them is up to you.
- list - list the uploaded datasets
- clear - delete all the datasets
- create name-of-dataset.csv - create a dataset from the given CSV file; the server stores it as a pandas dataframe and returns a reference id for the created object
- delete name-of-dataset.csv - delete the dataset object
- info name-of-dataset.csv - return the file name and size of the dataset object
- A Python 3 framework that supports generating JSON APIs (FastAPI)
- Build systems, tools, and scripts of your choice, e.g. Poetry, setup.py, Docker
- Any libraries you may find useful for the task; we prioritise using existing libraries to accomplish tasks rather than building in-house
- Multi-user support and log-in is NOT required for this project
- Instructions should be provided on how to build / bundle / start the system
- You should aim to use the latest Python language features, ecosystem, tooling, and libraries where possible
- As CSVs can be untrusted, you should consider running the CSV import within a container / sandbox
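One lightweight step in this direction, short of a full container, is to parse uploads in a separate Python process with a wall-clock timeout and a CPU-time limit. The sketch below assumes a POSIX system (it uses the resource module); the helper name parse_csv_isolated and the limit values are hypothetical. The child hands the parsed frame back as canonical CSV rather than pickle, so a compromised child cannot send the parent arbitrary objects to unpickle.

```python
# sandboxed_csv.py: a sketch of process isolation for CSV parsing, not a full sandbox.
# The limits and the CSV hand-off format are assumptions; resource is POSIX-only.
import io
import subprocess
import sys

import pandas as pd

# Code run in the child interpreter: cap CPU time, parse stdin, re-emit canonical CSV.
CHILD_CODE = """
import resource, sys
import pandas as pd
resource.setrlimit(resource.RLIMIT_CPU, (10, 10))  # at most 10 s of CPU time
df = pd.read_csv(sys.stdin.buffer)
sys.stdout.buffer.write(df.to_csv(index=False).encode("utf-8"))
"""


def parse_csv_isolated(data: bytes, timeout: float = 30.0) -> pd.DataFrame:
    """Parse untrusted CSV bytes in a separate Python process and return a DataFrame."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            input=data,
            capture_output=True,
            timeout=timeout,  # wall-clock limit; the child is killed when it expires
            check=True,
        )
    except subprocess.TimeoutExpired as exc:
        raise ValueError("CSV parsing timed out") from exc
    except subprocess.CalledProcessError as exc:
        detail = exc.stderr.decode(errors="replace").strip() or f"exit code {exc.returncode}"
        raise ValueError(f"CSV parsing failed: {detail.splitlines()[-1]}") from exc
    # The child's output was written by pandas' own CSV writer, so it is well-formed.
    return pd.read_csv(io.StringIO(proc.stdout.decode("utf-8")), float_precision="round_trip")
```

A request handler could call parse_csv_isolated(content) in place of pd.read_csv and map the ValueError to a 400 response. A crash, hang or runaway CPU loop while parsing is then confined to the child, at the cost of starting an interpreter and importing pandas per upload. Memory, file-system and network access are not restricted here; that is what running the import in a container would add.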
- Start the server:
    cd name-of-server-dir
    uvicorn server:app
- Start the client:
    cd name-of-client-dir
    python client.py
- Docs: 'localhost:8000/docs' for the server