Both raw and processed data are kept in the datawarehouse
folder, which is not committed to the repository for data privacy reasons.
Install the required libraries using the following command:
python3 -m pip install -r requirements.txt
After installing the libraries, you can run the dashboard locally with the command below and then visit the hosted address (e.g. http://127.0.0.1:8050/) in a browser (Google Chrome preferred):
gunicorn app:server --bind=0.0.0.0:8050
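To confirm that the local server is actually answering before opening a browser, a small polling helper can be used. This is only a sketch: the function name wait_for_dashboard and its defaults are hypothetical, and it assumes curl is installed.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_dashboard [url] [attempts]
wait_for_dashboard() {
    wfd_url="${1:-http://127.0.0.1:8050/}"
    wfd_attempts="${2:-10}"
    wfd_i=0
    while [ "$wfd_i" -lt "$wfd_attempts" ]; do
        # --fail makes curl exit non-zero on HTTP errors (4xx/5xx);
        # --max-time keeps a single attempt from hanging.
        if curl --silent --fail --max-time 2 --output /dev/null "$wfd_url"; then
            echo "dashboard is up at $wfd_url"
            return 0
        fi
        wfd_i=$((wfd_i + 1))
        sleep 1
    done
    echo "no response from $wfd_url after $wfd_attempts attempts" >&2
    return 1
}
```

Run gunicorn in one terminal and call wait_for_dashboard in another; a non-zero exit status means the server did not respond in time.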
To deploy this dashboard to a cloud server such as AWS, use the following steps:
-
Create an EC2 instance. Run and connect to it.
-
Install Git and clone this repository:
sudo yum update -y
sudo yum install git -y
git clone https://github.com/UVA-MLSys/Financial-Aid.git
-
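Environment: Create a virtual env and activate it, then install the required libraries. The apt command below applies to Debian/Ubuntu images; on Amazon Linux, install Python with yum as in the previous step.
sudo apt install python3.12
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
pip install -r requirements.txt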
-
Data: Upload the Merged.csv file to the datawarehouse folder. Since there is no drag and drop, one easy way is to upload it to Google Drive, get a share link, and use the file ID from that link to download the file inside the datawarehouse folder: gdown shared_file_id.
-
In app.py, replace the host address 127.0.0.1 with 0.0.0.0. The port 8050 is fine.
-
Network: Deploy the app with python app.py. This runs the app inside the instance, and it can be accessed at http://ec2_ip_address:port as long as the EC2 network security group contains an inbound rule that allows TCP connections to that port (8050) from the incoming IPs.
-
Scale: To scale up, bind through gunicorn. For example, gunicorn --workers 3 app:server --bind=0.0.0.0:8050 creates 3 worker processes for deployment (with gunicorn's default synchronous workers, at most 3 requests are handled at the same time; additional requests wait in the queue). To upgrade the EC2 instance, stop the instance first, then change the instance type.
-
Persistence: To keep the server running even when you are not connected to the EC2 console: screen gunicorn --workers 3 app:server --bind=0.0.0.0:8050 &.
- As long as the EC2 instance is running, the server will run in the background.
- Use ps -X to find the running server processes, then kill process_id if you want to terminate the run and deploy a new version.
-
If you want to host at an address without having to add the port number 8050 at the end, you have to serve it on the default HTTP port 80.
- Binding to port 80 directly is not permitted for non-root users by default, so you'll receive permission errors.
- Install nginx and configure it as a reverse proxy that listens on port 80 and forwards the traffic to gunicorn on port 8050.
-
To deploy to an https server, you'll need:
- A domain for which you can obtain an SSL certificate.
- A target group that includes your EC2 instance and the port it must redirect to.
- A load balancer with http and https listeners attached to the target group.
- Route 53 records for your domain that route traffic to your load balancer and give the load-balancer address a readable domain name as an alias.
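For the Data step above, the file ID can be pulled out of a Google Drive share link with plain shell parameter expansion. This is a sketch under assumptions: the link below is a made-up placeholder in the usual .../file/d/<id>/view form, and the gdown lines are left commented out because they need network access and a real file ID.

```shell
# Hypothetical share link copied from Google Drive; FILE_ID_PLACEHOLDER is not a real ID.
share_link="https://drive.google.com/file/d/FILE_ID_PLACEHOLDER/view?usp=sharing"

# Drop everything up to and including "/d/", then everything from the next "/" on.
file_id="${share_link#*/d/}"
file_id="${file_id%%/*}"
echo "file id: $file_id"

# On the instance, with the virtual env active:
# pip install gdown
# cd datawarehouse && gdown "$file_id"
```

The file must be shared so that anyone with the link can view it, otherwise the download is refused.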
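The nginx reverse-proxy step above can be sketched as follows. This is a minimal configuration under assumptions, not the project's actual setup: the file name dashboard.conf, the scratch directory nginx-sketch, and the catch-all server_name _ are hypothetical choices, and /etc/nginx/conf.d/ is assumed to be a directory your nginx package loads configs from. The script only writes the file locally; the commented commands at the end show how it might be installed on the instance.

```shell
# Write a minimal reverse-proxy config into a scratch directory (hypothetical path).
conf_dir="${CONF_DIR:-./nginx-sketch}"
mkdir -p "$conf_dir"

# The quoted 'EOF' keeps the shell from expanding nginx variables such as $host.
cat > "$conf_dir/dashboard.conf" <<'EOF'
server {
    listen 80;
    server_name _;

    location / {
        # Forward every request to gunicorn, assumed to listen on port 8050.
        proxy_pass http://127.0.0.1:8050;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
EOF
echo "wrote $conf_dir/dashboard.conf"

# On the instance (requires root and an installed nginx):
# sudo cp "$conf_dir/dashboard.conf" /etc/nginx/conf.d/
# sudo nginx -t                  # check the configuration for syntax errors
# sudo systemctl reload nginx    # apply it without dropping connections
```

With this in place the security group needs an inbound rule for port 80, and gunicorn can keep running on port 8050 behind nginx.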