The goal of this project was to gather and analyze data on the top computer science institutions and their faculty around the world, as ranked by CS Rankings, a metrics-based ranking of computer science institutions. We gathered our data into two files. The first file contains each university's rank, its count score (the geometric mean of the number of papers published across all areas), and the number of faculty members who have published papers in our areas of interest (Computer Architecture, Computer Networks, Computer Security, Operating Systems, Programming Languages, and Software Engineering). The second file contains the names of the faculty members at those universities, along with each member's university name, publication count (pubs), and adjusted count (adj). We used the worldwide ranking for 2000 to 2022, which covers approximately 500 universities.
To gather the data, we used a Python script with the Selenium library to scrape the CS Rankings website. Examples of the raw data files obtained from the website are shown below:
We used our scraped data to answer the following questions:
- What are the top 10 universities in terms of ranking?
- What are the top 10 countries by number of publications (i.e. count score)?
- What are the top 10 countries with the most universities?
- Is there any correlation between a university's ranking and its count score or number of faculty members?
- Which faculty members had the most publications?
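The analysis in this project was done in Tableau. As an alternative, the two country-level questions above can be sketched in plain Python. This is a minimal sketch under assumptions: the column names (`rank`, `university`, `country`, `count`, `faculty`) and the sample rows are hypothetical and may not match the scraped CSV.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows; the real CSV's column names may differ.
SAMPLE_CSV = """rank,university,country,count,faculty
1,Univ A,US,20.5,160
2,Univ B,US,15.0,100
3,Univ C,CH,12.0,40
4,Univ D,DE,8.0,60
5,Univ E,DE,7.5,30
6,Univ F,US,4.0,25
"""


def load_rows(handle):
    """Read ranking rows and convert the numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["rank"] = int(row["rank"])
        row["count"] = float(row["count"])
        row["faculty"] = int(row["faculty"])
        rows.append(row)
    return rows


def top_countries_by_count(rows, n=10):
    """Sum the count score per country and return the n largest totals."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["count"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


def top_countries_by_universities(rows, n=10):
    """Count ranked universities per country and return the n largest."""
    return Counter(row["country"] for row in rows).most_common(n)


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(top_countries_by_count(rows, n=3))
print(top_countries_by_universities(rows, n=3))
```

To try this on real data, pass an open file handle to `load_rows` instead of the in-memory sample, after adjusting the column names to match the file.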
We used Tableau to visualize and analyze our data. Some of our findings include:
- American universities hold the top ranks, with Carnegie Mellon University ranked first.
- The United States has the highest publication score, while other countries reach only about one sixth of the US score.
- The United States has by far the most ranked universities (172), followed by Germany (57).
- There is a strong correlation between ranking and count score, but a relatively weaker correlation between ranking and the number of faculty members.
- Among the top faculty members by publications, ETH Zurich and Univ. of California - Santa Barbara each have two entries, while the others come from different universities.
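The correlation finding can be checked outside Tableau with a Pearson coefficient. The sketch below uses only the standard library; the numbers are hypothetical and are not taken from the scraped data. Note that a better rank is a smaller number, so a strong relationship between rank and count score shows up as a coefficient close to -1.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, not taken from the scraped data.
ranks = [1, 2, 3, 4, 5]
counts = [10.0, 8.0, 6.0, 4.0, 2.0]
faculty = [50, 90, 40, 70, 30]

print(round(pearson(ranks, counts), 3))   # rank vs. count score
print(round(pearson(ranks, faculty), 3))  # rank vs. number of faculty
```

With real data, the absolute value of the first coefficient would be expected to be larger than that of the second if the finding above holds.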
See the interactive Tableau dashboard for more information on each data point.
A second interactive Tableau dashboard is also available.
- Open the command prompt (Windows) or terminal (Linux/Mac) and navigate to the desired directory.
- Clone the repository on your computer using the following command:
git clone https://github.com/AbrarAdnan/Data-Driven-Ranking-of-Top-CS-Universities.git
- Navigate to the downloaded project folder
cd Data-Driven-Ranking-of-Top-CS-Universities
- Initialize and activate the virtual environment after navigating into the project folder
On Windows:
virtualenv venv
venv\Scripts\activate
On Mac/Linux:
virtualenv venv
source venv/bin/activate
- Install Dependencies
pip install -r requirements.txt
- Run the scraper
python scraper.py
(OPTIONAL) You can add an argument to choose the browser used for scraping.
For example: python scraper.py --browser firefox
Valid choices are firefox, chrome, and edge. Firefox scrapes fastest and is used by default.
- After the script has finished running, it will save the scraped data in a file called best_uni_list.csv in the project directory.
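The optional `--browser` argument described in the run step could be handled along the following lines. This is a sketch under assumptions, not the actual contents of scraper.py: the function names `parse_args` and `make_driver` are hypothetical, and only the three browser choices and the Firefox default come from this README.

```python
import argparse

# Browser choices listed in this README; firefox is the documented default.
BROWSERS = ("firefox", "chrome", "edge")


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Scrape CS Rankings (illustrative sketch)."
    )
    parser.add_argument(
        "--browser",
        choices=BROWSERS,
        default="firefox",
        help="browser to drive with Selenium (default: firefox)",
    )
    return parser.parse_args(argv)


def make_driver(browser):
    """Create a Selenium WebDriver for the chosen browser."""
    # Imported here so that argument parsing works without Selenium installed.
    from selenium import webdriver

    factories = {
        "firefox": webdriver.Firefox,
        "chrome": webdriver.Chrome,
        "edge": webdriver.Edge,
    }
    return factories[browser]()


if __name__ == "__main__":
    args = parse_args()
    print(f"Selected browser: {args.browser}")
```

Because `choices` is set, argparse rejects any other browser name with a usage error before a browser window is opened.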
- The chromedriver file is included in the repository
- The code needs Python to run. Download Python for Windows, Linux, or macOS.
- While the code is running, a browser window will appear, showing the website being scraped in real time. You can also check the console/terminal for additional output messages that describe what the script is doing.
- The script takes approximately 46 minutes to run and produce output on Firefox. Time may vary depending on the browser.
- When loading the CSV file into Tableau, Taiwan needs to be identified manually; otherwise Tableau's country database will not recognize it.