GitHub - kdurril/umdcoursecatalog: This scrapes the course catalog of the University of Maryland at College Park

This scrapes the course catalog of the University of Maryland at College Park. The University has no API for its catalog, so this tool gathers the information. The information is useful for aggregate analysis of courses, departments, and the University.

The tool is structured for output as CSV files with semester and course number as a composite primary key for sections for an RDBMS.

This scrapes https://ntst.umd.edu/soc/. This site lists catalogs beginning in Spring 2013. For semesters prior to spring 2013, see https://www.sis.umd.edu/bin/seats?term=201208. This crawler does not cover the older material.

Using the tool See http://doc.scrapy.org/en/latest/index.html for official documentation of Scrapy.

Set the term prior to using this tool in each spider file. The format for a term is YearSemester such as 201408 for fall 2014. The semester numbers are as follows: fall = 08 spring = 01 winter = 12 - note that winter term for January 2015 courses is 201412 summerI = 05 summerII = 07

Adjust the spiders to adjust crawling behavior Adjust items.py to add items to pipeline.

It has 3 spiders: testudo_crawler_dept testudo_crawler_courses testudo_crawler_section

Output files should be: departments.csv courses.csv sections.csv

From command line: scrapy crawl testudo_dept -o departments.csv -t csv scrapy crawl testudo_courses -o courses.csv -t csv scrapy crawl testudo_sections -o sections.csv -t csv

Know issues: Undergraduate honors courses are added to the base course records. Courses that end with letter "h" are honors courses. The final 0101 section of a standard course with an honors section is usually the honors course.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
mytestudo		mytestudo
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

kdurril/umdcoursecatalog

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages