add script to fetch new professors/courses #38

Merged on Jan 30, 2023 (55 commits)
Commits
724f641
add script to fetch new professors/courses
nsandler1 Nov 3, 2022
d181c58
convert string to semester once
nsandler1 Nov 3, 2022
67a614b
don't rely on api response ordering
nsandler1 Nov 3, 2022
f99cdaf
syntax
nsandler1 Nov 4, 2022
73be408
don't auto capitalize professor names
nsandler1 Nov 7, 2022
8bc72b8
remove unnecessary variable
nsandler1 Nov 9, 2022
2db329b
removed try/catch
nsandler1 Nov 10, 2022
6483460
replace print with comment
nsandler1 Nov 10, 2022
76b27b1
Merge branch 'fuzzy-professors' into update-courses-no-descriptions
nsandler1 Nov 10, 2022
1253635
Merge branch 'professor-alias' into update-courses-no-descriptions
nsandler1 Nov 10, 2022
0f607d0
Revert "Merge branch 'professor-alias' into update-courses-no-descrip…
nsandler1 Nov 11, 2022
8a216f8
Revert "Merge branch 'fuzzy-professors' into update-courses-no-descri…
nsandler1 Nov 11, 2022
e4b4d6b
reorder statements
nsandler1 Nov 10, 2022
d3a40bb
This reverts commit 0f607d0f2eb9574bdfb0a303936bd982472a674d.
nsandler1 Nov 11, 2022
c909d54
Merge branch 'fuzzy-professors' into update-courses-no-descriptions
nsandler1 Nov 11, 2022
23021f0
Merge branch 'professor-alias' into update-courses-no-descriptions
nsandler1 Nov 11, 2022
cfcee1c
make migrations
nsandler1 Nov 11, 2022
b7658c3
Merge branch 'professor-alias' into update-courses-no-descriptions
nsandler1 Nov 11, 2022
a88ddf5
update Professor.find_similar() to match `fuzzy-professors`
nsandler1 Nov 11, 2022
cc1a118
use new method to find similar professors
nsandler1 Nov 11, 2022
b8520b3
Merge branch 'master' into update-courses-no-descriptions
nsandler1 Nov 13, 2022
0ff4bb7
Merge branch 'fuzzy-professors' into update-courses-no-descriptions
nsandler1 Nov 13, 2022
31ccacc
fix api/serializers
nsandler1 Nov 13, 2022
33743d2
fix home/utils
nsandler1 Nov 14, 2022
4284f65
fix home/views/professor
nsandler1 Nov 14, 2022
852b3c1
Merge branch 'master' into update-courses-no-descriptions
tybug Nov 15, 2022
66dc6af
fix rejected professors being associated with recent recent semester
nsandler1 Nov 23, 2022
0c0bb95
Merge branch 'update-courses-no-descriptions' of https://github.com/p…
nsandler1 Nov 23, 2022
ea6f56a
Merge branch 'master' into update-courses-no-descriptions
nsandler1 Dec 11, 2022
0315abb
Merge branch 'master' into update-courses-no-descriptions
nsandler1 Dec 11, 2022
0744016
update slugging process
nsandler1 Dec 11, 2022
708a0ad
simplify Instructor: TBA case
nsandler1 Dec 11, 2022
f128b36
Merge branch 'master' into update-courses-no-descriptions
nsandler1 Dec 19, 2022
b7cfb4e
fix reverted changes
nsandler1 Dec 25, 2022
850f0f0
swap imports
nsandler1 Dec 25, 2022
2524dcb
remove duplicate function
nsandler1 Dec 25, 2022
24cc464
add back newline
tybug Dec 27, 2022
5767e0c
Merge branch 'master' into update-courses-no-descriptions
nsandler1 Jan 6, 2023
862d6a3
delcare professor variable in all cases
nsandler1 Jan 8, 2023
b53edff
use regex to catch cases
nsandler1 Jan 9, 2023
b654c35
customise help text
nsandler1 Jan 9, 2023
0f0e029
check professorAlias
nsandler1 Jan 10, 2023
cb47dda
fix creating duplicate professors
nsandler1 Jan 10, 2023
2366b02
move print statement
nsandler1 Jan 10, 2023
ac1f79d
avoid duplicate entries per semester
nsandler1 Jan 10, 2023
c870b60
Merge branch 'professor-course-dups' into update-courses-no-descriptions
nsandler1 Jan 10, 2023
89a1add
require migration checks
nsandler1 Jan 10, 2023
762c516
update help message
nsandler1 Jan 10, 2023
d22c117
Merge branch 'professor-course-dups' into update-courses-no-descriptions
nsandler1 Jan 11, 2023
6d1d8ba
check for null rows before creating new row
nsandler1 Jan 11, 2023
f53dbb5
reduce number of database hits
nsandler1 Jan 11, 2023
7e8c940
strip professor and course names
nsandler1 Jan 11, 2023
f27a34e
Merge branch 'master' into update-courses-no-descriptions
nsandler1 Jan 14, 2023
e179972
don't require migrations check
nsandler1 Jan 14, 2023
5b68090
add comments
nsandler1 Jan 14, 2023
162 changes: 162 additions & 0 deletions home/management/commands/updatecourses.py
@@ -0,0 +1,162 @@
import re
import requests
from datetime import datetime

from django.core.management import BaseCommand
from argparse import RawTextHelpFormatter

from home.models import Course, Professor, ProfessorCourse, ProfessorAlias
from home.utils import Semester

class Command(BaseCommand):
    help = '''Updates the database with new courses and professors during the provided semester.
The semester argument must be in the numerical form YEAR+SEASON (see ** for exception).
The season codes are as follows:
Spring -> 01
Summer -> 05
Fall -> 08
Winter -> 12
EXAMPLE: Spring 2023 = 202301

** NOTE: Starting from Winter 2021, the values for winter semesters are off by one year. Winter 2021 is actually 202012, not 202112
'''

    def __init__(self):
        super().__init__()
        self.total_num_new_courses = 0
        self.total_num_new_professors = 0
        self.courses = Course.unfiltered.all()
        self.verified_professors = Professor.verified.all()
        self.non_rejected_professors = Professor.unfiltered.exclude(status=Professor.Status.REJECTED)
        self.aliases = ProfessorAlias.objects.all()
        self.professor_courses = ProfessorCourse.objects.all()

    def create_parser(self, *args, **kwargs):
        parser = super(Command, self).create_parser(*args, **kwargs)
        parser.formatter_class = RawTextHelpFormatter
        return parser

    def add_arguments(self, parser):
        parser.add_argument("semesters", nargs='+')
tybug marked this conversation as resolved.
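
For readers unfamiliar with the YEAR+SEASON codes described in the help text, the mapping can be sketched in isolation as follows. This is not the project's `Semester` class from `home.utils`; the names `SEASON_CODES` and `decode_semester` are hypothetical, and the sketch deliberately leaves the winter year offset mentioned in the help text unadjusted.

```python
# Season codes as listed in the command's help text.
SEASON_CODES = {"01": "Spring", "05": "Summer", "08": "Fall", "12": "Winter"}


def decode_semester(code):
    """Split a six-digit YEAR+SEASON code into (year, season name).

    Hypothetical helper for illustration only. The year returned is the
    year written in the code; the help text notes that winter codes can
    be off by one year, and this sketch does not correct for that.
    """
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected six digits, got {code!r}")
    season = SEASON_CODES.get(code[4:])
    if season is None:
        raise ValueError(f"unknown season code {code[4:]!r}")
    return int(code[:4]), season
```

Under these assumptions, `decode_semester("202301")` yields the year 2023 and the season "Spring", matching the example in the help text.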

    def handle(self, *args, **options):
        t_start = datetime.now()
        semesters = [Semester(s) for s in options['semesters']]
        print(f"Inputted Semesters: {', '.join(s.name() for s in semesters)}")

        for semester in semesters:
            kwargs = {"semester": semester, "per_page": 100, "page": 1}
            course_data = requests.get("https://api.umd.io/v1/courses", params=kwargs).json()

            # if no courses were found during semester, skip.
            if "error_code" in course_data[0].keys():
                print(f"umd.io doesn't have data for {semester.name()}!")
                continue

            print(f"Working on courses for {semester.name()}...")

            while course_data:
                # for every course taught during `semester`...
                for umdio_course in course_data:
                    course = self.courses.filter(name=umdio_course['course_id'].strip("\n\t\r ")).first()

                    # if we don't have the course, create it.
                    if not course:
                        course = Course(
                            name=umdio_course['course_id'].strip("\n\t\r "),
                            department=umdio_course['dept_id'].strip("\n\t\r "),
                            course_number=umdio_course['course_id'].strip("\n\t\r ")[4:],
                            title=umdio_course['name'].strip("\n\t\r "),
                            credits=umdio_course['credits'].strip("\n\t\r "),
                            description=umdio_course["description"].strip("\n\t\r ")
                        )

                        course.save()
                        self.total_num_new_courses += 1

                    print(course)
                    # collect all the professors that taught this course during `semester`
                    self._professors(course, semester)

                kwargs["page"] += 1
                course_data = requests.get("https://api.umd.io/v1/courses", params=kwargs).json()

        print(f"\n** New Courses Created: {self.total_num_new_courses} **")
        print(f"** New Professors Created: {self.total_num_new_professors} **")

        runtime = datetime.now() - t_start
        print(f"Runtime: {round(runtime.seconds / 60, 2)} minutes")

    def _professors(self, course: Course, semester: Semester):
        kwargs = {"course_id": course.name}
        umdio_professors = requests.get("https://api.umd.io/v1/professors", params=kwargs).json()

        # if no professors were found for `course`, exit function.
        if isinstance(umdio_professors, dict) and 'error_code' in umdio_professors.keys():
            return

        # for every professor that taught `course`...
        for umdio_professor in umdio_professors:
            professor_name = umdio_professor['name'].strip("\n\t\r ")
            # raw string so that `\s` is passed to the regex engine unchanged
            if re.search(r"instructor:?\s*tba", professor_name.lower()):
                continue

            professor = self.non_rejected_professors.filter(name=professor_name)

            # if there's only one matching professor, use that professor.
            if professor.count() == 1:
                professor = professor.first()
            else:
                alias = self.aliases.filter(alias=professor_name)

                # if there's more than one matching professor but
                # we have an alias that narrows the query down to one,
                # use the professor associated with that alias.
                if professor.count() > 1 and alias.count() == 1:
                    professor = alias.first()
Comment on lines +112 to +116

I would hope that alias.count() is always one. If we have two aliases for the same name, that's a problem. We should add a unique=True requirement to the alias' name field.
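
To illustrate what such a uniqueness requirement would guarantee, here is a minimal sketch that uses the standard-library `sqlite3` module in place of the project's Django models. The table and column names (`professor_alias`, `alias`, `professor_id`) and the helper functions are hypothetical; in Django the equivalent would presumably be `unique=True` on the alias field, as suggested above.

```python
import sqlite3


def make_db():
    """Create an in-memory table whose alias column must be unique."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE professor_alias ("
        " alias TEXT NOT NULL UNIQUE,"
        " professor_id INTEGER NOT NULL)"
    )
    return conn


def insert_alias(conn, alias, professor_id):
    """Insert an alias row; return False if that alias is already taken."""
    try:
        # `with conn` commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO professor_alias (alias, professor_id) VALUES (?, ?)",
                (alias, professor_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a second row with the same alias.
        return False


def count_aliases(conn, alias):
    """Return how many rows carry the given alias (0 or 1 under UNIQUE)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM professor_alias WHERE alias = ?", (alias,)
    ).fetchone()
    return row[0]
```

With a constraint like this enforced by the database, a lookup by alias can only return zero rows or one row, so the `alias.count() == 1` check reduces to an existence check.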

                else:
                    # Otherwise, we either don't have this professor or we couldn't
                    # narrow down the query enough. So, create a new professor and
                    # attempt to automatically verify it following a process similar
                    # to that in admin.py.
                    professor = Professor(name=professor_name, type=Professor.Type.PROFESSOR)
                    similar_professors = Professor.find_similar(professor.name, 70)
                    split_name = professor.name.strip().split()
                    new_slug = split_name[-1].lower()
                    valid_slug = True

                    if self.verified_professors.filter(slug=new_slug).exists():
                        new_slug = f"{split_name[-1]}_{split_name[0]}".lower()
                        if self.verified_professors.filter(slug=new_slug).exists():
                            valid_slug = False

                    # if there are no similarly named professors and there's no
                    # issues with the auto generated slug, verify the professor.
                    if len(similar_professors) == 0 and valid_slug:
                        professor.slug = new_slug
                        professor.status = Professor.Status.VERIFIED

                    professor.save()
                    self.total_num_new_professors += 1

            # for every course taught by `professor`...
            for entry in umdio_professor['taught']:
                # we only care about `course` taught during `semester`.
                if entry['course_id'] == course.name and Semester(entry['semester']) == semester:
                    professorcourse = self.professor_courses.filter(
                        course=course,
                        professor=professor
                    )

                    # if only one professorcourse record and it doesn't
                    # have a recent semester, update that one record.
                    if professorcourse.count() == 1 and not professorcourse.first().recent_semester:
                        professorcourse.update(recent_semester=semester)

                    # if there's no professorcourse entries at all that match
                    # the prof/course combo or if there are matching records but
                    # none of them have recent semester = `semester`, create a new
                    # professor course entry.
                    elif professorcourse.count() == 0 or not professorcourse.filter(recent_semester=semester).exists():
                        ProfessorCourse.objects.create(course=course, professor=professor, recent_semester=semester)
                    break
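
The update-or-create decision at the end of `_professors` can be read as a small pure function. The sketch below restates it without the Django ORM so the three outcomes are easy to check; the function name `professor_course_action` and the use of plain strings for semesters are assumptions made for illustration and are not part of the PR.

```python
from typing import Optional, Sequence


def professor_course_action(existing: Sequence[Optional[str]], semester: str) -> str:
    """Decide what to do for one professor/course pair.

    `existing` holds the recent_semester value of every row already stored
    for the pair (None when the row has no recent semester). Hypothetical
    helper mirroring the if/elif chain in the diff above.
    """
    # Exactly one row and it has no recent semester: fill that row in.
    if len(existing) == 1 and not existing[0]:
        return "update"
    # No rows at all, or none recorded for this semester: add a new row.
    if len(existing) == 0 or semester not in existing:
        return "create"
    # A row for this semester already exists: nothing to do.
    return "skip"
```

Written this way, it is clear that re-running the command for the same semester falls through to "skip", which appears to be the intent of the "avoid duplicate entries per semester" commit.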