Feature request - density plot of class times #57

Open
fractalbach opened this issue Jun 8, 2018 · 11 comments
Labels
enhancement (New feature or request), future

Comments

@fractalbach

fractalbach commented Jun 8, 2018

It would be useful to visualize and quantify when most classes take place, and when most people are free.

A way to filter for the most common classes would help as well.

This would be useful for planning the best times for events, workshops, etc. You could see when people are most likely available without having to run surveys.

Graphs

I'll update this with more ideas as they come up. They can be used later when making the graphs.

Data Sets

  • S_all : set of all classes
  • S_common : set of "common" classes, i.e. those required for a major and for GE

X-axis

  • X_axis = minute 0 -> minute max (maybe 1 week? 5-days?)
  • x = t = iterator of 5 minute increments

Y-axis

  • Y_classes_in_session : number of classes in set S that are 'in session' at time t
  • Y_students_in_session : sum of (number of students currently enrolled in (each class that is currently in session))

let x_axis be an iterator where t_next = t + 5
let class[start] be the start time of the class in minutes since t = 0
let class[end] be the end time of the class in minutes since t = 0

for each value t in x_axis :
    # a class counts as in session from its start tick up to, but not including, its end tick
    classes_in_session[t] = {class ∈ S_all such that (class[start] ≤ t < class[end])}
    y[t] ← 0
    for each class in classes_in_session[t]:
        y[t] ← y[t] + class[students_enrolled]

Note that this does not account for students who didn't show up to class ;)
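
As a rough Python sketch of the pseudocode above: the field names `start`, `end`, and `students_enrolled` mirror the pseudocode, but the dict layout and the sample classes are assumptions made up for illustration, not the API's actual schema. A class is treated as in session when start <= t < end, so it counts at its first tick but not at the tick where it ends.

```python
from typing import Dict, List

STEP_MINUTES = 5  # tick size on the x-axis, as proposed above


def students_in_session(classes: List[dict], t_max: int,
                        step: int = STEP_MINUTES) -> Dict[int, int]:
    """Return {t: enrolled students whose class is in session at minute t}."""
    y = {}
    for t in range(0, t_max + 1, step):
        y[t] = sum(
            c["students_enrolled"]
            for c in classes
            if c["start"] <= t < c["end"]
        )
    return y


# Hypothetical sample data: minutes since t = 0.
sample = [
    {"start": 540, "end": 590, "students_enrolled": 30},
    {"start": 570, "end": 680, "students_enrolled": 25},
]
y = students_in_session(sample, t_max=700)
```

Swapping the summed field for a constant 1 would give Y_classes_in_session instead of Y_students_in_session.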

programs

There could be an Updater, which fetches the data from the API and saves it into a JSON file, with a filename based on the date/time, in a folder called cache.

There could be another program called the data preprocessor, which does any intermediate calculations. It outputs a new file with the data arranged so that it can be used directly by the grapher.

File_renamer could be a program that generates a list of all the cache and data filenames, which the HTML file could use to find all the data. This might be helpful because filenames would be based on date/times, and it will be hard to predict what the next one is called. (Alternatively, make the filenames easily predictable, like data1, data2, data3 ... , and save the timestamp in the content.)

The graph.html can then simply load the prepared data, and display it.
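
A minimal sketch of the filename-list idea, assuming the snapshots are `.json` files in one folder. The `index.json` name, the `write_index` helper, and the sample filenames are all hypothetical; nothing here comes from the project itself.

```python
import json
import tempfile
from pathlib import Path


def write_index(data_dir: Path, index_name: str = "index.json") -> list:
    """List the .json files in data_dir (sorted by name) and save the list."""
    names = sorted(
        p.name for p in data_dir.glob("*.json") if p.name != index_name
    )
    (data_dir / index_name).write_text(json.dumps(names, indent=2))
    return names


# Demo in a temporary folder with two made-up snapshot names.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("2018-06-08T12-00.json", "2018-06-09T12-00.json"):
        (folder / name).write_text("{}")
    listed = write_index(folder)
    saved = json.loads((folder / "index.json").read_text())
```

Because timestamp-style names sort chronologically, the last entry in the list would be the newest snapshot, which is one way around having to predict the next filename.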

how to use

One possibility is to just have the updater call the data processor directly.

./program <cache directory> <processed data directory> 

Externally, you would just call the program and give the directory names where you want to store the output files (the filenames will be automatically generated).
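
One way the two-directory command line could be wired up, sketched with Python's standard `argparse`. The argument names `cache_dir` and `data_dir` and the description text are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for: program <cache directory> <processed data directory>."""
    parser = argparse.ArgumentParser(
        prog="program",
        description="Fetch course data, then pre-process it for the grapher.",
    )
    parser.add_argument("cache_dir", help="where raw API snapshots are saved")
    parser.add_argument("data_dir", help="where processed data files are saved")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["cache", "data"])
```

In a real script the explicit list would be dropped so that `parse_args()` reads the actual command-line arguments.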

@phi-line
Collaborator

phi-line commented Jun 8, 2018

This is a really good idea! I would love to see a plot of this as well.

You are welcome to make an app that uses the API to generate such a plot. Something like this extends the scope of the API itself, but you can use the batch endpoint to get all courses and generate a plot based on the time range.

@fractalbach
Author

Ooo, you could even use the student count field as well. That would mean you could add another plot for the y-axis: number of students in class, or a % of students in class.

@fractalbach
Author

fractalbach commented Jun 8, 2018

https://plot.ly/javascript/bar-charts/

Could use stacked bar graph to show different classes
https://plot.ly/javascript/bar-charts/#stacked-bar-chart
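
A sketch of how the pre-processed data could be shaped for a stacked bar chart: one bar trace per subject, with the layout's `barmode` set to `"stack"`, following the trace/layout structure shown on the linked page. The `(subject, t, students)` row format, the helper name, and the sample rows are assumptions.

```python
import json
from collections import defaultdict


def stacked_bar_traces(rows, ticks):
    """Group (subject, t, students) rows into one bar trace per subject."""
    per_subject = defaultdict(lambda: defaultdict(int))
    for subject, t, students in rows:
        per_subject[subject][t] += students
    return [
        {"type": "bar", "name": subject, "x": ticks,
         "y": [counts[t] for t in ticks]}
        for subject, counts in sorted(per_subject.items())
    ]


# Made-up rows: subject, minute t, students in session at t.
rows = [("CS", 540, 30), ("MATH", 540, 20), ("CS", 545, 30)]
ticks = [540, 545]
traces = stacked_bar_traces(rows, ticks)
figure = {"data": traces, "layout": {"barmode": "stack"}}
payload = json.dumps(figure)  # could be saved for graph.html to load
```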

@fractalbach
Author

Would also be interesting to see the change in seats available over time while sign-ups are open

@fractalbach
Author

fractalbach commented Jun 8, 2018

These values, Section Capacity and Section Actual, would definitely be needed. Is there currently an easy way to get these?
[image: capture]

You can find it by following:

  • registration tools
  • Searchable Schedule of Classes
  • (pick a term, like summer 2018)
  • (pick subject, like cs)
  • View Sections (on any of the classes)

Alternatively, see what you can find at https://banssb.fhda.edu/
....
😄like this 😉

@phi-line
Collaborator

phi-line commented Jun 8, 2018

I'm working hard to get those three values; right now the API only lists rem. The advanced data holds all 6 of these fields. I'm currently working on an issue to get them.

To answer your question: it is hard to get. It's an authenticated request that needs to be sent to MyPortal. To do that, I need to spoof a login and then scrape cookies to process the request for the data.

@fractalbach
Author

fractalbach commented Jun 8, 2018

Follow link above with emojis

I was able to reach it on my phone (I have never logged in) while in incognito mode.

@phi-line
Collaborator

phi-line commented Jun 8, 2018

Hmm maybe I should be going through https://banssb.fhda.edu/ instead. I'll dig around - thanks for the tip :)

@phi-line
Collaborator

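This kind of density plot is known as a kernel density estimate (KDE). KDEs are a powerful alternative to a histogram since they do not rely on bins. The scikit-learn docs give this example image:
[image: four-panel figure from the scikit-learn docs comparing two histograms, a top-hat kernel density, and a Gaussian kernel density]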

A major problem with histograms, however, is that the choice of binning can have a disproportionate effect on the resulting visualization. Consider the upper-right panel of the above figure. It shows a histogram over the same data, with the bins shifted right. The results of the two visualizations look entirely different, and might lead to different interpretations of the data.

Intuitively, one can also think of a histogram as a stack of blocks, one block per point. By stacking the blocks in the appropriate grid space, we recover the histogram. But what if, instead of stacking the blocks on a regular grid, we center each block on the point it represents, and sum the total height at each location? This idea leads to the lower-left visualization. It is perhaps not as clean as a histogram, but the fact that the data drive the block locations mean that it is a much better representation of the underlying data.

This visualization is an example of a kernel density estimation, in this case with a top-hat kernel (i.e. a square block at each point). We can recover a smoother distribution by using a smoother kernel. The bottom-right plot shows a Gaussian kernel density estimate, in which each point contributes a Gaussian curve to the total. The result is a smooth density estimate which is derived from the data, and functions as a powerful non-parametric model of the distribution of points.
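
To make the quoted idea concrete, here is a small hand-rolled sketch (not scikit-learn's implementation) of a kernel density estimate with a top-hat and a Gaussian kernel. The sample points, bandwidth, and grid are made up for illustration.

```python
import math


def tophat(u: float) -> float:
    """Top-hat kernel: a block of width 2 and height 0.5 (area 1)."""
    return 0.5 if abs(u) <= 1 else 0.0


def gaussian(u: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(points, x, bandwidth, kernel):
    """Density estimate at x: average of kernels centred on each point."""
    total = sum(kernel((x - p) / bandwidth) for p in points)
    return total / (len(points) * bandwidth)


points = [1.0, 1.2, 3.0]                   # made-up sample
grid = [i * 0.1 for i in range(-50, 101)]  # -5.0 .. 10.0 in steps of 0.1
dens_tophat = [kde(points, x, 0.5, tophat) for x in grid]
dens_gauss = [kde(points, x, 0.5, gaussian) for x in grid]
```

Each point contributes one block (top-hat) or one bell curve (Gaussian) centred on itself, and the estimate is the sum of those contributions scaled so the total area is 1.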

In my experiment in music analysis, Oolong, I used a KDE to create a density plot of a combined dataset of 'feature' scatterplots. The resulting model looks like this, with the highest density of features shown in yellow and the lowest density in blue.

[image: mss-house_nearest]

Honestly, this method is way overkill for the use-case you described but I thought you would like to know more about density estimations :)

@fractalbach
Author

fractalbach commented Jun 10, 2018

Yes, very pretty graphs.

In the case of students in classes we actually aren't taking a random sample because we have exact information.

However, if we are looking at the points over time, and we only have partial information (which we probably will), then some smoothing might be useful when we want to find the "estimated % of students not in class at 3pm next Tuesday".

I think a bar graph would probably be the right thing to use, since density graph implies we are taking a function of a random variable.

(An example of random variable would be number of students who actually showed up to class, and a random sample would be if we go around and record how many students are in the class, and compare to the number enrolled)

....

Although... If we sum all of the students together, and just say plot number of students who could be in class over time divided into intervals... Then arguably it would be a histogram, and doing the smoothing would make sense.

We can try both :D
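
For the "try both" idea, a simple way to smooth binned counts is a centred moving average. This is a plainer technique than a KDE and is swapped in here only as an illustration; the counts and window size are made up.

```python
def moving_average(values, window=3):
    """Centred moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


counts = [0, 30, 55, 25, 0]      # made-up students per interval (raw bars)
smooth = moving_average(counts)  # smoothed version of the same series
```

Plotting `counts` as bars and `smooth` as a line on the same axes would show both views side by side.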


phi-line added the enhancement and future labels Jun 10, 2018