Course automation (Final submission): Essay bibliography check & summary #1117

Merged on Apr 29, 2021 (19 commits)

@amarhod amarhod commented Apr 3, 2021

Course automation: Essay bibliography check & summary

Members

Amar Hodzic (amarh@kth.se)

Natan Teferi Asegehegn (ntas@kth.se)

Proposal #1008

We would like to create a GitHub Action that produces a PR comment with the number of references used as well as an excerpt from the reference list. This would make it easier for the TAs to get an overview without having to open the file and scroll down.
We are aware of another proposal that summarises essays, but it makes no mention of references.

What the Action should do:

  • When a PR uses a specific label (e.g. essay), create a comment on that PR
  • The comment should have a reference count
  • The comment should include the reference list
  • The reference list should be stripped down based on a verbosity level set in the code (e.g. remove the "accessed" date, URL, etc.)
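The behaviour proposed in this list can be sketched in a few lines of Python. This is only an illustration, not the code of the action itself. It assumes the essay text has already been extracted from the PDF as plain text, that the reference list follows a "References" or "Bibliography" heading, and that entries are numbered like `[1]`. All function names, regular expressions, and the meaning of the `verbosity` values are hypothetical.

```python
import re

# Hypothetical patterns, chosen for illustration only.
ENTRY_START = re.compile(r"^\[\d+\]")
URL = re.compile(r"https?://\S+")
ACCESSED = re.compile(r"\(?\s*(?:last\s+)?accessed\b[^).]*\)?\.?", re.IGNORECASE)
HEADINGS = {"references", "bibliography"}


def split_references(text):
    """Return the numbered entries that follow the last reference heading."""
    lines = text.splitlines()
    start = None
    for index, line in enumerate(lines):
        if line.strip().lower() in HEADINGS:
            start = index + 1
    if start is None:
        return []
    entries = []
    for line in lines[start:]:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line):
            entries.append(line)
        elif entries:
            # A line without a number continues the previous, wrapped entry.
            entries[-1] += " " + line
    return entries


def strip_entry(entry, verbosity=0):
    """Assumed levels: 2 keeps everything, 1 drops the "accessed" date,
    0 drops the "accessed" date and the URL."""
    if verbosity <= 1:
        entry = ACCESSED.sub("", entry)
    if verbosity <= 0:
        entry = URL.sub("", entry)
    return re.sub(r"\s+", " ", entry).strip(" ,;")


def summarize(text, verbosity=0):
    """Build the body of a PR comment: a count followed by the entries."""
    entries = [strip_entry(entry, verbosity) for entry in split_references(text)]
    listing = "\n".join(f"- {entry}" for entry in entries)
    return f"Number of references: {len(entries)}\n\n{listing}"
```

Calling `summarize(text)` on the extracted essay text returns a string that could be posted as the PR comment. The patterns are deliberately simple and would miss, for example, "accessed" dates that contain a period.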

Criteria it should fulfill:

  • Done before April 6
  • The task produces a PR comment
  • The automation task is reusable (possibly in other courses that use GitHub)
  • The task runs on a standard platform (GitHub Actions)
  • The code for the task is available and well documented

Final solution

The action was created with Docker and Python, making it easy to integrate with a repository. The action is well documented, with both a good README and docstrings in the code. Pair programming was used throughout development.

We tested the action on all the PDF files from last year's essay category (attic/2020/contributions-2020/essay/); 31 out of 34 PDF files were parsed and summarized correctly.
Out of the 4 that could not be parsed, one was a copy of a Medium article without references, one had only URLs as bullet points in its reference list, one did not have a reference list, and one used the Harvard referencing system. In practice, the action therefore failed only on the Harvard system, whose use is arguably questionable for a computer scientist and in this course.
An example of a PR comment summary can be found in our test PR.

We chose not to implement multiple verbosity levels for stripping down references. The problem is that PDF is a document format designed to be printed, not parsed, and the text inside a PDF document is not necessarily stored in any particular order. It is therefore hard to generalise how to pick out the parts of a reference when the parser does not consistently detect the same attributes (e.g. URL) across references. We felt that this was out of scope given the effort we had already put in. However, it could be attempted in future work (i.e. as a task for next year).

The trigger we have chosen in the workflow is to run the job on a PR that has the label essay. If the student commits an updated or new PDF in the same PR, the label needs to be removed and added again to trigger the action to run again. However, you may change the trigger as you prefer.
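As one possible variation on the trigger, the check below also re-runs when new commits are pushed to a PR that already carries the label, so the label would not need to be removed and added again. It is a minimal sketch, not the action's actual logic. It assumes the workflow subscribes to both the `labeled` and `synchronize` activity types of the `pull_request` event, and it relies on the JSON event payload that GitHub Actions exposes through the `GITHUB_EVENT_PATH` environment variable. The function names are hypothetical.

```python
import json
import os


def has_label(event, label="essay"):
    """True if the pull request in the event payload carries `label`."""
    labels = event.get("pull_request", {}).get("labels", [])
    return any(item.get("name") == label for item in labels)


def should_run(event, label="essay"):
    """Decide whether to post a summary for a `pull_request` event payload."""
    action = event.get("action")
    if action == "labeled":
        # Run only when the label that was just added is the watched one.
        return event.get("label", {}).get("name") == label
    if action == "synchronize":
        # New commits were pushed: run again if the PR already has the label.
        return has_label(event, label)
    return False


def load_event():
    """Read the payload file whose path GitHub Actions sets in the environment."""
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as handle:
        return json.load(handle)
```

Inside the container, the entry point could call `should_run(load_event())` and exit early when it returns `False`.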

The public repository for the action can be found here.

amarhod commented Apr 13, 2021

Does it look good to you @SophieHYe?

amarhod commented Apr 28, 2021

The change to use a released version of the action instead of the main branch has been tested here.

@SophieHYe

Nice work. I am now merging your PR. Thanks.

@SophieHYe SophieHYe merged commit 2f11212 into KTH:2021 Apr 29, 2021
@SophieHYe SophieHYe self-assigned this Apr 29, 2021
monperrus added a commit that referenced this pull request Sep 15, 2022
…ary (#1117)

* doc: Course automation proposal

* Add GitHub action that automatically counts task registrations (#918)

* Add tutorial proposal (#1015)

Co-authored-by: LaraRos <rostami.lara@gmail.com>

Co-authored-by: LaraRos <rostami.lara@gmail.com>

* Add essay readme (#1020)

* Executable Tutorial Proposal: Setting up a Jenkins CI/CD pipeline for deploying to Docker Hub (#1028)

* Executable tutorial proposal: integrate TeamCity with Docker (#1025)

Co-authored-by: César Soto Valero <cesarsotovalero@gmail.com>

* Essay: Comparison of Kubernetes and Nomad (#1023)

Co-authored-by: Dina Lerjevik <lerjevik@kth.se>

* Essay proposal: BDD in DevOps (#1032)

* Update README.md (#1039)

* doc: remove confusion about feedback on videos

* Update README.md

* feat: Added workflow file for bibliography summary action #1008

* doc: Final submission update #1008

* doc: Fixed typos

* fix: Changed workflow to use v1 release instead

Co-authored-by: Long Zhang <zhanglong3030@qq.com>
Co-authored-by: Markus Wesslén <markus.wesslen@gmail.com>
Co-authored-by: LaraRos <rostami.lara@gmail.com>
Co-authored-by: Justin Arieltan <agriad1@yahoo.com>
Co-authored-by: Christopher Gustafson <christopher.gustafson@outlook.com>
Co-authored-by: Chen, Zidi <51125655+Chen-Zidi@users.noreply.github.com>
Co-authored-by: César Soto Valero <cesarsotovalero@gmail.com>
Co-authored-by: dmariel <34478937+dmariel@users.noreply.github.com>
Co-authored-by: Dina Lerjevik <lerjevik@kth.se>
Co-authored-by: anorangesky <35503355+anorangesky@users.noreply.github.com>
Co-authored-by: heeenkie <35926672+heeenkie@users.noreply.github.com>
Co-authored-by: Martin Monperrus <martin.monperrus@gnieh.org>
Co-authored-by: Sophie H Ye <he_ye_90s@hotmail.com>