Challenge & Notes #2

Open
WGierke opened this issue Oct 28, 2016 · 0 comments

WGierke commented Oct 28, 2016

Challenge

  • classify/label repos automatically
  • analyze relevant features
  • document design thoughts and training approach

Documentation Structure

  1. Data Exploration and Prediction Model
  • analyze and document relevant features
  • document how to avoid overfitting
  • explain why we decided to use these features
  • explain how we've developed the prediction model
  2. Automated Classification
  • implement the app that takes the input format and creates the output format
  • either 1) prompt for the training data to use or 2) directly include the learned model
  3. Validation
  • validate with Appendix B
  • create a boolean matrix comparing the expected label (from Appendix B) with the predicted one
  • compute recall per category
  • compute precision per category
  • discuss the quality of the results and whether higher recall (yield) or higher precision is more important
  4. Extension
  • use the model for a nice app
  5. Furthermore
  • document 3 repos for which we expect our model to yield better results
  • installation and user manual
  • document decisions we made for features, algorithms, data structures, software development tools and practices
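For the Validation step, per-category precision and recall can be read off a confusion matrix. A minimal sketch, assuming the "boolean matrix" is interpreted as a matrix of counts with the expected labels (Appendix B) as rows and the predicted labels as columns; the function names and the category names DATA, DEV and HW are hypothetical placeholders:

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(expected: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Count (expected, predicted) pairs; rows = expected, columns = predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for exp, pred in zip(expected, predicted):
        matrix[index[exp]][index[pred]] += 1
    return matrix


def precision_recall(matrix: List[List[int]],
                     labels: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Return {label: (precision, recall)} computed from the confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum: predicted as label
        expected_total = sum(matrix[i])  # row sum: actually labeled as label
        # Convention (assumption): a score is 0.0 when its denominator is 0
        precision = true_pos / predicted_total if predicted_total else 0.0
        recall = true_pos / expected_total if expected_total else 0.0
        scores[label] = (precision, recall)
    return scores


if __name__ == "__main__":
    labels = ["DATA", "DEV", "HW"]  # hypothetical category names
    expected = ["DATA", "DATA", "DEV", "HW", "DEV"]
    predicted = ["DATA", "DEV", "DEV", "HW", "HW"]
    matrix = confusion_matrix(expected, predicted, labels)
    for label, (p, r) in precision_recall(matrix, labels).items():
        print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

In this toy example DATA gets precision 1.00 but recall 0.50, while HW gets precision 0.50 but recall 1.00, which is exactly the trade-off the "higher recall or higher precision" discussion is about.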

Notes

Examples of DATA repositories
openaddresses / openaddresses
unitedstates / congress-legislators
OpenExoplanetCatalogue / open_exoplanet_catalogue
Chicago / food-inspections-evaluation
GSA / data
cernopendata / opendata.cern.ch
benbalter / congressional-districts

Extension

"Improve yourself"

  • Login with GitHub
    -> Stats of your own repos, e.g. 30% Data, 70% Software
    -> Stats of repos your friends recently starred
       |-Data-| Software | Homework | ...|
    -> Stats of recently trending repos
       |-Data-| Software | Homework | ...|
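The "30% Data, 70% Software" stats boil down to counting the predicted label of each repository. A minimal sketch, assuming the repositories have already been fetched and classified by our model (fetching them via the GitHub API is not shown); `label_shares` is a hypothetical helper:

```python
from collections import Counter
from typing import Dict, Iterable


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the share (in percent) of repositories per predicted label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no repositories, no stats
    # most_common() orders labels from most to least frequent
    return {label: 100.0 * count / total for label, count in counts.most_common()}


if __name__ == "__main__":
    own_repos = ["Data"] * 3 + ["Software"] * 7
    print(label_shares(own_repos))  # {'Software': 70.0, 'Data': 30.0}
```

The same helper would work unchanged for the friends' recently starred repos and for trending repos; only the list of predicted labels passed in differs.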

Sources:

WGierke changed the title from "Notes" to "Challenge & Notes" on Oct 29, 2016