Commit
Showing 5 changed files with 51 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# sr-detection | ||
|
||
The goal of the study is to create a model that, by looking at the README file | ||
and meta-information, can identify GitHub "sample repositories" (SR), which | ||
mostly contain educational or demonstration materials meant to be copied | ||
rather than reused as a dependency. | ||
|
||
**Motivation**. While working on the [CaM] project, we were required to | ||
[filter out repositories with samples][cam-issue]. No readily available | ||
technique or tool existed that could perform that function, so we conducted | ||
research on this very subject. | ||
|
||
The repository is structured as follows: | ||
|
||
* [sr-data](/sr-data), module for collecting, preprocessing, and preparing the | ||
SR data. | ||
* [sr-train](/sr-train), module for training ML models. | ||
* [sr-detector](/sr-detector), trained and reusable model for SR detection. | ||
* [sr-paper](/sr-paper), LaTeX source for a paper on SR detection. | ||
|
||
## How to contribute | ||
|
||
It's a Python project, so make sure that you have [Python 3.11+] on your | ||
system. Fork this repository, make changes, and send us a [pull request][guidelines]. | ||
We will review your changes and apply them to the `master` branch shortly, | ||
provided they don't violate our quality standards. To avoid frustration, before | ||
sending us your pull request please run the full build: | ||
|
||
```bash | ||
poetry build | ||
``` | ||
|
||
[CaM]: https://github.com/yegor256/cam | ||
[cam-issue]: https://github.com/yegor256/cam/issues/227 | ||
[guidelines]: https://www.yegor256.com/2014/04/15/github-guidelines.html | ||
[Python 3.11+]: https://www.python.org/downloads/release/python-3110 |
Empty file.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# paper | ||
|
||
[![latexmk](https://github.com/h1alexbel/sr-detection/actions/workflows/latexmk.yml/badge.svg)](https://github.com/h1alexbel/sr-detection/actions/workflows/latexmk.yml) | ||
|
||
This paper reports the performance of the sample-filtering models and describes how they were trained. | ||
|
||
The auto-generated PDF is here: [paper.pdf](https://github.com/h1alexbel/sr-detection/blob/gh-pages/paper.pdf) | ||
|
||
To build the paper, run: | ||
|
||
```bash | ||
latexmk paper.tex -pdf | ||
``` | ||
|
||
Feel free to contribute via pull requests. |
Empty file.