About Zero

Zero is a simple tool for creating a dataset from a known corpus and a set of desired keywords. To create a dataset, you need to define the corpus, the output file, the label, and one or more keywords.

- Input corpus

Select a directory containing one or more .txt documents, preferably UTF-8 encoded. To avoid memory problems, the directory should contain many smaller documents rather than one large one.
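As a rough illustration of why many small files are easier on memory, the sketch below reads a corpus directory one .txt file at a time instead of loading everything at once. The function name `iter_documents` is hypothetical and is not taken from Zero's source; how Zero actually reads the corpus may differ.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_documents(corpus_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (file name, text) for each .txt file, one file at a time.

    Hypothetical helper for illustration; not part of Zero itself.
    """
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # Only one document is held in memory per iteration.
        # errors="replace" keeps going if a file is not valid UTF-8.
        yield path.name, path.read_text(encoding="utf-8", errors="replace")
```

Because the function is a generator, peak memory use is bounded by the largest single file rather than by the size of the whole corpus.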

- Output file

The output file must be a comma-delimited, UTF-8 encoded .csv file.
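A minimal sketch of producing a file in that format with Python's standard `csv` module follows. The function name `write_dataset` and the `sentence`/`label` column names are assumptions made for this example; the README does not specify the columns Zero writes.

```python
import csv
from typing import Iterable, Tuple


def write_dataset(rows: Iterable[Tuple[str, str]], output_path: str) -> None:
    """Write (sentence, label) rows to a comma-delimited, UTF-8 encoded CSV.

    Hypothetical helper for illustration; not part of Zero itself.
    """
    # newline="" lets the csv module control line endings itself.
    with open(output_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter=",")
        writer.writerow(["sentence", "label"])  # assumed header row
        writer.writerows(rows)
```

The `csv` writer quotes any sentence that itself contains a comma, so the file stays parseable as comma-delimited.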

- Label

The label defines the class, i.e. the category assigned to every sentence that contains the desired keyword.
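The idea of assigning a label to keyword-bearing sentences can be sketched as follows. This is an illustration under stated assumptions, not Zero's actual logic: the function name `label_sentences` is hypothetical, sentences are split naively on `.`, `!` and `?`, and matching is case-insensitive on whole words.

```python
import re
from typing import List, Tuple


def label_sentences(text: str, keywords: List[str], label: str) -> List[Tuple[str, str]]:
    """Return (sentence, label) for every sentence containing a keyword.

    Hypothetical helper for illustration; not part of Zero itself.
    """
    if not keywords:
        return []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Whole-word, case-insensitive match on any of the keywords.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [(s, label) for s in sentences if pattern.search(s)]
```

With whole-word matching, the keyword `cat` selects "The cat sat." but not a sentence that only contains "category".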

- Keyword/s

You can enter one or more keywords. Separate keywords with a punctuation mark, preferably a comma.
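One way to split such input is sketched below. The function name `parse_keywords` is hypothetical, and treating every punctuation character as a separator is an assumption based on the description above; under this rule a hyphenated keyword would be split into parts, which may not match what Zero does.

```python
import re
from typing import List


def parse_keywords(raw: str) -> List[str]:
    """Split raw keyword input on punctuation and drop empty entries.

    Hypothetical helper for illustration; not part of Zero itself.
    """
    # A separator is any run of characters that is neither a word
    # character nor whitespace (commas, semicolons, periods, ...).
    parts = re.split(r"[^\w\s]+", raw)
    return [part.strip() for part in parts if part.strip()]
```

For example, `parse_keywords("cat, dog;bird")` yields the three keywords `cat`, `dog` and `bird`, and surrounding whitespace is stripped from each.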

Screenshots

To-Do

- Duplicate keyword detection
- Code adaptation for big data

user0706/Zero
