trying with .rst...

zeeguu · Jul 10, 2020 · ca71a78
1 parent b445726

Showing 3 changed files with 90 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -1,4 +1,5 @@
-Statistics about word frequency in different languages based on a corpus of 
+
+Statistics about word frequencies in different languages based on a corpus of 
 movie subtitles as extracted by the Frequency Words (https://github.com/hermitdave/FrequencyWords) project.
 
 Currently supported languages: 

diff --git a/README.rst b/README.rst
@@ -0,0 +1,85 @@
+Statistics about word frequencies in different languages based on a
+corpus of movie subtitles as extracted by the `Frequency Words`_
+project.
+
+Currently supported languages:
+
+::
+
+   "da", "de", "el", "en", "es", "fr", "it", "nl", "no", "pl", "pt", "ro", "zh-CN"
+
+Usage Examples
+~~~~~~~~~~~~~~
+
+Getting the info about a given word
+'''''''''''''''''''''''''''''''''''
+
+::
+
+   >> from wordstats import Word
+   >> print (Word.stats('bleu', 'fr'))
+   bleu: (lang: fr, rank: 1521, freq: 9.42, imp: 9.42, diff: 0.03, klevel: 2)
+
+Comparing the difficulty of two German words
+''''''''''''''''''''''''''''''''''''''''''''
+
+::
+
+   >> from wordstats import Word
+   >> Word.stats('blauzungekrankenheit','de').difficulty > Word.stats('blau','de').difficulty
+   True
+
+Top 10 most used words in Dutch
+'''''''''''''''''''''''''''''''
+
+::
+
+   >> from wordstats import LanguageInfo
+   >> Dutch = LanguageInfo.load('nl')
+   >> print(Dutch.all_words()[:10])
+   ['ik', 'je', 'het', 'de', 'dat', 'is', 'een', 'niet', 'en', 'van']
+
+Words common across all the languages
+'''''''''''''''''''''''''''''''''''''
+
+Given that the corpus is based on subtitles, some common names have
+slipped in. The ``common_words()`` function returns a list of such words.
+
+::
+
+   >> from wordstats.common_words import common_words
+   >> for each in common_words():
+   >>     if len(each) > 9:
+   >>         print(each)
+   washington
+   christopher
+   enterprise
+
+Words that are the same in Polish and Romanian
+''''''''''''''''''''''''''''''''''''''''''''''
+
+::
+
+   >> from wordstats import LanguageInfo
+   >> Polish = LanguageInfo.load("pl")
+   >> Romanian = LanguageInfo.load("ro")
+   >> for each in Polish.all_words():
+   >>     if each in Romanian.all_words():
+   >>         if len(each) > 5 and each not in common_words():
+   >>             print(each)
+   telefon
+   moment
+   prezent
+   interes
+   ...
+
+Installation
+~~~~~~~~~~~~
+
+::
+
+   pip install wordstats
+
+.
+
+.. _Frequency Words: https://github.com/hermitdave/FrequencyWords
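Review note on the Polish/Romanian example in the README.rst hunk above: as written, it calls ``Romanian.all_words()`` and ``common_words()`` on every loop iteration and tests membership against what appear to be lists. Below is a minimal sketch of the same filter using sets. It works on plain lists of strings, so it runs without the package installed. The function name ``shared_words``, its parameters, and the toy word lists are illustrative assumptions, not part of wordstats.

```python
def shared_words(words_a, words_b, excluded=(), min_len=6):
    """Return words found in both lists, at least min_len long, not excluded.

    The result keeps the order of words_a, so it stays frequency-ranked
    if words_a is frequency-ranked.
    """
    in_b = set(words_b)   # build the lookup sets once, outside the loop
    skip = set(excluded)
    return [w for w in words_a
            if len(w) >= min_len and w in in_b and w not in skip]


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the corpus.
    polish = ["telefon", "moment", "jest", "washington"]
    romanian = ["moment", "telefon", "este", "washington"]
    print(shared_words(polish, romanian, excluded=["washington"]))
    # ['telefon', 'moment']
```

With the package installed, the equivalent call would presumably be ``shared_words(Polish.all_words(), Romanian.all_words(), common_words())``, given the return types the README shows.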
diff --git a/setup.py b/setup.py
@@ -22,21 +22,21 @@ def package_files(directory):
 
 extra_files = package_files('wordstats/language_data/')
 
-with open('README.md') as f:
+with open('README.rst') as f:
     long_description = f.read()
 
 setuptools.setup(
     name="wordstats",
     packages=setuptools.find_packages(),
-    version="1.0.3",
+    version="1.0.4",
     license="MIT",
     description="Multilingual word frequency statistics for Python based on subtitles corpora",
     long_description=long_description,
     long_description_content_type='text/markdown',
     author="Mircea Lungu",
     author_email="me@mir.lu",
     url="https://github.com/zeeguu-ecosystem/Python-Wordstats",
-    download_url="https://github.com/zeeguu-ecosystem/Python-Wordstats/archive/v_1.0.3.tar.gz",
+    download_url="https://github.com/zeeguu-ecosystem/Python-Wordstats/archive/v_1.0.4.tar.gz",
     include_package_data=True,
     zip_safe=False,
     keywords="natural language processing, multilingual",
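Review note on the setup.py hunk above: it now reads ``README.rst``, but ``long_description_content_type`` is still ``'text/markdown'``, so PyPI would likely try to render the reStructuredText as Markdown. Below is a minimal sketch of one way to keep the two in sync by deriving the content type from the file extension. The helper name ``content_type_for`` and the mapping are illustrative assumptions, not part of this commit. ``text/x-rst``, ``text/markdown``, and ``text/plain`` are the values the packaging documentation lists for this field.

```python
import os

# Hypothetical helper (not in the commit): map a README's extension to the
# value passed as long_description_content_type in setuptools.setup().
_CONTENT_TYPES = {
    ".rst": "text/x-rst",
    ".md": "text/markdown",
    ".txt": "text/plain",
}


def content_type_for(readme_path):
    """Return the long_description content type for the given README path."""
    ext = os.path.splitext(readme_path)[1].lower()
    # Unknown extensions fall back to plain text.
    return _CONTENT_TYPES.get(ext, "text/plain")


if __name__ == "__main__":
    print(content_type_for("README.rst"))  # text/x-rst
```

In setup.py this could be passed as ``long_description_content_type=content_type_for('README.rst')``. Hard-coding ``'text/x-rst'`` is just as valid and arguably simpler for a single README.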