bltlab / mot Public

Releases: bltlab/mot

V1.10 (Latest)

28 Oct 18:13

cpalenmichel

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Scrape up to Oct 1, 2024

Assets 55

amh_amharic_voanews.tgz     30.7 MB   2024-10-28T18:15:36Z
aze_amerikaninsesi.tgz      166 MB    2024-10-28T18:15:36Z
bam_voabambara.tgz          3.71 MB   2024-10-28T18:15:36Z
ben_voabangla.tgz           94.8 MB   2024-10-28T18:15:36Z
bod_voatibetan.tgz          41.3 MB   2024-10-28T18:15:36Z
bos_ba_voanews.tgz          147 MB    2024-10-28T18:15:37Z
cmn_voacantonese.tgz        296 MB    2024-10-28T18:15:38Z
cmn_voachinese.tgz          710 MB    2024-10-28T18:15:39Z
ell_gr_voanews.tgz          41.2 MB   2024-10-28T18:15:42Z
eng_editorials_voa.tgz      20.1 MB   2024-10-28T18:15:45Z
Source code (zip)                     2024-10-28T14:24:27Z
Source code (tar.gz)                  2024-10-28T14:24:27Z

V1.9

09 Apr 03:11

cpalenmichel

Scrape up to April 1, 2024
Better filtering out of  and variants

Assets 55

V1.8

20 Nov 19:21

cpalenmichel

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Added scraping from April 2023 to November 15, 2023

Assets 55

1.7

04 May 23:14

cpalenmichel

Additional data scraped from October 2022 to end of April 2023

Assets 54

v1.6

11 Oct 22:40

cpalenmichel

Fix issue in French and Spanish sentence segmentation relating to candidate sites.
Add tokenization for the remaining languages:
- bod is tokenized with botok (https://github.com/OpenPecha/Botok/tree/docs)
- All other remaining languages are tokenized with utoken (https://github.com/uhermjakob/utoken)

Assets 54

v1.5

02 Sep 20:08

cpalenmichel

Added segmentation for remaining languages
Improvements to some of the existing segmentation models
Cases of both under-segmentation and over-segmentation were found and addressed

Assets 53

v1.4

08 Jul 20:22

cpalenmichel

Updated scrape through July 1st, 2022
Fix missing yue documents
Change yue to cmn, and change voacambodia from khm to eng
Improved author extraction from metadata
Improved extraction of paragraph splits

Assets 53

v1.3

16 Jun 18:24

cpalenmichel

Release 1.3 with updated scrapes through the end of May 2022.

Assets 53

v1.2

12 May 15:32

cpalenmichel

Added segmentation for all languages except: ben, bod, kat, kur
Better publication date coverage
Remove zero-width space in segmentation and tokenization output for Thai, Lao, and Khmer (zero-width space is kept in the original paragraph text)
Release as described in camera-ready LREC 2022 paper
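
The zero-width space cleanup noted in this release can be illustrated with a short sketch. This is not the project's actual code: the function name `strip_zero_width_space` and the sample tokens are hypothetical, the character is assumed to be U+200B ZERO WIDTH SPACE, and dropping tokens that become empty is an assumption made for the example.

```python
ZERO_WIDTH_SPACE = "\u200b"  # U+200B ZERO WIDTH SPACE (assumed target character)


def strip_zero_width_space(tokens):
    """Return tokens with U+200B removed.

    Tokens that consist only of zero-width spaces become empty and are
    dropped (an assumption for this sketch, not documented behavior).
    """
    cleaned = (token.replace(ZERO_WIDTH_SPACE, "") for token in tokens)
    return [token for token in cleaned if token]


# Hypothetical Thai tokens; only the token list is cleaned here, which
# mirrors the note that the original paragraph text keeps the character.
tokens = ["สวัสดี\u200b", "\u200b", "ครับ"]
print(strip_zero_width_space(tokens))
```

Applying the cleanup only to the derived token and sentence layers leaves the source paragraphs byte-for-byte intact, which matches the release note's statement that the original text keeps the character.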

Assets 53

v1.1

24 Mar 01:41

cpalenmichel

Additional scraping from January 2022 to March 1, 2022.
Fix for Cantonese segmentation
Add segmentation for Portuguese and Urdu
Added source code

Assets 52
