Long Corpus Build Times #9

Closed
Headline opened this issue Feb 26, 2020 · 5 comments

@Headline

Hello,

Thanks for making this - it has worked very well in the year or so I've been using it.

This project of mine started as a small-scale tinkering project, but now the data consumed by the Markov chain has gotten huge.

2020-02-26T02:41:08.942Z [UtilBot] debug: MarkovStore#parseFile: Parsed N/A with 106836 lines. [Queue: 0]
2020-02-26T02:48:34.196Z [UtilBot] info: MarkovStore#buildCorpus: Markov chain built for 335290997317697536 with 106836 lines

As the timestamps show, building the chain from this parsed data took about 7 minutes, and I'm curious whether there are ways to reduce the time it takes to generate the Markov chain.
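For reference, the gap between the two log lines above can be worked out directly. This is only a small arithmetic sketch; it assumes the whole interval between the `parseFile` and `buildCorpus` messages was spent building the chain, which the logs themselves do not confirm.

```javascript
// Timestamps and line count copied from the two log lines above.
const parsedAt = Date.parse("2020-02-26T02:41:08.942Z");
const builtAt = Date.parse("2020-02-26T02:48:34.196Z");
const lineCount = 106836;

// Elapsed time between "parsed" and "built", in milliseconds.
const elapsedMs = builtAt - parsedAt;
const minutes = Math.floor(elapsedMs / 60000);
const seconds = (elapsedMs % 60000) / 1000;

console.log(`${minutes} min ${seconds} s`); // 7 min 25.254 s

// Rough throughput, assuming the whole gap was build time.
console.log(Math.round(lineCount / (elapsedMs / 1000)), "lines per second"); // 240 lines per second
```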

Thanks!

@scambier
Owner

Hello!

First, I'm glad to know that you found my module useful, thank you.

The buildCorpus() method is certainly far from optimized. I could try to optimize it, but unless I can reduce its time complexity (and I don't know if I can), you won't notice any significant improvement.

Instead, what about utility methods to import/export the built corpus? I don't know how often you're calling buildCorpus(), or whether your corpus changes often, but you could build it once a day, save the built result, and reuse it later.
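The "build once, save, reuse" idea can be sketched as below. This is not the module's code: `buildCorpus`, `exportCorpus`, `importCorpus`, and `loadOrBuild` are hypothetical helpers around a toy word-to-followers map, and the JSON cache file in a temporary directory is an assumption made only for the example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical stand-in for an expensive build: maps each word to the
// list of words that follow it in the input lines.
function buildCorpus(lines) {
  const corpus = new Map();
  for (const line of lines) {
    const words = line.split(/\s+/).filter(Boolean);
    for (let i = 0; i < words.length - 1; i++) {
      const followers = corpus.get(words[i]) || [];
      followers.push(words[i + 1]);
      corpus.set(words[i], followers);
    }
  }
  return corpus;
}

// Serialize and restore the built structure as JSON.
function exportCorpus(corpus) {
  return JSON.stringify([...corpus]);
}
function importCorpus(json) {
  return new Map(JSON.parse(json));
}

// Reuse the saved build when it exists; otherwise build once and save it.
function loadOrBuild(cacheFile, getLines) {
  if (fs.existsSync(cacheFile)) {
    return importCorpus(fs.readFileSync(cacheFile, "utf8"));
  }
  const corpus = buildCorpus(getLines());
  fs.writeFileSync(cacheFile, exportCorpus(corpus));
  return corpus;
}

const cacheDir = fs.mkdtempSync(path.join(os.tmpdir(), "corpus-"));
const cacheFile = path.join(cacheDir, "corpus.json");
const lines = ["the cat sat", "the dog sat down"];

// First call builds and saves; second call only reads the saved file.
const first = loadOrBuild(cacheFile, () => lines);
const second = loadOrBuild(cacheFile, () => {
  throw new Error("should not rebuild");
});

console.log(second.get("the")); // [ 'cat', 'dog' ]

fs.rmSync(cacheDir, { recursive: true, force: true });
```

With a layout like this, the slow build runs only when no saved result exists, and later starts pay just the cost of reading and parsing the file.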

@Headline
Author

> Instead, what about utility methods to import/export the built corpus?

Ah, I'm sure this is almost exactly what I'm looking for. Basically, I build the corpus only once and then add to it as more chat messages come in. Being able to export the corpus after every modification would dramatically reduce load times.
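The workflow described here (build once, add messages as they arrive, export after every modification) might look roughly like the following. `PersistentChain` is a hypothetical class written for illustration, not part of the module; the snapshot format (JSON) and the write-then-rename step are assumptions.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical store, for illustration only (not the module's API).
class PersistentChain {
  constructor(file) {
    this.file = file;
    // Restore the previous state if a snapshot exists.
    this.transitions = fs.existsSync(file)
      ? new Map(JSON.parse(fs.readFileSync(file, "utf8")))
      : new Map();
  }

  // Update only the transitions touched by the new message, then snapshot.
  add(message) {
    const words = message.split(/\s+/).filter(Boolean);
    for (let i = 0; i < words.length - 1; i++) {
      const followers = this.transitions.get(words[i]) || [];
      followers.push(words[i + 1]);
      this.transitions.set(words[i], followers);
    }
    this.save();
  }

  // Write to a temporary file first, then rename it into place, so a crash
  // mid-write does not leave a truncated snapshot behind.
  save() {
    const tmp = this.file + ".tmp";
    fs.writeFileSync(tmp, JSON.stringify([...this.transitions]));
    fs.renameSync(tmp, this.file);
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "chain-"));
const file = path.join(dir, "chain.json");

const bot = new PersistentChain(file);
bot.add("hello there friend");
bot.add("hello again");

// Simulate a restart: a new instance loads the snapshot without rebuilding.
const restarted = new PersistentChain(file);
console.log(restarted.transitions.get("hello")); // [ 'there', 'again' ]

fs.rmSync(dir, { recursive: true, force: true });
```

Note that this sketch rewrites the whole snapshot on every message, so each save costs time proportional to the corpus size. For a busy chat, saving on a timer or every N messages may be a better trade-off.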

@scambier
Owner

Hey @Headline, sorry for the delay; I haven't had much time to dedicate to this project lately.

However, I've published version 3.0.0-beta.1 with a few changes. I've also added two methods, .export() and .import(data), which are undocumented for the moment.

I hope this helps until I can finalize it properly.

@scambier
Owner

This feature is now documented, tested, and published (along with other breaking changes) in 3.0.0.

@Headline
Author

Thank you!
