Rethink the "database"/data structure #26

Closed
johanneszab opened this issue Feb 3, 2019 · 2 comments
Labels
enhancement New feature or request

Comments

@johanneszab
Contributor

Currently, for each blog, several files are saved in a folder called \Index. They contain all the relevant blog information and settings used by TumblThree, as well as the downloader's state.

Initially I didn't want the overhead of a real database, and for a simple application this seemed sufficient. But maybe there is a more convenient way to handle this that also offers transactionality. Currently these files (as well as the settings) are saved quite seldom (settings: only when the application shuts down; blog: when the download finishes or is canceled). Maybe there is a way to improve the consistency of the download state in case of failures.
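
One low-overhead way to get closer to transactional saves without a real database is to write the new state to a temporary file and then swap it into place. The following is only a minimal Python sketch of that idea, not TumblThree's code: `save_atomically`, `load`, and the JSON layout of the index file are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_atomically(path: str, state: dict) -> None:
    """Hypothetical helper: replace the file at `path` with `state` in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        # os.replace overwrites the destination; the rename is atomic on POSIX.
        os.replace(tmp_path, path)
    except BaseException:
        # The previous index file is untouched; remove the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load(path: str) -> dict:
    """Hypothetical helper: read the saved state back."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

If the process dies in the middle of a save, the old index file is still intact, so the download state is at worst stale rather than corrupted. It does not give multi-file transactions; an embedded database would be needed for that.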

@elipriaulx elipriaulx added this to the Renew Stability milestone Feb 3, 2019
@elipriaulx
Contributor

While considering this problem, it would be great to also consider the format of other blogs and content sites that might be nice to cater for in the future. It would also be worth considering whether the code that reads the database could be abstracted into a library, published as a NuGet package or something similar, so that the data can be processed further more easily. An issue somewhere mentioned generating an offline, Tumblr-like site from the downloaded content; if the format were easy for others to take and act upon, this sort of thing would become really easy.

@elipriaulx elipriaulx added the enhancement New feature or request label Mar 22, 2019
@thomas694
Contributor

As of 617b2d2, the database files of the blogs currently being crawled are saved every two minutes, so a crash no longer loses the filenames of all the files downloaded in the session. Previously, those files would have been downloaded again after a crash, because their filenames were not in the database.
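
The periodic save described above could look roughly like this. It is a Python illustration of the technique, not the code from 617b2d2: `PeriodicSaver` and its `save` callback are hypothetical names, and the 120-second default simply mirrors the two-minute interval mentioned in the comment.

```python
import threading
from typing import Callable


class PeriodicSaver:
    """Hypothetical helper: call save() every `interval` seconds until stopped."""

    def __init__(self, save: Callable[[], None], interval: float = 120.0) -> None:
        self._save = save
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._interval):
            self._save()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
        self._save()  # final save so a clean shutdown loses nothing
```

With this shape, a crash loses at most one interval's worth of downloaded filenames, and a clean stop still writes the final state.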

@thomas694 thomas694 removed this from the Renew Stability milestone Apr 12, 2021