Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VFS 2.0 #90

Merged
merged 13 commits into from
May 3, 2022
Merged

Add VFS 2.0 #90

merged 13 commits into from
May 3, 2022

Conversation

westito
Copy link
Contributor

@westito westito commented Apr 23, 2022

I created a new, improved files system. First I tried to fix the original, but after I while I realised creating a separate one will be better and more backward compatible. However, I added a path validator with exception to temp files.

When I first try WASM with IndexDBFileSystem, the data wasn't saved. There was no error, and I have no clue what's going on. After hours of investigation I found that the persistanceRoot is not prepended to path! I used persistanceRoot other than '/'. I thought provide 'test_database' in WasmDatabase(path:) will open database as /db/test_database. The second problem is IndexDBFileSystem not checks path validity. Stores file in memory, but writing to IndexDB silently fails. This applies for temp files too, without workaround. Temp file path always starts with /tmp. With custom persistanceRoot temp files will never be saved to IndexedDB.

Copy link
Owner

@simolus3 simolus3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff, thanks for your contributions 👍

I'm wondering if we have to introduce a separate class for the v2 API, do you think there's a way to add these methods to the existing class?

sqlite3/lib/src/wasm/file_system.dart Outdated Show resolved Hide resolved
sqlite3/lib/src/wasm/file_system.dart Outdated Show resolved Hide resolved
sqlite3/lib/src/wasm/file_system.dart Outdated Show resolved Hide resolved
sqlite3/lib/src/wasm/file_system.dart Outdated Show resolved Hide resolved
sqlite3/lib/src/wasm/file_system_v2.dart Outdated Show resolved Hide resolved
sqlite3/lib/src/wasm/js_interop.dart Outdated Show resolved Hide resolved
sqlite3/lib/src/wasm/js_interop.dart Outdated Show resolved Hide resolved
@westito
Copy link
Contributor Author

westito commented Apr 23, 2022

I made some changes to JS interop to make it simpler. But I can't skip mapping, because promiseToFuture can't handle List<> type argument, so promiseToFuture<List<DatabaseName>>() not working.
What I found:
dart-lang/sdk#39627 (comment)

@westito
Copy link
Contributor Author

westito commented Apr 24, 2022

I tried to add a legacy option to VFS2 and leave load() as deprecated function. But beyond persistanceRoot prepending, VFS2 use path normalisation. A lot of if (legacy) statements will make the code trashy. I think better replace the original implementation without backward compatible. Or create a separate V2 class.

@westito
Copy link
Contributor Author

westito commented Apr 24, 2022

Can we enable WAL mode in SQLite? We could achieve a better performance on this VFS I think

@westito
Copy link
Contributor Author

westito commented Apr 24, 2022

How SQLite handles data corruption? When bytes write with VFS, IndexedDB may can't persist because works async. SQLite think it is written, database will be corrupted next load. Am I right? Can we handle this, or simply warn users do not rely on data saved to SQLite? I am using SQLite for cache only, so corruption not a problem. App re-downloads database in this case.

@simolus3
Copy link
Owner

Can we enable WAL mode in SQLite?

My idea was to keep the sqlite3 build small which is why I disabled it. I can enable it if it turns out to improve performance.

How SQLite handles data corruption?

I don't think we can really handle this: Even though IndexedDB provides ACID guarantees, the VFS API from sqlite requires us to be synchronous and IndexedDB is async, so we can't rely on that. The assumption is that writes will probably go through because they're fast, but as you said this isn't a safe assumption to make.

If you need synchronization I'd probably recommend using shared workers.

We could call asynchronous APIs by running them in a web worker and using atomics to have the main script wait for them synchronously. That sounds complicated though.

@westito
Copy link
Contributor Author

westito commented Apr 24, 2022

That's awesome idea. With web compiler, FVS can be compiled to a single js file and loaded as shared worker. This way, it isn't necessary to compile to js and load the whole sqlite (or Drift) as shared worker.

@westito
Copy link
Contributor Author

westito commented Apr 25, 2022

I'm working on a block based storage. Works like a charm. I still have to clean the code and write tests

Képernyőfotó 2022-04-25 - 13 31 59

@simolus3
Copy link
Owner

No pressure, but how's the block storage coming along? I feel Iike that approach would scale much better for large databases, so please let me know if there's anything I can do to help!

@westito
Copy link
Contributor Author

westito commented Apr 30, 2022

Sorry, I didn't had much time to finish in this week. Tomorrow I'll try to.

@simolus3
Copy link
Owner

No worries, thanks for the update!

@westito
Copy link
Contributor Author

westito commented May 2, 2022

Finally done. Works with current tests and in real application. Need some edge-cased tests, update comments. Journal is written very often. Should give it a try to use WAL mode. I'm thinking on a sync version with shared worker as I mentioned before.

@simolus3
Copy link
Owner

simolus3 commented May 2, 2022

Nice work! I'm taking a look at it now. Can you help me understand why we're storing each file in a separate object store? I'm trying to refactor it so that we'd have one object store with file names + attributes and one with blocks, with each block referencing a file id and an offset. Do you see any problem with that right away? Maybe I'm missing something.

I'm thinking on a sync version with shared worker as I mentioned before.

This would be awesome! Let me know if I can help with that, I've also thought about adding a worker-backed file system to this package. But either way let's do that in a separate PR please :D

@westito
Copy link
Contributor Author

westito commented May 2, 2022

Storing files in separate objectStores seems much simpler. Writing or inserting blocks can be faster in dedicated objects (much less key lookup and index rebuild) and modifying metadata on each write can doubles the write time. Because of async operation, this doesn’t seem like a big deal, but slower write speed is more likely to case database corruption. Write speed will also be important in the shared worker version because of synchronous writing. I think using single object store + metadata makes things more complicated without adding benefits. Creating new file could be faster, but it's not a point here I think. But this is your library, its up to you how things going :)

@westito
Copy link
Contributor Author

westito commented May 2, 2022

Can you compile a new sqlite.wasm? tried on Mac without success. I copied and committed the latest wasm to sqlite3/example/web/sqlite3.wasm for tests in this PR. I think this should leave there and updated when sqlite3 source changes. I can't run this PR now because develop merged and my wasm is old.

@simolus3
Copy link
Owner

simolus3 commented May 2, 2022

Of course, you can grab the latest sqlite3.wasm from https://storage.googleapis.com/simon-public-euw3/assets/sqlite3.wasm (the same file is used for tests with GitHub actions).

@simolus3 simolus3 mentioned this pull request May 3, 2022
@simolus3
Copy link
Owner

simolus3 commented May 3, 2022

I've pushed some changes to only use two object stores (one for file names and one for blobs). I've also moved the block storage into the IndexedDB file system, IMO there's no big advantage of also having that in the in-memory file system. I've created the AsynchronousIndexedDbFileSystem class which is essentially an asynchronous version of the file system interface. The IndexedDbFileSystem now wraps an async FS. I hope that this distinction will make it easier to write a say WorkerBasedFileSystem in the future where a worker is using an async FS and uses atomics to synchronize calls.

I've removed some path normalization logic you added because I didn't fully understand it. I think that sqlite3 will call our path_normalize function before using the file system, so it shouldn't be necessary? Is this to make API misuse harder? Might be reasonable to have it, but then we should use it consistently across all file system implementations.

@simolus3
Copy link
Owner

simolus3 commented May 3, 2022

I'll merge this and get this released tomorrow. Thanks for your continued great work on this package 🚀

@simolus3 simolus3 merged commit f624b18 into simolus3:main May 3, 2022
@westito
Copy link
Contributor Author

westito commented May 4, 2022

Thank you for improve my code. Yes, it's unnecessary having blocks in-memory. It was easier for me to test if I modified the original code as little as possible. I see you wrote your own cursor handler. I did not use the original because it discarded the transaction. The original implementation is very buggy. I don't know if it's worth dealing with and contributing to DartSDK.
I had a more general VFS in mind, so this path normalization idea came up.

@simolus3
Copy link
Owner

simolus3 commented May 4, 2022

The original implementation is very buggy.

Yeah I feel like exposing IndexedDB cursors as streams was an unfortunate design choice as pauses or not immediately reacting to new rows will close the transaction. So it's very easy to cause deadlocks with that API.

I had a more general VFS in mind, so this path normalization idea came up.

I think I want the VFS API to mainly be used by sqlite3 internally at the moment, and for that it's preferable to avoid path normalization logic for every write. It's not exactly user friendly, I'll see if it makes sense to polish that API with path normalization and expose it more prominently.

@westito
Copy link
Contributor Author

westito commented May 4, 2022

There is an error when I try to run in my app. I can't understand exactly the problem. The getKey returns with this exception. Not immediately, after some inserts. Maybe when transaction commits.

Képernyőfotó 2022-05-04 - 10 33 16

Képernyőfotó 2022-05-04 - 10 33 08

Képernyőfotó 2022-05-04 - 10 32 57

Képernyőfotó 2022-05-04 - 10 27 27

@simolus3
Copy link
Owner

simolus3 commented May 4, 2022

Yeah I just spent some time debugging this too :D I think it's because we don't reset cached file ids after a file is deleted, and the journal file gets deleted and re-created a lot. I've fixed this locally, but I need to write some tests to ensure this is working as intended.

@westito
Copy link
Contributor Author

westito commented May 4, 2022

I can send you a sample project. It is working when Drift loaded as worker. But run in main application this exception comes up. It is working with the previous non-block VFS.

@simolus3
Copy link
Owner

simolus3 commented May 4, 2022

It is working when Drift loaded as worker.

I think you're just not seeing the error messages in the main log then, for me a similar error showed up when inspecting the worker. Also, workers are completely broken because instantiating a KeyRange will internally check the window to see which property to use. This doesn't work in a worker of course, so I had to use package:js to create KeyRanges...

I believe I have fixed this in 94d9652, can you try that state in your project and see if that fixes the errors for you as well?

@westito
Copy link
Contributor Author

westito commented May 5, 2022

Yeah! It's working like a charm now :)

@westito westito deleted the vfs2 branch May 19, 2022 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants