-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
database-side filterPrecursorMz #28
Comments
Hm, I thought that I'm internally converting the |
On regular MassBank with <100k entries (the SQL query going from one docker to the other, on the same system)
On my LipidBlast MassBank with ~600k entries, this would take 3 minutes. In my understanding, this uses the base My workaround caches the precursor m/z client-side. This still takes the same amount of time to fetch the precursors (~45sec for 100k MassBank, 3 min for 600k lipidblast, see below), but I have to do it once and not 3000 times (for every feature I search). The subsetting to select the appropriate candidates for every query is fast, of course.
However it would be much faster and IMO more elegant to just filter precursors database-side (and form the intersection of |
The "faster" part is at least my assumption. I still have to test it with a mockup. |
Without any annoying temp table magic on the R side, but an extra view on the MassBank side which can be cached (because no subselect) and so runs fast:
|
Honestly, I don't have a solution for this at present. I agree that filtering on the database side is faster, but you (or at least I as developer of
Still, as I said, I understand you completely. I'll have a thought on moving the Regarding the view, I would then prefer to change the original view that we added and do the cast directly there (reporting the precursor_mz as a casted value and the "original" value as precursor_mz_text) |
Hi,
|
This doesn't work as desired, unfortunately: the original view has a number of subselects and therefore doesn't cache, making it take 0.4 sec per view if I add the cast. The |
Oh how embarassing :) Thanks for reminding me 👍 |
Then feel free to implement that and make a PR once you're happy with it. And surely we can ask Rene to add this view. |
Working on this. The proposed method works well and very fast on 100k records; however for 1e6 records it takes almost 0.5s (mostly network time, since the SQL |
Implementation is done (https://github.com/meowcat/MsBackendMassbank), with a parameter for |
Thanks for the update. Looking at your code I was wondering if it would not be easier to simply store the So, I'm OK with the additional option to cache precursor m/z, but I would suggest to add them to the |
Hi,
|
I think memory will not be such an issue - at present I always change the backend from |
Hi, I have implemented the The current variant works with the pre-existing For <100k records
However in practical terms even the slow variants are still acceptable. More optimization can be considered:
I added a unit test to check that caching works. Note: if we convert to DOUBLE on the DB side, the test will fail, because R will convert |
Great (and sorry for the late reply)! Could you eventually make a pull request with the caching of precursor m/z in |
Yes, will do as soon as #29 is fixed so I can actually verify that my unit tests pass. |
I have a fix in. Just waiting that the GH actions run through. |
It's merged into the master branch |
Hi,
Since precursor values are only stored as text in the MassBank database, they cannot be used directly for queries. This makes
filterPrecursorMz
unacceptably slow with large databases in the current implementation.My current workaround is client-side:
Still this carries the overhead of transferring all precursor m/z values when loading the DB (I could improve this by using SQL rather than
spectraData
, which pulls them individually by spectrum ID).Ideas:
VIEW
:CREATE VIEW msms_spectrum_precursor AS SELECT *, CAST(precursor_mz_text AS FLOAT) AS precursor_mz FROM msms_spectrum
, which could then be used to query precursor values. Of course this does not help speed much, since the CAST is done on every call. (Either way this would have to be done on the MassBank side.)CREATE TEMPORARY TABLE
) with precursor m/z when accessing the DB.What do you think, do you have other suggestions?
The text was updated successfully, but these errors were encountered: