Molecule search unable to find some CAS numbers. #76
Comments
When you enter something in the form, we first try to interpret it as a SMILES string. If that fails, we assume it's a name, requiring a database lookup, which we do using
We get the SMILES string via
You can find a bit more info about how it was resolved if you visit
For example
If you want to force it to use a specific resolver you can request it like this

There is no algorithm to get from CAS numbers to species, so if a number is not in the database that http://cactus.nci.nih.gov/ used, there's not a lot we can do. You could try contacting them to ask where they got their CAS numbers and whether they ever update, but I'm guessing the species you list are not "new discoveries", so being out of date is probably not the issue.

If you have CAS numbers for your species then you must have gotten them from somewhere. If that place can give you the SMILES string, use that, because we can interpret any (valid) SMILES without a database lookup. InChI would also work; it is still resolved via http://cactus.nci.nih.gov/chemical/structure/ but should be robust because it is algorithmic and doesn't require a database hit.

Summary: use InChI or SMILES rather than CAS numbers whenever possible, because CAS numbers need a big database and are not unique (although I know a lot of the NIST kinetics database uses CAS numbers...)
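As a rough illustration of the database lookup described above, here is a minimal Python sketch that builds a request against the http://cactus.nci.nih.gov/chemical/structure/ endpoint. The function names `cactus_url` and `lookup_smiles` are hypothetical, and the `resolver` query parameter and the 404-on-miss behavior are assumptions based on the resolver's public documentation, not something confirmed in this thread or taken from the website's own code.

```python
from urllib.error import HTTPError
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base path mentioned in this thread; the https scheme is an assumption.
CACTUS_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cactus_url(identifier, representation="smiles", resolver=None):
    """Build a lookup URL for an identifier (name, CAS number, InChI, ...).

    `resolver` (assumed query parameter) restricts which lookup the service
    tries, e.g. "cas_number".
    """
    url = f"{CACTUS_BASE}/{quote(identifier)}/{representation}"
    if resolver:
        url += "?" + urlencode({"resolver": resolver})
    return url


def lookup_smiles(identifier, resolver=None, timeout=10):
    """Return the SMILES reported by the service, or None if there is no hit.

    Assumes the service answers an unresolved identifier with HTTP 404.
    """
    try:
        with urlopen(cactus_url(identifier, "smiles", resolver), timeout=timeout) as resp:
            return resp.read().decode().strip()
    except HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Calling `lookup_smiles("2143-69-3")` would then tell you directly whether the remote database knows a given CAS number, independently of the search form.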
You are right, these are not new discoveries. All my molecules came from a thermo.dat file, and I was very surprised to find a CAS number for almost every species. So, as you can imagine, nothing else (InChI or SMILES) is provided to identify the molecules better. Thanks for explaining how the form works. I will run some more tests on the cactus website to see if I can reproduce the second-list behavior. I thought a CAS number was the safest way to describe a molecule, but as you showed me here, it is not as robust as an algorithmic approach. I think I will put InChI or SMILES in my thermo.dat files from now on.
Closing stale issue. BTW, see #258 - there is a somewhat more robust API for accessing CAS numbers.
This is just for information; I don't know if we can really do something to fix it. This way it will at least be documented.
It is just to point out some strange behavior in the molecule search tool.
Some CAS numbers are just not recognized, but as @connie suggested, that is because they are not listed in NIST and the web tool is based on the NIST database.
I found three different cases:
First list (not working, but these do not exist on NIST, so it makes sense):
2143-69-3
1981-80-2
6067-68-1
15552-77-9
67152-18-5
108179-96-0
86181-68-2
687-97-4
2810-61-9
63707-54-0
309966-76-5
Most of them are radicals; I don't know if that is why their representations are hard to find in the literature. I finally found most of them through their names and "CAS #" in Burcat's database.
Second list (not recognized at first, then found after drawing the molecule):
2143-58-0 (exists on NIST but no representation given)
436-51-4 (does not exist on NIST)
Third list:
53561-65-2 (exists on NIST but not found by the tool)
The second list behavior is really strange:
The second list contains species that were not recognized at first, but after I entered the adjacency list by hand, the tool displayed some information on the molecule instead of an error (as it did for the first list). I was very surprised to find my CAS number in that information. And of course, after that the CAS number is recognized. After seeing this for the first molecule, I tried importing each CAS number three times before drawing the molecule by hand, and I saw the same behavior a second time with 436-51-4.
The third list is just because we are not dynamically linked to the NIST database, so it is not really a problem either.
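Before concluding that a number is missing from the database, it may be worth ruling out a transcription error in the thermo.dat file. CAS numbers carry a check digit: the last digit equals the weighted sum of the preceding digits (weights 1, 2, 3, ... counted from the right) modulo 10. The sketch below checks that; the function name `is_valid_cas` is hypothetical and is not part of the molecule search tool.

```python
import re

# CAS format: 2 to 7 digits, then 2 digits, then a single check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Return True if `cas` is well formed and its check digit is consistent."""
    match = CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))
```

A number that passes this check can still be absent from a given database; the check only catches malformed or mistyped entries, so it is a cheap first triage step before trying a lookup.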