-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance #56
Comments
@robstewart57 what do you think about this proposal? I would like to keep only Megaparsec for the parsing, as it is now very fast and robust. |
@wismill the idea of #36 was to generalise rdf4h across numerous parsers, specificlaly attoparsec and parsec. E.g.
This functionality comes from this PR: #36 The motivation of this flexibility between parsers is that each of they have trade offs. E.g. "Megaparsec vs Attoparsec" : https://github.com/mrkkrp/megaparsec#megaparsec-vs-attoparsec
This is a realistic assumption when working with RDF data, which might be 10s/100s MegaBytes or millions of triples. There's megaparsec instances in parsers-megaparsec that we could use: https://hackage.haskell.org/package/parsers-megaparsec The issue arises when attoparsec, parsec and megaparsec instances of the typeclasses in the parsers have different parsing semantics (which presumably shouldn't happen), meaning the rdf4h tests against w2c-tests might pass for one instance of the parsers typeclasses, e.g. parsec, but fail for others, megaparsec. |
Ok, but at least we could drop I think we should also provide a way to stream parsing results. As there are several framework for this, I wonder if we should provide new packages such as: |
I would also really like to see this! |
Now that
rdf4h
has a complete support for NTriples & Turtle, it may be a good time to focus on performance:As mentioned in #35 and #44, there are several places where we could improve the parsers. I think it would be a good idea to keep only one modern parser library (
attoparsec
ormegaparsec
) to keep the implementation simple and make it more efficient.I think the handling of prefixes in
UNode
is not satisfying. For instance, several important operations requireexpandTriples
which is very expensive. I propose that we removeexpandTriples
and make use of a smart constructorunode :: Text -> Either IRIError UNode
for IRI (currently merely a constructor synonym) that ensure the IRI is a valid absolute IRI. Then have a functionmkIRI
that accept a namespace (or a prefix mapping using a new type class) to create IRIs, from a relative IRI or a prefixed IRI (seeexpandURI
).Edit: change the proposed signature of
unode
to use Either rather than Maybe.The text was updated successfully, but these errors were encountered: