Parsers based parsers #36
Conversation
This is great, thank you! There are an additional six tests that fail as a result of these commits, so I've pushed them to a …
I see that there's an attoparsec instance for the …
Oops, that must've been a problem with one of the more recent commits; I'll see if I can track it down.
I can't seem to reproduce these test failures. I assume running … (I made a small change to the test suite code to print each triple on a new line, to make spotting differences easier.)
Hmm, that's strange output. I'm surprised to see: …
That is, the W3C tests aren't being run. The data files are in the … After running:

```
$ git submodule update --init --recursive
$ git submodule foreach git pull origin gh-pages
$ stack test --test-arguments="--pattern parser-w3c-tests"
```

I get: …
I've fixed those 6 test failures: …
The NTriples and Turtle parsers have been migrated to the parsers library. A key advantage is that it opens up the parsec and attoparsec instances "for free". This migration was added in robstewart57#36. Here are the results from running:

```
$ stack bench --benchmark-arguments 'rdf4h/parsers'
Benchmark rdf4h-bench: RUNNING...
benchmarking rdf4h/parsers/ntriples-parsec
time                 323.8 ms   (317.8 ms .. 331.3 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 317.3 ms   (313.9 ms .. 319.8 ms)
std dev              3.256 ms   (3.257 μs .. 3.947 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking rdf4h/parsers/ntriples-attoparsec
time                 243.0 ms   (233.0 ms .. 250.2 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 242.8 ms   (239.4 ms .. 245.9 ms)
std dev              3.566 ms   (2.099 ms .. 5.014 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking rdf4h/parsers/turtle-parsec
time                 349.1 ms   (329.8 ms .. 367.2 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 345.9 ms   (342.6 ms .. 348.0 ms)
std dev              3.203 ms   (0.0 s .. 3.692 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking rdf4h/parsers/turtle-attoparsec
time                 310.0 ms   (303.6 ms .. 317.2 ms)
                     1.000 R²   (0.998 R² .. 1.000 R²)
mean                 308.2 ms   (306.1 ms .. 311.1 ms)
std dev              3.019 ms   (749.2 μs .. 3.740 ms)
variance introduced by outliers: 16% (moderately inflated)
```
I've added the parsec and attoparsec instances for the Turtle and NTriples parsers to the criterion benchmarking suite. The results are interesting when using the attoparsec instance instead of parsec, as the numbers above show.
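For readers unfamiliar with criterion, a benchmark group along these lines produces output like the above. This is a minimal, self-contained sketch, not rdf4h's actual benchmark suite: the toy `lineP`/`docP` grammar and the synthetic `input` stand in for the real NTriples parser, which in the PR is written once against `CharParsing` and run with both backends.

```haskell
import Control.Applicative (many)
import Criterion.Main
import qualified Data.Attoparsec.Text as A
import qualified Data.Text as T
import qualified Text.Parsec as P
import Text.Parser.Char (CharParsing, char, noneOf)

-- A toy grammar, written once against the abstract CharParsing class.
lineP :: CharParsing m => m String
lineP = many (noneOf "\n") <* char '\n'

docP :: CharParsing m => m [String]
docP = many lineP

-- Synthetic input standing in for a real NTriples file.
input :: T.Text
input = T.replicate 10000 (T.pack "<s> <p> <o> .\n")

main :: IO ()
main = defaultMain
  [ bgroup "rdf4h/parsers"
      [ -- The same docP, instantiated at parsec's and attoparsec's
        -- parser types respectively.
        bench "parsec"     $ whnf (P.parse docP "" . T.unpack) input
      , bench "attoparsec" $ whnf (A.parseOnly docP) input
      ]
  ]
```

`whnf` is enough here because forcing the `Either` result to weak head normal form already requires the parse to run to completion.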
Review summary: I'm an idiot =)
```haskell
(do rdfContent <- T.pack <$> readFile "bills.099.actions.rdf"
    fawltyContentTurtle <- T.pack <$> readFile "data/ttl/fawlty1.ttl"
    fawltyContentNTriples <- T.pack <$> readFile "data/nt/all-fawlty-towers.nt"
    let (Right rdf1) =
          parseString (XmlParser Nothing Nothing) rdfContent
    let (Right rdf2) =
```
Is there any reason not to just replace all uses of rdf2 with rdf1? They are guaranteed to be identical.
Oh, they're different types! Never mind!
It would be good to have type sigs here instead of at their use site.
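Concretely, the suggestion amounts to something like the following sketch. The concrete graph type `TriplesList` is an assumption for illustration, not necessarily what the benchmark code uses:

```haskell
-- Sketch of the reviewer's suggestion: annotate the binding where it
-- is defined, so the use site needs no signature.
let rdf1 :: TriplesList   -- assumed graph type
    Right rdf1 = parseString (XmlParser Nothing Nothing) rdfContent
```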
That's great news, I'm glad you managed to fix those failures. Have you confirmed that the attoparsec implementation passes all the tests? This isn't guaranteed, since parsec and attoparsec may implement different semantics (though your liberal use of try should avoid that problem; it might be worth reviewing its usage if possible). Also, would you be able to add the instructions for running the tests to the README? I was running them by downloading the W3C tests from the W3C directly and putting them in the ref-tests directory, which seemed to work, but was apparently wrong.
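For context, the best-known way the two libraries' semantics diverge is backtracking: parsec commits to an alternative once it has consumed input, while attoparsec always backtracks. A minimal illustration (not from the rdf4h code):

```haskell
import Control.Applicative ((<|>))
import qualified Data.Attoparsec.Text as A
import qualified Data.Text as T
import qualified Text.Parsec as P

-- parsec: P.string "for" consumes 'f' before failing on 'o' vs 'a',
-- so <|> never tries "family" and the whole parse fails.
parsecDemo :: Either P.ParseError String
parsecDemo = P.parse (P.string "for" <|> P.string "family") "" "family"

-- attoparsec: alternatives always backtrack, so this succeeds
-- with Right "family".
attoDemo :: Either String T.Text
attoDemo = A.parseOnly (A.string (T.pack "for") <|> A.string (T.pack "family"))
                       (T.pack "family")
```

Differences like this (plus attoparsec's different handling of end-of-input) mean a grammar shared via CharParsing can still behave differently under the two backends.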
Good point. I've separated out the tests for Turtle and NTriples parsing, to separately report parsec and attoparsec results. Here they are:

NTriples
```
$ stack test --test-arguments="--pattern parser-w3c-tests-ntriples-parsec"
...
All 68 tests passed (1.41s)

$ stack test --test-arguments="--pattern parser-w3c-tests-ntriples-attoparsec"
...
27 out of 68 tests failed (1.56s)
```

Turtle
```
$ stack test --test-arguments="--pattern parser-w3c-tests-turtle-parsec"
...
20 out of 291 tests failed (1.48s)

$ stack test --test-arguments="--pattern parser-w3c-tests-turtle-attoparsec"
...
110 out of 291 tests failed (1.61s)
```

Summary

So, indeed, their semantics are different. I haven't looked into the failure reports to see why. I've exposed … I've merged all of your commits, and my recent commits, into the …
Done.
I mentioned on reddit that there might be some advantages to using Ed Kmett's parsers package. This PR replaces all explicit uses of Parsec with the type class CharParsing, which is then instantiated with Parsec (ParsecT over State, because ParsecT's MonadState instance is practically useless...).
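To make the approach concrete, here is a minimal sketch of the pattern, using a toy grammar rather than rdf4h's actual productions: the parser is written once against CharParsing and then run with either backend unchanged.

```haskell
import Control.Applicative (many)
import qualified Data.Attoparsec.Text as A
import qualified Data.Text as T
import qualified Text.Parsec as P
import Text.Parser.Char (CharParsing, char, noneOf)

-- Written once, against the abstract CharParsing interface.
iriRef :: CharParsing m => m String
iriRef = char '<' *> many (noneOf "<>") <* char '>'

-- Instantiated with parsec...
viaParsec :: String -> Either P.ParseError String
viaParsec = P.parse iriRef "<input>"

-- ...and with attoparsec, with no change to the grammar.
viaAtto :: T.Text -> Either String String
viaAtto = A.parseOnly iriRef
```

Both `viaParsec "<http://example.org/s>"` and `viaAtto (T.pack "<http://example.org/s>")` yield `Right "http://example.org/s"`.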
The advantage of making this change is that it will make changing parser libraries easy in the future: if you want the nice error messages trifecta offers, only the calls to runParser need to be replaced (along with some other fairly trivial changes, like changing the error type returned from Parsec's).
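Continuing the sketch above, swapping in trifecta really does come down to a different run function; trifecta's parser has a CharParsing instance since it is built on the parsers classes.

```haskell
import qualified Text.Trifecta as Tri

-- The same iriRef grammar as above, now run with trifecta; only the
-- run function and the result type change.
viaTrifecta :: String -> Tri.Result String
viaTrifecta = Tri.parseString iriRef mempty
```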
I've also put some effort into making the code more idiomatic: avoiding monadic functions when functor and applicative will do, making some things shorter, and generally following most of the suggestions from hlint.
There are still a few places where the code could be made shorter, but I think this does a good job of improving readability.