Parsers based parsers #36

axman6 · 2016-11-18T01:07:51Z

I mentioned on reddit that there might be some advantages to using Ed Kmett's parsers package. This PR replaces all explicit uses of Parsec with the type class CharParsing, which is then instantiated with Parsec (and ParsecT over State, because ParsecT's MonadState instance is practically useless...).

The advantage of making this change is that it will make changing parser libraries easy in the future - if you want the nice error messages trifecta offers, only the calls to runParser need to be replaced (with some other fairly trivial changes, like changing the error type returned from Parsec's).
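
To illustrate the idea, here is a minimal sketch (not code from this PR) of a parser written once against CharParsing and then run with two different backends. The names iriRef, viaParsec and viaAttoparsec are hypothetical and made up for this example, and it assumes the parsers package is built with its parsec and attoparsec instances enabled.

```haskell
import Control.Applicative (many)
import qualified Data.Attoparsec.Text as A
import Data.Text (Text)
import qualified Data.Text as T
import qualified Text.Parsec as P
import Text.Parser.Char (CharParsing, char, noneOf)
import Text.Parser.Combinators (between)

-- Hypothetical example parser: an IRI reference such as <http://example.org/a>.
-- It only mentions the CharParsing class, never a concrete parser library.
iriRef :: CharParsing m => m Text
iriRef = T.pack <$> between (char '<') (char '>') (many (noneOf ">"))

-- Instantiate the same parser with Parsec ...
viaParsec :: Text -> Either P.ParseError Text
viaParsec = P.parse iriRef "<input>"

-- ... and with attoparsec; only the "run" function and error type differ.
viaAttoparsec :: Text -> Either String Text
viaAttoparsec = A.parseOnly iriRef
```

Only the two run helpers know which library is in use, which is why swapping backends should mostly come down to replacing the runParser-style calls and the error type.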

I've also put some effort into making the code more idiomatic - avoid monadic functions when functor and applicative will do, make some things shorter, and generally follow most of the suggestions from hlint.

There are still a few places where the code can be made shorter but I think this does a good job improving readability.

robstewart57 · 2016-11-19T11:56:12Z

This is great, thank you!

There are 6 additional tests that fail as a result of these commits, so I've pushed them to a parsers-based-parsers branch. If you'd like to submit more PRs, please point them to that branch until those 6 tests pass. These are the newly failing tests:

  SPARQL_style_prefix:                                                FAIL
    Exception: ParseFailure "(line 1, column 7):\nunexpected \" \"\nexpecting subject resource"
    CallStack (from HasCallStack):
      error, called at src/Data/RDF/Query.hs:93:19 in rdf4h-3.0.1-CjcbQra4TThLE7wwhzXueu:Data.RDF.Query

  SPARQL_style_base:                                                  FAIL
    Exception: ParseFailure "(line 1, column 5):\nunexpected \" \"\nexpecting subject resource"
    CallStack (from HasCallStack):
      error, called at src/Data/RDF/Query.hs:93:19 in rdf4h-3.0.1-CjcbQra4TThLE7wwhzXueu:Data.RDF.Query

  turtle-syntax-base-02:                                              FAIL
    Exception: HUnitFailure (Just (Location {locationFile = "testsuite/tests/W3C/W3CAssertions.hs", locationLine = 39, locationColumn = 3})) "unable to parse, reason:\nLeft (ParseFailure \"(line 1, column 5):\\nunexpected \\\" \\\"\\nexpecting subject resource\")"

  turtle-syntax-base-04:                                              FAIL
    Exception: HUnitFailure (Just (Location {locationFile = "testsuite/tests/W3C/W3CAssertions.hs", locationLine = 39, locationColumn = 3})) "unable to parse, reason:\nLeft (ParseFailure \"(line 1, column 5):\\nunexpected \\\" \\\"\\nexpecting subject resource\")"

  turtle-syntax-prefix-02:                                            FAIL
    Exception: HUnitFailure (Just (Location {locationFile = "testsuite/tests/W3C/W3CAssertions.hs", locationLine = 39, locationColumn = 3})) "unable to parse, reason:\nLeft (ParseFailure \"(line 1, column 7):\\nunexpected \\\" \\\"\\nexpecting subject resource\")"

  turtle-syntax-prefix-03:                                            FAIL
    Exception: HUnitFailure (Just (Location {locationFile = "testsuite/tests/W3C/W3CAssertions.hs", locationLine = 39, locationColumn = 3})) "unable to parse, reason:\nLeft (ParseFailure \"(line 1, column 7):\\nunexpected \\\" \\\"\\nexpecting subject resource\")"

I see that there's an attoparsec instance for the Parsing type class. I'll set up parsing W3C tests and also parsing benchmarks for parsec and attoparsec instances for both the Turtle and NTriples parsers (on this parsers-based-parsers branch).

axman6 · 2016-11-20T02:25:07Z

Oops, that must've been a problem with one of the more recent commits. I'll see if I can track it down.

axman6 · 2016-11-20T03:14:41Z

I can't seem to reproduce these test failures. I assume running stack test should be enough to run the whole test suite? I've downloaded the W3C test suites that are needed, and looking at the output, there's no difference between master and my branch (barring differences in test timing).
master-test.txt
parsers-test.txt

(I made a small change to the test suite code to print each triple on a new line to help with finding differences.)

robstewart57 · 2016-11-20T14:35:52Z

Hmm, that's a strange output. I'm surprised to see:

parser-w3c-tests
test-rdf4h: query empty: subject mf:node & predicate mf:name in:

That is, the W3C tests aren't being run. The data files are in the gh-pages branch (I don't know why). Could you run:

$ git submodule update --init --recursive
$ git submodule foreach git pull origin gh-pages
$ stack test --test-arguments="--pattern parser-w3c-tests"

I get:

70 out of 521 tests failed (1.41s)

robstewart57 · 2016-11-20T15:27:44Z

I've fixed those 6 test failures:

146db62

The NTriples and Turtle parsers have been migrated to the parsers library. A key advantage is that it opens up the parsec and attoparsec instances "for free". This migration was added in: robstewart57#36

robstewart57 · 2016-11-20T16:13:36Z

I've added the parsec and attoparsec instances for Turtle and NTriples parsers to the criterion benchmarking suite.

7416501

The results are interesting when using the attoparsec instance instead of parsec:

25% speedup for NTriples parsing
11% speedup for Turtle parsing

$ stack bench --benchmark-arguments 'rdf4h/parsers'

Benchmark rdf4h-bench: RUNNING...
benchmarking rdf4h/parsers/ntriples-parsec
time                 323.8 ms   (317.8 ms .. 331.3 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 317.3 ms   (313.9 ms .. 319.8 ms)
std dev              3.256 ms   (3.257 μs .. 3.947 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking rdf4h/parsers/ntriples-attoparsec
time                 243.0 ms   (233.0 ms .. 250.2 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 242.8 ms   (239.4 ms .. 245.9 ms)
std dev              3.566 ms   (2.099 ms .. 5.014 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking rdf4h/parsers/turtle-parsec
time                 349.1 ms   (329.8 ms .. 367.2 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 345.9 ms   (342.6 ms .. 348.0 ms)
std dev              3.203 ms   (0.0 s .. 3.692 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking rdf4h/parsers/turtle-attoparsec
time                 310.0 ms   (303.6 ms .. 317.2 ms)
                     1.000 R²   (0.998 R² .. 1.000 R²)
mean                 308.2 ms   (306.1 ms .. 311.1 ms)
std dev              3.019 ms   (749.2 μs .. 3.740 ms)
variance introduced by outliers: 16% (moderately inflated)
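
As a sanity check on the quoted speedups, the arithmetic can be sketched as follows. This assumes the percentages were derived from the criterion "time" estimates above, which the thread does not state explicitly; speedupPercent is a hypothetical helper name.

```haskell
-- Relative reduction in running time, as a percentage of the baseline.
speedupPercent :: Double -> Double -> Double
speedupPercent baseline candidate = (baseline - candidate) / baseline * 100

-- "time" estimates in milliseconds, copied from the criterion output above
-- (parsec is the baseline, attoparsec the candidate).
ntriplesSpeedup :: Double
ntriplesSpeedup = speedupPercent 323.8 243.0

turtleSpeedup :: Double
turtleSpeedup = speedupPercent 349.1 310.0
```

Rounded to whole numbers these come out at 25 and 11, matching the figures reported for NTriples and Turtle.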

axman6

Review summary: I'm an idiot =)

axman6 · 2016-11-21T00:16:56Z

bench/MainCriterion.hs

        (do rdfContent <- T.pack <$> readFile "bills.099.actions.rdf"
+            fawltyContentTurtle <- T.pack <$> readFile "data/ttl/fawlty1.ttl"
+            fawltyContentNTriples <- T.pack <$> readFile "data/nt/all-fawlty-towers.nt"
            let (Right rdf1) =
                  parseString (XmlParser Nothing Nothing) rdfContent
            let (Right rdf2) =


Is there any reason not to just replace all uses of rdf2 with rdf1? They are guaranteed to be identical.

Oh, they're different types! Never mind!

It would be good to have type sigs here instead of at their use site.

axman6 · 2016-11-21T00:21:58Z

That's great news, I'm glad you managed to fix those failures. Have you confirmed that the attoparsec implementation passes all the tests? This isn't guaranteed, since parsec and attoparsec may implement different semantics (though your liberal use of try should avoid that problem - it might be worth trying its usage if possible).
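
One well-known semantic difference is backtracking: Parsec commits to a branch once it has consumed input unless the branch is wrapped in try, whereas attoparsec always backtracks. The sketch below shows this with a made-up two-branch grammar; noTry, withTry, runParsec and runAtto are hypothetical names, and this is not necessarily the only difference that matters for the W3C tests.

```haskell
import Control.Applicative ((<|>))
import qualified Data.Attoparsec.Text as A
import Data.Text (Text)
import qualified Data.Text as T
import qualified Text.Parsec as P
import Text.Parser.Char (CharParsing, char)
import Text.Parser.Combinators (try)

-- Hypothetical grammar: accept "ab" or "ac". The first branch consumes 'a'
-- before it can fail on the second character.
noTry :: CharParsing m => m Char
noTry = (char 'a' *> char 'b') <|> (char 'a' *> char 'c')

-- Same grammar, with the first branch wrapped in try so that it backtracks.
withTry :: CharParsing m => m Char
withTry = try (char 'a' *> char 'b') <|> (char 'a' *> char 'c')

-- Run a parser with the Parsec backend.
runParsec :: P.Parsec Text () a -> Text -> Either P.ParseError a
runParsec p = P.parse p "<input>"

-- Run a parser with the attoparsec backend.
runAtto :: A.Parser a -> Text -> Either String a
runAtto = A.parseOnly
```

On the input "ac", noTry fails under Parsec but succeeds under attoparsec, while withTry succeeds under both. That is why generous use of try tends to make the two backends agree.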

Also, would you be able to add the instructions for running the tests to the README? I was running them by downloading the W3C tests from the W3C directly and putting them in the ref-tests directory, which seemed to work, but was apparently wrong.

robstewart57 · 2016-11-26T16:27:28Z

Good point. I've separated out the tests for Turtle and NTriples parsing, to separately report parsec and attoparsec results. Here they are:

NTriples

NTriples with parsec

$ stack test --test-arguments="--pattern parser-w3c-tests-ntriples-parsec"
...
All 68 tests passed (1.41s)

NTriples with attoparsec

$ stack test --test-arguments="--pattern parser-w3c-tests-ntriples-attoparsec"
...
27 out of 68 tests failed (1.56s)

Turtle

Turtle with parsec

$ stack test --test-arguments="--pattern parser-w3c-tests-turtle-parsec"
...
20 out of 291 tests failed (1.48s)

Turtle with attoparsec

$ stack test --test-arguments="--pattern parser-w3c-tests-turtle-attoparsec"
...
110 out of 291 tests failed (1.61s)

Summary

So, indeed their semantics are different. I haven't looked into the failure reports to see why.

I've exposed TurtleParserCustom and NTriplesParserCustom to opt in to either the parsec or the attoparsec instance. The existing TurtleParser and NTriplesParser instances of RdfParser use the parsec instances of Parsing, because they currently pass more W3C tests than the attoparsec instances. If a user wants higher performance, they should use TurtleParserCustom and NTriplesParserCustom with the attoparsec instance.

I've merged all of your commits, and my recent commits, into the master branch.

robstewart57 · 2016-11-26T16:38:58Z

> Also, would you be able to add the instructions for running the tests to the README? I was running them by downloading the W3C tests from the W3C directly and putting them in the ref-tests directory, which seemed to work, but was apparently wrong.

Done.

Alex Mason added 11 commits November 17, 2016 15:39

Initial attempt at parsers based parser

008ef9c

WIP Debugging

4068265

Fix nesting of State{T} and Parsec{T}

1a0a845

iReset Manifest.hs to remove debugging nonsense

08070b0

Remove all redundant constraints from TurtleParser

b23e7df

Convert NTriplesParsers to use parsers library

3265cbb

Cleanup Types and use parsers library

7ece3ae

Make parsers more idiomatic, avoid Monad when we can, use gets, modify etc.

383e897

Cleanup NTriplesParser

ce3793e

More cleaning up in Types

0ebdc65

More tidying up

0f2a4b0

Fixes 6 W3C test failures

146db62

Fixes 383e897

robstewart57 merged commit 7416501 into robstewart57:master Nov 26, 2016

robstewart57 mentioned this pull request May 31, 2019

Improve performance #56

Open
