TODO.org

TODO-List

Driver

allow selecting/filtering schemas

report SQLSTATE

need to patch libpqxx for that http://pqxx.org/development/libpqxx/ticket/219

It’s probably better to ditch libpqxx for the testing connection and use a custom class instead that abstracts different products

grammar

[#A] add proper identifier quoting

sqlsmith fails horribly with databases containing identifiers that require quoting.
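
A minimal sketch of what such quoting could look like, assuming the standard SQL rule for delimited identifiers (wrap in double quotes, double any embedded double quote). The helper name `quote_ident` is hypothetical and not part of sqlsmith:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not taken from sqlsmith: always emit a delimited
// identifier so that names with spaces, upper-case letters or keywords
// survive being pasted into a generated query.
std::string quote_ident(const std::string &ident)
{
    std::string out;
    out.reserve(ident.size() + 2);
    out.push_back('"');
    for (char c : ident) {
        if (c == '"')
            out.push_back('"'); // escape an embedded quote by doubling it
        out.push_back(c);
    }
    out.push_back('"');
    return out;
}
```

Quoting unconditionally keeps the helper trivial; quoting only when needed would require a keyword list per product.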

[#A] Allow productions to fail in factories

  • Make productions throw a smith::no_candidate : std::runtime_error
  • Catch it and randomly pick another one.
  • Need to count errors and escalate at some point
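
The three steps above could be sketched roughly as follows. Only the exception name `smith::no_candidate` comes from this list; the helper `pick_production`, its signature and the `max_errors` limit are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <stdexcept>
#include <vector>

namespace smith {
// Thrown by a production that cannot be built in the current scope.
struct no_candidate : std::runtime_error {
    using std::runtime_error::runtime_error;
};
}

// Hypothetical factory helper: pick productions at random until one
// succeeds, and escalate once max_errors attempts have failed.
template <typename T>
T pick_production(const std::vector<std::function<T()>> &productions,
                  std::mt19937_64 &rng, std::size_t max_errors = 100)
{
    if (productions.empty())
        throw smith::no_candidate("no productions available");
    std::uniform_int_distribution<std::size_t> pick(0, productions.size() - 1);
    for (std::size_t errors = 0; errors < max_errors; ++errors) {
        try {
            return productions[pick(rng)]();
        } catch (const smith::no_candidate &) {
            // this production failed; loop around and pick another one
        }
    }
    // escalate to the caller, which may itself retry with another production
    throw smith::no_candidate("too many failed productions");
}
```

Rethrowing the same exception type lets the failure propagate one level up, where the enclosing factory can apply the same retry logic.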

Add functions

  • convenient for pulling values out of nowhere (version(), now(), etc)
  • Should be some coverage wrt. fuzzing (to_char…)

Add aggregates

  • in window functions
  • in group by
  • need to find a way to constrain column-referencing exprs to contain an aggregate somewhere: expr->agg->expr, agg->expr, expr->agg
    • maybe use a needs_aggregate bool default argument to expr constructors

UPDATE

INSERT

DELETE

[#C] Generate data for literal use in queries

Improve random generation

  • Add Attributes to rules so factories can pick candidates in a weighted fashion.
  • This attribute could then also be used to blacklist productions for incompatible RDBMSs
  • Factor in graph level and AST node count when deciding on productions with high “fan-out”. This should be made customizable, e.g. via a target depth or target node count, or better, a single scale factor
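
A minimal sketch of weighted picking with a blacklist, assuming a per-rule attribute that is just a relative weight where 0 disables the production. `production_attr` and `pick_weighted` are hypothetical names, not existing sqlsmith code:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical rule attribute: relative weight, 0 means blacklisted.
struct production_attr {
    std::string name;
    double weight;
};

// Return the index of a randomly chosen candidate, proportionally to
// its weight. Blacklisted candidates are filtered out beforehand so
// they can never be returned.
std::size_t pick_weighted(const std::vector<production_attr> &candidates,
                          std::mt19937_64 &rng)
{
    std::vector<std::size_t> enabled;
    std::vector<double> weights;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (candidates[i].weight > 0) {
            enabled.push_back(i);
            weights.push_back(candidates[i].weight);
        }
    }
    if (enabled.empty())
        throw std::runtime_error("no enabled production");
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return enabled[dist(rng)];
}
```

A depth or node-count scale factor could be folded in by multiplying the weight of high fan-out productions by that factor before the distribution is built.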

schema + type system

DTRT with arrays

  • review standard so we don’t drift into non-standard pg stuff

composite/record types

[#C] support more RDBMSs

  • Should add more than one product early to avoid a point of no return
  • Allow blacklisting of productions for RDBMSs with gaps wrt standards conformance (probably depends on improved random generation)
  • Should be easy to prune unsupported productions automatically on syntax errors. Or maybe it’s better to add a calibration phase? -> conflicts with hard-coded grammar
  • Allow automatic detection of problematic productions via stats visitor.
  • Alternate approach: add a test_compat() method to rules
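
One way the statistics-based detection could work, sketched under several assumptions: the driver reports a SQLSTATE per statement, every production used in a failing statement gets recorded, and a production counts as problematic once nearly all statements containing it fail with a syntax error. The type names and thresholds are hypothetical; the only fixed fact is that PostgreSQL uses SQLSTATE 42601 for syntax errors:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-production counters.
struct production_counts {
    unsigned long tried = 0;
    unsigned long syntax_errors = 0;
};

// Hypothetical collector, fed once per production per executed statement.
struct production_stats {
    std::map<std::string, production_counts> by_production;

    void record(const std::string &production, const std::string &sqlstate)
    {
        production_counts &c = by_production[production];
        ++c.tried;
        if (sqlstate == "42601") // syntax_error in PostgreSQL
            ++c.syntax_errors;
    }

    // A production is flagged once it has been tried often enough and
    // the share of syntax errors exceeds max_ratio.
    bool problematic(const std::string &production,
                     unsigned long min_tries = 20,
                     double max_ratio = 0.9) const
    {
        auto it = by_production.find(production);
        if (it == by_production.end() || it->second.tried < min_tries)
            return false;
        return static_cast<double>(it->second.syntax_errors)
                   / it->second.tried > max_ratio;
    }
};
```

Flagged productions could then be given weight 0, which is why this likely depends on the weighted random generation item.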

relmodel

[#C] operations on Tuples

Instead of hacking up tuples inside productions, it’s more sensible to implement operators in relmodel.cc: join(), project(), union(), select().
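
A rough sketch of two of these operators, using simplified stand-in types in a `sketch` namespace rather than the real relmodel classes. Note that `union` is a reserved word in C++, so that operator would need a different name:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Simplified stand-ins, not the actual relmodel types.
struct column {
    std::string name;
    std::string type;
};

struct relation {
    std::vector<column> columns;
};

// join(): the result exposes the columns of both inputs, left first.
relation join(const relation &lhs, const relation &rhs)
{
    relation out = lhs;
    out.columns.insert(out.columns.end(),
                       rhs.columns.begin(), rhs.columns.end());
    return out;
}

// project(): keep only the named columns, in the requested order.
relation project(const relation &rel, const std::vector<std::string> &names)
{
    relation out;
    for (const auto &name : names) {
        bool found = false;
        for (const auto &col : rel.columns) {
            if (col.name == name) {
                out.columns.push_back(col);
                found = true;
                break;
            }
        }
        if (!found)
            throw std::invalid_argument("no such column: " + name);
    }
    return out;
}

} // namespace sketch
```

With operators like these, a production only has to combine the relations of its children instead of assembling column lists by hand.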

[#C] primitive cost model

Load samples at startup to have a pool of values for literals

  • how to do it in a reproducible fashion? TABLESAMPLE? ORDER BY?
  • maybe use atomic value subselects to fetch compatible values

Performance

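| revision      | queries/s | nodes | comment                                |
|---------------+-----------+-------+----------------------------------------|
| ee9c94f-dirty |       208 |     ? |                                        |
| 4547909-dirty |       125 |    72 |                                        |
| 7fa25c6-dirty |       156 |    54 |                                        |
| 32a5d2a       |       188 |    54 |                                        |
| 3a29a40       |       238 |    54 |                                        |
| 57101e2       |       193 |    54 |                                        |
| 52c5b92       |       212 |    37 |                                        |
| efca827       |       205 |    37 | changed RNG to 64-Bit Mersenne Twister |
| 9099e07       |       185 |    37 | coalesce production                    |
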
time ./sqlsmith --verbose --target='dbname=regression' --dry-run --max-queries=10000 > /dev/null

Postgresql Line Coverage

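| sqlsmith | overall | parser |
|----------+---------+--------|
| a4c1989  |    26.0 |   20.4 |
| ee099e6  |    33.8 |   25.8 |
| 231c88a  |   34.65 |   28.1 |
| 7ffac2d  |    39.8 |   30.3 |

| combined testing    | overall | parser |
|---------------------+---------+--------|
| sqlsmith+make check |    65.1 |   80.4 |
| make check          |      62 |   80.2 |
| sqlsmith 7ffac2d    |    39.8 |   30.3 |

Reference:

|                   | overall | parser |
|-------------------+---------+--------|
| pg_ctl start/stop |     5.8 |    0.5 |
| --max-queries=0   |    16.6 |   14.6 |
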
./configure --enable-coverage


make install
initdb /tmp/gcov
pg_ctl -D /tmp/gcov start
make installcheck
pg_ctl -D /tmp/gcov stop
make coverage-clean
pg_ctl -D /tmp/gcov start
# since 7ffac2d: 4 instances w/25000 queries each instead of 1 instance w/10000 queries
sqlsmith --target='dbname=regression' --max-queries=25000 &
sqlsmith --target='dbname=regression' --max-queries=25000 &
sqlsmith --target='dbname=regression' --max-queries=25000 &
sqlsmith --target='dbname=regression' --max-queries=25000 &
wait
pg_ctl -D /tmp/gcov stop
make coverage-html