Dealing with rapper's 2GB limitation
Problem: csv2rdf4lod outputs Turtle, and rapper cannot parse Turtle files larger than 2GB.
- `find . -size +1900M` finds output files that rapper will fail to parse.
- `stat -c "%s"` OR `stat -f "%z"`, depending on the flavor of unix, prints a file's size in bytes.
- `du -sch *.ttl | tail -1` shows the total size of the .ttl files.
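The size checks above can be combined into one portable test. The following is a minimal sketch, not the contents of too-big-for-rapper.sh: the function names `file_size_bytes` and `exceeds_size_limit` are hypothetical, and the default threshold of 1900 MiB is an assumption chosen to roughly match the `find . -size +1900M` check.

```shell
# Hypothetical helper (not part of csv2rdf4lod): print a file's size in bytes.
# GNU stat takes -c "%s"; BSD/macOS stat takes -f "%z", so try one then the other.
file_size_bytes() {
  stat -c "%s" "$1" 2>/dev/null || stat -f "%z" "$1" 2>/dev/null
}

# Hypothetical helper: print "yes" if the file is larger than the limit, else "no".
# $1 = file, $2 = optional limit in bytes (assumed default: 1900 MiB).
exceeds_size_limit() {
  local file="$1"
  local limit_bytes="${2:-$((1900 * 1024 * 1024))}"
  local size
  size=$(file_size_bytes "$file") || return 1
  if [ "$size" -gt "$limit_bytes" ]; then
    echo "yes"
  else
    echo "no"
  fi
}
```

For example, `exceeds_size_limit publish/data.ttl` (a made-up path) would print "yes" or "no"; passing a second argument overrides the threshold, which is handy for trying the logic on small files.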
Logic to determine if a turtle file is too big for rapper is in:
- bin/util/too-big-for-rapper.sh (use this one)
- bin/util/rdf2nt.sh (reproduced logic to avoid csv2rdf4lod dependencies, so that this script can stand alone.)
- bin/convert-aggregate.sh (uses `stat -f` and `find publish -size +1900M`) SHOULD BE UPDATED TO USE too-big-for-rapper.sh
- bin/util/pvload.sh (uses `find . -size +1900M`) SHOULD BE UPDATED TO USE too-big-for-rapper.sh
- $CSV2RDF4LOD_HOME/bin/util/too-big-for-rapper.sh will tell you "yes" or "no".
- $CSV2RDF4LOD_HOME/bin/split_ttl.pl will take a list of files and split them into `chunk-FILENAME-NNN.ttl`. Assumes that `@prefix` definitions are sprinkled every 1.9 gig or so (acceptable for csv2rdf4lod outputs, but does not work generally).
- $CSV2RDF4LOD_HOME/bin/util/bigttl2nt.sh will print ntriples to stdout. Assumes that `@prefix` definitions are sprinkled every 1.9 gig or so (acceptable for csv2rdf4lod outputs, but does not work generally).
serdi does not have the 2GB restriction that rapper has, and it's fast, with "no" dependencies. However, serdi doesn't handle RDF/XML, so rapper is still in the game...
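One way to act on this is to prefer serdi for Turtle input and fall back to rapper only when serdi is not installed. This is a sketch under assumptions, not how the csv2rdf4lod scripts work: `ttl_to_nt` is a hypothetical function name, and the rapper fallback still fails on files over 2GB.

```shell
# Hypothetical helper (not part of csv2rdf4lod): write N-Triples for a Turtle
# file to stdout, preferring serdi because it lacks rapper's 2GB limit.
ttl_to_nt() {
  if command -v serdi >/dev/null 2>&1; then
    serdi -i turtle -o ntriples "$1"
  else
    # Fallback: rapper works only for Turtle files under 2GB.
    rapper -q -i turtle -o ntriples "$1"
  fi
}
```

Usage would look like `ttl_to_nt big.ttl > big.nt`, where `big.ttl` is a placeholder file name. RDF/XML input would still have to go through rapper.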