-
Looking for graph names uses the GSPO index. At the end of a graph the iterator is in the ideal place, because graph boundaries are not significant in the index: the next quad belongs to a new graph. A special iterator could be written that works directly on an index to find only the first part of a tuple (the indexes don't know about tuples - they store [...]). This would be specific to the BPT implementation.

Maintaining a separate, additional data structure for the graph names is possible but non-trivial in the delete(quad) case. It's easy for adding data (check whether the graph name has been seen - a set, in other words), but knowing when a graph name is no longer in use is not easy. There is no reference count of triples in a graph, which is what is needed to know whether a delete removes the last quad in a graph.

Maintaining a reference count means knowing whether an add has an effect, or whether the add is for a triple that already exists. Ditto delete. Indexes return that information at the low level (this is TDB2/BPT specific), but it isn't exposed all the way up the stack. Exposing it would restrict the possibility of remote or async streaming of data, because a response would have to come back from "add(quad)" or "delete(quad)". It also doesn't work for TDB3/RocksDB, where changes are batched to amortize overheads.
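The reference-count bookkeeping described in this comment can be sketched with plain Java collections. This is only an illustration, not TDB2 code: the class `GraphNameCounter` and its `Quad` record are hypothetical, and a `HashSet` stands in for the indexes. It assumes the store reports whether an add or delete actually changed anything, which is exactly the information that is not exposed up the stack.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: counts quads per graph so that graph names can be
// listed without scanning the quads themselves.
class GraphNameCounter {
    record Quad(String g, String s, String p, String o) {}

    private final Set<Quad> quads = new HashSet<>();           // stands in for the indexes
    private final Map<String, Long> counts = new HashMap<>();  // graph name -> quad count

    void add(Quad q) {
        // Set.add returns false for a duplicate, so the count only
        // changes when the add really has an effect.
        if (quads.add(q)) {
            counts.merge(q.g(), 1L, Long::sum);
        }
    }

    void delete(Quad q) {
        // Only decrement when the quad was actually present.
        if (quads.remove(q)) {
            // Returning null removes the map entry, which drops the
            // graph name once its last quad is gone.
            counts.computeIfPresent(q.g(), (g, n) -> n == 1 ? null : n - 1);
        }
    }

    Set<String> graphNames() {
        return Set.copyOf(counts.keySet());
    }
}
```

Both `add` and `delete` depend on a synchronous answer from the underlying store, which is the coupling that makes this approach awkward for streaming or batched back ends.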
-
As usual, thanks for the insights and the discussion, Andy!

We use TDB2 with ~1 billion triples in named graphs, to keep the data separate for possible updates (e.g. an event dataset that is updated on a daily/weekly basis).

The query is "Give me all events in a given region or bounding box":

SELECT ?e {
  GRAPH ?g {
    ?e spatial:withinBoxGeom ("POLYGON((19.49 50.62,26.87 50.626,26.87 46.43,19.49 46.43,19.49 50.62))"^^geo:wktLiteral)
  }
}

It's a very simple query here, as a sketch, but from what we could see in the code or guess from a jstack dump during execution, it iterates all quads with a distinct on the iterator to keep seen [...]

For now we do some query rewriting that inlines all named graphs as a filter expression, which seems to work as there is some query optimizer that expands [...]

If you have some other hint for iterating all named graphs more efficiently, I'd be happy to test it.
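For readers wondering what such a rewrite might look like: the thread does not show the rewritten query, so the following is only a guess at its shape, assuming the known graph IRIs are inlined as a `FILTER(?g IN (...))` next to the `GRAPH ?g` block. The class and method names are hypothetical, and the IRIs are assumed to be already valid and need no escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the rewriting step: restrict ?g to a known list of
// graph IRIs instead of leaving it unbound.
class GraphFilterRewrite {
    static String restrictGraphs(String pattern, List<String> graphIris) {
        // Assumption: IRIs are well-formed, so wrapping them in <...> is enough.
        String iris = graphIris.stream()
                .map(iri -> "<" + iri + ">")
                .collect(Collectors.joining(", "));
        return "GRAPH ?g { " + pattern + " } FILTER(?g IN (" + iris + "))";
    }

    public static void main(String[] args) {
        System.out.println("SELECT ?e { "
                + restrictGraphs("?e ?p ?o", List.of("http://example/g1", "http://example/g2"))
                + " }");
    }
}
```

Whether the optimizer turns this into one lookup per graph depends on the engine and version, so it is worth checking the algebra the query compiles to.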
-
That's two questions, both different from the original. The "distinct" would be made a distinctAdjacent (which is WIP). The property function changes the situation because IIRC it's not a struct multi-function. Any optimization risks changing the query.
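To illustrate the idea behind a distinct-adjacent step: in GSPO order all quads of one graph are contiguous, so repeated graph names are always next to each other and only the previous value has to be remembered. The sketch below is plain Java, not Jena's implementation, and the class name is made up.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical sketch: drops an element only if it equals the one just
// before it. Uses O(1) memory, unlike a general distinct that must keep
// every value seen so far.
class DistinctAdjacent<T> implements Iterator<T> {
    private final Iterator<T> input;
    private T next;                 // buffered element to hand out
    private boolean hasNext;        // true while 'next' is valid
    private T previous;             // last element handed out
    private boolean hasPrevious = false;

    DistinctAdjacent(Iterator<T> input) {
        this.input = input;
        advance();
    }

    // Moves to the next input element that differs from 'previous'.
    private void advance() {
        hasNext = false;
        while (input.hasNext()) {
            T candidate = input.next();
            if (!hasPrevious || !Objects.equals(candidate, previous)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public T next() {
        if (!hasNext) throw new NoSuchElementException();
        T result = next;
        previous = result;
        hasPrevious = true;
        advance();
        return result;
    }
}
```

Note that this still reads every input element. It saves the memory and hashing cost of a full distinct, but not the scan itself.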
-
Thanks for the ideas. I guess adjacent just doesn't matter much for this small number of graphs:

duration of select ?e { ?e spatial:withinBoxGeom(...) } + unionDefaultGraph => 0.098 seconds
duration of select ?g { graph ?g {} } group by ?g => 83 seconds
duration of select ?e { graph ?g { ?e spatial:withinBoxGeom(...) } } => 82 seconds

With your patches:

duration of select ?g { graph ?g {} } group by ?g => 82 seconds
duration of select ?e { graph ?g { ?e spatial:withinBoxGeom(...) } } => 82 seconds
-
If the graphs are known - and we have only just heard it is 40 graphs - try putting the graphs into a general dataset, where getting the graph names is a "set of keys" operation.
No.
-
Another case, probably similar to our spatial case, but one that does not involve custom property functions.
-
The iterator that walks the B+trees data access is [...]
-
Version
4.7.0-SNAPSHOT
Question
In some queries like graph ?g {}, it is desirable to get a list of all named graphs currently in a dataset. However, this is exceptionally slow for huge datasets with billions of quads, as Jena has to run quads.findAll().distinct(). The same happens for some other query types where ?g is unbound.
I wonder if it could be sped up somehow by making use of a G??? index, by bisecting to the right node after the end of each Graph and thus avoiding having to iterate through all quads?
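A minimal sketch of what I mean, using a sorted in-memory set in place of the real index. None of this is Jena API: `GraphNameSkipScan`, `Quad` and `GSPO` are made-up names, a `TreeSet` stands in for the B+tree, and graph names are plain strings. After reading one quad of a graph, it seeks directly to the smallest key that sorts after every quad of that graph.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: list distinct graph names with one seek per graph
// instead of visiting every quad.
class GraphNameSkipScan {
    record Quad(String g, String s, String p, String o) {}

    // Sort order of the stand-in index: graph, subject, predicate, object.
    static final Comparator<Quad> GSPO = Comparator.comparing(Quad::g)
            .thenComparing(Quad::s)
            .thenComparing(Quad::p)
            .thenComparing(Quad::o);

    // 'index' must be ordered by GSPO.
    static List<String> graphNames(NavigableSet<Quad> index) {
        List<String> names = new ArrayList<>();
        Quad current = index.isEmpty() ? null : index.first();
        while (current != null) {
            names.add(current.g());
            // g + "\u0000" is the smallest string strictly greater than g,
            // so this probe sorts after every quad of the current graph.
            Quad probe = new Quad(current.g() + "\u0000", "", "", "");
            current = index.ceiling(probe);   // first quad of the next graph, or null
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableSet<Quad> index = new TreeSet<>(GSPO);
        index.add(new Quad("g1", "s1", "p", "o"));
        index.add(new Quad("g1", "s2", "p", "o"));
        index.add(new Quad("g2", "s1", "p", "o"));
        System.out.println(graphNames(index));
    }
}
```

With a tree index each seek costs roughly the height of the tree, so the total work grows with the number of graphs rather than the number of quads.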