Share vector qual functionality across scan nodes #7283

erimatnor · 2024-09-19T13:45:43Z

Refactor the code for vectorized filtering over arrow arrays so that the code can be called from different scan nodes.

In the future, it might make sense to move the code to a separate directory for code related to vectorization.

Disable-check: force-changelog-file

codecov · 2024-09-19T13:55:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.28%. Comparing base (59f50f2) to head (e18606a).
Report is 361 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #7283       +/-   ##
===========================================
+ Coverage   80.06%   92.28%   +12.21%     
===========================================
  Files         190      205       +15     
  Lines       37181    38593     +1412     
  Branches     9450    10000      +550     
===========================================
+ Hits        29770    35615     +5845     
+ Misses       2997     2978       -19     
+ Partials     4414        0     -4414

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

akuzm · 2024-09-19T14:19:43Z

The compressed batch changes should also be in this PR, right?

erimatnor · 2024-09-19T15:05:56Z

The compressed batch changes should also be in this PR, right?

Ok, I'll add those. The original change splits plan-time and execution-time changes across two commits. But maybe for this it makes sense to combine.

erimatnor · 2024-09-19T15:43:30Z

@akuzm updated the PR with the additional changes to compressed_batch.c

akuzm · 2024-09-20T13:29:39Z

Tsbench looks OK: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3736&var-run2=3747&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false&from=now-2d&to=now

erimatnor · 2024-09-20T16:56:37Z

Tsbench looks OK: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3736&var-run2=3747&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false&from=now-2d&to=now

Is that an approval? 😄

akuzm · 2024-09-20T17:49:07Z

Is that an approval? 😄

Well, at least this is good news :) I plan to study the TAM code to understand more how you use this interface, and think about compatibility with vectorized expressions, so probably next week.

erimatnor · 2024-09-21T08:39:59Z

Is that an approval? 😄

Well, at least this is good news :) I plan to study the TAM code to understand more how you use this interface, and think about compatibility with vectorized expressions, so probably next week.

FWIW, these were the minimal changes I came up with to not change too much. I think it is possible, with your collaboration, to do further refactoring later to make this into a kind of vectorization "library". But I think the goal right now is to just make smallest changes necessary to make it work across scan nodes.

mkindahl

Mostly refactoring and moving code, so no surprises here. A few nits that you can deal with as you see fit.

tsl/src/nodes/decompress_chunk/compressed_batch.c

tsl/src/nodes/decompress_chunk/vector_quals.h

tsl/src/nodes/decompress_chunk/planner.c

akuzm · 2024-09-24T12:59:30Z

tsl/src/nodes/decompress_chunk/vector_quals.h

+	const ArrowArray *(*get_arrow_array)(struct VectorQualState *vqstate, const Var *var,
+										 bool *is_default_value);


So this thing is going to become problematic for vectorized expressions. For now the plan is to change this Var to an Expr, and do a recursive descent evaluation of these expressions. Unfortunately this has to be tightly coupled with filters to allow expressions there, and the filters are coupled with decompression to allow lazy column reads.

Probably the way I'll have to get around this, at least at the early stages, is by copying the code to decouple.

But there's a more high level thing I want to understand. I see that the ArrowTupleTableSlot inherits from VirtualTupleTableSlot. It also looks like you are not using late materialization (decompressing the columns lazily in slot_getattr). What other technical problems prevent you from inheriting from DecompressBatchState and using the compressed_batch interface?

If it is a Var now, I think we can merge it as a Var, and then change the common interface later to use Expr. We can even rework the entire interface to something suitable. But the important thing is that this works now. Code can always change.

But there's a more high level thing I want to understand. I see that the ArrowTupleTableSlot inherits from VirtualTupleTableSlot. It also looks like you are not using late materialization (decompressing the columns lazily in slot_getattr). What other technical problems prevent you from inheriting from DecompressBatchState and using the compressed_batch interface?

I guess this can't be answered without a little bit of history.

When we first implemented ArrowTupleTableSlot, DecompresBatchState did not inherit VirtualTupleTableSlot.

As you might recall, I proposed ArrowTupleTableSlot as a way to pass columnar data across scan nodes already back in February (7th) (there's a thread about it). I had obviously looked at DecompressBatchState et al., and it didn't fulfill our needs and didn't even "inherit" a TupleTableSlot, which is a requirement to be able to pass it "upward" in scan nodes. Therefore, we proposed ArrowTupleTableSlot and that we develop it for use with both the TAM and other scan nodes, e.g., VectorAgg and DecompressChunk.

However, at that time you said that table slots are not a suitable API for columnar data, they are made for single rows, and therefore lack any methods for columnar data.

However, it seems that we now agree more than we did back then, which is very positive. I also note that after our discussion, DecompressBatchState changed and started to also "inherit" VirtualTupleTableSlot just like ArrowTupleTableSlot . This is a good change, because it seems, after all, that "being a slot" and have it carry columnar data works well. And, yes, we do implement late materialization via slot_getattr()/slot_getsomeattrs().

Now, DecompressBatchState is still not a full TupleTableSlot implementation because it doesn't implement any of the callback interfaces which is what you need to make it identify as a separate type. Obviously, the table access method relies heavily on this interface and the implementation of the callbacks. Ripping it out now and replacing it with something else is not feasible.

While it is positive that DecompresBatchState is now more like ArrowTupleTableSlot in that it "inherits" VirtualTupleTableSlot, I don't think it is the right time to restart the discussion on a common slot implementation right now, given that this is the exact proposal I had back when DecompressBatchState didn't use inheritance.

I think we should merge ArrowTupleTableSlot as is, and then it seems like a good idea to resume the February discussion on a common slot implementation since there seems to now be more agreement on the approaches proposed and implemented by ArrowTupleTableSlot.

I added a commit that includes passing down Expr instead of Var. The casting to Var now happens in get_arrow_array() instead. Does this alleviate your concern @akuzm ?

Again, the code is not static and if you plan future refactoring it is possible to change this interface.

Well, the code changes until it becomes too complicated. Another couple callbacks like that, and I'll be unable to tell what's going on. I think we all understand that this interface is not particularly elegant, and more of a hack to proceed with TAM.

Anyway, when I feel that I'm unable to refactor it, I'll solve it by duplicating the compressed batch stuff in its entirety, until we figure out a good common interface. We don't have time to get stuck in these things.

Now, DecompressBatchState is still not a full TupleTableSlot implementation because it doesn't implement any of the callback interfaces which is what you need to make it identify as a separate type. Obviously, the table access method relies heavily on this interface and the implementation of the callbacks. Ripping it out now and replacing it with something else is not feasible.

Exactly, this is done on purpose, I'm following the virtual tuple protocol for performance reasons, Postgres has several special optimization for it. I don't mean you should do the same, but you probably could have used the DecompressBatchState as a storage and use all the normal interfaces. Adding late materialization is relatively easy, I had a PR for that. You don't do that so that's the reason you have to pass your data in via a callback. I don't think this is viable long term.

You don't do that so that's the reason you have to pass your data in via a callback. I don't think this is viable long term.

Maybe we have different definitions of "late materialization". ArrowTupleTableSlot materializes the arrow array (decompresses the data) for the column in question when slot_getsomeattrs/slot_getattr is called. Obviously, you have to materialize the array when the data is needed, which is the case for executing the vector quals over it. So, the ArrowTupleTableSlot version of get_arrow_array, will in fact call slot_getsomeattrs() on the slot and it will only materialize the set of columns the qual is calculated over. For now, we are obviously bound to the vector qual code, as we didn't want to change it more then necessary initially. But if there are improvements to be made there for "late materialization" it can be done later.

Refactor the code for vectorized filtering over arrow arrays so that the code can be called from different scan nodes. In the future, it might make sense to move the code to a separate directory for code related to vectorization.

erimatnor requested review from akuzm, antekresic and mkindahl September 19, 2024 13:46

erimatnor force-pushed the create-vectorized-qual-interface branch from d1a54c7 to 3cd576c Compare September 19, 2024 15:41

erimatnor force-pushed the create-vectorized-qual-interface branch from 3cd576c to 2c68492 Compare September 20, 2024 16:57

fabriziomello assigned erimatnor Sep 23, 2024

mkindahl approved these changes Sep 24, 2024

View reviewed changes

akuzm reviewed Sep 24, 2024

View reviewed changes

erimatnor force-pushed the create-vectorized-qual-interface branch from 2c68492 to d271847 Compare September 25, 2024 06:24

erimatnor requested a review from akuzm September 25, 2024 08:44

akuzm approved these changes Sep 25, 2024

View reviewed changes

erimatnor enabled auto-merge (rebase) September 25, 2024 12:26

erimatnor force-pushed the create-vectorized-qual-interface branch from d271847 to 83d3021 Compare September 25, 2024 12:27

erimatnor disabled auto-merge September 25, 2024 12:38

Share vector qual functionality across scan nodes

e18606a

Refactor the code for vectorized filtering over arrow arrays so that the code can be called from different scan nodes. In the future, it might make sense to move the code to a separate directory for code related to vectorization.

erimatnor force-pushed the create-vectorized-qual-interface branch from 83d3021 to e18606a Compare September 25, 2024 12:40

erimatnor merged commit 57c1dec into timescale:main Sep 25, 2024
38 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Share vector qual functionality across scan nodes #7283

Share vector qual functionality across scan nodes #7283

erimatnor commented Sep 19, 2024 •

edited

Loading

codecov bot commented Sep 19, 2024 •

edited

Loading

akuzm commented Sep 19, 2024

erimatnor commented Sep 19, 2024 •

edited

Loading

erimatnor commented Sep 19, 2024

akuzm commented Sep 20, 2024

erimatnor commented Sep 20, 2024

akuzm commented Sep 20, 2024

erimatnor commented Sep 21, 2024

mkindahl left a comment

akuzm Sep 24, 2024

erimatnor Sep 24, 2024

erimatnor Sep 25, 2024

akuzm Sep 25, 2024

akuzm Sep 25, 2024

erimatnor Sep 25, 2024

		const ArrowArray (get_arrow_array)(struct VectorQualState vqstate, const Var var,
		bool *is_default_value);

Share vector qual functionality across scan nodes #7283

Share vector qual functionality across scan nodes #7283

Conversation

erimatnor commented Sep 19, 2024 • edited Loading

codecov bot commented Sep 19, 2024 • edited Loading

Codecov Report

akuzm commented Sep 19, 2024

erimatnor commented Sep 19, 2024 • edited Loading

erimatnor commented Sep 19, 2024

akuzm commented Sep 20, 2024

erimatnor commented Sep 20, 2024

akuzm commented Sep 20, 2024

erimatnor commented Sep 21, 2024

mkindahl left a comment

Choose a reason for hiding this comment

akuzm Sep 24, 2024

Choose a reason for hiding this comment

erimatnor Sep 24, 2024

Choose a reason for hiding this comment

erimatnor Sep 25, 2024

Choose a reason for hiding this comment

akuzm Sep 25, 2024

Choose a reason for hiding this comment

akuzm Sep 25, 2024

Choose a reason for hiding this comment

erimatnor Sep 25, 2024

Choose a reason for hiding this comment

erimatnor commented Sep 19, 2024 •

edited

Loading

codecov bot commented Sep 19, 2024 •

edited

Loading

erimatnor commented Sep 19, 2024 •

edited

Loading