feat(streaming): support up to 16-bit vnode count in row id gen #18529

BugenZhao · 2024-09-13T09:19:05Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This is progress towards #15900.

This PR supports vnode counts of up to 16 bits in the row-id generator.

Previously we reserved 10 bits for the vnode part of the row id, which limited the vnode count to 1024. This PR extends the format to dynamically allocate bits between the vnode part and the sequence part of the row id, allowing an arbitrary vnode count of up to 16 bits.
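To make the dynamic split concrete, here is a minimal sketch, not the code in this PR. It assumes the vnode and sequence parts share the low 22 bits (inferred from the `1 << 22` rows-per-millisecond figure), that the vnode part sits above the sequence part, and that the timestamp occupies the bits above both. The names `SHARED_BITS`, `MAX_VNODE_COUNT`, `vnode_bits`, `compose`, and `decompose` are hypothetical.

```rust
/// Assumed: bits shared by the vnode part and the sequence part.
const SHARED_BITS: u32 = 22;
/// Assumed upper bound for a "16-bit" vnode count.
const MAX_VNODE_COUNT: u32 = 1 << 16;

/// Number of bits needed to represent every vnode index in `0..vnode_count`,
/// i.e. ceil(log2(vnode_count)). A single vnode needs 0 bits.
fn vnode_bits(vnode_count: u32) -> u32 {
    assert!((1..=MAX_VNODE_COUNT).contains(&vnode_count));
    u32::BITS - (vnode_count - 1).leading_zeros()
}

/// Packs the parts into one id, laid out as | timestamp | vnode | sequence |.
fn compose(timestamp_ms: i64, vnode: u32, sequence: u32, vnode_count: u32) -> i64 {
    let seq_bits = SHARED_BITS - vnode_bits(vnode_count);
    assert!(vnode < vnode_count);
    assert!(u64::from(sequence) < (1u64 << seq_bits));
    (timestamp_ms << SHARED_BITS) | (i64::from(vnode) << seq_bits) | i64::from(sequence)
}

/// Splits an id back into (timestamp, vnode, sequence).
/// This only round-trips when called with the `vnode_count` used in `compose`.
fn decompose(id: i64, vnode_count: u32) -> (i64, u32, u32) {
    let seq_bits = SHARED_BITS - vnode_bits(vnode_count);
    let shared = (id & ((1i64 << SHARED_BITS) - 1)) as u32;
    (id >> SHARED_BITS, shared >> seq_bits, shared & ((1u32 << seq_bits) - 1))
}
```

Under these assumptions, a vnode count of 1024 yields the previous 10-bit vnode / 12-bit sequence split, while a count of 4096 yields 12 bits for the vnode and 10 for the sequence.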

This does not affect the maximum throughput we support in the row-id generator, i.e., it is still 1 << 22 rows per millisecond.
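The throughput claim can be checked with a little arithmetic. The sketch below assumes 22 bits are shared between the vnode and sequence parts and that the vnode part takes ceil(log2(vnode_count)) of them; `capacity_per_ms` and the constants are hypothetical names, not the generator's API.

```rust
/// Assumed: bits shared by the vnode part and the sequence part.
const SHARED_BITS: u32 = 22;
/// Assumed upper bound for a "16-bit" vnode count.
const MAX_VNODE_COUNT: u32 = 1 << 16;

/// Bits left for the sequence part once the vnode part is sized.
fn sequence_bits(vnode_count: u32) -> u32 {
    assert!((1..=MAX_VNODE_COUNT).contains(&vnode_count));
    SHARED_BITS - (u32::BITS - (vnode_count - 1).leading_zeros())
}

/// Returns (ids per vnode per millisecond, ids across all vnodes per millisecond).
fn capacity_per_ms(vnode_count: u32) -> (u64, u64) {
    let per_vnode = 1u64 << sequence_bits(vnode_count);
    (per_vnode, per_vnode * u64::from(vnode_count))
}
```

For power-of-two vnode counts the total stays at `1 << 22`: 1024 vnodes get 4096 sequence numbers each, and 65536 vnodes get 64 each. What shrinks as the vnode count grows is the per-vnode budget, not the overall one.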

Note that there are some subtle cases that need attention, mainly around backward compatibility. Please refer to the documentation and comments in the code for more details.

Checklist

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

BugenZhao · 2024-09-13T09:19:18Z

This stack of pull requests is managed by Graphite. Learn more about stacking.


fuyufjh

LGTM

fuyufjh · 2024-09-18T06:00:12Z

src/common/src/util/row_id.rs

+/// This is okay because we rely on the reversibility only if the serial type (row id) is generated
+/// and persisted in the same fragment, where the vnode count is the same. In other cases, the
+/// serial type is more like a normal integer type, and the algorithm to hash or compute vnode from
+/// it does not matter.
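To illustrate why the doc comment above ties reversibility to a matching vnode count, here is a small sketch under assumed parameters: 22 bits shared between vnode and sequence, with the vnode part in the upper portion. `embed_vnode` and `extract_vnode` are hypothetical helpers, not the functions in `row_id.rs`.

```rust
/// Assumed: bits shared by the vnode part and the sequence part.
const SHARED_BITS: u32 = 22;

/// ceil(log2(vnode_count)), the assumed width of the vnode part.
fn vnode_bits(vnode_count: u32) -> u32 {
    assert!(vnode_count >= 1);
    u32::BITS - (vnode_count - 1).leading_zeros()
}

/// Builds an id whose vnode part is sized for `vnode_count`.
fn embed_vnode(timestamp_ms: i64, vnode: u32, sequence: u32, vnode_count: u32) -> i64 {
    let seq_bits = SHARED_BITS - vnode_bits(vnode_count);
    (timestamp_ms << SHARED_BITS) | (i64::from(vnode) << seq_bits) | i64::from(sequence)
}

/// Reads the vnode part back, trusting the caller's idea of the vnode count.
fn extract_vnode(id: i64, assumed_vnode_count: u32) -> u32 {
    let seq_bits = SHARED_BITS - vnode_bits(assumed_vnode_count);
    let shared = (id & ((1i64 << SHARED_BITS) - 1)) as u32;
    shared >> seq_bits
}
```

With these helpers, an id built for vnode 3000 out of 4096 reads back as 3000 when extracted with a count of 4096, but as 750 when extracted with a count of 1024, because the boundary between the two parts moves by two bits. The extracted value is only meaningful when generator and reader agree on the count.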


IIUC, the root cause of the problem is the improper way we hack the hash() function of Serial.

Currently, we have a hack here: if there is only one column and the column type is Serial, we use extract_vnode_id_from_row_id instead of the standard hash function.

risingwave/src/common/src/hash/consistent_hash/vnode.rs

Lines 133 to 168 in 8a32a9b

// `compute_chunk` is used to calculate the `VirtualNode` for the columns in the
// chunk. When only one column is provided and its type is `Serial`, we consider the column to
// be the one that contains RowId, and use a special method to skip the calculation of Hash
// and directly extract the `VirtualNode` from `RowId`.
pub fn compute_chunk(
    data_chunk: &DataChunk,
    keys: &[usize],
    vnode_count: usize,
) -> Vec<VirtualNode> {
    if let Ok(idx) = keys.iter().exactly_one()
        && let ArrayImpl::Serial(serial_array) = &**data_chunk.column_at(*idx)
    {
        return serial_array
            .iter()
            .enumerate()
            .map(|(idx, serial)| {
                if let Some(serial) = serial {
                    extract_vnode_id_from_row_id(serial.as_row_id())
                } else {
                    // NOTE: here it will hash the entire row when the `_row_id` is missing,
                    // which could result in rows from the same chunk being allocated to different chunks.
                    // This process doesn’t guarantee the order of rows, producing indeterminate results in some cases,
                    // such as when `distinct on` is used without an `order by`.
                    let (row, _) = data_chunk.row_at(idx);
                    row.hash(Crc32FastBuilder).to_vnode(vnode_count)
                }
            })
            .collect();
    }

    data_chunk
        .get_hash_values(keys, Crc32FastBuilder)
        .into_iter()
        .map(|hash| hash.to_vnode(vnode_count))
        .collect()
}

I think the best solution is to use a special distribution, e.g. RowIdDistribution instead of HashDistribution. This essentially removes the hack and makes everything clear.
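A rough sketch of what such an explicit distribution could look like, purely as an illustration: the `Distribution` enum and `vnode_of` are hypothetical, the standard library's `DefaultHasher` stands in for the CRC32-based hashing in the snippet above, and the row-id layout (22 shared bits, vnode part on top) is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical: how a fragment maps a key to a vnode.
enum Distribution {
    /// Hash the key (stand-in for the CRC32-based hashing).
    Hash { vnode_count: u32 },
    /// Read the vnode straight out of a row id generated with `vnode_count`.
    RowId { vnode_count: u32 },
}

impl Distribution {
    fn vnode_of(&self, key: i64) -> u32 {
        match *self {
            Distribution::Hash { vnode_count } => {
                assert!(vnode_count >= 1);
                let mut hasher = DefaultHasher::new();
                key.hash(&mut hasher);
                (hasher.finish() % u64::from(vnode_count)) as u32
            }
            Distribution::RowId { vnode_count } => {
                assert!(vnode_count >= 1);
                // Assumed layout: low 22 bits hold | vnode | sequence |.
                let vnode_bits = u32::BITS - (vnode_count - 1).leading_zeros();
                let shared = (key & ((1i64 << 22) - 1)) as u32;
                shared >> (22 - vnode_bits)
            }
        }
    }
}
```

The point of the design is that the choice between hashing and extraction becomes part of the plan rather than being inferred from "exactly one key column of type Serial", and the vnode count used for extraction travels with the distribution.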

cc. @st1page

yezizp2012

LGTM

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

This was referenced Sep 13, 2024

feat(storage): variable vnode count support #18415

Merged

feat: support per-fragment vnode count #18444

Merged

refactor(frontend): extract filling fields in fragment graph #18466

Merged

BugenZhao mentioned this pull request Sep 13, 2024

feat: user-facing part of variable vnode count #18515

Merged

4 tasks

github-actions bot added the type/feature label Sep 13, 2024

BugenZhao requested review from shanicky, fuyufjh and yezizp2012 September 13, 2024 09:20

BugenZhao force-pushed the bz/var-vnode-user-facing-local branch from fe25c89 to 4ae30fd Compare September 13, 2024 10:13

BugenZhao force-pushed the bz/var-vnode-row-id-gen branch from 82d9a83 to cb62b7a Compare September 13, 2024 10:13

BugenZhao force-pushed the bz/var-vnode-user-facing-local branch from 4ae30fd to 9cdeee0 Compare September 16, 2024 05:31

BugenZhao force-pushed the bz/var-vnode-row-id-gen branch 4 times, most recently from 4f4f9ce to c20dae1 Compare September 16, 2024 07:39

BugenZhao force-pushed the bz/var-vnode-user-facing-local branch from 9cdeee0 to 16b19b0 Compare September 17, 2024 07:32

BugenZhao force-pushed the bz/var-vnode-row-id-gen branch from c20dae1 to 787e661 Compare September 17, 2024 07:33

fuyufjh approved these changes Sep 18, 2024


yezizp2012 approved these changes Sep 18, 2024


BugenZhao force-pushed the bz/var-vnode-user-facing-local branch from 16b19b0 to e5b0095 Compare September 19, 2024 06:21

BugenZhao force-pushed the bz/var-vnode-row-id-gen branch from 787e661 to ec5bca3 Compare September 19, 2024 06:21

BugenZhao force-pushed the bz/var-vnode-user-facing-local branch from e5b0095 to 0d6830b Compare September 20, 2024 15:51

BugenZhao force-pushed the bz/var-vnode-row-id-gen branch from ec5bca3 to 8c36626 Compare September 20, 2024 15:51

BugenZhao force-pushed the bz/var-vnode-user-facing-local branch from 0d6830b to db14e46 Compare September 23, 2024 10:07

BugenZhao force-pushed the bz/var-vnode-row-id-gen branch from 8c36626 to 3c19e8b Compare September 23, 2024 10:08

Base automatically changed from bz/var-vnode-user-facing-local to main September 24, 2024 07:12

graphite-app bot requested a review from a team September 24, 2024 07:12

BugenZhao added 2 commits September 24, 2024 15:33

feat(streaming): support up to 16-bit vnode count in row id gen

441617c

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

refine docs

93813df

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

still use zip

cf797a8

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

BugenZhao force-pushed the bz/var-vnode-row-id-gen branch from 3c19e8b to cf797a8 Compare September 24, 2024 07:34

BugenZhao enabled auto-merge September 24, 2024 07:41

BugenZhao added this pull request to the merge queue Sep 24, 2024

Merged via the queue into main with commit 5f800b9 Sep 24, 2024
30 of 31 checks passed

BugenZhao deleted the bz/var-vnode-row-id-gen branch September 24, 2024 08:02

BugenZhao mentioned this pull request Sep 24, 2024

feat(frontend): show job's max parallelism in system tables #18672

Merged

4 tasks

fuyufjh mentioned this pull request Sep 24, 2024

refactor: introduce new distribution to avoid the hack of hash of SERIAL #18677

Open
