Avoids emitting the same output from multiple processes of the same process pool #3531
Conversation
crates/turbopack-node/src/pool.rs
let mut shared = shared.lock().unwrap();
shared.insert((line.clone(), occurances))
};
if new_line {
So if I'm understanding this correctly, a line gets logged again whenever its occurrence count exceeds the previous amount?
This also means that if one execution logs something 2 times and the next execution logs it 3 times, the next execution will only output it 1 time? That seems like confusing behavior that might hide information necessary for debugging.
Yeah, it could be a bit confusing in some edge cases... Do you have a better idea?
maybe do it for the whole output and not line by line
Benchmark for 81e2498
Benchmark for 2f063d8
}
let line = Arc::from(take(&mut buffer).into_boxed_slice());
let occurance_number = *own_output
    .entry(Arc::clone(&line))
Can we store a hash instead of the raw bytes? Wouldn't that save memory overall?
That would work (with a small risk of collisions), but it would lose the ability to read the logged output from the browser UI in the future.
    .or_insert(0);
let new_line = {
    let mut shared = shared.lock().unwrap();
    shared.insert((line.clone(), occurance_number))
Why do we need occurance_number? Wouldn't just a HashSet work?
I wanted to allow logging lines multiple times from a process.
I initially planned to use a HashMap<line: String, count: u32>, but that won't preserve the order of logging. So it's an IndexSet<(line: String, occurance_number: u32)>, which preserves all logging in the correct order.
We can display that in the browser UI for inspection by iterating over the IndexSet, which yields all logging in the right order, merged from all processes of the pool.
Benchmark for 9c2ea3a
Ok(_) => {}
}
let line = Arc::from(take(&mut buffer).into_boxed_slice());
let occurance_number = *own_output
- let occurance_number = *own_output
+ let occurrence_number = *own_output
    .or_insert(0);
let new_line = {
    let mut shared = shared.lock().unwrap();
    shared.insert((line.clone(), occurance_number))
- shared.insert((line.clone(), occurance_number))
+ shared.insert((line.clone(), occurrence_number))
stream: impl AsyncRead + Unpin,
shared: SharedOutputSet,
mut final_stream: impl AsyncWrite + Unpin,
) {
If stream A reports:
    error A:
    shared line
    non-shared line (A)
while stream B reports:
    error B:
    shared line
    non-shared line (B)
we would end up with:
    error A:
    shared line
    non-shared line (A)
    error B:
    non-shared line (B)
I expect this to also happen with empty lines printed for formatting purposes.
Should we instead only dedupe the start of a stream, up until the first different character?
Benchmark for ba8c683
…rocess pool (#3531) It uses a `shared` set of `(line: [u8], occurrences: u32)` to avoid emitting lines that have already been emitted by other processes. It still supports emitting the same line multiple times from one process. The `shared` data might later be used for showing logging from a worker pool. It contains the merged output of all processes. Fixes WEB-489
…49736102+kodiakhq[bot]@users.noreply.github.com>

# New Features
- vercel/turborepo#3540
- vercel/turborepo#3549
- vercel/turborepo#3465
- vercel/turborepo#3550
- vercel/turborepo#3495
- vercel/turborepo#3624
- vercel/turborepo#3600
- vercel/turborepo#3676
- vercel/turborepo#3689

# Fixes
- vercel/turborepo#3437
- vercel/turborepo#3542
- vercel/turborepo#3531
- vercel/turborepo#3552
- vercel/turborepo#3551
- vercel/turborepo#3597
- vercel/turborepo#3644
- vercel/turborepo#3623
- vercel/turborepo#3634
- vercel/turborepo#3574
- vercel/turborepo#3673
- vercel/turborepo#3675
- vercel/turborepo#3723
- vercel/turborepo#3677
- vercel/turborepo#3717
- vercel/turborepo#3701

# Performance Improvements
- vercel/turborepo#3361
- vercel/turborepo#3619

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Why did you merge this with 0 approvals and 1 request for change?