You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
By default, Beam automatically chooses the number of shards to write, but there are cases where these shards are either significantly larger or smaller than desired. Users can manually set the number of shards, but this comes at a(n often significant) performance cost as it forces all data to be reshuffled, and also requires guessing at the total output size to determine the correct number of shards.
While creating two many small shards may be difficult to rectify (distinct bundles cannot write to the same shard), writing multiple shards in a single bundle is a straightforward way to enforce an upper bound on shard sizes.
Issue Priority
Priority: 2
Issue Component
Component: io-py-common
The text was updated successfully, but these errors were encountered:
What would you like to happen?
By default, Beam automatically chooses the number of shards to write, but there are cases where these shards are either significantly larger or smaller than desired. Users can manually set the number of shards, but this comes at a(n often significant) performance cost as it forces all data to be reshuffled, and also requires guessing at the total output size to determine the correct number of shards.
While creating two many small shards may be difficult to rectify (distinct bundles cannot write to the same shard), writing multiple shards in a single bundle is a straightforward way to enforce an upper bound on shard sizes.
Issue Priority
Priority: 2
Issue Component
Component: io-py-common
The text was updated successfully, but these errors were encountered: