[Filestore] Make Filestore read throughput scalable with respect to the number of clients #324

Closed
qkrorlqr opened this issue Feb 1, 2024 · 5 comments
Labels: 2024Q1, filestore

Comments

@qkrorlqr
Collaborator

qkrorlqr commented Feb 1, 2024

We want a single FS to provide more read throughput as more clients (VMs) connect to it. The idea is simple: right now the bottleneck for large reads is the IndexTablet, because all data is proxied through it. But we don't really need to transfer all data via the tablet. Instead of returning the requested data, the tablet can return <BSGroupId, BlobId, ByteRange> tuples to the client (filestore-vhost), and the client can then read the data from the specified BSGroups by itself. A fallback should be implemented for the case when the client cannot read the data by itself, e.g. if the specified BlobId has already been deleted or if the client has no direct access to the storage nodes.
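A minimal sketch of the flow described above, written in Python for brevity; it is not the filestore code. All names here (`BlobPiece`, `DirectReadError`, `describe`, `read_blob`, `read_via_tablet`) are hypothetical, and the sketch assumes the tablet returns the pieces in order so that they can simply be concatenated.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class BlobPiece:
    """One <BSGroupId, BlobId, ByteRange> tuple (hypothetical layout)."""
    bs_group_id: int
    blob_id: str
    offset: int  # byte offset inside the blob
    length: int  # number of bytes to read


class DirectReadError(Exception):
    """Raised when describing the data or reading a blob directly fails."""


def two_stage_read(
    describe: Callable[[int, int], List[BlobPiece]],
    read_blob: Callable[[BlobPiece], bytes],
    read_via_tablet: Callable[[int, int], bytes],
    offset: int,
    length: int,
) -> bytes:
    """Read [offset, offset + length) directly from storage, else fall back."""
    try:
        # Stage 1: ask the tablet only for the location of the data.
        pieces = describe(offset, length)
        # Stage 2: fetch the bytes straight from the storage groups.
        return b"".join(read_blob(piece) for piece in pieces)
    except DirectReadError:
        # E.g. the blob was already deleted, or the client has no direct
        # access to the storage nodes: use the old path through the tablet.
        return read_via_tablet(offset, length)
```

The point of the design is that the tablet handles only small metadata responses on the fast path, so the bulk data traffic is spread across the storage nodes instead of going through one tablet.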

@qkrorlqr qkrorlqr self-assigned this Feb 1, 2024
@qkrorlqr qkrorlqr added the 2024Q1 label Feb 2, 2024
qkrorlqr added a commit that referenced this issue Feb 6, 2024
…client executeaction, removed broken (and unneeded) json validation; fixed comments here and there, etc
qkrorlqr added a commit that referenced this issue Feb 6, 2024
…client executeaction, removed broken (and unneeded) json validation; fixed comments here and there, etc (#351)
@qkrorlqr
Collaborator Author

qkrorlqr commented Feb 6, 2024

We will enable the feature per filesystem roughly like this:
filestore-client executeaction --action changestorageconfig --input-json '{"FileSystemId": "cahh6fjsng36rp186gjq", "StorageConfig": {"TwoStageReadEnabled": true}}'

And check it like this:
filestore-client executeaction --action getstorageconfigfields --input-json '{FileSystemId: "cahh6fjsng36rp186gjq", StorageConfigFields: ["TwoStageReadEnabled"]}'
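When scripting these two calls, the JSON payload can be generated instead of typed by hand, which avoids quoting mistakes. The sketch below only builds the command lines from the action names and fields shown above; the helper `executeaction_argv` is a hypothetical name, and nothing is executed.

```python
import json
import shlex


def executeaction_argv(action: str, payload: dict) -> list:
    """Build the argv for `filestore-client executeaction` with a JSON payload."""
    return [
        "filestore-client", "executeaction",
        "--action", action,
        "--input-json", json.dumps(payload),
    ]


fs_id = "cahh6fjsng36rp186gjq"  # filesystem id taken from the comment above

enable = executeaction_argv("changestorageconfig", {
    "FileSystemId": fs_id,
    "StorageConfig": {"TwoStageReadEnabled": True},
})
check = executeaction_argv("getstorageconfigfields", {
    "FileSystemId": fs_id,
    "StorageConfigFields": ["TwoStageReadEnabled"],
})

# Print shell-safe command lines; to run them, pass the lists to subprocess.run.
print(shlex.join(enable))
print(shlex.join(check))
```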

qkrorlqr added a commit that referenced this issue Feb 6, 2024
…to filestore-vhost via TCreateSessionResponse::FileStore::Features; outputting StorageConfig overrides on monpage; minor cleanup
qkrorlqr added a commit that referenced this issue Feb 7, 2024
…to filestore-vhost via TCreateSessionResponse::FileStore::Features; outputting StorageConfig overrides on monpage; minor cleanup (#352)

* issue #324: cleanup: filesystem-id field is not needed for filestore-client executeaction, removed broken (and unneeded) json validation; fixed comments here and there, etc

* issue #324: introduced TwoStageReadEnabled flag, returning this flag to filestore-vhost via TCreateSessionResponse::FileStore::Features; outputting StorageConfig overrides on monpage; minor cleanup
qkrorlqr added a commit that referenced this issue Feb 9, 2024
…client executeaction, removed broken (and unneeded) json validation; fixed comments here and there, etc (#351)
qkrorlqr added a commit that referenced this issue Feb 9, 2024
…to filestore-vhost via TCreateSessionResponse::FileStore::Features; outputting StorageConfig overrides on monpage; minor cleanup (#352)

* issue #324: cleanup: filesystem-id field is not needed for filestore-client executeaction, removed broken (and unneeded) json validation; fixed comments here and there, etc

* issue #324: introduced TwoStageReadEnabled flag, returning this flag to filestore-vhost via TCreateSessionResponse::FileStore::Features; outputting StorageConfig overrides on monpage; minor cleanup
qkrorlqr added a commit that referenced this issue Feb 9, 2024
* issue #324: DescribeData API and implementation stub (#329)

* issue #324: DescribeData implementation and ut (#343)

* issue #324: cleanup: filesystem-id field is not needed for filestore-client executeaction, removed broken (and unneeded) json validation; fixed comments here and there, etc (#351)

* issue #324: introduced TwoStageReadEnabled flag, returning this flag to filestore-vhost via TCreateSessionResponse::FileStore::Features; outputting StorageConfig overrides on monpage; minor cleanup (#352)

* issue #324: cleanup: filesystem-id field is not needed for filestore-client executeaction, removed broken (and unneeded) json validation; fixed comments here and there, etc

* issue #324: introduced TwoStageReadEnabled flag, returning this flag to filestore-vhost via TCreateSessionResponse::FileStore::Features; outputting StorageConfig overrides on monpage; minor cleanup

* issue #95: 1. deduplicating out-of-order compaction map chunk load requests in queue 2. limiting the number of out-of-order compaction map chunk load requests in queue (#382)

* fixed build after cherry-pick: contrib/ydb -> ydb + CMakeLists
qkrorlqr added a commit that referenced this issue Feb 9, 2024
…the two stage read implementation in TServiceActor
qkrorlqr added a commit that referenced this issue Feb 10, 2024
…the two stage read implementation in TServiceActor (#400)
qkrorlqr added a commit that referenced this issue Feb 11, 2024
…ockIndex since this block might be not initialized
qkrorlqr added a commit that referenced this issue Feb 11, 2024
…ockIndex since this block might be not initialized (#402)
qkrorlqr added a commit that referenced this issue Feb 15, 2024
…the two stage read implementation in TServiceActor (#400)
qkrorlqr added a commit that referenced this issue Feb 15, 2024
…ockIndex since this block might be not initialized (#402)
qkrorlqr added a commit that referenced this issue Feb 15, 2024
* issue-324: DescribeData: 1. BlobOffsets should be in bytes, not blocks 2. returning FileSize 3. added a ut that checks that we don't return data outside of the aligned superrange that contains the requested range (#399)

* issue #324: storing NProto::TFileStore in session state to use it in the two stage read implementation in TServiceActor (#400)

* issue #324: DescribeData: offset calculation shouldn't use TBlock::BlockIndex since this block might be not initialized (#402)

* NBSNEBIUS-101: use vhost-side reads WIP. Issue: #95 (#394)

* [Draft] NBSNEBIUS-101: use vhost-side reads

* WIP: service_actor_readdata

* add describe data test

* add describe data test

* add ut + fix readblob implementation

* enable TwoStageReadEnabled feature-flag

* fix ut + trigger large tests

---------

Co-authored-by: Maxim Deb Natkh <debnatkh@yandex.ru>

* issue-324: proper EvGet error handling, not outputting user data to logs, dependencies cleanup, code cleanup (#419)

* issue-324: proper EvGet error handling, not outputting user data to logs, dependencies cleanup, code cleanup

* issue-324: proper EvGet error handling, not outputting user data to logs, dependencies cleanup, code cleanup - forgot libs/storage/model/ut

* issue-324: proper EvGet error handling, not outputting user data to logs, dependencies cleanup, code cleanup - forgot libs/storage/model/public.h

* issue-324: proper EvGet error handling, not outputting user data to logs, dependencies cleanup, code cleanup - fixed event-log lib

* issue-324: TServiceActor: ReadData fallback in case DescribeData or EvGet fail (#428)

* issue-324: TServiceActor: ReadData fallback in case DescribeData or EvGet fail

* issue-324: TServiceActor: ReadData fallback in case DescribeData or EvGet fail

* issue-324: TServiceActor: ReadData fallback in case DescribeData or EvGet fail - discarding EvGetResults after switching to the ReadData fallback

* issue-324: TServiceActor: ReadData fallback in case DescribeData or EvGet fail - cleanup

* issue-324: added ut and loadtest for the TwoStageRead feature, fixed a crash in TReadDataActor::HandleReadBlobResponse (#441)

* issue-324: added ut and loadtest for the TwoStageRead feature, fixed a crash in TReadDataActor::HandleReadBlobResponse

* issue-324: added ut and loadtest for the TwoStageRead feature, fixed a crash in TReadDataActor::HandleReadBlobResponse - forgot nfs-storage.txt

* updated CMakeLists after cherry-pick

* updated CMakeLists after cherry-pick - forgot to add new CMakeLists

* contrib/ydb -> ydb

* contrib/ydb -> ydb

---------

Co-authored-by: Maxim Deb Natkh <debnatkh@gmail.com>
Co-authored-by: Maxim Deb Natkh <debnatkh@yandex.ru>
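The DescribeData commits above mention two details: BlobOffsets are expressed in bytes rather than blocks, and no data is returned outside the aligned superrange that contains the requested range. A rough illustration of both calculations follows. The 4 KiB block size and both function names are assumptions made for this example, not values taken from the code.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def aligned_superrange(offset: int, length: int, block_size: int = BLOCK_SIZE):
    """Return the smallest block-aligned byte range [start, end) covering the request."""
    start = (offset // block_size) * block_size          # round down
    end = -(-(offset + length) // block_size) * block_size  # round up
    return start, end


def blob_offset_bytes(block_index_in_blob: int, block_size: int = BLOCK_SIZE) -> int:
    """Convert a block index within a blob into a byte offset."""
    return block_index_in_blob * block_size
```

For example, under the assumed block size a 100-byte read at offset 5000 falls into the aligned range [4096, 8192), and the fourth block of a blob (index 3) starts at byte 12288.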
@qkrorlqr
Collaborator Author

[Image: screenshot from 2024-02-19 22-02-36]
Managed to scale read throughput to 10 GB/s in total across 10 VMs.

@debnatkh
Collaborator

After optimizations in IndexTablet, we managed to get 13-15 GB/s for 10 VMs.

image

@qkrorlqr
Collaborator Author

qkrorlqr commented Apr 2, 2024

Implemented, deployed, works well, closing the issue

@qkrorlqr qkrorlqr closed this as completed Apr 2, 2024
@debnatkh
Collaborator

debnatkh commented Apr 2, 2024

20 clients, fio, num_jobs = 32, bs = 1M

~25-30 GB/s

image
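For a quick sanity check of the scaling, the totals reported in this thread can be divided by the number of clients. This is only arithmetic over the posted figures, not a new measurement; the labels and the helper name `per_client` are made up for the example.

```python
def per_client(total_low: float, total_high: float, clients: int):
    """Return the (low, high) per-client throughput in GB/s."""
    return total_low / clients, total_high / clients


# (label, clients, total GB/s low, total GB/s high), as reported in the comments
measurements = [
    ("first result, 10 VMs", 10, 10.0, 10.0),
    ("after IndexTablet optimizations, 10 VMs", 10, 13.0, 15.0),
    ("20 clients, fio", 20, 25.0, 30.0),
]

for label, clients, low, high in measurements:
    lo, hi = per_client(low, high, clients)
    print(f"{label}: {lo:.2f}-{hi:.2f} GB/s per client")
```

Per-client throughput stays in roughly the same band when going from 10 to 20 clients, which is what the issue set out to achieve.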
