Ability for Dynamic Storage Tiering: NVMe (superfast) + SSD (mid-tier) + HDD (slow), manipulating 'btrfs balance' profiles #610
Also, from my research, it seems that Netgear may have already forked btrfs to achieve this and implemented their own storage-tiering algorithm in their now-defunct ReadyNAS OS. See page 10 of https://www.downloads.netgear.com/files/GDC/READYNAS-100/ReadyNAS_FlexRAID_Optimization_Guide.pdf and https://unix.stackexchange.com/questions/623460/tiered-storage-with-btrfs-how-is-it-done?answertab=modifieddesc#tab-top
I was not aware of that, thanks for the links. It seems that ReadyNAS is no longer maintained, and I can't find any git repositories, assuming it's built on top of Linux. Their page also does not mention 'btrfs' anywhere. Storage tiers are a feature people ask for, so it's no surprise that somebody implemented them outside of mainline Linux, but merging that back would be desirable. I haven't seen the code, so it's hard to tell how it was implemented and whether it would be acceptable; vendors often don't have to deal with backward compatibility or long-term support, so it's "cheaper" for them to do private extensions instead.
There is a patch set for metadata-on-SSD somewhere. I think this would be a good middle ground if it were accepted into the mainline kernel. https://patchwork.kernel.org/project/linux-btrfs/patch/20200405082636.18016-2-kreijack@libero.it/
https://www.downloads.netgear.com/files/GPL/ReadyNASOS_V6.10.8_WW_src.zip
The paths I looked at are:
I haven't looked at the full diff, since the kernel is pretty old and much has changed, but it looks like it basically adds another sort function to sort the devices in
This would be a fantastic addition to Btrfs. I'd like to emphasize the importance of being able to specify sub-volume affinity. Imagine having sub-volumes for /, /var/log, and /home. Here's the concept:
In this system, data from / gets the highest priority for space on tier 1, with lower priority for /var/log and /home on the same tier. Similarly, data from /var/log gets the highest priority for space on tier 3, with lower priority for / and /home on the same tier. I imagine two parameters to implement this:
This level of control over data placement for individual sub-volumes would be a game-changer. It would allow finely tuned optimization of storage resources for specific usage scenarios and would further solidify Btrfs as a powerful and flexible file system for data management.
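To make the affinity idea concrete, here is a minimal Python sketch; it is not a btrfs API. The two parameters from the proposal are not reproduced: instead it assumes a hypothetical per-tier ranking table (`AFFINITY`) and hypothetical capacities (`TIERS`), and hands out space so that each subvolume first claims space on the tier where it ranks highest, then overflows to the other tiers.

```python
"""Hypothetical sketch of per-subvolume tier affinity (not a btrfs API)."""

# Hypothetical tiers from fastest to slowest, with capacities in GiB.
TIERS = {"tier1": 100, "tier3": 1000}

# Hypothetical affinity table: for each tier, subvolumes from highest to lowest priority.
AFFINITY = {
    "tier1": ["/", "/var/log", "/home"],
    "tier3": ["/var/log", "/", "/home"],
}


def place(demand_gib: dict[str, int]) -> dict[str, dict[str, int]]:
    """Split each subvolume's demand across tiers according to AFFINITY."""
    free = dict(TIERS)
    remaining = dict(demand_gib)
    placement: dict[str, dict[str, int]] = {sv: {} for sv in demand_gib}
    tier_order = list(TIERS)  # dicts keep insertion order: fastest tier first
    # Every (rank, tier, subvolume) claim, best rank first; ties go to the faster tier.
    claims = sorted(
        (rank, tier_order.index(tier), tier, sv)
        for tier, ranking in AFFINITY.items()
        for rank, sv in enumerate(ranking)
        if sv in remaining
    )
    for _rank, _speed, tier, sv in claims:
        amount = min(remaining[sv], free[tier])
        if amount > 0:
            placement[sv][tier] = amount
            free[tier] -= amount
            remaining[sv] -= amount
    # Subvolumes missing from AFFINITY, or demand beyond total capacity, stay unplaced.
    return placement
```

For example, `place({"/": 40, "/var/log": 30, "/home": 500})` puts 40 GiB of `/` on tier1, 30 GiB of `/var/log` on tier3, and gives `/home` the remaining 60 GiB of tier1 plus 440 GiB on tier3.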
@TheLinuxGuy, @studyfranco It might be worth having a look at the btrfs preferred-metadata patches: kakra/linux#26. They do not explicitly deal with tiers, but they do introduce metadata-only, metadata-preferred, data-only and data-preferred priorities.
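One way to picture how such roles could steer allocation is as an ordering of candidate devices per chunk type. The Python sketch below is a guess at the semantics, not the patches' code: it assumes "-only" devices are never used for the other chunk type, that "-preferred" devices of the opposite kind are a last resort, and that devices without a role sit in between. Ties are broken by free space, roughly in the spirit of btrfs preferring devices with the most free space. `RANK`, `Device` and `candidate_order` are hypothetical names.

```python
"""Hypothetical sketch of role-based device ordering for chunk allocation."""
from dataclasses import dataclass

# Role names follow the patch description; the ranking rules are assumptions.
# Lower rank = tried earlier; roles missing from a table are never used for that type.
RANK = {
    "metadata": {"metadata-only": 0, "metadata-preferred": 0, "none": 1, "data-preferred": 2},
    "data": {"data-only": 0, "data-preferred": 0, "none": 1, "metadata-preferred": 2},
}


@dataclass
class Device:
    name: str
    role: str  # one of the roles above, or "none" for no preference
    free_gib: int


def candidate_order(devices: list[Device], chunk_type: str) -> list[str]:
    """Return names of devices eligible for a new chunk, best candidates first."""
    ranks = RANK[chunk_type]
    eligible = [d for d in devices if d.role in ranks and d.free_gib > 0]
    # Preferred roles first; within a rank, devices with more free space first.
    eligible.sort(key=lambda d: (ranks[d.role], -d.free_gib))
    return [d.name for d in eligible]
```

With a metadata-only NVMe, a data-preferred HDD and a role-less HDD, metadata chunks would try the NVMe, then the role-less disk, then the data-preferred disk, while data chunks would never touch the metadata-only NVMe.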
Rebased to 6.6 LTS: kakra/linux#31
This is a very good start, but my use case (and my proposal) is more complex.
Currently I'm solving it this way: I have two NVMe drives, and each has a 64 GB metadata-preferred partition for btrfs. The remaining space on both drives forms an md-raid1 array, which holds the bcache cache device. Each of the HDDs (4x 4 TB) holds a bcache backing partition in writeback mode, attached to the md-raid1 cache, and the resulting bcache devices are data-preferred btrfs devices. This way, metadata lives on native NVMe, because bcache doesn't handle CoW metadata very efficiently, and I still get the benefit of having hot data on NVMe. I'm using these patches to exclude some IO traffic from being cached (e.g. backup or maintenance jobs with idle IO priority): kakra/linux#32

I achieve a cache hit rate of 96% and bypass hits of 95% (IO requests that should have bypassed caching but were already in the cache) for an 800 GB cache and 4.2 TB of used btrfs storage. Combining bcache with preferred metadata worked magic: cache hit rates went up and response times went down a lot. Transfer rates peak around 2 GB/s, which is slower than native NVMe but still very good. Average transfer rates are around 300-500 MB/s, with data coming partially from the cache and partially from the HDDs. Migrating this setup from a single SSD to dual NVMe improved perceived responsiveness a lot.

Still, due to CoW and btrfs data raid1, bcache cannot work optimally and wastes some space and performance. Better integration of the two would be useful: bcache could know about btrfs raid1 and store data just once, or CoW could inform bcache about unused blocks.
I'm losing the space used by the cache, which is a really big drawback for me. I'm following this issue: kdave/btrfs-progs#610. Maybe I'll switch to a single btrfs filesystem across the two drives if they ever implement it, so I don't lose the extra space from the SSD cache.
I would like to add that it would also be a great idea to support tiered storage at the directory or file level, that is, making the "tiering" a property of the directory or file itself:
This could be done for both data and metadata. As proposed, we could even have different "tier levels" defined for setups like NVMe <-> SATA SSD <-> HDD. By making tiering a property of a file or directory, people could mark files they always want to be quickly accessible (e.g., without spin-up time), so that the filesystem stores them on the pool's fast cache SSDs. This would be a nice way to decide which files are stored on the cache, instead of relying only on which data was accessed most recently. A use case could be a home server where personal files, pictures, etc. should always be available without delay, while large media files can be stored on slower rotational drives that take time to spin up.
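A per-file or per-directory tier property would presumably be inherited from parent directories. The Python sketch below assumes a hypothetical tag table (`TIER_TAGS`, where 1 is the fastest tier) and resolves a file's tier from its nearest tagged ancestor, falling back to the normal access-based policy when nothing is tagged. No such btrfs property exists today; the paths and names are illustrative only.

```python
"""Hypothetical sketch: resolve a file's tier from its nearest tagged ancestor."""
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical per-path tier tags (1 = fastest tier).
TIER_TAGS = {
    PurePosixPath("/srv/media"): 3,
    PurePosixPath("/srv/personal"): 1,
    PurePosixPath("/srv/personal/archive"): 2,
}


def effective_tier(path: str) -> Optional[int]:
    """Check the path itself, then each parent up to /, for an explicit tag.

    Returns None when nothing is tagged, meaning placement is left to the
    normal hot/cold (access-based) policy.
    """
    p = PurePosixPath(path)
    for candidate in (p, *p.parents):
        if candidate in TIER_TAGS:
            return TIER_TAGS[candidate]
    return None
```

Here `/srv/personal/photos/a.jpg` resolves to tier 1, while `/srv/personal/archive/2019.tar` picks up the more specific tier 2 tag. Walking parents at lookup time keeps the sketch short; a real implementation would more likely copy the tag to new inodes when they are created.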
Could btrfs implement a feature to support multiple devices of different speeds/types, with a profiling algorithm for data balancing? In other words, dynamic storage tiering.
Assume a user with a combined btrfs filesystem consisting of:
To keep things simple, assume no redundancy within each tier. The user's goal is maximum performance, with the filesystem's storage as optimized as possible within some customizable settings (e.g., how much NVMe space should be left "free" for writeback caching of new I/O).
As I understand it, btrfs balance already does some filesystem optimization by spreading disk space utilization evenly across the disks. This feature asks for more options to change how btrfs balance works and how new I/O writes are handled, so that tier 1 always takes priority.
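The write-path part of the request (new writes land on tier 1 whenever it can take them) can be sketched as a fall-through over tiers ordered from fastest to slowest. This Python sketch is illustrative only: `Tier` and `place_write` are hypothetical names, sizes are whole GiB, and it ignores that btrfs actually allocates space in chunks (block groups) rather than per write.

```python
"""Hypothetical sketch: place a new write on the fastest tier that can hold it."""
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity_gib: int
    used_gib: int = 0

    @property
    def free_gib(self) -> int:
        return self.capacity_gib - self.used_gib


def place_write(tiers: list[Tier], size_gib: int) -> str:
    """Tiers are ordered fastest first, so tier 1 always gets the first chance."""
    for tier in tiers:
        if tier.free_gib >= size_gib:
            tier.used_gib += size_gib
            return tier.name
    raise OSError("no space left on any tier")
```

With an NVMe tier that has 5 GiB free, a 4 GiB write goes to the NVMe, and the next 4 GiB write falls through to the SSD tier. Keeping the fast tier from filling up in the first place is the job of the demotion step described next.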
Least-used data blocks that haven't been accessed recently would be "downgraded", i.e. moved down to a lower tier, as filesystem usage grows and demands some purging or rebalancing.
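The demotion step can be sketched with an LRU list per fast tier: whenever free space drops below a configurable reserve (the "how much NVMe space should be left free" setting from the request), the coldest extents are popped and handed to the next tier down. The sketch assumes btrfs had some way to track per-extent access recency, which it simply takes for granted; `TierLRU`, `reserve_gib` and the extent IDs are hypothetical.

```python
"""Hypothetical sketch: demote least-recently-used extents to keep a free reserve."""
from collections import OrderedDict


class TierLRU:
    """Tracks extents on one fast tier, ordered least to most recently used."""

    def __init__(self, capacity_gib: int, reserve_gib: int):
        self.capacity_gib = capacity_gib
        self.reserve_gib = reserve_gib  # space to keep free for new writes
        self.extents: OrderedDict[str, int] = OrderedDict()  # extent id -> size in GiB

    def used_gib(self) -> int:
        return sum(self.extents.values())

    def touch(self, extent: str, size_gib: int) -> None:
        """Record a read or write; the extent becomes the most recently used."""
        self.extents[extent] = size_gib
        self.extents.move_to_end(extent)

    def demote(self) -> list[str]:
        """Pop the coldest extents until the reserve is free; return them for the slower tier."""
        moved = []
        while self.extents and self.capacity_gib - self.used_gib() < self.reserve_gib:
            extent, _size = self.extents.popitem(last=False)  # oldest entry first
            moved.append(extent)
        return moved
```

For example, with a 100 GiB tier, a 20 GiB reserve and three 30 GiB extents `a`, `b` and `c`, touching `a` again and then calling `demote()` moves only `b`, because removing it brings free space back to 40 GiB.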