Meeting Notes
Use page table to accelerate metadata lookup for small tables.
Filesystem metadata can be stored in NVM storage, so metadata lookups do not need to access the disk.
Benchmarking time-series databases.
Time-series database benchmarking: SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things
Google's in-memory time-series database: Monarch: Google's Planet-Scale In-Memory Time Series Database
F2FS on ZNS
Track the location of files on ZNS device. (Custom printk to log file locations etc.)
File classifications:
- Static: Initial write (Warm)
- Dynamic: After GC (if not updated, cold)
- Use hints (fcntl)
Q: What if a bad classification is chosen?
With RocksDB:
- What hints are used, at what level the hints are passed?
- Classification of DB files.
- Is other info passed down, such as flags?
Q: What is the classification before BG GC and after BG GC?
Q: What does the layout look like? where do the different files end up?
Q: Overall goal to answer: When does the passing of better hints at file creation time benefit the performance
Q: What other application can be used to test the file system's performance?
Influx DB: https://cs.ulb.ac.be/public/_media/teaching/influxdb_2017.pdf. Store arrays in rows.
ClickHouse DB has better performance.
Facebook 2015: HBase
Use B+ tree? Where to build the tree?
Deploy the databases mentioned above and run benchmarks.
(TODO: Find the paper) Use tree structure to find a file.
User-space file systems are easier to implement and debug, so try that.
ufs exposes a set of calls to user space, and a large part of it is implemented in user space, such as page tables. A possible solution is to expose a kernel API to user space.
Implement the fs in user space as much as possible?
TODO: Make a slide deck in two weeks, one or two pages.
File allocation for RocksDB: Check wiki page for detailed description.
There are 6 lifetime hints; only 3 are used: short, medium, and extreme, for hot, warm, and cold.
SST files end up in different segments (cold or warm segments). Check what happens and where (in RocksDB or F2FS?).
Hints:
- Hotness classification
- Lifecycle
Test SPDK performance on ZNS, use callback for trace.
Use BPF: Check the wiki doc for more details.
The paper's ingest rate is 30 MB/sec. What causes the performance issues? (Ask for the paper.)
SW/HW problem: use an in-memory DB. Is it a hardware or software problem? What's the bottleneck?
Rerun the benchmark; a couple of GB is to be expected.
(TODO for Z: read the paper)
Paper: (ask later)
Page table? Most papers are software-based.
Hardware optimization:
Pay attention to the general class of data structures, caching, prefetching.
Most use B+ trees. (https://tolia.org/files/pubs/fast2011.pdf)
B tree for metadata lookup.
Plan: Extend the list of papers.
For next week's meeting (1pm): present an overview.
SPDK and io_uring. IO co-processor. IO completion on a single core.
Benchmark SPDK and io_uring.
Compare ZNS and a normal NVMe device.
(250k in depth 4)
OPT1: One job, qdepth 1-8
The ZNS device has a normal 4 GB part; do benchmarking on that part?
libzbc: want to show SPDK delivers better performance
Problem: does it deliver better latency?
Better latency for single read, write, append?
Latency of each operation at qd = 1, job size 1, 4 KB.
Increase #jobs with qd = 1.
Single job, multiple qd, mq-deadline, use append (no SPDK)
(SPDK cannot do multiple-write buffering?)
Exp 1: 1 reader at loc 0 and 1 reader at loc 1/2. Exp 2: 1 job, qd = 2, control where they are reading from.
(SPDK is a single process library)
libzbc/libzbd ->
fio uses io_uring for ZNS interaction -> write to wiki
If a single open zone for each classification, write multiple files ->
two-zone write performance
how many concurrent writers?
how to assign / how many zones open at the same time
design the mechanism rather than the policy
TODO: reconcile the performance of single- and two-zone writes.
Fast fs metadata lookup (check the literature study)
More general questions:
- What kinds of PM file systems exist?
What to improve (total 3, ask later):
TLB page table
Metadata,
paper to read:
CTFS, FAST 2022
paper: Storing ?? data
TimescaleDB and Kafka
Checkpoints failed sometimes, check why.
When checkpointing, IO ops issued before must be finished.
SPDK and io_uring
seq zones and random zones. 4K LBA, DB does not scale. (try to reduce padding?)
assume append will improve performance; redo the exercise on new devices
try to write the intro: why SPDK, why ZFS, why ?
background: ZNS, ZFS, what's the difference; one page, incl. figures etc.
spdk:
CTFS:
Follow-up: read more papers, ten by next week. USENIX, SIGMOD, etc.
write a list of conferences
key papers in detail.
BTrDB: special tree structure to index time-series data
4 nodes, 500 MB, hard disk
Read more papers
Write the discussed section
MMU for translation. SCMFS
Logging structure
High-level design of metadata lookup
Deadline: beginning of Dec? Last week of Nov, the 26th?
Structure: this Friday
in-memory, on-disk, and some benchmarking; stream processing
key problems and how they are addressed
Wed 4:30
fio instruction counter
6 types (hot/warm/cold x data/node)
Multiple for each part: more than one stream (zone) for each type
round robin first (scheduling)
fs virtualization with DPU (PCIe)
FUSE on DPU
now: NFS on DPU
- NFS maps easily onto FUSE
- cloud uses NFS a lot
host-to-guest VM
ZNS device, KV store on ZNS
triple db, user space with SPDK
Larger blocks, higher throughput.
disable compaction?
draft for intro, background
Find more papers on stream processing.
Two or more hot data sections, round robin by block: implemented.
Round robin with number: no good results.
512 qd, 4K, 4 GB file
separation of the fs design
hashtable logging structures, kernel-based
some papers about hashing structures
next week: finish design section, more papers, about metadata, atomicity
how to ensure mmap in the kernel
paper about: failure atomicity
HotStorage 2022: "Bye block and hello byte"? (Check the exact name later)
Implement
latency test: host -> DPU 187 us, DPU NFS 133 us
git repo: micro-arch benchmark
search for PCIe latency
Design Guidelines for High Performance RDMA Systems ATC16
Test the lower framework: how many writes are happening
Something is wrong without buffering; maybe inefficient buffering.
Setup of BPF and SPDK
Related work