
Commit d542faf
Deploy preview for PR 121 🛫
abarciauskas-bgse committed Dec 20, 2024
1 parent bcc7271 commit d542faf
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion pr-preview/pr-121/cloud-optimized-netcdf4-hdf5/index.html
@@ -563,7 +563,7 @@ <h2 class="anchored" data-anchor-id="consolidated-internal-file-metadata">Consol
<p>HDF5 file organization—data, metadata, and free space—depends on the file space management strategy. Details on these strategies are in <a href="https://support.hdfgroup.org/documentation/hdf5-docs/advanced_topics/FileSpaceManagement.html">HDF Support: File Space Management</a>.</p>
<p>Here are a few additional considerations for understanding and implementing the <code>H5F_FSPACE_STRATEGY_PAGE</code> strategy:</p>
<ul>
-<li><strong>Chunks vs.&nbsp;Pages:</strong> In HDF5, datasets can be chunked, meaning the dataset is divided into smaller blocks of data that can be individually compressed (see also <a href="https://support.hdfgroup.org/documentation/hdf5-docs/advanced_topics/chunking_in_hdf5.html">Chunking in HDF5</a>). Pages, on the other hand, represent the smallest unit HDF5 uses for reading and writing data. To optimize performance, chunk sizes should ideally align with the page size or be a multiple thereof. A chunk does not have to fit within a single page, however misalignment leads to chunks spanning multiple pages, which increases read latency. Entire pages are read into memory when accessing chunks or metadata. Only the relevant data (e.g., a specific chunk) is decompressed.</li>
+<li><strong>Chunks vs.&nbsp;Pages:</strong> In HDF5, datasets can be chunked, meaning the dataset is divided into smaller blocks of data that can be individually compressed (see also <a href="https://support.hdfgroup.org/documentation/hdf5-docs/advanced_topics/chunking_in_hdf5.html">Chunking in HDF5</a>). Pages, on the other hand, represent the smallest unit HDF5 uses for reading and writing data. To optimize performance, chunk sizes should ideally align with the page size or be a multiple thereof. Entire pages are read into memory when accessing chunks or metadata. Only the relevant data (e.g., a specific chunk) is decompressed.</li>
<li><strong>Page Size Considerations:</strong> The page size applies to both metadata and raw data. Therefore, the chosen page size should strike a balance: it must consolidate metadata efficiently while minimizing unused space in raw data chunks. Excess unused space can significantly increase file size. File size is typically not a concern for I/O performance when accessing parts of a file. However, increased file size can become a concern for storage costs.</li>
</ul>
</div>
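The guidance in the hunk above can be sketched with h5py, which exposes HDF5's file space management options as file-creation keywords. This is an illustrative example, not code from this commit: the file name, dataset shape, chunk shape, and the 8 MiB page size are assumed values, chosen so each chunk fits within a single page.

```python
import h5py

# A sketch, not code from this commit: create an HDF5 file with the paged
# aggregation strategy (H5F_FSPACE_STRATEGY_PAGE) and an explicit page size.
# Requires h5py >= 3.3 for fs_page_size; names, shapes, and sizes are
# illustrative assumptions.
with h5py.File(
    "cloud_optimized.h5",          # hypothetical output path
    "w",
    fs_strategy="page",            # paged aggregation
    fs_page_size=8 * 1024 * 1024,  # 8 MiB pages instead of the 4 KiB default
) as f:
    # A 1000 x 1000 float32 chunk is ~4 MB uncompressed, so each chunk fits
    # within a single 8 MiB page, keeping a chunk read to one page apiece.
    f.create_dataset(
        "data",
        shape=(10_000, 10_000),
        chunks=(1_000, 1_000),
        dtype="f4",
        compression="gzip",
    )
```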
2 changes: 1 addition & 1 deletion pr-preview/pr-121/search.json
@@ -1373,7 +1373,7 @@
"href": "cloud-optimized-netcdf4-hdf5/index.html#consolidated-internal-file-metadata",
"title": "Cloud-Optimized HDF/NetCDF",
"section": "Consolidated Internal File Metadata",
"text": "Consolidated Internal File Metadata\nConsolidated metadata is a key characteristic of cloud-optimized data and enables “lazy loading” (see the Lazy Loading block below). Client libraries use file metadata to understand what’s in the file and where it is stored. When metadata is scattered across a file (which is the default for HDF5 writing), client applications have to make multiple requests for metadata information.\nFor HDF5 files, to consolidate metadata, files should be written with the paged aggregation file space management strategy (see also H5F_FSPACE_STRATEGY_PAGE). When using this strategy, HDF5 will write data in pages where metadata is separated from raw data chunks. Note the page size should also be set, as the default size is 4096 bytes (or 4KB, source). Further, only files using paged aggregation can use the HDF5 page buffer cache – a low-level library cache (Jelenak 2022) – to reduce subsequent data access.\n\n\n\n\n\n\nLazy loading\n\n\n\nLazy loading is a common term for first loading only metadata, and deferring reading of data values until required by computation.\n\n\n\n\n\n\n\n\nHDF5 File Space Management Strategies\n\n\n\nHDF5 file organization—data, metadata, and free space—depends on the file space management strategy. Details on these strategies are in HDF Support: File Space Management.\nHere are a few additional considerations for understanding and implementing the H5F_FSPACE_STRATEGY_PAGE strategy:\n\nChunks vs. Pages: In HDF5, datasets can be chunked, meaning the dataset is divided into smaller blocks of data that can be individually compressed (see also Chunking in HDF5). Pages, on the other hand, represent the smallest unit HDF5 uses for reading and writing data. To optimize performance, chunk sizes should ideally align with the page size or be a multiple thereof. A chunk does not have to fit within a single page, however misalignment leads to chunks spanning multiple pages, which increases read latency. Entire pages are read into memory when accessing chunks or metadata. Only the relevant data (e.g., a specific chunk) is decompressed.\nPage Size Considerations: The page size applies to both metadata and raw data. Therefore, the chosen page size should strike a balance: it must consolidate metadata efficiently while minimizing unused space in raw data chunks. Excess unused space can significantly increase file size. File size is typically not a concern for I/O performance when accessing parts of a file. However, increased file size can become a concern for storage costs.",
"text": "Consolidated Internal File Metadata\nConsolidated metadata is a key characteristic of cloud-optimized data and enables “lazy loading” (see the Lazy Loading block below). Client libraries use file metadata to understand what’s in the file and where it is stored. When metadata is scattered across a file (which is the default for HDF5 writing), client applications have to make multiple requests for metadata information.\nFor HDF5 files, to consolidate metadata, files should be written with the paged aggregation file space management strategy (see also H5F_FSPACE_STRATEGY_PAGE). When using this strategy, HDF5 will write data in pages where metadata is separated from raw data chunks. Note the page size should also be set, as the default size is 4096 bytes (or 4KB, source). Further, only files using paged aggregation can use the HDF5 page buffer cache – a low-level library cache (Jelenak 2022) – to reduce subsequent data access.\n\n\n\n\n\n\nLazy loading\n\n\n\nLazy loading is a common term for first loading only metadata, and deferring reading of data values until required by computation.\n\n\n\n\n\n\n\n\nHDF5 File Space Management Strategies\n\n\n\nHDF5 file organization—data, metadata, and free space—depends on the file space management strategy. Details on these strategies are in HDF Support: File Space Management.\nHere are a few additional considerations for understanding and implementing the H5F_FSPACE_STRATEGY_PAGE strategy:\n\nChunks vs. Pages: In HDF5, datasets can be chunked, meaning the dataset is divided into smaller blocks of data that can be individually compressed (see also Chunking in HDF5). Pages, on the other hand, represent the smallest unit HDF5 uses for reading and writing data. To optimize performance, chunk sizes should ideally align with the page size or be a multiple thereof. Entire pages are read into memory when accessing chunks or metadata. Only the relevant data (e.g., a specific chunk) is decompressed.\nPage Size Considerations: The page size applies to both metadata and raw data. Therefore, the chosen page size should strike a balance: it must consolidate metadata efficiently while minimizing unused space in raw data chunks. Excess unused space can significantly increase file size. File size is typically not a concern for I/O performance when accessing parts of a file. However, increased file size can become a concern for storage costs.",
"crumbs": [
"Formats",
"Cloud-Optimized HDF/NetCDF",
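On the read side, the page buffer cache mentioned in the text above is only available for files written with paged aggregation. A minimal sketch, assuming h5py >= 3.3 (which exposes HDF5's H5Pset_page_buffer_size as page_buf_size) and the hypothetical file from the write sketch earlier; the cache size and metadata percentage are assumed values:

```python
import h5py

# A sketch under stated assumptions: open a paged-aggregation file with the
# HDF5 page buffer cache enabled. page_buf_size must be at least the file's
# page size; 64 MiB is an illustrative choice.
with h5py.File(
    "cloud_optimized.h5",            # hypothetical file from the write sketch
    "r",
    page_buf_size=64 * 1024 * 1024,  # total page buffer cache size
    min_meta_keep=10,                # min % of the buffer held for metadata pages
) as f:
    # Whole pages are read into the buffer, only the requested chunk is
    # decompressed, and repeated reads of the same pages hit the cache.
    block = f["data"][:1_000, :1_000]
```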
