Document enumeration path map in the spec. (#5203)
[SC-51428](https://app.shortcut.com/tiledb-inc/story/51428/enumeration-path-map-does-not-exist-in-the-array-schema-format-spec)

I noticed that the array schema format specification does not include
the enumeration name-path map introduced in #4051. This PR updates the
documentation.

I used the term "enumeration filename" to describe the string written
after the enumeration name because [it is just the file's
name](https://github.com/TileDB-Inc/TileDB/blob/78ac1d2ec338fd468eb63481e85049215908e39f/tiledb/sm/array/array_directory.cc#L1324-L1326),
and updated previous usages of "enumeration pathname" or "enumeration
URI" in code.
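
To illustrate the layout being documented, here is a minimal standalone sketch of the enumeration path map encoding: a `uint32_t` count, followed by a length-prefixed name and a length-prefixed filename for each enumeration. It does not use TileDB's `Serializer` or `Deserializer`; the helper names (`write_path_map`, `read_path_map`, and so on) and the `std::vector<uint8_t>` buffer are hypothetical, and writing integers in host byte order is an assumption of the sketch rather than something this PR states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the schema's name -> filename map.
using PathMap = std::map<std::string, std::string>;

// Append a uint32_t to the buffer in host byte order (sketch assumption).
void write_u32(std::vector<uint8_t>& buf, uint32_t v) {
  uint8_t raw[sizeof(v)];
  std::memcpy(raw, &v, sizeof(v));
  buf.insert(buf.end(), raw, raw + sizeof(v));
}

// Append a string as a uint32_t length followed by its characters.
void write_string(std::vector<uint8_t>& buf, const std::string& s) {
  assert(s.size() <= UINT32_MAX);  // the length field is only 32 bits wide
  write_u32(buf, static_cast<uint32_t>(s.size()));
  buf.insert(buf.end(), s.begin(), s.end());
}

// Encode: enumeration count, then (name, filename) pairs.
std::vector<uint8_t> write_path_map(const PathMap& m) {
  std::vector<uint8_t> buf;
  write_u32(buf, static_cast<uint32_t>(m.size()));
  for (const auto& entry : m) {
    write_string(buf, entry.first);   // enumeration name
    write_string(buf, entry.second);  // enumeration filename
  }
  return buf;
}

// Read a uint32_t at pos, advancing pos; throws if the buffer is too short.
uint32_t read_u32(const std::vector<uint8_t>& buf, std::size_t& pos) {
  if (pos > buf.size() || buf.size() - pos < sizeof(uint32_t))
    throw std::runtime_error("truncated integer");
  uint32_t v;
  std::memcpy(&v, buf.data() + pos, sizeof(v));
  pos += sizeof(v);
  return v;
}

// Read a length-prefixed string at pos, advancing pos.
std::string read_string(const std::vector<uint8_t>& buf, std::size_t& pos) {
  const uint32_t len = read_u32(buf, pos);
  if (buf.size() - pos < len)
    throw std::runtime_error("truncated string");
  std::string s(reinterpret_cast<const char*>(buf.data()) + pos, len);
  pos += len;
  return s;
}

// Decode the map written by write_path_map.
PathMap read_path_map(const std::vector<uint8_t>& buf) {
  std::size_t pos = 0;
  const uint32_t count = read_u32(buf, pos);
  PathMap m;
  for (uint32_t i = 0; i < count; ++i) {
    std::string name = read_string(buf, pos);
    std::string filename = read_string(buf, pos);
    m[name] = filename;
  }
  return m;
}
```

Round-tripping a map through `write_path_map` and `read_path_map` should give back the same entries, and a buffer cut short anywhere makes the reader throw instead of reading past the end.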

---
TYPE: NO_HISTORY
DESC: Added documentation for the enumeration path map in array schemas,
present since format version 20.
teo-tsirpanis authored Jul 25, 2024
1 parent ce52063 commit 378cae1
Showing 3 changed files with 30 additions and 11 deletions.
9 changes: 9 additions & 0 deletions format_spec/array_schema.md
@@ -54,6 +54,15 @@ The array schema file consists of a single [generic tile](./generic_tile.md), wi
 | Label 1 | [Dimension Label](#dimension_label) | First dimension label |
 | … | … | … |
 | Label N | [Dimension Label](#dimension_label) | Nth dimension label |
+| Num enumerations | `uint32_t` | Number of [enumerations](./enumeration.md) in the array |
+| Enumeration name length 1 | `uint32_t` | The number of characters in the enumeration 1 name |
+| Enumeration name 1 | `uint8_t[]` | The name of enumeration 1 |
+| Enumeration filename length 1 | `uint32_t` | The number of characters in the enumeration 1 filename |
+| Enumeration filename 1 | `uint8_t[]` | The name of the file in the `__enumerations` subdirectory that contains enumeration 1's data |
+| Enumeration name length N | `uint32_t` | The number of characters in the enumeration N name |
+| Enumeration name N | `uint8_t[]` | The name of enumeration N |
+| Enumeration filename length N | `uint32_t` | The number of characters in the enumeration N filename |
+| Enumeration filename N | `uint8_t[]` | The name of the file in the `__enumerations` subdirectory that contains enumeration N's data |
 | CurrentDomain | [CurrentDomain](./current_domain.md) | The array current domain |

 ## Domain
30 changes: 20 additions & 10 deletions tiledb/sm/array_schema/array_schema.cc
@@ -188,8 +188,8 @@ ArraySchema::ArraySchema(
     dim_map_[dim->name()] = dim;
   }

-  for (auto& [enmr_name, enmr_uri] : enumeration_path_map_) {
-    (void)enmr_uri;
+  for (auto& [enmr_name, enmr_filename] : enumeration_path_map_) {
+    (void)enmr_filename;
     enumeration_map_[enmr_name] = nullptr;
   }

@@ -753,6 +753,16 @@ bool ArraySchema::is_nullable(const std::string& name) const {
 // dimension_label #1
 // dimension_label #2
 // ...
+// enumeration_num (uint32_t)
+// enumeration_name_length #1 (uint32_t)
+// enumeration_name_chars #1 (string)
+// enumeration_filename_length #1 (uint32_t)
+// enumeration_filename_chars #1 (string)
+// enumeration_name_length #2 (uint32_t)
+// enumeration_name_chars #2 (string)
+// enumeration_filename_length #2 (uint32_t)
+// enumeration_filename_chars #2 (string)
+// ...
 // current_domain
 void ArraySchema::serialize(Serializer& serializer) const {
   // Write version, which is always the current version. Despite
@@ -812,14 +822,14 @@ void ArraySchema::serialize(Serializer& serializer) const {
       utils::safe_integral_cast<size_t, uint32_t>(enumeration_map_.size());

   serializer.write<uint32_t>(enmr_num);
-  for (auto& [enmr_name, enmr_uri] : enumeration_path_map_) {
+  for (auto& [enmr_name, enmr_filename] : enumeration_path_map_) {
     auto enmr_name_size = static_cast<uint32_t>(enmr_name.size());
     serializer.write<uint32_t>(enmr_name_size);
     serializer.write(enmr_name.data(), enmr_name_size);

-    auto enmr_uri_size = static_cast<uint32_t>(enmr_uri.size());
-    serializer.write<uint32_t>(enmr_uri_size);
-    serializer.write(enmr_uri.data(), enmr_uri_size);
+    auto enmr_filename_size = static_cast<uint32_t>(enmr_filename.size());
+    serializer.write<uint32_t>(enmr_filename_size);
+    serializer.write(enmr_filename.data(), enmr_filename_size);
   }

   // Serialize array current domain information
@@ -1367,11 +1377,11 @@ shared_ptr<ArraySchema> ArraySchema::deserialize(
       std::string enmr_name(
           deserializer.get_ptr<char>(enmr_name_size), enmr_name_size);

-      auto enmr_path_size = deserializer.read<uint32_t>();
-      std::string enmr_path_name(
-          deserializer.get_ptr<char>(enmr_path_size), enmr_path_size);
+      auto enmr_filename_size = deserializer.read<uint32_t>();
+      std::string enmr_filename(
+          deserializer.get_ptr<char>(enmr_filename_size), enmr_filename_size);

-      enumeration_path_map[enmr_name] = enmr_path_name;
+      enumeration_path_map[enmr_name] = enmr_filename;
     }
   }

2 changes: 1 addition & 1 deletion tiledb/sm/array_schema/array_schema.h
@@ -721,7 +721,7 @@ class ArraySchema {
   tdb::pmr::unordered_map<std::string, shared_ptr<const Enumeration>>
       enumeration_map_;

-  /** A map of Enumeration names to Enumeration URIs */
+  /** A map of Enumeration names to Enumeration filenames */
   tdb::pmr::unordered_map<std::string, std::string> enumeration_path_map_;

   /** The filter pipeline run on offset tiles for var-length attributes. */