remove first rows fallback variable #771

Merged
2 changes: 0 additions & 2 deletions chart/templates/worker/first-rows/_container.tpl
@@ -24,8 +24,6 @@
# value: {{ .Values.queue.maxJobsPerNamespace | quote }}
# overridden
value: {{ .Values.firstRows.queue.maxJobsPerNamespace | quote }}
- name: FIRST_ROWS_FALLBACK_MAX_DATASET_SIZE
value: {{ .Values.firstRows.fallbackMaxDatasetSize | quote }}
- name: FIRST_ROWS_MAX_BYTES
value: {{ .Values.firstRows.maxBytes | quote }}
- name: FIRST_ROWS_MAX_NUMBER
2 changes: 0 additions & 2 deletions chart/values.yaml
@@ -282,8 +282,6 @@ splits:
tolerations: []

firstRows:
# Max size (in bytes) of the dataset to fallback in normal mode if streaming fails
fallbackMaxDatasetSize: "100_000_000"
# Max size of the /first-rows endpoint response in bytes
maxBytes: "1_000_000"
# Max number of rows in the /first-rows endpoint response
1 change: 0 additions & 1 deletion tools/docker-compose-datasets-server.yml
@@ -138,7 +138,6 @@ services:
ASSETS_BASE_URL: "http://localhost:${PORT_REVERSE_PROXY-8000}/assets" # hard-coded to work with the reverse-proxy
ASSETS_STORAGE_DIRECTORY: ${ASSETS_STORAGE_DIRECTORY-/assets}
DATASETS_BASED_ENDPOINT: "/first-rows" # hard-coded
FIRST_ROWS_FALLBACK_MAX_DATASET_SIZE: ${FIRST_ROWS_FALLBACK_MAX_DATASET_SIZE-100_000_000}
FIRST_ROWS_MAX_BYTES: ${FIRST_ROWS_MAX_BYTES-1_000_000}
FIRST_ROWS_MAX_NUMBER: ${FIRST_ROWS_MAX_NUMBER-100}
FIRST_ROWS_MIN_CELL_BYTES: ${FIRST_ROWS_MIN_CELL_BYTES-100}
1 change: 0 additions & 1 deletion tools/docker-compose-dev-datasets-server.yml
@@ -138,7 +138,6 @@ services:
ASSETS_BASE_URL: "http://localhost:${PORT_REVERSE_PROXY-8000}/assets" # hard-coded to work with the reverse-proxy
ASSETS_STORAGE_DIRECTORY: ${ASSETS_STORAGE_DIRECTORY-/assets}
DATASETS_BASED_ENDPOINT: "/first-rows" # hard-coded
FIRST_ROWS_FALLBACK_MAX_DATASET_SIZE: ${FIRST_ROWS_FALLBACK_MAX_DATASET_SIZE-100_000_000}
FIRST_ROWS_MAX_BYTES: ${FIRST_ROWS_MAX_BYTES-1_000_000}
FIRST_ROWS_MAX_NUMBER: ${FIRST_ROWS_MAX_NUMBER-100}
FIRST_ROWS_MIN_CELL_BYTES: ${FIRST_ROWS_MIN_CELL_BYTES-100}
1 change: 0 additions & 1 deletion workers/datasets_based/README.md
@@ -39,7 +39,6 @@ Only needed when the `DATASETS_BASED_ENDPOINT` is set to `/first-rows`.

Set environment variables to configure the first rows worker (`FIRST_ROWS_` prefix):

- `FIRST_ROWS_FALLBACK_MAX_DATASET_SIZE`: the maximum size in bytes of the dataset to fall back into normal mode if streaming fails. Note that it requires to have the size in the info metadata. Set to `0` to disable the fallback. Defaults to `100_000_000`.
- `FIRST_ROWS_MAX_BYTES`: the max size of the /first-rows endpoint response in bytes. Defaults to `1_000_000` (1 MB).
- `FIRST_ROWS_MAX_NUMBER`: the max number of rows fetched by the worker for the split and provided in the /first-rows endpoint response. Defaults to `100`.
- `FIRST_ROWS_MIN_CELL_BYTES`: the minimum size in bytes of a cell when truncating the content of a row (see `FIRST_ROWS_ROWS_MAX_BYTES`). Below this limit, the cell content will not be truncated. Defaults to `100`.
5 changes: 0 additions & 5 deletions workers/datasets_based/src/datasets_based/config.py
@@ -71,7 +71,6 @@ def from_env() -> "DatasetsBasedConfig":
)


FIRST_ROWS_FALLBACK_MAX_DATASET_SIZE = 100_000_000
FIRST_ROWS_MAX_BYTES = 1_000_000
FIRST_ROWS_MAX_NUMBER = 100
FIRST_ROWS_CELL_MIN_BYTES = 100
@@ -82,7 +81,6 @@ def from_env() -> "DatasetsBasedConfig":
@dataclass
class FirstRowsConfig:
assets: AssetsConfig = field(default_factory=AssetsConfig)
fallback_max_dataset_size: int = FIRST_ROWS_FALLBACK_MAX_DATASET_SIZE
max_bytes: int = FIRST_ROWS_MAX_BYTES
max_number: int = FIRST_ROWS_MAX_NUMBER
min_cell_bytes: int = FIRST_ROWS_CELL_MIN_BYTES
@@ -95,9 +93,6 @@ def from_env() -> "FirstRowsConfig":
with env.prefixed("FIRST_ROWS_"):
return FirstRowsConfig(
assets=AssetsConfig.from_env(),
fallback_max_dataset_size=env.int(
name="FALLBACK_MAX_DATASET_SIZE", default=FIRST_ROWS_FALLBACK_MAX_DATASET_SIZE
),
max_bytes=env.int(name="MAX_BYTES", default=FIRST_ROWS_MAX_BYTES),
max_number=env.int(name="MAX_NUMBER", default=FIRST_ROWS_MAX_NUMBER),
min_cell_bytes=env.int(name="CELL_MIN_BYTES", default=FIRST_ROWS_CELL_MIN_BYTES),
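
For illustration only, here is a minimal standard-library sketch of what the first-rows configuration looks like once the fallback setting is gone. It is not the project's code: the real `FirstRowsConfig` reads its values through an `env` object, while this sketch uses a plain mapping, and `read_int` and `FirstRowsConfigSketch` are hypothetical names. Only two of the remaining fields are shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Defaults copied from the diff above.
FIRST_ROWS_MAX_BYTES = 1_000_000
FIRST_ROWS_MAX_NUMBER = 100


def read_int(environ: Mapping[str, str], name: str, default: int) -> int:
    """Hypothetical helper: read FIRST_ROWS_<name> as an int, else the default."""
    raw = environ.get(f"FIRST_ROWS_{name}")
    # int() accepts underscore separators, so "1_000_000" parses as 1000000.
    return default if raw is None else int(raw)


@dataclass
class FirstRowsConfigSketch:
    # No fallback_max_dataset_size field: the setting no longer exists.
    max_bytes: int = FIRST_ROWS_MAX_BYTES
    max_number: int = FIRST_ROWS_MAX_NUMBER

    @staticmethod
    def from_env(environ: Mapping[str, str] = os.environ) -> "FirstRowsConfigSketch":
        return FirstRowsConfigSketch(
            max_bytes=read_int(environ, "MAX_BYTES", FIRST_ROWS_MAX_BYTES),
            max_number=read_int(environ, "MAX_NUMBER", FIRST_ROWS_MAX_NUMBER),
        )
```

In this sketch, a leftover `FIRST_ROWS_FALLBACK_MAX_DATASET_SIZE` in the environment is simply ignored, since nothing reads it anymore.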

@@ -399,12 +399,12 @@ def compute_first_rows_response(
assets_base_url: str,
hf_token: Optional[str],
min_cell_bytes: int,
max_size_fallback: Optional[int],
rows_max_bytes: int,
rows_max_number: int,
rows_min_number: int,
columns_max_number: int,
assets_directory: str,
max_size_fallback: Optional[int] = None,
) -> FirstRowsResponse:
"""
Get the response of /first-rows for one specific split of a dataset from huggingface.co.
@@ -635,7 +635,6 @@ def compute(self) -> Mapping[str, Any]:
assets_directory=self.first_rows_config.assets.storage_directory,
hf_token=self.common_config.hf_token,
min_cell_bytes=self.first_rows_config.min_cell_bytes,
max_size_fallback=self.first_rows_config.fallback_max_dataset_size,
rows_max_bytes=self.first_rows_config.max_bytes,
rows_max_number=self.first_rows_config.max_number,
rows_min_number=self.first_rows_config.min_number,
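
The signature change in the first hunk moves `max_size_fallback` to the end of the parameter list and gives it a default of `None`, which is what lets the caller in `compute` drop the argument. A reduced sketch of that pattern follows. The function name and body are hypothetical, and treating `None` as "no fallback" is an assumption, since the diff does not show how the real function interprets it.

```python
from typing import Any, Dict, Optional


def compute_first_rows_sketch(
    rows_max_bytes: int,
    rows_max_number: int,
    max_size_fallback: Optional[int] = None,  # now optional and last
) -> Dict[str, Any]:
    # Hypothetical body: only reports whether a fallback limit was supplied.
    return {
        "rows_max_bytes": rows_max_bytes,
        "rows_max_number": rows_max_number,
        "fallback_enabled": max_size_fallback is not None,
    }


# The worker-style call no longer passes the argument at all.
without_fallback = compute_first_rows_sketch(
    rows_max_bytes=1_000_000,
    rows_max_number=100,
)

# Other callers (tests, for example) can still pass a limit explicitly.
with_fallback = compute_first_rows_sketch(
    rows_max_bytes=1_000_000,
    rows_max_number=100,
    max_size_fallback=100_000_000,
)
```

Because every call in the diff uses keyword arguments, moving the parameter does not change the meaning of any existing call.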