test hfh 0.14.0rc1
lhoestq committed Apr 20, 2023
1 parent 61db0e9 commit 1f14984
Showing 2 changed files with 2 additions and 1 deletion.
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -7,6 +7,7 @@ on:
   push:
     branches:
       - main
+      - test-hfh*
 
 env:
   HF_ALLOW_CODE_EVAL: 1
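
The new `test-hfh*` entry is a branch filter: pushes to `main` or to any branch whose name starts with `test-hfh` trigger the workflow. The sketch below approximates that matching in Python, assuming only the two wildcard rules from the GitHub Actions filter syntax that matter here (`*` matches any run of characters except `/`, `**` matches any run of characters). The helper names and the sample branch names are hypothetical, and this is not GitHub's own implementation.

```python
import re
from typing import Iterable


def branch_filter_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified branch filter into a compiled regex.

    Assumption: only `*` and `**` are treated as wildcards; the other
    special characters of the real filter syntax are not handled.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # any characters, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # any characters except "/"
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def branch_matches(branch: str, patterns: Iterable[str]) -> bool:
    """Return True if the branch name fully matches any of the filters."""
    return any(branch_filter_to_regex(p).fullmatch(branch) for p in patterns)


# Filters as they appear in the workflow after this commit.
PATTERNS = ["main", "test-hfh*"]

# Hypothetical branch names, used for illustration only.
for name in ["main", "test-hfh-0.14.0rc1", "feature/test-hfh", "test-hfh/rc"]:
    print(name, branch_matches(name, PATTERNS))
```

Under these rules a name such as `test-hfh/rc` would not match, because a single `*` stops at `/`.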
2 changes: 1 addition & 1 deletion setup.py
@@ -133,7 +133,7 @@
     "aiohttp",
     # To get datasets from the Datasets Hub on huggingface.co
     # minimum 0.11.0 to fix 400 Client Error issues
-    "huggingface-hub>=0.11.0,<1.0.0",
+    "huggingface-hub==0.14.0rc1",
     # Utilities from PyPA to e.g., compare versions
     "packaging",
     "responses<0.19",
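
The requirement changes from a version range to an exact pin on a release candidate. Under PEP 440, `0.14.0rc1` is a pre-release, and a range that does not itself name a pre-release is normally evaluated with pre-releases excluded, whereas an exact `==` pin on a pre-release opts in. The sketch below illustrates this with the `packaging` library already listed in `setup.py`. Treating `prereleases=False` as a model of an installer's default behaviour is an assumption made for illustration.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

candidate = Version("0.14.0rc1")

old_range = SpecifierSet(">=0.11.0,<1.0.0")  # requirement before this commit
new_pin = SpecifierSet("==0.14.0rc1")        # requirement after this commit

# The release candidate is a pre-release and sorts before the final 0.14.0.
is_prerelease = candidate.is_prerelease
sorts_before_final = candidate < Version("0.14.0")

# Assumption: prereleases=False stands in for an installer that has not
# been asked to consider pre-releases.
in_range_without_pre = old_range.contains(candidate, prereleases=False)

# When pre-releases are explicitly allowed, the old range does admit it.
in_range_with_pre = old_range.contains(candidate, prereleases=True)

# The exact pin names a pre-release, so it accepts the candidate
# without any extra flag.
matches_pin = new_pin.contains(candidate)

print(is_prerelease, sorts_before_final)
print(in_range_without_pre, in_range_with_pre, matches_pin)
```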

1 comment on commit 1f14984

@github-actions

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric: new / old (diff)
read_batch_formatted_as_numpy after write_array2d: 0.010334 / 0.011353 (-0.001019)
read_batch_formatted_as_numpy after write_flattened_sequence: 0.005958 / 0.011008 (-0.005050)
read_batch_formatted_as_numpy after write_nested_sequence: 0.127502 / 0.038508 (0.088994)
read_batch_unformated after write_array2d: 0.036299 / 0.023109 (0.013190)
read_batch_unformated after write_flattened_sequence: 0.395785 / 0.275898 (0.119886)
read_batch_unformated after write_nested_sequence: 0.460024 / 0.323480 (0.136544)
read_col_formatted_as_numpy after write_array2d: 0.007050 / 0.007986 (-0.000935)
read_col_formatted_as_numpy after write_flattened_sequence: 0.005783 / 0.004328 (0.001455)
read_col_formatted_as_numpy after write_nested_sequence: 0.105632 / 0.004250 (0.101381)
read_col_unformated after write_array2d: 0.049607 / 0.037052 (0.012555)
read_col_unformated after write_flattened_sequence: 0.390023 / 0.258489 (0.131534)
read_col_unformated after write_nested_sequence: 0.414208 / 0.293841 (0.120367)
read_formatted_as_numpy after write_array2d: 0.060353 / 0.128546 (-0.068193)
read_formatted_as_numpy after write_flattened_sequence: 0.020875 / 0.075646 (-0.054771)
read_formatted_as_numpy after write_nested_sequence: 0.398444 / 0.419271 (-0.020828)
read_unformated after write_array2d: 0.071925 / 0.043533 (0.028392)
read_unformated after write_flattened_sequence: 0.391147 / 0.255139 (0.136008)
read_unformated after write_nested_sequence: 0.454145 / 0.283200 (0.170945)
write_array2d: 0.109770 / 0.141683 (-0.031913)
write_flattened_sequence: 1.737779 / 1.452155 (0.285625)
write_nested_sequence: 1.835993 / 1.492716 (0.343277)

Benchmark: benchmark_getitem_100B.json

metric: new / old (diff)
get_batch_of_1024_random_rows: 0.223097 / 0.018006 (0.205090)
get_batch_of_1024_rows: 0.529207 / 0.000490 (0.528718)
get_first_row: 0.001352 / 0.000200 (0.001152)
get_last_row: 0.000089 / 0.000054 (0.000034)

Benchmark: benchmark_indices_mapping.json

metric: new / old (diff)
select: 0.027955 / 0.037411 (-0.009457)
shard: 0.109901 / 0.014526 (0.095375)
shuffle: 0.126335 / 0.176557 (-0.050222)
sort: 0.187344 / 0.737135 (-0.549792)
train_test_split: 0.143598 / 0.296338 (-0.152740)

Benchmark: benchmark_iterating.json

metric: new / old (diff)
read 5000: 0.623301 / 0.215209 (0.408092)
read 50000: 6.336905 / 2.077655 (4.259250)
read_batch 50000 10: 2.548465 / 1.504120 (1.044345)
read_batch 50000 100: 2.170441 / 1.541195 (0.629246)
read_batch 50000 1000: 2.110952 / 1.468490 (0.642462)
read_formatted numpy 5000: 1.271414 / 4.584777 (-3.313363)
read_formatted pandas 5000: 5.685118 / 3.745712 (1.939406)
read_formatted tensorflow 5000: 5.374938 / 5.269862 (0.105077)
read_formatted torch 5000: 2.857651 / 4.565676 (-1.708025)
read_formatted_batch numpy 5000 10: 0.146474 / 0.424275 (-0.277801)
read_formatted_batch numpy 5000 1000: 0.014919 / 0.007607 (0.007312)
shuffled read 5000: 0.828478 / 0.226044 (0.602434)
shuffled read 50000: 8.133186 / 2.268929 (5.864257)
shuffled read_batch 50000 10: 3.336462 / 55.444624 (-52.108163)
shuffled read_batch 50000 100: 2.524244 / 6.876477 (-4.352232)
shuffled read_batch 50000 1000: 2.800331 / 2.142072 (0.658259)
shuffled read_formatted numpy 5000: 1.572788 / 4.805227 (-3.232439)
shuffled read_formatted_batch numpy 5000 10: 0.261509 / 6.500664 (-6.239155)
shuffled read_formatted_batch numpy 5000 1000: 0.079122 / 0.075469 (0.003653)

Benchmark: benchmark_map_filter.json

metric: new / old (diff)
filter: 1.476502 / 1.841788 (-0.365286)
map fast-tokenizer batched: 17.128093 / 8.074308 (9.053785)
map identity: 20.372090 / 10.191392 (10.180698)
map identity batched: 0.228434 / 0.680424 (-0.451990)
map no-op batched: 0.026843 / 0.534201 (-0.507358)
map no-op batched numpy: 0.542825 / 0.579283 (-0.036458)
map no-op batched pandas: 0.579578 / 0.434364 (0.145214)
map no-op batched pytorch: 0.609844 / 0.540337 (0.069507)
map no-op batched tensorflow: 0.694888 / 1.386936 (-0.692048)
PyArrow==latest

Benchmark: benchmark_array_xd.json

metric: new / old (diff)
read_batch_formatted_as_numpy after write_array2d: 0.010086 / 0.011353 (-0.001267)
read_batch_formatted_as_numpy after write_flattened_sequence: 0.010924 / 0.011008 (-0.000084)
read_batch_formatted_as_numpy after write_nested_sequence: 0.104770 / 0.038508 (0.066262)
read_batch_unformated after write_array2d: 0.033243 / 0.023109 (0.010134)
read_batch_unformated after write_flattened_sequence: 0.406048 / 0.275898 (0.130150)
read_batch_unformated after write_nested_sequence: 0.432430 / 0.323480 (0.108950)
read_col_formatted_as_numpy after write_array2d: 0.006811 / 0.007986 (-0.001175)
read_col_formatted_as_numpy after write_flattened_sequence: 0.005349 / 0.004328 (0.001020)
read_col_formatted_as_numpy after write_nested_sequence: 0.095184 / 0.004250 (0.090934)
read_col_unformated after write_array2d: 0.047914 / 0.037052 (0.010862)
read_col_unformated after write_flattened_sequence: 0.402696 / 0.258489 (0.144207)
read_col_unformated after write_nested_sequence: 0.492651 / 0.293841 (0.198810)
read_formatted_as_numpy after write_array2d: 0.061042 / 0.128546 (-0.067505)
read_formatted_as_numpy after write_flattened_sequence: 0.020957 / 0.075646 (-0.054690)
read_formatted_as_numpy after write_nested_sequence: 0.112539 / 0.419271 (-0.306733)
read_unformated after write_array2d: 0.061641 / 0.043533 (0.018109)
read_unformated after write_flattened_sequence: 0.408216 / 0.255139 (0.153077)
read_unformated after write_nested_sequence: 0.418754 / 0.283200 (0.135555)
write_array2d: 0.112075 / 0.141683 (-0.029608)
write_flattened_sequence: 1.780634 / 1.452155 (0.328479)
write_nested_sequence: 1.833961 / 1.492716 (0.341244)

Benchmark: benchmark_getitem_100B.json

metric: new / old (diff)
get_batch_of_1024_random_rows: 0.208279 / 0.018006 (0.190272)
get_batch_of_1024_rows: 0.514219 / 0.000490 (0.513730)
get_first_row: 0.004101 / 0.000200 (0.003901)
get_last_row: 0.000124 / 0.000054 (0.000069)

Benchmark: benchmark_indices_mapping.json

metric: new / old (diff)
select: 0.030960 / 0.037411 (-0.006451)
shard: 0.118320 / 0.014526 (0.103794)
shuffle: 0.133590 / 0.176557 (-0.042967)
sort: 0.199758 / 0.737135 (-0.537377)
train_test_split: 0.139861 / 0.296338 (-0.156478)

Benchmark: benchmark_iterating.json

metric: new / old (diff)
read 5000: 0.656481 / 0.215209 (0.441272)
read 50000: 6.903833 / 2.077655 (4.826178)
read_batch 50000 10: 2.669184 / 1.504120 (1.165064)
read_batch 50000 100: 2.330539 / 1.541195 (0.789345)
read_batch 50000 1000: 2.365052 / 1.468490 (0.896562)
read_formatted numpy 5000: 1.424056 / 4.584777 (-3.160721)
read_formatted pandas 5000: 5.637999 / 3.745712 (1.892287)
read_formatted tensorflow 5000: 3.270924 / 5.269862 (-1.998937)
read_formatted torch 5000: 2.194340 / 4.565676 (-2.371336)
read_formatted_batch numpy 5000 10: 0.164634 / 0.424275 (-0.259641)
read_formatted_batch numpy 5000 1000: 0.016026 / 0.007607 (0.008419)
shuffled read 5000: 0.874829 / 0.226044 (0.648784)
shuffled read 50000: 8.384851 / 2.268929 (6.115922)
shuffled read_batch 50000 10: 3.522429 / 55.444624 (-51.922195)
shuffled read_batch 50000 100: 2.785885 / 6.876477 (-4.090591)
shuffled read_batch 50000 1000: 2.890712 / 2.142072 (0.748639)
shuffled read_formatted numpy 5000: 1.553063 / 4.805227 (-3.252165)
shuffled read_formatted_batch numpy 5000 10: 0.257095 / 6.500664 (-6.243569)
shuffled read_formatted_batch numpy 5000 1000: 0.082227 / 0.075469 (0.006758)

Benchmark: benchmark_map_filter.json

metric: new / old (diff)
filter: 1.621715 / 1.841788 (-0.220073)
map fast-tokenizer batched: 17.541377 / 8.074308 (9.467069)
map identity: 22.427379 / 10.191392 (12.235987)
map identity batched: 0.208823 / 0.680424 (-0.471601)
map no-op batched: 0.031811 / 0.534201 (-0.502390)
map no-op batched numpy: 0.538810 / 0.579283 (-0.040473)
map no-op batched pandas: 0.582618 / 0.434364 (0.145214)
map no-op batched pytorch: 0.614860 / 0.540337 (0.074523)
map no-op batched tensorflow: 0.706609 / 1.386936 (-0.680327)
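
Each benchmark cell in this comment has the form `new / old (diff)`, and the reported diff agrees with `new - old` up to rounding in the last digit. The sketch below parses one row of such cells and adds a `new / old` ratio, which is easier to compare across metrics of very different magnitude. The `Cell` type, the `parse_cells` helper and the ratio are illustrative additions, not part of the benchmark tooling.

```python
import re
from typing import List, NamedTuple

# One cell looks like "0.223097 / 0.018006 (0.205090)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


class Cell(NamedTuple):
    new: float
    old: float
    diff: float

    @property
    def ratio(self) -> float:
        """new / old; values above 1 mean the new number is larger."""
        return self.new / self.old


def parse_cells(row: str) -> List[Cell]:
    """Extract every 'new / old (diff)' cell from a row of text."""
    return [Cell(*map(float, m.groups())) for m in CELL.finditer(row)]


# Row copied from benchmark_getitem_100B.json in the PyArrow==8.0.0 run.
row = (
    "new / old (diff) 0.223097 / 0.018006 (0.205090) "
    "0.529207 / 0.000490 (0.528718) 0.001352 / 0.000200 (0.001152) "
    "0.000089 / 0.000054 (0.000034)"
)
metrics = [
    "get_batch_of_1024_random_rows",
    "get_batch_of_1024_rows",
    "get_first_row",
    "get_last_row",
]

cells = parse_cells(row)
for name, cell in zip(metrics, cells):
    print(f"{name}: new={cell.new} old={cell.old} ratio={cell.ratio:.1f}x")
```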
