Replace ConfigNamesError with the underlying error #3010
Comments
Current state:
I think some of them are not valid anymore. Refreshing:
DatasetWithScriptNotSupportedError
db.cachedResponsesBlue.aggregate([ { $match: { kind: "dataset-config-names", error_code: "ConfigNamesError", "details.copied_from_artifact": { $exists: false }, "details.cause_exception": "DatasetWithScriptNotSupportedError", }, }, { $group: { _id: "aaa", datasets: { $addToSet: "$dataset" } }, }, ]);
{ _id: 'aaa', datasets: [ 'flyingfishinwater/ultrachat-uncensored', 'qgyd2021/few_shot_ner_sft', 'aghent/copiapoa-cactis', 'SatwikKambham/suim', 'TenzinKhorloVIT/custom_subtitle_en_fr', 'quaeast/multimodal_sarcasm_detection', 'TrieuNguyen/llm-ai-detect', 'minnnnn/test_11_07_8', 'McGill-NLP/feedbackQA', 'saisamarth/librispeech_asr_devclean', 'Wayne017/floor_plan', 'guaixiaomei/bdcc', 'sanak/IDD', 'ginger-turmeric/LibriSpeech', 'indonlp/nusatranslation_mt', 'inductiva/fluid_cube', 'Kiaset/conll2003', 'PAD6/modified', 'bigbio/bc5cdr', 'thiyaneshnlp/po_layoutlm', 'Birchlabs/openai-guided-diffusion-256-classcond-unguided-samples-50k', 'pidakwo/disease_CoNLL', 'jon-tow/okapi_truthfulqa', 'qgyd2021/sentence_pair', 'xinjianl/yodas_sample', 'MMInstruction/VLFeedback', 'pie/argmicro', 'clarin-knext/kpwr-long', 'PAD6/original', 'Tevatron/wikipedia-squad', 'NLPC-UOM/Sinhala-POS-Data', 'lawallanre/geo_nlp_tweets', 'stvhuang/cc100', 'mstz/medieval_latin', 'Graphcore/gqa', 'katie312/fundus', 'OfekGlick/DiscoEval', 'Yash2998db/stan_small', 'bible-nlp/biblenlp-corpus', 'georgetowngeronimo/ops2', 'dvruette/truthful_qa_rephrased', 'paclove/test', 'minnnnn/test_11_07_6', 'Levi980623/Tennis', 'ugursahin/generative-negative-mining-dataset', 'WeiJie422/MYFood175', 'mcemilg/tquad', 'pranjalipathre/i2i', 'adithya7/background-summaries', 'cutterd/alphaXv4_pre_0', 'St4n/self_dataset', 'sabilmakbar/sea_wiki', 'pie/squad_v2', 'Yulong-W/eoym', 'aakanksha/udpos', 'yunusskeete/cppe5', 'malteos/m_mmlu', 'KBLab/rixvox', 'Levi980623/Ros_term2', 'kaniam/invoice', 
'castorini/msmarco_v1_doc_segmented_doc2query-t5_expansions', 'mstz/victorian_authorship', 'sieu-n/alpaca_eval_multilingual', 'difraud/difraud', 'VirtualRoyalty/QC', 'paniniDot/sci_lay', 'Ahmed-ibn-Harun/wake', 'lmqg/qag_zhquad', 'tankaplans/WorkshopsEval', 'VirtualRoyalty/toxic_comments', 'aisuko/funsd-layoutlmv3', '2030NLP/SpaCE2021', 'lmqg/qg_zhquad', 'jinaai/negation-dataset-v2', 'pie/conll2003', 'Tobius/teric_asr_lab', 'dwadden/covidfact_entailment', 'damilola2104/OpenSlrData', 'qgyd2021/spam_detect', 'Davlan/nollysenti', 'pie/abstrct', 'hobeter/JJQA', 'malteos/m_truthfulqa', 'aghent/Aerial-Semantic-Segmentation-Cactis', 'audioshake/jam-alt', 'WeiJie422/MYFood50', 'ginger-turmeric/LibriLight', 'hendrycks/competition_math', 'PrincipledPreTraining/DiscoEval', 'SinKove/synthetic_chest_xray', 'cdminix/bu_radio', 'Graphcore/vqa', 'owanr/llm-artifacts-collection', 'juched/spotifinders-dataset', 'ms3c/swahili-common-voices-africas-talking', 'gugaio/notas-fiscais', 'ebrigham/agnewsadapted', 'princeton-nlp/LLMBar', 'pranjalipathre/flow_data', 'pie/tacred', 'tomrb/minipileoflaw', 'VirtualRoyalty/20ng', 'qpzz/function_test', '2030NLP/SpaCE2022', 'mHossain/test3', 'wisenut-nlp-team/squad_kor_v1', 'jon-tow/okapi_arc_challenge', 'tdklab/Hebrew_Squad_v1', 'GadidGC/tesis', 'minnnnn/test_11_07_7', 'echarlaix/vqa', 'minnnnn/hwj_image', 'universalner/universal_ner', 'mulcyber/europarl-mono', 'yunusskeete/Carla-COCO-Object-Detection-Dataset', 'BAAI/CCI-Data', 'hoffman-lab/SkyScenes', 'victorcosta/ria_pt__proems_format', 'ABC-iRobotics/SMVB', 'Graphcore/vqa-lxmert', 'research-backup/qa_squadshifts_synthetic_random', 'malteos/m_hellaswag', 'zilu-peter-tang/MultiPL-C2C', 'IconicAI/janet-24oct', 'pie/scientific_papers', 'lhoestq/custom_squad', 'lipi17/building-cracks', 'BahAdoR0101/conll2003job', 'XiaHan19/cmmlu', 'textminr/simplebooks', 'jon-tow/okapi_hellaswag', 'ppp121386/Image-demo', 'NamCyan/repo-codegen-v3', 'TenzinKhorloVIT/MT_Eng_Dzo', 'hjm0525/drawings', 
'PierreLepagnol/WRENCH', 'CoderBose/white-labelling', 'stas/openwebtext-synthetic-testing', 'St4n/dataset_2', 'traveler-leon1/my_dataset', 'iohadrubin/zstd', 'lawallanre/hausa_common_voice_audio', 'Alvenir/alvenir_asr_da_eval', 'minnnnn/test_11_07_9', 'minnnnn/test_11_08_1', 'minnnnn/test_11_08_2', 'MZOO/STANFORD-CARS-TYPES', 'alexsu52/mvtec_capsule', 'EleutherAI/proof-pile-2', 'tocard-inc/aiornot', 'pensieves/mimicause', 'pie/imdb', 'jxu124/OpenX-Embodiment', 'epptt/erukaLabels', 'qbo-odp/MNBVC-core', 'pie/brat', 'mathiaszinnen/odor', 'jimregan/eatd_corpus', 'pie/scidtb_argmin', 'Graphcore/gqa-lxmert', 'gugaio/dokki-pagamentos', 'kundank/usb', 'sweis/script_executing', 'lijianbin/test_script', 'sled-umich/2D-ATOMS', 'timo-pierre-schrader/MuLMS', 'VirtualRoyalty/SST5', 'RepoFusion/Stack-Repo', 'buihungtpd3/custom-longformer', 'sieu-n/chunked', 'emon-j/bn_audio', 'andstor/code_search_net_files', 'castorini/wura', 'sabilmakbar/indo_wiki', 'lukemann/baby-agi-dataset-v1', 'iohadrubin/dolma_cc', 'fsicoli/common_voice_15_0', 'cutterd/alphaXv5_noindex_manu', 'HugsVision/SkinDisease', 'cjaniake/squad_v2_pt', 'sieu-n/processed-newstext', 'paren8esis/S4A', 'echarlaix/vqa-lxmert', 'minnnnn/test', 'minnnnn/test_11_07_5', 'codefuse-ai/CodeFuseEval', 'trueorfalse441/korean_hate_speech_copy', 'cryptom/rm-static', 'gmnlp/dialect_nli', 'cutterd/handtest', 'VirtualRoyalty/20ng_not_enough_data', 'malteos/m_arc', 'Linsss/X11k', 'aisuko/vqa', 'Yulong-W/eoyo', 'dwadden/healthver_entailment', 'jon-tow/okapi_mmlu', 'MDCurrent/fill500', 'ngoxuanphong/zalo', 'mosaicml/test_dataset', 'biglam/artigo', 'mHossain/pos_tag_100k', 'Lagyamfi/akan_audio', 'iahlt/UD_Hebrew-IAHLTwiki', 'EiffL/DESI', 'ncats/EpiSet4BinaryClassification', 'chiseng-cheang/TempoSum', 'Chrisneverdie/sports_llm', 'paclove/test2', 'aburnazy/hy_asr_grqaser', 'tmnam20/InstructNSText2SQL', 'IconicAI/janet-textclassification-10k', 'allenai/nllb', 'Birchlabs/wds-dataset-viewer-test', 'tasksource/imppres', 'AILab-CVC/SEED-Bench-2', 
'semeru/completeformer', 'flyingfishinwater/samantha-data', 'Srajanseth84/mydatasetlayoutlmv3', 'DanielCerda/pid-object-detection', 'WuWenc/tiny_coco', 'Dupaja/cmu-arctic-xvectors', 'Alkemi/xd-violence-small', 'fsicoli/common_voice_16_0', 'echarlaix/gqa-lxmert', 'Ehzoahis/UAVVG', 'Ubaid000/layoutlmv3', 'katie312/isic' ] }
After that, we can remove this line in moonlanding: https://github.com/huggingface-internal/moon-landing/blob/7015dffe5a99dcf718ae5a8a176b689d403af6df/server/views/components/DatasetViewer/DatasetViewerError.svelte#L26
Also: the Currently: While it should be
FileNotFoundError
db.cachedResponsesBlue.aggregate([ { $match: { kind: "dataset-config-names", error_code: "ConfigNamesError", "details.copied_from_artifact": { $exists: false }, "details.cause_exception": "FileNotFoundError", }, }, { $group: { _id: "aaa", datasets: { $addToSet: "$dataset" } }, }, ]);
...
Edit: after recomputing, we still have some of these errors, but they are legitimate: they occur when the README.md YAML configures some data files that don't exist in the repo. See huggingface/datasets#7120 for an attempt to handle them a bit better.
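The two queries in this thread differ only in the `details.cause_exception` value, so the pipeline can be built by a small helper. This is only a sketch: `buildCausePipeline` is a hypothetical name, not something in the codebase, and it assumes the same collection and field names as the queries above.

```javascript
// Hypothetical helper: rebuilds the aggregation pipeline used in this thread
// for an arbitrary underlying exception name.
function buildCausePipeline(causeException) {
  return [
    {
      $match: {
        kind: "dataset-config-names",
        error_code: "ConfigNamesError",
        // Only entries that have no details.copied_from_artifact field.
        "details.copied_from_artifact": { $exists: false },
        "details.cause_exception": causeException,
      },
    },
    // A constant _id collects every matching dataset into a single set.
    { $group: { _id: "aaa", datasets: { $addToSet: "$dataset" } } },
  ];
}

// In mongosh (assumption: same collection as above):
// db.cachedResponsesBlue.aggregate(buildCausePipeline("FileNotFoundError"));
```

The helper returns a plain array, so it can be inspected or reused for other exception names before being sent to the database.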
OK, we went from 10,608 entries with New entries:
100K cache entries (1%) have the ConfigNamesError. It would be better to show the underlying error and help the user debug their data files.
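To see which underlying errors would surface once ConfigNamesError is replaced, the entries can be tallied by `details.cause_exception`. The sketch below works in memory on already-fetched entries; `countByCause` and the sample records are hypothetical, and the field names are taken from the queries in this thread.

```javascript
// Hypothetical sketch: count ConfigNamesError entries per underlying cause.
function countByCause(entries) {
  const counts = new Map();
  for (const entry of entries) {
    // Ignore entries that carry a different error code.
    if (entry.error_code !== "ConfigNamesError") continue;
    // Entries without a recorded cause are grouped under "unknown".
    const cause = entry.details?.cause_exception ?? "unknown";
    counts.set(cause, (counts.get(cause) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample records, for illustration only.
const sampleEntries = [
  { dataset: "user/a", error_code: "ConfigNamesError", details: { cause_exception: "FileNotFoundError" } },
  { dataset: "user/b", error_code: "ConfigNamesError", details: { cause_exception: "FileNotFoundError" } },
  { dataset: "user/c", error_code: "ConfigNamesError", details: { cause_exception: "DatasetWithScriptNotSupportedError" } },
  { dataset: "user/d", error_code: "ConfigNamesError", details: {} },
  { dataset: "user/e", error_code: "OtherError", details: { cause_exception: "ValueError" } },
];

const causeCounts = countByCause(sampleEntries);
```

On real data the same tally could be done server-side with a `$group` on `"$details.cause_exception"`; the in-memory version is only meant to make the idea concrete.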