diff --git a/.github/workflows/build_repocard_examples.yaml b/.github/workflows/build_repocard_examples.yaml index feea0aa9f1..7cfcf37ea9 100644 --- a/.github/workflows/build_repocard_examples.yaml +++ b/.github/workflows/build_repocard_examples.yaml @@ -17,7 +17,7 @@ jobs: - name: Set up Python uses: actions/setup-python@v2 with: - python-version: 3.12 + python-version: 3.13 # Install dependencies - name: Configure and install dependencies diff --git a/.github/workflows/python-tests.yml b/.github/workflows/python-tests.yml index 25dc2c0583..8c3a5ca81c 100644 --- a/.github/workflows/python-tests.yml +++ b/.github/workflows/python-tests.yml @@ -21,15 +21,15 @@ jobs: strategy: fail-fast: false matrix: - python-version: ["3.8", "3.12"] + python-version: ["3.8", "3.13"] test_name: [ "Repository only", "Everything else", - "torch_latest", + ] include: - - python-version: "3.12" # LFS not ran on 3.8 + - python-version: "3.13" # LFS not ran on 3.8 test_name: "lfs" - python-version: "3.8" test_name: "fastai" @@ -41,6 +41,8 @@ jobs: test_name: "tensorflow" - python-version: "3.8" # test torch~=1.11 on python 3.8 only. test_name: "Python 3.8, torch_1.11" + - python-version: "3.12" # test torch latest on python 3.12 only. + test_name: "torch_latest" steps: - uses: actions/checkout@v2 - name: Set up Python ${{ matrix.python-version }} diff --git a/docs/source/cn/installation.md b/docs/source/cn/installation.md index 2b3003e977..c800b4b173 100644 --- a/docs/source/cn/installation.md +++ b/docs/source/cn/installation.md @@ -97,7 +97,7 @@ cd huggingface_hub pip install -e . 
``` -这些命令将你克隆存储库的文件夹与你的 Python 库路径链接起来。Python 现在将除了正常的库路径之外,还会在你克隆到的文件夹中查找。例如,如果你的 Python 包通常安装在`./.venv/lib/python3.12/site-packages/`中,Python 还会搜索你克隆的文件夹`./huggingface_hub/` +这些命令将你克隆存储库的文件夹与你的 Python 库路径链接起来。Python 现在将除了正常的库路径之外,还会在你克隆到的文件夹中查找。例如,如果你的 Python 包通常安装在`./.venv/lib/python3.13/site-packages/`中,Python 还会搜索你克隆的文件夹`./huggingface_hub/` ## 通过 conda 安装 diff --git a/docs/source/de/installation.md b/docs/source/de/installation.md index 8cb6ae3f31..3ba965bd4b 100644 --- a/docs/source/de/installation.md +++ b/docs/source/de/installation.md @@ -90,7 +90,7 @@ cd huggingface_hub pip install -e . ``` -Diese Befehle verknüpfen den Ordner, in den Sie das Repository geklont haben, mit Ihren Python-Bibliothekspfaden. Python wird nun zusätzlich zu den normalen Bibliothekspfaden im geklonten Ordner suchen. Wenn Ihre Python-Pakete normalerweise in `./.venv/lib/python3.12/site-packages/` installiert sind, wird Python auch den geklonten Ordner `./huggingface_hub/` durchsuchen. +Diese Befehle verknüpfen den Ordner, in den Sie das Repository geklont haben, mit Ihren Python-Bibliothekspfaden. Python wird nun zusätzlich zu den normalen Bibliothekspfaden im geklonten Ordner suchen. Wenn Ihre Python-Pakete normalerweise in `./.venv/lib/python3.13/site-packages/` installiert sind, wird Python auch den geklonten Ordner `./huggingface_hub/` durchsuchen. ## Installieren mit conda diff --git a/docs/source/en/guides/hf_file_system.md b/docs/source/en/guides/hf_file_system.md index d0b96a2bd4..92184838d8 100644 --- a/docs/source/en/guides/hf_file_system.md +++ b/docs/source/en/guides/hf_file_system.md @@ -6,6 +6,14 @@ rendered properly in your Markdown viewer. In addition to the [`HfApi`], the `huggingface_hub` library provides [`HfFileSystem`], a pythonic [fsspec-compatible](https://filesystem-spec.readthedocs.io/en/latest/) file interface to the Hugging Face Hub. 
The [`HfFileSystem`] builds on top of the [`HfApi`] and offers typical filesystem style operations like `cp`, `mv`, `ls`, `du`, `glob`, `get_file`, and `put_file`. + + + [`HfFileSystem`] provides fsspec compatibility, which is useful for libraries that require it (e.g., reading + Hugging Face datasets directly with `pandas`). However, it introduces additional overhead due to this compatibility + layer. For better performance and reliability, it's recommended to use [`HfApi`] methods when possible. + + + ## Usage ```python diff --git a/docs/source/en/installation.md b/docs/source/en/installation.md index 89ddaff0d9..9af8a32676 100644 --- a/docs/source/en/installation.md +++ b/docs/source/en/installation.md @@ -104,7 +104,7 @@ pip install -e . These commands will link the folder you cloned the repository to and your Python library paths. Python will now look inside the folder you cloned to in addition to the normal library paths. -For example, if your Python packages are typically installed in `./.venv/lib/python3.12/site-packages/`, +For example, if your Python packages are typically installed in `./.venv/lib/python3.13/site-packages/`, Python will also search the folder you cloned `./huggingface_hub/`. ## Install with conda diff --git a/docs/source/fr/installation.md b/docs/source/fr/installation.md index 219808f823..15fb47a9f8 100644 --- a/docs/source/fr/installation.md +++ b/docs/source/fr/installation.md @@ -104,7 +104,7 @@ pip install -e . Python regardera maintenant à l'intérieur du dossier dans lequel vous avez cloné le dépôt en plus des chemins de librairie classiques. Par exemple, si vos packages Python sont installés dans -`./.venv/lib/python3.12/site-packages/`, Python regardera aussi dans le dossier que vous avez +`./.venv/lib/python3.13/site-packages/`, Python regardera aussi dans le dossier que vous avez cloné `./huggingface_hub/`. 
## Installation avec conda diff --git a/docs/source/hi/installation.md b/docs/source/hi/installation.md index 4587838950..4d16d6624b 100644 --- a/docs/source/hi/installation.md +++ b/docs/source/hi/installation.md @@ -103,7 +103,7 @@ pip install -e . ये कमांड उस फ़ोल्डर को लिंक करेंगे जिसे आपने रिपॉजिटरी में क्लोन किया है और आपके पायथन लाइब्रेरी पथ। पाइथॉन अब सामान्य लाइब्रेरी पथों के अलावा आपके द्वारा क्लोन किए गए फ़ोल्डर के अंदर भी देखेगा। -उदाहरण के लिए, यदि आपके पायथन पैकेज आमतौर पर `./.venv/lib/python3.12/site-packages/` में स्थापित हैं, +उदाहरण के लिए, यदि आपके पायथन पैकेज आमतौर पर `./.venv/lib/python3.13/site-packages/` में स्थापित हैं, पायथन आपके द्वारा क्लोन किए गए फ़ोल्डर `./huggingface_hub/` को भी खोजेगा। ## कोंडा के साथ स्थापित करें diff --git a/docs/source/ko/installation.md b/docs/source/ko/installation.md index b70808b3b7..720346b1a1 100644 --- a/docs/source/ko/installation.md +++ b/docs/source/ko/installation.md @@ -94,7 +94,7 @@ pip install -e . 이렇게 클론한 레포지토리 폴더와 Python 경로를 연결합니다. 이제 Python은 일반적인 라이브러리 경로 외에도 복제된 폴더 내부를 찾습니다. -예를 들어 파이썬 패키지가 일반적으로 `./.venv/lib/python3.12/site-packages/`에 설치되어 있다면, Python은 복제된 폴더 `./huggingface_hub/`도 검색하게 됩니다. +예를 들어 파이썬 패키지가 일반적으로 `./.venv/lib/python3.13/site-packages/`에 설치되어 있다면, Python은 복제된 폴더 `./huggingface_hub/`도 검색하게 됩니다. 
## conda로 설치하기 [[install-with-conda]] diff --git a/setup.py b/setup.py index 33f6db4559..9d69ad35c4 100644 --- a/setup.py +++ b/setup.py @@ -134,6 +134,7 @@ def get_version() -> str: "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", "Topic :: Scientific/Engineering :: Artificial Intelligence", ], include_package_data=True, diff --git a/src/huggingface_hub/file_download.py b/src/huggingface_hub/file_download.py index 46d5ccaead..47bd055871 100644 --- a/src/huggingface_hub/file_download.py +++ b/src/huggingface_hub/file_download.py @@ -390,9 +390,8 @@ def http_get( consistency_error_message = ( f"Consistency check failed: file should be of size {expected_size} but has size" - f" {{actual_size}} ({displayed_filename}).\nWe are sorry for the inconvenience. Please retry" - " with `force_download=True`.\nIf the issue persists, please let us know by opening an issue " - "on https://github.com/huggingface/huggingface_hub." + f" {{actual_size}} ({displayed_filename}).\nThis is usually due to network issues while downloading the file." + " Please retry with `force_download=True`." ) # Stream file to buffer diff --git a/src/huggingface_hub/hf_api.py b/src/huggingface_hub/hf_api.py index 0b53fc877d..a02c6a0eba 100644 --- a/src/huggingface_hub/hf_api.py +++ b/src/huggingface_hub/hf_api.py @@ -760,6 +760,8 @@ class ModelInfo: List of spaces using the model. safetensors (`SafeTensorsInfo`, *optional*): Model's safetensors information. + security_repo_status (`Dict`, *optional*): + Model's security scan status. 
""" id: str @@ -788,6 +790,7 @@ class ModelInfo: siblings: Optional[List[RepoSibling]] spaces: Optional[List[str]] safetensors: Optional[SafeTensorsInfo] + security_repo_status: Optional[Dict] def __init__(self, **kwargs): self.id = kwargs.pop("id") @@ -853,7 +856,7 @@ def __init__(self, **kwargs): if safetensors else None ) - + self.security_repo_status = kwargs.pop("securityRepoStatus", None) # backwards compatibility self.lastModified = self.last_modified self.cardData = self.card_data @@ -1546,6 +1549,36 @@ def _inner(self, *args, **kwargs): class HfApi: + """ + Client to interact with the Hugging Face Hub via HTTP. + + The client is initialized with some high-level settings used in all requests + made to the Hub (HF endpoint, authentication, user agents...). Using the `HfApi` + client is preferred but not mandatory as all of its public methods are exposed + directly at the root of `huggingface_hub`. + + Args: + endpoint (`str`, *optional*): + Endpoint of the Hub. Defaults to . + token (Union[bool, str, None], optional): + A valid user access token (string). Defaults to the locally saved + token, which is the recommended method for authentication (see + https://huggingface.co/docs/huggingface_hub/quick-start#authentication). + To disable authentication, pass `False`. + library_name (`str`, *optional*): + The name of the library that is making the HTTP request. Will be added to + the user-agent header. Example: `"transformers"`. + library_version (`str`, *optional*): + The version of the library that is making the HTTP request. Will be added + to the user-agent header. Example: `"4.24.0"`. + user_agent (`str`, `dict`, *optional*): + The user agent info in the form of a dictionary or a single string. It will + be completed with information about the installed packages. + headers (`dict`, *optional*): + Additional headers to be sent with each request. Example: `{"X-My-Header": "value"}`. + Headers passed here are taking precedence over the default headers. 
+ """ + def __init__( self, endpoint: Optional[str] = None, @@ -1555,32 +1588,6 @@ def __init__( user_agent: Union[Dict, str, None] = None, headers: Optional[Dict[str, str]] = None, ) -> None: - """Create a HF client to interact with the Hub via HTTP. - - The client is initialized with some high-level settings used in all requests - made to the Hub (HF endpoint, authentication, user agents...). Using the `HfApi` - client is preferred but not mandatory as all of its public methods are exposed - directly at the root of `huggingface_hub`. - - Args: - token (Union[bool, str, None], optional): - A valid user access token (string). Defaults to the locally saved - token, which is the recommended method for authentication (see - https://huggingface.co/docs/huggingface_hub/quick-start#authentication). - To disable authentication, pass `False`. - library_name (`str`, *optional*): - The name of the library that is making the HTTP request. Will be added to - the user-agent header. Example: `"transformers"`. - library_version (`str`, *optional*): - The version of the library that is making the HTTP request. Will be added - to the user-agent header. Example: `"4.24.0"`. - user_agent (`str`, `dict`, *optional*): - The user agent info in the form of a dictionary or a single string. It will - be completed with information about the installed packages. - headers (`dict`, *optional*): - Additional headers to be sent with each request. Example: `{"X-My-Header": "value"}`. - Headers passed here are taking precedence over the default headers. - """ self.endpoint = endpoint if endpoint is not None else constants.ENDPOINT self.token = token self.library_name = library_name @@ -1791,8 +1798,8 @@ def list_models( A tuple of two ints or floats representing a minimum and maximum carbon footprint to filter the resulting models with in grams. sort (`Literal["last_modified"]` or `str`, *optional*): - The key with which to sort the resulting models. 
Possible values - are the properties of the [`huggingface_hub.hf_api.ModelInfo`] class. + The key with which to sort the resulting models. Possible values are "last_modified", "trending_score", + "created_at", "downloads" and "likes". direction (`Literal[-1]` or `int`, *optional*): Direction in which to sort. The value `-1` sorts by descending order while all other values sort by ascending order. @@ -1904,7 +1911,15 @@ def list_models( if len(search_list) > 0: params["search"] = search_list if sort is not None: - params["sort"] = "lastModified" if sort == "last_modified" else sort + params["sort"] = ( + "lastModified" + if sort == "last_modified" + else "trendingScore" + if sort == "trending_score" + else "createdAt" + if sort == "created_at" + else sort + ) if direction is not None: params["direction"] = direction if limit is not None: @@ -2003,8 +2018,8 @@ def list_datasets( search (`str`, *optional*): A string that will be contained in the returned datasets. sort (`Literal["last_modified"]` or `str`, *optional*): - The key with which to sort the resulting datasets. Possible - values are the properties of the [`huggingface_hub.hf_api.DatasetInfo`] class. + The key with which to sort the resulting datasets. Possible values are "last_modified", "trending_score", + "created_at", "downloads" and "likes". direction (`Literal[-1]` or `int`, *optional*): Direction in which to sort. The value `-1` sorts by descending order while all other values sort by ascending order. 
@@ -2114,7 +2129,15 @@ def list_datasets( if len(search_list) > 0: params["search"] = search_list if sort is not None: - params["sort"] = "lastModified" if sort == "last_modified" else sort + params["sort"] = ( + "lastModified" + if sort == "last_modified" + else "trendingScore" + if sort == "trending_score" + else "createdAt" + if sort == "created_at" + else sort + ) if direction is not None: params["direction"] = direction if limit is not None: @@ -2186,8 +2209,8 @@ def list_spaces( linked (`bool`, *optional*): Whether to return Spaces that make use of either a model or a dataset. sort (`Literal["last_modified"]` or `str`, *optional*): - The key with which to sort the resulting Spaces. Possible - values are the properties of the [`huggingface_hub.hf_api.SpaceInfo`]` class. + The key with which to sort the resulting Spaces. Possible values are "last_modified", "trending_score", + "created_at" and "likes". direction (`Literal[-1]` or `int`, *optional*): Direction in which to sort. The value `-1` sorts by descending order while all other values sort by ascending order. @@ -2223,7 +2246,15 @@ def list_spaces( if search is not None: params["search"] = search if sort is not None: - params["sort"] = "lastModified" if sort == "last_modified" else sort + params["sort"] = ( + "lastModified" + if sort == "last_modified" + else "trendingScore" + if sort == "trending_score" + else "createdAt" + if sort == "created_at" + else sort + ) if direction is not None: params["direction"] = direction if limit is not None: @@ -2493,7 +2524,7 @@ def model_info( Whether to set a timeout for the request to the Hub. securityStatus (`bool`, *optional*): Whether to retrieve the security status from the model - repository as well. + repository as well. The security status will be returned in the `security_repo_status` field. files_metadata (`bool`, *optional*): Whether or not to retrieve metadata for files in the repository (size, LFS metadata, etc). Defaults to `False`. 
@@ -9186,7 +9217,8 @@ def _prepare_upload_folder_additions( token=token, ) if len(filtered_repo_objects) > 30: - logger.info( + log = logger.warning if len(filtered_repo_objects) > 200 else logger.info + log( "It seems you are trying to upload a large folder at once. This might take some time and then fail if " "the folder is too large. For such cases, it is recommended to upload in smaller batches or to use " "`HfApi().upload_large_folder(...)`/`huggingface-cli upload-large-folder` instead. For more details, " diff --git a/src/huggingface_hub/hf_file_system.py b/src/huggingface_hub/hf_file_system.py index a9fb009570..2e70a66a90 100644 --- a/src/huggingface_hub/hf_file_system.py +++ b/src/huggingface_hub/hf_file_system.py @@ -1,4 +1,3 @@ -import inspect import os import re import tempfile @@ -7,7 +6,7 @@ from datetime import datetime from itertools import chain from pathlib import Path -from typing import Any, Dict, List, NoReturn, Optional, Tuple, Union +from typing import Any, Dict, Iterator, List, NoReturn, Optional, Tuple, Union from urllib.parse import quote, unquote import fsspec @@ -20,11 +19,7 @@ from .errors import EntryNotFoundError, RepositoryNotFoundError, RevisionNotFoundError from .file_download import hf_hub_url, http_get from .hf_api import HfApi, LastCommitInfo, RepoFile -from .utils import ( - HFValidationError, - hf_raise_for_status, - http_backoff, -) +from .utils import HFValidationError, hf_raise_for_status, http_backoff # Regex used to match special revisions with "/" in them (see #1710) @@ -64,13 +59,22 @@ class HfFileSystem(fsspec.AbstractFileSystem): """ Access a remote Hugging Face Hub repository as if were a local file system. + + + [`HfFileSystem`] provides fsspec compatibility, which is useful for libraries that require it (e.g., reading + Hugging Face datasets directly with `pandas`). However, it introduces additional overhead due to this compatibility + layer. 
For better performance and reliability, it's recommended to use `HfApi` methods when possible. + + + Args: token (`str` or `bool`, *optional*): A valid user access token (string). Defaults to the locally saved token, which is the recommended method for authentication (see https://huggingface.co/docs/huggingface_hub/quick-start#authentication). To disable authentication, pass `False`. - + endpoint (`str`, *optional*): + Endpoint of the Hub. Defaults to . Usage: ```python @@ -133,6 +137,25 @@ def _repo_and_revision_exist( return self._repo_and_revision_exists_cache[(repo_type, repo_id, revision)] def resolve_path(self, path: str, revision: Optional[str] = None) -> HfFileSystemResolvedPath: + """ + Resolve a Hugging Face file system path into its components. + + Args: + path (`str`): + Path to resolve. + revision (`str`, *optional*): + The revision of the repo to resolve. Defaults to the revision specified in the path. + + Returns: + [`HfFileSystemResolvedPath`]: Resolved path information containing `repo_type`, `repo_id`, `revision` and `path_in_repo`. + + Raises: + `ValueError`: + If path contains conflicting revision information. + `NotImplementedError`: + If trying to list repositories. + """ + def _align_revision_in_path_with_revision( revision_in_path: Optional[str], revision: Optional[str] ) -> Optional[str]: @@ -209,15 +232,33 @@ def _align_revision_in_path_with_revision( return HfFileSystemResolvedPath(repo_type, repo_id, revision, path_in_repo, _raw_revision=revision_in_path) def invalidate_cache(self, path: Optional[str] = None) -> None: + """ + Clear the cache for a given path. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.invalidate_cache). + + Args: + path (`str`, *optional*): + Path to clear from cache. If not provided, clear the entire cache. 
+ + """ if not path: self.dircache.clear() self._repo_and_revision_exists_cache.clear() else: - path = self.resolve_path(path).unresolve() + resolved_path = self.resolve_path(path) + path = resolved_path.unresolve() while path: self.dircache.pop(path, None) path = self._parent(path) + # Only clear repo cache if path is to repo root + if not resolved_path.path_in_repo: + self._repo_and_revision_exists_cache.pop((resolved_path.repo_type, resolved_path.repo_id, None), None) + self._repo_and_revision_exists_cache.pop( + (resolved_path.repo_type, resolved_path.repo_id, resolved_path.revision), None + ) + def _open( self, path: str, @@ -254,6 +295,28 @@ def rm( revision: Optional[str] = None, **kwargs, ) -> None: + """ + Delete files from a repository. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.rm). + + + + Note: When possible, use `HfApi.delete_file()` for better performance. + + + + Args: + path (`str`): + Path to delete. + recursive (`bool`, *optional*): + If True, delete directory and all its contents. Defaults to False. + maxdepth (`int`, *optional*): + Maximum number of subdirectories to visit when deleting recursively. + revision (`str`, *optional*): + The git revision to delete from. + + """ resolved_path = self.resolve_path(path, revision=revision) paths = self.expand_path(path, recursive=recursive, maxdepth=maxdepth, revision=revision) paths_in_repo = [self.resolve_path(path).path_in_repo for path in paths if not self.isdir(path)] @@ -276,7 +339,32 @@ def rm( def ls( self, path: str, detail: bool = True, refresh: bool = False, revision: Optional[str] = None, **kwargs ) -> List[Union[str, Dict[str, Any]]]: - """List the contents of a directory.""" + """ + List the contents of a directory. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.ls). 
+ + + + Note: When possible, use `HfApi.list_repo_tree()` for better performance. + + + + Args: + path (`str`): + Path to the directory. + detail (`bool`, *optional*): + If True, returns a list of dictionaries containing file information. If False, + returns a list of file paths. Defaults to True. + refresh (`bool`, *optional*): + If True, bypass the cache and fetch the latest data. Defaults to False. + revision (`str`, *optional*): + The git revision to list from. + + Returns: + `List[Union[str, Dict[str, Any]]]`: List of file paths (if detail=False) or list of file information + dictionaries (if detail=True). + """ resolved_path = self.resolve_path(path, revision=revision) path = resolved_path.unresolve() kwargs = {"expand_info": detail, **kwargs} @@ -396,13 +484,37 @@ def _ls_tree( out.append(cache_path_info) return out - def walk(self, path, *args, **kwargs): + def walk(self, path: str, *args, **kwargs) -> Iterator[Tuple[str, List[str], List[str]]]: + """ + Return all files below the given path. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.walk). + + Args: + path (`str`): + Root path to list files from. + + Returns: + `Iterator[Tuple[str, List[str], List[str]]]`: An iterator of (path, list of directory names, list of file names) tuples. + """ # Set expand_info=False by default to get a x10 speed boost kwargs = {"expand_info": kwargs.get("detail", False), **kwargs} path = self.resolve_path(path, revision=kwargs.get("revision")).unresolve() yield from super().walk(path, *args, **kwargs) - def glob(self, path, **kwargs): + def glob(self, path: str, **kwargs) -> List[str]: + """ + Find files by glob-matching. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.glob). + + Args: + path (`str`): + Path pattern to match. + + Returns: + `List[str]`: List of paths matching the pattern. 
+ """ # Set expand_info=False by default to get a x10 speed boost kwargs = {"expand_info": kwargs.get("detail", False), **kwargs} path = self.resolve_path(path, revision=kwargs.get("revision")).unresolve() @@ -418,6 +530,28 @@ def find( revision: Optional[str] = None, **kwargs, ) -> Union[List[str], Dict[str, Dict[str, Any]]]: + """ + List all files below path. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.find). + + Args: + path (`str`): + Root path to list files from. + maxdepth (`int`, *optional*): + Maximum depth to descend into subdirectories. + withdirs (`bool`, *optional*): + Include directory paths in the output. Defaults to False. + detail (`bool`, *optional*): + If True, returns a dict mapping paths to file information. Defaults to False. + refresh (`bool`, *optional*): + If True, bypass the cache and fetch the latest data. Defaults to False. + revision (`str`, *optional*): + The git revision to list from. + + Returns: + `Union[List[str], Dict[str, Dict[str, Any]]]`: List of paths or dict of file information. + """ if maxdepth: return super().find( path, maxdepth=maxdepth, withdirs=withdirs, detail=detail, refresh=refresh, revision=revision, **kwargs @@ -448,6 +582,24 @@ def find( return {name: out[name] for name in names} def cp_file(self, path1: str, path2: str, revision: Optional[str] = None, **kwargs) -> None: + """ + Copy a file within or between repositories. + + + + Note: When possible, use `HfApi.upload_file()` for better performance. + + + + Args: + path1 (`str`): + Source path to copy from. + path2 (`str`): + Destination path to copy to. + revision (`str`, *optional*): + The git revision to copy from. 
+ + """ resolved_path1 = self.resolve_path(path1, revision=revision) resolved_path2 = self.resolve_path(path2, revision=revision) @@ -489,10 +641,45 @@ def cp_file(self, path1: str, path2: str, revision: Optional[str] = None, **kwar self.invalidate_cache(path=resolved_path2.unresolve()) def modified(self, path: str, **kwargs) -> datetime: + """ + Get the last modified time of a file. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.modified). + + Args: + path (`str`): + Path to the file. + + Returns: + `datetime`: Last commit date of the file. + """ info = self.info(path, **kwargs) return info["last_commit"]["date"] def info(self, path: str, refresh: bool = False, revision: Optional[str] = None, **kwargs) -> Dict[str, Any]: + """ + Get information about a file or directory. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.info). + + + + Note: When possible, use `HfApi.get_paths_info()` or `HfApi.repo_info()` for better performance. + + + + Args: + path (`str`): + Path to get info for. + refresh (`bool`, *optional*): + If True, bypass the cache and fetch the latest data. Defaults to False. + revision (`str`, *optional*): + The git revision to get info from. + + Returns: + `Dict[str, Any]`: Dictionary containing file information (type, size, commit info, etc.). + + """ resolved_path = self.resolve_path(path, revision=revision) path = resolved_path.unresolve() expand_info = kwargs.get( @@ -570,30 +757,80 @@ def info(self, path: str, refresh: bool = False, revision: Optional[str] = None, return out def exists(self, path, **kwargs): - """Is there a file at the given path""" + """ + Check if a file exists. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.exists). 
+ + + + Note: When possible, use `HfApi.file_exists()` for better performance. + + + + Args: + path (`str`): + Path to check. + + Returns: + `bool`: True if file exists, False otherwise. + """ try: + if kwargs.get("refresh", False): + self.invalidate_cache(path) + self.info(path, **{**kwargs, "expand_info": False}) return True except: # noqa: E722 - # any exception allowed bar FileNotFoundError? return False def isdir(self, path): - """Is this entry directory-like?""" + """ + Check if a path is a directory. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.isdir). + + Args: + path (`str`): + Path to check. + + Returns: + `bool`: True if path is a directory, False otherwise. + """ try: return self.info(path, expand_info=False)["type"] == "directory" except OSError: return False def isfile(self, path): - """Is this entry file-like?""" + """ + Check if a path is a file. + + For more details, refer to [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.isfile). + + Args: + path (`str`): + Path to check. + + Returns: + `bool`: True if path is a file, False otherwise. + """ try: return self.info(path, expand_info=False)["type"] == "file" except: # noqa: E722 return False def url(self, path: str) -> str: - """Get the HTTP URL of the given path""" + """ + Get the HTTP URL of the given path. + + Args: + path (`str`): + Path to get URL for. + + Returns: + `str`: HTTP URL to access the file or directory on the Hub. + """ resolved_path = self.resolve_path(path) url = hf_hub_url( resolved_path.repo_id, @@ -607,7 +844,26 @@ def url(self, path: str) -> str: return url def get_file(self, rpath, lpath, callback=_DEFAULT_CALLBACK, outfile=None, **kwargs) -> None: - """Copy single remote file to local.""" + """ + Copy single remote file to local. + + + + Note: When possible, use `HfApi.hf_hub_download()` for better performance. 
+ + + + Args: + rpath (`str`): + Remote path to download from. + lpath (`str`): + Local path to download to. + callback (`Callback`, *optional*): + Optional callback to track download progress. Defaults to no callback. + outfile (`IO`, *optional*): + Optional file-like object to write to. If provided, `lpath` is ignored. + + """ revision = kwargs.get("revision") unhandled_kwargs = set(kwargs.keys()) - {"revision"} if not isinstance(callback, (NoOpCallback, TqdmCallback)) or len(unhandled_kwargs) > 0: @@ -882,20 +1138,3 @@ def _raise_file_not_found(path: str, err: Optional[Exception]) -> NoReturn: def reopen(fs: HfFileSystem, path: str, mode: str, block_size: int, cache_type: str): return fs.open(path, mode=mode, block_size=block_size, cache_type=cache_type) - - -# Add docstrings to the methods of HfFileSystem from fsspec.AbstractFileSystem -for name, function in inspect.getmembers(HfFileSystem, predicate=inspect.isfunction): - parent = getattr(fsspec.AbstractFileSystem, name, None) - if parent is not None and parent.__doc__ is not None: - parent_doc = parent.__doc__ - parent_doc = parent_doc.replace("Parameters\n ----------\n", "Args:\n") - parent_doc = parent_doc.replace("Returns\n -------\n", "Return:\n") - function.__doc__ = ( - ( - "\n_Docstring taken from " - f"[fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.{name})._" - ) - + "\n\n" - + parent_doc - ) diff --git a/src/huggingface_hub/inference/_client.py b/src/huggingface_hub/inference/_client.py index b4ac6f009a..4f100d03f1 100644 --- a/src/huggingface_hub/inference/_client.py +++ b/src/huggingface_hub/inference/_client.py @@ -589,7 +589,7 @@ def chat_completion( Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. max_tokens (`int`, *optional*): - Maximum number of tokens allowed in the response. Defaults to 20. 
+ Maximum number of tokens allowed in the response. Defaults to 100. n (`int`, *optional*): UNUSED. presence_penalty (`float`, *optional*): @@ -2074,7 +2074,7 @@ def text_generation( grammar ([`TextGenerationInputGrammarType`], *optional*): Grammar constraints. Can be either a JSONSchema or a regex. max_new_tokens (`int`, *optional*): - Maximum number of generated tokens + Maximum number of generated tokens. Defaults to 100. repetition_penalty (`float`, *optional*): The parameter for repetition penalty. 1.0 means no penalty. See [this paper](https://arxiv.org/pdf/1909.05858.pdf) for more details. diff --git a/src/huggingface_hub/inference/_generated/_async_client.py b/src/huggingface_hub/inference/_generated/_async_client.py index b2980f675c..c46f4c63c2 100644 --- a/src/huggingface_hub/inference/_generated/_async_client.py +++ b/src/huggingface_hub/inference/_generated/_async_client.py @@ -625,7 +625,7 @@ async def chat_completion( Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. max_tokens (`int`, *optional*): - Maximum number of tokens allowed in the response. Defaults to 20. + Maximum number of tokens allowed in the response. Defaults to 100. n (`int`, *optional*): UNUSED. presence_penalty (`float`, *optional*): @@ -2137,7 +2137,7 @@ async def text_generation( grammar ([`TextGenerationInputGrammarType`], *optional*): Grammar constraints. Can be either a JSONSchema or a regex. max_new_tokens (`int`, *optional*): - Maximum number of generated tokens + Maximum number of generated tokens. Defaults to 100. repetition_penalty (`float`, *optional*): The parameter for repetition penalty. 1.0 means no penalty. See [this paper](https://arxiv.org/pdf/1909.05858.pdf) for more details. 
diff --git a/src/huggingface_hub/repocard_data.py b/src/huggingface_hub/repocard_data.py
index fb2c6e8f96..9a07a8f29f 100644
--- a/src/huggingface_hub/repocard_data.py
+++ b/src/huggingface_hub/repocard_data.py
@@ -252,8 +252,8 @@ class ModelCardData(CardData):
             The identifier of the base model from which the model derives. This is applicable for example if your model is a
             fine-tune or adapter of an existing model. The value must be the ID of a model on the Hub (or a list of IDs if
             your model derives from multiple models). Defaults to None.
-        datasets (`List[str]`, *optional*):
-            List of datasets that were used to train this model. Should be a dataset ID
+        datasets (`Union[str, List[str]]`, *optional*):
+            Dataset or list of datasets that were used to train this model. Should be a dataset ID
             found on https://hf.co/datasets. Defaults to None.
         eval_results (`Union[List[EvalResult], EvalResult]`, *optional*):
             List of `huggingface_hub.EvalResult` that define evaluation results of the model. If provided,
@@ -312,7 +312,7 @@ def __init__(
         self,
         *,
         base_model: Optional[Union[str, List[str]]] = None,
-        datasets: Optional[List[str]] = None,
+        datasets: Optional[Union[str, List[str]]] = None,
         eval_results: Optional[List[EvalResult]] = None,
         language: Optional[Union[str, List[str]]] = None,
         library_name: Optional[str] = None,
diff --git a/tests/test_hf_api.py b/tests/test_hf_api.py
index a2ba07b8c1..158a32ca30 100644
--- a/tests/test_hf_api.py
+++ b/tests/test_hf_api.py
@@ -1764,6 +1764,30 @@ def test_list_models_complex_query(self):
             assert isinstance(model, ModelInfo)
             assert all(tag in model.tags for tag in ["bert", "jax"])
 
+    def test_list_models_sort_trending_score(self):
+        models = list(self._api.list_models(sort="trending_score", limit=10))
+        assert len(models) == 10
+        assert isinstance(models[0], ModelInfo)
+        assert all(model.trending_score is not None for model in models)
+
+    def test_list_models_sort_created_at(self):
+        models = list(self._api.list_models(sort="created_at", limit=10))
+        assert len(models) == 10
+        assert isinstance(models[0], ModelInfo)
+        assert all(model.created_at is not None for model in models)
+
+    def test_list_models_sort_downloads(self):
+        models = list(self._api.list_models(sort="downloads", limit=10))
+        assert len(models) == 10
+        assert isinstance(models[0], ModelInfo)
+        assert all(model.downloads is not None for model in models)
+
+    def test_list_models_sort_likes(self):
+        models = list(self._api.list_models(sort="likes", limit=10))
+        assert len(models) == 10
+        assert isinstance(models[0], ModelInfo)
+        assert all(model.likes is not None for model in models)
+
     def test_list_models_with_config(self):
         for model in self._api.list_models(filter=("adapter-transformers", "bert"), fetch_config=True, limit=20):
             self.assertIsNotNone(model.config)
@@ -1832,21 +1856,16 @@ def test_model_info(self):
         self.assertIsInstance(model, ModelInfo)
         self.assertEqual(model.sha, DUMMY_MODEL_ID_REVISION_ONE_SPECIFIC_COMMIT)
 
-    # TODO; un-skip this test once it's fixed.
-    @unittest.skip(
-        "Security status is currently unreliable on the server endpoint, so this"
-        " test occasionally fails. Issue is tracked in"
-        " https://github.com/huggingface/huggingface_hub/issues/1002 and"
-        " https://github.com/huggingface/moon-landing/issues/3695. TODO: un-skip"
-        " this test once it's fixed."
-    )
     def test_model_info_with_security(self):
+        # Note: this test might break in the future if `security_repo_status` object structure gets updated server-side
+        # (not yet fully stable)
         model = self._api.model_info(
             repo_id=DUMMY_MODEL_ID,
             revision=DUMMY_MODEL_ID_REVISION_ONE_SPECIFIC_COMMIT,
             securityStatus=True,
         )
-        self.assertEqual(model.securityStatus, {"containsInfected": False})
+        self.assertIsNotNone(model.security_repo_status)
+        self.assertEqual(model.security_repo_status, {"scansDone": True, "filesWithIssues": []})
 
     def test_model_info_with_file_metadata(self):
         model = self._api.model_info(
diff --git a/tests/test_hf_file_system.py b/tests/test_hf_file_system.py
index ec4ffd5ba1..34094a0265 100644
--- a/tests/test_hf_file_system.py
+++ b/tests/test_hf_file_system.py
@@ -586,3 +586,20 @@ def test_access_repositories_lists(not_supported_path):
         fs.ls(not_supported_path)
     with pytest.raises(NotImplementedError):
         fs.open(not_supported_path)
+
+
+def test_exists_after_repo_deletion():
+    """Test that exists() correctly reflects repository deletion."""
+    # Initialize with staging endpoint and skip cache
+    hffs = HfFileSystem(endpoint=ENDPOINT_STAGING, token=TOKEN, skip_instance_cache=True)
+    api = hffs._api
+
+    # Create a new repo
+    temp_repo_id = repo_name()
+    repo_url = api.create_repo(temp_repo_id)
+    repo_id = repo_url.repo_id
+    assert hffs.exists(repo_id, refresh=True)
+    # Delete the repo
+    api.delete_repo(repo_id=repo_id, repo_type="model")
+    # Verify that the repo no longer exists.
+    assert not hffs.exists(repo_id, refresh=True)
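
Reviewer note on the `repocard_data.py` hunk: `datasets` is widened from `Optional[List[str]]` to `Optional[Union[str, List[str]]]`, so downstream code that iterates over the value may now receive a bare string. The diff does not show how `ModelCardData` handles this internally. The snippet below is only a minimal sketch of one way a caller could normalize the value; `normalize_datasets` is a hypothetical helper and is not part of `huggingface_hub`.

```python
from typing import List, Optional, Union


def normalize_datasets(datasets: Optional[Union[str, List[str]]]) -> List[str]:
    """Hypothetical helper (not a huggingface_hub API): coerce the value to a list of IDs."""
    if datasets is None:
        # No dataset declared on the card.
        return []
    if isinstance(datasets, str):
        # A single dataset ID: wrap it so callers never iterate over characters.
        return [datasets]
    # Already a list (or other iterable of IDs): copy it to avoid aliasing the input.
    return list(datasets)


print(normalize_datasets("imdb"))
print(normalize_datasets(["imdb", "squad"]))
print(normalize_datasets(None))
```

Checking for `str` before iterating matters because a Python string is itself iterable, and looping over `"imdb"` would yield single characters rather than one dataset ID.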