Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ci: Update ruff and cruft template #291

Merged
merged 4 commits into from
Dec 26, 2024

Conversation

KennethEnevoldsen
Copy link
Collaborator

@KennethEnevoldsen KennethEnevoldsen commented Oct 30, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a GitHub Actions workflow for static type checks on the codebase.
    • Enhanced model downloading functionality with additional DaCy models and improved error handling.
  • Bug Fixes

    • Improved warning messages in the DaCy loading process to guide users on model management.
  • Documentation

    • Updated documentation for scoring functions to clarify parameters and expected types.
    • Streamlined type annotations across various files for improved clarity and consistency.
  • Chores

    • Reorganized import statements in several test files for better clarity.
    • Updated dependency version for ruff in project configuration.

Copy link

coderabbitai bot commented Oct 30, 2024

Walkthrough

The changes in this pull request primarily involve updates to configuration files, the introduction of a new GitHub Actions workflow for static type checks, and modifications to existing workflows. The .cruft.json file has an updated commit hash, while the Makefile has changes to the linting commands. The pyproject.toml file reflects a specific version for the ruff dependency and reorganizes linting configurations. Additionally, several Python source files have been enhanced for model management and scoring functionality, and various test files have reordered import statements without altering their functionality.

Changes

File Change Summary
.cruft.json Updated commit hash from e02068889310225ea4f65ea0b203c2949e1597a9 to 85413085032f305896da8bad287a83d53fb0b196.
.github/workflows/static_type_checks.yml New workflow added for running static type checks on the codebase.
.github/workflows/tests.yml Workflow name changed from "Tests" to "test" and job name changed from "pytest" to "test"; comments updated.
Makefile Lint target commands modified to use ruff check instead of ruff.
pyproject.toml Updated ruff dependency version to ==0.7.1, added tool.ruff configuration with exclude and target-version, and reorganized linting sections.
src/dacy/download.py Expanded models_url dictionary with additional DaCy model URLs and updated download_model function for better error handling.
src/dacy/load.py Added warning message in where_is_my_dacy function when verbose is True.
src/dacy/score/input_length.py Updated n_sents_score function's docstring and default score_fn value to include "dep".
src/dacy/score/score.py Modified score function to use get method for accessing scores, added type hints for parameters.
tests/test_datasets.py Reordered import statements; no functional changes.
tests/test_download.py Reordered import statements; no functional changes.
tests/test_hate_speech.py Activated dacy import; no functional changes.
tests/test_score.py Reordered import statements; no functional changes.
tests/test_sentiment.py Reordered import statements; no functional changes.

Poem

In the burrow deep, where changes bloom,
A workflow new dispels the gloom.
With ruff in hand, we check and play,
Our code refined, come what may.
From models grand to tests that cheer,
The rabbit's work is ever near! 🐇✨

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (11)
.github/workflows/static_type_checks.yml (3)

1-7: Documentation looks good but could be more specific.

The comments clearly state the purpose and setup requirements. However, consider adding more details about the specific type checker being used and any configuration files it depends on.

 # THIS ACTION WILL:
   # 1. Install dependencies
   # 2. Run static type checker
+  #    (using mypy/pyright configured in pyproject.toml)

 # SETUP:
   # None required except for the Makefile
+  # See pyproject.toml for type checker configuration

Line range hint 16-41: Consider enhancing the job configuration for better efficiency and security.

While the basic setup is functional, there are several opportunities for improvement:

  1. Pin the action versions more specifically for security:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v3.5.3
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v4.7.1
  1. Add hash verification for dependencies:
       - name: Install dependencies
         shell: bash
         run: |
+          pip install --require-hashes -r requirements.txt
           make install
  1. Add timeout to prevent hanging jobs:
   static_type_checks:
     runs-on: ubuntu-latest
+    timeout-minutes: 10
  1. Consider adding error annotations:
       - name: Run static type checker
         shell: bash
         run: |
           make static-type-check
+        continue-on-error: true
+      - name: Upload type check results
+        if: always()
+        uses: actions/upload-artifact@v3
+        with:
+          name: typecheck-results
+          path: .mypy_cache/

Static type checks should be run against Python 3.9, 3.10, 3.11, and 3.12

The PyPI badge shows that this package officially supports Python 3.9, 3.10, 3.11, and 3.12. Additionally, the test workflow already runs against multiple Python versions (3.9 and 3.10). The static type checks should be expanded to cover all supported Python versions for consistency.

     strategy:
       matrix:
         os: [ubuntu-latest]
-        python-version: ["3.9"]
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
🔗 Analysis chain

Line range hint 19-23: Consider testing against multiple Python versions.

The workflow only tests against Python 3.9. If the project supports multiple Python versions, consider expanding the matrix.

Let's check the supported Python versions:

     strategy:
       matrix:
         os: [ubuntu-latest]
-        python-version: ["3.9"]
+        python-version: ["3.9", "3.10", "3.11"]
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check pyproject.toml for Python version requirements
rg "python_requires|python_version" pyproject.toml

Length of output: 167


Script:

#!/bin/bash
# Let's check for Python version requirements in multiple potential locations
echo "=== Checking setup.py ==="
fd "setup.py" --exec cat {}

echo -e "\n=== Checking requirements.txt ==="
fd "requirements.txt" --exec cat {}

echo -e "\n=== Checking README.md for Python version info ==="
fd "README.md" --exec grep -i "python" {}

echo -e "\n=== Checking tox.ini for test environments ==="
fd "tox.ini" --exec cat {}

echo -e "\n=== Checking .python-version or .tool-versions ==="
fd "^\.python-version$|^\.tool-versions$" --exec cat {}

Length of output: 2521


Script:

#!/bin/bash
# Let's check the PyPI badge URL to see supported Python versions
echo "=== Checking PyPI badge info ==="
curl -s "https://img.shields.io/pypi/pyversions/dacy" || true

# Let's also check any CI configurations that might indicate Python version support
echo -e "\n=== Checking all GitHub workflow files ==="
fd "\.ya?ml$" .github/workflows/ --exec cat {}

Length of output: 7783

src/dacy/score/input_length.py (3)

Line range hint 12-19: Document the new "dep" scoring option

The score_fn parameter's default value now includes "dep", but this option isn't documented in the docstring's description of valid score_fn values.

Update the docstring to include "dep" in the list of valid options:

         score_fn (List[Union[str, Callable]], optional): A scoring function which takes
             in a list of examples and return a dictionary of the form {"score_name":
             score}. Four potiential strings are valid. "ents" for measuring the
             performance of entity spans. "pos" for measuring the performance of
             pos-tags. "token" for measuring the performance of tokenization. "nlp" for
-            measuring the performance of all components in the specified nlp pipeline.
+            measuring the performance of all components in the specified nlp pipeline.
+            "dep" for measuring the performance of dependency parsing.

Line range hint 57-58: Consider improving type safety in pandas operations

The pandas concatenation operation and return value are marked with # type: ignore. Consider adding proper type annotations:

  1. Define a type variable for the DataFrame structure
  2. Add proper return type annotation for the pandas DataFrame

Example improvement:

from typing import TypeVar
import pandas as pd

DataFrameType = TypeVar('DataFrameType', bound=pd.DataFrame)

def n_sents_score(...) -> DataFrameType:

Type ignore comments indicate real typing issues that need attention

The codebase shows systematic use of # type: ignore comments across scoring functionality, particularly in handling complex types involving Callables, Iterables, and pandas operations. Key observations:

  • No mypy configuration is present, but project uses pyright for type checking
  • Multiple instances of type ignores in score-related functions suggest structural typing issues
  • Common patterns involve:
    • Complex function signatures with Callable types
    • Pandas DataFrame operations
    • Dynamic attribute access with getattr
    • Type coercion in data processing

Recommendations:

  • Replace Union[int, List[int]] with TypeAlias or typing.Sequence
  • Use proper type annotations for pandas operations
  • Consider Protocol classes for Callable parameters
  • Add proper return type hints instead of ignoring them
🔗 Analysis chain

Line range hint 12-19: Consider addressing type ignore comments

There are multiple # type: ignore comments in the function signature. This might indicate underlying typing issues that should be resolved.

Let's check for similar patterns in other files:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Look for type ignore comments in similar scoring functions
rg -l "# type: ignore" "src/dacy/score/"

# Check if mypy is configured in pyproject.toml
cat pyproject.toml | grep -A 10 "\[tool.mypy\]"

Length of output: 145


Script:

#!/bin/bash
# Let's examine the actual type ignore comments and their context
rg -A 2 "# type: ignore" "src/dacy/score/"

# Let's also check if there's a mypy configuration file
fd "mypy.ini|\.mypy.ini"

# And check the content of pyproject.toml for any typing-related configurations
cat pyproject.toml

Length of output: 8259

src/dacy/download.py (4)

Line range hint 18-31: LGTM! Consider documenting the None values.

The models_url dictionary is well-structured with specific commit hashes for reproducibility. The None values for "small", "medium", "large" appear to serve as aliases for the latest versions.

Consider adding a comment explaining the purpose of the None values for clarity:

     "da_dacy_small_trf-0.2.0": "https://huggingface.co/chcaa/da_dacy_small_trf/resolve/0eadea074d5f637e76357c46bbd56451471d0154/da_dacy_small_trf-any-py3-none-any.whl",
     "da_dacy_medium_trf-0.2.0": "https://huggingface.co/chcaa/da_dacy_medium_trf/resolve/e7dba91f855a1d26679dc1ef3aa49f7874b50543/da_dacy_medium_trf-any-py3-none-any.whl",
     "da_dacy_large_trf-0.2.0": "https://huggingface.co/chcaa/da_dacy_large_trf/resolve/963232f378190476503a1bfc35b520cb142e9e41/da_dacy_large_trf-any-py3-none-any.whl",
+    # Aliases for latest versions
     "small": None,
     "medium": None,
     "large": None,

Line range hint 35-50: Enhance type safety and error handling.

The function could benefit from:

  1. Proper typing instead of # type: ignore comments
  2. Error handling for cases when no versions are found

Consider this improved implementation:

 def get_latest_version(model: str) -> str:
     if model in {"small", "medium", "large"}:
         model = f"da_dacy_{model}_trf"
     versions = [mdl.split("-")[-1] for mdl in models_url if mdl.startswith(model)]
+    if not versions:
+        raise ValueError(f"No versions found for model: {model}")
     versions = sorted(
         versions,
-        key=lambda s: [int(u) for u in s.split(".")],  # type: ignore
+        key=lambda s: tuple(int(u) for u in s.split(".")),
         reverse=True,
     )
-    return versions[0]  # type: ignore
+    return versions[0]

Line range hint 67-72: Replace noqa with proper typing.

The noqa comment can be replaced with proper type hints.

 class DownloadProgressBar(tqdm):
-    def update_to(self, b: int = 1, bsize: int = 1, tsize=None) -> None:  # noqa
+    def update_to(self, b: int = 1, bsize: int = 1, tsize: int | None = None) -> None:
         if tsize is not None:
             self.total = tsize
         self.update(b * bsize - self.n)

Line range hint 102-137: Improve type safety and error messages.

The function could benefit from:

  1. Proper typing instead of # type: ignore comments
  2. More specific error messages listing available models

Consider these improvements:

 def download_model(
     model: str,
     force: bool = False,
 ) -> str:
     if model in {"small", "medium", "large"}:
         latest_version = get_latest_version(model)
         model = f"da_dacy_{model}_trf-{latest_version}"
-    mdl_version = model.split("-")[-1]  # type: ignore
+    mdl_version = model.split("-")[-1]

     if model not in models_url:
+        available_models = "\n".join(f"- {m}" for m in sorted(models_url.keys()))
         raise ValueError(
-            f"The model '{model}' is not available in DaCy. Please use dacy.models() to see a"
-            + " list of all models",
+            f"The model '{model}' is not available in DaCy. Available models:\n{available_models}",
         )

-    mdl = model.split("-")[0]  # type: ignore
+    mdl = model.split("-")[0]
src/dacy/score/score.py (1)

Line range hint 38-179: Consider refactoring for improved maintainability.

The score function is quite complex with multiple nested functions. Consider:

  1. Moving the scorer functions (ents_scorer, pos_scorer, etc.) to module level
  2. Extracting the core scoring logic from __score into a separate function
  3. Addressing the # type: ignore comments with proper typing

This would improve maintainability and make the code easier to test.

Would you like me to provide a detailed refactoring proposal?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 57fcb61 and 81ae227.

📒 Files selected for processing (14)
  • .cruft.json (1 hunks)
  • .github/workflows/static_type_checks.yml (1 hunks)
  • .github/workflows/tests.yml (1 hunks)
  • makefile (1 hunks)
  • pyproject.toml (3 hunks)
  • src/dacy/download.py (1 hunks)
  • src/dacy/load.py (1 hunks)
  • src/dacy/score/input_length.py (1 hunks)
  • src/dacy/score/score.py (2 hunks)
  • tests/test_datasets.py (1 hunks)
  • tests/test_download.py (1 hunks)
  • tests/test_hate_speech.py (1 hunks)
  • tests/test_score.py (1 hunks)
  • tests/test_sentiment.py (1 hunks)
✅ Files skipped from review due to trivial changes (6)
  • .cruft.json
  • .github/workflows/tests.yml
  • tests/test_datasets.py
  • tests/test_download.py
  • tests/test_score.py
  • tests/test_sentiment.py
🔇 Additional comments (12)
tests/test_hate_speech.py (2)

3-4: LGTM! The dacy import is required for the test functions.

The import is necessary as the test functions use dacy components for hate speech detection and classification.


3-4: Verify test environment dependencies.

Since this import was previously commented out, let's ensure the test environment is properly configured.

✅ Verification successful

Dependencies are properly configured for dacy

The verification shows that:

  • dacy (version >=2.6.0) is listed in the requirements
  • dacy components are properly registered in pyproject.toml
  • The test file actively uses dacy's hate speech detection and classification components
  • The import is necessary and actively used in the tests
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify dacy dependency is properly specified in project configuration

# Check if dacy is listed in project dependencies
echo "Checking for dacy in pyproject.toml..."
rg "dacy.*=" pyproject.toml

# Check if dacy is listed in test requirements
echo "Checking for dacy in test requirements..."
fd -t f "requirements.*txt" -x rg "dacy"

# Check if dacy is mentioned in CI configuration
echo "Checking CI configuration for test environment setup..."
fd -t f "^\.github/workflows/.*yml$" -x rg "dacy"

Length of output: 1283


Script:

#!/bin/bash
# Let's check the CI workflow files including hidden ones
fd -H -t f "^\.github/workflows/.*yml$" -x rg "dacy"

# Let's also check the test file to understand how dacy is being used
cat tests/test_hate_speech.py

Length of output: 766

.github/workflows/static_type_checks.yml (2)

Line range hint 9-15: LGTM! Workflow triggers are well configured.

The workflow is properly set up to run on both pull requests and pushes to main, which is a good practice for catching type issues early.


Line range hint 32-36: Verify Makefile installation command.

The workflow relies on make install. Let's ensure this command exists and installs all necessary dependencies for type checking.

makefile (1)

12-14: LGTM! Verify ruff version compatibility.

The changes correctly update the ruff commands to use the explicit check subcommand, which is consistent with ruff's newer versions.

Let's verify the ruff version specification in pyproject.toml:

✅ Verification successful

Ruff version compatibility confirmed

The pyproject.toml specifies ruff==0.7.1 in the dev dependencies, which is exactly the version where the explicit check subcommand became standard. The Makefile changes are correctly aligned with this version requirement.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify ruff version compatibility with the new command syntax
# Expect: Version should be >=0.7.1 where 'check' command became standard

rg -l 'ruff.*=.*".*"' pyproject.toml

Length of output: 38


Script:

#!/bin/bash
# Let's try different approaches to find ruff version specification
# 1. Look for any mention of ruff in pyproject.toml
cat pyproject.toml

# 2. As fallback, search for any files that might contain ruff configuration
fd -t f "pyproject.toml|requirements.*|setup.*|poetry.lock"

Length of output: 5194

src/dacy/load.py (2)

2-2: LGTM: Clean import addition

The warnings import is properly placed and necessary for the new warning functionality.


Line range hint 41-57: Several improvements suggested for the warning implementation

  1. The # type: ignore comment suggests unresolved type issues. Consider fixing the type annotations instead of ignoring them.
  2. Consider using a more specific warning category (e.g., DeprecationWarning or UserWarning) instead of the default one.
  3. The warning message could be more readable using f-strings.
  4. The version information could be more explicit.

Here's a suggested improvement:

-def where_is_my_dacy(verbose: bool = True) -> Union[str, Path]:  # type: ignore
+def where_is_my_dacy(verbose: bool = True) -> Path:
     """Returns a path to where DaCy models are located. The default the model
     location can be configured with the environmental variable
     `DACY_CACHE_DIR`.

     Args:
         verbose (bool, optional): Toggles the verbosity of the function. Defaults to
             True.

     Returns:
-        str: path to the location of DaCy models
+        Path: path to the location of DaCy models

     Example:
         >>> import dacy
         >>> dacy.where_is_my_dacy()
     """
     if verbose is True:
-        warnings.warn(  # noqa: B028
-            "DaCy pipelines above and including version 0.1.0 are installed as a "
-            + "python module and are thus located in your python environment under "
-            + "the respective names. To get a list of installed models use "
-            + "spacy.util.get_installed_models()",
+        warnings.warn(
+            f"DaCy pipelines (v0.1.0+) are installed as Python modules in your environment. "
+            f"To find installed models, use spacy.util.get_installed_models()",
+            UserWarning,
         )
     return DEFAULT_CACHE_DIR

Let's verify the type annotation issue:

✅ Verification successful

Based on the results, I can now generate the final response:

Type annotation can be simplified and made more accurate

The # type: ignore comment is unnecessary as we can see that:

  1. DEFAULT_CACHE_DIR is defined in src/dacy/download.py and is derived from DACY_DEFAULT_PATH which is explicitly a Path object (Path.home() / ".dacy")
  2. The function always returns DEFAULT_CACHE_DIR which is a Path object
  3. Path is properly imported in the module

The other suggestions about warning implementation are valid improvements for code readability and maintainability.

-def where_is_my_dacy(verbose: bool = True) -> Union[str, Path]:  # type: ignore
+def where_is_my_dacy(verbose: bool = True) -> Path:
     if verbose:
         warnings.warn(
-            "DaCy pipelines above and including version 0.1.0 are installed as a "
-            + "python module and are thus located in your python environment under "
-            + "the respective names. To get a list of installed models use "
-            + "spacy.util.get_installed_models()",  # noqa: B028
+            f"DaCy pipelines (v0.1.0+) are installed as Python modules in your environment. "
+            f"To find installed models, use spacy.util.get_installed_models()",
+            UserWarning,
         )
     return DEFAULT_CACHE_DIR
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if DEFAULT_CACHE_DIR is defined as Path elsewhere
rg -A 1 "DEFAULT_CACHE_DIR\s*=.*Path" 

Length of output: 39


Script:

#!/bin/bash
# Let's check for DEFAULT_CACHE_DIR definition and its usage
rg "DEFAULT_CACHE_DIR" -B 2 -A 2

# Also check for Path imports and usage
rg "from pathlib import Path" -B 2 -A 2

# And check the actual return value usage
ast-grep --pattern 'where_is_my_dacy($$$)'

Length of output: 4869

src/dacy/score/input_length.py (1)

3-3: LGTM: Type hint imports are properly added

The new typing imports align well with the type hints used in the function signature and annotations.

src/dacy/download.py (1)

Line range hint 1-8: LGTM! Well-structured code organization.

The file is well-organized with clear sections, proper imports, and consistent type hints.

pyproject.toml (2)

51-51: LGTM! Good practice to pin ruff version.

Pinning to exact version 0.7.1 ensures consistent linting behavior across all environments.


196-201: LGTM! Well-structured rule configurations.

The configurations for flake8-annotations and mccabe follow best practices:

  • Appropriate handling of type annotations with mypy integration
  • Standard McCabe complexity threshold of 10
src/dacy/score/score.py (1)

Line range hint 38-76: LGTM! Well-documented function signature.

The function signature is properly typed and the documentation is comprehensive, clearly explaining all parameters and their expected types.

pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
src/dacy/score/score.py Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (8)
papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_utils.py (1)

1-1: Consider importing Iterable from collections.abc instead.

Ruff suggests migrating away from typing.Iterable to collections.abc.Iterable to align with modern Python best practices.

-from typing import Callable, Iterable
+from collections.abc import Callable, Iterable
🧰 Tools
🪛 Ruff (0.8.2)

1-1: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)

papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_nerda.py (1)

3-3: Avoid overshadowing the Python built-in list.

Importing list from typing shadows the built-in. Remove list from the import, and update the Iterable import to collections.abc as recommended by Ruff.

-from typing import Iterable, list
+from collections.abc import Iterable
🧰 Tools
🪛 Ruff (0.8.2)

3-3: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)


3-3: Import list is shadowing a Python builtin

(A004)

papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_daluke.py (1)

2-2: Migrate Iterable import to collections.abc.

Ruff recommends importing Iterable from collections.abc in newer Python versions to align with best practices.

-from typing import Iterable
+from collections.abc import Iterable
🧰 Tools
🪛 Ruff (0.8.2)

2-2: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)

src/dacy/datasets/dane.py (1)

23-23: Consider the Python 3.10 union operator.

While Union[list[Corpus], Corpus] is correct, if your project supports Python 3.10 or higher, you might replace it with list[Corpus] | Corpus for readability.

training/main/scripts/add_readme_metadata.py (4)

91-94: Use of dict[str, Any] in _insert_value.

This transition away from Dict[str, Any] is well-aligned with the modern type hint approach.
Also, static analysis suggests replacing Union with the pipe operator (X | Y), for example Optional[Any]Any | None. This is optional but recommended for code clarity.

-def _insert_value(
-    metadata: dict[str, Any],
-    name: str,
-    value: Optional[Any],
-) -> dict[str, Any]:
+def _insert_value(
+    metadata: dict[str, Any],
+    name: str,
+    value: Any | None,
+) -> dict[str, Any]:
🧰 Tools
🪛 Ruff (0.8.2)

93-93: Use X | Y for type annotations

Convert to X | Y

(UP007)


102-105: Use of dict[str, list[Any]] in _insert_values_as_list.

Similarly, you can replace Optional[Any] with Any | None to align with the new type hinting style introduced in Python 3.10.

-def _insert_values_as_list(
-    metadata: dict[str, Any],
-    name: str,
-    values: Optional[Any],
-) -> dict[str, list[Any]]:
+def _insert_values_as_list(
+    metadata: dict[str, Any],
+    name: str,
+    values: Any | None,
+) -> dict[str, list[Any]]:
🧰 Tools
🪛 Ruff (0.8.2)

104-104: Use X | Y for type annotations

Convert to X | Y

(UP007)


116-116: Consider using the pipe operator for the return type.

For _create_metric, you might replace Union[str, float] with str | float.

-def _create_metric(name: str, t: str, value: float) -> dict[str, Union[str, float]]:
+def _create_metric(name: str, t: str, value: float) -> dict[str, str | float]:
🧰 Tools
🪛 Ruff (0.8.2)

116-116: Use X | Y for type annotations

Convert to X | Y

(UP007)


125-125: Adopt the pipe operator for the function _create_p_r_f_list.

Currently returning list[dict[str, Union[str, float]]]. Consider the modern union syntax for Union[str, float].

-def _create_p_r_f_list(
-    metric_name: str,
-    precision: float,
-    recall: float,
-    f_score: float,
-) -> list[dict[str, Union[str, float]]]:
+def _create_p_r_f_list(
+    metric_name: str,
+    precision: float,
+    recall: float,
+    f_score: float,
+) -> list[dict[str, str | float]]:
🧰 Tools
🪛 Ruff (0.8.2)

125-125: Use X | Y for type annotations

Convert to X | Y

(UP007)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81ae227 and d9e8a08.

📒 Files selected for processing (15)
  • docs/evaluation/datasets.py (4 hunks)
  • docs/evaluation/utils.py (6 hunks)
  • papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_daluke.py (2 hunks)
  • papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_nerda.py (2 hunks)
  • papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_utils.py (2 hunks)
  • pyproject.toml (3 hunks)
  • src/dacy/datasets/dane.py (3 hunks)
  • src/dacy/datasets/names.py (7 hunks)
  • src/dacy/hate_speech/wrapped_models.py (2 hunks)
  • src/dacy/score/input_length.py (2 hunks)
  • src/dacy/score/score.py (4 hunks)
  • src/dacy/sentiment/wrapped_models.py (2 hunks)
  • training/main/scripts/add_readme_metadata.py (5 hunks)
  • training/main/scripts/combine.py (1 hunks)
  • training/main/scripts/create_kb.py (3 hunks)
✅ Files skipped from review due to trivial changes (2)
  • training/main/scripts/combine.py
  • src/dacy/datasets/names.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/dacy/score/input_length.py
  • pyproject.toml
  • src/dacy/score/score.py
🧰 Additional context used
🪛 Ruff (0.8.2)
papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_nerda.py

3-3: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)


3-3: Import list is shadowing a Python builtin

(A004)

papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_daluke.py

2-2: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)

papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_utils.py

1-1: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)

training/main/scripts/add_readme_metadata.py

93-93: Use X | Y for type annotations

Convert to X | Y

(UP007)


104-104: Use X | Y for type annotations

Convert to X | Y

(UP007)


116-116: Use X | Y for type annotations

Convert to X | Y

(UP007)


125-125: Use X | Y for type annotations

Convert to X | Y

(UP007)

training/main/scripts/create_kb.py

26-26: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


133-133: Missing type annotation for function argument qid

(ANN001)


133-133: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


150-150: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (21)
papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_utils.py (1)

15-15: Change for built-in type hints is correct.

Switching from List[str] to list[str] is aligned with modern Python type hinting. This looks good and maintains clarity.

papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_nerda.py (1)

20-20: Return type update is correct.

Using the built-in list[Example] is a good modern practice. No further changes needed here.

papers/DaCy-A-Unified-Framework-for-Danish-NLP/apply_fns/apply_fn_daluke.py (1)

21-21: Updated return type.

Shifting to list[Example] is consistent with the modern Python approach to type hints. No issues found here.

docs/evaluation/datasets.py (2)

1-1: Using from __future__ import annotations is beneficial.

This helps optimize and simplify type hints in Python, especially with forward references. Great addition.


20-20: Consistent use of built-in generic types is commendable.

The shift from Dict & List to dict & list is aligned with modern Python best practices. This unifies the codebase with a clearer syntax.

Also applies to: 39-39, 68-68, 73-73, 78-78, 86-86

src/dacy/datasets/dane.py (4)

8-8: Consider removing unused imports from typing.

Now that List is replaced by the built-in list, verify whether any imports like List or others from typing can be safely removed to avoid confusion.


18-18: Modern type annotation usage.

Switching from List[str] to list[str] aligns well with Python 3.9+ conventions.


30-30: Docstring type hint updated correctly.

This docstring accurately reflects the updated splits (list[str]).


43-43: Docstring return type updated.

This follows modern type notation (Union[list[Corpus], Corpus]), so it’s consistent with the function signature.

src/dacy/hate_speech/wrapped_models.py (1)

78-83: Transition to modern built-in types.

Lines 78, 79, and 83 correctly replace List[...] with list[...], maintaining consistency. Nicely done.

src/dacy/sentiment/wrapped_models.py (1)

135-140: Use of list[...] type hints.

The updates at lines 135, 136, and 140 enhance clarity. This reflects best practices for Python 3.9+.

docs/evaluation/utils.py (6)

4-4: Remove unused type imports if unnecessary.

If List and Dict are no longer used, consider removing them from typing imports to keep things clean.


18-22: Updated function parameter and return type to built-in generics.

Changing examples: list[Example] and returning list[dict[str, Any]] is consistent with modern type hints.


38-38: Consistent usage of list[dict[str, Any]].

This improves readability and aligns with Python’s recommended approach.


119-119: Use list[Example] for “predictions_to_disk” function.

This update ensures a uniform typing style throughout the file.


202-202: Change to built-in list[Example].

Maintains the codebase-wide shift to native type annotations.


215-215: Refined signature for score_to_string.

dict[str, Any] is a clear and concise type hint, fully consistent with the rest of the changes.

training/main/scripts/add_readme_metadata.py (4)

8-9: Modernized type annotations with from __future__ import annotations.

Enabling this future import allows the usage of Python 3.9+ style annotations, which is consistent with the updated type hints below.


12-12: Removed Dict and List from imports.

You no longer need Dict or List from the typing module as you’re now using built-in collection types for type annotations. This is a clean way to modernize the codebase.


21-21: Return type updated to use built-in dict[str, Any].

This is consistent with Python 3.9+ type hinting. Good enhancement for clearer type annotations.


132-132: Use of list[dict[str, Any]] in _create_model_index.

Transitioning from List[Dict[str, Any]] to list[dict[str, Any]] is consistent with Python 3.9+ style. Looks great!

@@ -131,7 +130,7 @@ def _fetch_wikidata_description(qid: str):
return item.description.get("da", item.description.get("en", "")) # type: ignore


def _fetch_wikidata_aliases(qid, langs: List[str] = ["da", "en"]):
def _fetch_wikidata_aliases(qid, langs: list[str] = ["da", "en"]):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add missing type annotation for qid; also avoid mutable defaults.

Here, qid is untyped, and langs: list[str] = ["da", "en"] can lead to mutable default issues. Consider declaring a type for qid (e.g., str) and replacing the default list argument.

-def _fetch_wikidata_aliases(qid, langs: list[str] = ["da", "en"]):
+def _fetch_wikidata_aliases(qid: str, langs: list[str] | None = None):
+    if langs is None:
+        langs = ["da", "en"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _fetch_wikidata_aliases(qid, langs: list[str] = ["da", "en"]):
def _fetch_wikidata_aliases(qid: str, langs: list[str] | None = None):
if langs is None:
langs = ["da", "en"]
🧰 Tools
🪛 Ruff (0.8.2)

133-133: Missing type annotation for function argument qid

(ANN001)


133-133: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

@@ -24,7 +23,7 @@
def main(
trf_name: str,
save_path_kb: Path = project_path / "assets/knowledge_bases/knowledge_base.kb",
langs_to_fetch: List[str] = ["da", "en"],
langs_to_fetch: list[str] = ["da", "en"],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid using mutable lists as default arguments.

Static analysis warns that langs_to_fetch: list[str] = ["da", "en"] could cause unexpected behavior if the default list is ever modified. Replace with None and initialize inside the function.

-def main(
-    trf_name: str,
-    save_path_kb: Path = project_path / "assets/knowledge_bases/knowledge_base.kb",
-    langs_to_fetch: list[str] = ["da", "en"],
-):
+def main(
+    trf_name: str,
+    save_path_kb: Path = project_path / "assets/knowledge_bases/knowledge_base.kb",
+    langs_to_fetch: list[str] | None = None,
+):
+    if langs_to_fetch is None:
+        langs_to_fetch = ["da", "en"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
langs_to_fetch: list[str] = ["da", "en"],
def main(
trf_name: str,
save_path_kb: Path = project_path / "assets/knowledge_bases/knowledge_base.kb",
langs_to_fetch: list[str] | None = None,
):
if langs_to_fetch is None:
langs_to_fetch = ["da", "en"]
🧰 Tools
🪛 Ruff (0.8.2)

26-26: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

@@ -148,7 +147,7 @@
return aliases


def _load_ents_from_data(nlp: Language, splits: List[str] = ["dev", "train"]):
def _load_ents_from_data(nlp: Language, splits: list[str] = ["dev", "train"]):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid mutable default for splits.

Similar concern regarding default list usage. Replace with None and initialize within the function.

-def _load_ents_from_data(nlp: Language, splits: list[str] = ["dev", "train"]):
+def _load_ents_from_data(nlp: Language, splits: list[str] | None = None):
+    if splits is None:
+        splits = ["dev", "train"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _load_ents_from_data(nlp: Language, splits: list[str] = ["dev", "train"]):
def _load_ents_from_data(nlp: Language, splits: list[str] | None = None):
if splits is None:
splits = ["dev", "train"]
🧰 Tools
🪛 Ruff (0.8.2)

150-150: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

@KennethEnevoldsen KennethEnevoldsen merged commit f8913d0 into main Dec 26, 2024
9 checks passed
@KennethEnevoldsen KennethEnevoldsen deleted the ci--Update-ruff-and-cruft-template branch December 26, 2024 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant