feat: use typeline for IO #24

clintval · 2024-11-08T22:50:28Z

Summary by CodeRabbit

Release Notes

New Features
- Introduced BedWriter and BedReader classes for handling BED records.
- Added support for Python 3.10 in the project.
Documentation
- Updated README with new code examples and compatibility information.
Bug Fixes
- Corrected job dependencies in GitHub workflows for improved CI/CD processes.
Tests
- Added comprehensive unit tests for the new BedReader and BedWriter functionalities.
- Updated existing tests to reflect changes in class structures and type annotations.
Chores
- Updated project version from 0.2.0 to 0.3.0 and adjusted dependencies accordingly.

clintval · 2024-11-08T22:55:37Z

@coderabbitai full review please!

coderabbitai · 2024-11-08T22:55:55Z

@clintval: I will perform a full review of the pull request.

✅ Actions performed

Full review triggered.

coderabbitai · 2024-11-08T23:16:37Z

Warning

Rate limit exceeded

@clintval has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 10 minutes and 23 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between b6b34a8 and 3084fc5.

Walkthrough

This pull request introduces several changes across multiple files in the repository. A new code owner, @clintval, is designated in the .github/CODEOWNERS file. Workflow files are updated for clarity and functionality, including renaming jobs and adjusting dependencies. The README.md is revised for improved documentation, while the bedspec package undergoes structural changes, including renaming and importing adjustments. The BedReader and BedWriter classes are restructured and tested, with new test files added to validate their functionality. The pyproject.toml file reflects updates in versioning and dependencies.

Changes

File	Change Summary
`.github/CODEOWNERS`	Added `@clintval` as a code owner for all files in the repository.
`.github/workflows/publish_bedspec.yml`	Updated tag pattern for push events, renamed job `tests` to `unit-tests`, updated job dependencies, and corrected output reference in `make-github-release`.
`.github/workflows/tests.yml`	Renamed workflow from "Code checks" to "unit tests" and updated Python version matrix to include Python 3.10.
`README.md`	Updated Python version badge, added `basedpyright` badge, reformatted code examples for clarity, and adjusted syntax highlighting.
`bedspec/__init__.py`	Replaced `GenomicSpan` with `BedType`, added `ReferenceSpan`, and updated `__all__` list.
`bedspec/_bedspec.py`	Renamed `GenomicSpan` to `ReferenceSpan`, modified `BedStrand` class, updated method signatures, and standardized length calculations.
`bedspec/_io.py`	Deleted file containing `BedWriter` and `BedReader` classes.
`bedspec/_reader.py`	Introduced `BedReader` class for reading BED records, including methods for handling comments and complex types.
`bedspec/_writer.py`	Introduced `BedWriter` class for writing BED records, including custom encoding and comment writing methods.
`bedspec/overlap/__init__.py`	Removed `QueryReferenceSpanType` import and updated `__all__` to include only `OverlapDetector`.
`bedspec/overlap/_overlap.py`	Updated `OverlapDetector` class with new type annotations and modified methods for consistency.
`pyproject.toml`	Updated package version to `0.3.0`, changed Python version constraint, added new dependency `typeline`, and updated various sections for consistency and new configurations.
`tests/test_bedspec.py`	Replaced `GenomicSpan` with `ReferenceSpan` in tests and updated assertions for length checks.
`tests/test_io.py`	Deleted file containing tests for `BedWriter` and `BedReader`.
`tests/test_overlap.py`	Removed a specific test case and updated the method for adding features to `OverlapDetector`.
`tests/test_reader.py`	Introduced new tests for `BedReader` functionality, covering various scenarios for reading BED records.
`tests/test_writer.py`	Introduced new tests for `BedWriter`, validating writing functionality and handling of comments.

Possibly related PRs

feat: simplify how the BedReader and BedWriter work #21: The changes in this PR involve modifications to the BedReader and BedWriter classes, which are directly related to the changes in the main PR that added @clintval as a code owner for files that include these classes.

Poem

In the burrow, changes bloom,
Code owners rise, dispelling gloom.
With tests anew and docs refined,
Our bedspec's path is well-defined.
Hops of joy, let code unite,
For every bug, we’ll win the fight! 🐇✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (28)

.github/CODEOWNERS (1)

1-1: Consider adding a team or additional maintainers for redundancy.

The CODEOWNERS syntax is correct, but having a single code owner (@clintval) for all files (*) could create:

A review bottleneck when the owner is unavailable

A single point of failure (bus factor of 1)

Consider:

Adding a team (e.g., * @org/bedspec-maintainers)

Or including additional maintainers as backup reviewers
bedspec/__init__.py (1)
22-43: Consider grouping related exports in all.

While the __all__ list is comprehensive, consider grouping related items together (e.g., all Bed types, interfaces, IO classes) and adding section comments for better organization.

Here's a suggested reorganization:
 __all__ = [
+    # Core Bed implementations
     "Bed2",
     "Bed3",
     "Bed4",
     "Bed5",
     "Bed6",
     "Bed12",
-    "BedColor",
-    "BedGraph",
-    "BedLike",
     "BedPE",
-    "BedStrand",
-    "BedType",
-    "Named",
     "PairBed",
     "PointBed",
-    "ReferenceSpan",
     "SimpleBed",
+    # Protocols and Types
+    "BedLike",
+    "BedType",
+    "Named",
+    "ReferenceSpan",
     "Stranded",
+    # Enums and Constants
+    "BedColor",
+    "BedStrand",
+    # Special formats
+    "BedGraph",
+    # I/O Classes
     "BedReader",
     "BedWriter",
 ]
bedspec/_writer.py (3)
11-12: Enhance class docstring with more details.

The current docstring is minimal. Consider expanding it to include:

Detailed description of the class's purpose

Usage examples

Any configuration options or parameters

Example enhancement:
-    """A writer of BED records."""
+    """A writer for BED (Browser Extensible Data) format records.
+    
+    This class provides functionality to write BED records to a file, supporting
+    various BED formats and custom encodings for different field types.
+    
+    Example:
+        >>> with BedWriter(path) as writer:
+        ...     writer.write(bed_record)
+        ...     writer.write_comment("Track name=example")
+    """
14-23: Consider improving type hints and addressing the type checker warning.

The implementation is correct, but there are two potential improvements:

The type hint for collections could be more specific

The pyright ignore comment suggests an underlying type issue that could be addressed

Consider this enhancement:
-    def _encode(self, item: Any) -> Any:
+    def _encode(self, item: None | BedColor | list[Any] | set[Any] | tuple[Any, ...] | Any) -> str | Any:
         """A callback for over_readriding the encoding of builtin types and custom types."""
         if item is None:
             return "."
         if isinstance(item, (list, set, tuple)):
-            return ",".join(map(str, item))  # pyright: ignore[reportUnknownArgumentType]
+            return ",".join(str(x) for x in item)
         if isinstance(item, BedColor):
             return str(item)
         return super()._encode(item=item)
25-29: Consider enhancing error handling and removing unused assignment.

The implementation works but could be improved in several ways:

Remove the unused _ assignment

Add error handling for write operations

Add return type hint

Consider this enhancement:
-    def write_comment(self, comment: str) -> None:
+    def write_comment(self, comment: str) -> None:
         """Write a comment to the BED output.
+        
+        Args:
+            comment: The comment text to write.
+            
+        Raises:
+            IOError: If writing to the file fails.
+        """
         for line in comment.splitlines():
             prefix = "" if any(line.startswith(prefix) for prefix in COMMENT_PREFIXES) else "# "
-            _ = self._handle.write(f"{prefix}{line}\n")
+            self._handle.write(f"{prefix}{line}\n")
.github/workflows/tests.yml (2)
1-1: Consider a more comprehensive workflow name.

The name "unit tests" might be too specific as the workflow appears to run comprehensive checks (via check-all task). Consider a more accurate name like "code-quality" or "code-checks" to better reflect all the checks being performed.
-name: unit tests
+name: code-quality
Line range hint 13-20: Consider adding workflow optimizations for better reliability and efficiency.

The workflow could benefit from additional configurations:

Add fail-fast strategy to stop all matrix jobs if one fails

Set timeout limits to prevent hung jobs

Add concurrency control to cancel redundant workflow runs
 jobs:
   Tests:
     runs-on: ubuntu-latest
+    timeout-minutes: 10
     strategy:
+      fail-fast: true
       matrix:
         PYTHON_VERSION: ["3.10", "3.11", "3.12"]
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
.github/workflows/publish_bedspec.yml (1)
Line range hint 52-63: Consider adding dependency caching for improved efficiency.

The poetry installation and dependency steps could benefit from caching to speed up the workflow execution.

Consider adding cache steps:
      - uses: actions/setup-python@v5
        with:
          python-version: 3.12

+     - name: Cache poetry dependencies
+       uses: actions/cache@v3
+       with:
+         path: .venv
+         key: ${{ runner.os }}-poetry-${{ hashFiles('poetry.lock') }}
+         restore-keys: |
+           ${{ runner.os }}-poetry-

      - name: Install poetry
        run: |
README.md (3)
34-41: Consider enhancing the example with file content visualization.

While the example correctly demonstrates BedWriter usage, it would be more informative to show the actual content written to the file.
 >>> with BedWriter.from_path(temp_file.name, Bed3) as writer:
 ...     writer.write(bed)
+>>> 
+>>> # Show the content of the file:
+>>> print(temp_file.read())
+chr1    2       8
77-88: Consider expanding the example to demonstrate more operations.

While the example effectively shows basic overlap detection, it would be beneficial to demonstrate the other operations mentioned below (enclosed_by, enclosing, overlapping).
 >>> detector.overlaps(my_feature)
 True
+>>> 
+>>> # Demonstrate other operations:
+>>> detector.overlapping(my_feature)
+[Bed3(refname='chr1', start=1, end=4)]
+>>> detector.enclosed_by(Bed3("chr1", start=0, end=10))
+[Bed3(refname='chr1', start=1, end=4), Bed3(refname='chr1', start=5, end=9)]
105-115: Consider adding usage example for the custom BED type.

While the class definition is clear, showing how to create and use a Bed3Plus1 instance would make the example more complete.
 ...     end: int
 ...     my_custom_field: float | None
+>>> 
+>>> # Example usage:
+>>> custom_bed = Bed3Plus1("chr1", 100, 200, my_custom_field=42.0)
+>>> print(custom_bed)
+Bed3Plus1(refname='chr1', start=100, end=200, my_custom_field=42.0)
bedspec/overlap/_overlap.py (3)
Line range hint 21-33: Consider enhancing the docstring.

While the docstring is good, it could be even better by:

Adding type information for the required properties

Including example usage

Documenting the performance characteristics

Here's a suggested enhancement:
     """Detects and returns overlaps between a collection of reference features and query feature.

     The overlap detector may be built with any feature-like Python object that has the following
     properties:

-      * `refname`: The reference sequence name
-      * `start`: A 0-based start position
-      * `end`: A 0-based half-open end position
+      * `refname` (str): The reference sequence name
+      * `start` (int): A 0-based start position
+      * `end` (int): A 0-based half-open end position
+
+    Performance characteristics:
+      * Building: O(n log n) where n is the number of features
+      * Querying: O(log n + m) where m is the number of overlapping features
+
+    Example:
+        >>> detector = OverlapDetector()
+        >>> detector.add(Feature("chr1", 100, 200))
+        >>> list(detector.overlapping(Feature("chr1", 150, 250)))
     """
34-39: Consider improving type annotations for cgranges.

The implementation is solid, but there are several type ignore comments for cgranges. Consider:

Creating type stubs for cgranges

Using a more specific type for the tree dictionary

Example type stub for cgranges:
# cgranges.pyi
class cgranges:
    def add(self, refname: str, start: int, end: int, idx: int) -> None: ...
    def index(self) -> None: ...
    def overlap(self, refname: str, start: int, end: int) -> list[tuple[str, int, int, int]]: ...
Line range hint 56-81: Add error handling for invalid queries.

The query methods could benefit from additional error handling:

Validate feature coordinates (start <= end)

Handle None values

Add logging for performance monitoring

Example implementation:
     def overlapping(self, feature: ReferenceSpan) -> Iterator[ReferenceSpanType]:
         """Yields all the overlapping features for a given query feature."""
+        if feature is None:
+            raise ValueError("Query feature cannot be None")
+        if feature.start > feature.end:
+            raise ValueError("Invalid coordinates: start > end")
+
         refname: Refname = feature.refname
tests/test_reader.py (6)
14-29: Consider adding error case tests.

While the happy path is well tested, consider adding tests for:

Invalid file paths

Permission denied scenarios

Malformed BED records
def test_bed_reader_raises_on_invalid_path() -> None:
    with pytest.raises(FileNotFoundError):
        BedReader.from_path("non_existent.bed", Bed3)
31-47: LGTM! Comprehensive comment handling tests.

Good coverage of different comment types and empty lines. Consider adding a test for inline comments if supported by the format.

49-57: Consider adding more test cases for string fields.

While null handling is tested, consider adding tests for:

Actual string values

Empty strings

Special characters in strings

59-67: Add edge cases for numeric fields.

Consider adding tests for:

Maximum/minimum values

Zero values

Negative numbers (if applicable)

Invalid numeric formats

69-77: Add error handling tests for context manager.

Consider testing:

Exception handling within the context

Resource cleanup after exceptions
def test_bed_reader_context_manager_handles_errors(tmp_path: Path) -> None:
    path = tmp_path / "test.bed"
    path.write_text("invalid data")
    
    with pytest.raises(ValueError):
        with BedReader.from_path(path, Bed4) as reader:
            list(reader)  # Should raise on invalid data
79-103: LGTM! Comprehensive roundtrip testing.

Excellent test coverage for complex BED12 records with multiple field types. Consider adding tests for:

Empty lists in block_sizes and block_starts

Color edge cases (max values)

Validation of block_count matching actual block sizes
pyproject.toml (1)
143-146: Consider enhancing pyright configuration.

While the basic configuration is good, consider adding more pyright-specific settings for stricter type checking:
 [tool.pyright]
 include = ["bedspec/"]
 reportAny = false
 pythonVersion = "3.10"
+strict = ["bedspec/"]
+typeCheckingMode = "strict"
tests/test_writer.py (4)
17-52: Consider adding file existence check.

The parameterized test is well-structured and covers all BED types comprehensively. Consider adding an assertion to verify file existence before reading its contents:
 def test_bed_writer_can_write_all_bed_types(bed: BedLike, expected: str, tmp_path: Path) -> None:
     """Test that the BED writer can write all BED types."""
     with open(tmp_path / "test.bed", "w") as handle:
         writer: BedWriter = BedWriter(handle, type(bed))
         writer.write(bed)
 
+    test_file = tmp_path / "test.bed"
+    assert test_file.exists(), "BED file was not created"
+    assert test_file.stat().st_size > 0, "BED file is empty"
-    assert Path(tmp_path / "test.bed").read_text() == expected
+    assert test_file.read_text() == expected
80-102: Consider consolidating type validation tests.

The generic and non-generic type validation tests have significant overlap in functionality. Consider consolidating them using parametrization:
-def test_bed_writer_remembers_the_type_it_will_write(tmp_path: Path) -> None:
-    """Test that the BED writer remembers the type it can only write."""
-    with open(tmp_path / "test.bed", "w") as handle:
-        writer: BedWriter = BedWriter(handle, Bed2)
-        writer.write(Bed2(refname="chr1", start=1))
-        with pytest.raises(
-            ValueError,
-            match="Expected Bed2 but found Bed3!",
-        ):
-            writer.write(Bed3(refname="chr1", start=1, end=2))
-
-
-def test_bed_writer_remembers_the_type_it_will_write_generic(tmp_path: Path) -> None:
-    """Test that the generically parameterized BED writer remembers the type it can only write."""
-    with open(tmp_path / "test.bed", "w") as handle:
-        writer = BedWriter(handle, Bed2)
-        writer.write(Bed2("chr1", 1))
-        with pytest.raises(
-            ValueError,
-            match="Expected Bed2 but found Bed3!",
-        ):
-            writer.write(Bed3(refname="chr1", start=1, end=2))  # type: ignore[arg-type]
+@pytest.mark.parametrize(
+    "initial_bed,invalid_bed,type_hint",
+    [
+        (Bed2(refname="chr1", start=1), Bed3(refname="chr1", start=1, end=2), True),
+        (Bed2("chr1", 1), Bed3(refname="chr1", start=1, end=2), False),
+    ],
+)
+def test_bed_writer_type_validation(
+    tmp_path: Path,
+    initial_bed: Bed2,
+    invalid_bed: Bed3,
+    type_hint: bool,
+) -> None:
+    """Test that the BED writer enforces type validation for both generic and non-generic usage."""
+    with open(tmp_path / "test.bed", "w") as handle:
+        writer = BedWriter(handle, Bed2)
+        if type_hint:
+            writer: BedWriter
+        writer.write(initial_bed)
+        with pytest.raises(ValueError, match="Expected Bed2 but found Bed3!"):
+            if not type_hint:
+                # type: ignore[arg-type]
+            writer.write(invalid_bed)
104-139: Consider adding edge case tests for comments.

The comment writing tests are thorough but could benefit from additional edge cases:
def test_bed_writer_write_comment_edge_cases(tmp_path: Path) -> None:
    """Test edge cases for comment writing."""
    with open(tmp_path / "test.bed", "w") as handle:
        writer = BedWriter(handle, Bed2)
        # Empty comment
        writer.write_comment("")
        # Comment with only whitespace
        writer.write_comment("   ")
        # Comment with special characters
        writer.write_comment("!@#$%^&*()")
        writer.write(Bed2(refname="chr1", start=1))

    expected = "# \n#    \n# !@#$%^&*()\nchr1\t1\n"
    assert Path(tmp_path / "test.bed").read_text() == expected
141-148: Add verification for proper file closure.

While the context manager test verifies writing functionality, it should also confirm that the file handle is properly closed:
 def test_bed_writer_can_be_used_as_context_manager(tmp_path: Path) -> None:
     """Test that the BED writer can be used as a context manager."""
+    file_handle = None
     with BedWriter(open(tmp_path / "test.bed", "w"), Bed2) as handle:
+        file_handle = handle._handle  # Store reference to underlying file handle
         handle.write(Bed2(refname="chr1", start=1))
         handle.write(Bed2(refname="chr2", start=2))
 
     expected = "chr1\t1\nchr2\t2\n"
     assert Path(tmp_path / "test.bed").read_text() == expected
+    assert file_handle.closed, "File handle should be closed after exiting context"
tests/test_bedspec.py (1)
155-155: Consider adding edge case tests for length calculations.

The transition to using len() is good and the current test cases validate basic lengths well. Consider adding tests for:

Empty intervals (if allowed)

Large intervals to check for any integer overflow issues

Edge cases where start and end positions are at chromosome boundaries
def test_bed_length_edge_cases() -> None:
    """Test edge cases for BED length calculations."""
    # Large interval
    assert len(Bed3(refname="chr1", start=1, end=1_000_000)) == 999_999
    
    # Maximum possible coordinates (if applicable to your use case)
    assert len(Bed3(refname="chr1", start=1, end=2**31-1)) == 2**31-2
Also applies to: 172-174
bedspec/_bedspec.py (2)
35-35: Consider using StrEnum for BedStrand

The BedStrand class is currently defined as class BedStrand(str, Enum). If you are targeting Python 3.11 or using typing_extensions, you can use StrEnum for clearer intent and to avoid manually inheriting from str. This enhances readability and aligns with modern Python practices.

Apply this diff to use StrEnum:
-from enum import Enum
+from enum import StrEnum

-class BedStrand(str, Enum):
+class BedStrand(StrEnum):
Ensure you import StrEnum appropriately:
-from enum import Enum
+from enum import StrEnum
46-49: Handle unexpected values in BedStrand.opposite method

Currently, the opposite method assumes self is either BedStrand.Positive or BedStrand.Negative. To make the code more robust against future additions to BedStrand, consider adding an else clause to handle unexpected values. This would help in catching errors if new strands are added in the future.

Apply this diff to handle unexpected values:
 def opposite(self) -> "BedStrand":
     if self is BedStrand.Positive:
         return BedStrand.Negative
     elif self is BedStrand.Negative:
         return BedStrand.Positive
+    else:
+        raise ValueError(f"Invalid BedStrand value: {self}")

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0db2106 and b6b34a8.

⛔ Files ignored due to path filters (1)

poetry.lock is excluded by !**/*.lock

📒 Files selected for processing (17)

.github/CODEOWNERS (1 hunks)
.github/workflows/publish_bedspec.yml (3 hunks)
.github/workflows/tests.yml (2 hunks)
README.md (3 hunks)
bedspec/__init__.py (1 hunks)
bedspec/_bedspec.py (11 hunks)
bedspec/_io.py (0 hunks)
bedspec/_reader.py (1 hunks)
bedspec/_writer.py (1 hunks)
bedspec/overlap/__init__.py (1 hunks)
bedspec/overlap/_overlap.py (2 hunks)
pyproject.toml (9 hunks)
tests/test_bedspec.py (4 hunks)
tests/test_io.py (0 hunks)
tests/test_overlap.py (1 hunks)
tests/test_reader.py (1 hunks)
tests/test_writer.py (1 hunks)

💤 Files with no reviewable changes (2)

bedspec/_io.py
tests/test_io.py

🔇 Additional comments (41)

bedspec/overlap/__init__.py (2)

1-1: LGTM! Clean import statement.

The relative import follows Python best practices and clearly indicates the module's dependency.

3-5: LGTM! Well-structured public API definition.

The __all__ declaration explicitly defines the module's public interface, which is a good practice for controlling what gets exported during from module import * operations. This change improves API clarity and encapsulation.

Let's verify that this change doesn't break existing imports:

✅ Verification successful

Change is safe - no impact on existing imports

The git history shows that QueryReferenceSpanType was previously imported but not included in __all__. The test file and codebase search confirm that there are no direct imports of this type. The only imports found are for OverlapDetector, which remains in the public API.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any direct imports of QueryReferenceSpanType that might be affected
# by its removal from the public API

# Search for any imports of QueryReferenceSpanType
rg "from.*bedspec\.overlap.*import.*QueryReferenceSpanType" || \
rg "from.*bedspec\.overlap.*import.*\*.*" || \
rg "import.*bedspec\.overlap"

Length of output: 137

Script:

#!/bin/bash
# Let's try a broader search to understand the usage patterns
# 1. Find all Python files that might import from bedspec.overlap
fd '\.py$' --exec grep -l "bedspec.overlap" {} \;

# 2. Look for any references to QueryReferenceSpanType in Python files
rg "QueryReferenceSpanType" --type py

# 3. Check the actual content of the overlap module to understand what was previously exported
fd -g 'overlap/__init__.py' --exec cat {} \;

Length of output: 162

Script:

#!/bin/bash
# Let's examine the test file that imports from bedspec.overlap
cat tests/test_overlap.py

# Also check the git history to see what was previously in __all__
git log -p bedspec/overlap/__init__.py

Length of output: 6335

bedspec/__init__.py (2)

19-20: Approve splitting IO functionality into dedicated modules.

Moving BedReader and BedWriter from ._io to dedicated ._reader and ._writer modules follows the Single Responsibility Principle, making the codebase more maintainable.

Let's verify that the old _io.py module has been removed and the new modules exist:

#!/bin/bash
# Description: Verify IO module restructuring
# Test 1: Old _io.py should not exist
# Test 2: New _reader.py and _writer.py should exist with the moved classes

# Check if old _io.py still exists (should not)
fd -t f "_io.py" bedspec/

# Verify BedReader in new location
ast-grep --pattern 'class BedReader:
  $$$
'

# Verify BedWriter in new location
ast-grep --pattern 'class BedWriter:
  $$$
'

12-12: Verify impact of GenomicSpan → ReferenceSpan rename.

The renaming of GenomicSpan to ReferenceSpan and addition of BedType appear to be part of an API evolution. The __all__ list is properly updated to reflect these changes.

Let's verify the impact of these changes:

Also applies to: 16-16, 22-43

✅ Verification successful

The rename from GenomicSpan to ReferenceSpan has been properly implemented across the codebase

The search results confirm that:

The old GenomicSpan name has been completely removed from the codebase
ReferenceSpan is consistently used in all relevant files:
- bedspec/__init__.py: Proper export
- bedspec/_bedspec.py: Protocol definition and usage
- bedspec/overlap/_overlap.py: Type imports and usage
- tests/test_bedspec.py: Test cases updated

The git history shows this was part of a deliberate API evolution change, and all references have been properly updated.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to GenomicSpan
# Test: Search for GenomicSpan references that might need updating

# Look for any remaining GenomicSpan references
rg "GenomicSpan" -l

# Look for type annotations using ReferenceSpan
rg "ReferenceSpan" -A 2

Length of output: 119

Script:

#!/bin/bash
# Let's try a more comprehensive search for both old and new terms
# Search for GenomicSpan case-insensitively and with word boundaries
rg -w -i "GenomicSpan"

# Search for ReferenceSpan usage
rg -w "ReferenceSpan"

# Check git history for the rename
git log -p --all -S "GenomicSpan"

# Look for any potential test files or documentation that might reference these terms
fd -e py -e md -e rst | xargs rg -l "GenomicSpan|ReferenceSpan"

Length of output: 36739

bedspec/_writer.py (1)

1-8: LGTM! Well-organized imports following best practices.

The imports are properly organized, separating standard library, third-party, and project-specific imports.

.github/workflows/tests.yml (1)

19-19: Verify Python 3.10 compatibility with all dependencies.

Adding Python 3.10 to the test matrix is good for broader compatibility. However, we should ensure all project dependencies support Python 3.10.

Let's check the project's dependencies and their Python version requirements:

✅ Verification successful

Python 3.10 compatibility is verified for all dependencies

Based on the dependency analysis:

Project's Python requirement in pyproject.toml is ^3.10.0,<4.0.0
All dependencies have minimum Python version requirements that are compatible with Python 3.10:
- Most packages require >=3.8
- Some require >=3.7
- The highest minimum requirement is from typeline and the project itself at >=3.10.0

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check Python version compatibility in pyproject.toml and poetry.lock

# Test 1: Check Python version specification in pyproject.toml
echo "Checking Python version requirements in pyproject.toml:"
rg "python = " pyproject.toml -A 2

# Test 2: Check if poetry.lock exists and examine dependencies
if [ -f poetry.lock ]; then
    echo "Checking dependencies in poetry.lock:"
    rg "python-versions = " poetry.lock -A 2
fi

Length of output: 3886

.github/workflows/publish_bedspec.yml (3)

5-5: LGTM! Clear and explicit version tag pattern.

The updated tag pattern '[0-9]+.[0-9]+.[0-9]+' is more explicit and maintains semantic versioning compatibility.

131-131: LGTM! Correct changelog output reference.

The update to use needs.make-changelog.outputs.release_body correctly aligns with the job name and its output.

27-28: Verify workflow reference for renamed job.

The job renaming from 'tests' to 'unit-tests' is more descriptive and all dependencies are correctly updated. However, we should verify that the referenced workflow file matches this change.

#!/bin/bash
# Description: Verify the referenced workflow matches the new job name
# Test: Check if tests.yml workflow has matching job names
rg "name: unit.*tests" .github/workflows/tests.yml

Also applies to: 35-35, 40-40

README.md (4)

5-7: LGTM: Badge updates reflect enhanced compatibility and type checking.

The updated badges accurately reflect the broader Python version compatibility (3.10+) and the addition of basedpyright type checking.

25-28: LGTM: Clear and well-formatted example.

The example effectively demonstrates Bed3 object creation with proper interactive Python shell formatting.

47-53: LGTM: Clear demonstration of BedReader usage.

The example effectively shows file reading, iteration, and object representation.

61-70: LGTM: Comprehensive listing of supported BED types.

The example clearly shows all available BED format classes with proper import statements.

bedspec/overlap/_overlap.py (3)

2-14: LGTM! Improved type safety and documentation.

The updated imports and type definitions show good practices:

Using override decorator for type safety
Clear type aliases with documentation
Proper import organization

41-44: LGTM! Clean iterator implementation.

Good use of:

@override decorator for type safety
chain for efficient iteration over multiple lists

46-54: Consider atomic operations for thread safety.

The feature addition process modifies multiple data structures separately. In a multi-threaded environment, this could lead to inconsistencies.

Let's check for concurrent usage:

Consider:

Adding thread safety documentation
Using a lock if concurrent access is needed
Making the operations atomic

tests/test_reader.py (1)

1-12: LGTM! Well-organized imports.

The imports are properly organized with standard library imports separated from package imports. All imported entities are used within the test file.

tests/test_overlap.py (2)

Line range hint 1-116: Verify if hashability testing is still needed.

The test suite provides comprehensive coverage of the OverlapDetector's functionality. However, the removal of the hashability test case needs verification to ensure it aligns with the current implementation requirements.

Let's check if the implementation still requires hashable features:

#!/bin/bash
# Description: Check if the implementation requires hashable features
# Test: Look for type hints or collections that would require hashability
# Expect: Evidence of hashable requirements if they exist

echo "Searching for potential hashable requirements in OverlapDetector..."
rg -A 5 'class OverlapDetector' 

echo "Searching for Set or Dict usage with features..."
ast-grep --pattern 'Set[$_]'
ast-grep --pattern 'Dict[$_]'

37-38: LGTM! The changes improve type safety and align with the updated API.

The explicit type annotation and use of argument unpacking with add(*beds) is a good improvement over the previous add_all() approach. This change:

Makes the types more explicit and safer
Aligns with the updated OverlapDetector API

Let's verify that the add_all method has been removed from the implementation:

✅ Verification successful

Verified: The add_all method has been completely removed and replaced with add(*features)

The implementation in bedspec/overlap/_overlap.py shows that:

The OverlapDetector class now uses variadic arguments in add(*features) method
No traces of add_all method exist in the codebase
The test changes align perfectly with the implementation

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that add_all method is no longer present in the codebase
# Test: Search for add_all method definition or usage. Expect: No occurrences.

echo "Searching for add_all method definition..."
ast-grep --pattern 'def add_all'

echo "Searching for add_all method usage..."
rg 'add_all\('

Length of output: 809

Script:

#!/bin/bash
# Let's search more specifically for add_all method and its usage
echo "Searching for add_all method definition..."
ast-grep --pattern 'def add_all'

echo "Searching for add_all method usage..."
rg 'add_all\('

# Let's also check the OverlapDetector implementation for the add method
echo "Checking OverlapDetector add method implementation..."
ast-grep --pattern 'def add'

# Let's also look at the actual implementation file
echo "Content of overlap detector implementation..."
fd "overlap" --type f --exec cat {}

Length of output: 9739

pyproject.toml (6)

3-3: LGTM! Version bump and metadata updates are appropriate.

The version bump to 0.3.0 and the addition of Python 3.10 support in classifiers align well with the dependency changes. The package metadata is well-structured.

Also applies to: 17-17, 20-20, 30-30

69-76: Task configuration improvements enhance testing coverage.

Good improvements:

Including markdown files in doctests ensures documentation stays up-to-date
Well-structured typing checks with both mypy and pyright

96-123: Test configuration is well-structured and comprehensive.

Good test setup:

80% minimum coverage requirement
Enabled doctest-plus for enhanced documentation testing
Well-configured doctest options for better output

151-151: LGTM! Good linting configuration updates.

The changes improve code quality checks:

Target version aligned with project requirements
Added McCabe complexity check with reasonable max-complexity of 10

Also applies to: 179-180

204-204: LGTM! Good addition of security commit parsing.

Adding security commit parsing improves changelog organization and security tracking.

33-34: Dependencies updated appropriately.

The changes look good:

Python version requirement broadened to 3.10+ while safely excluding Python 4
Addition of typeline aligns with the PR objective
Dev dependencies versions updated to latest compatible versions

Also applies to: 42-43

✅ Verification successful

Let me check the actual usage of typeline in the files to ensure it's properly implemented.

Let me verify how these typeline components are being used in the actual implementation.

Let me try a different pattern to check the class inheritance.

Dependencies are correctly integrated and utilized

The verification confirms:

typeline is properly imported and used in both reader/writer implementations
BedReader and BedWriter classes correctly inherit from TsvStructReader and TsvStructWriter
Python version requirements in CI workflows are compatible with the specified version range
The dependency addition is justified by actual implementation in the codebase

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify typeline usage in the codebase
echo "Checking typeline imports and usage..."
rg -l "from typeline|import typeline"

# Verify Python version compatibility in CI
echo "Checking CI Python version matrix..."
rg "python-version" .github/workflows/

Length of output: 707

Script:

#!/bin/bash
# Check the actual typeline usage in the files
echo "Checking typeline usage in _writer.py..."
rg -A 5 "from typeline|import typeline" bedspec/_writer.py

echo "\nChecking typeline usage in _reader.py..."
rg -A 5 "from typeline|import typeline" bedspec/_reader.py

Length of output: 698

Script:

#!/bin/bash
# Check for TsvStructWriter/Reader usage patterns
echo "Checking TsvStructWriter usage..."
ast-grep --pattern 'class $_ (TsvStructWriter)' bedspec/_writer.py

echo "\nChecking TsvStructReader usage..."
ast-grep --pattern 'class $_ (TsvStructReader)' bedspec/_reader.py

Length of output: 295

Script:

#!/bin/bash
# Check class definitions that use typeline components
echo "Checking classes in _writer.py..."
rg -A 3 "^class.*TsvStructWriter" bedspec/_writer.py

echo "\nChecking classes in _reader.py..."
rg -A 3 "^class.*TsvStructReader" bedspec/_reader.py

Length of output: 453

tests/test_writer.py (2)

1-15: LGTM! Well-organized imports.

The imports are clean, well-structured, and all imported types are utilized in the test cases.

65-78: LGTM! Well-structured path handling tests.

The tests effectively cover both Path and string path handling, with proper resource management using context managers.

tests/test_bedspec.py (2)

86-90: LGTM! Type changes are consistent.

The test function has been properly updated to use ReferenceSpan, with consistent type annotation and assertion.

18-18: Verify protocol renaming impact across the codebase.

The import change from GenomicSpan to ReferenceSpan appears to be part of a protocol renaming. Let's verify the consistency of this change.

✅ Verification successful

Based on the results, I can now generate the final response:

Protocol renaming from GenomicSpan to ReferenceSpan is complete and consistent

The search results confirm that:

No remaining references to GenomicSpan exist in the codebase
ReferenceSpan is properly used across the codebase:
- Defined as a Protocol in bedspec/_bedspec.py
- Correctly exported in bedspec/__init__.py
- Properly used in test files and overlap detection implementation

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to GenomicSpan
# and verify ReferenceSpan is used consistently

echo "Checking for any remaining GenomicSpan references..."
rg "GenomicSpan"

echo "Verifying ReferenceSpan usage..."
rg "ReferenceSpan"

Length of output: 2088

bedspec/_reader.py (1)

22-84: The BedReader implementation looks solid

The BedReader class is well-structured and effectively extends TsvStructReader. The custom _decode method correctly handles the decoding of complex types, ensuring accurate parsing of BED records.

bedspec/_bedspec.py (11)

99-103: Ensure subclasses of PointBed are annotated with @dataclass

The __init_subclass__ method correctly enforces that any subclass of PointBed must be a dataclass. This helps maintain consistency and prevents subtle bugs that might arise from missing dataclass annotations.

105-105: Implement __len__ for PointBed

Adding the __len__ method provides a standard way to obtain the length of a PointBed instance using Python's built-in len() function. This enhances usability and makes the class more pythonic.

110-110: Update return type in territory method to ReferenceSpan

Changing the return type to Iterator[ReferenceSpan] aligns with the renaming of GenomicSpan to ReferenceSpan and ensures consistency across the codebase.

123-127: Ensure subclasses of SimpleBed are annotated with @dataclass

The __init_subclass__ method enforces that subclasses of SimpleBed are dataclasses, promoting consistency and preventing potential issues due to missing annotations.

134-134: Implement __len__ for SimpleBed

Providing the __len__ method allows for easy retrieval of a SimpleBed instance's length using len(), improving code readability and usability.

139-139: Update return type in territory method to ReferenceSpan

Updating the method aligns with the new ReferenceSpan protocol and maintains consistency.

155-159: Ensure subclasses of PairBed are annotated with @dataclass

The enforcement of @dataclass annotation for subclasses of PairBed ensures that all necessary dataclass features are available, enhancing consistency and reliability.

177-178: Update return type in territory method to ReferenceSpan

Consistently using ReferenceSpan enhances type checking and code clarity.

198-198: Use of Self in BedColor.from_string for better type hinting

Replacing the return type with Self allows subclasses of BedColor to use from_string and get the correct type hint, improving the extensibility of the class.

276-277: Allow None for block_sizes and block_starts in Bed12

By changing these fields to accept None, the class becomes more flexible and can represent BED12 records that may not have these attributes defined. Ensure the __post_init__ method properly handles None values to maintain data integrity.

358-358: Use of Self in BedPE.from_bed6 for accurate return type

Updating the return type to Self ensures that class methods return an instance of the correct subclass, improving type safety and supporting inheritance.

.github/workflows/publish_bedspec.yml

tests/test_writer.py

bedspec/_reader.py

clintval added 5 commits November 8, 2024 14:49

feat: use typeline for IO

bafa187

feat: support Py3.10

54f939a

chore: add classifier for 3.10

15560ee

chore: bump to 0.3.0

3917e80

chore: re-lock

08151be

Repository owner deleted a comment from coderabbitai bot Nov 8, 2024

fix: remove unncessary add_all in OverlapDetector

b6b34a8

coderabbitai bot reviewed Nov 8, 2024

View reviewed changes

.github/workflows/publish_bedspec.yml Show resolved Hide resolved

tests/test_writer.py Outdated Show resolved Hide resolved

bedspec/_reader.py Show resolved Hide resolved

bedspec/_reader.py Show resolved Hide resolved

clintval added 2 commits November 8, 2024 15:31

feat: unfreeze dataclasses for speed

394f383

chore: suit the code rabbit

3084fc5

clintval merged commit 523b9d4 into main Nov 8, 2024
8 checks passed

clintval deleted the cv_use_typeline branch November 8, 2024 23:34

coderabbitai bot mentioned this pull request Dec 2, 2024

feat: remove cgranges for superintervals #28

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: use typeline for IO #24

feat: use typeline for IO #24

clintval commented Nov 8, 2024 •

edited by coderabbitai bot

Loading

clintval commented Nov 8, 2024

coderabbitai bot commented Nov 8, 2024

coderabbitai bot commented Nov 8, 2024 •

edited

Loading

Rate limit exceeded

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

feat: use typeline for IO #24

feat: use typeline for IO #24

Conversation

clintval commented Nov 8, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

Release Notes

clintval commented Nov 8, 2024

coderabbitai bot commented Nov 8, 2024

coderabbitai bot commented Nov 8, 2024 • edited Loading

Rate limit exceeded

Walkthrough

Changes

Possibly related PRs

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

clintval commented Nov 8, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 8, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)