Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: simplify how the BedReader and BedWriter work #21

Merged
merged 2 commits into from
Nov 7, 2024

Conversation

clintval
Copy link
Owner

@clintval clintval commented Nov 6, 2024

Summary by CodeRabbit

  • New Features

    • Enhanced documentation in README.md for using BED features, including new examples and updated naming conventions.
    • Introduced new test cases for validating BED strand representations, colors, and the new BED12 format.
  • Bug Fixes

    • Improved error messages in validation tests for Bed3 and BedPE.
  • Documentation

    • Restructured CONTRIBUTING.md for clarity on development and testing instructions.
    • Updated README.md and pyproject.toml with new sections, dependencies, and configurations.
  • Tests

    • Expanded test coverage with new functions and updated existing tests to align with naming conventions and improve clarity.
  • Chores

    • Updated dependencies in pyproject.toml and modified workflow configurations for improved testing processes.

@clintval clintval force-pushed the cv_better_serde branch 3 times, most recently from 767833b to b87c765 Compare November 6, 2024 20:22
@clintval
Copy link
Owner Author

clintval commented Nov 6, 2024

@coderabbitai review

Copy link

coderabbitai bot commented Nov 6, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Repository owner deleted a comment from coderabbitai bot Nov 6, 2024
Copy link

coderabbitai bot commented Nov 6, 2024

Warning

Rate limit exceeded

@clintval has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 0731483 and 8891081.

Walkthrough

The changes include updates to workflow configurations, documentation, and code structure within the bedspec project. The GitHub Actions workflow has been renamed and modified for dependency management and testing. The CONTRIBUTING.md file has been restructured to clarify development commands, while the README.md has been enhanced with new examples. The core library files have seen significant restructuring, including renaming attributes and introducing new protocols. Several test files have been updated to reflect these changes, with new tests added and old ones removed to improve coverage and clarity.

Changes

File Change Summary
.github/workflows/tests.yml Workflow renamed to "Code checks"; updated POETRY_VERSION to 1.8; job renamed to Tests; installation method for Poetry changed to pipx; new step added for poethepoet; testing command updated to poetry task check-all.
CONTRIBUTING.md Section renamed to "Primary Development Commands"; installation instructions for Poetry removed; new linting and formatting commands added; commands for running tests clarified; new section for shortcut task commands added.
README.md New section "Building a BED Feature" added; examples updated to simplify usage; class name and field name changes; updated assertion statements in overlap detection examples.
bedspec/__init__.py New imports added from _bedspec; imports removed from _types to streamline structure.
bedspec/_bedspec.py New protocols and enumerations added; existing classes updated to replace contig with refname; new methods added for field retrieval.
bedspec/_io.py BedWriter and BedReader classes updated for better type handling; new methods added for JSON conversion; type variables renamed for clarity.
bedspec/_types.py File deleted; contained protocols and abstract classes related to BED records.
bedspec/_typing.py File deleted; contained utility classes and functions for type handling.
bedspec/overlap/_overlap.py Protocols updated to include NotImplemented for properties; type variable renamed for clarity; OverlapDetector class updated.
build.py Minor formatting changes; no logic alterations.
pyproject.toml Version updated to 0.2.0; new dependencies added; development dependencies updated; new sections for task management introduced.
tests/test_bedspec.py New tests added for BED functionalities; parameter names updated for consistency; error messages enhanced.
tests/test_io.py Parameter names updated from contig to refname; new tests added for Bed12; type hints updated.
tests/test_overlap.py Parameter names updated from contig to refname across multiple tests; error messages refined.
tests/test_types.py File deleted; contained comprehensive tests for BED classes.
tests/test_typing.py File deleted; contained tests for type handling functions.
bedspec/overlap/__init__.py New import added for QueryReferenceSpanType; import for GenomicSpanLike removed.

Poem

🐰 In the garden of code, changes bloom bright,
With workflows and tests, all set just right.
From contig to refname, a name change so grand,
As we hop through the updates, a new world at hand!
With poethepoet now guiding our way,
Let's celebrate progress, hip-hip-hooray! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Outside diff range and nitpick comments (29)
bedspec/__init__.py (1)

Line range hint 1-1: Consider documenting the purpose of noqa.

While the # ruff: noqa: F401 comment is correct for suppressing unused import warnings (as these are meant to be re-exported), it might be helpful to add a brief comment explaining why these imports are intentionally marked as used.

-# ruff: noqa: F401
+# ruff: noqa: F401  # These imports are intentionally re-exported as part of the public API
.github/workflows/tests.yml (3)

1-1: Consider a more specific workflow name.

The name "Code checks" is quite generic. Since this workflow primarily runs tests, consider a more specific name like "Tests" or "Unit Tests" to better reflect its purpose. This helps with clarity when viewing GitHub Actions dashboard.

Also applies to: 12-12


15-20: Consider testing against multiple Python versions.

Currently, the workflow only tests against Python 3.12. Consider adding more Python versions (e.g., 3.10, 3.11) to ensure compatibility across different Python environments. This is especially important if this is a library meant to be used by others.

Example matrix configuration:

   strategy:
     matrix:
-      PYTHON_VERSION: ["3.12"]
+      PYTHON_VERSION: ["3.10", "3.11", "3.12"]

55-56: Consider moving task runner installation before cache setup.

Installing poethepoet after the cache setup means it won't be included in the cache. Moving this installation before the cache setup could improve workflow execution time in subsequent runs.

CONTRIBUTING.md (2)

35-38: Fix hyphenation in compound adjective

The phrase "up to date" should be hyphenated when used as a compound adjective before a noun.

-To check the lock file is up to date:
+To check the lock file is up-to-date:
🧰 Tools
🪛 LanguageTool

[uncategorized] ~35-~35: It appears that hyphens are missing in the adjective “up-to-date”.
Context: ...age html To check the lock file is up to date: console poetry check --lock ``` #...

(UP_TO_DATE_HYPHEN)


61-61: Fix heading levels for consistent document structure

The section headings are using h6 (######) but should use h3 (###) to maintain proper heading hierarchy, as they follow an h2 section.

-###### For Running Individual Checks
+### For Running Individual Checks

-###### For Running All Checks
+### For Running All Checks

-###### For Running Individual Fixes
+### For Running Individual Fixes

-###### For Running All Fixes
+### For Running All Fixes

-###### For Running All Fixes and Checks
+### For Running All Fixes and Checks

Also applies to: 71-71, 77-77, 84-84, 90-90

🧰 Tools
🪛 Markdownlint

61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time

(MD001, heading-increment)

README.md (4)

33-36: Consider enhancing the writing example.

While the example is correct, consider expanding it to demonstrate more real-world scenarios:

 from bedspec import BedWriter
+from bedspec import Bed3
 
+# Create multiple BED records
+beds = [
+    Bed3("chr1", start=2, end=8),
+    Bed3("chr2", start=5, end=10),
+]
+
 with BedWriter.from_path("test.bed") as writer:
-    writer.write(bed)
+    # Write multiple records
+    for bed in beds:
+        writer.write(bed)

75-82: Consider adding type compatibility note.

The example effectively demonstrates overlap detection between different BED types. Consider adding a comment explaining that the detector works with any BED type that implements the required interval protocol.

 detector = OverlapDetector[Bed3]([bed1, bed2])
+# OverlapDetector works with any BED type implementing the interval protocol
 my_feature = Bed4("chr1", start=2, end=3, name="hi-mom")

85-91: Improve list formatting consistency.

The operations list should have consistent punctuation:

-The overlap detector supports the following operations:
+The overlap detector supports the following operations
 
 - `overlapping`: return all overlapping features
 - `overlaps`: test if any overlapping features exist
 - `those_enclosed_by`: return those enclosed by the input feature
 - `those_enclosing`: return those enclosing the input feature
🧰 Tools
🪛 LanguageTool

[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: - overlapping: return all overlapping features - `over...

(UNLIKELY_OPENING_PUNCTUATION)


103-108: Consider adding documentation for the custom BED type.

The example would benefit from additional documentation:

 @dataclass(eq=True)
 class Bed3Plus1(SimpleBed):
+    """BED3 format with an additional custom field.
+    
+    Inherits from SimpleBed which provides the core BED3 functionality.
+    """
     refname: str
     start: int
     end: int
-    my_custom_field: float | None
+    my_custom_field: float | None  # Custom score or measurement
pyproject.toml (3)

51-52: Consider pinning major versions for build dependencies.

While using >= is flexible, it might lead to unexpected build issues if major versions introduce breaking changes. Consider using ^ instead for more predictable builds:

-requires = ["poetry-core>=1.8", "setuptools>=70.1.1", "Cython>=3.0.10"]
+requires = ["poetry-core^1.8.0", "setuptools^70.1.1", "Cython^3.0.10"]

37-38: Address TODO comments in configuration.

There are two TODO items that should be addressed:

  1. Satisfying basedpyright warnings and adding to CI
  2. Getting doctests to work and run on README

Would you like me to help create GitHub issues to track these tasks?

Also applies to: 42-43


92-107: Consider documenting strict type checking rationale.

The mypy configuration is very strict, which is good for code quality but might be challenging for contributors. Consider adding a comment explaining the rationale for these strict settings.

[tool.mypy]
+# Strict type checking configuration to ensure high code quality and catch potential issues early
files                       = ["build.py", "bedspec/", "tests/"]
bedspec/overlap/_overlap.py (1)

Line range hint 139-147: Consider caching the index state check.

The overlap-related methods (overlapping, overlaps, those_enclosing, those_enclosed_by) all potentially trigger index rebuilding. Consider caching the index state check to avoid repeated checks within the same query session.

 def overlapping(self, feature: GenomicSpanLike) -> Iterator[GenericGenomicSpanLike]:
     """Yields all the overlapping features for a given genomic span."""
     refname: Refname = self._reference_sequence_name(feature)
+    tree = self._refname_to_tree[refname]
 
-    if refname in self._refname_to_tree and not self._refname_to_is_indexed[refname]:
-        self._refname_to_tree[refname].index()  # index the tree if we find it is not indexed
+    if not self._refname_to_is_indexed.get(refname, True):
+        tree.index()
+        self._refname_to_is_indexed[refname] = True
 
-    for *_, idx in self._refname_to_tree[refname].overlap(refname, feature.start, feature.end):
+    for *_, idx in tree.overlap(refname, feature.start, feature.end):
         yield self._refname_to_features[refname][idx]

Also applies to: 149-150, 153-157, 159-163

tests/test_bedspec.py (4)

23-34: Consider adding edge cases for strand tests.

The tests cover basic functionality well. Consider adding test cases for:

  • Invalid strand characters
  • Case sensitivity (e.g., '+' vs 'plus')
  • None/empty values

Line range hint 88-126: Consider adding negative test cases and improve readability.

While the positive test cases are comprehensive, consider:

  1. Adding negative test cases for invalid instantiations
  2. Breaking down long assertions for better readability

Example refactor for better readability:

-    assert record.bed1 == Bed6(refname="chr1", start=1, end=2, name="foo", score=5, strand=BedStrand.Positive)  # fmt: skip  # noqa: E501
+    expected_bed1 = Bed6(
+        refname="chr1",
+        start=1,
+        end=2,
+        name="foo",
+        score=5,
+        strand=BedStrand.Positive
+    )
+    assert record.bed1 == expected_bed1

129-156: Consider adding edge cases for territory calculations.

The tests cover basic functionality well. Consider adding test cases for:

  • Zero-length territories
  • Overlapping territories
  • Territories spanning multiple chromosomes

Line range hint 187-205: Consider adding more test cases for paired BED territories.

The test covers basic functionality. Consider adding test cases for:

  • Same chromosome pairs
  • Reverse ordered coordinates
  • Adjacent territories
  • Overlapping territories
tests/test_io.py (1)

251-251: TODO comment needs to be addressed.

The comment indicates that BED12 spec compliance verification is missing. This is important for ensuring compatibility with other tools.

Would you like me to help create a test case that verifies BED12 spec compliance?

bedspec/_bedspec.py (5)

25-40: Enhance documentation for the BedStrand enum and its opposite method

The BedStrand enum now includes an opposite method. Consider adding more detailed docstrings or usage examples to clarify the purpose and usage of this method, improving code readability and maintainability.


43-64: Consolidate protocols to reduce complexity

Multiple protocols (GenomicSpan, Named, Stranded) are introduced to define structural interfaces. Evaluate if all these protocols are necessary or if they can be consolidated to simplify the codebase and reduce potential over-engineering.


80-90: Consider caching results in header and types functions for efficiency

The header and types functions generate lists of field names and types for BED records. If these functions are called frequently, consider implementing caching mechanisms to improve performance by avoiding redundant computations.


109-126: Improve validation error messages in SimpleBed's __post_init__ method

The __post_init__ method in SimpleBed raises a ValueError when start >= end or start < 0. Enhance the error message to provide specific details about the invalid values, aiding in debugging and ensuring clarity.

Apply this diff to refine the error message:

 def __post_init__(self) -> None:
     """Validate this linear BED record."""
     if self.start >= self.end or self.start < 0:
-        raise ValueError("start must be greater than 0 and less than end!")
+        raise ValueError(f"Invalid interval: start ({self.start}) must be >= 0 and less than end ({self.end})")

132-151: Unify validation logic in PairBed's __post_init__ method

The __post_init__ method in PairBed separately validates start1, end1, start2, and end2. Consider creating a shared validation function to reduce code duplication and ensure consistency in error handling.

bedspec/_io.py (5)

103-105: Avoid unnecessary type re-annotation and 'type: ignore' comment

In the bed_type setter, re-annotating self._bed_type and suppressing the type checker with # type: ignore[no-redef] is unnecessary since it was already annotated in the constructor.

Apply this diff to remove the redundant annotation and comment:

def bed_type(self, bed_type: type[BedType]) -> None:
-    self._bed_type: type[BedType] = bed_type  # type: ignore[no-redef]
+    self._bed_type = bed_type
     self._header: list[str] = header(cast(BedLike, bed_type))
     self._types: list[type | str | Any] = types(cast(BedLike, bed_type))

132-138: Consider using 'isinstance' for type checking

Using type(bed) is not self.bed_type checks for exact type matches, which might not allow subclasses of self.bed_type. Using isinstance provides more flexibility and adheres to the principles of polymorphism.

Apply this diff to use isinstance:

 def _maybe_setup_with(self, bed: BedType) -> None:
     """Perform post-initialization and record validation."""
     if self.bed_type is None:
         self.bed_type = type(bed)

-    if type(bed) is not self.bed_type:
+    if not isinstance(bed, self.bed_type):
         raise TypeError(
             f"BedWriter can only continue to write features of the same type."
             f" Will not write a {type(bed).__name__} after a"
             f" {self.bed_type.__name__}."
         )

249-255: Simplify '_pre_decode' method to avoid unnecessary JSON operations

The _pre_decode method serializes objects using json.dumps and then deserializes them later, which adds unnecessary overhead.

Consider returning the dictionary directly without JSON serialization:

 def _pre_decode(kind: type, obj: Any) -> Any:
     if obj == MISSING_FIELD and NoneType in get_args(kind):
         return None
     if kind is BedColor or BedColor in get_args(kind):
-        return json.dumps(as_dict(BedColor.from_string(cast(str, obj))))
+        return as_dict(BedColor.from_string(cast(str, obj)))
     if kind is BedStrand or BedStrand in get_args(kind):
-        return f'"{obj}"'
+        return obj
     return obj

213-217: Specify type parameters for 'DictReader' for clarity

Providing type parameters for DictReader improves type checking and code readability.

Adjust the initialization to include type parameters:

-self._reader: DictReader = DictReader(
+self._reader: DictReader[str] = DictReader(
     without_comments(),
     delimiter="\t",
     fieldnames=self._header,
 )

201-204: Consider generalizing field type handling

The attributes _header and _types are generated by casting bed_type to BedLike, which may be unnecessary if bed_type is already a BedLike. This could be simplified to improve readability.

If bed_type is guaranteed to be a BedLike, you might not need the cast.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 80fafdc and aed3112.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • .github/workflows/tests.yml (4 hunks)
  • CONTRIBUTING.md (1 hunks)
  • README.md (4 hunks)
  • bedspec/__init__.py (1 hunks)
  • bedspec/_bedspec.py (4 hunks)
  • bedspec/_io.py (4 hunks)
  • bedspec/_types.py (0 hunks)
  • bedspec/_typing.py (0 hunks)
  • bedspec/overlap/_overlap.py (6 hunks)
  • build.py (1 hunks)
  • pyproject.toml (4 hunks)
  • tests/test_bedspec.py (4 hunks)
  • tests/test_io.py (12 hunks)
  • tests/test_overlap.py (3 hunks)
  • tests/test_types.py (0 hunks)
  • tests/test_typing.py (0 hunks)
💤 Files with no reviewable changes (4)
  • bedspec/_types.py
  • bedspec/_typing.py
  • tests/test_types.py
  • tests/test_typing.py
✅ Files skipped from review due to trivial changes (1)
  • build.py
🧰 Additional context used
🪛 LanguageTool
CONTRIBUTING.md

[uncategorized] ~35-~35: It appears that hyphens are missing in the adjective “up-to-date”.
Context: ...age html To check the lock file is up to date: console poetry check --lock ``` #...

(UP_TO_DATE_HYPHEN)

README.md

[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: - overlapping: return all overlapping features - `over...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 Markdownlint
CONTRIBUTING.md

61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time

(MD001, heading-increment)

🔇 Additional comments (30)
bedspec/__init__.py (1)

7-7: LGTM! Well-organized imports with improved type consolidation.

The consolidation of types and protocols from _types into _bedspec improves code organization. The new imports (Bed12, BedLike, GenomicSpan, Named, Stranded, header, types) are logically grouped and maintain a clean public API.

Let's verify that all the imported types are properly defined in _bedspec:

Also applies to: 10-10, 13-14, 18-20

✅ Verification successful

All imported types and utilities are properly defined in _bedspec

The verification confirms that all imported types (Bed12, BedLike, GenomicSpan, Named, Stranded) and utilities (header, types) are properly defined in _bedspec.py with appropriate implementations:

  • Protocol classes: GenomicSpan, Named, Stranded
  • Abstract base class: BedLike
  • Concrete class: Bed12
  • Utility functions: header and types for field introspection
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all imported types exist in _bedspec
# Expected: All types should be found in _bedspec.py

echo "Checking for type definitions in _bedspec.py..."
rg -l "class (Bed12|BedLike|GenomicSpan|Named|Stranded)" bedspec/_bedspec.py

echo "Checking for utility definitions..."
rg -l "(header|types) =" bedspec/_bedspec.py

Length of output: 351


Script:

#!/bin/bash
# Let's verify the actual implementations to ensure all imported types are properly defined
echo "Checking class definitions and their content:"
rg "class (Bed12|BedLike|GenomicSpan|Named|Stranded)" -A 2 bedspec/_bedspec.py

echo -e "\nChecking utility definitions:"
rg "(header|types) =" -A 2 bedspec/_bedspec.py

Length of output: 963

.github/workflows/tests.yml (2)

37-38: LGTM! Using pipx is the recommended approach.

The switch to pipx for installing Poetry follows best practices as it provides better isolation for Python applications.


59-59: Verify the contents of the check-all task.

The change from poetry run pytest to poetry task check-all suggests this task might include additional checks beyond testing. Please ensure all necessary checks (tests, linting, type checking, etc.) are included in this task.

✅ Verification successful

check-all task is well-defined and comprehensive

The check-all task is a sequence that includes all necessary quality checks:

  • Poetry lock file verification (poetry check --lock)
  • Code formatting check (ruff format)
  • Linting (ruff check)
  • Tests (pytest)
  • Type checking (mypy)

The change from direct pytest execution to check-all is actually an improvement as it ensures all quality checks are run together.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the definition of the check-all task
rg "check-all" -A 5 pyproject.toml

Length of output: 334


Script:

#!/bin/bash
# Let's check the definitions of individual check tasks
rg "check-(lock|format|lint|tests|typing)" -A 2 pyproject.toml

Length of output: 349

CONTRIBUTING.md (2)

43-60: Well-documented plugin installation and future compatibility notes!

The installation instructions are clear, and the note about Poetry v2.0.0 provides valuable context for future maintenance.


63-93: Excellent organization of task commands!

The commands are well-structured and logically organized, progressing from individual operations to comprehensive checks and fixes. This makes it easy for developers to find and use the appropriate commands for their needs.

README.md (1)

22-28: LGTM! Clear and concise example.

The new section effectively demonstrates the basic usage of Bed3 class.

pyproject.toml (2)

57-86: Well-structured task organization!

The poethepoet tasks are logically organized with clear separation between fix and check operations. The sequence tasks provide good automation for common development workflows.


Line range hint 159-192: Excellent conventional commits configuration!

The git-cliff configuration is well-structured with:

  • Clear commit type categorization
  • Good changelog formatting
  • Proper handling of breaking changes
bedspec/overlap/_overlap.py (3)

22-22: LGTM: Protocol method implementations are properly defined.

The addition of NotImplemented return statements in protocol methods follows best practices by making it explicit that these methods must be implemented by concrete classes.

Also applies to: 27-27, 37-37, 47-47, 57-57


112-120: LGTM: Improved reference sequence name detection.

The use of hasattr checks is more pythonic and flexible than isinstance checks, allowing for duck typing. The error message is clear and helpful.


69-70: Verify the impact of type variable renaming.

The renaming of _GenericGenomicSpanLike to GenericGenomicSpanLike (removing underscore prefix) suggests this is now part of the public API. Ensure all imports and usages are updated accordingly.

✅ Verification successful

Based on the search results, I can generate the final response as we have sufficient information:

Type variable renaming impact verified - no remaining references to old name

The renaming from _GenericGenomicSpanLike to GenericGenomicSpanLike has been properly implemented:

  • No references to the old name _GenericGenomicSpanLike found in the codebase
  • All usages of GenericGenomicSpanLike are contained within bedspec/overlap/_overlap.py where it's defined
  • The type variable is used consistently throughout the module in class definition, method signatures, and type hints
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to the old name
# and verify all imports of the new name are working

# Search for any remaining references to the old name
echo "Checking for old name references..."
rg "_GenericGenomicSpanLike"

# Search for imports of the new name to verify usage
echo "Checking new name usage..."
rg "GenericGenomicSpanLike"

Length of output: 1323

tests/test_overlap.py (8)

13-14: LGTM! Attribute rename is consistently applied.

The changes from contig to refname are correctly applied across the basic functionality tests while maintaining the original test coverage and assertions.

Also applies to: 21-22, 29-30


42-42: LGTM! Error message is more descriptive.

The test correctly verifies the hashable requirement with a more specific error message.

Also applies to: 49-49


54-55: LGTM! Edge case handling is well tested.

The test effectively verifies the behavior when an invalid reference name property is used.


70-71: LGTM! Feature addition and querying tests are comprehensive.

The changes maintain the test coverage while consistently applying the attribute rename.

Also applies to: 79-80


87-95: LGTM! Enclosing interval tests are thorough.

The test cases effectively cover various interval relationships while maintaining consistency with the attribute rename.


100-108: LGTM! Enclosed interval tests are comprehensive.

The test cases effectively verify the enclosed-by relationship with good coverage of edge cases.


113-115: LGTM! Overlap detection tests are well-structured.

The test cases provide comprehensive coverage of various overlap scenarios while maintaining consistency with the attribute rename.

Also applies to: 120-125


130-132: LGTM! Boolean overlap query tests are thorough.

The test cases effectively verify the overlap query functionality with good coverage of edge cases.

Also applies to: 137-142

tests/test_bedspec.py (2)

Line range hint 1-21: LGTM! Imports are well-organized and complete.

The imports are properly structured and include all necessary components for the tests.


55-86: LGTM! Comprehensive type hierarchy testing.

The tests effectively validate the class hierarchy, dataclass properties, and structural types. Good use of parametrization and type hints.

tests/test_io.py (4)

10-13: LGTM! Parameter renaming and new imports look good.

The renaming from 'contig' to 'refname' is more intuitive and aligns with common genomics terminology. The new imports support the enhanced test coverage.

Also applies to: 25-30


49-49: LGTM! Type annotation update improves flexibility.

The change to BedLike type makes the function more flexible while maintaining type safety through the protocol.


252-273: LGTM! Comprehensive test coverage for Bed12.

The test case thoroughly validates the roundtrip functionality of Bed12 records with complex types, including lists and custom types like BedColor.


Line range hint 277-329: Please clarify the removal of gzip support.

The removal of gzip-related tests suggests that compressed file support is being dropped. This could impact users who work with compressed BED files.

Let's check if this is documented elsewhere:

bedspec/_bedspec.py (3)

19-22: Verify the necessity and implementation of the DataclassInstance protocol

The DataclassInstance protocol is introduced to represent objects that are dataclass instances. Ensure that this protocol is essential and that all dataclass instances correctly adhere to it without redundancy.


66-77: Ensure proper implementation of the abstract base class BedLike

The BedLike class is introduced as a new abstract base class for BED records. Verify that all subclasses properly implement the required abstract methods, particularly territory(), to prevent any runtime errors due to missing implementations.


Line range hint 193-300: Verify all references from contig to refname have been updated

The classes have replaced the contig attribute with refname. Ensure that all references to contig across the codebase, including documentation and tests, have been updated to refname to maintain consistency and prevent attribute errors.

Run the following script to identify any remaining references to contig:

bedspec/_io.py (2)

7-7: Ensure 'NoneType' compatibility with supported Python versions

The NoneType class from the types module is only available starting from Python 3.10. If your project aims to support earlier Python versions, this import will cause compatibility issues.

Consider using type(None) as an alternative, which works across all Python versions.


13-13: 'TypeAlias' may not be available in Python versions prior to 3.10

The TypeAlias feature was introduced in Python 3.10. Using it in earlier versions will result in an ImportError.

If you need to maintain compatibility with Python versions earlier than 3.10, consider importing TypeAlias from the typing_extensions module.

Apply this diff to adjust the import:

-from typing import TypeAlias
+from typing_extensions import TypeAlias

README.md Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
tests/test_bedspec.py Outdated Show resolved Hide resolved
tests/test_bedspec.py Show resolved Hide resolved
bedspec/_bedspec.py Outdated Show resolved Hide resolved
bedspec/_bedspec.py Outdated Show resolved Hide resolved
bedspec/_bedspec.py Outdated Show resolved Hide resolved
bedspec/_io.py Outdated Show resolved Hide resolved
bedspec/_io.py Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (10)
.github/workflows/tests.yml (1)

15-15: Consider adding Python 3.10 for broader compatibility.

While testing against Python 3.11 and 3.12 is good for catching issues with newer Python versions, many users might still be on Python 3.10. Adding it to the matrix would ensure broader compatibility.

  matrix:
-   PYTHON_VERSION: ["3.11", "3.12"]
+   PYTHON_VERSION: ["3.10", "3.11", "3.12"]

Also applies to: 19-19

README.md (1)

85-91: Consider adding a newline before the bullet points.

The content is correct, but adding a newline after the colon would improve readability.

 The overlap detector supports the following operations:
+
 - `overlapping`: return all overlapping features
🧰 Tools
🪛 LanguageTool

[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: - overlapping: return all overlapping features - `over...

(UNLIKELY_OPENING_PUNCTUATION)

pyproject.toml (3)

39-40: Address TODO comment regarding basedpyright integration.

The TODO comment suggests uncertainty about basedpyright integration. Since it's already listed as a dependency, we should either:

  1. Remove the TODO and integrate it into CI
  2. Remove the dependency if it's not going to be used

Would you like help setting up the basedpyright configuration and CI integration?


44-45: Address TODO comment regarding doctests.

The TODO comment about doctests should be addressed, especially since pytest-doctestplus is already included as a dependency.

Would you like help setting up doctests configuration for both the codebase and README?


134-156: Comprehensive linting configuration!

The ruff configuration is well-structured with a good selection of linters. Consider documenting these lint rules in CONTRIBUTING.md to help new contributors understand the expectations.

bedspec/_io.py (1)

142-156: Consider adding error handling for msgspec conversion.

While the _bed_to_dict method is well-implemented, it might benefit from explicit error handling for msgspec conversion failures.

 def _bed_to_dict(self, bed: BedType) -> dict[str, object]:
     """Convert a BED record into a shallow dictionary."""
     shallow = {name: self._encode(getattr(bed, name)) for name in self._header}
-    return cast(dict[str, object], to_builtins(shallow, order="deterministic"))
+    try:
+        return cast(dict[str, object], to_builtins(shallow, order="deterministic"))
+    except Exception as e:
+        raise ValueError(f"Failed to convert BED record to dictionary: {e}") from e
bedspec/_bedspec.py (2)

36-41: Simplify the opposite() method using a ternary operator

The method can be made more concise while maintaining readability.

     def opposite(self) -> "BedStrand":
         """Return the opposite BED strand."""
-        if self is BedStrand.Positive:
-            return BedStrand.Negative
-        else:
-            return BedStrand.Positive
+        return BedStrand.Negative if self is BedStrand.Positive else BedStrand.Positive

179-186: Enhance error handling in BedColor.from_string

The error handling could be more specific to help users understand the exact issue.

     @classmethod
     def from_string(cls, string: str) -> "BedColor":
         """Build a BED color instance from a string."""
         try:
-            r, g, b = map(int, string.split(","))
-        except ValueError as error:
-            raise ValueError(f"Invalid string '{string}'. Expected 'int,int,int'!") from error
+            parts = string.split(",")
+            if len(parts) != 3:
+                raise ValueError(f"Expected 3 comma-separated values, got {len(parts)}")
+            r, g, b = map(int, parts)
+        except ValueError as error:
+            if "invalid literal for int()" in str(error):
+                raise ValueError(f"All values must be integers in '{string}'") from error
+            raise
         return cls(r, g, b)
tests/test_bedspec.py (2)

179-184: Consider parameterizing the simple BED validation tests.

The current structure with multiple assertions in a single test function could be improved using pytest.mark.parametrize for better test isolation and clearer failure messages.

@pytest.mark.parametrize("start,end,description", [
    (-1, 5, "negative start"),
    (5, 5, "equal start and end"),
    (5, 0, "end less than start"),
])
def test_simple_bed_validates_start_and_end(start: int, end: int, description: str) -> None:
    """Test that a simple BED record validates its start and end."""
    with pytest.raises(ValueError, match="start must be greater than 0 and less than end!"):
        Bed3(refname="chr1", start=start, end=end)

189-201: Consider parameterizing the paired BED validation tests.

Similar to the simple BED validation tests, this could be improved using pytest.mark.parametrize for better maintainability and clearer test failures.

@pytest.mark.parametrize("test_case", [
    {"refname1": "chr1", "start1": -1, "end1": 5, "error_msg": "start1 must be greater than 0"},
    {"refname1": "chr1", "start1": 5, "end1": 5, "error_msg": "start1 must be greater than 0"},
    # ... add other test cases
])
def test_paired_bed_validates_start_and_end(test_case: dict) -> None:
    """Test that a paired BED record validates its start and end for both intervals."""
    with pytest.raises(ValueError, match=test_case["error_msg"]):
        BedPE(
            refname1=test_case["refname1"],
            start1=test_case["start1"],
            end1=test_case["end1"],
            refname2="chr1",
            start2=1,
            end2=2,
            name="foo",
            score=5,
            strand1=BedStrand.Positive,
            strand2=BedStrand.Positive,
        )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between aed3112 and eba4608.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • .github/workflows/tests.yml (4 hunks)
  • README.md (5 hunks)
  • bedspec/__init__.py (1 hunks)
  • bedspec/_bedspec.py (4 hunks)
  • bedspec/_io.py (4 hunks)
  • pyproject.toml (3 hunks)
  • tests/test_bedspec.py (4 hunks)
  • tests/test_io.py (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • bedspec/init.py
  • tests/test_io.py
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: - overlapping: return all overlapping features - `over...

(UNLIKELY_OPENING_PUNCTUATION)

🔇 Additional comments (27)
.github/workflows/tests.yml (3)

1-1: LGTM: Workflow name and Poetry version updates are appropriate.

The workflow name change to "Code checks" better reflects its purpose, and updating to Poetry 1.8 keeps dependencies current.

Also applies to: 12-12


37-38: LGTM: Using pipx for Poetry installation is recommended.

The switch to pipx for installing Poetry follows best practices as it provides better isolation for Python CLI applications.


59-59: Verify the 'check-all' task definition.

The workflow now uses poetry task check-all instead of directly running pytest. This suggests a custom task definition that might include additional checks beyond testing.

Let's verify the task definition:

#!/bin/bash
# Check the definition of check-all task
rg "check-all" pyproject.toml
README.md (6)

5-5: LGTM: Python version badge update is accurate.


22-28: LGTM: Clear and concise example for BED feature creation.


33-49: LGTM: Writing and reading examples are clear and correct.

The previously reported typo in the console output has been fixed.


62-62: LGTM: BED types listing is comprehensive.


75-82: LGTM: Overlap detection example is clear and demonstrates proper usage.


103-108: LGTM: Custom BED type example is well-structured.

The example demonstrates proper use of dataclasses and type hints.

pyproject.toml (4)

59-88: Well-structured task organization!

The task hierarchy is well-organized with clear separation between fix and check operations. The sequence of operations in composite tasks follows a logical order.


Line range hint 164-192: Good conventional commit configuration!

The git-cliff configuration properly enforces semantic versioning through conventional commits and provides a clean changelog structure.


94-109: Verify impact of strict type checking settings.

The mypy configuration is appropriately strict, which is good for type safety. Let's verify if the codebase complies with these settings.

#!/bin/bash
# Check for any type: ignore comments that might bypass these strict settings
rg "type:\s*ignore" "bedspec/"

35-36: Verify msgspec usage in the codebase.

The addition of msgspec suggests serialization/deserialization functionality. Let's verify its usage.

✅ Verification successful

Let me search for msgspec usage with a different pattern since the previous search didn't yield relevant results.


Let me try one more search with different options:


Let's check the content of bedspec/_io.py since it's one of the files that contains msgspec:


msgspec dependency is properly used for data serialization/deserialization

The msgspec library is actively used in bedspec/_io.py for:

  1. Converting BED records using msgspec.convert
  2. Converting objects to built-in types using msgspec.to_builtins

These functions are essential parts of the BED file reading/writing functionality, specifically in:

  • BedWriter._bed_to_dict() method for serialization
  • BedReader.__iter__() method for deserialization

The dependency is justified and properly integrated into the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check how msgspec is being used in the codebase
rg -l 'import.*msgspec|from.*msgspec'

Length of output: 276


Script:

#!/bin/bash
# Search for msgspec usage with a broader pattern
rg "msgspec" --type python -A 3

# Also check the content of any potential configuration or setup files
fd "setup.py|setup.cfg" --exec cat {}

Length of output: 104


Script:

#!/bin/bash
# Search for any files containing msgspec string
rg -l "msgspec"

# Also let's look at the content of pyproject.toml to understand the context better
cat pyproject.toml

Length of output: 5299


Script:

#!/bin/bash
# Check the content of _io.py to see how msgspec is used
cat bedspec/_io.py

Length of output: 9925

bedspec/_io.py (5)

2-35: LGTM! Well-organized imports and type definitions.

The imports are logically grouped and the new type definitions (BedType and JsonType) are clear and well-documented. Good use of typing_extensions for compatibility.


83-108: Good type safety improvements in BedWriter.

The transition from _BedKind to BedType and the addition of proper type validation in the setter improves type safety. The initialization pattern allows for lazy setup while maintaining type correctness.


127-140: Well-implemented validation in _maybe_setup_with.

The method provides good runtime validation ensuring type consistency across writes. The error message is clear and helpful.


182-222: Good improvements to BedReader initialization and type handling.

The reader now requires explicit type information at initialization and properly sets up the CSV reader with comment filtering. The documentation includes helpful usage examples.


264-279: ⚠️ Potential issue

Manual JSON string construction needs improvement.

As mentioned in the past review comments, manual JSON string construction can be error-prone and less secure. Consider using a dictionary-based approach instead.

Apply this diff to improve safety and maintainability:

 def _csv_dict_to_json(self, record: dict[str, str]) -> JsonType:
     """Convert a CSV dictionary record to JSON using the known field types."""
-    key_values: list[str] = []
+    result: dict[str, Any] = {}
     for (name, value), field_type in zip(record.items(), self._types, strict=True):
         pre_decoded: str = self._pre_decode(cast(type, field_type), value)
         origin_type = get_origin(field_type)
-        if pre_decoded is None:
-            key_values.append(f'"{name}":null')
-        elif origin_type is list:
-            key_values.append(f'"{name}":[{pre_decoded.rstrip(",")}]')
-        elif field_type is str or str in get_args(field_type):
-            key_values.append(f'"{name}":"{pre_decoded}"')
-        else:
-            key_values.append(f'"{name}":{pre_decoded}')
-    json_string: JsonType = json.loads(f"{{{','.join(key_values)}}}")
-    return json_string
+        if pre_decoded is None:
+            result[name] = None
+        elif origin_type is list:
+            result[name] = [x.strip() for x in pre_decoded.split(",") if x.strip()]
+        elif field_type is str or str in get_args(field_type):
+            result[name] = pre_decoded
+        else:
+            result[name] = json.loads(pre_decoded) if isinstance(pre_decoded, str) else pre_decoded
+    return result
bedspec/_bedspec.py (4)

19-65: Well-structured protocol definitions!

The new protocols (DataclassInstance, GenomicSpan, Named, Stranded) are well-designed and provide clear interfaces for different BED record capabilities. The use of @runtime_checkable enables runtime type checking, which is valuable for debugging and validation.


Line range hint 67-162: Robust base class implementations with strong validation

The base classes provide excellent validation and clear abstractions. The __new__ method in BedLike ensures proper dataclass usage, while SimpleBed and PairBed include thorough validation in their __post_init__ methods.


81-90: Clean and effective utility function implementations

The header() and types() functions provide useful reflection capabilities with proper type hints and handle both instances and types correctly.


260-282: Comprehensive validation in Bed12.post_init

The validation logic for BED12 records is thorough and covers all essential cases:

  • Thick start/end consistency
  • Block count validation
  • Block sizes and starts validation
  • Block boundary validation

This addresses the previous review comment about implementing validation in Bed12's post_init method.

tests/test_bedspec.py (5)

24-30: LGTM! Well-structured test for BedStrand string behavior.

The test thoroughly verifies both string conversion and equality comparison for BedStrand.


32-35: LGTM! Clear test for BedStrand opposite functionality.

The test effectively verifies the bidirectional opposite strand relationship.


43-59: LGTM! Well-implemented parameterized test for BedColor validation.

The test follows the suggested structure from past reviews, using pytest.mark.parametrize for comprehensive coverage of invalid cases.


85-104: LGTM! Comprehensive structural type tests.

The tests effectively verify the type hierarchy and protocol implementations for GenomicSpan, Stranded, and DataclassInstance.


226-288: LGTM! Comprehensive BED12 validation test suite.

The test suite effectively covers all validation scenarios for BED12:

  • Thick start/end validation
  • Block count, sizes, and starts validation
  • Block size validation
  • Block start validation
  • Block end validation

The helper function make_bed12 is well-designed for testing different scenarios.

.github/workflows/tests.yml Show resolved Hide resolved
@clintval
Copy link
Owner Author

clintval commented Nov 6, 2024

@coderabbitai review everything again

Copy link

coderabbitai bot commented Nov 6, 2024

@clintval: I will review all the changes again as per your request.

✅ Actions performed

Full review triggered.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (13)
CONTRIBUTING.md (2)

3-39: LGTM! Clear and well-organized development commands.

The commands are well-structured with clear descriptions for each operation.

Minor grammar fix needed:

-To check the lock file is up to date:
+To check the lock file is up-to-date:
🧰 Tools
🪛 LanguageTool

[uncategorized] ~35-~35: It appears that hyphens are missing in the adjective “up-to-date”.
Context: ...age html To check the lock file is up to date: console poetry check --lock ``` #...

(UP_TO_DATE_HYPHEN)


41-93: Great organization of task commands, but heading levels need adjustment.

The task commands are well-organized into logical groups with clear categorization. However, the heading hierarchy should be adjusted for better document structure.

Replace the h6 headings (######) with h3 headings (###) for proper hierarchy:

-###### For Running Individual Checks
+### For Running Individual Checks

-###### For Running All Checks
+### For Running All Checks

-###### For Running Individual Fixes
+### For Running Individual Fixes

-###### For Running All Fixes
+### For Running All Fixes

-###### For Running All Fixes and Checks
+### For Running All Fixes and Checks
🧰 Tools
🪛 Markdownlint

61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time

(MD001, heading-increment)

README.md (3)

22-28: Consider enhancing the example with parameter descriptions.

While the example is clear, it would be more helpful for newcomers if you added brief descriptions of what each parameter represents:

  • refname: The reference sequence name
  • start: The starting position (0-based)
  • end: The ending position (exclusive)

75-82: Consider adding an explanation of overlap behavior.

While the example is clear, it would be helpful to explain what constitutes an overlap (e.g., whether it includes adjacent intervals, zero-length intervals, etc.).


85-91: Fix list formatting for consistency.

The content is excellent, but consider standardizing the punctuation:

-The overlap detector supports the following operations:
+The overlap detector supports the following operations
🧰 Tools
🪛 LanguageTool

[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: - overlapping: return all overlapping features - `over...

(UNLIKELY_OPENING_PUNCTUATION)

pyproject.toml (2)

39-40: Address TODO comment regarding basedpyright integration.

The TODO comment suggests uncertainty about basedpyright integration. Since it's already listed as a dependency, we should either:

  1. Remove the TODO and integrate basedpyright into CI
  2. Remove the dependency if it's not going to be used

Would you like help setting up the basedpyright configuration and CI integration?


44-45: Address TODO comment regarding doctests.

The TODO suggests adding README doctests. This would improve documentation reliability.

Would you like help setting up pytest-doctestplus for README testing?

tests/test_overlap.py (1)

Line range hint 1-142: Consider adding additional edge cases

While the test coverage is comprehensive, consider adding these edge cases:

  1. Zero-length intervals (where start equals end)
  2. Adjacent but non-overlapping intervals
  3. Very large intervals that might stress the system
  4. Performance tests with a large number of intervals

Would you like me to help implement these additional test cases?

tests/test_bedspec.py (2)

179-184: Consider parameterizing the validation tests.

The validation tests for both simple BED and paired BED contain similar test cases that could be parameterized for better maintainability.

Example refactor for simple BED validation:

@pytest.mark.parametrize("start,end,msg", [
    (-1, 5, "start must be greater than 0 and less than end!"),
    (5, 5, "start must be greater than 0 and less than end!"),
    (5, 0, "start must be greater than 0 and less than end!")
])
def test_simple_bed_validates_start_and_end(start: int, end: int, msg: str) -> None:
    """Test that a simple BED record validates its start and end."""
    with pytest.raises(ValueError, match=msg):
        Bed3(refname="chr1", start=start, end=end)

Similar parameterization could be applied to the paired BED validation tests.

Also applies to: 190-201


226-288: Consider enhancing BED12 validation tests.

While the tests are comprehensive, they could be improved in the following ways:

  1. Use parameterization for similar test cases
  2. Add more descriptive error messages for better debugging

Example refactor for thick start/end validation:

@pytest.mark.parametrize("thick_start,thick_end", [
    (1, None),
    (None, 2)
])
def test_bed12_thick_coordinates_validation(thick_start: int | None, thick_end: int | None) -> None:
    """Test that thick start and end must be set together."""
    with pytest.raises(
        ValueError,
        match="thick_start and thick_end must both be None or both be set!"
    ):
        make_bed12(thick_start=thick_start, thick_end=thick_end)

Similar parameterization could be applied to other validation cases.

tests/test_io.py (1)

Compression support is being removed without proper cleanup

The search results show that while the gzip-related tests are commented out, the compression-related constants and file extensions are still defined in bedspec/_io.py. This inconsistency suggests an incomplete removal of compression support:

  • The commented-out tests in tests/test_io.py include all compression-related functionality
  • But bedspec/_io.py still maintains compression-related constants:
    • BGZ_EXTENSION, BGZIP_EXTENSION
    • GZ_EXTENSION, GZIP_EXTENSION
    • _BGZIP_EXTENSIONS, _GZIP_EXTENSIONS, _ALL_GZIP_COMPATIBLE_EXTENSIONS

Either:

  1. Remove all compression-related code and constants if compression support is being dropped
  2. Or restore the tests if compression support should be maintained
🔗 Analysis chain

Line range hint 278-332: Verify the intention behind removing gzip-related tests

The commented-out tests suggest a change in how compression is handled. Please clarify:

  1. Is compression support being removed or moved to a different module?
  2. Are these tests temporarily disabled during refactoring?
  3. Should we remove these commented-out tests entirely?

Let me generate additional scripts to check for gzip-related functionality since the previous script output seems unrelated.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if gzip functionality exists elsewhere in the codebase
echo "Searching for gzip-related code..."
rg -l "gzip" --type py

# Check if there are any other compression-related tests
echo "Searching for compression-related tests..."
rg -l "compress" --type py --glob "tests/**"

Length of output: 11135


Script:

#!/bin/bash
# Search for gzip-related code and imports
echo "Searching for gzip imports and usage..."
rg "import.*gzip|from.*gzip" --type py

# Search for compression-related functionality
echo "Searching for compression-related code..."
rg "compress|gzip|bgzip" --type py -C 2

# Check if these tests were moved elsewhere
echo "Searching for similar test patterns..."
rg "def test.*gzip|def test.*compress" --type py

Length of output: 5497

bedspec/_io.py (1)

106-110: Avoid shadowing instance variables by renaming parameters

In the bed_type setter method, the parameter bed_type shadows the instance variable self._bed_type. This necessitates the use of # type: ignore[no-redef] to suppress type checker warnings. To improve code clarity and eliminate the need for type ignores, consider renaming the parameter to avoid shadowing the instance variable.

Apply this diff to rename the parameter and remove the type ignore:

 def bed_type(self, new_bed_type: type[BedType]) -> None:
-    self._bed_type: type[BedType] = bed_type  # type: ignore[no-redef]
+    self._bed_type: type[BedType] = new_bed_type
     self._header: list[str] = header(cast(BedLike, new_bed_type))
     self._types: list[type | str | Any] = types(cast(BedLike, new_bed_type))
bedspec/_bedspec.py (1)

67-71: Reconsider using __new__ in BedLike to enforce dataclass decoration

Using __new__ to enforce that subclasses of BedLike are decorated with @dataclass might lead to unexpected behavior and is unconventional. Consider using a metaclass or class decoration to enforce this constraint, or rely on documentation and developer guidelines.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 80fafdc and 0325f9c.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • .github/workflows/tests.yml (4 hunks)
  • CONTRIBUTING.md (1 hunks)
  • README.md (5 hunks)
  • bedspec/__init__.py (1 hunks)
  • bedspec/_bedspec.py (4 hunks)
  • bedspec/_io.py (4 hunks)
  • bedspec/_types.py (0 hunks)
  • bedspec/_typing.py (0 hunks)
  • bedspec/overlap/_overlap.py (6 hunks)
  • build.py (1 hunks)
  • pyproject.toml (3 hunks)
  • tests/test_bedspec.py (4 hunks)
  • tests/test_io.py (12 hunks)
  • tests/test_overlap.py (3 hunks)
  • tests/test_types.py (0 hunks)
  • tests/test_typing.py (0 hunks)
💤 Files with no reviewable changes (4)
  • bedspec/_types.py
  • bedspec/_typing.py
  • tests/test_types.py
  • tests/test_typing.py
🧰 Additional context used
🪛 LanguageTool
CONTRIBUTING.md

[uncategorized] ~35-~35: It appears that hyphens are missing in the adjective “up-to-date”.
Context: ...age html To check the lock file is up to date: console poetry check --lock ``` #...

(UP_TO_DATE_HYPHEN)

README.md

[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: - overlapping: return all overlapping features - `over...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 Markdownlint
CONTRIBUTING.md

61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time

(MD001, heading-increment)

🔇 Additional comments (45)
bedspec/__init__.py (2)

7-7: LGTM: BED12 format support added

The addition of Bed12 import follows the existing pattern and completes the support for standard BED formats.


10-10: Verify protocol implementations after module restructuring

The consolidation of protocols into ._bedspec improves code organization. Let's verify the protocol definitions are properly implemented.

Consider adding a docstring at the top of the file to document the module's purpose and the relationship between these protocols and BED record types.

+"""
+bedspec: A Python library for working with BED files.
+
+This module provides a comprehensive set of types and protocols for handling
+BED format data, including:
+- BED record types (Bed2 through Bed12)
+- Core protocols (GenomicSpan, Named, Stranded)
+- Base classes and utilities
+"""
 # ruff: noqa: F401

Also applies to: 13-14, 18-18

✅ Verification successful

Protocol implementations are correctly defined and used in ._bedspec

The verification confirms that:

  • All protocols (GenomicSpan, Named, Stranded) are properly defined in _bedspec.py with appropriate docstrings
  • BedLike and other BED classes correctly implement these protocols
  • The protocol hierarchy is well-structured with clear inheritance patterns (e.g., SimpleBed implements BedLike and GenomicSpan)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify protocol definitions and implementations in _bedspec module

# Test: Check if protocols are properly defined
ast-grep --pattern 'protocol $name($_):
  $$$' 

# Test: Check protocol implementations
ast-grep --pattern 'class $name($protocol):
  $$$'

Length of output: 101


Script:

#!/bin/bash
# Let's try a different approach to verify the protocol definitions and implementations

# Check for protocol-like definitions in _bedspec.py
rg -A 5 "class.*Protocol|Protocol.*:" bedspec/_bedspec.py

# Check for imports and their usage
rg "from typing import Protocol" bedspec/
rg "BedLike|GenomicSpan|Named|Stranded" bedspec/_bedspec.py

Length of output: 1613

build.py (3)

10-10: LGTM! Dependencies list is properly consolidated.

The consolidation of the dependencies list into a single line improves readability while maintaining all necessary header and source file dependencies for the cgranges extension.


15-15: LGTM! Improved code organization.

The added blank line properly separates the extension module definition from the build function, following PEP 8 guidelines for function spacing.


19-22: LGTM! Enhanced dictionary formatting.

The closing brace placement on a new line improves readability of the dictionary structure in the setup_kwargs.update call.

.github/workflows/tests.yml (6)

1-1: LGTM! Descriptive workflow name.

The new name "Code checks" better reflects the workflow's comprehensive nature, as it includes both testing and quality checks.


19-19: LGTM! Enhanced Python version coverage.

Adding Python 3.11 to the test matrix improves compatibility verification across Python versions.


37-38: LGTM! Using recommended installation method.

Using pipx for Poetry installation is the recommended approach as it provides better isolation for Python CLI tools.


55-56: Consider moving poethepoet to dev dependencies.

The suggestion from the previous review about adding poethepoet to dev dependencies in pyproject.toml is still valid.


59-59: Verify the check-all task configuration.

The change to use poetry tasks is good, but let's verify the task definition.

#!/bin/bash
# Check if the check-all task is defined in pyproject.toml
rg "check-all" pyproject.toml

# Look for any task definitions to understand the full testing setup
rg "^\[tool\.poe\.tasks\]" -A 20 pyproject.toml

12-12: Verify Poetry version compatibility.

The upgrade to Poetry 1.8 is good, but let's verify compatibility with the project's dependencies.

✅ Verification successful

Poetry 1.8 upgrade is compatible with project requirements

The upgrade to Poetry 1.8 is safe as:

  • Project explicitly requires poetry-core>=1.8 in build-system requirements
  • No Poetry-specific features or configurations that would be incompatible with 1.8
  • Project's Python version requirement (^3.11.0) and dependencies are all supported by Poetry 1.8
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if pyproject.toml specifies any Poetry version constraints
rg "poetry.*version" pyproject.toml

# Check if there are any Poetry-specific configurations that might be version-sensitive
rg "^\[tool\.poetry\]" -A 10 pyproject.toml

Length of output: 746


Script:

#!/bin/bash
# Check pyproject.toml content
cat pyproject.toml

# Check for any Poetry-specific files or configurations
fd "poetry\..*|pyproject\.toml"

# Look for Python version requirements
rg "python.*version|^python\s*=" pyproject.toml

Length of output: 5371

CONTRIBUTING.md (1)

49-60: Excellent forward-looking documentation!

The note about Poetry v2.0.0 changes and future plugin handling is well-documented and provides valuable context for future maintenance.

README.md (4)

5-5: LGTM: Python version badge accurately updated.

The badge now correctly shows support for Python 3.11 and 3.12.


33-49: LGTM: Clear and accurate I/O examples.

The examples effectively demonstrate both reading and writing operations with proper context management.


62-62: LGTM: Comprehensive list of supported BED types.

The list now includes all available BED format variations.


103-108: LGTM: Improved custom type example.

The changes enhance the example by:

  • Using a more descriptive class name (Bed3Plus1)
  • Adding proper type hints with None support
  • Following consistent field naming (refname)
pyproject.toml (4)

59-88: LGTM! Well-structured task configuration.

The task organization is excellent, with:

  • Clear separation between fix and check tasks
  • Logical task grouping
  • Proper error handling with ignore_fail

134-156: LGTM! Comprehensive linting configuration.

The ruff configuration is well-thought-out with:

  • Essential linting rules enabled
  • Appropriate ignores for common false positives
  • Good balance between strictness and practicality

Line range hint 164-199: LGTM! Well-structured git-cliff configuration.

The configuration properly implements conventional commits with:

  • Clear commit categorization
  • Good changelog structure
  • Appropriate commit parsing rules

34-36: Verify compatibility of new dependencies.

The new dependencies look appropriate, but let's verify their compatibility:

✅ Verification successful

Dependencies are correctly used and compatible

The dependencies are properly integrated in the codebase:

  • msgspec is used for data conversion with convert and to_builtins functions
  • typing-extensions is used for the @override decorator in multiple files
  • Python 3.11 requirement aligns with typing-extensions support for Python 3.8+
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if msgspec and typing-extensions are used in the codebase
echo "Checking msgspec usage:"
rg "import msgspec|from msgspec" -l

echo -e "\nChecking typing-extensions usage:"
rg "import typing_extensions|from typing_extensions" -l

Length of output: 285


Script:

#!/bin/bash
# Check how these dependencies are used in the codebase
echo "Checking msgspec usage in bedspec/_io.py:"
rg "msgspec" bedspec/_io.py -B 2 -A 2

echo -e "\nChecking typing_extensions usage in bedspec/_io.py and _bedspec.py:"
rg "typing_extensions" bedspec/_io.py bedspec/_bedspec.py -B 2 -A 2

# Check Python version usage
echo -e "\nChecking Python version constraints in code:"
rg "python_version|python_requires|Python \d" -g "!*.pyc" -g "!*.txt" -g "!*.md"

Length of output: 1146

bedspec/overlap/_overlap.py (4)

22-22: LGTM: Protocol property implementations are correctly defined.

The addition of NotImplemented return statements in protocol properties follows best practices for abstract protocol definitions in Python, making it explicit that these properties must be implemented by concrete classes.

Also applies to: 27-27, 37-37, 47-47, 57-57


112-120: LGTM: Improved reference sequence name detection.

The switch to hasattr checks from isinstance checks is a good improvement:

  • More maintainable and cleaner approach
  • Better follows Python's duck typing principles
  • Clear error message when no property is found

Line range hint 83-163: LGTM: Well-structured overlap detection implementation.

The OverlapDetector class implementation is robust and well-documented:

  • Clear type hints using the new GenericGenomicSpanLike
  • Comprehensive docstrings explaining requirements
  • Efficient overlap detection using cgranges
  • Proper error handling and state management

69-70: Verify the impact of type variable rename.

The rename from _GenericGenomicSpanLike to GenericGenomicSpanLike (removing underscore prefix) suggests this is now intended for public use. Let's verify all references are updated.

✅ Verification successful

Let's verify the new name usage to ensure the rename was properly implemented.


Type variable rename is consistently implemented

The rename from _GenericGenomicSpanLike to GenericGenomicSpanLike has been properly implemented throughout the codebase. All references to this type variable are using the new name without the underscore prefix, and there are no lingering references to the old name. The type variable is used consistently in the OverlapDetector class and its methods.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining references to the old name
echo "Checking for any remaining references to _GenericGenomicSpanLike:"
rg "_GenericGenomicSpanLike"

Length of output: 169


Script:

#!/bin/bash
# Search for the new name to verify its usage
echo "Checking references to GenericGenomicSpanLike:"
rg "GenericGenomicSpanLike"

# Also check for any TypeVar definitions to ensure there are no duplicates
echo -e "\nChecking all TypeVar definitions:"
ast-grep --pattern 'TypeVar($$$)'

Length of output: 1956

tests/test_overlap.py (7)

13-14: LGTM: Basic test functions properly updated with refname

The attribute renaming from contig to refname is consistently applied across the basic test functions while maintaining their original test coverage and assertions.

Also applies to: 21-22, 29-30


42-42: LGTM: Improved error handling test cases

The error message is now more specific with the explicit match string "Genomic feature is not hashable but should be", which improves test clarity and maintainability.

Also applies to: 49-49


70-71: LGTM: Feature addition test cases properly updated

The test cases for adding and querying features have been correctly updated with the refname attribute while maintaining their original test coverage.

Also applies to: 79-80


87-95: LGTM: Comprehensive enclosing intervals test cases

The test cases thoroughly cover various scenarios for enclosing intervals, including edge cases, with consistent use of the refname attribute.


100-108: LGTM: Thorough enclosed by intervals test coverage

The test cases provide comprehensive coverage for enclosed by scenarios, complementing the enclosing tests with proper edge cases.


113-125: LGTM: Comprehensive overlap detection test cases

The test cases provide thorough coverage of various overlap scenarios across different chromosomes and positions, with a good balance of positive and negative cases.


130-142: LGTM: Boolean overlap query tests properly implemented

The test cases effectively verify the boolean overlap query functionality, maintaining consistency with the more detailed overlap detection tests.

tests/test_bedspec.py (5)

Line range hint 1-22: LGTM! Import statements are well-organized.

The imports are logically grouped and all appear to be used within the test file.


24-35: LGTM! Comprehensive strand tests.

The tests effectively cover both string representation and opposite strand functionality. Good test isolation with separate functions for distinct behaviors.


38-71: LGTM! Well-structured color validation tests.

The tests effectively use parameterization for validation cases and include comprehensive error handling. Implementation follows the suggested structure from previous reviews.


73-104: LGTM! Comprehensive type system verification.

Tests effectively verify the type hierarchy, dataclass behavior, and structural types using parameterization where appropriate.


Line range hint 203-223: LGTM! Clear and effective territory tests.

The tests effectively verify territory behavior with explicit expected values and clear assertions.

tests/test_io.py (5)

10-14: LGTM: Import additions are consistent with the changes

The addition of Bed12 and BedLike imports aligns with the new functionality being tested.


Line range hint 25-45: LGTM: Test cases are comprehensive and consistent

The parameter renaming from 'contig' to 'refname' is consistently applied across all test cases, and the test coverage spans various BED types with appropriate test data.


49-49: LGTM: Type hint improvement

The change from BedType to BedLike better represents the type constraints and aligns with the new abstract base class implementation.


251-275: LGTM: Comprehensive Bed12 roundtrip test

The new test thoroughly validates Bed12 functionality with:

  • Complete coverage of all Bed12 attributes
  • Proper testing of both write and read operations
  • Validation of complex types (lists, enums)

62-62: Verify consistent usage of 'refname' parameter

The parameter has been renamed from 'contig' to 'refname' throughout this file. Let's verify this change is consistent across the codebase.

#!/bin/bash
# Check for any remaining usage of 'contig' parameter
echo "Searching for remaining 'contig' usage in BED-related code..."
rg "contig.*=|def.*contig" --type py

# Check for consistent usage of 'refname'
echo "Verifying 'refname' usage in BED-related code..."
rg "refname.*=|def.*refname" --type py

Also applies to: 71-71, 88-88, 104-104, 120-120, 135-135, 154-154, 163-163, 188-188, 205-205, 223-223, 233-233, 243-243, 254-254

bedspec/_io.py (2)

266-279: Avoid manual JSON string construction

Constructing JSON strings manually by concatenating strings can lead to syntax errors and security vulnerabilities. It's safer and more efficient to build a dictionary and serialize it using json.dumps. The previous review comment regarding this issue is still applicable to this code segment.


240-250: Verify resource management in __iter__ method

In the __iter__ method of BedReader, self.close() is called after iteration. This may lead to unexpected behavior if the BedReader instance is intended to be reused or if multiple iterations are required. Additionally, if BedReader is used within a context manager, closing the file handle here might interfere with the context manager's exit handling. Please verify whether calling self.close() in this method is appropriate.

bedspec/_bedspec.py (2)

257-279: Verify the validation logic in Bed12.__post_init__ against BED12 specifications

The validation logic in __post_init__ should strictly adhere to the BED12 format specifications. Please review the following:

  • Block Starts: Confirm whether block_starts[0] must be 0. Some specifications allow block_starts to contain relative offsets, not necessarily starting at 0.
  • Block Boundaries: Check the calculation (self.start + self.block_starts[-1] + self.block_sizes[-1]) == self.end. Ensure that this correctly validates that the blocks align with the overall interval defined by start and end.

Would you like assistance in reviewing the BED12 specification and adjusting the validation logic accordingly?


36-38: ⚠️ Potential issue

Use == instead of is for string comparison in BedStrand.opposite

In the opposite method, comparing BedStrand values using is may not work as expected since BedStrand inherits from StrEnum, and its members are instances of str. It's safer to use == for comparison to ensure correct behavior.

Apply this diff to fix the comparison:

-def opposite(self) -> "BedStrand":
-    return BedStrand.Negative if self is BedStrand.Positive else BedStrand.Positive
+def opposite(self) -> "BedStrand":
+    return BedStrand.Negative if self == BedStrand.Positive else BedStrand.Positive

Likely invalid or redundant comment.

bedspec/_io.py Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
tests/test_io.py (1)

Based on the findings, I can now generate the final response:

Gzip compression support has been removed from the codebase

The codebase has been simplified to handle only plain text files:

  • All gzip-related tests have been removed
  • BedReader.from_path and BedWriter.from_path now only open files in plain text mode ("r"/"w")
  • While gzip extension constants are still defined in _io.py, there's no active code using them

This appears to be an intentional simplification of the I/O handling, removing compression support entirely rather than a temporary removal or relocation of tests.

🔗 Analysis chain

Line range hint 278-332: Verify the removal of gzip compression tests

The commented-out gzip compression tests have been removed. Please clarify:

  1. Is gzip compression support being deprecated?
  2. Are these tests being moved elsewhere?
  3. Is this a temporary removal pending refactoring?

Let me check the content of these files to understand how gzip compression is being handled.


Let me check if there are any new or replacement tests for handling compressed files, as the current findings only show the commented tests and extension definitions.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if gzip functionality is referenced elsewhere in the codebase
echo "Searching for gzip-related code..."
rg -l 'gzip|compression' --type py

# Check for any related test files
echo "Checking for other compression-related test files..."
fd -e py -e test test_.*compression

Length of output: 302


Script:

#!/bin/bash
# Check the content of test_io.py for any active gzip tests
echo "=== Active gzip tests in test_io.py ==="
rg "def test.*gzip" tests/test_io.py -B1 -A1

# Check the implementation in _io.py
echo -e "\n=== Gzip handling in _io.py ==="
rg "gzip|compression" bedspec/_io.py -B2 -A2

Length of output: 2301


Script:

#!/bin/bash
# Look for any active tests related to file handling in test_io.py
echo "=== Active tests for file handling ==="
rg "^def test.*" tests/test_io.py -B1 -A1

# Check for any recent changes in the implementation of file handling
echo -e "\n=== File handling implementation in _io.py ==="
rg "class.*Reader|class.*Writer|def from_path|def open" bedspec/_io.py -B1 -A3

Length of output: 3508

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0325f9c and 9ac8724.

📒 Files selected for processing (1)
  • tests/test_io.py (12 hunks)
🔇 Additional comments (5)
tests/test_io.py (5)

10-13: LGTM: Clean import additions

The new imports for Bed12 and BedLike are properly organized and align with the updated type system.


Line range hint 25-44: LGTM: Consistent parameter renaming

The renaming from 'contig' to 'refname' is consistently applied across all test cases. The parametrization structure is clean and maintainable.


Line range hint 49-55: LGTM: Improved type flexibility

The change from BedType to BedLike appropriately broadens the accepted types while maintaining type safety. The simplified BedWriter instantiation improves readability.


Line range hint 173-200: LGTM: Simplified BedReader instantiation

The updated BedReader instantiation pattern is consistently applied and maintains proper error handling. The path-based and string-based initialization tests provide good coverage.


252-276: LGTM: Comprehensive Bed12 roundtrip testing

The new test thoroughly validates Bed12 functionality with complex types, including proper initialization, writing, and reading. All fields are properly tested and validated.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (8)
CONTRIBUTING.md (2)

3-39: Consider adding more context about command usage and outcomes.

While the commands are well-organized, it would be helpful to add:

  • Expected outcomes or success criteria for each command
  • When developers should run these commands (e.g., before submitting PR, after changes)
  • What to do if the commands fail

41-93: Consider adding a table of contents for task commands.

The Shortcut Task Commands section contains multiple subsections with different types of commands. A table of contents would help developers quickly find the specific command they need.

Add this at the start of the section:

## Shortcut Task Commands

- [Individual Checks](#for-running-individual-checks)
- [All Checks](#for-running-all-checks)
- [Individual Fixes](#for-running-individual-fixes)
- [All Fixes](#for-running-all-fixes)
- [All Fixes and Checks](#for-running-all-fixes-and-checks)
🧰 Tools
🪛 Markdownlint

61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time

(MD001, heading-increment)

bedspec/_io.py (1)

134-139: Enhance error message clarity

The error message could be more descriptive by including the actual instance types.

-                f"BedWriter can only continue to write features of the same type."
-                f" Will not write a {type(bed).__name__} after a"
-                f" {self.bed_type.__name__}."
+                f"BedWriter can only write features of the same type. "
+                f"Attempted to write {type(bed).__name__} but writer was initialized with {self.bed_type.__name__}."
bedspec/_bedspec.py (3)

142-146: Consider enhancing PairBed validation.

The validation could be more comprehensive by checking for:

  1. Relative positions when intervals are on the same chromosome
  2. Negative coordinates

Consider adding this validation:

     def __post_init__(self) -> None:
         """Validate this pair of BED records."""
         if self.start1 >= self.end1 or self.start1 < 0:
             raise ValueError("start1 must be greater than 0 and less than end1!")
         if self.start2 >= self.end2 or self.start2 < 0:
             raise ValueError("start2 must be greater than 0 and less than end2!")
+        # When on same chromosome, ensure no overlaps
+        if self.refname1 == self.refname2:
+            if not (self.end1 <= self.start2 or self.end2 <= self.start1):
+                raise ValueError("When on same chromosome, intervals must not overlap!")

177-184: Enhance error handling in BedColor.from_string.

The error handling could be more specific about the type of error encountered.

Consider this enhanced error handling:

     @classmethod
     def from_string(cls, string: str) -> "BedColor":
         """Build a BED color instance from a string."""
         try:
             r, g, b = map(int, string.split(","))
         except ValueError as error:
-            raise ValueError(f"Invalid string '{string}'. Expected 'int,int,int'!") from error
+            if "," not in string:
+                raise ValueError(f"Invalid format in '{string}'. Expected 'r,g,b' with comma separators!") from error
+            if string.count(",") != 2:
+                raise ValueError(f"Invalid number of components in '{string}'. Expected exactly three values!") from error
+            raise ValueError(f"Invalid integer values in '{string}'. All components must be integers!") from error
         return cls(r, g, b)

258-280: Consider refactoring Bed12 validation for better maintainability.

The validation logic in __post_init__ is comprehensive but could be split into smaller, focused methods for better maintainability and readability.

Consider this refactoring:

     def __post_init__(self) -> None:
         """Validate this BED12 record."""
         super().__post_init__()
+        self._validate_thick_bounds()
+        self._validate_blocks()
+
+    def _validate_thick_bounds(self) -> None:
         if (self.thick_start is None) != (self.thick_end is None):
             raise ValueError("thick_start and thick_end must both be None or both be set!")
+
+    def _validate_blocks(self) -> None:
         if self.block_count is None:
             if self.block_sizes is not None or self.block_starts is not None:
                 raise ValueError("block_count, block_sizes, block_starts must all be set or unset!")
         else:
             if self.block_sizes is None or self.block_starts is None:
                 raise ValueError("block_count, block_sizes, block_starts must all be set or unset!")
             if self.block_count <= 0:
                 raise ValueError("When set, block_count must be greater than or equal to 1!")
             if self.block_count != len(self.block_sizes) or self.block_count != len(
                 self.block_starts
             ):
                 raise ValueError("Length of block_sizes and block_starts must equal block_count!")
             if self.block_starts[0] != 0:
                 raise ValueError("block_starts must start with 0!")
             if any(size <= 0 for size in self.block_sizes):
                 raise ValueError("All sizes in block_size must be greater than or equal to one!")
             if (self.start + self.block_starts[-1] + self.block_sizes[-1]) != self.end:
                 raise ValueError("The last defined block's end must be equal to the BED end!")
bedspec/overlap/_overlap.py (2)

16-27: Consider raising NotImplementedError in property methods

In the Span and ReferenceSpan protocols, the property methods start, end, and refname return NotImplemented. It is more idiomatic in Python to raise NotImplementedError to clearly indicate that these properties need to be implemented by concrete subclasses.

Apply this diff:

class Span(Hashable, Protocol):
    """A span with a start and an end. 0-based open-ended."""

    @property
    def start(self) -> int:
        """A 0-based start position."""
-        return NotImplemented
+        raise NotImplementedError

    @property
    def end(self) -> int:
        """A 0-based open-ended position."""
-        return NotImplemented
+        raise NotImplementedError

class ReferenceSpan(Span, Protocol):
    """A feature on a reference sequence."""

    @property
    def refname(self) -> str:
        """A reference sequence name."""
-        return NotImplemented
+        raise NotImplementedError

Also applies to: 31-37


40-44: Evaluate the necessity of separate TypeVars

Both ReferenceSpanType and GenericReferenceSpanType are TypeVars bounded to ReferenceSpan. Consider whether both are necessary or if they can be consolidated to simplify the type annotations, enhancing code readability and maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9ac8724 and 3d4d2d2.

📒 Files selected for processing (6)
  • CONTRIBUTING.md (1 hunks)
  • bedspec/_bedspec.py (4 hunks)
  • bedspec/_io.py (4 hunks)
  • bedspec/overlap/__init__.py (1 hunks)
  • bedspec/overlap/_overlap.py (1 hunks)
  • tests/test_overlap.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • bedspec/overlap/init.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_overlap.py
🧰 Additional context used
🪛 Markdownlint
CONTRIBUTING.md

61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time

(MD001, heading-increment)

🔇 Additional comments (8)
CONTRIBUTING.md (1)

43-60: Well-documented future compatibility note!

The note about Poetry v2.0.0 changes is excellent. It provides clear documentation about future changes and includes the exact TOML configuration that will be needed.

bedspec/_io.py (4)

30-34: Well-defined type definitions!

The new type definitions BedType and JsonType are clear, well-documented, and appropriately constrained. The BedType TypeVar with bound=BedLike ensures type safety.


141-156: LGTM! Well-implemented serialization methods

The _bed_to_dict and _encode methods are well-implemented:

  • Uses msgspec's to_builtins with deterministic order for consistent output
  • Handles special cases (None, collections, BedColor) appropriately

239-248: LGTM! Efficient record iteration implementation

The __iter__ implementation is well-designed:

  • Uses msgspec's convert for efficient record conversion
  • Properly yields converted records
  • Automatically closes the reader when done

263-278: Reuse existing review for JSON construction

This code segment was previously reviewed and the suggestion to avoid manual JSON string construction is still valid.

bedspec/_bedspec.py (2)

20-24: LGTM! Well-structured protocol for dataclass validation.

The DataclassInstance protocol provides a clean way to enforce dataclass requirements at runtime.


27-39: LGTM! Well-implemented strand enum with useful helper method.

The BedStrand enum provides type safety for strand representation and includes a helpful opposite() method for strand manipulation.

bedspec/overlap/_overlap.py (1)

50-113: Well-structured use of generics and protocols

The implementation of the OverlapDetector using generics and protocols enhances type safety and flexibility. The refactoring improves the clarity of the code and makes it more maintainable.

CONTRIBUTING.md Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
CONTRIBUTING.md (1)

3-39: LGTM! Consider adding expected output examples.

The Primary Development Commands section is well-organized and covers all essential development tasks. The commands are clearly presented with appropriate descriptions.

Consider adding example output or success criteria for each command to help developers verify correct execution.

bedspec/overlap/_overlap.py (2)

40-44: Enhance type variable documentation.

The docstrings could be more descriptive to explain:

  • The purpose and use case for each type variable
  • The distinction between QueryReferenceSpanType and GenericReferenceSpanType
  • When to use each type in the context of overlap detection

Consider updating the docstrings:

 QueryReferenceSpanType = TypeVar("QueryReferenceSpanType", bound=ReferenceSpan)
-"""A 0-based end-exclusive feature on a reference sequence."""
+"""Type variable for features being queried against the overlap detector."""

 GenericReferenceSpanType = TypeVar("GenericReferenceSpanType", bound=ReferenceSpan)
-"""A generic 0-based end-exclusive feature on a reference sequence."""
+"""Type variable for features stored within the overlap detector that can be matched against queries."""

75-85: Enhance error handling in the add method.

The error handling could be more comprehensive:

  • Consider validating start/end positions
  • Add type validation for start/end being integers
  • Verify that end > start

Consider adding these validations:

 def add(self, feature: GenericReferenceSpanType) -> None:
     """Add a feature to this overlap detector."""
     if not isinstance(feature, Hashable):
         raise ValueError(f"Genomic feature is not hashable but should be: {feature}")
+    
+    if not isinstance(feature.start, int) or not isinstance(feature.end, int):
+        raise ValueError(f"Start and end positions must be integers: {feature}")
+    
+    if feature.end <= feature.start:
+        raise ValueError(f"End position must be greater than start position: {feature}")

     refname: Refname = feature.refname
     feature_idx: int = len(self._refname_to_features[refname])
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3d4d2d2 and 0731483.

📒 Files selected for processing (3)
  • CONTRIBUTING.md (1 hunks)
  • bedspec/overlap/__init__.py (1 hunks)
  • bedspec/overlap/_overlap.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bedspec/overlap/init.py
🔇 Additional comments (4)
CONTRIBUTING.md (1)

61-93: LGTM! Heading levels have been corrected.

The command sections are well-organized with proper heading hierarchy (h3) and consistent formatting. The grouping of commands (checks, fixes, combined operations) provides a clear workflow structure.

The heading levels issue raised in the previous review has been addressed.

bedspec/overlap/_overlap.py (3)

16-37: Well-structured protocol definitions!

The Span and ReferenceSpan protocols are well-designed with:

  • Clear separation of concerns between basic span properties and reference-specific properties
  • Proper use of @runtime_checkable for runtime type verification
  • Good inheritance hierarchy

50-70: Excellent implementation of the overlap detector!

The class demonstrates good design choices:

  • Efficient use of cgranges for overlap detection
  • Smart lazy indexing strategy per reference sequence
  • Clear separation of concerns in data structures

92-116: Well-implemented overlap detection methods!

The methods demonstrate:

  • Efficient use of generators
  • Clear and correct overlap detection logic
  • Proper handling of index updates

CONTRIBUTING.md Show resolved Hide resolved
@clintval clintval merged commit 0db2106 into main Nov 7, 2024
7 checks passed
@clintval clintval deleted the cv_better_serde branch November 7, 2024 01:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant