-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: simplify how the BedReader and BedWriter work #21
Conversation
767833b
to
b87c765
Compare
@coderabbitai review |
✅ Actions performedReview triggered.
|
b87c765
to
aed3112
Compare
Warning Rate limit exceeded@clintval has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 5 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThe changes include updates to workflow configurations, documentation, and code structure within the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (29)
bedspec/__init__.py (1)
Line range hint
1-1
: Consider documenting the purpose of noqa.While the
# ruff: noqa: F401
comment is correct for suppressing unused import warnings (as these are meant to be re-exported), it might be helpful to add a brief comment explaining why these imports are intentionally marked as used.-# ruff: noqa: F401 +# ruff: noqa: F401 # These imports are intentionally re-exported as part of the public API.github/workflows/tests.yml (3)
1-1
: Consider a more specific workflow name.The name "Code checks" is quite generic. Since this workflow primarily runs tests, consider a more specific name like "Tests" or "Unit Tests" to better reflect its purpose. This helps with clarity when viewing GitHub Actions dashboard.
Also applies to: 12-12
15-20
: Consider testing against multiple Python versions.Currently, the workflow only tests against Python 3.12. Consider adding more Python versions (e.g., 3.10, 3.11) to ensure compatibility across different Python environments. This is especially important if this is a library meant to be used by others.
Example matrix configuration:
strategy: matrix: - PYTHON_VERSION: ["3.12"] + PYTHON_VERSION: ["3.10", "3.11", "3.12"]
55-56
: Consider moving task runner installation before cache setup.Installing poethepoet after the cache setup means it won't be included in the cache. Moving this installation before the cache setup could improve workflow execution time in subsequent runs.
CONTRIBUTING.md (2)
35-38
: Fix hyphenation in compound adjectiveThe phrase "up to date" should be hyphenated when used as a compound adjective before a noun.
-To check the lock file is up to date: +To check the lock file is up-to-date:🧰 Tools
🪛 LanguageTool
[uncategorized] ~35-~35: It appears that hyphens are missing in the adjective “up-to-date”.
Context: ...age htmlTo check the lock file is up to date:
console poetry check --lock ``` #...(UP_TO_DATE_HYPHEN)
61-61
: Fix heading levels for consistent document structureThe section headings are using h6 (
######
) but should use h3 (###
) to maintain proper heading hierarchy, as they follow an h2 section.-###### For Running Individual Checks +### For Running Individual Checks -###### For Running All Checks +### For Running All Checks -###### For Running Individual Fixes +### For Running Individual Fixes -###### For Running All Fixes +### For Running All Fixes -###### For Running All Fixes and Checks +### For Running All Fixes and ChecksAlso applies to: 71-71, 77-77, 84-84, 90-90
🧰 Tools
🪛 Markdownlint
61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time(MD001, heading-increment)
README.md (4)
33-36
: Consider enhancing the writing example.While the example is correct, consider expanding it to demonstrate more real-world scenarios:
from bedspec import BedWriter +from bedspec import Bed3 +# Create multiple BED records +beds = [ + Bed3("chr1", start=2, end=8), + Bed3("chr2", start=5, end=10), +] + with BedWriter.from_path("test.bed") as writer: - writer.write(bed) + # Write multiple records + for bed in beds: + writer.write(bed)
75-82
: Consider adding type compatibility note.The example effectively demonstrates overlap detection between different BED types. Consider adding a comment explaining that the detector works with any BED type that implements the required interval protocol.
detector = OverlapDetector[Bed3]([bed1, bed2]) +# OverlapDetector works with any BED type implementing the interval protocol my_feature = Bed4("chr1", start=2, end=3, name="hi-mom")
85-91
: Improve list formatting consistency.The operations list should have consistent punctuation:
-The overlap detector supports the following operations: +The overlap detector supports the following operations - `overlapping`: return all overlapping features - `overlaps`: test if any overlapping features exist - `those_enclosed_by`: return those enclosed by the input feature - `those_enclosing`: return those enclosing the input feature🧰 Tools
🪛 LanguageTool
[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: -overlapping
: return all overlapping features - `over...(UNLIKELY_OPENING_PUNCTUATION)
103-108
: Consider adding documentation for the custom BED type.The example would benefit from additional documentation:
@dataclass(eq=True) class Bed3Plus1(SimpleBed): + """BED3 format with an additional custom field. + + Inherits from SimpleBed which provides the core BED3 functionality. + """ refname: str start: int end: int - my_custom_field: float | None + my_custom_field: float | None # Custom score or measurementpyproject.toml (3)
51-52
: Consider pinning major versions for build dependencies.While using
>=
is flexible, it might lead to unexpected build issues if major versions introduce breaking changes. Consider using^
instead for more predictable builds:-requires = ["poetry-core>=1.8", "setuptools>=70.1.1", "Cython>=3.0.10"] +requires = ["poetry-core^1.8.0", "setuptools^70.1.1", "Cython^3.0.10"]
37-38
: Address TODO comments in configuration.There are two TODO items that should be addressed:
- Satisfying basedpyright warnings and adding to CI
- Getting doctests to work and run on README
Would you like me to help create GitHub issues to track these tasks?
Also applies to: 42-43
92-107
: Consider documenting strict type checking rationale.The mypy configuration is very strict, which is good for code quality but might be challenging for contributors. Consider adding a comment explaining the rationale for these strict settings.
[tool.mypy] +# Strict type checking configuration to ensure high code quality and catch potential issues early files = ["build.py", "bedspec/", "tests/"]
bedspec/overlap/_overlap.py (1)
Line range hint
139-147
: Consider caching the index state check.The overlap-related methods (
overlapping
,overlaps
,those_enclosing
,those_enclosed_by
) all potentially trigger index rebuilding. Consider caching the index state check to avoid repeated checks within the same query session.def overlapping(self, feature: GenomicSpanLike) -> Iterator[GenericGenomicSpanLike]: """Yields all the overlapping features for a given genomic span.""" refname: Refname = self._reference_sequence_name(feature) + tree = self._refname_to_tree[refname] - if refname in self._refname_to_tree and not self._refname_to_is_indexed[refname]: - self._refname_to_tree[refname].index() # index the tree if we find it is not indexed + if not self._refname_to_is_indexed.get(refname, True): + tree.index() + self._refname_to_is_indexed[refname] = True - for *_, idx in self._refname_to_tree[refname].overlap(refname, feature.start, feature.end): + for *_, idx in tree.overlap(refname, feature.start, feature.end): yield self._refname_to_features[refname][idx]Also applies to: 149-150, 153-157, 159-163
tests/test_bedspec.py (4)
23-34
: Consider adding edge cases for strand tests.The tests cover basic functionality well. Consider adding test cases for:
- Invalid strand characters
- Case sensitivity (e.g., '+' vs 'plus')
- None/empty values
Line range hint
88-126
: Consider adding negative test cases and improve readability.While the positive test cases are comprehensive, consider:
- Adding negative test cases for invalid instantiations
- Breaking down long assertions for better readability
Example refactor for better readability:
- assert record.bed1 == Bed6(refname="chr1", start=1, end=2, name="foo", score=5, strand=BedStrand.Positive) # fmt: skip # noqa: E501 + expected_bed1 = Bed6( + refname="chr1", + start=1, + end=2, + name="foo", + score=5, + strand=BedStrand.Positive + ) + assert record.bed1 == expected_bed1
129-156
: Consider adding edge cases for territory calculations.The tests cover basic functionality well. Consider adding test cases for:
- Zero-length territories
- Overlapping territories
- Territories spanning multiple chromosomes
Line range hint
187-205
: Consider adding more test cases for paired BED territories.The test covers basic functionality. Consider adding test cases for:
- Same chromosome pairs
- Reverse ordered coordinates
- Adjacent territories
- Overlapping territories
tests/test_io.py (1)
251-251
: TODO comment needs to be addressed.The comment indicates that BED12 spec compliance verification is missing. This is important for ensuring compatibility with other tools.
Would you like me to help create a test case that verifies BED12 spec compliance?
bedspec/_bedspec.py (5)
25-40
: Enhance documentation for theBedStrand
enum and itsopposite
methodThe
BedStrand
enum now includes anopposite
method. Consider adding more detailed docstrings or usage examples to clarify the purpose and usage of this method, improving code readability and maintainability.
43-64
: Consolidate protocols to reduce complexityMultiple protocols (
GenomicSpan
,Named
,Stranded
) are introduced to define structural interfaces. Evaluate if all these protocols are necessary or if they can be consolidated to simplify the codebase and reduce potential over-engineering.
80-90
: Consider caching results inheader
andtypes
functions for efficiencyThe
header
andtypes
functions generate lists of field names and types for BED records. If these functions are called frequently, consider implementing caching mechanisms to improve performance by avoiding redundant computations.
109-126
: Improve validation error messages inSimpleBed
's__post_init__
methodThe
__post_init__
method inSimpleBed
raises aValueError
whenstart >= end
orstart < 0
. Enhance the error message to provide specific details about the invalid values, aiding in debugging and ensuring clarity.Apply this diff to refine the error message:
def __post_init__(self) -> None: """Validate this linear BED record.""" if self.start >= self.end or self.start < 0: - raise ValueError("start must be greater than 0 and less than end!") + raise ValueError(f"Invalid interval: start ({self.start}) must be >= 0 and less than end ({self.end})")
132-151
: Unify validation logic inPairBed
's__post_init__
methodThe
__post_init__
method inPairBed
separately validatesstart1
,end1
,start2
, andend2
. Consider creating a shared validation function to reduce code duplication and ensure consistency in error handling.bedspec/_io.py (5)
103-105
: Avoid unnecessary type re-annotation and 'type: ignore' commentIn the
bed_type
setter, re-annotatingself._bed_type
and suppressing the type checker with# type: ignore[no-redef]
is unnecessary since it was already annotated in the constructor.Apply this diff to remove the redundant annotation and comment:
def bed_type(self, bed_type: type[BedType]) -> None: - self._bed_type: type[BedType] = bed_type # type: ignore[no-redef] + self._bed_type = bed_type self._header: list[str] = header(cast(BedLike, bed_type)) self._types: list[type | str | Any] = types(cast(BedLike, bed_type))
132-138
: Consider using 'isinstance' for type checkingUsing
type(bed) is not self.bed_type
checks for exact type matches, which might not allow subclasses ofself.bed_type
. Usingisinstance
provides more flexibility and adheres to the principles of polymorphism.Apply this diff to use
isinstance
:def _maybe_setup_with(self, bed: BedType) -> None: """Perform post-initialization and record validation.""" if self.bed_type is None: self.bed_type = type(bed) - if type(bed) is not self.bed_type: + if not isinstance(bed, self.bed_type): raise TypeError( f"BedWriter can only continue to write features of the same type." f" Will not write a {type(bed).__name__} after a" f" {self.bed_type.__name__}." )
249-255
: Simplify '_pre_decode' method to avoid unnecessary JSON operationsThe
_pre_decode
method serializes objects usingjson.dumps
and then deserializes them later, which adds unnecessary overhead.Consider returning the dictionary directly without JSON serialization:
def _pre_decode(kind: type, obj: Any) -> Any: if obj == MISSING_FIELD and NoneType in get_args(kind): return None if kind is BedColor or BedColor in get_args(kind): - return json.dumps(as_dict(BedColor.from_string(cast(str, obj)))) + return as_dict(BedColor.from_string(cast(str, obj))) if kind is BedStrand or BedStrand in get_args(kind): - return f'"{obj}"' + return obj return obj
213-217
: Specify type parameters for 'DictReader' for clarityProviding type parameters for
DictReader
improves type checking and code readability.Adjust the initialization to include type parameters:
-self._reader: DictReader = DictReader( +self._reader: DictReader[str] = DictReader( without_comments(), delimiter="\t", fieldnames=self._header, )
201-204
: Consider generalizing field type handlingThe attributes
_header
and_types
are generated by castingbed_type
toBedLike
, which may be unnecessary ifbed_type
is already aBedLike
. This could be simplified to improve readability.If
bed_type
is guaranteed to be aBedLike
, you might not need thecast
.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (16)
.github/workflows/tests.yml
(4 hunks)CONTRIBUTING.md
(1 hunks)README.md
(4 hunks)bedspec/__init__.py
(1 hunks)bedspec/_bedspec.py
(4 hunks)bedspec/_io.py
(4 hunks)bedspec/_types.py
(0 hunks)bedspec/_typing.py
(0 hunks)bedspec/overlap/_overlap.py
(6 hunks)build.py
(1 hunks)pyproject.toml
(4 hunks)tests/test_bedspec.py
(4 hunks)tests/test_io.py
(12 hunks)tests/test_overlap.py
(3 hunks)tests/test_types.py
(0 hunks)tests/test_typing.py
(0 hunks)
💤 Files with no reviewable changes (4)
- bedspec/_types.py
- bedspec/_typing.py
- tests/test_types.py
- tests/test_typing.py
✅ Files skipped from review due to trivial changes (1)
- build.py
🧰 Additional context used
🪛 LanguageTool
CONTRIBUTING.md
[uncategorized] ~35-~35: It appears that hyphens are missing in the adjective “up-to-date”.
Context: ...age html To check the lock file is up to date:
console poetry check --lock ``` #...
(UP_TO_DATE_HYPHEN)
README.md
[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: - overlapping
: return all overlapping features - `over...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 Markdownlint
CONTRIBUTING.md
61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time
(MD001, heading-increment)
🔇 Additional comments (30)
bedspec/__init__.py (1)
7-7
: LGTM! Well-organized imports with improved type consolidation.
The consolidation of types and protocols from _types
into _bedspec
improves code organization. The new imports (Bed12
, BedLike
, GenomicSpan
, Named
, Stranded
, header
, types
) are logically grouped and maintain a clean public API.
Let's verify that all the imported types are properly defined in _bedspec
:
Also applies to: 10-10, 13-14, 18-20
✅ Verification successful
All imported types and utilities are properly defined in _bedspec
The verification confirms that all imported types (Bed12
, BedLike
, GenomicSpan
, Named
, Stranded
) and utilities (header
, types
) are properly defined in _bedspec.py
with appropriate implementations:
- Protocol classes:
GenomicSpan
,Named
,Stranded
- Abstract base class:
BedLike
- Concrete class:
Bed12
- Utility functions:
header
andtypes
for field introspection
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all imported types exist in _bedspec
# Expected: All types should be found in _bedspec.py
echo "Checking for type definitions in _bedspec.py..."
rg -l "class (Bed12|BedLike|GenomicSpan|Named|Stranded)" bedspec/_bedspec.py
echo "Checking for utility definitions..."
rg -l "(header|types) =" bedspec/_bedspec.py
Length of output: 351
Script:
#!/bin/bash
# Let's verify the actual implementations to ensure all imported types are properly defined
echo "Checking class definitions and their content:"
rg "class (Bed12|BedLike|GenomicSpan|Named|Stranded)" -A 2 bedspec/_bedspec.py
echo -e "\nChecking utility definitions:"
rg "(header|types) =" -A 2 bedspec/_bedspec.py
Length of output: 963
.github/workflows/tests.yml (2)
37-38
: LGTM! Using pipx is the recommended approach.
The switch to pipx for installing Poetry follows best practices as it provides better isolation for Python applications.
59-59
: Verify the contents of the check-all
task.
The change from poetry run pytest
to poetry task check-all
suggests this task might include additional checks beyond testing. Please ensure all necessary checks (tests, linting, type checking, etc.) are included in this task.
✅ Verification successful
check-all
task is well-defined and comprehensive
The check-all
task is a sequence that includes all necessary quality checks:
- Poetry lock file verification (
poetry check --lock
) - Code formatting check (
ruff format
) - Linting (
ruff check
) - Tests (
pytest
) - Type checking (
mypy
)
The change from direct pytest execution to check-all
is actually an improvement as it ensures all quality checks are run together.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check the definition of the check-all task
rg "check-all" -A 5 pyproject.toml
Length of output: 334
Script:
#!/bin/bash
# Let's check the definitions of individual check tasks
rg "check-(lock|format|lint|tests|typing)" -A 2 pyproject.toml
Length of output: 349
CONTRIBUTING.md (2)
43-60
: Well-documented plugin installation and future compatibility notes!
The installation instructions are clear, and the note about Poetry v2.0.0 provides valuable context for future maintenance.
63-93
: Excellent organization of task commands!
The commands are well-structured and logically organized, progressing from individual operations to comprehensive checks and fixes. This makes it easy for developers to find and use the appropriate commands for their needs.
README.md (1)
22-28
: LGTM! Clear and concise example.
The new section effectively demonstrates the basic usage of Bed3
class.
pyproject.toml (2)
57-86
: Well-structured task organization!
The poethepoet tasks are logically organized with clear separation between fix and check operations. The sequence tasks provide good automation for common development workflows.
Line range hint 159-192
: Excellent conventional commits configuration!
The git-cliff configuration is well-structured with:
- Clear commit type categorization
- Good changelog formatting
- Proper handling of breaking changes
bedspec/overlap/_overlap.py (3)
22-22
: LGTM: Protocol method implementations are properly defined.
The addition of NotImplemented
return statements in protocol methods follows best practices by making it explicit that these methods must be implemented by concrete classes.
Also applies to: 27-27, 37-37, 47-47, 57-57
112-120
: LGTM: Improved reference sequence name detection.
The use of hasattr
checks is more pythonic and flexible than isinstance
checks, allowing for duck typing. The error message is clear and helpful.
69-70
: Verify the impact of type variable renaming.
The renaming of _GenericGenomicSpanLike
to GenericGenomicSpanLike
(removing underscore prefix) suggests this is now part of the public API. Ensure all imports and usages are updated accordingly.
✅ Verification successful
Based on the search results, I can generate the final response as we have sufficient information:
Type variable renaming impact verified - no remaining references to old name
The renaming from _GenericGenomicSpanLike
to GenericGenomicSpanLike
has been properly implemented:
- No references to the old name
_GenericGenomicSpanLike
found in the codebase - All usages of
GenericGenomicSpanLike
are contained withinbedspec/overlap/_overlap.py
where it's defined - The type variable is used consistently throughout the module in class definition, method signatures, and type hints
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining references to the old name
# and verify all imports of the new name are working
# Search for any remaining references to the old name
echo "Checking for old name references..."
rg "_GenericGenomicSpanLike"
# Search for imports of the new name to verify usage
echo "Checking new name usage..."
rg "GenericGenomicSpanLike"
Length of output: 1323
tests/test_overlap.py (8)
13-14
: LGTM! Attribute rename is consistently applied.
The changes from contig
to refname
are correctly applied across the basic functionality tests while maintaining the original test coverage and assertions.
Also applies to: 21-22, 29-30
42-42
: LGTM! Error message is more descriptive.
The test correctly verifies the hashable requirement with a more specific error message.
Also applies to: 49-49
54-55
: LGTM! Edge case handling is well tested.
The test effectively verifies the behavior when an invalid reference name property is used.
70-71
: LGTM! Feature addition and querying tests are comprehensive.
The changes maintain the test coverage while consistently applying the attribute rename.
Also applies to: 79-80
87-95
: LGTM! Enclosing interval tests are thorough.
The test cases effectively cover various interval relationships while maintaining consistency with the attribute rename.
100-108
: LGTM! Enclosed interval tests are comprehensive.
The test cases effectively verify the enclosed-by relationship with good coverage of edge cases.
113-115
: LGTM! Overlap detection tests are well-structured.
The test cases provide comprehensive coverage of various overlap scenarios while maintaining consistency with the attribute rename.
Also applies to: 120-125
130-132
: LGTM! Boolean overlap query tests are thorough.
The test cases effectively verify the overlap query functionality with good coverage of edge cases.
Also applies to: 137-142
tests/test_bedspec.py (2)
Line range hint 1-21
: LGTM! Imports are well-organized and complete.
The imports are properly structured and include all necessary components for the tests.
55-86
: LGTM! Comprehensive type hierarchy testing.
The tests effectively validate the class hierarchy, dataclass properties, and structural types. Good use of parametrization and type hints.
tests/test_io.py (4)
10-13
: LGTM! Parameter renaming and new imports look good.
The renaming from 'contig' to 'refname' is more intuitive and aligns with common genomics terminology. The new imports support the enhanced test coverage.
Also applies to: 25-30
49-49
: LGTM! Type annotation update improves flexibility.
The change to BedLike
type makes the function more flexible while maintaining type safety through the protocol.
252-273
: LGTM! Comprehensive test coverage for Bed12.
The test case thoroughly validates the roundtrip functionality of Bed12 records with complex types, including lists and custom types like BedColor.
Line range hint 277-329
: Please clarify the removal of gzip support.
The removal of gzip-related tests suggests that compressed file support is being dropped. This could impact users who work with compressed BED files.
Let's check if this is documented elsewhere:
bedspec/_bedspec.py (3)
19-22
: Verify the necessity and implementation of the DataclassInstance
protocol
The DataclassInstance
protocol is introduced to represent objects that are dataclass instances. Ensure that this protocol is essential and that all dataclass instances correctly adhere to it without redundancy.
66-77
: Ensure proper implementation of the abstract base class BedLike
The BedLike
class is introduced as a new abstract base class for BED records. Verify that all subclasses properly implement the required abstract methods, particularly territory()
, to prevent any runtime errors due to missing implementations.
Line range hint 193-300
: Verify all references from contig
to refname
have been updated
The classes have replaced the contig
attribute with refname
. Ensure that all references to contig
across the codebase, including documentation and tests, have been updated to refname
to maintain consistency and prevent attribute errors.
Run the following script to identify any remaining references to contig
:
bedspec/_io.py (2)
7-7
: Ensure 'NoneType' compatibility with supported Python versions
The NoneType
class from the types
module is only available starting from Python 3.10. If your project aims to support earlier Python versions, this import will cause compatibility issues.
Consider using type(None)
as an alternative, which works across all Python versions.
13-13
: 'TypeAlias' may not be available in Python versions prior to 3.10
The TypeAlias
feature was introduced in Python 3.10. Using it in earlier versions will result in an ImportError
.
If you need to maintain compatibility with Python versions earlier than 3.10, consider importing TypeAlias
from the typing_extensions
module.
Apply this diff to adjust the import:
-from typing import TypeAlias
+from typing_extensions import TypeAlias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (10)
.github/workflows/tests.yml (1)
15-15
: Consider adding Python 3.10 for broader compatibility.While testing against Python 3.11 and 3.12 is good for catching issues with newer Python versions, many users might still be on Python 3.10. Adding it to the matrix would ensure broader compatibility.
matrix: - PYTHON_VERSION: ["3.11", "3.12"] + PYTHON_VERSION: ["3.10", "3.11", "3.12"]Also applies to: 19-19
README.md (1)
85-91
: Consider adding a newline before the bullet points.The content is correct, but adding a newline after the colon would improve readability.
The overlap detector supports the following operations: + - `overlapping`: return all overlapping features
🧰 Tools
🪛 LanguageTool
[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: -overlapping
: return all overlapping features - `over...(UNLIKELY_OPENING_PUNCTUATION)
pyproject.toml (3)
39-40
: Address TODO comment regarding basedpyright integration.The TODO comment suggests uncertainty about basedpyright integration. Since it's already listed as a dependency, we should either:
- Remove the TODO and integrate it into CI
- Remove the dependency if it's not going to be used
Would you like help setting up the basedpyright configuration and CI integration?
44-45
: Address TODO comment regarding doctests.The TODO comment about doctests should be addressed, especially since pytest-doctestplus is already included as a dependency.
Would you like help setting up doctests configuration for both the codebase and README?
134-156
: Comprehensive linting configuration!The ruff configuration is well-structured with a good selection of linters. Consider documenting these lint rules in CONTRIBUTING.md to help new contributors understand the expectations.
bedspec/_io.py (1)
142-156
: Consider adding error handling for msgspec conversion.While the
_bed_to_dict
method is well-implemented, it might benefit from explicit error handling for msgspec conversion failures.def _bed_to_dict(self, bed: BedType) -> dict[str, object]: """Convert a BED record into a shallow dictionary.""" shallow = {name: self._encode(getattr(bed, name)) for name in self._header} - return cast(dict[str, object], to_builtins(shallow, order="deterministic")) + try: + return cast(dict[str, object], to_builtins(shallow, order="deterministic")) + except Exception as e: + raise ValueError(f"Failed to convert BED record to dictionary: {e}") from ebedspec/_bedspec.py (2)
36-41
: Simplify the opposite() method using a ternary operatorThe method can be made more concise while maintaining readability.
def opposite(self) -> "BedStrand": """Return the opposite BED strand.""" - if self is BedStrand.Positive: - return BedStrand.Negative - else: - return BedStrand.Positive + return BedStrand.Negative if self is BedStrand.Positive else BedStrand.Positive
179-186
: Enhance error handling in BedColor.from_stringThe error handling could be more specific to help users understand the exact issue.
@classmethod def from_string(cls, string: str) -> "BedColor": """Build a BED color instance from a string.""" try: - r, g, b = map(int, string.split(",")) - except ValueError as error: - raise ValueError(f"Invalid string '{string}'. Expected 'int,int,int'!") from error + parts = string.split(",") + if len(parts) != 3: + raise ValueError(f"Expected 3 comma-separated values, got {len(parts)}") + r, g, b = map(int, parts) + except ValueError as error: + if "invalid literal for int()" in str(error): + raise ValueError(f"All values must be integers in '{string}'") from error + raise return cls(r, g, b)tests/test_bedspec.py (2)
179-184
: Consider parameterizing the simple BED validation tests.The current structure with multiple assertions in a single test function could be improved using pytest.mark.parametrize for better test isolation and clearer failure messages.
@pytest.mark.parametrize("start,end,description", [ (-1, 5, "negative start"), (5, 5, "equal start and end"), (5, 0, "end less than start"), ]) def test_simple_bed_validates_start_and_end(start: int, end: int, description: str) -> None: """Test that a simple BED record validates its start and end.""" with pytest.raises(ValueError, match="start must be greater than 0 and less than end!"): Bed3(refname="chr1", start=start, end=end)
189-201
: Consider parameterizing the paired BED validation tests.Similar to the simple BED validation tests, this could be improved using pytest.mark.parametrize for better maintainability and clearer test failures.
@pytest.mark.parametrize("test_case", [ {"refname1": "chr1", "start1": -1, "end1": 5, "error_msg": "start1 must be greater than 0"}, {"refname1": "chr1", "start1": 5, "end1": 5, "error_msg": "start1 must be greater than 0"}, # ... add other test cases ]) def test_paired_bed_validates_start_and_end(test_case: dict) -> None: """Test that a paired BED record validates its start and end for both intervals.""" with pytest.raises(ValueError, match=test_case["error_msg"]): BedPE( refname1=test_case["refname1"], start1=test_case["start1"], end1=test_case["end1"], refname2="chr1", start2=1, end2=2, name="foo", score=5, strand1=BedStrand.Positive, strand2=BedStrand.Positive, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (8)
.github/workflows/tests.yml
(4 hunks)README.md
(5 hunks)bedspec/__init__.py
(1 hunks)bedspec/_bedspec.py
(4 hunks)bedspec/_io.py
(4 hunks)pyproject.toml
(3 hunks)tests/test_bedspec.py
(4 hunks)tests/test_io.py
(12 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- bedspec/init.py
- tests/test_io.py
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: - overlapping
: return all overlapping features - `over...
(UNLIKELY_OPENING_PUNCTUATION)
🔇 Additional comments (27)
.github/workflows/tests.yml (3)
1-1
: LGTM: Workflow name and Poetry version updates are appropriate.
The workflow name change to "Code checks" better reflects its purpose, and updating to Poetry 1.8 keeps dependencies current.
Also applies to: 12-12
37-38
: LGTM: Using pipx for Poetry installation is recommended.
The switch to pipx for installing Poetry follows best practices as it provides better isolation for Python CLI applications.
59-59
: Verify the 'check-all' task definition.
The workflow now uses poetry task check-all
instead of directly running pytest. This suggests a custom task definition that might include additional checks beyond testing.
Let's verify the task definition:
#!/bin/bash
# Check the definition of check-all task
rg "check-all" pyproject.toml
README.md (6)
5-5
: LGTM: Python version badge update is accurate.
22-28
: LGTM: Clear and concise example for BED feature creation.
33-49
: LGTM: Writing and reading examples are clear and correct.
The previously reported typo in the console output has been fixed.
62-62
: LGTM: BED types listing is comprehensive.
75-82
: LGTM: Overlap detection example is clear and demonstrates proper usage.
103-108
: LGTM: Custom BED type example is well-structured.
The example demonstrates proper use of dataclasses and type hints.
pyproject.toml (4)
59-88
: Well-structured task organization!
The task hierarchy is well-organized with clear separation between fix and check operations. The sequence of operations in composite tasks follows a logical order.
Line range hint 164-192
: Good conventional commit configuration!
The git-cliff configuration properly enforces semantic versioning through conventional commits and provides a clean changelog structure.
94-109
: Verify impact of strict type checking settings.
The mypy configuration is appropriately strict, which is good for type safety. Let's verify if the codebase complies with these settings.
#!/bin/bash
# Check for any type: ignore comments that might bypass these strict settings
rg "type:\s*ignore" "bedspec/"
35-36
: Verify msgspec usage in the codebase.
The addition of msgspec suggests serialization/deserialization functionality. Let's verify its usage.
✅ Verification successful
Let me search for msgspec usage with a different pattern since the previous search didn't yield relevant results.
Let me try one more search with different options:
Let's check the content of bedspec/_io.py since it's one of the files that contains msgspec:
msgspec dependency is properly used for data serialization/deserialization
The msgspec library is actively used in bedspec/_io.py for:
- Converting BED records using
msgspec.convert
- Converting objects to built-in types using
msgspec.to_builtins
These functions are essential parts of the BED file reading/writing functionality, specifically in:
BedWriter._bed_to_dict()
method for serializationBedReader.__iter__()
method for deserialization
The dependency is justified and properly integrated into the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check how msgspec is being used in the codebase
rg -l 'import.*msgspec|from.*msgspec'
Length of output: 276
Script:
#!/bin/bash
# Search for msgspec usage with a broader pattern
rg "msgspec" --type python -A 3
# Also check the content of any potential configuration or setup files
fd "setup.py|setup.cfg" --exec cat {}
Length of output: 104
Script:
#!/bin/bash
# Search for any files containing msgspec string
rg -l "msgspec"
# Also let's look at the content of pyproject.toml to understand the context better
cat pyproject.toml
Length of output: 5299
Script:
#!/bin/bash
# Check the content of _io.py to see how msgspec is used
cat bedspec/_io.py
Length of output: 9925
bedspec/_io.py (5)
2-35
: LGTM! Well-organized imports and type definitions.
The imports are logically grouped and the new type definitions (BedType
and JsonType
) are clear and well-documented. Good use of typing_extensions
for compatibility.
83-108
: Good type safety improvements in BedWriter.
The transition from _BedKind
to BedType
and the addition of proper type validation in the setter improves type safety. The initialization pattern allows for lazy setup while maintaining type correctness.
127-140
: Well-implemented validation in _maybe_setup_with.
The method provides good runtime validation ensuring type consistency across writes. The error message is clear and helpful.
182-222
: Good improvements to BedReader initialization and type handling.
The reader now requires explicit type information at initialization and properly sets up the CSV reader with comment filtering. The documentation includes helpful usage examples.
264-279
:
Manual JSON string construction needs improvement.
As mentioned in the past review comments, manual JSON string construction can be error-prone and less secure. Consider using a dictionary-based approach instead.
Apply this diff to improve safety and maintainability:
def _csv_dict_to_json(self, record: dict[str, str]) -> JsonType:
"""Convert a CSV dictionary record to JSON using the known field types."""
- key_values: list[str] = []
+ result: dict[str, Any] = {}
for (name, value), field_type in zip(record.items(), self._types, strict=True):
pre_decoded: str = self._pre_decode(cast(type, field_type), value)
origin_type = get_origin(field_type)
- if pre_decoded is None:
- key_values.append(f'"{name}":null')
- elif origin_type is list:
- key_values.append(f'"{name}":[{pre_decoded.rstrip(",")}]')
- elif field_type is str or str in get_args(field_type):
- key_values.append(f'"{name}":"{pre_decoded}"')
- else:
- key_values.append(f'"{name}":{pre_decoded}')
- json_string: JsonType = json.loads(f"{{{','.join(key_values)}}}")
- return json_string
+ if pre_decoded is None:
+ result[name] = None
+ elif origin_type is list:
+ result[name] = [x.strip() for x in pre_decoded.split(",") if x.strip()]
+ elif field_type is str or str in get_args(field_type):
+ result[name] = pre_decoded
+ else:
+ result[name] = json.loads(pre_decoded) if isinstance(pre_decoded, str) else pre_decoded
+ return result
bedspec/_bedspec.py (4)
19-65
: Well-structured protocol definitions!
The new protocols (DataclassInstance
, GenomicSpan
, Named
, Stranded
) are well-designed and provide clear interfaces for different BED record capabilities. The use of @runtime_checkable
enables runtime type checking, which is valuable for debugging and validation.
Line range hint 67-162
: Robust base class implementations with strong validation
The base classes provide excellent validation and clear abstractions. The __new__
method in BedLike
ensures proper dataclass usage, while SimpleBed
and PairBed
include thorough validation in their __post_init__
methods.
81-90
: Clean and effective utility function implementations
The header()
and types()
functions provide useful reflection capabilities with proper type hints and handle both instances and types correctly.
260-282
: Comprehensive validation in Bed12.post_init
The validation logic for BED12 records is thorough and covers all essential cases:
- Thick start/end consistency
- Block count validation
- Block sizes and starts validation
- Block boundary validation
This addresses the previous review comment about implementing validation in Bed12's post_init method.
tests/test_bedspec.py (5)
24-30
: LGTM! Well-structured test for BedStrand string behavior.
The test thoroughly verifies both string conversion and equality comparison for BedStrand.
32-35
: LGTM! Clear test for BedStrand opposite functionality.
The test effectively verifies the bidirectional opposite strand relationship.
43-59
: LGTM! Well-implemented parameterized test for BedColor validation.
The test follows the suggested structure from past reviews, using pytest.mark.parametrize for comprehensive coverage of invalid cases.
85-104
: LGTM! Comprehensive structural type tests.
The tests effectively verify the type hierarchy and protocol implementations for GenomicSpan, Stranded, and DataclassInstance.
226-288
: LGTM! Comprehensive BED12 validation test suite.
The test suite effectively covers all validation scenarios for BED12:
- Thick start/end validation
- Block count, sizes, and starts validation
- Block size validation
- Block start validation
- Block end validation
The helper function make_bed12
is well-designed for testing different scenarios.
@coderabbitai review everything again |
@clintval: I will review all the changes again as per your request. ✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (13)
CONTRIBUTING.md (2)
3-39
: LGTM! Clear and well-organized development commands.The commands are well-structured with clear descriptions for each operation.
Minor grammar fix needed:
-To check the lock file is up to date: +To check the lock file is up-to-date:🧰 Tools
🪛 LanguageTool
[uncategorized] ~35-~35: It appears that hyphens are missing in the adjective “up-to-date”.
Context: ...age htmlTo check the lock file is up to date:
console poetry check --lock ``` #...(UP_TO_DATE_HYPHEN)
41-93
: Great organization of task commands, but heading levels need adjustment.The task commands are well-organized into logical groups with clear categorization. However, the heading hierarchy should be adjusted for better document structure.
Replace the h6 headings (######) with h3 headings (###) for proper hierarchy:
-###### For Running Individual Checks +### For Running Individual Checks -###### For Running All Checks +### For Running All Checks -###### For Running Individual Fixes +### For Running Individual Fixes -###### For Running All Fixes +### For Running All Fixes -###### For Running All Fixes and Checks +### For Running All Fixes and Checks🧰 Tools
🪛 Markdownlint
61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time(MD001, heading-increment)
README.md (3)
22-28
: Consider enhancing the example with parameter descriptions.While the example is clear, it would be more helpful for newcomers if you added brief descriptions of what each parameter represents:
refname
: The reference sequence namestart
: The starting position (0-based)end
: The ending position (exclusive)
75-82
: Consider adding an explanation of overlap behavior.While the example is clear, it would be helpful to explain what constitutes an overlap (e.g., whether it includes adjacent intervals, zero-length intervals, etc.).
85-91
: Fix list formatting for consistency.The content is excellent, but consider standardizing the punctuation:
-The overlap detector supports the following operations: +The overlap detector supports the following operations🧰 Tools
🪛 LanguageTool
[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: -overlapping
: return all overlapping features - `over...(UNLIKELY_OPENING_PUNCTUATION)
pyproject.toml (2)
39-40
: Address TODO comment regarding basedpyright integration.The TODO comment suggests uncertainty about basedpyright integration. Since it's already listed as a dependency, we should either:
- Remove the TODO and integrate basedpyright into CI
- Remove the dependency if it's not going to be used
Would you like help setting up the basedpyright configuration and CI integration?
44-45
: Address TODO comment regarding doctests.The TODO suggests adding README doctests. This would improve documentation reliability.
Would you like help setting up pytest-doctestplus for README testing?
tests/test_overlap.py (1)
Line range hint
1-142
: Consider adding additional edge casesWhile the test coverage is comprehensive, consider adding these edge cases:
- Zero-length intervals (where start equals end)
- Adjacent but non-overlapping intervals
- Very large intervals that might stress the system
- Performance tests with a large number of intervals
Would you like me to help implement these additional test cases?
tests/test_bedspec.py (2)
179-184
: Consider parameterizing the validation tests.The validation tests for both simple BED and paired BED contain similar test cases that could be parameterized for better maintainability.
Example refactor for simple BED validation:
@pytest.mark.parametrize("start,end,msg", [ (-1, 5, "start must be greater than 0 and less than end!"), (5, 5, "start must be greater than 0 and less than end!"), (5, 0, "start must be greater than 0 and less than end!") ]) def test_simple_bed_validates_start_and_end(start: int, end: int, msg: str) -> None: """Test that a simple BED record validates its start and end.""" with pytest.raises(ValueError, match=msg): Bed3(refname="chr1", start=start, end=end)Similar parameterization could be applied to the paired BED validation tests.
Also applies to: 190-201
226-288
: Consider enhancing BED12 validation tests.While the tests are comprehensive, they could be improved in the following ways:
- Use parameterization for similar test cases
- Add more descriptive error messages for better debugging
Example refactor for thick start/end validation:
@pytest.mark.parametrize("thick_start,thick_end", [ (1, None), (None, 2) ]) def test_bed12_thick_coordinates_validation(thick_start: int | None, thick_end: int | None) -> None: """Test that thick start and end must be set together.""" with pytest.raises( ValueError, match="thick_start and thick_end must both be None or both be set!" ): make_bed12(thick_start=thick_start, thick_end=thick_end)Similar parameterization could be applied to other validation cases.
tests/test_io.py (1)
Compression support is being removed without proper cleanup
The search results show that while the gzip-related tests are commented out, the compression-related constants and file extensions are still defined in
bedspec/_io.py
. This inconsistency suggests an incomplete removal of compression support:
- The commented-out tests in
tests/test_io.py
include all compression-related functionality- But
bedspec/_io.py
still maintains compression-related constants:
BGZ_EXTENSION
,BGZIP_EXTENSION
GZ_EXTENSION
,GZIP_EXTENSION
_BGZIP_EXTENSIONS
,_GZIP_EXTENSIONS
,_ALL_GZIP_COMPATIBLE_EXTENSIONS
Either:
- Remove all compression-related code and constants if compression support is being dropped
- Or restore the tests if compression support should be maintained
🔗 Analysis chain
Line range hint
278-332
: Verify the intention behind removing gzip-related testsThe commented-out tests suggest a change in how compression is handled. Please clarify:
- Is compression support being removed or moved to a different module?
- Are these tests temporarily disabled during refactoring?
- Should we remove these commented-out tests entirely?
Let me generate additional scripts to check for gzip-related functionality since the previous script output seems unrelated.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if gzip functionality exists elsewhere in the codebase echo "Searching for gzip-related code..." rg -l "gzip" --type py # Check if there are any other compression-related tests echo "Searching for compression-related tests..." rg -l "compress" --type py --glob "tests/**"Length of output: 11135
Script:
#!/bin/bash # Search for gzip-related code and imports echo "Searching for gzip imports and usage..." rg "import.*gzip|from.*gzip" --type py # Search for compression-related functionality echo "Searching for compression-related code..." rg "compress|gzip|bgzip" --type py -C 2 # Check if these tests were moved elsewhere echo "Searching for similar test patterns..." rg "def test.*gzip|def test.*compress" --type pyLength of output: 5497
bedspec/_io.py (1)
106-110
: Avoid shadowing instance variables by renaming parametersIn the
bed_type
setter method, the parameterbed_type
shadows the instance variableself._bed_type
. This necessitates the use of# type: ignore[no-redef]
to suppress type checker warnings. To improve code clarity and eliminate the need for type ignores, consider renaming the parameter to avoid shadowing the instance variable.Apply this diff to rename the parameter and remove the type ignore:
def bed_type(self, new_bed_type: type[BedType]) -> None: - self._bed_type: type[BedType] = bed_type # type: ignore[no-redef] + self._bed_type: type[BedType] = new_bed_type self._header: list[str] = header(cast(BedLike, new_bed_type)) self._types: list[type | str | Any] = types(cast(BedLike, new_bed_type))bedspec/_bedspec.py (1)
67-71
: Reconsider using__new__
inBedLike
to enforce dataclass decorationUsing
__new__
to enforce that subclasses ofBedLike
are decorated with@dataclass
might lead to unexpected behavior and is unconventional. Consider using a metaclass or class decoration to enforce this constraint, or rely on documentation and developer guidelines.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (16)
.github/workflows/tests.yml
(4 hunks)CONTRIBUTING.md
(1 hunks)README.md
(5 hunks)bedspec/__init__.py
(1 hunks)bedspec/_bedspec.py
(4 hunks)bedspec/_io.py
(4 hunks)bedspec/_types.py
(0 hunks)bedspec/_typing.py
(0 hunks)bedspec/overlap/_overlap.py
(6 hunks)build.py
(1 hunks)pyproject.toml
(3 hunks)tests/test_bedspec.py
(4 hunks)tests/test_io.py
(12 hunks)tests/test_overlap.py
(3 hunks)tests/test_types.py
(0 hunks)tests/test_typing.py
(0 hunks)
💤 Files with no reviewable changes (4)
- bedspec/_types.py
- bedspec/_typing.py
- tests/test_types.py
- tests/test_typing.py
🧰 Additional context used
🪛 LanguageTool
CONTRIBUTING.md
[uncategorized] ~35-~35: It appears that hyphens are missing in the adjective “up-to-date”.
Context: ...age html To check the lock file is up to date:
console poetry check --lock ``` #...
(UP_TO_DATE_HYPHEN)
README.md
[uncategorized] ~87-~87: Loose punctuation mark.
Context: ...e following operations: - overlapping
: return all overlapping features - `over...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 Markdownlint
CONTRIBUTING.md
61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time
(MD001, heading-increment)
🔇 Additional comments (45)
bedspec/__init__.py (2)
7-7
: LGTM: BED12 format support added
The addition of Bed12
import follows the existing pattern and completes the support for standard BED formats.
10-10
: Verify protocol implementations after module restructuring
The consolidation of protocols into ._bedspec
improves code organization. Let's verify the protocol definitions are properly implemented.
Consider adding a docstring at the top of the file to document the module's purpose and the relationship between these protocols and BED record types.
+"""
+bedspec: A Python library for working with BED files.
+
+This module provides a comprehensive set of types and protocols for handling
+BED format data, including:
+- BED record types (Bed2 through Bed12)
+- Core protocols (GenomicSpan, Named, Stranded)
+- Base classes and utilities
+"""
# ruff: noqa: F401
Also applies to: 13-14, 18-18
✅ Verification successful
Protocol implementations are correctly defined and used in ._bedspec
The verification confirms that:
- All protocols (
GenomicSpan
,Named
,Stranded
) are properly defined in_bedspec.py
with appropriate docstrings BedLike
and other BED classes correctly implement these protocols- The protocol hierarchy is well-structured with clear inheritance patterns (e.g.,
SimpleBed
implementsBedLike
andGenomicSpan
)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify protocol definitions and implementations in _bedspec module
# Test: Check if protocols are properly defined
ast-grep --pattern 'protocol $name($_):
$$$'
# Test: Check protocol implementations
ast-grep --pattern 'class $name($protocol):
$$$'
Length of output: 101
Script:
#!/bin/bash
# Let's try a different approach to verify the protocol definitions and implementations
# Check for protocol-like definitions in _bedspec.py
rg -A 5 "class.*Protocol|Protocol.*:" bedspec/_bedspec.py
# Check for imports and their usage
rg "from typing import Protocol" bedspec/
rg "BedLike|GenomicSpan|Named|Stranded" bedspec/_bedspec.py
Length of output: 1613
build.py (3)
10-10
: LGTM! Dependencies list is properly consolidated.
The consolidation of the dependencies list into a single line improves readability while maintaining all necessary header and source file dependencies for the cgranges extension.
15-15
: LGTM! Improved code organization.
The added blank line properly separates the extension module definition from the build function, following PEP 8 guidelines for function spacing.
19-22
: LGTM! Enhanced dictionary formatting.
The closing brace placement on a new line improves readability of the dictionary structure in the setup_kwargs.update
call.
.github/workflows/tests.yml (6)
1-1
: LGTM! Descriptive workflow name.
The new name "Code checks" better reflects the workflow's comprehensive nature, as it includes both testing and quality checks.
19-19
: LGTM! Enhanced Python version coverage.
Adding Python 3.11 to the test matrix improves compatibility verification across Python versions.
37-38
: LGTM! Using recommended installation method.
Using pipx for Poetry installation is the recommended approach as it provides better isolation for Python CLI tools.
55-56
: Consider moving poethepoet to dev dependencies.
The suggestion from the previous review about adding poethepoet to dev dependencies in pyproject.toml is still valid.
59-59
: Verify the check-all task configuration.
The change to use poetry tasks is good, but let's verify the task definition.
#!/bin/bash
# Check if the check-all task is defined in pyproject.toml
rg "check-all" pyproject.toml
# Look for any task definitions to understand the full testing setup
rg "^\[tool\.poe\.tasks\]" -A 20 pyproject.toml
12-12
: Verify Poetry version compatibility.
The upgrade to Poetry 1.8 is good, but let's verify compatibility with the project's dependencies.
✅ Verification successful
Poetry 1.8 upgrade is compatible with project requirements
The upgrade to Poetry 1.8 is safe as:
- Project explicitly requires
poetry-core>=1.8
in build-system requirements - No Poetry-specific features or configurations that would be incompatible with 1.8
- Project's Python version requirement (^3.11.0) and dependencies are all supported by Poetry 1.8
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if pyproject.toml specifies any Poetry version constraints
rg "poetry.*version" pyproject.toml
# Check if there are any Poetry-specific configurations that might be version-sensitive
rg "^\[tool\.poetry\]" -A 10 pyproject.toml
Length of output: 746
Script:
#!/bin/bash
# Check pyproject.toml content
cat pyproject.toml
# Check for any Poetry-specific files or configurations
fd "poetry\..*|pyproject\.toml"
# Look for Python version requirements
rg "python.*version|^python\s*=" pyproject.toml
Length of output: 5371
CONTRIBUTING.md (1)
49-60
: Excellent forward-looking documentation!
The note about Poetry v2.0.0 changes and future plugin handling is well-documented and provides valuable context for future maintenance.
README.md (4)
5-5
: LGTM: Python version badge accurately updated.
The badge now correctly shows support for Python 3.11 and 3.12.
33-49
: LGTM: Clear and accurate I/O examples.
The examples effectively demonstrate both reading and writing operations with proper context management.
62-62
: LGTM: Comprehensive list of supported BED types.
The list now includes all available BED format variations.
103-108
: LGTM: Improved custom type example.
The changes enhance the example by:
- Using a more descriptive class name (Bed3Plus1)
- Adding proper type hints with None support
- Following consistent field naming (refname)
pyproject.toml (4)
59-88
: LGTM! Well-structured task configuration.
The task organization is excellent, with:
- Clear separation between fix and check tasks
- Logical task grouping
- Proper error handling with ignore_fail
134-156
: LGTM! Comprehensive linting configuration.
The ruff configuration is well-thought-out with:
- Essential linting rules enabled
- Appropriate ignores for common false positives
- Good balance between strictness and practicality
Line range hint 164-199
: LGTM! Well-structured git-cliff configuration.
The configuration properly implements conventional commits with:
- Clear commit categorization
- Good changelog structure
- Appropriate commit parsing rules
34-36
: Verify compatibility of new dependencies.
The new dependencies look appropriate, but let's verify their compatibility:
✅ Verification successful
Dependencies are correctly used and compatible
The dependencies are properly integrated in the codebase:
msgspec
is used for data conversion withconvert
andto_builtins
functionstyping-extensions
is used for the@override
decorator in multiple files- Python 3.11 requirement aligns with typing-extensions support for Python 3.8+
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if msgspec and typing-extensions are used in the codebase
echo "Checking msgspec usage:"
rg "import msgspec|from msgspec" -l
echo -e "\nChecking typing-extensions usage:"
rg "import typing_extensions|from typing_extensions" -l
Length of output: 285
Script:
#!/bin/bash
# Check how these dependencies are used in the codebase
echo "Checking msgspec usage in bedspec/_io.py:"
rg "msgspec" bedspec/_io.py -B 2 -A 2
echo -e "\nChecking typing_extensions usage in bedspec/_io.py and _bedspec.py:"
rg "typing_extensions" bedspec/_io.py bedspec/_bedspec.py -B 2 -A 2
# Check Python version usage
echo -e "\nChecking Python version constraints in code:"
rg "python_version|python_requires|Python \d" -g "!*.pyc" -g "!*.txt" -g "!*.md"
Length of output: 1146
bedspec/overlap/_overlap.py (4)
22-22
: LGTM: Protocol property implementations are correctly defined.
The addition of NotImplemented
return statements in protocol properties follows best practices for abstract protocol definitions in Python, making it explicit that these properties must be implemented by concrete classes.
Also applies to: 27-27, 37-37, 47-47, 57-57
112-120
: LGTM: Improved reference sequence name detection.
The switch to hasattr
checks from isinstance
checks is a good improvement:
- More maintainable and cleaner approach
- Better follows Python's duck typing principles
- Clear error message when no property is found
Line range hint 83-163
: LGTM: Well-structured overlap detection implementation.
The OverlapDetector
class implementation is robust and well-documented:
- Clear type hints using the new
GenericGenomicSpanLike
- Comprehensive docstrings explaining requirements
- Efficient overlap detection using cgranges
- Proper error handling and state management
69-70
: Verify the impact of type variable rename.
The rename from _GenericGenomicSpanLike
to GenericGenomicSpanLike
(removing underscore prefix) suggests this is now intended for public use. Let's verify all references are updated.
✅ Verification successful
Let's verify the new name usage to ensure the rename was properly implemented.
Type variable rename is consistently implemented
The rename from _GenericGenomicSpanLike
to GenericGenomicSpanLike
has been properly implemented throughout the codebase. All references to this type variable are using the new name without the underscore prefix, and there are no lingering references to the old name. The type variable is used consistently in the OverlapDetector
class and its methods.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining references to the old name
echo "Checking for any remaining references to _GenericGenomicSpanLike:"
rg "_GenericGenomicSpanLike"
Length of output: 169
Script:
#!/bin/bash
# Search for the new name to verify its usage
echo "Checking references to GenericGenomicSpanLike:"
rg "GenericGenomicSpanLike"
# Also check for any TypeVar definitions to ensure there are no duplicates
echo -e "\nChecking all TypeVar definitions:"
ast-grep --pattern 'TypeVar($$$)'
Length of output: 1956
tests/test_overlap.py (7)
13-14
: LGTM: Basic test functions properly updated with refname
The attribute renaming from contig
to refname
is consistently applied across the basic test functions while maintaining their original test coverage and assertions.
Also applies to: 21-22, 29-30
42-42
: LGTM: Improved error handling test cases
The error message is now more specific with the explicit match string "Genomic feature is not hashable but should be", which improves test clarity and maintainability.
Also applies to: 49-49
70-71
: LGTM: Feature addition test cases properly updated
The test cases for adding and querying features have been correctly updated with the refname
attribute while maintaining their original test coverage.
Also applies to: 79-80
87-95
: LGTM: Comprehensive enclosing intervals test cases
The test cases thoroughly cover various scenarios for enclosing intervals, including edge cases, with consistent use of the refname
attribute.
100-108
: LGTM: Thorough enclosed by intervals test coverage
The test cases provide comprehensive coverage for enclosed by scenarios, complementing the enclosing tests with proper edge cases.
113-125
: LGTM: Comprehensive overlap detection test cases
The test cases provide thorough coverage of various overlap scenarios across different chromosomes and positions, with a good balance of positive and negative cases.
130-142
: LGTM: Boolean overlap query tests properly implemented
The test cases effectively verify the boolean overlap query functionality, maintaining consistency with the more detailed overlap detection tests.
tests/test_bedspec.py (5)
Line range hint 1-22
: LGTM! Import statements are well-organized.
The imports are logically grouped and all appear to be used within the test file.
24-35
: LGTM! Comprehensive strand tests.
The tests effectively cover both string representation and opposite strand functionality. Good test isolation with separate functions for distinct behaviors.
38-71
: LGTM! Well-structured color validation tests.
The tests effectively use parameterization for validation cases and include comprehensive error handling. Implementation follows the suggested structure from previous reviews.
73-104
: LGTM! Comprehensive type system verification.
Tests effectively verify the type hierarchy, dataclass behavior, and structural types using parameterization where appropriate.
Line range hint 203-223
: LGTM! Clear and effective territory tests.
The tests effectively verify territory behavior with explicit expected values and clear assertions.
tests/test_io.py (5)
10-14
: LGTM: Import additions are consistent with the changes
The addition of Bed12
and BedLike
imports aligns with the new functionality being tested.
Line range hint 25-45
: LGTM: Test cases are comprehensive and consistent
The parameter renaming from 'contig' to 'refname' is consistently applied across all test cases, and the test coverage spans various BED types with appropriate test data.
49-49
: LGTM: Type hint improvement
The change from BedType
to BedLike
better represents the type constraints and aligns with the new abstract base class implementation.
251-275
: LGTM: Comprehensive Bed12 roundtrip test
The new test thoroughly validates Bed12 functionality with:
- Complete coverage of all Bed12 attributes
- Proper testing of both write and read operations
- Validation of complex types (lists, enums)
62-62
: Verify consistent usage of 'refname' parameter
The parameter has been renamed from 'contig' to 'refname' throughout this file. Let's verify this change is consistent across the codebase.
#!/bin/bash
# Check for any remaining usage of 'contig' parameter
echo "Searching for remaining 'contig' usage in BED-related code..."
rg "contig.*=|def.*contig" --type py
# Check for consistent usage of 'refname'
echo "Verifying 'refname' usage in BED-related code..."
rg "refname.*=|def.*refname" --type py
Also applies to: 71-71, 88-88, 104-104, 120-120, 135-135, 154-154, 163-163, 188-188, 205-205, 223-223, 233-233, 243-243, 254-254
bedspec/_io.py (2)
266-279
: Avoid manual JSON string construction
Constructing JSON strings manually by concatenating strings can lead to syntax errors and security vulnerabilities. It's safer and more efficient to build a dictionary and serialize it using json.dumps
. The previous review comment regarding this issue is still applicable to this code segment.
240-250
: Verify resource management in __iter__
method
In the __iter__
method of BedReader
, self.close()
is called after iteration. This may lead to unexpected behavior if the BedReader
instance is intended to be reused or if multiple iterations are required. Additionally, if BedReader
is used within a context manager, closing the file handle here might interfere with the context manager's exit handling. Please verify whether calling self.close()
in this method is appropriate.
bedspec/_bedspec.py (2)
257-279
: Verify the validation logic in Bed12.__post_init__
against BED12 specifications
The validation logic in __post_init__
should strictly adhere to the BED12 format specifications. Please review the following:
- Block Starts: Confirm whether
block_starts[0]
must be0
. Some specifications allowblock_starts
to contain relative offsets, not necessarily starting at0
. - Block Boundaries: Check the calculation
(self.start + self.block_starts[-1] + self.block_sizes[-1]) == self.end
. Ensure that this correctly validates that the blocks align with the overall interval defined bystart
andend
.
Would you like assistance in reviewing the BED12 specification and adjusting the validation logic accordingly?
36-38
:
Use ==
instead of is
for string comparison in BedStrand.opposite
In the opposite
method, comparing BedStrand
values using is
may not work as expected since BedStrand
inherits from StrEnum
, and its members are instances of str
. It's safer to use ==
for comparison to ensure correct behavior.
Apply this diff to fix the comparison:
-def opposite(self) -> "BedStrand":
- return BedStrand.Negative if self is BedStrand.Positive else BedStrand.Positive
+def opposite(self) -> "BedStrand":
+ return BedStrand.Negative if self == BedStrand.Positive else BedStrand.Positive
Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
tests/test_io.py (1)
Based on the findings, I can now generate the final response:
Gzip compression support has been removed from the codebase
The codebase has been simplified to handle only plain text files:
- All gzip-related tests have been removed
BedReader.from_path
andBedWriter.from_path
now only open files in plain text mode ("r"/"w")- While gzip extension constants are still defined in
_io.py
, there's no active code using themThis appears to be an intentional simplification of the I/O handling, removing compression support entirely rather than a temporary removal or relocation of tests.
🔗 Analysis chain
Line range hint
278-332
: Verify the removal of gzip compression testsThe commented-out gzip compression tests have been removed. Please clarify:
- Is gzip compression support being deprecated?
- Are these tests being moved elsewhere?
- Is this a temporary removal pending refactoring?
Let me check the content of these files to understand how gzip compression is being handled.
Let me check if there are any new or replacement tests for handling compressed files, as the current findings only show the commented tests and extension definitions.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if gzip functionality is referenced elsewhere in the codebase echo "Searching for gzip-related code..." rg -l 'gzip|compression' --type py # Check for any related test files echo "Checking for other compression-related test files..." fd -e py -e test test_.*compressionLength of output: 302
Script:
#!/bin/bash # Check the content of test_io.py for any active gzip tests echo "=== Active gzip tests in test_io.py ===" rg "def test.*gzip" tests/test_io.py -B1 -A1 # Check the implementation in _io.py echo -e "\n=== Gzip handling in _io.py ===" rg "gzip|compression" bedspec/_io.py -B2 -A2Length of output: 2301
Script:
#!/bin/bash # Look for any active tests related to file handling in test_io.py echo "=== Active tests for file handling ===" rg "^def test.*" tests/test_io.py -B1 -A1 # Check for any recent changes in the implementation of file handling echo -e "\n=== File handling implementation in _io.py ===" rg "class.*Reader|class.*Writer|def from_path|def open" bedspec/_io.py -B1 -A3Length of output: 3508
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
tests/test_io.py
(12 hunks)
🔇 Additional comments (5)
tests/test_io.py (5)
10-13
: LGTM: Clean import additions
The new imports for Bed12
and BedLike
are properly organized and align with the updated type system.
Line range hint 25-44
: LGTM: Consistent parameter renaming
The renaming from 'contig' to 'refname' is consistently applied across all test cases. The parametrization structure is clean and maintainable.
Line range hint 49-55
: LGTM: Improved type flexibility
The change from BedType
to BedLike
appropriately broadens the accepted types while maintaining type safety. The simplified BedWriter instantiation improves readability.
Line range hint 173-200
: LGTM: Simplified BedReader instantiation
The updated BedReader instantiation pattern is consistently applied and maintains proper error handling. The path-based and string-based initialization tests provide good coverage.
252-276
: LGTM: Comprehensive Bed12 roundtrip testing
The new test thoroughly validates Bed12 functionality with complex types, including proper initialization, writing, and reading. All fields are properly tested and validated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (8)
CONTRIBUTING.md (2)
3-39
: Consider adding more context about command usage and outcomes.While the commands are well-organized, it would be helpful to add:
- Expected outcomes or success criteria for each command
- When developers should run these commands (e.g., before submitting PR, after changes)
- What to do if the commands fail
41-93
: Consider adding a table of contents for task commands.The Shortcut Task Commands section contains multiple subsections with different types of commands. A table of contents would help developers quickly find the specific command they need.
Add this at the start of the section:
## Shortcut Task Commands - [Individual Checks](#for-running-individual-checks) - [All Checks](#for-running-all-checks) - [Individual Fixes](#for-running-individual-fixes) - [All Fixes](#for-running-all-fixes) - [All Fixes and Checks](#for-running-all-fixes-and-checks)🧰 Tools
🪛 Markdownlint
61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time(MD001, heading-increment)
bedspec/_io.py (1)
134-139
: Enhance error message clarityThe error message could be more descriptive by including the actual instance types.
- f"BedWriter can only continue to write features of the same type." - f" Will not write a {type(bed).__name__} after a" - f" {self.bed_type.__name__}." + f"BedWriter can only write features of the same type. " + f"Attempted to write {type(bed).__name__} but writer was initialized with {self.bed_type.__name__}."bedspec/_bedspec.py (3)
142-146
: Consider enhancing PairBed validation.The validation could be more comprehensive by checking for:
- Relative positions when intervals are on the same chromosome
- Negative coordinates
Consider adding this validation:
def __post_init__(self) -> None: """Validate this pair of BED records.""" if self.start1 >= self.end1 or self.start1 < 0: raise ValueError("start1 must be greater than 0 and less than end1!") if self.start2 >= self.end2 or self.start2 < 0: raise ValueError("start2 must be greater than 0 and less than end2!") + # When on same chromosome, ensure no overlaps + if self.refname1 == self.refname2: + if not (self.end1 <= self.start2 or self.end2 <= self.start1): + raise ValueError("When on same chromosome, intervals must not overlap!")
177-184
: Enhance error handling in BedColor.from_string.The error handling could be more specific about the type of error encountered.
Consider this enhanced error handling:
@classmethod def from_string(cls, string: str) -> "BedColor": """Build a BED color instance from a string.""" try: r, g, b = map(int, string.split(",")) except ValueError as error: - raise ValueError(f"Invalid string '{string}'. Expected 'int,int,int'!") from error + if "," not in string: + raise ValueError(f"Invalid format in '{string}'. Expected 'r,g,b' with comma separators!") from error + if string.count(",") != 2: + raise ValueError(f"Invalid number of components in '{string}'. Expected exactly three values!") from error + raise ValueError(f"Invalid integer values in '{string}'. All components must be integers!") from error return cls(r, g, b)
258-280
: Consider refactoring Bed12 validation for better maintainability.The validation logic in
__post_init__
is comprehensive but could be split into smaller, focused methods for better maintainability and readability.Consider this refactoring:
def __post_init__(self) -> None: """Validate this BED12 record.""" super().__post_init__() + self._validate_thick_bounds() + self._validate_blocks() + + def _validate_thick_bounds(self) -> None: if (self.thick_start is None) != (self.thick_end is None): raise ValueError("thick_start and thick_end must both be None or both be set!") + + def _validate_blocks(self) -> None: if self.block_count is None: if self.block_sizes is not None or self.block_starts is not None: raise ValueError("block_count, block_sizes, block_starts must all be set or unset!") else: if self.block_sizes is None or self.block_starts is None: raise ValueError("block_count, block_sizes, block_starts must all be set or unset!") if self.block_count <= 0: raise ValueError("When set, block_count must be greater than or equal to 1!") if self.block_count != len(self.block_sizes) or self.block_count != len( self.block_starts ): raise ValueError("Length of block_sizes and block_starts must equal block_count!") if self.block_starts[0] != 0: raise ValueError("block_starts must start with 0!") if any(size <= 0 for size in self.block_sizes): raise ValueError("All sizes in block_size must be greater than or equal to one!") if (self.start + self.block_starts[-1] + self.block_sizes[-1]) != self.end: raise ValueError("The last defined block's end must be equal to the BED end!")bedspec/overlap/_overlap.py (2)
16-27
: Consider raisingNotImplementedError
in property methodsIn the
Span
andReferenceSpan
protocols, the property methodsstart
,end
, andrefname
returnNotImplemented
. It is more idiomatic in Python to raiseNotImplementedError
to clearly indicate that these properties need to be implemented by concrete subclasses.Apply this diff:
class Span(Hashable, Protocol): """A span with a start and an end. 0-based open-ended.""" @property def start(self) -> int: """A 0-based start position.""" - return NotImplemented + raise NotImplementedError @property def end(self) -> int: """A 0-based open-ended position.""" - return NotImplemented + raise NotImplementedError class ReferenceSpan(Span, Protocol): """A feature on a reference sequence.""" @property def refname(self) -> str: """A reference sequence name.""" - return NotImplemented + raise NotImplementedErrorAlso applies to: 31-37
40-44
: Evaluate the necessity of separateTypeVar
sBoth
ReferenceSpanType
andGenericReferenceSpanType
areTypeVar
s bounded toReferenceSpan
. Consider whether both are necessary or if they can be consolidated to simplify the type annotations, enhancing code readability and maintainability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
CONTRIBUTING.md
(1 hunks)bedspec/_bedspec.py
(4 hunks)bedspec/_io.py
(4 hunks)bedspec/overlap/__init__.py
(1 hunks)bedspec/overlap/_overlap.py
(1 hunks)tests/test_overlap.py
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- bedspec/overlap/init.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_overlap.py
🧰 Additional context used
🪛 Markdownlint
CONTRIBUTING.md
61-61: Expected: h3; Actual: h6
Heading levels should only increment by one level at a time
(MD001, heading-increment)
🔇 Additional comments (8)
CONTRIBUTING.md (1)
43-60
: Well-documented future compatibility note!
The note about Poetry v2.0.0 changes is excellent. It provides clear documentation about future changes and includes the exact TOML configuration that will be needed.
bedspec/_io.py (4)
30-34
: Well-defined type definitions!
The new type definitions BedType
and JsonType
are clear, well-documented, and appropriately constrained. The BedType
TypeVar with bound=BedLike
ensures type safety.
141-156
: LGTM! Well-implemented serialization methods
The _bed_to_dict
and _encode
methods are well-implemented:
- Uses msgspec's to_builtins with deterministic order for consistent output
- Handles special cases (None, collections, BedColor) appropriately
239-248
: LGTM! Efficient record iteration implementation
The __iter__
implementation is well-designed:
- Uses msgspec's convert for efficient record conversion
- Properly yields converted records
- Automatically closes the reader when done
263-278
: Reuse existing review for JSON construction
This code segment was previously reviewed and the suggestion to avoid manual JSON string construction is still valid.
bedspec/_bedspec.py (2)
20-24
: LGTM! Well-structured protocol for dataclass validation.
The DataclassInstance
protocol provides a clean way to enforce dataclass requirements at runtime.
27-39
: LGTM! Well-implemented strand enum with useful helper method.
The BedStrand
enum provides type safety for strand representation and includes a helpful opposite()
method for strand manipulation.
bedspec/overlap/_overlap.py (1)
50-113
: Well-structured use of generics and protocols
The implementation of the OverlapDetector
using generics and protocols enhances type safety and flexibility. The refactoring improves the clarity of the code and makes it more maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
CONTRIBUTING.md (1)
3-39
: LGTM! Consider adding expected output examples.The Primary Development Commands section is well-organized and covers all essential development tasks. The commands are clearly presented with appropriate descriptions.
Consider adding example output or success criteria for each command to help developers verify correct execution.
bedspec/overlap/_overlap.py (2)
40-44
: Enhance type variable documentation.The docstrings could be more descriptive to explain:
- The purpose and use case for each type variable
- The distinction between
QueryReferenceSpanType
andGenericReferenceSpanType
- When to use each type in the context of overlap detection
Consider updating the docstrings:
QueryReferenceSpanType = TypeVar("QueryReferenceSpanType", bound=ReferenceSpan) -"""A 0-based end-exclusive feature on a reference sequence.""" +"""Type variable for features being queried against the overlap detector.""" GenericReferenceSpanType = TypeVar("GenericReferenceSpanType", bound=ReferenceSpan) -"""A generic 0-based end-exclusive feature on a reference sequence.""" +"""Type variable for features stored within the overlap detector that can be matched against queries."""
75-85
: Enhance error handling in the add method.The error handling could be more comprehensive:
- Consider validating start/end positions
- Add type validation for start/end being integers
- Verify that end > start
Consider adding these validations:
def add(self, feature: GenericReferenceSpanType) -> None: """Add a feature to this overlap detector.""" if not isinstance(feature, Hashable): raise ValueError(f"Genomic feature is not hashable but should be: {feature}") + + if not isinstance(feature.start, int) or not isinstance(feature.end, int): + raise ValueError(f"Start and end positions must be integers: {feature}") + + if feature.end <= feature.start: + raise ValueError(f"End position must be greater than start position: {feature}") refname: Refname = feature.refname feature_idx: int = len(self._refname_to_features[refname])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
CONTRIBUTING.md
(1 hunks)bedspec/overlap/__init__.py
(1 hunks)bedspec/overlap/_overlap.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- bedspec/overlap/init.py
🔇 Additional comments (4)
CONTRIBUTING.md (1)
61-93
: LGTM! Heading levels have been corrected.
The command sections are well-organized with proper heading hierarchy (h3) and consistent formatting. The grouping of commands (checks, fixes, combined operations) provides a clear workflow structure.
The heading levels issue raised in the previous review has been addressed.
bedspec/overlap/_overlap.py (3)
16-37
: Well-structured protocol definitions!
The Span
and ReferenceSpan
protocols are well-designed with:
- Clear separation of concerns between basic span properties and reference-specific properties
- Proper use of
@runtime_checkable
for runtime type verification - Good inheritance hierarchy
50-70
: Excellent implementation of the overlap detector!
The class demonstrates good design choices:
- Efficient use of cgranges for overlap detection
- Smart lazy indexing strategy per reference sequence
- Clear separation of concerns in data structures
92-116
: Well-implemented overlap detection methods!
The methods demonstrate:
- Efficient use of generators
- Clear and correct overlap detection logic
- Proper handling of index updates
0731483
to
1ec01ae
Compare
1ec01ae
to
c9cc0c7
Compare
Summary by CodeRabbit
New Features
README.md
for using BED features, including new examples and updated naming conventions.Bug Fixes
Bed3
andBedPE
.Documentation
CONTRIBUTING.md
for clarity on development and testing instructions.README.md
andpyproject.toml
with new sections, dependencies, and configurations.Tests
Chores
pyproject.toml
and modified workflow configurations for improved testing processes.