-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: remove cgranges for superintervals #28
Conversation
WalkthroughThe pull request introduces several updates across multiple workflow files in the repository. Key changes include the removal of the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
bedspec/overlap/_overlap.py (2)
38-38
: Consider explicit typing for IntervalSetThe defaultdict initialization with IntervalSet requires type ignore. Consider creating a type alias for better type safety:
+IntervalTree: TypeAlias = IntervalSet -self._refname_to_tree: dict[Refname, IntervalSet] = defaultdict(IntervalSet) +self._refname_to_tree: dict[Refname, IntervalTree] = defaultdict(IntervalTree)
65-67
: Consider performance impact of find_overlapsThe switch to superintervals' find_overlaps method may have different performance characteristics. Consider benchmarking critical operations.
Consider adding performance benchmarks to compare the old and new implementations, especially for large datasets.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (11)
.github/workflows/publish_bedspec.yml
(0 hunks).github/workflows/tests.yml
(0 hunks).github/workflows/wheels.yml
(1 hunks).gitmodules
(0 hunks)README.md
(1 hunks)bedspec/_bedspec.py
(8 hunks)bedspec/overlap/_overlap.py
(4 hunks)build.py
(0 hunks)cgranges
(0 hunks)pyproject.toml
(4 hunks)tests/test_overlap.py
(5 hunks)
💤 Files with no reviewable changes (5)
- .github/workflows/tests.yml
- cgranges
- .gitmodules
- build.py
- .github/workflows/publish_bedspec.yml
🔇 Additional comments (10)
.github/workflows/wheels.yml (2)
10-11
: LGTM: Simplified wheel building configuration
The changes appropriately simplify the wheel building process by:
- Using a standard ubuntu runner
- Supporting multiple Python versions (3.10, 3.11, 3.12)
Also applies to: 14-14
25-25
: Verify wheel building command
The simplified pip wheel
command aligns with removing C extensions, but ensure all package files are included.
✅ Verification successful
Wheel building command is appropriate for this package
The simplified pip wheel
command is correct for this package because:
- The project uses Poetry as the build backend (poetry.core.masonry.api)
- All Python package files are properly included via the
packages
configuration in pyproject.toml - No C extensions or additional files need to be included (only Python modules and standard metadata)
- The package structure is straightforward with all modules under the
bedspec
directory - The
include
directive in pyproject.toml properly specifies additional files (CONTRIBUTING.md, LICENSE)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if all package files are included in the wheel
pip wheel --no-deps -w wheelhouse . --dry-run
Length of output: 141
Script:
#!/bin/bash
# Check package structure and files
fd -t f -e py . | grep -v "test"
# Check setup.py/pyproject.toml for package configuration
fd "setup.py|pyproject.toml" -t f
Length of output: 217
Script:
#!/bin/bash
# Check for any MANIFEST.in file that controls package contents
fd "MANIFEST.in" -t f
# Look for any build system configuration
cat pyproject.toml 2>/dev/null || echo "pyproject.toml not found"
cat setup.py 2>/dev/null || echo "setup.py not found"
Length of output: 5499
Script:
#!/bin/bash
# Check the workflow file for context
cat .github/workflows/wheels.yml
Length of output: 804
README.md (1)
110-110
: LGTM: Updated dataclass example with modern syntax
The example correctly demonstrates:
- Simplified dataclass usage without explicit eq parameter
- Modern type annotation using union syntax (
float | None
)
bedspec/overlap/_overlap.py (2)
9-11
: Verify type checking suppressions
The type ignore comments suggest potential type checking issues with superintervals. Consider:
- Opening an issue with superintervals for type stubs
- Creating local type stubs for critical functionality
✅ Verification successful
Type checking suppressions are appropriate for now
The type ignore comments are currently necessary as:
- No type stubs (.pyi files) exist for superintervals in the package or project
- This is the only file importing superintervals, suggesting a localized type checking concern
- The suppressions are specific and well-scoped to the import statement
Consider opening an issue with superintervals for type stubs, but the current approach is a reasonable temporary solution.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if superintervals has type stubs in newer versions
pip index versions superintervals | head -n 5
Length of output: 141
Script:
#!/bin/bash
# Check if superintervals package has py.typed marker or type stub files
fd -e pyi -e typed superintervals || echo "No type stubs found"
# Check if there are any other files importing superintervals and how they handle types
rg "from superintervals import" -B 2 -A 2
# Look for any existing type stub files in the project
fd -e pyi . || echo "No type stub files in project"
Length of output: 430
52-52
: Verify interval boundary handling
The change from cgranges to superintervals includes a subtle modification in end position handling (feature.end - 1
). Ensure this maintains the same semantics as the previous implementation.
Also applies to: 55-55
✅ Verification successful
I'll search for test cases and implementation details related to interval overlaps and boundary handling to verify the semantics.
Interval boundary handling is correctly implemented
The verification shows that the end position handling (feature.end - 1
) in superintervals is consistent with the test cases. The test suite includes comprehensive boundary checks that verify:
- Non-overlapping intervals (
[0,1]
with[2,5]
) - Adjacent intervals (
[2,5]
with[5,6]
) - Overlapping intervals (
[2,5]
with[4,10]
)
All these test cases pass with the current implementation, confirming that the boundary semantics are maintained correctly.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for all interval boundary tests
rg -l 'test.*overlap.*boundary' tests/
Length of output: 40
Script:
#!/bin/bash
# Search for tests related to overlapping intervals and boundary cases
rg -A 5 "test.*overlap" tests/
# Search for interval-related test assertions
rg -A 2 "assert.*overlap" tests/
# Look for interval boundary handling in the implementation
rg -A 3 "feature.end" bedspec/
Length of output: 6946
pyproject.toml (2)
30-30
: LGTM: Package and dependency updates align with architectural changes.
The removal of cgranges package and addition of superintervals as a dependency is consistent with the architectural shift in interval management.
Also applies to: 34-34
47-47
: Verify build system compatibility.
The simplified build system requirements might affect the package installation process.
✅ Verification successful
Poetry-core is sufficient for this package's build requirements
Based on the analysis of the project files:
- The package is a pure Python package with no compiled extensions
- Dependencies are standard Python packages without compilation requirements
- No setup.py or setup.cfg files are present
- No build instructions indicating special compilation needs in README.md
- Project uses standard Poetry configuration with typical Python development tools
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that poetry-core can handle the package build without additional requirements
# Test: Check if there are any Cython or compiled extensions remaining
fd -e pyx -e pxd -e pyd -e so
Length of output: 31
Script:
#!/bin/bash
# Let's check for any build-related files and dependencies that might require special handling
fd -t f "setup\.py|setup\.cfg"
# Check for any native/compiled dependencies in requirements
cat pyproject.toml
# Look for any build instructions or compilation steps in README
fd -t f "README\.md" --exec cat {}
Length of output: 8681
tests/test_overlap.py (2)
11-11
: LGTM: Improved test assertions using set comparisons.
The change from list to set comparisons makes the tests more robust by ensuring they pass regardless of the order of elements. This is particularly important when testing interval operations where the order might not be guaranteed or relevant.
Also applies to: 19-19, 29-29, 39-39, 47-47, 55-60, 68-73, 83-83, 85-90
110-123
: LGTM: Good test coverage for detector state management.
The new test function effectively verifies that the OverlapDetector maintains correct state and behavior when adding intervals both before and after performing queries.
bedspec/_bedspec.py (1)
184-184
: Verify implications of unsafe_hash=True.
The addition of unsafe_hash=True
to all BED classes allows them to be used in sets and as dictionary keys, which aligns with the test changes. However, this could have implications:
- Performance impact on hash computation
- Potential issues if mutable fields are modified after adding to sets/dicts
Consider documenting the implications of unsafe_hash=True
in the class docstrings, particularly regarding the behavior when objects are used as dictionary keys or set elements.
Also applies to: 212-212, 220-220, 229-229, 239-239, 250-250, 262-262, 304-304, 314-314
✅ Verification successful
The use of unsafe_hash=True is safe in this implementation
The codebase analysis reveals that all BED classes are implemented safely with respect to hashing:
- All classes use
slots=True
, preventing dynamic attribute addition - All fields are immutable types (str, int)
- No mutable default values or mutable fields are present
- Fields are properly defined with
field(kw_only=True)
where needed
The implementation ensures hash stability and the classes can be safely used as dictionary keys or in sets without the risks typically associated with unsafe_hash=True
.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for mutable field usage that could be affected by unsafe_hash
# Test: Look for mutable field definitions in BED classes
rg -A 1 "field\(.*default=" bedspec/
Length of output: 38
Script:
#!/bin/bash
# Let's check for field definitions in dataclasses more broadly
ast-grep --pattern 'class $_ {
$$$
field($$$)
$$$
}'
# Also check for any mutable attributes in these classes
rg "class.*:.*\n.*=.*\[" bedspec/
# And look at the actual dataclass definitions
rg "@dataclass" -A 5 bedspec/
Length of output: 5173
8c1c6d3
to
e233333
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (6)
bedspec/overlap/_overlap.py (1)
9-11
: Consider addressing type checking suppressionsThe type ignore comments suggest potential type checking issues with the superintervals package. Consider:
- Opening an issue with the superintervals package to add type stubs
- Creating local type stubs for the package
tests/test_overlap.py (2)
110-123
: Good addition of dynamic update testThe new test
test_we_can_add_to_the_overlap_detector_after_and_before_queries
is valuable as it verifies the detector's ability to handle dynamic updates correctly.Consider adding additional test cases for:
- Multiple additions between queries
- Updates to existing intervals
- Cross-reference overlaps
Line range hint
1-123
: Consider additional test coverageWhile the current test suite is comprehensive for functionality, consider adding:
- Error case tests (e.g., invalid intervals)
- Performance tests for large datasets
- Memory usage tests for large datasets
bedspec/_bedspec.py (3)
Line range hint
212-261
: LGTM! Consider documenting hash behaviorThe changes to make fields keyword-only and enable slots are good improvements. The inheritance hierarchy and protocol implementations are well-structured.
Consider adding a docstring note about the hash behavior, especially since these classes might be used in collections.
Example docstring addition for Bed3:
"""A BED3 record that describes a contiguous linear interval. Note: Instances are hashable and can be used in sets/dicts, but field values should not be modified after creation to maintain hash stability. """
Line range hint
262-303
: Consider performance optimization for block validationThe validation logic is thorough but could be optimized by:
- Using
all()
instead of multipleif any()
checks- Combining related validations to reduce iterations
def __post_init__(self) -> None: """Validate this BED12 record.""" super(Bed12, self).__post_init__() if (self.thick_start is None) != (self.thick_end is None): raise ValueError("thick_start and thick_end must both be None or both be set!") if self.block_count is None: if self.block_sizes is not None or self.block_starts is not None: raise ValueError("block_count, block_sizes, block_starts must all be set or unset!") else: if self.block_sizes is None or self.block_starts is None: raise ValueError("block_count, block_sizes, block_starts must all be set or unset!") if self.block_count <= 0: raise ValueError("When set, block_count must be greater than or equal to 1!") if self.block_count != len(self.block_sizes) or self.block_count != len( self.block_starts ): raise ValueError("Length of block_sizes and block_starts must equal block_count!") if self.block_starts[0] != 0: raise ValueError("block_starts must start with 0!") - if any(size <= 0 for size in self.block_sizes): + if not all(size > 0 for size in self.block_sizes): raise ValueError("All sizes in block_size must be greater than or equal to one!") if (self.start + self.block_starts[-1] + self.block_sizes[-1]) != self.end: raise ValueError("The last defined block's end must be equal to the BED end!")
Line range hint
304-359
: LGTM! Consider adding tests for from_bed6The new
from_bed6
class method is a good addition. The implementation correctly handles all fields and maintains type safety.Would you like me to help generate comprehensive test cases for the
from_bed6
method? The tests should cover:
- Basic conversion from two Bed6 instances
- Handling of optional fields (name, score)
- Preservation of strand information
- Edge cases with None values
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (11)
.github/workflows/publish_bedspec.yml
(0 hunks).github/workflows/tests.yml
(0 hunks).github/workflows/wheels.yml
(1 hunks).gitmodules
(0 hunks)README.md
(1 hunks)bedspec/_bedspec.py
(8 hunks)bedspec/overlap/_overlap.py
(5 hunks)build.py
(0 hunks)cgranges
(0 hunks)pyproject.toml
(4 hunks)tests/test_overlap.py
(5 hunks)
💤 Files with no reviewable changes (5)
- cgranges
- .gitmodules
- .github/workflows/tests.yml
- build.py
- .github/workflows/publish_bedspec.yml
🚧 Files skipped from review as they are similar to previous changes (3)
- README.md
- .github/workflows/wheels.yml
- pyproject.toml
🔇 Additional comments (5)
bedspec/overlap/_overlap.py (3)
68-70
: Verify overlapping behavior consistency
The change from overlap
to find_overlaps
needs verification to ensure consistent behavior with the previous implementation, particularly for:
- Partial overlaps
- Complete containment
- Edge matching
✅ Verification successful
Let's examine the test file to understand the overlap behavior verification.
Let's examine the implementation of the overlap detection to compare with the tests.
Based on the examination of both the tests and implementation, I can now provide the final response:
Overlap behavior is consistent and well-verified
The change from overlap
to find_overlaps
is safe because:
- The test suite comprehensively verifies all overlap scenarios:
- Partial overlaps: tested in
test_we_can_query_for_overlapping_features
with cases like(2,5)
and(4,10)
- Complete containment: verified in both
test_we_can_those_enclosing_intervals
andtest_we_can_those_enclosed_by_intervals
- Edge matching: tested with cases like
(4,5)
overlapping with(2,5)
and(4,10)
- Partial overlaps: tested in
- The implementation correctly handles zero-based coordinates and maintains the half-open interval semantics by using
end - 1
in the interval tree operations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for overlap-related tests
rg -l "test.*overlap.*" "tests/"
Length of output: 54
Script:
#!/bin/bash
# Check the content of the overlap tests
cat tests/test_overlap.py
Length of output: 5905
Script:
#!/bin/bash
# Check the implementation of overlap detection
cat bedspec/overlap/_overlap.py
Length of output: 4186
41-41
: Verify interval boundary handling
The change from cgranges to superintervals includes an adjustment to the end position (feature.end - 1
). This suggests different interval semantics between the libraries. Please verify:
- The boundary handling is consistent with the BED format specification
- Edge cases work correctly (e.g., zero-length intervals)
Also applies to: 58-58
✅ Verification successful
Let me analyze the test results and generate additional verification scripts to check the interval boundary handling implementation.
Based on the implementation and test cases, I can now provide the final response:
Interval boundary handling is correct and consistent
The boundary handling in the new implementation is correct and consistent with BED format specifications:
-
The
end - 1
adjustment in bothadd()
andoverlapping()
methods is necessary because:- BED format uses 0-based, half-open intervals [start, end)
- The superintervals library uses closed intervals [start, end]
- Tests verify correct behavior for various cases including:
- Zero-length intervals (start=end)
- Adjacent intervals
- Overlapping intervals
- Non-overlapping intervals
-
Edge cases are properly tested and handled:
- Tests verify non-overlapping intervals (
start=0, end=1
) - Tests verify single-base intervals
- Input validation ensures
start < end
- Tests verify non-overlapping intervals (
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for tests covering boundary cases
rg -A 5 "start.*end.*[01]|end.*start.*[01]" "tests/"
Length of output: 8964
Script:
#!/bin/bash
# Check the implementation of add() and overlapping() methods
ast-grep --pattern 'def add(self, feature: T) -> None:
$$$'
# Check interval tree implementation
ast-grep --pattern 'class OverlapDetector$$$:
$$$'
Length of output: 122
Script:
#!/bin/bash
# Check the implementation with correct file path
cat bedspec/overlap/_overlap.py
Length of output: 4186
22-23
: Verify type compatibility between IntervalSet and previous cgranges implementation
The type alias suggests a direct replacement of cgranges with IntervalSet. Please ensure that all operations previously performed with cgranges have equivalent behavior in IntervalSet.
✅ Verification successful
Type compatibility verified between IntervalSet and previous implementation
Based on the implementation and test files, the IntervalSet from superintervals
package provides all the necessary functionality previously expected from cgranges:
- Interval indexing and querying through
index()
andfind_overlaps()
- Support for storing and retrieving interval data
- Proper handling of overlapping intervals
The comprehensive test suite demonstrates that all required operations (overlapping, enclosing, enclosed_by) work correctly with the new implementation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining cgranges references or imports
rg -l "cgranges|cr\."
Length of output: 23
Script:
#!/bin/bash
# Search for IntervalSet usage patterns to understand its behavior
ast-grep --pattern 'IntervalSet($$$)'
# Search for any test files that might show usage patterns
fd -e py test overlap
# Look at the implementation details
rg -A 5 "class IntervalSet"
Length of output: 188
Script:
#!/bin/bash
# Search for IntervalSet usage and implementation with correct path
rg -l "IntervalSet"
# Look for test files in the correct directory structure
fd -e py test
# Search for any documentation about the type change
fd README.md CHANGELOG.md
Length of output: 277
Script:
#!/bin/bash
# Look at the test file content to understand IntervalSet usage and behavior
cat tests/test_overlap.py
# Look at the implementation file to understand IntervalSet
cat bedspec/overlap/_overlap.py
Length of output: 10092
tests/test_overlap.py (1)
11-11
: Improved test robustness with order-independent comparisons
The change from list to set comparisons is a good improvement as it makes the tests more robust against implementation-specific ordering changes.
Also applies to: 19-19, 29-29, 39-39, 47-47, 55-59, 68-73, 83-83, 85-90
bedspec/_bedspec.py (1)
Line range hint 184-211
: Consider the implications of unsafe_hash=True
While enabling unsafe_hash=True
allows instances to be hashable (useful for sets/dicts), it can be dangerous if the RGB values are modified after creation since the hash would become invalid. Consider either:
- Making the fields immutable using
frozen=True
, or - Documenting the expectation that RGB values shouldn't be modified after creation
✅ Verification successful
BedColor's hash safety concerns are minimal
The codebase analysis shows that BedColor
is used safely:
- It's only used as an optional field (
item_rgb
) in BED records - No mutable operations found on
BedColor
instances after creation - No usage in hash-dependent collections (sets/dicts)
- The class is primarily used for serialization/deserialization via
from_string
and__str__
The unsafe_hash=True
with slots=True
is actually a reasonable choice here as the class effectively behaves as immutable in practice, even though it's not enforced via frozen=True
.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the class is used in collections that rely on hash stability
rg -l "set\[[^]]*BedColor" || rg -l "dict\[[^]]*BedColor"
Length of output: 58
Script:
#!/bin/bash
# Let's check for any usage of BedColor in collections or as function parameters
rg "BedColor" -B 2 -A 2
# Also check if there are any mutable operations on BedColor instances
ast-grep --pattern 'self.$_ = $_' bedspec/_bedspec.py
Length of output: 5257
Summary by CodeRabbit
Release Notes
New Features
OverlapDetector
functionality with improved interval management.OverlapDetector
.Bug Fixes
Documentation
README.md
for clarity on class definitions and type hinting.Chores
pyproject.toml
.