Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: remove cgranges for superintervals #28

Merged
merged 1 commit into from
Dec 2, 2024
Merged

Conversation

clintval
Copy link
Owner

@clintval clintval commented Dec 2, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Updated workflows for building and publishing packages, including support for multiple Python versions.
    • Enhanced OverlapDetector functionality with improved interval management.
    • Added a new test case for verifying feature addition in the OverlapDetector.
  • Bug Fixes

    • Improved test assertions by switching from list to set comparisons for better robustness.
  • Documentation

    • Updated README.md for clarity on class definitions and type hinting.
  • Chores

    • Removed unused submodules and files to streamline project configuration.
    • Updated project dependencies and build system requirements in pyproject.toml.

Copy link

coderabbitai bot commented Dec 2, 2024

Walkthrough

The pull request introduces several updates across multiple workflow files in the repository. Key changes include the removal of the submodules option from various GitHub Actions workflows, updates to the dependency management with the introduction of superintervals, and the removal of the cgranges submodule and associated files. Additionally, modifications to class definitions in the bedspec package enhance data handling and equality behavior. The test suite has been updated to reflect these changes, ensuring that assertions are robust against order variations.

Changes

File Change Summary
.github/workflows/publish_bedspec.yml Updated environment variable POETRY_VERSION to 1.6, removed submodules option from checkout step, added conditional execution for unit-tests, included Poetry dependency installation in build-sdist, updated artifact handling in publish-to-pypi, and modified make-changelog and make-github-release jobs.
.github/workflows/tests.yml Removed with clause for submodules in checkout step, maintaining the overall workflow structure for testing.
.github/workflows/wheels.yml Simplified job name, standardized runs-on to ubuntu-latest, updated strategy matrix for multiple Python versions, removed cibuildwheel, changed wheel building command to pip wheel, and updated artifact naming conventions.
.gitmodules Removed configuration for the cgranges submodule.
README.md Updated Bed3Plus1 class definition by removing eq=True from @dataclass and changing my_custom_field type to use union type syntax.
bedspec/_bedspec.py Modified several classes to remove eq=True and add unsafe_hash=True in @dataclass, updated BedPE class to include properties and a new from_bed6 method.
bedspec/overlap/_overlap.py Changed interval management from cgranges to superintervals, updating method calls and variable names accordingly.
build.py Deleted file responsible for building Cython extensions for cgranges.
cgranges Deleted file containing subproject commit reference for cgranges.
pyproject.toml Updated version to 0.5.0, added superintervals dependency, removed cgranges package, simplified build system requirements, and updated configuration for mypy and ruff.
tests/test_overlap.py Changed assertions from list to set comparisons in multiple test functions, added new test function for adding features to OverlapDetector before and after queries.

Possibly related PRs

  • feat: simplify how the BedReader and BedWriter work #21: The changes in .github/workflows/tests.yml regarding the removal of the submodules option in the actions/checkout@v4 step are related to the main PR, which also removed this option in the same workflow file.
  • feat: use typeline for IO #24: The modifications in .github/workflows/publish_bedspec.yml that involve changes to job dependencies and structure are relevant as they reflect similar workflow adjustments made in the main PR.
  • docs: fixup docs a bit #25: The updates to the BedReader class in bedspec/_reader.py regarding type handling and decoding logic are related to the main PR, which also involves changes to the workflow and handling of dependencies.
  • feat: update the typeline library #27: The version bump in pyproject.toml from 0.3.0 to 0.4.0 is relevant as it indicates a progression in the project that aligns with the updates made in the main PR.

🐇 In the meadow where changes bloom,
New workflows dance, dispelling gloom.
With superintervals, we leap and bound,
As cgranges fades, new paths are found.
Our tests now sing in set's embrace,
A brighter future, a faster pace! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
bedspec/overlap/_overlap.py (2)

38-38: Consider explicit typing for IntervalSet

The defaultdict initialization with IntervalSet requires type ignore. Consider creating a type alias for better type safety:

+IntervalTree: TypeAlias = IntervalSet
-self._refname_to_tree: dict[Refname, IntervalSet] = defaultdict(IntervalSet)
+self._refname_to_tree: dict[Refname, IntervalTree] = defaultdict(IntervalTree)

65-67: Consider performance impact of find_overlaps

The switch to superintervals' find_overlaps method may have different performance characteristics. Consider benchmarking critical operations.

Consider adding performance benchmarks to compare the old and new implementations, especially for large datasets.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 352a3e8 and 8c1c6d3.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • .github/workflows/publish_bedspec.yml (0 hunks)
  • .github/workflows/tests.yml (0 hunks)
  • .github/workflows/wheels.yml (1 hunks)
  • .gitmodules (0 hunks)
  • README.md (1 hunks)
  • bedspec/_bedspec.py (8 hunks)
  • bedspec/overlap/_overlap.py (4 hunks)
  • build.py (0 hunks)
  • cgranges (0 hunks)
  • pyproject.toml (4 hunks)
  • tests/test_overlap.py (5 hunks)
💤 Files with no reviewable changes (5)
  • .github/workflows/tests.yml
  • cgranges
  • .gitmodules
  • build.py
  • .github/workflows/publish_bedspec.yml
🔇 Additional comments (10)
.github/workflows/wheels.yml (2)

10-11: LGTM: Simplified wheel building configuration

The changes appropriately simplify the wheel building process by:

  • Using a standard ubuntu runner
  • Supporting multiple Python versions (3.10, 3.11, 3.12)

Also applies to: 14-14


25-25: Verify wheel building command

The simplified pip wheel command aligns with removing C extensions, but ensure all package files are included.

✅ Verification successful

Wheel building command is appropriate for this package

The simplified pip wheel command is correct for this package because:

  • The project uses Poetry as the build backend (poetry.core.masonry.api)
  • All Python package files are properly included via the packages configuration in pyproject.toml
  • No C extensions or additional files need to be included (only Python modules and standard metadata)
  • The package structure is straightforward with all modules under the bedspec directory
  • The include directive in pyproject.toml properly specifies additional files (CONTRIBUTING.md, LICENSE)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if all package files are included in the wheel
pip wheel --no-deps -w wheelhouse . --dry-run

Length of output: 141


Script:

#!/bin/bash
# Check package structure and files
fd -t f -e py . | grep -v "test"

# Check setup.py/pyproject.toml for package configuration
fd "setup.py|pyproject.toml" -t f

Length of output: 217


Script:

#!/bin/bash
# Check for any MANIFEST.in file that controls package contents
fd "MANIFEST.in" -t f

# Look for any build system configuration
cat pyproject.toml 2>/dev/null || echo "pyproject.toml not found"
cat setup.py 2>/dev/null || echo "setup.py not found"

Length of output: 5499


Script:

#!/bin/bash
# Check the workflow file for context
cat .github/workflows/wheels.yml

Length of output: 804

README.md (1)

110-110: LGTM: Updated dataclass example with modern syntax

The example correctly demonstrates:

  • Simplified dataclass usage without explicit eq parameter
  • Modern type annotation using union syntax (float | None)
bedspec/overlap/_overlap.py (2)

9-11: Verify type checking suppressions

The type ignore comments suggest potential type checking issues with superintervals. Consider:

  1. Opening an issue with superintervals for type stubs
  2. Creating local type stubs for critical functionality
✅ Verification successful

Type checking suppressions are appropriate for now

The type ignore comments are currently necessary as:

  • No type stubs (.pyi files) exist for superintervals in the package or project
  • This is the only file importing superintervals, suggesting a localized type checking concern
  • The suppressions are specific and well-scoped to the import statement

Consider opening an issue with superintervals for type stubs, but the current approach is a reasonable temporary solution.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if superintervals has type stubs in newer versions
pip index versions superintervals | head -n 5

Length of output: 141


Script:

#!/bin/bash
# Check if superintervals package has py.typed marker or type stub files
fd -e pyi -e typed superintervals || echo "No type stubs found"

# Check if there are any other files importing superintervals and how they handle types
rg "from superintervals import" -B 2 -A 2

# Look for any existing type stub files in the project
fd -e pyi . || echo "No type stub files in project"

Length of output: 430


52-52: Verify interval boundary handling

The change from cgranges to superintervals includes a subtle modification in end position handling (feature.end - 1). Ensure this maintains the same semantics as the previous implementation.

Also applies to: 55-55

✅ Verification successful

I'll search for test cases and implementation details related to interval overlaps and boundary handling to verify the semantics.


Interval boundary handling is correctly implemented

The verification shows that the end position handling (feature.end - 1) in superintervals is consistent with the test cases. The test suite includes comprehensive boundary checks that verify:

  • Non-overlapping intervals ([0,1] with [2,5])
  • Adjacent intervals ([2,5] with [5,6])
  • Overlapping intervals ([2,5] with [4,10])

All these test cases pass with the current implementation, confirming that the boundary semantics are maintained correctly.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all interval boundary tests
rg -l 'test.*overlap.*boundary' tests/

Length of output: 40


Script:

#!/bin/bash
# Search for tests related to overlapping intervals and boundary cases
rg -A 5 "test.*overlap" tests/

# Search for interval-related test assertions
rg -A 2 "assert.*overlap" tests/

# Look for interval boundary handling in the implementation
rg -A 3 "feature.end" bedspec/

Length of output: 6946

pyproject.toml (2)

30-30: LGTM: Package and dependency updates align with architectural changes.

The removal of cgranges package and addition of superintervals as a dependency is consistent with the architectural shift in interval management.

Also applies to: 34-34


47-47: Verify build system compatibility.

The simplified build system requirements might affect the package installation process.

✅ Verification successful

Poetry-core is sufficient for this package's build requirements

Based on the analysis of the project files:

  • The package is a pure Python package with no compiled extensions
  • Dependencies are standard Python packages without compilation requirements
  • No setup.py or setup.cfg files are present
  • No build instructions indicating special compilation needs in README.md
  • Project uses standard Poetry configuration with typical Python development tools
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that poetry-core can handle the package build without additional requirements

# Test: Check if there are any Cython or compiled extensions remaining
fd -e pyx -e pxd -e pyd -e so

Length of output: 31


Script:

#!/bin/bash
# Let's check for any build-related files and dependencies that might require special handling
fd -t f "setup\.py|setup\.cfg"

# Check for any native/compiled dependencies in requirements
cat pyproject.toml

# Look for any build instructions or compilation steps in README
fd -t f "README\.md" --exec cat {}

Length of output: 8681

tests/test_overlap.py (2)

11-11: LGTM: Improved test assertions using set comparisons.

The change from list to set comparisons makes the tests more robust by ensuring they pass regardless of the order of elements. This is particularly important when testing interval operations where the order might not be guaranteed or relevant.

Also applies to: 19-19, 29-29, 39-39, 47-47, 55-60, 68-73, 83-83, 85-90


110-123: LGTM: Good test coverage for detector state management.

The new test function effectively verifies that the OverlapDetector maintains correct state and behavior when adding intervals both before and after performing queries.

bedspec/_bedspec.py (1)

184-184: Verify implications of unsafe_hash=True.

The addition of unsafe_hash=True to all BED classes allows them to be used in sets and as dictionary keys, which aligns with the test changes. However, this could have implications:

  1. Performance impact on hash computation
  2. Potential issues if mutable fields are modified after adding to sets/dicts

Consider documenting the implications of unsafe_hash=True in the class docstrings, particularly regarding the behavior when objects are used as dictionary keys or set elements.

Also applies to: 212-212, 220-220, 229-229, 239-239, 250-250, 262-262, 304-304, 314-314

✅ Verification successful

The use of unsafe_hash=True is safe in this implementation

The codebase analysis reveals that all BED classes are implemented safely with respect to hashing:

  1. All classes use slots=True, preventing dynamic attribute addition
  2. All fields are immutable types (str, int)
  3. No mutable default values or mutable fields are present
  4. Fields are properly defined with field(kw_only=True) where needed

The implementation ensures hash stability and the classes can be safely used as dictionary keys or in sets without the risks typically associated with unsafe_hash=True.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for mutable field usage that could be affected by unsafe_hash

# Test: Look for mutable field definitions in BED classes
rg -A 1 "field\(.*default=" bedspec/

Length of output: 38


Script:

#!/bin/bash
# Let's check for field definitions in dataclasses more broadly
ast-grep --pattern 'class $_ {
  $$$
  field($$$)
  $$$
}'

# Also check for any mutable attributes in these classes
rg "class.*:.*\n.*=.*\[" bedspec/

# And look at the actual dataclass definitions
rg "@dataclass" -A 5 bedspec/

Length of output: 5173

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (6)
bedspec/overlap/_overlap.py (1)

9-11: Consider addressing type checking suppressions

The type ignore comments suggest potential type checking issues with the superintervals package. Consider:

  1. Opening an issue with the superintervals package to add type stubs
  2. Creating local type stubs for the package
tests/test_overlap.py (2)

110-123: Good addition of dynamic update test

The new test test_we_can_add_to_the_overlap_detector_after_and_before_queries is valuable as it verifies the detector's ability to handle dynamic updates correctly.

Consider adding additional test cases for:

  1. Multiple additions between queries
  2. Updates to existing intervals
  3. Cross-reference overlaps

Line range hint 1-123: Consider additional test coverage

While the current test suite is comprehensive for functionality, consider adding:

  1. Error case tests (e.g., invalid intervals)
  2. Performance tests for large datasets
  3. Memory usage tests for large datasets
bedspec/_bedspec.py (3)

Line range hint 212-261: LGTM! Consider documenting hash behavior

The changes to make fields keyword-only and enable slots are good improvements. The inheritance hierarchy and protocol implementations are well-structured.

Consider adding a docstring note about the hash behavior, especially since these classes might be used in collections.

Example docstring addition for Bed3:

"""A BED3 record that describes a contiguous linear interval.

Note: Instances are hashable and can be used in sets/dicts, but field values
should not be modified after creation to maintain hash stability.
"""

Line range hint 262-303: Consider performance optimization for block validation

The validation logic is thorough but could be optimized by:

  1. Using all() instead of multiple if any() checks
  2. Combining related validations to reduce iterations
     def __post_init__(self) -> None:
         """Validate this BED12 record."""
         super(Bed12, self).__post_init__()
         if (self.thick_start is None) != (self.thick_end is None):
             raise ValueError("thick_start and thick_end must both be None or both be set!")
         if self.block_count is None:
             if self.block_sizes is not None or self.block_starts is not None:
                 raise ValueError("block_count, block_sizes, block_starts must all be set or unset!")
         else:
             if self.block_sizes is None or self.block_starts is None:
                 raise ValueError("block_count, block_sizes, block_starts must all be set or unset!")
             if self.block_count <= 0:
                 raise ValueError("When set, block_count must be greater than or equal to 1!")
             if self.block_count != len(self.block_sizes) or self.block_count != len(
                 self.block_starts
             ):
                 raise ValueError("Length of block_sizes and block_starts must equal block_count!")
             if self.block_starts[0] != 0:
                 raise ValueError("block_starts must start with 0!")
-            if any(size <= 0 for size in self.block_sizes):
+            if not all(size > 0 for size in self.block_sizes):
                 raise ValueError("All sizes in block_size must be greater than or equal to one!")
             if (self.start + self.block_starts[-1] + self.block_sizes[-1]) != self.end:
                 raise ValueError("The last defined block's end must be equal to the BED end!")

Line range hint 304-359: LGTM! Consider adding tests for from_bed6

The new from_bed6 class method is a good addition. The implementation correctly handles all fields and maintains type safety.

Would you like me to help generate comprehensive test cases for the from_bed6 method? The tests should cover:

  1. Basic conversion from two Bed6 instances
  2. Handling of optional fields (name, score)
  3. Preservation of strand information
  4. Edge cases with None values
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8c1c6d3 and e233333.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • .github/workflows/publish_bedspec.yml (0 hunks)
  • .github/workflows/tests.yml (0 hunks)
  • .github/workflows/wheels.yml (1 hunks)
  • .gitmodules (0 hunks)
  • README.md (1 hunks)
  • bedspec/_bedspec.py (8 hunks)
  • bedspec/overlap/_overlap.py (5 hunks)
  • build.py (0 hunks)
  • cgranges (0 hunks)
  • pyproject.toml (4 hunks)
  • tests/test_overlap.py (5 hunks)
💤 Files with no reviewable changes (5)
  • cgranges
  • .gitmodules
  • .github/workflows/tests.yml
  • build.py
  • .github/workflows/publish_bedspec.yml
🚧 Files skipped from review as they are similar to previous changes (3)
  • README.md
  • .github/workflows/wheels.yml
  • pyproject.toml
🔇 Additional comments (5)
bedspec/overlap/_overlap.py (3)

68-70: Verify overlapping behavior consistency

The change from overlap to find_overlaps needs verification to ensure consistent behavior with the previous implementation, particularly for:

  1. Partial overlaps
  2. Complete containment
  3. Edge matching
✅ Verification successful

Let's examine the test file to understand the overlap behavior verification.


Let's examine the implementation of the overlap detection to compare with the tests.


Based on the examination of both the tests and implementation, I can now provide the final response:

Overlap behavior is consistent and well-verified

The change from overlap to find_overlaps is safe because:

  • The test suite comprehensively verifies all overlap scenarios:
    • Partial overlaps: tested in test_we_can_query_for_overlapping_features with cases like (2,5) and (4,10)
    • Complete containment: verified in both test_we_can_those_enclosing_intervals and test_we_can_those_enclosed_by_intervals
    • Edge matching: tested with cases like (4,5) overlapping with (2,5) and (4,10)
  • The implementation correctly handles zero-based coordinates and maintains the half-open interval semantics by using end - 1 in the interval tree operations
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for overlap-related tests
rg -l "test.*overlap.*" "tests/"

Length of output: 54


Script:

#!/bin/bash
# Check the content of the overlap tests
cat tests/test_overlap.py

Length of output: 5905


Script:

#!/bin/bash
# Check the implementation of overlap detection
cat bedspec/overlap/_overlap.py

Length of output: 4186


41-41: Verify interval boundary handling

The change from cgranges to superintervals includes an adjustment to the end position (feature.end - 1). This suggests different interval semantics between the libraries. Please verify:

  1. The boundary handling is consistent with the BED format specification
  2. Edge cases work correctly (e.g., zero-length intervals)

Also applies to: 58-58

✅ Verification successful

Let me analyze the test results and generate additional verification scripts to check the interval boundary handling implementation.


Based on the implementation and test cases, I can now provide the final response:

Interval boundary handling is correct and consistent

The boundary handling in the new implementation is correct and consistent with BED format specifications:

  1. The end - 1 adjustment in both add() and overlapping() methods is necessary because:

    • BED format uses 0-based, half-open intervals [start, end)
    • The superintervals library uses closed intervals [start, end]
    • Tests verify correct behavior for various cases including:
      • Zero-length intervals (start=end)
      • Adjacent intervals
      • Overlapping intervals
      • Non-overlapping intervals
  2. Edge cases are properly tested and handled:

    • Tests verify non-overlapping intervals (start=0, end=1)
    • Tests verify single-base intervals
    • Input validation ensures start < end
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for tests covering boundary cases
rg -A 5 "start.*end.*[01]|end.*start.*[01]" "tests/"

Length of output: 8964


Script:

#!/bin/bash
# Check the implementation of add() and overlapping() methods
ast-grep --pattern 'def add(self, feature: T) -> None:
    $$$'

# Check interval tree implementation
ast-grep --pattern 'class OverlapDetector$$$:
    $$$'

Length of output: 122


Script:

#!/bin/bash
# Check the implementation with correct file path
cat bedspec/overlap/_overlap.py

Length of output: 4186


22-23: Verify type compatibility between IntervalSet and previous cgranges implementation

The type alias suggests a direct replacement of cgranges with IntervalSet. Please ensure that all operations previously performed with cgranges have equivalent behavior in IntervalSet.

✅ Verification successful

Type compatibility verified between IntervalSet and previous implementation

Based on the implementation and test files, the IntervalSet from superintervals package provides all the necessary functionality previously expected from cgranges:

  • Interval indexing and querying through index() and find_overlaps()
  • Support for storing and retrieving interval data
  • Proper handling of overlapping intervals

The comprehensive test suite demonstrates that all required operations (overlapping, enclosing, enclosed_by) work correctly with the new implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining cgranges references or imports
rg -l "cgranges|cr\." 

Length of output: 23


Script:

#!/bin/bash
# Search for IntervalSet usage patterns to understand its behavior
ast-grep --pattern 'IntervalSet($$$)'

# Search for any test files that might show usage patterns
fd -e py test overlap

# Look at the implementation details
rg -A 5 "class IntervalSet"

Length of output: 188


Script:

#!/bin/bash
# Search for IntervalSet usage and implementation with correct path
rg -l "IntervalSet"

# Look for test files in the correct directory structure
fd -e py test

# Search for any documentation about the type change
fd README.md CHANGELOG.md

Length of output: 277


Script:

#!/bin/bash
# Look at the test file content to understand IntervalSet usage and behavior
cat tests/test_overlap.py

# Look at the implementation file to understand IntervalSet
cat bedspec/overlap/_overlap.py

Length of output: 10092

tests/test_overlap.py (1)

11-11: Improved test robustness with order-independent comparisons

The change from list to set comparisons is a good improvement as it makes the tests more robust against implementation-specific ordering changes.

Also applies to: 19-19, 29-29, 39-39, 47-47, 55-59, 68-73, 83-83, 85-90

bedspec/_bedspec.py (1)

Line range hint 184-211: Consider the implications of unsafe_hash=True

While enabling unsafe_hash=True allows instances to be hashable (useful for sets/dicts), it can be dangerous if the RGB values are modified after creation since the hash would become invalid. Consider either:

  1. Making the fields immutable using frozen=True, or
  2. Documenting the expectation that RGB values shouldn't be modified after creation
✅ Verification successful

BedColor's hash safety concerns are minimal

The codebase analysis shows that BedColor is used safely:

  • It's only used as an optional field (item_rgb) in BED records
  • No mutable operations found on BedColor instances after creation
  • No usage in hash-dependent collections (sets/dicts)
  • The class is primarily used for serialization/deserialization via from_string and __str__

The unsafe_hash=True with slots=True is actually a reasonable choice here as the class effectively behaves as immutable in practice, even though it's not enforced via frozen=True.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the class is used in collections that rely on hash stability
rg -l "set\[[^]]*BedColor" || rg -l "dict\[[^]]*BedColor"

Length of output: 58


Script:

#!/bin/bash
# Let's check for any usage of BedColor in collections or as function parameters
rg "BedColor" -B 2 -A 2

# Also check if there are any mutable operations on BedColor instances
ast-grep --pattern 'self.$_ = $_' bedspec/_bedspec.py

Length of output: 5257

@clintval clintval merged commit 40110e5 into main Dec 2, 2024
7 checks passed
@clintval clintval deleted the cv_remove_cgranges branch December 2, 2024 22:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant