-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: use typeline for IO #24
Conversation
@coderabbitai full review please! |
@clintval: I will perform a full review of the pull request. ✅ Actions performedFull review triggered. |
Warning Rate limit exceeded@clintval has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 10 minutes and 23 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThis pull request introduces several changes across multiple files in the repository. A new code owner, Changes
Possibly related PRs
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (28)
.github/CODEOWNERS (1)
1-1
: Consider adding a team or additional maintainers for redundancy.The CODEOWNERS syntax is correct, but having a single code owner (@clintval) for all files (
*
) could create:
- A review bottleneck when the owner is unavailable
- A single point of failure (bus factor of 1)
Consider:
- Adding a team (e.g.,
* @org/bedspec-maintainers
)- Or including additional maintainers as backup reviewers
bedspec/__init__.py (1)
22-43
: Consider grouping related exports in all.While the
__all__
list is comprehensive, consider grouping related items together (e.g., all Bed types, interfaces, IO classes) and adding section comments for better organization.Here's a suggested reorganization:
__all__ = [ + # Core Bed implementations "Bed2", "Bed3", "Bed4", "Bed5", "Bed6", "Bed12", - "BedColor", - "BedGraph", - "BedLike", "BedPE", - "BedStrand", - "BedType", - "Named", "PairBed", "PointBed", - "ReferenceSpan", "SimpleBed", + # Protocols and Types + "BedLike", + "BedType", + "Named", + "ReferenceSpan", "Stranded", + # Enums and Constants + "BedColor", + "BedStrand", + # Special formats + "BedGraph", + # I/O Classes "BedReader", "BedWriter", ]bedspec/_writer.py (3)
11-12
: Enhance class docstring with more details.The current docstring is minimal. Consider expanding it to include:
- Detailed description of the class's purpose
- Usage examples
- Any configuration options or parameters
Example enhancement:
- """A writer of BED records.""" + """A writer for BED (Browser Extensible Data) format records. + + This class provides functionality to write BED records to a file, supporting + various BED formats and custom encodings for different field types. + + Example: + >>> with BedWriter(path) as writer: + ... writer.write(bed_record) + ... writer.write_comment("Track name=example") + """
14-23
: Consider improving type hints and addressing the type checker warning.The implementation is correct, but there are two potential improvements:
- The type hint for collections could be more specific
- The pyright ignore comment suggests an underlying type issue that could be addressed
Consider this enhancement:
- def _encode(self, item: Any) -> Any: + def _encode(self, item: None | BedColor | list[Any] | set[Any] | tuple[Any, ...] | Any) -> str | Any: """A callback for over_readriding the encoding of builtin types and custom types.""" if item is None: return "." if isinstance(item, (list, set, tuple)): - return ",".join(map(str, item)) # pyright: ignore[reportUnknownArgumentType] + return ",".join(str(x) for x in item) if isinstance(item, BedColor): return str(item) return super()._encode(item=item)
25-29
: Consider enhancing error handling and removing unused assignment.The implementation works but could be improved in several ways:
- Remove the unused
_
assignment- Add error handling for write operations
- Add return type hint
Consider this enhancement:
- def write_comment(self, comment: str) -> None: + def write_comment(self, comment: str) -> None: """Write a comment to the BED output. + + Args: + comment: The comment text to write. + + Raises: + IOError: If writing to the file fails. + """ for line in comment.splitlines(): prefix = "" if any(line.startswith(prefix) for prefix in COMMENT_PREFIXES) else "# " - _ = self._handle.write(f"{prefix}{line}\n") + self._handle.write(f"{prefix}{line}\n").github/workflows/tests.yml (2)
1-1
: Consider a more comprehensive workflow name.The name "unit tests" might be too specific as the workflow appears to run comprehensive checks (via
check-all
task). Consider a more accurate name like "code-quality" or "code-checks" to better reflect all the checks being performed.-name: unit tests +name: code-quality
Line range hint
13-20
: Consider adding workflow optimizations for better reliability and efficiency.The workflow could benefit from additional configurations:
- Add fail-fast strategy to stop all matrix jobs if one fails
- Set timeout limits to prevent hung jobs
- Add concurrency control to cancel redundant workflow runs
jobs: Tests: runs-on: ubuntu-latest + timeout-minutes: 10 strategy: + fail-fast: true matrix: PYTHON_VERSION: ["3.10", "3.11", "3.12"] + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true.github/workflows/publish_bedspec.yml (1)
Line range hint
52-63
: Consider adding dependency caching for improved efficiency.The poetry installation and dependency steps could benefit from caching to speed up the workflow execution.
Consider adding cache steps:
- uses: actions/setup-python@v5 with: python-version: 3.12 + - name: Cache poetry dependencies + uses: actions/cache@v3 + with: + path: .venv + key: ${{ runner.os }}-poetry-${{ hashFiles('poetry.lock') }} + restore-keys: | + ${{ runner.os }}-poetry- - name: Install poetry run: |README.md (3)
34-41
: Consider enhancing the example with file content visualization.While the example correctly demonstrates BedWriter usage, it would be more informative to show the actual content written to the file.
>>> with BedWriter.from_path(temp_file.name, Bed3) as writer: ... writer.write(bed) +>>> +>>> # Show the content of the file: +>>> print(temp_file.read()) +chr1 2 8
77-88
: Consider expanding the example to demonstrate more operations.While the example effectively shows basic overlap detection, it would be beneficial to demonstrate the other operations mentioned below (enclosed_by, enclosing, overlapping).
>>> detector.overlaps(my_feature) True +>>> +>>> # Demonstrate other operations: +>>> detector.overlapping(my_feature) +[Bed3(refname='chr1', start=1, end=4)] +>>> detector.enclosed_by(Bed3("chr1", start=0, end=10)) +[Bed3(refname='chr1', start=1, end=4), Bed3(refname='chr1', start=5, end=9)]
105-115
: Consider adding usage example for the custom BED type.While the class definition is clear, showing how to create and use a Bed3Plus1 instance would make the example more complete.
... end: int ... my_custom_field: float | None +>>> +>>> # Example usage: +>>> custom_bed = Bed3Plus1("chr1", 100, 200, my_custom_field=42.0) +>>> print(custom_bed) +Bed3Plus1(refname='chr1', start=100, end=200, my_custom_field=42.0)bedspec/overlap/_overlap.py (3)
Line range hint
21-33
: Consider enhancing the docstring.While the docstring is good, it could be even better by:
- Adding type information for the required properties
- Including example usage
- Documenting the performance characteristics
Here's a suggested enhancement:
"""Detects and returns overlaps between a collection of reference features and query feature. The overlap detector may be built with any feature-like Python object that has the following properties: - * `refname`: The reference sequence name - * `start`: A 0-based start position - * `end`: A 0-based half-open end position + * `refname` (str): The reference sequence name + * `start` (int): A 0-based start position + * `end` (int): A 0-based half-open end position + + Performance characteristics: + * Building: O(n log n) where n is the number of features + * Querying: O(log n + m) where m is the number of overlapping features + + Example: + >>> detector = OverlapDetector() + >>> detector.add(Feature("chr1", 100, 200)) + >>> list(detector.overlapping(Feature("chr1", 150, 250))) """
34-39
: Consider improving type annotations for cgranges.The implementation is solid, but there are several type ignore comments for cgranges. Consider:
- Creating type stubs for cgranges
- Using a more specific type for the tree dictionary
Example type stub for cgranges:
# cgranges.pyi class cgranges: def add(self, refname: str, start: int, end: int, idx: int) -> None: ... def index(self) -> None: ... def overlap(self, refname: str, start: int, end: int) -> list[tuple[str, int, int, int]]: ...
Line range hint
56-81
: Add error handling for invalid queries.The query methods could benefit from additional error handling:
- Validate feature coordinates (start <= end)
- Handle None values
- Add logging for performance monitoring
Example implementation:
def overlapping(self, feature: ReferenceSpan) -> Iterator[ReferenceSpanType]: """Yields all the overlapping features for a given query feature.""" + if feature is None: + raise ValueError("Query feature cannot be None") + if feature.start > feature.end: + raise ValueError("Invalid coordinates: start > end") + refname: Refname = feature.refnametests/test_reader.py (6)
14-29
: Consider adding error case tests.While the happy path is well tested, consider adding tests for:
- Invalid file paths
- Permission denied scenarios
- Malformed BED records
def test_bed_reader_raises_on_invalid_path() -> None: with pytest.raises(FileNotFoundError): BedReader.from_path("non_existent.bed", Bed3)
31-47
: LGTM! Comprehensive comment handling tests.Good coverage of different comment types and empty lines. Consider adding a test for inline comments if supported by the format.
49-57
: Consider adding more test cases for string fields.While null handling is tested, consider adding tests for:
- Actual string values
- Empty strings
- Special characters in strings
59-67
: Add edge cases for numeric fields.Consider adding tests for:
- Maximum/minimum values
- Zero values
- Negative numbers (if applicable)
- Invalid numeric formats
69-77
: Add error handling tests for context manager.Consider testing:
- Exception handling within the context
- Resource cleanup after exceptions
def test_bed_reader_context_manager_handles_errors(tmp_path: Path) -> None: path = tmp_path / "test.bed" path.write_text("invalid data") with pytest.raises(ValueError): with BedReader.from_path(path, Bed4) as reader: list(reader) # Should raise on invalid data
79-103
: LGTM! Comprehensive roundtrip testing.Excellent test coverage for complex BED12 records with multiple field types. Consider adding tests for:
- Empty lists in block_sizes and block_starts
- Color edge cases (max values)
- Validation of block_count matching actual block sizes
pyproject.toml (1)
143-146
: Consider enhancing pyright configuration.While the basic configuration is good, consider adding more pyright-specific settings for stricter type checking:
[tool.pyright] include = ["bedspec/"] reportAny = false pythonVersion = "3.10" +strict = ["bedspec/"] +typeCheckingMode = "strict"tests/test_writer.py (4)
17-52
: Consider adding file existence check.The parameterized test is well-structured and covers all BED types comprehensively. Consider adding an assertion to verify file existence before reading its contents:
def test_bed_writer_can_write_all_bed_types(bed: BedLike, expected: str, tmp_path: Path) -> None: """Test that the BED writer can write all BED types.""" with open(tmp_path / "test.bed", "w") as handle: writer: BedWriter = BedWriter(handle, type(bed)) writer.write(bed) + test_file = tmp_path / "test.bed" + assert test_file.exists(), "BED file was not created" + assert test_file.stat().st_size > 0, "BED file is empty" - assert Path(tmp_path / "test.bed").read_text() == expected + assert test_file.read_text() == expected
80-102
: Consider consolidating type validation tests.The generic and non-generic type validation tests have significant overlap in functionality. Consider consolidating them using parametrization:
-def test_bed_writer_remembers_the_type_it_will_write(tmp_path: Path) -> None: - """Test that the BED writer remembers the type it can only write.""" - with open(tmp_path / "test.bed", "w") as handle: - writer: BedWriter = BedWriter(handle, Bed2) - writer.write(Bed2(refname="chr1", start=1)) - with pytest.raises( - ValueError, - match="Expected Bed2 but found Bed3!", - ): - writer.write(Bed3(refname="chr1", start=1, end=2)) - - -def test_bed_writer_remembers_the_type_it_will_write_generic(tmp_path: Path) -> None: - """Test that the generically parameterized BED writer remembers the type it can only write.""" - with open(tmp_path / "test.bed", "w") as handle: - writer = BedWriter(handle, Bed2) - writer.write(Bed2("chr1", 1)) - with pytest.raises( - ValueError, - match="Expected Bed2 but found Bed3!", - ): - writer.write(Bed3(refname="chr1", start=1, end=2)) # type: ignore[arg-type] +@pytest.mark.parametrize( + "initial_bed,invalid_bed,type_hint", + [ + (Bed2(refname="chr1", start=1), Bed3(refname="chr1", start=1, end=2), True), + (Bed2("chr1", 1), Bed3(refname="chr1", start=1, end=2), False), + ], +) +def test_bed_writer_type_validation( + tmp_path: Path, + initial_bed: Bed2, + invalid_bed: Bed3, + type_hint: bool, +) -> None: + """Test that the BED writer enforces type validation for both generic and non-generic usage.""" + with open(tmp_path / "test.bed", "w") as handle: + writer = BedWriter(handle, Bed2) + if type_hint: + writer: BedWriter + writer.write(initial_bed) + with pytest.raises(ValueError, match="Expected Bed2 but found Bed3!"): + if not type_hint: + # type: ignore[arg-type] + writer.write(invalid_bed)
104-139
: Consider adding edge case tests for comments.The comment writing tests are thorough but could benefit from additional edge cases:
def test_bed_writer_write_comment_edge_cases(tmp_path: Path) -> None: """Test edge cases for comment writing.""" with open(tmp_path / "test.bed", "w") as handle: writer = BedWriter(handle, Bed2) # Empty comment writer.write_comment("") # Comment with only whitespace writer.write_comment(" ") # Comment with special characters writer.write_comment("!@#$%^&*()") writer.write(Bed2(refname="chr1", start=1)) expected = "# \n# \n# !@#$%^&*()\nchr1\t1\n" assert Path(tmp_path / "test.bed").read_text() == expected
141-148
: Add verification for proper file closure.While the context manager test verifies writing functionality, it should also confirm that the file handle is properly closed:
def test_bed_writer_can_be_used_as_context_manager(tmp_path: Path) -> None: """Test that the BED writer can be used as a context manager.""" + file_handle = None with BedWriter(open(tmp_path / "test.bed", "w"), Bed2) as handle: + file_handle = handle._handle # Store reference to underlying file handle handle.write(Bed2(refname="chr1", start=1)) handle.write(Bed2(refname="chr2", start=2)) expected = "chr1\t1\nchr2\t2\n" assert Path(tmp_path / "test.bed").read_text() == expected + assert file_handle.closed, "File handle should be closed after exiting context"tests/test_bedspec.py (1)
155-155
: Consider adding edge case tests for length calculations.The transition to using
len()
is good and the current test cases validate basic lengths well. Consider adding tests for:
- Empty intervals (if allowed)
- Large intervals to check for any integer overflow issues
- Edge cases where start and end positions are at chromosome boundaries
def test_bed_length_edge_cases() -> None: """Test edge cases for BED length calculations.""" # Large interval assert len(Bed3(refname="chr1", start=1, end=1_000_000)) == 999_999 # Maximum possible coordinates (if applicable to your use case) assert len(Bed3(refname="chr1", start=1, end=2**31-1)) == 2**31-2Also applies to: 172-174
bedspec/_bedspec.py (2)
35-35
: Consider usingStrEnum
forBedStrand
The
BedStrand
class is currently defined asclass BedStrand(str, Enum)
. If you are targeting Python 3.11 or usingtyping_extensions
, you can useStrEnum
for clearer intent and to avoid manually inheriting fromstr
. This enhances readability and aligns with modern Python practices.Apply this diff to use
StrEnum
:-from enum import Enum +from enum import StrEnum -class BedStrand(str, Enum): +class BedStrand(StrEnum):Ensure you import
StrEnum
appropriately:-from enum import Enum +from enum import StrEnum
46-49
: Handle unexpected values inBedStrand.opposite
methodCurrently, the
opposite
method assumesself
is eitherBedStrand.Positive
orBedStrand.Negative
. To make the code more robust against future additions toBedStrand
, consider adding anelse
clause to handle unexpected values. This would help in catching errors if new strands are added in the future.Apply this diff to handle unexpected values:
def opposite(self) -> "BedStrand": if self is BedStrand.Positive: return BedStrand.Negative elif self is BedStrand.Negative: return BedStrand.Positive + else: + raise ValueError(f"Invalid BedStrand value: {self}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (17)
.github/CODEOWNERS
(1 hunks).github/workflows/publish_bedspec.yml
(3 hunks).github/workflows/tests.yml
(2 hunks)README.md
(3 hunks)bedspec/__init__.py
(1 hunks)bedspec/_bedspec.py
(11 hunks)bedspec/_io.py
(0 hunks)bedspec/_reader.py
(1 hunks)bedspec/_writer.py
(1 hunks)bedspec/overlap/__init__.py
(1 hunks)bedspec/overlap/_overlap.py
(2 hunks)pyproject.toml
(9 hunks)tests/test_bedspec.py
(4 hunks)tests/test_io.py
(0 hunks)tests/test_overlap.py
(1 hunks)tests/test_reader.py
(1 hunks)tests/test_writer.py
(1 hunks)
💤 Files with no reviewable changes (2)
- bedspec/_io.py
- tests/test_io.py
🔇 Additional comments (41)
bedspec/overlap/__init__.py (2)
1-1
: LGTM! Clean import statement.
The relative import follows Python best practices and clearly indicates the module's dependency.
3-5
: LGTM! Well-structured public API definition.
The __all__
declaration explicitly defines the module's public interface, which is a good practice for controlling what gets exported during from module import *
operations. This change improves API clarity and encapsulation.
Let's verify that this change doesn't break existing imports:
✅ Verification successful
Change is safe - no impact on existing imports
The git history shows that QueryReferenceSpanType
was previously imported but not included in __all__
. The test file and codebase search confirm that there are no direct imports of this type. The only imports found are for OverlapDetector
, which remains in the public API.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any direct imports of QueryReferenceSpanType that might be affected
# by its removal from the public API
# Search for any imports of QueryReferenceSpanType
rg "from.*bedspec\.overlap.*import.*QueryReferenceSpanType" || \
rg "from.*bedspec\.overlap.*import.*\*.*" || \
rg "import.*bedspec\.overlap"
Length of output: 137
Script:
#!/bin/bash
# Let's try a broader search to understand the usage patterns
# 1. Find all Python files that might import from bedspec.overlap
fd '\.py$' --exec grep -l "bedspec.overlap" {} \;
# 2. Look for any references to QueryReferenceSpanType in Python files
rg "QueryReferenceSpanType" --type py
# 3. Check the actual content of the overlap module to understand what was previously exported
fd -g 'overlap/__init__.py' --exec cat {} \;
Length of output: 162
Script:
#!/bin/bash
# Let's examine the test file that imports from bedspec.overlap
cat tests/test_overlap.py
# Also check the git history to see what was previously in __all__
git log -p bedspec/overlap/__init__.py
Length of output: 6335
bedspec/__init__.py (2)
19-20
: Approve splitting IO functionality into dedicated modules.
Moving BedReader
and BedWriter
from ._io
to dedicated ._reader
and ._writer
modules follows the Single Responsibility Principle, making the codebase more maintainable.
Let's verify that the old _io.py
module has been removed and the new modules exist:
#!/bin/bash
# Description: Verify IO module restructuring
# Test 1: Old _io.py should not exist
# Test 2: New _reader.py and _writer.py should exist with the moved classes
# Check if old _io.py still exists (should not)
fd -t f "_io.py" bedspec/
# Verify BedReader in new location
ast-grep --pattern 'class BedReader:
$$$
'
# Verify BedWriter in new location
ast-grep --pattern 'class BedWriter:
$$$
'
12-12
: Verify impact of GenomicSpan → ReferenceSpan rename.
The renaming of GenomicSpan
to ReferenceSpan
and addition of BedType
appear to be part of an API evolution. The __all__
list is properly updated to reflect these changes.
Let's verify the impact of these changes:
Also applies to: 16-16, 22-43
✅ Verification successful
The rename from GenomicSpan
to ReferenceSpan
has been properly implemented across the codebase
The search results confirm that:
- The old
GenomicSpan
name has been completely removed from the codebase ReferenceSpan
is consistently used in all relevant files:bedspec/__init__.py
: Proper exportbedspec/_bedspec.py
: Protocol definition and usagebedspec/overlap/_overlap.py
: Type imports and usagetests/test_bedspec.py
: Test cases updated
The git history shows this was part of a deliberate API evolution change, and all references have been properly updated.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining references to GenomicSpan
# Test: Search for GenomicSpan references that might need updating
# Look for any remaining GenomicSpan references
rg "GenomicSpan" -l
# Look for type annotations using ReferenceSpan
rg "ReferenceSpan" -A 2
Length of output: 119
Script:
#!/bin/bash
# Let's try a more comprehensive search for both old and new terms
# Search for GenomicSpan case-insensitively and with word boundaries
rg -w -i "GenomicSpan"
# Search for ReferenceSpan usage
rg -w "ReferenceSpan"
# Check git history for the rename
git log -p --all -S "GenomicSpan"
# Look for any potential test files or documentation that might reference these terms
fd -e py -e md -e rst | xargs rg -l "GenomicSpan|ReferenceSpan"
Length of output: 36739
bedspec/_writer.py (1)
1-8
: LGTM! Well-organized imports following best practices.
The imports are properly organized, separating standard library, third-party, and project-specific imports.
.github/workflows/tests.yml (1)
19-19
: Verify Python 3.10 compatibility with all dependencies.
Adding Python 3.10 to the test matrix is good for broader compatibility. However, we should ensure all project dependencies support Python 3.10.
Let's check the project's dependencies and their Python version requirements:
✅ Verification successful
Python 3.10 compatibility is verified for all dependencies
Based on the dependency analysis:
- Project's Python requirement in
pyproject.toml
is^3.10.0,<4.0.0
- All dependencies have minimum Python version requirements that are compatible with Python 3.10:
- Most packages require
>=3.8
- Some require
>=3.7
- The highest minimum requirement is from
typeline
and the project itself at>=3.10.0
- Most packages require
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check Python version compatibility in pyproject.toml and poetry.lock
# Test 1: Check Python version specification in pyproject.toml
echo "Checking Python version requirements in pyproject.toml:"
rg "python = " pyproject.toml -A 2
# Test 2: Check if poetry.lock exists and examine dependencies
if [ -f poetry.lock ]; then
echo "Checking dependencies in poetry.lock:"
rg "python-versions = " poetry.lock -A 2
fi
Length of output: 3886
.github/workflows/publish_bedspec.yml (3)
5-5
: LGTM! Clear and explicit version tag pattern.
The updated tag pattern '[0-9]+.[0-9]+.[0-9]+'
is more explicit and maintains semantic versioning compatibility.
131-131
: LGTM! Correct changelog output reference.
The update to use needs.make-changelog.outputs.release_body
correctly aligns with the job name and its output.
27-28
: Verify workflow reference for renamed job.
The job renaming from 'tests' to 'unit-tests' is more descriptive and all dependencies are correctly updated. However, we should verify that the referenced workflow file matches this change.
#!/bin/bash
# Description: Verify the referenced workflow matches the new job name
# Test: Check if tests.yml workflow has matching job names
rg "name: unit.*tests" .github/workflows/tests.yml
Also applies to: 35-35, 40-40
README.md (4)
5-7
: LGTM: Badge updates reflect enhanced compatibility and type checking.
The updated badges accurately reflect the broader Python version compatibility (3.10+) and the addition of basedpyright type checking.
25-28
: LGTM: Clear and well-formatted example.
The example effectively demonstrates Bed3 object creation with proper interactive Python shell formatting.
47-53
: LGTM: Clear demonstration of BedReader usage.
The example effectively shows file reading, iteration, and object representation.
61-70
: LGTM: Comprehensive listing of supported BED types.
The example clearly shows all available BED format classes with proper import statements.
bedspec/overlap/_overlap.py (3)
2-14
: LGTM! Improved type safety and documentation.
The updated imports and type definitions show good practices:
- Using
override
decorator for type safety - Clear type aliases with documentation
- Proper import organization
41-44
: LGTM! Clean iterator implementation.
Good use of:
@override
decorator for type safetychain
for efficient iteration over multiple lists
46-54
: Consider atomic operations for thread safety.
The feature addition process modifies multiple data structures separately. In a multi-threaded environment, this could lead to inconsistencies.
Let's check for concurrent usage:
Consider:
- Adding thread safety documentation
- Using a lock if concurrent access is needed
- Making the operations atomic
tests/test_reader.py (1)
1-12
: LGTM! Well-organized imports.
The imports are properly organized with standard library imports separated from package imports. All imported entities are used within the test file.
tests/test_overlap.py (2)
Line range hint 1-116
: Verify if hashability testing is still needed.
The test suite provides comprehensive coverage of the OverlapDetector
's functionality. However, the removal of the hashability test case needs verification to ensure it aligns with the current implementation requirements.
Let's check if the implementation still requires hashable features:
#!/bin/bash
# Description: Check if the implementation requires hashable features
# Test: Look for type hints or collections that would require hashability
# Expect: Evidence of hashable requirements if they exist
echo "Searching for potential hashable requirements in OverlapDetector..."
rg -A 5 'class OverlapDetector'
echo "Searching for Set or Dict usage with features..."
ast-grep --pattern 'Set[$_]'
ast-grep --pattern 'Dict[$_]'
37-38
: LGTM! The changes improve type safety and align with the updated API.
The explicit type annotation and use of argument unpacking with add(*beds)
is a good improvement over the previous add_all()
approach. This change:
- Makes the types more explicit and safer
- Aligns with the updated
OverlapDetector
API
Let's verify that the add_all
method has been removed from the implementation:
✅ Verification successful
Verified: The add_all
method has been completely removed and replaced with add(*features)
The implementation in bedspec/overlap/_overlap.py
shows that:
- The
OverlapDetector
class now uses variadic arguments inadd(*features)
method - No traces of
add_all
method exist in the codebase - The test changes align perfectly with the implementation
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that add_all method is no longer present in the codebase
# Test: Search for add_all method definition or usage. Expect: No occurrences.
echo "Searching for add_all method definition..."
ast-grep --pattern 'def add_all'
echo "Searching for add_all method usage..."
rg 'add_all\('
Length of output: 809
Script:
#!/bin/bash
# Let's search more specifically for add_all method and its usage
echo "Searching for add_all method definition..."
ast-grep --pattern 'def add_all'
echo "Searching for add_all method usage..."
rg 'add_all\('
# Let's also check the OverlapDetector implementation for the add method
echo "Checking OverlapDetector add method implementation..."
ast-grep --pattern 'def add'
# Let's also look at the actual implementation file
echo "Content of overlap detector implementation..."
fd "overlap" --type f --exec cat {}
Length of output: 9739
pyproject.toml (6)
3-3
: LGTM! Version bump and metadata updates are appropriate.
The version bump to 0.3.0 and the addition of Python 3.10 support in classifiers align well with the dependency changes. The package metadata is well-structured.
Also applies to: 17-17, 20-20, 30-30
69-76
: Task configuration improvements enhance testing coverage.
Good improvements:
- Including markdown files in doctests ensures documentation stays up-to-date
- Well-structured typing checks with both mypy and pyright
96-123
: Test configuration is well-structured and comprehensive.
Good test setup:
- 80% minimum coverage requirement
- Enabled doctest-plus for enhanced documentation testing
- Well-configured doctest options for better output
151-151
: LGTM! Good linting configuration updates.
The changes improve code quality checks:
- Target version aligned with project requirements
- Added McCabe complexity check with reasonable max-complexity of 10
Also applies to: 179-180
204-204
: LGTM! Good addition of security commit parsing.
Adding security commit parsing improves changelog organization and security tracking.
33-34
: Dependencies updated appropriately.
The changes look good:
- Python version requirement broadened to 3.10+ while safely excluding Python 4
- Addition of typeline aligns with the PR objective
- Dev dependencies versions updated to latest compatible versions
Also applies to: 42-43
✅ Verification successful
Let me check the actual usage of typeline in the files to ensure it's properly implemented.
Let me verify how these typeline components are being used in the actual implementation.
Let me try a different pattern to check the class inheritance.
Dependencies are correctly integrated and utilized
The verification confirms:
- typeline is properly imported and used in both reader/writer implementations
- BedReader and BedWriter classes correctly inherit from TsvStructReader and TsvStructWriter
- Python version requirements in CI workflows are compatible with the specified version range
- The dependency addition is justified by actual implementation in the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify typeline usage in the codebase
echo "Checking typeline imports and usage..."
rg -l "from typeline|import typeline"
# Verify Python version compatibility in CI
echo "Checking CI Python version matrix..."
rg "python-version" .github/workflows/
Length of output: 707
Script:
#!/bin/bash
# Check the actual typeline usage in the files
echo "Checking typeline usage in _writer.py..."
rg -A 5 "from typeline|import typeline" bedspec/_writer.py
echo "\nChecking typeline usage in _reader.py..."
rg -A 5 "from typeline|import typeline" bedspec/_reader.py
Length of output: 698
Script:
#!/bin/bash
# Check for TsvStructWriter/Reader usage patterns
echo "Checking TsvStructWriter usage..."
ast-grep --pattern 'class $_ (TsvStructWriter)' bedspec/_writer.py
echo "\nChecking TsvStructReader usage..."
ast-grep --pattern 'class $_ (TsvStructReader)' bedspec/_reader.py
Length of output: 295
Script:
#!/bin/bash
# Check class definitions that use typeline components
echo "Checking classes in _writer.py..."
rg -A 3 "^class.*TsvStructWriter" bedspec/_writer.py
echo "\nChecking classes in _reader.py..."
rg -A 3 "^class.*TsvStructReader" bedspec/_reader.py
Length of output: 453
tests/test_writer.py (2)
1-15
: LGTM! Well-organized imports.
The imports are clean, well-structured, and all imported types are utilized in the test cases.
65-78
: LGTM! Well-structured path handling tests.
The tests effectively cover both Path and string path handling, with proper resource management using context managers.
tests/test_bedspec.py (2)
86-90
: LGTM! Type changes are consistent.
The test function has been properly updated to use ReferenceSpan
, with consistent type annotation and assertion.
18-18
: Verify protocol renaming impact across the codebase.
The import change from GenomicSpan
to ReferenceSpan
appears to be part of a protocol renaming. Let's verify the consistency of this change.
✅ Verification successful
Based on the results, I can now generate the final response:
Protocol renaming from GenomicSpan
to ReferenceSpan
is complete and consistent
The search results confirm that:
- No remaining references to
GenomicSpan
exist in the codebase ReferenceSpan
is properly used across the codebase:- Defined as a Protocol in
bedspec/_bedspec.py
- Correctly exported in
bedspec/__init__.py
- Properly used in test files and overlap detection implementation
- Defined as a Protocol in
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining references to GenomicSpan
# and verify ReferenceSpan is used consistently
echo "Checking for any remaining GenomicSpan references..."
rg "GenomicSpan"
echo "Verifying ReferenceSpan usage..."
rg "ReferenceSpan"
Length of output: 2088
bedspec/_reader.py (1)
22-84
: The BedReader
implementation looks solid
The BedReader
class is well-structured and effectively extends TsvStructReader
. The custom _decode
method correctly handles the decoding of complex types, ensuring accurate parsing of BED records.
bedspec/_bedspec.py (11)
99-103
: Ensure subclasses of PointBed
are annotated with @dataclass
The __init_subclass__
method correctly enforces that any subclass of PointBed
must be a dataclass. This helps maintain consistency and prevents subtle bugs that might arise from missing dataclass annotations.
105-105
: Implement __len__
for PointBed
Adding the __len__
method provides a standard way to obtain the length of a PointBed
instance using Python's built-in len()
function. This enhances usability and makes the class more pythonic.
110-110
: Update return type in territory
method to ReferenceSpan
Changing the return type to Iterator[ReferenceSpan]
aligns with the renaming of GenomicSpan
to ReferenceSpan
and ensures consistency across the codebase.
123-127
: Ensure subclasses of SimpleBed
are annotated with @dataclass
The __init_subclass__
method enforces that subclasses of SimpleBed
are dataclasses, promoting consistency and preventing potential issues due to missing annotations.
134-134
: Implement __len__
for SimpleBed
Providing the __len__
method allows for easy retrieval of a SimpleBed
instance's length using len()
, improving code readability and usability.
139-139
: Update return type in territory
method to ReferenceSpan
Updating the method aligns with the new ReferenceSpan
protocol and maintains consistency.
155-159
: Ensure subclasses of PairBed
are annotated with @dataclass
The enforcement of @dataclass
annotation for subclasses of PairBed
ensures that all necessary dataclass features are available, enhancing consistency and reliability.
177-178
: Update return type in territory
method to ReferenceSpan
Consistently using ReferenceSpan
enhances type checking and code clarity.
198-198
: Use of Self
in BedColor.from_string
for better type hinting
Replacing the return type with Self
allows subclasses of BedColor
to use from_string
and get the correct type hint, improving the extensibility of the class.
276-277
: Allow None
for block_sizes
and block_starts
in Bed12
By changing these fields to accept None
, the class becomes more flexible and can represent BED12 records that may not have these attributes defined. Ensure the __post_init__
method properly handles None
values to maintain data integrity.
358-358
: Use of Self
in BedPE.from_bed6
for accurate return type
Updating the return type to Self
ensures that class methods return an instance of the correct subclass, improving type safety and supporting inheritance.
Summary by CodeRabbit
Release Notes
New Features
BedWriter
andBedReader
classes for handling BED records.Documentation
Bug Fixes
Tests
BedReader
andBedWriter
functionalities.Chores