Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non cellranger tags #26

Closed

Conversation

mschilli87
Copy link
Contributor

@mschilli87 mschilli87 commented Feb 23, 2024

This is a combination of #24 and #25 in one PR.
Merging this will address all points raised in #23.


closes #23
closes #24
closes #25

While cellranger uses the `UB` SAM tag, other scRNA-seq analyses use
different SAM tags (*e.g.* `XM`) for the same information.
By parameterizing the existing `parse_read` function, this (and other
read parsing/filtering options) can be adjusted by the caller without
the need for boilerplate code required for a custom callback class.
At the same time, by providing the previously hard-coded values as
default parameters, compatibility with existing code is kept.
All functions using this callback have been extended by optional keyword
arguments to allow passing through the newly added parameters.

This addresses one out of two issues raised in
arogozhnikov#23.
While `BarcodeHandler.__init__` has (optional) parameters to adjust
barcode-related options (*e.g.* using non-`CB` SAM tags for the cell
barcodes, like `XC`), the static `from_file` function called it without
any of those parameters, making it impossible to adjust them when using
that function.
This is resolved by adding an optional keyword argument dictionary
parameters to the static function and passing all argument provided
therein, if any, down to the `__init__` call.

This addresses one out of two issues raised in
arogozhnikov#23.
This is a partial revert of commit f460476
which parameterized the `parse_read` callback and added keyword
arguments to all its callers to pass down the arguments like this:

```python
count_snps(... , parse_read=parse_read, umi_tag="XM")
```

Howerver, as pointed out by Alex Rogozhnikov, the keyword arguments are
not required since a lambda function could be used in their place
resulting which even results in more readable code:

```python
parse_read_xm = lambda read: parse_read(read, umi_tag="XM")
count_snps(... , parse_read=parse_read_xm)
```

Thus, this commit reverts all changes introducing keyword argument to
`parse_read` callers, leaving only the parameterization of that callback
itself.
@mschilli87
Copy link
Contributor Author

@arogozhnikov: I have rebased this on the updated PR #24. Thus, this now bundles all changes as per your feedback.

mschilli87 added a commit to mschilli87/Demuxafy that referenced this pull request Feb 26, 2024
Cell Ranger uses `CB` SAM tags for the cell barcode and `UB` SAM tags
for the UMI but other scRNA-seq pipelines use different tags (*e.g.*
`XC` and `XM`).
This commit introduces the `CELL_TAG` and `UMI_TAG` environment
variables to override the Cell Ranger style defaults.

Note: For Demuxalot, this requires upstream changes suggested in the
following PR: arogozhnikov/demuxalot#26.
@arogozhnikov
Copy link
Owner

Interestingly, I did not close this one, github closed it automatically

@mschilli87
Copy link
Contributor Author

@arogozhnikov:

Interestingly, I did not close this one, github closed it automatically

That's because this includes #25 and #24 so merging either only this or both of those had the same effect. Thus I added the special 'closes [PR/issue]' directive to my PRs so merging one will automatically close those that become irrelavant by doing so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How do I define which BAM tags contain the UMI / cell barcode?
2 participants