mutate2 converts E[0-9] to 0 in output column #219

Closed
4 tasks done
davised opened this issue Feb 4, 2023 · 2 comments

davised commented Feb 4, 2023

Prerequisites

  • make sure you're using the latest version (check with csvtk version)
  • read the usage
$ csvtk version
csvtk v0.25.0

Describe your issue

  • describe the problem
    This is an odd one. I'm joining two columns: the first (sample) holds text matching [A-H][1-9][0-2]?, and the second (suffix) holds R[1-2]_001.fastq.gz. Most of the rows join fine, except for those whose sample is E followed by digits. Here is the example data; I added rows with just the letter E and with E0 at the top for testing purposes:
  • provide a reproducible example
    96_well_sample.txt
$ cat 96_well_sample.txt | csvtk add-header -n fastq | csvtk sep -n 'sample,sequence,L001,suffix' --merge -f 1 -s '_' | csvtk cut -f fastq,sample,suffix | csvtk pretty | head
fastq                          sample   suffix
----------------------------   ------   ---------------
E_S1_L001_R1_001.fastq.gz      E        R1_001.fastq.gz
E0_S1_L001_R2_001.fastq.gz     E0       R2_001.fastq.gz
A1_S1_L001_R1_001.fastq.gz     A1       R1_001.fastq.gz
A1_S1_L001_R2_001.fastq.gz     A1       R2_001.fastq.gz
B1_S2_L001_R1_001.fastq.gz     B1       R1_001.fastq.gz
B1_S2_L001_R2_001.fastq.gz     B1       R2_001.fastq.gz
C1_S3_L001_R1_001.fastq.gz     C1       R1_001.fastq.gz
C1_S3_L001_R2_001.fastq.gz     C1       R2_001.fastq.gz

Here's what happens when I use mutate2 to join the columns:

$ cat 96_well_sample.txt | csvtk add-header -n fastq | csvtk sep -n 'sample,sequence,L001,suffix' --merge -f 1 -s '_' | csvtk cut -f fastq,sample,suffix | csvtk mutate2 -n 'output' -e '${sample} + "_" + ${suffix}' | csvtk pretty | head
fastq                          sample   suffix            output
----------------------------   ------   ---------------   -------------------
E_S1_L001_R1_001.fastq.gz      E        R1_001.fastq.gz   E_R1_001.fastq.gz
E0_S1_L001_R2_001.fastq.gz     E0       R2_001.fastq.gz   0_R2_001.fastq.gz
A1_S1_L001_R1_001.fastq.gz     A1       R1_001.fastq.gz   A1_R1_001.fastq.gz
A1_S1_L001_R2_001.fastq.gz     A1       R2_001.fastq.gz   A1_R2_001.fastq.gz
B1_S2_L001_R1_001.fastq.gz     B1       R1_001.fastq.gz   B1_R1_001.fastq.gz
B1_S2_L001_R2_001.fastq.gz     B1       R2_001.fastq.gz   B1_R2_001.fastq.gz
C1_S3_L001_R1_001.fastq.gz     C1       R1_001.fastq.gz   C1_R1_001.fastq.gz
C1_S3_L001_R2_001.fastq.gz     C1       R2_001.fastq.gz   C1_R2_001.fastq.gz

Here are all of the E rows:

$ cat 96_well_sample.txt | csvtk add-header -n fastq | csvtk sep -n 'sample,sequence,L001,suffix' --merge -f 1 -s '_' | csvtk cut -f fastq,sample,suffix | csvtk mutate2 -n 'output' -e '${sample} + "_" + ${suffix}' | csvtk pretty | grep -E '^E'
E_S1_L001_R1_001.fastq.gz      E        R1_001.fastq.gz   E_R1_001.fastq.gz
E0_S1_L001_R2_001.fastq.gz     E0       R2_001.fastq.gz   0_R2_001.fastq.gz
E1_S5_L001_R1_001.fastq.gz     E1       R1_001.fastq.gz   0_R1_001.fastq.gz
E1_S5_L001_R2_001.fastq.gz     E1       R2_001.fastq.gz   0_R2_001.fastq.gz
E2_S13_L001_R1_001.fastq.gz    E2       R1_001.fastq.gz   0_R1_001.fastq.gz
E2_S13_L001_R2_001.fastq.gz    E2       R2_001.fastq.gz   0_R2_001.fastq.gz
E3_S21_L001_R1_001.fastq.gz    E3       R1_001.fastq.gz   0_R1_001.fastq.gz
E3_S21_L001_R2_001.fastq.gz    E3       R2_001.fastq.gz   0_R2_001.fastq.gz
E4_S29_L001_R1_001.fastq.gz    E4       R1_001.fastq.gz   0_R1_001.fastq.gz
E4_S29_L001_R2_001.fastq.gz    E4       R2_001.fastq.gz   0_R2_001.fastq.gz
E5_S37_L001_R1_001.fastq.gz    E5       R1_001.fastq.gz   0_R1_001.fastq.gz
E5_S37_L001_R2_001.fastq.gz    E5       R2_001.fastq.gz   0_R2_001.fastq.gz
E6_S45_L001_R1_001.fastq.gz    E6       R1_001.fastq.gz   0_R1_001.fastq.gz
E6_S45_L001_R2_001.fastq.gz    E6       R2_001.fastq.gz   0_R2_001.fastq.gz
E7_S53_L001_R1_001.fastq.gz    E7       R1_001.fastq.gz   0_R1_001.fastq.gz
E7_S53_L001_R2_001.fastq.gz    E7       R2_001.fastq.gz   0_R2_001.fastq.gz
E8_S61_L001_R1_001.fastq.gz    E8       R1_001.fastq.gz   0_R1_001.fastq.gz
E8_S61_L001_R2_001.fastq.gz    E8       R2_001.fastq.gz   0_R2_001.fastq.gz
E9_S69_L001_R1_001.fastq.gz    E9       R1_001.fastq.gz   0_R1_001.fastq.gz
E9_S69_L001_R2_001.fastq.gz    E9       R2_001.fastq.gz   0_R2_001.fastq.gz
E10_S77_L001_R1_001.fastq.gz   E10      R1_001.fastq.gz   0_R1_001.fastq.gz
E10_S77_L001_R2_001.fastq.gz   E10      R2_001.fastq.gz   0_R2_001.fastq.gz
E11_S85_L001_R1_001.fastq.gz   E11      R1_001.fastq.gz   0_R1_001.fastq.gz
E11_S85_L001_R2_001.fastq.gz   E11      R2_001.fastq.gz   0_R2_001.fastq.gz
E12_S93_L001_R1_001.fastq.gz   E12      R1_001.fastq.gz   0_R1_001.fastq.gz
E12_S93_L001_R2_001.fastq.gz   E12      R2_001.fastq.gz   0_R2_001.fastq.gz

I've used this software quite a bit, but this is the first large bug I've found. I can use awk in the meantime to join the outputs.
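For anyone who needs the same stopgap, here is a minimal sketch of the awk join mentioned above. It assumes every file name has exactly five `_`-separated parts, as in the sample data; the function name `join_sample_suffix` and the inline test names are only illustrative. awk's string concatenation does not try to interpret E1 as a number, so the sample name is kept as text.

```shell
# Read FASTQ file names on stdin and emit CSV with the joined column.
# Assumption: names look like SAMPLE_Sn_L001_Rn_001.fastq.gz (five "_" parts).
join_sample_suffix() {
  awk -F'_' '
    BEGIN { OFS = ","; print "fastq", "sample", "suffix", "output" }
    {
      suffix = $4 "_" $5                    # e.g. R1_001.fastq.gz
      print $0, $1, suffix, $1 "_" suffix   # plain string concatenation
    }'
}

# Illustrative input; replace with: cat 96_well_sample.txt | join_sample_suffix
printf '%s\n' \
  'E1_S5_L001_R1_001.fastq.gz' \
  'A1_S1_L001_R2_001.fastq.gz' | join_sample_suffix
```

The output is ordinary CSV with a header row, so it can be piped into csvtk pretty like the commands above.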

Thank you for this software.

@shenwei356
Owner

It's a bug: E1 was wrongly treated as a number in scientific notation. With the old version, you can also switch on the -s/--numeric-as-string flag to avoid this. Anyway, I've fixed it.
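To illustrate how this kind of misdetection can arise, here is a small sketch. Both patterns are hypothetical and are not taken from csvtk's source; the names `loose_number`, `strict_number` and `looks_numeric` are made up for this example. A number check that leaves the mantissa digits optional accepts a bare exponent such as E1, while a plain E still fails because the exponent needs at least one digit. That would be consistent with the output above, where E was kept and E0 became 0.

```shell
# Hypothetical patterns, NOT csvtk's actual code.
# Loose: mantissa digits are optional, so "E1" alone counts as a number.
loose_number='^[0-9]*\.?[0-9]*([eE][+-]?[0-9]+)?$'
# Strict: at least one mantissa digit is required before any exponent.
strict_number='^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?$'

looks_numeric() {  # usage: looks_numeric PATTERN VALUE
  printf '%s\n' "$2" | grep -Eq "$1"
}

# Report how each pattern classifies a few sample names and one real number.
for value in E E0 E12 A1 1E3; do
  if looks_numeric "$loose_number" "$value"; then loose=yes; else loose=no; fi
  if looks_numeric "$strict_number" "$value"; then strict=yes; else strict=no; fi
  printf '%s\tloose=%s\tstrict=%s\n' "$value" "$loose" "$strict"
done
```

With the loose pattern, E0 and E12 are classified as numeric and E and A1 are not; the strict pattern accepts only 1E3 from this list.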

Please use the new binaries here

shenwei356 added the bug label Feb 4, 2023

davised commented Feb 8, 2023

Ah scientific notation, of course. Thanks! This tool is so cool and makes pipelining very intuitive.

Thanks for the very prompt bug fixes!
