mutate2 converts E[0-9] to 0 in output column #219

davised · 2023-02-04T07:46:44Z

Prerequisites

make sure you're using the latest version (check with csvtk version)
read the usage

$ csvtk version
csvtk v0.25.0

Describe your issue

describe the problem
This is an odd one. I'm joining two columns: one (sample) holds text matching [A-H][1-9][0-2]?, the other (suffix) holds R[1-2]_001.fastq.gz. Most of the rows join fine, except for those whose sample starts with E followed by a digit; in those rows the sample part of the output becomes 0. Here is the example data; I added test rows with just the letter E and with E0 at the top:
provide a reproducible example
96_well_sample.txt

$ cat 96_well_sample.txt | csvtk add-header -n fastq | csvtk sep -n 'sample,sequence,L001,suffix' --merge -f 1 -s '_' | csvtk cut -f fastq,sample,suffix | csvtk pretty | head
fastq                          sample   suffix
----------------------------   ------   ---------------
E_S1_L001_R1_001.fastq.gz      E        R1_001.fastq.gz
E0_S1_L001_R2_001.fastq.gz     E0       R2_001.fastq.gz
A1_S1_L001_R1_001.fastq.gz     A1       R1_001.fastq.gz
A1_S1_L001_R2_001.fastq.gz     A1       R2_001.fastq.gz
B1_S2_L001_R1_001.fastq.gz     B1       R1_001.fastq.gz
B1_S2_L001_R2_001.fastq.gz     B1       R2_001.fastq.gz
C1_S3_L001_R1_001.fastq.gz     C1       R1_001.fastq.gz
C1_S3_L001_R2_001.fastq.gz     C1       R2_001.fastq.gz

Here's what happens when I use mutate2 to join the columns:

$ cat 96_well_sample.txt | csvtk add-header -n fastq | csvtk sep -n 'sample,sequence,L001,suffix' --merge -f 1 -s '_' | csvtk cut -f fastq,sample,suffix | csvtk mutate2 -n 'output' -e '${sample} + "_" + ${suffix}' | csvtk pretty | head
fastq                          sample   suffix            output
----------------------------   ------   ---------------   -------------------
E_S1_L001_R1_001.fastq.gz      E        R1_001.fastq.gz   E_R1_001.fastq.gz
E0_S1_L001_R2_001.fastq.gz     E0       R2_001.fastq.gz   0_R2_001.fastq.gz
A1_S1_L001_R1_001.fastq.gz     A1       R1_001.fastq.gz   A1_R1_001.fastq.gz
A1_S1_L001_R2_001.fastq.gz     A1       R2_001.fastq.gz   A1_R2_001.fastq.gz
B1_S2_L001_R1_001.fastq.gz     B1       R1_001.fastq.gz   B1_R1_001.fastq.gz
B1_S2_L001_R2_001.fastq.gz     B1       R2_001.fastq.gz   B1_R2_001.fastq.gz
C1_S3_L001_R1_001.fastq.gz     C1       R1_001.fastq.gz   C1_R1_001.fastq.gz
C1_S3_L001_R2_001.fastq.gz     C1       R2_001.fastq.gz   C1_R2_001.fastq.gz

Here are all of the E rows:

$ cat 96_well_sample.txt | csvtk add-header -n fastq | csvtk sep -n 'sample,sequence,L001,suffix' --merge -f 1 -s '_' | csvtk cut -f fastq,sample,suffix | csvtk mutate2 -n 'output' -e '${sample} + "_" + ${suffix}' | csvtk pretty | grep -E '^E'
E_S1_L001_R1_001.fastq.gz      E        R1_001.fastq.gz   E_R1_001.fastq.gz
E0_S1_L001_R2_001.fastq.gz     E0       R2_001.fastq.gz   0_R2_001.fastq.gz
E1_S5_L001_R1_001.fastq.gz     E1       R1_001.fastq.gz   0_R1_001.fastq.gz
E1_S5_L001_R2_001.fastq.gz     E1       R2_001.fastq.gz   0_R2_001.fastq.gz
E2_S13_L001_R1_001.fastq.gz    E2       R1_001.fastq.gz   0_R1_001.fastq.gz
E2_S13_L001_R2_001.fastq.gz    E2       R2_001.fastq.gz   0_R2_001.fastq.gz
E3_S21_L001_R1_001.fastq.gz    E3       R1_001.fastq.gz   0_R1_001.fastq.gz
E3_S21_L001_R2_001.fastq.gz    E3       R2_001.fastq.gz   0_R2_001.fastq.gz
E4_S29_L001_R1_001.fastq.gz    E4       R1_001.fastq.gz   0_R1_001.fastq.gz
E4_S29_L001_R2_001.fastq.gz    E4       R2_001.fastq.gz   0_R2_001.fastq.gz
E5_S37_L001_R1_001.fastq.gz    E5       R1_001.fastq.gz   0_R1_001.fastq.gz
E5_S37_L001_R2_001.fastq.gz    E5       R2_001.fastq.gz   0_R2_001.fastq.gz
E6_S45_L001_R1_001.fastq.gz    E6       R1_001.fastq.gz   0_R1_001.fastq.gz
E6_S45_L001_R2_001.fastq.gz    E6       R2_001.fastq.gz   0_R2_001.fastq.gz
E7_S53_L001_R1_001.fastq.gz    E7       R1_001.fastq.gz   0_R1_001.fastq.gz
E7_S53_L001_R2_001.fastq.gz    E7       R2_001.fastq.gz   0_R2_001.fastq.gz
E8_S61_L001_R1_001.fastq.gz    E8       R1_001.fastq.gz   0_R1_001.fastq.gz
E8_S61_L001_R2_001.fastq.gz    E8       R2_001.fastq.gz   0_R2_001.fastq.gz
E9_S69_L001_R1_001.fastq.gz    E9       R1_001.fastq.gz   0_R1_001.fastq.gz
E9_S69_L001_R2_001.fastq.gz    E9       R2_001.fastq.gz   0_R2_001.fastq.gz
E10_S77_L001_R1_001.fastq.gz   E10      R1_001.fastq.gz   0_R1_001.fastq.gz
E10_S77_L001_R2_001.fastq.gz   E10      R2_001.fastq.gz   0_R2_001.fastq.gz
E11_S85_L001_R1_001.fastq.gz   E11      R1_001.fastq.gz   0_R1_001.fastq.gz
E11_S85_L001_R2_001.fastq.gz   E11      R2_001.fastq.gz   0_R2_001.fastq.gz
E12_S93_L001_R1_001.fastq.gz   E12      R1_001.fastq.gz   0_R1_001.fastq.gz
E12_S93_L001_R2_001.fastq.gz   E12      R2_001.fastq.gz   0_R2_001.fastq.gz

I've used this software quite a bit, but this is the first large bug I've found. I can use awk in the meantime to join the outputs.
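
The awk stopgap mentioned above could look roughly like this. It is a sketch, not the command the reporter actually ran: the function name join_sample_suffix is hypothetical, and it assumes every file name has exactly five underscore-separated parts, as in the rows shown in this report.

```shell
# Sketch: build "<sample>_<suffix>" with awk instead of mutate2.
# join_sample_suffix is a hypothetical helper name, not part of csvtk.
# Assumes names like E1_S5_L001_R1_001.fastq.gz (five "_"-separated parts).
join_sample_suffix() {
  awk -F'_' 'BEGIN { OFS = "\t"; print "fastq", "sample", "output" }
             { print $0, $1, $1 "_" $4 "_" $5 }'
}

# Two illustrative rows standing in for 96_well_sample.txt
printf '%s\n' 'E1_S5_L001_R1_001.fastq.gz' 'A1_S1_L001_R2_001.fastq.gz' |
  join_sample_suffix
```

The concatenation operands here are always strings, so a sample such as E1 is copied through unchanged.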

Thank you for this software.


shenwei356 · 2023-02-04T15:53:09Z

It's a bug: E1 was wrongly treated as a number in scientific notation. With the old version, you can also switch on -s, --numeric-as-string to avoid this. Anyway, I've fixed this.
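
For anyone still on v0.25.0, the flag could be added to the mutate2 call from the report roughly as follows. This is a sketch under assumptions: the helper name concat_as_string and the two-column sample input are made up for illustration, and the guard simply skips the call when csvtk is not installed.

```shell
# Hypothetical wrapper around the reporter's mutate2 call, with -s
# (--numeric-as-string) added so number-like fields such as E1 are kept
# as strings inside the expression.
concat_as_string() {
  csvtk mutate2 -s -n output -e '${sample} + "_" + ${suffix}'
}

# Illustrative input only; skipped when csvtk is not on PATH.
if command -v csvtk >/dev/null 2>&1; then
  printf 'sample,suffix\nE1,R1_001.fastq.gz\n' | concat_as_string
fi
```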

Please use the new binaries here

davised · 2023-02-08T07:20:01Z

Ah scientific notation, of course. Thanks! This tool is so cool and makes pipelining very intuitive.

Thanks for the very prompt bug fixes!

shenwei356 closed this as completed in 2bd88bb Feb 4, 2023

shenwei356 added the bug label Feb 4, 2023

shenwei356 mentioned this issue Jun 29, 2023

Update CSVTK to v0.26.0 bioconda/bioconda-recipes#41753

Merged

chenrui333 mentioned this issue Jun 29, 2023

csvtk 0.26.0 Homebrew/homebrew-core#135348

Merged
