Hello!
First of all, I would like to thank you for the gorgeous tool; it is very helpful and saves a lot of time!
There is a small bug in csvtk split (v0.18.2): it repeats the header row inside the output files when the column used for splitting has too many unique values.
Here is a reproducible example:
# Create a temporary directory and work inside it
cd "$(mktemp -d)"
# Print one random alphanumeric string of length $1 (default 32)
random-string()
{
    tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w "${1:-32}" | head -n 1
}
# Create dummy file with 15,000 rows (a little bit slow)
echo "Col1,Col2,Col3" > dummy.csv
for i in {1..5000}; do
    echo "a,$(random-string 2),0" >> dummy.csv
    echo "b,$(random-string 2),0" >> dummy.csv
    echo "c,$(random-string 2),0" >> dummy.csv
done
# Preview
csvtk pretty dummy.csv | head
# Col1 Col2 Col3
# a 1Q 0
# b qA 0
# c UC 0
# a ld 0
# b bs 0
# c IK 0
# Split by column 2 (many categories: up to 62^2 = 3844 distinct two-character values)
csvtk split dummy.csv -f Col2
# Preview one of the resulting files
csvtk pretty dummy-00.csv
# Col1 Col2 Col3
# b 00 0
# Col1 Col2 Col3
# b 00 0
# Col1 Col2 Col3
# c 00 0
As you can see, the header row appears several times within a single output file.
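To check how many of the output files are affected, something like the following could be used. It is only a rough sketch: `count_header_lines` and `report_repeated_headers` are hypothetical helpers (not part of csvtk), and the `dummy-*.csv` pattern assumes the output naming seen in the example above. It also assumes that no data row is identical to the header.

```shell
# Hypothetical helper (not part of csvtk): count the lines of a CSV file
# that are identical to its first line. A correctly split file gives 1.
count_header_lines() {
    awk 'NR == 1 { header = $0 } $0 == header { n++ } END { print n + 0 }' "$1"
}

# List every split output file in the current directory whose header
# row occurs more than once. The dummy-*.csv pattern is an assumption
# based on the file names produced in the example above.
report_repeated_headers() {
    for f in dummy-*.csv; do
        [ -e "$f" ] || continue
        n=$(count_header_lines "$f")
        if [ "$n" -gt 1 ]; then
            echo "$f: header appears $n times"
        fi
    done
}
```

Running `report_repeated_headers` in the temporary directory after `csvtk split` should print one line per affected file and nothing if all files are fine.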
OS - Ubuntu
csvtk - v0.18.2 (installed via conda)
With best regards,
Vladimir