csvtk split bug - duplicated headers #83

Closed
vmikk opened this issue Aug 14, 2019 · 3 comments

vmikk commented Aug 14, 2019

Hello!
First of all, I would like to thank you for the gorgeous tool, it is very helpful and saves a lot of time!

There is a small bug in csvtk split (v0.18.2): it repeats the header row inside the output files when the column used for splitting has too many unique values.

Here is a reproducible example:

# Create a temporary dir and work inside it
cd "$(mktemp -d)"

# Print one random alphanumeric string of length $1 (default 32)
random_string()
{
  LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w "${1:-32}" | head -n 1
}

# Create dummy file with 15000 rows (a little bit slow)
echo "Col1,Col2,Col3" > dummy.csv
for i in {1..5000}; do
  echo "a,$(random_string 2),0"
  echo "b,$(random_string 2),0"
  echo "c,$(random_string 2),0"
done >> dummy.csv

# Preview
csvtk pretty dummy.csv | head
# Col1   Col2   Col3
# a      1Q     0
# b      qA     0
# c      UC     0
# a      ld     0
# b      bs     0
# c      IK     0

# Split by column 2 (there are a lot of categories)
csvtk split dummy.csv -f Col2

# Preview one of the resulting files
csvtk pretty dummy-00.csv
# Col1   Col2   Col3
# b      00     0
# Col1   Col2   Col3
# b      00     0
# Col1   Col2   Col3
# c      00     0

As you can see, the header row appears multiple times in the output file.
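To check every output file rather than previewing one by hand, a small helper can count how often the header line recurs. This is only a sketch: `count_header_lines` is a hypothetical name, and it assumes the header fits on one line and that no data row is identical to it.

```shell
# Hypothetical helper: count the lines of a CSV file that are
# identical to its first (header) line. A correctly split file gives 1.
count_header_lines() {
  file=$1
  header=$(head -n 1 "$file")
  # -c count matches, -x whole-line match, -F fixed string (no regex)
  grep -c -x -F -- "$header" "$file"
}

# Report every split output file that contains more than one header line
for f in dummy-*.csv; do
  [ -e "$f" ] || continue          # skip if the glob matched nothing
  n=$(count_header_lines "$f")
  if [ "$n" -gt 1 ]; then
    echo "$f: $n header lines"
  fi
done
```

For the `dummy-00.csv` shown above, the helper would report 3 header lines.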

OS - Ubuntu
csvtk - v0.18.2 (installed via conda)

With best regards,
Vladimir

shenwei356 (Owner) commented

Sorry for this bug, I will fix it soon (tonight).

shenwei356 (Owner) commented

Fixed.

It happened when the number of output groups exceeded the value of -g/--buf-groups.
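For anyone stuck on the affected version, the explanation above suggests a possible workaround: raise -g/--buf-groups to at least the number of distinct values in the split column. The sketch below is an untested assumption based only on that explanation, not a confirmed fix. `count_groups` is a hypothetical helper, and it assumes a simple CSV with no quoted fields containing commas or newlines.

```shell
# Hypothetical helper: count distinct values in column $2 (1-based)
# of the simple comma-separated file $1, skipping the header row.
count_groups() {
  file=$1
  col=$2
  tail -n +2 "$file" | cut -d, -f"$col" | LC_ALL=C sort -u | wc -l | tr -d ' '
}

# Assumed workaround: buffer as many groups as there are distinct values.
# Runs only if csvtk is installed and dummy.csv exists.
if command -v csvtk >/dev/null 2>&1 && [ -f dummy.csv ]; then
  csvtk split dummy.csv -f Col2 -g "$(count_groups dummy.csv 2)"
fi
```

Buffering every group holds more data in memory before writing, so upgrading to a fixed release is the better option for large inputs.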

Try the pre-release:

vmikk commented Aug 14, 2019

Thanks a lot for the ultrafast bugfix!
It is working properly now.
