improved handling of CSV comments #405

Merged
merged 1 commit into from
Mar 1, 2022

Conversation

missinglink
Member

@missinglink missinglink commented Mar 1, 2022

As mentioned in #404 (comment), there seems to be a weird bug in how the geonames metadata files encode comments (I think!).

Using this custom comment-handling stream we're able to work around the issue, although I'm still not clear why the comment option from https://csv.js.org/parse/options/ (and sed '/^#/d') doesn't do the same thing 🤷

I've also taken the opportunity to do some simple housekeeping tasks:

  • upgrade the csv-parse module
  • enable the bom option for csv-parse, as we have done in other modules
  • remove the mkdirp module introduced in "Used mkdirp in download_metadata script" #185; since then we've consolidated on Docker and Windows has made progress in its terminal utilities, so I hope it's no longer required
  • change the engines definition in package.json from >=l2.0.0 (a lowercase letter L instead of the digit 1) to >=12.0.0; @orangejulius, is this just a typo?

The 'actual work' here is:

  • setting comment: '' in the csv options to disable stripping comments within that lib
  • adding split2 and the new through streams to handle this ourselves.
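
For readers who want a feel for the approach, here is a minimal sketch of a line-based comment filter. It is not the code from this PR: it swaps split2 and the through streams for Node's built-in Transform and StringDecoder so that it runs without extra dependencies, and the names isCommentLine and createCommentStripper are hypothetical. It assumes a comment is any line whose first character is '#'.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: treat any line starting with '#' as a comment.
function isCommentLine(line) {
  return line.startsWith('#');
}

// Hypothetical factory: returns a Transform stream that re-emits its input
// line by line and drops comment lines. This stands in for the
// split2 + through combination described above; it is an assumption,
// not the PR's actual implementation.
function createCommentStripper() {
  const decoder = new StringDecoder('utf8'); // safe across multi-byte chunk boundaries
  let buffered = '';

  return new Transform({
    transform(chunk, _encoding, callback) {
      buffered += decoder.write(chunk);
      const lines = buffered.split(/\r?\n/);
      buffered = lines.pop(); // the last element may be an incomplete line
      for (const line of lines) {
        if (!isCommentLine(line)) {
          this.push(line + '\n');
        }
      }
      callback();
    },
    flush(callback) {
      // Emit whatever is left once the input has ended.
      buffered += decoder.end();
      if (buffered.length > 0 && !isCommentLine(buffered)) {
        this.push(buffered + '\n');
      }
      callback();
    }
  });
}
```

A stream like this would sit between the download and the CSV parser, with comment: '' set on the parser so that it does not attempt its own comment stripping.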

resolves #404

@missinglink missinglink requested a review from orangejulius March 1, 2022 14:39
@@ -33,22 +33,22 @@
"url": "https://github.com/pelias/geonames/issues"
},
"engines": {
"node": ">=l2.0.0"
Member

wow yeah, this is definitely a typo. It doesn't really affect much fortunately, but good catch!

@@ -6,7 +6,7 @@
"homepage": "https://pelias.io",
"license": "MIT",
"scripts": {
- "download_metadata": "mkdirp metadata && node bin/updateMetadata.js",
+ "download_metadata": "mkdir -p metadata && node bin/updateMetadata.js",
Member

this is better 👍

Member

@orangejulius orangejulius left a comment

Looks good. Downloading the metadata works for me now.

@missinglink
Member Author

missinglink commented Mar 1, 2022

Agh, whoops: the xsv failure was my fault, since I wasn't explicitly telling it the file was TSV rather than CSV:

curl -s http://download.geonames.org/export/dump/countryInfo.txt | sed '/^#/d' | xsv cat -d '\t' rows

I suspect there's just a weird bug in csv-parse

@missinglink missinglink merged commit 299ed33 into master Mar 1, 2022
@missinglink missinglink deleted the handle-csv-comments branch March 1, 2022 14:46
@missinglink
Member Author

Opened an issue upstream: adaltas/node-csv#325.
Hopefully we can remove these commits if a solution can be found natively within that lib.

Development

Successfully merging this pull request may close these issues.

download_metadata fails to download
2 participants