Issue with loadVocabFromCsv #198

Closed
katy-sadowski opened this issue Jul 14, 2024 · 3 comments

@katy-sadowski
Contributor

I got the following error when trying to load the vocabulary tables from csv. It seems it's putting quotes around the whole list of column names in the insert, so it thinks that the list is the name of a single column. I checked and my table looks normal, with the 4 columns, as does the csv I'm inserting.

> cd <- DatabaseConnector::createConnectionDetails(
+   dbms     = "postgresql", 
+   server   = "localhost/ohdsi", 
+   user     = "postgres", 
+   password = "postgres", 
+   port     = 5432)
> 
> cdmSchema      <- "dbt_synthea_1k"
> cdmVersion     <- "5.4"
> syntheaVersion <- "3.0.0"
> syntheaSchema  <- "synthea_1k"
> syntheaFileLoc <- "~/Synthea/output/csv"
> vocabFileLoc   <- "~/Synthea/vocab_shard_1k"
> ETLSyntheaBuilder::LoadVocabFromCsv(connectionDetails = cd, cdmSchema = cdmSchema, vocabFileLoc = vocabFileLoc)
Connecting using PostgreSQL driver
Working on file ~/Synthea/vocab_shard_1k/concept_ancestor.csv
 - reading file 
 - type converting
 - uploading 263067 rows of data in 1 chunks.
  |==========================================================================================| 100%
Executing SQL took 0.605 secs
 - chunk uploading started on 2024-07-13 19:30:27 for rows 1 to 263067
  |                                                                                          |   0%
Error in rJava::.jcall(batchedInsert, "Z", "executeBatch") : 
  java.sql.BatchUpdateException: Batch entry 0 INSERT INTO dbt_synthea_1k.concept_ancestor ("ancestor_concept_id,descendant_concept_id,min_levels_of_separation,max_levels_of_separation") VALUES(5.82111112743323E14) was aborted: ERROR: column "ancestor_concept_id,descendant_concept_id,min_levels_of_separat" of relation "concept_ancestor" does not exist
  Position: 46  Call getNextException to see other errors in the batch.
@burrowse
Collaborator

@katy-sadowski Would you be able to send me the file you are loading? The only thing I can think of is that the function name is misleading: despite the "Csv" in the name, it expects a tab delimiter rather than an actual comma when it reads the file:

vocabTable <-
  data.table::fread(
    file = paste0(vocabFileLoc, "/", csv),
    stringsAsFactors = FALSE,
    header = TRUE,
    sep = "\t",
    na.strings = ""
  )
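
To illustrate why a tab separator would produce the error above, here is a minimal sketch that reads a small comma-separated file with the same `fread` settings. The file contents and the variable names `tmp`, `as_tab`, and `as_comma` are made up for this example; only the `fread` arguments come from the snippet above.

```r
library(data.table)

# Write a small comma-separated file (the data rows are invented for illustration).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "ancestor_concept_id,descendant_concept_id,min_levels_of_separation,max_levels_of_separation",
  "1,2,0,0",
  "1,3,1,1"
), tmp)

# Same reader settings as the snippet above: tab is forced as the separator.
# The file contains no tabs, so each whole line is read as a single field.
as_tab <- fread(
  file = tmp,
  stringsAsFactors = FALSE,
  header = TRUE,
  sep = "\t",
  na.strings = ""
)
print(names(as_tab))  # the entire header line, as one column name
print(ncol(as_tab))

# Reading the same file with a comma separator splits it into the expected columns.
as_comma <- fread(file = tmp, header = TRUE, sep = ",", na.strings = "")
print(names(as_comma))
print(ncol(as_comma))
```

With the tab separator the whole header line becomes a single column name, which would match the quoted column list in the failing INSERT statement.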

@katy-sadowski
Contributor Author

Ah, yep, that is it! Maybe a param could be added to specify the separator? (No rush - this is not blocking me 😄 )

@burrowse
Collaborator

@katy-sadowski I was very slow to add this, but I've just pushed an update to the develop branch so the function accepts a delimiter! It'll be officially included in the next release :)
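
For readers wondering what a delimiter option might look like, here is a rough sketch. The helper `read_vocab_file` and its `delimiter` argument are hypothetical names chosen for this example and are not taken from the package; the actual change on the develop branch may differ in name and behavior. The sketch keeps tab as the default so existing tab-delimited vocabulary files would still load unchanged.

```r
library(data.table)

# Hypothetical helper, not the package's actual code: read one vocabulary
# file with a caller-supplied delimiter, defaulting to tab as before.
read_vocab_file <- function(vocabFileLoc, csv, delimiter = "\t") {
  data.table::fread(
    file = file.path(vocabFileLoc, csv),
    stringsAsFactors = FALSE,
    header = TRUE,
    sep = delimiter,
    na.strings = ""
  )
}

# Usage with a made-up, comma-separated file in a temporary directory.
demo_dir <- tempfile("vocab_demo")
dir.create(demo_dir)
writeLines(
  c("concept_id,concept_name", "1,Example"),
  file.path(demo_dir, "concept.csv")
)
concept <- read_vocab_file(demo_dir, "concept.csv", delimiter = ",")
print(names(concept))
```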

burrowse added a commit that referenced this issue Oct 4, 2024
* fixes for duckdb support
* synthea v3.3.0 support (addition of icd10 codes to condition_occurrence logic)
* update of LoadFromVocabCSV.R function to accept a delimiter #198 

Co-authored-by: Frank DeFalco <fdefalco@ohdsi.org>