
The Parameters of the Configuration file

eiglesias34 edited this page Nov 1, 2022 · 12 revisions

Section [default]:

  • main_directory: The directory where the data sources and mapping files are located.
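The snippet below is a minimal sketch of how main_directory can be read and reused. It is not SDM-RDFizer's own code. It assumes that the configuration file uses standard INI syntax readable by Python's configparser, and that other options refer to this value as ${default:main_directory}, which is the convention SDM-RDFizer's example configurations appear to follow (treated here as an assumption). The directory and file names are hypothetical placeholders.

```python
import configparser

# Hypothetical configuration; "./example" and "mapping.ttl" are placeholders.
EXAMPLE_CONFIG = """
[default]
main_directory: ./example

[dataset1]
mapping: ${default:main_directory}/mapping.ttl
"""


def resolve(text: str, section: str, option: str) -> str:
    """Return an option value with any ${default:main_directory} reference expanded."""
    # ExtendedInterpolation understands the ${section:option} reference syntax.
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    parser.read_string(text)
    return parser[section][option]


print(resolve(EXAMPLE_CONFIG, "default", "main_directory"))  # ./example
print(resolve(EXAMPLE_CONFIG, "dataset1", "mapping"))  # ./example/mapping.ttl
```

Keeping the base directory in one place means that moving the data sources and mapping files only requires changing main_directory.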

Section [datasets]:

  • number_of_datasets: How many datasets will be converted to a knowledge graph.
  • output_folder: The folder where the output file(s) will be generated.
  • all_in_one_file: By default, when multiple data sources are converted, each dataset gets its own output file. Setting this option to "yes" writes all datasets to a single output file instead.
  • remove_duplicate: When set to "yes", duplicate triples are removed from the knowledge graph.
  • name: Name of the stats file. If all_in_one_file is "yes", this is also the name of the single output file.
  • dbType: Indicates what type of database will be used. This can only be "mysql" or "postgres".
  • ordered: When set to "yes", the SDM-RDFizer orders the triples maps so that memory consumption is minimized. When set to "no", the triples maps are not reordered.
  • enrichment: This parameter applies only to CSV files. When set to "yes", the SDM-RDFizer uses the pandas library to remove duplicate rows from the data source. Because pandas does not work properly with files larger than 1.2 GB, this parameter must be set to "no" for such files. The SDM-RDFizer then processes them with a different library, which, unfortunately, does not remove duplicate rows from the data source.
  • output_format: The format of the output knowledge graph, either "n-triples" or "turtle". (Currently, the turtle format can only be used for CSV files.)
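As an illustration of the value ranges listed above, here is a sketch that parses and sanity-checks a [datasets] section with Python's configparser. It is not SDM-RDFizer's own loader: the example values, the ${default:main_directory} reference, and the fallback defaults (False for the yes/no flags, "n-triples" for output_format) are assumptions made for this sketch, not documented SDM-RDFizer defaults.

```python
import configparser

# Hypothetical configuration used only for illustration; values are placeholders.
EXAMPLE_CONFIG = """
[default]
main_directory: ./example

[datasets]
number_of_datasets: 2
output_folder: ${default:main_directory}/output
all_in_one_file: yes
remove_duplicate: yes
name: merged_output
dbType: mysql
ordered: yes
enrichment: yes
output_format: n-triples
"""

ALLOWED_DB_TYPES = {"mysql", "postgres"}
ALLOWED_FORMATS = {"n-triples", "turtle"}


def load_datasets_section(text: str) -> dict:
    """Parse the [datasets] section and check the value ranges the wiki lists."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    parser.read_string(text)
    ds = parser["datasets"]
    settings = {
        "number_of_datasets": ds.getint("number_of_datasets"),
        "output_folder": ds["output_folder"],  # ${default:main_directory} expanded
        # getboolean maps "yes"/"no" (also "true"/"false", "1"/"0") to True/False.
        # The fallback=False defaults are this sketch's assumption.
        "all_in_one_file": ds.getboolean("all_in_one_file", fallback=False),
        "remove_duplicate": ds.getboolean("remove_duplicate", fallback=False),
        "name": ds["name"],
        "dbType": ds.get("dbType"),  # None when no database is configured
        "ordered": ds.getboolean("ordered", fallback=False),
        "enrichment": ds.getboolean("enrichment", fallback=False),
        "output_format": ds.get("output_format", "n-triples"),
    }
    if settings["dbType"] is not None and settings["dbType"] not in ALLOWED_DB_TYPES:
        raise ValueError(f"dbType must be one of {sorted(ALLOWED_DB_TYPES)}")
    if settings["output_format"] not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    return settings


print(load_datasets_section(EXAMPLE_CONFIG)["output_folder"])  # ./example/output
```

Validating dbType and output_format up front surfaces a typo such as "postgresql" before any conversion starts. The CSV-only restriction on turtle is not checked here, because it depends on the data sources rather than on this section.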

Section [dataset1] or [dataset2] or ...:

  • name: Name of the output file.
  • mapping: Location of the mapping file.
  • user: User for the database. (This option is only necessary if a MySQL or Postgres database is being used)
  • password: Password for the database. (This option is only necessary if a MySQL or Postgres database is being used)
  • host: Host for the database. (This option is only necessary if a MySQL or Postgres database is being used)
  • port: Port for the database. (This option is only necessary if a MySQL database is being used)
  • db: Name of the database. (This option is only necessary if a MySQL or Postgres database is being used)
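The sketch below shows one way to collect the per-dataset sections, assuming they are numbered [dataset1] through [datasetN] with N equal to number_of_datasets, and that the configparser-readable INI syntax used above applies. It requires user, password, host and db when dbType is "mysql" or "postgres", and additionally port for MySQL, following the notes in the list above. It is a hypothetical helper, not SDM-RDFizer's own code; the credentials, names and paths are placeholders.

```python
import configparser

# Hypothetical two-dataset configuration; credentials and names are placeholders.
EXAMPLE_CONFIG = """
[default]
main_directory: ./example

[datasets]
number_of_datasets: 2
dbType: mysql

[dataset1]
name: people
mapping: ${default:main_directory}/people_mapping.ttl
user: rdfizer
password: secret
host: localhost
port: 3306
db: people_db

[dataset2]
name: places
mapping: ${default:main_directory}/places_mapping.ttl
user: rdfizer
password: secret
host: localhost
port: 3306
db: places_db
"""

# Connection options the wiki lists as needed for MySQL or Postgres sources.
DB_OPTIONS = ("user", "password", "host", "db")


def load_dataset_sections(text: str) -> list:
    """Collect [dataset1] ... [datasetN], where N is number_of_datasets."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    parser.read_string(text)
    count = parser["datasets"].getint("number_of_datasets")
    db_type = parser["datasets"].get("dbType")
    datasets = []
    for i in range(1, count + 1):
        section = f"dataset{i}"
        if not parser.has_section(section):
            raise ValueError(f"missing section [{section}]")
        sec = parser[section]
        entry = {"name": sec["name"], "mapping": sec["mapping"]}
        if db_type in ("mysql", "postgres"):
            # The wiki lists port as necessary only for MySQL.
            required = DB_OPTIONS + (("port",) if db_type == "mysql" else ())
            missing = [opt for opt in required if opt not in sec]
            if missing:
                raise ValueError(f"[{section}] is missing {missing}")
            for opt in required:
                entry[opt] = sec[opt]
        datasets.append(entry)
    return datasets


for ds in load_dataset_sections(EXAMPLE_CONFIG):
    print(ds["name"], ds["mapping"])
```

Checking that every numbered section exists catches a number_of_datasets value that is larger than the number of [datasetN] sections actually written.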