Gradoop Data Importers

cmoesler edited this page Jan 21, 2019 · 2 revisions

Data Importers can be used to create simple graphs from common data formats.
Unlike Data Sources, Data Importers do not require the input to be in a Gradoop-specific format. They can still be used like a data source, although there is no matching data sink.


MinimalJSONImporter

The minimal JSON importer turns text files in which every line is a JSON object into a graph. A vertex is created for every JSON object, and the properties of the object are added as properties of the new vertex. Every property is parsed as a string. All vertices receive the same label.
For example, the JSON object

{"Name": "Max", "Age": 28, "Address": {"Street": "Main Street", "City": "SomeCity", "ZIPCode": 12345}}

will be turned into a vertex with label JsonRowVertex and properties Name, Age and Address set to "Max", "28" and "{\"Street\":\"Main Street\",\"City\":\"SomeCity\",\"ZIPCode\":12345}", respectively.
Array-type properties are supported, but every element of an array is assumed to be a string.
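
The mapping described above can be illustrated with a small plain-Java sketch. It is not the Gradoop implementation: the class JsonLineSketch and the method toProperties are hypothetical names, the hand-written splitter only handles a single well-formed JSON object per line, and escape sequences inside strings are not decoded. Nested objects and arrays are kept as their raw text here, so whitespace is preserved rather than compacted, and arrays are not turned into lists of strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: shows how one JSON line could become string properties. */
final class JsonLineSketch {

  /** Splits the top-level members of one JSON object into key/value strings. */
  static Map<String, String> toProperties(String jsonLine) {
    String s = jsonLine.trim();
    if (s.length() < 2 || s.charAt(0) != '{' || s.charAt(s.length() - 1) != '}') {
      throw new IllegalArgumentException("Expected a single JSON object per line");
    }
    String body = s.substring(1, s.length() - 1);
    Map<String, String> props = new LinkedHashMap<>();
    int depth = 0;            // nesting level of {...} and [...]
    boolean inString = false; // currently inside a quoted string
    boolean escaped = false;  // previous character was a backslash
    int memberStart = 0;      // start index of the current "key": value pair
    int colon = -1;           // index of the colon separating key and value
    for (int i = 0; i <= body.length(); i++) {
      boolean end = i == body.length();
      char c = end ? ',' : body.charAt(i); // a virtual comma flushes the last member
      if (!end && inString) {
        if (escaped) {
          escaped = false;
        } else if (c == '\\') {
          escaped = true;
        } else if (c == '"') {
          inString = false;
        }
        continue;
      }
      if (!end && c == '"') {
        inString = true;
      } else if (!end && (c == '{' || c == '[')) {
        depth++;
      } else if (!end && (c == '}' || c == ']')) {
        depth--;
      } else if (c == ':' && depth == 0 && colon < 0) {
        colon = i;
      } else if (c == ',' && depth == 0) {
        if (colon > memberStart) {
          String key = unquote(body.substring(memberStart, colon).trim());
          String value = unquote(body.substring(colon + 1, i).trim());
          props.put(key, value); // every value is stored as a string
        }
        memberStart = i + 1;
        colon = -1;
      }
    }
    return props;
  }

  /** Removes surrounding double quotes, if present (escapes are not decoded). */
  private static String unquote(String token) {
    if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
      return token.substring(1, token.length() - 1);
    }
    return token;
  }

  public static void main(String[] args) {
    String line = "{\"Name\": \"Max\", \"Age\": 28, \"Address\": {\"City\": \"SomeCity\"}}";
    // Each entry corresponds to one property of the resulting vertex.
    toProperties(line).forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

In this sketch the number 28 ends up as the string "28", which mirrors the statement that every property is parsed as a string.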

MinimalJSONImporter example

Paths to files can point to local (file://) or distributed (hdfs://) files.

DataSource importer = new MinimalJSONImporter("/path/to/jsonfile");

MinimalCSVImporter

The MinimalCSVImporter can be used to create an EPGM instance from a CSV file of vertices that are not already in Gradoop format. Each line is imported as a vertex, and each column value is set as a property of this vertex. The property keys are either read from the first line of the file or passed to the constructor as a list of property names. The checkReoccurringHeader parameter specifies whether each line of the file should be checked for a reoccurrence of the header (the column property names). A line that repeats the header is skipped.
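
As a rough illustration of this behavior, the following plain-Java sketch maps CSV lines to one property map per vertex. It is not Gradoop code: the class CsvVertexSketch and the method toVertices are hypothetical names, and the sketch assumes that the delimiter is taken literally, that no quoting is involved, and that the first line is consumed as the header when no column names are given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Illustration only: maps CSV lines to per-vertex property maps. */
final class CsvVertexSketch {

  /**
   * @param lines                  raw lines of the CSV file
   * @param delimiter              token delimiter, matched literally
   * @param columnNames            property keys, or null to read them from the first line
   * @param checkReoccurringHeader whether lines that repeat the header are skipped
   * @return one property map per imported line
   */
  static List<Map<String, String>> toVertices(List<String> lines, String delimiter,
      List<String> columnNames, boolean checkReoccurringHeader) {
    List<Map<String, String>> vertices = new ArrayList<>();
    if (lines.isEmpty()) {
      return vertices;
    }
    String literal = Pattern.quote(delimiter); // avoid regex meaning of e.g. "|"
    List<String> keys = columnNames;
    int start = 0;
    if (keys == null) {
      // No names supplied: use the first line as header and skip it as data.
      keys = Arrays.asList(lines.get(0).split(literal, -1));
      start = 1;
    }
    String headerLine = String.join(delimiter, keys);
    for (String line : lines.subList(start, lines.size())) {
      if (checkReoccurringHeader && line.equals(headerLine)) {
        continue; // the header reoccurs, so this line is skipped
      }
      String[] tokens = line.split(literal, -1);
      Map<String, String> properties = new LinkedHashMap<>();
      for (int i = 0; i < keys.size() && i < tokens.length; i++) {
        properties.put(keys.get(i), tokens[i]); // column name -> column value
      }
      vertices.add(properties);
    }
    return vertices;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("name;age", "Max;28", "name;age", "Eve;31");
    // prints [{name=Max, age=28}, {name=Eve, age=31}]
    System.out.println(toVertices(lines, ";", null, true));
  }
}
```

With the flag set to false, the repeated header line in this example would instead be imported as a vertex whose property values equal its property keys.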

MinimalCSVImporter examples

Paths to files can point to local (file://) or distributed (hdfs://) files. The delimiter argument specifies which token delimiter is used. To check each line of the file for a reoccurrence of the header, set checkReoccurringHeaderFlag to true.

With header line
DataSource importVertexImporter = new MinimalCSVImporter("/path/to/csvfile", delimiter, gradoopFlinkConfig, checkReoccurringHeaderFlag);
Without header line
DataSource importer = new MinimalCSVImporter("/path/to/csvfile", delimiter, gradoopFlinkConfig, listOfColumnNames, checkReoccurringHeaderFlag);

If the file does not contain a header line, the constructor needs a list of the column names.

With charset
DataSource importVertexImporter = new MinimalCSVImporter("/path/to/csvfile", delimiter, gradoopFlinkConfig, charset, checkReoccurringHeaderFlag);

The default charset is UTF-8.