Please add support for a project file #82

jbutcher21 · 2021-09-13T20:33:30Z

Customers often have multiple files to load. G2Loader lets the user place the full list of files to be loaded in a single "project" file that looks like this ...

cat demo/truth/project.json
{
  "DATA_SOURCES": [
    {"DATA_SOURCE": "CUSTOMERS", "FILE_FORMAT": "CSV", "FILE_NAME": "truthset-person-v1-set1-data.csv"},
    {"DATA_SOURCE": "WATCHLIST", "FILE_FORMAT": "CSV", "FILE_NAME": "truthset-person-v1-set2-data.csv"}
  ]
}

The stream-producer should then pick up these files in order and load them onto the queue.

Ideally, wildcards should be allowed as well, like so ...
{"DATA_SOURCE": "SAYARI", "FILE_FORMAT": "JSON", "FILE_NAME": "/sayari/mapped/*.json"},
Note: Sayari has hundreds of files to be loaded.
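
To make the request concrete, here is a minimal sketch of how a project file with wildcards could be expanded into an ordered list of files. It is not taken from G2Loader or stream-producer; the function name `expand_project` is hypothetical, and resolving relative names against the project file's folder and sorting wildcard matches are assumptions.

```python
import glob
import json
import os


def expand_project(project_path):
    """Return (data_source, file_format, file_path) tuples in project order."""
    with open(project_path, encoding="utf-8") as handle:
        project = json.load(handle)

    # Assumption: relative FILE_NAME values are resolved against the
    # folder that holds the project file.
    base_dir = os.path.dirname(os.path.abspath(project_path))
    expanded = []
    for entry in project["DATA_SOURCES"]:
        pattern = entry["FILE_NAME"]
        if not os.path.isabs(pattern):
            pattern = os.path.join(base_dir, pattern)
        # glob returns matches in arbitrary order, so sort for repeatable loads.
        matches = sorted(glob.glob(pattern))
        # Keep a name that matched nothing so later validation can report it.
        for path in matches or [pattern]:
            expanded.append((entry["DATA_SOURCE"], entry["FILE_FORMAT"], path))
    return expanded
```

Entries keep the order they have in the project file, and a single wildcard entry turns into one tuple per matching file.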

There should be some validation of these files, such as:

1. Can it be opened and read?
2. Does it contain recognizable JSON or CSV?
3. Does the data source exist in the configuration? (future)

G2Loader does this currently: it validates the first 100 records of every file before it loads any, so that you don't process the first two files only to find out that the third one doesn't even exist or can't be opened.
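
As a rough illustration of that up-front check, the sketch below samples the first 100 records of every file before anything is loaded. It is not G2Loader's implementation: `validate_file` and `preflight` are hypothetical names, JSON files are assumed to hold one record per line, and the data source check against the configuration is left out.

```python
import csv
import itertools
import json


def validate_file(path, file_format, sample_size=100):
    """Return a list of problems found in the first sample_size records."""
    problems = []
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            if file_format.upper() == "CSV":
                reader = csv.DictReader(handle)
                if not reader.fieldnames:
                    problems.append(f"{path}: no CSV header row found")
                # Reading the sample surfaces decoding and CSV parsing errors.
                for _ in itertools.islice(reader, sample_size):
                    pass
            elif file_format.upper() == "JSON":
                # Assumption: one JSON record per non-blank line.
                lines = (line for line in handle if line.strip())
                sample = itertools.islice(lines, sample_size)
                for number, line in enumerate(sample, 1):
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError as err:
                        problems.append(f"{path}: record {number}: {err.msg}")
                        continue
                    if not isinstance(record, dict):
                        problems.append(f"{path}: record {number} is not a JSON object")
            else:
                problems.append(f"{path}: unsupported FILE_FORMAT {file_format!r}")
    except (OSError, UnicodeDecodeError, csv.Error) as err:
        problems.append(f"{path}: cannot be read ({err})")
    return problems


def preflight(files):
    """Check every (data_source, file_format, path) tuple before any load starts."""
    problems = []
    for _data_source, file_format, path in files:
        problems.extend(validate_file(path, file_format))
    return problems
```

A producer could call `preflight` on the full file list and refuse to start loading if it returns any problems, which gives the "fail before the first file" behavior described above.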

Future: G2Loader does the three checks above plus the following:
4. Counts the mapped and unmapped attributes it finds.
5. Checks for and notes common mapping errors, such as incomplete addresses.
6. Reports errors, warnings, and info messages, along with recommendations and suggestions.
7. Publishes a report that can be exported to show to others.

You can see this testing analysis by typing the following in an sshd container for a single file ...
./G2Loader.py -T -f demo/truth/truthset-person-v1-set1-data.csv/?data_source=CUSTOMERS

or for a project file ...
./G2Loader.py -T -p demo/truth/project.json

That's it.

The only other solution is to keep using G2Loader.

github-actions bot added the triage Need to triage label Sep 13, 2021

jamietypovsky removed the triage Need to triage label Sep 16, 2021
