saulfrank/dataquality_users

To run the data quality script

** Using RStudio

Install the required packages from the R console:

install.packages("data.table", repos="http://R-Forge.R-project.org")

install.packages("RJSONIO")

install.packages("phonenumber")

install.packages("sqldf")

Ensure the following files are in the same folder:

  1. Data cleansing v(x.x).R
  2. File_test_run.csv

** Run the R script: Data cleansing v(x.x).R

The output includes

  1. Corrections to the data against the test cases: clean_name, clean_email, clean_phone
  2. A data quality map, which is a JSON array of the corrections made against the test cases: data_quality_map
  3. Exception columns that flag data which cannot be corrected programmatically and should be fixed at source: name_exception, email_exception, phone_exception
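
To make the three kinds of output concrete, here is a rough sketch of how one email value might map to clean_email, data_quality_map and email_exception. It is written in JavaScript rather than R, and it is not the repository's script: the cleanEmail function, the rule names (trim_whitespace, lowercase) and the validity check are all hypothetical, chosen only to illustrate the shape of the columns.

```javascript
// Hypothetical sketch: the rule names and the validity check are assumptions,
// not the test cases applied by the R script.
function cleanEmail(raw) {
  const corrections = []; // serialised below as the data_quality_map value
  let value = raw == null ? "" : String(raw);

  // Correction 1 (assumed): strip leading and trailing whitespace.
  const trimmed = value.trim();
  if (trimmed !== value) {
    corrections.push({ field: "email", rule: "trim_whitespace" });
    value = trimmed;
  }

  // Correction 2 (assumed): normalise to lower case.
  const lowered = value.toLowerCase();
  if (lowered !== value) {
    corrections.push({ field: "email", rule: "lowercase" });
    value = lowered;
  }

  // Deliberately loose check: one "@", no spaces, a dot in the domain part.
  const valid = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);

  return {
    clean_email: valid ? value : null,
    email_exception: !valid, // true means "fix at source"
    data_quality_map: JSON.stringify(corrections),
  };
}

const fixed = cleanEmail("  Jane.Doe@Example.COM ");
const broken = cleanEmail("jane.doe(at)example.com");
console.log(fixed);
console.log(broken);
```

In this sketch a value that can be repaired carries a list of the corrections applied, while a value that cannot be repaired is left empty and flagged, so the exception summary only needs to count the flagged rows.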

File outputs

  1. DQ_output.csv - output from the script.
  2. DQ_exception.csv - a summary of the exceptions, produced by running SQL against the data tables.

Generate 1,000 tuples of test data using Node and the faker API

Install Node

sudo npm install faker

sudo npm install json2csv

Run: node user_generate.js

This will generate: file.csv

This file is used in file_test_cases.xlsx to create the test cases in file_test_run.csv
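
The generation step above can be pictured with the following dependency-free sketch. It is not user_generate.js: the real script presumably draws values from faker and serialises them with json2csv, whereas the name lists, the makeUser and toCsv helpers, the column names and the output file name here are all made up so that the example runs with plain Node.

```javascript
// Dependency-free stand-in: the sample values and column names are assumptions.
const fs = require("fs");
const os = require("os");
const path = require("path");

const FIRST = ["Ada", "Grace", "Alan", "Linus"];
const LAST = ["Lovelace", "Hopper", "Turing", "Torvalds"];

// Build one user record; the index keeps emails and phone numbers distinct.
function makeUser(i) {
  const first = FIRST[i % FIRST.length];
  const last = LAST[Math.floor(i / FIRST.length) % LAST.length];
  return {
    name: `${first} ${last}`,
    email: `${first}.${last}${i}@example.com`.toLowerCase(),
    phone: `555-${String(i).padStart(4, "0")}`,
  };
}

// Serialise rows to CSV, quoting every field and doubling embedded quotes.
function toCsv(rows, columns) {
  const quote = (v) => `"${String(v).replace(/"/g, '""')}"`;
  const lines = rows.map((row) => columns.map((c) => quote(row[c])).join(","));
  return [columns.map(quote).join(","), ...lines].join("\n") + "\n";
}

const users = Array.from({ length: 1000 }, (_, i) => makeUser(i));
const csv = toCsv(users, ["name", "email", "phone"]);

// Written to a temp directory so the sketch never overwrites a real file.csv.
const outFile = path.join(os.tmpdir(), "file_sketch.csv");
fs.writeFileSync(outFile, csv);
console.log(`wrote ${users.length} rows to ${outFile}`);
```

The sketch produces clean rows only; the test cases themselves are introduced afterwards through file_test_cases.xlsx, as described above.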
