Great Expectations is the leading tool for validating, documenting, and profiling your data to maintain quality and improve communication between teams.
- Automated data profiling: the library profiles your data to gather basic statistics and automatically generates a suite of Expectations based on what it observes in the data.
- Data validation: an Expectation Suite passes or fails, and returns any unexpected values that failed a test.
- Data Docs: renders Expectations as clean, human-readable HTML documentation containing both Expectation Suites and data Validation Results.
- Diverse Datasources and Store backends: supports various datasources such as Pandas dataframes, Spark dataframes, and SQL databases via SQLAlchemy.
Artifacts produced:
- Expectation Suite (JSON)
- Data Docs (HTML report)
- Validation run report
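For a sense of what the Expectation Suite JSON artifact looks like, here is a minimal, hypothetical example (the suite name, column name, and expectation chosen here are illustrative, not taken from the actual generated suite):

```python
import json

# Hypothetical minimal Expectation Suite, shaped like the JSON files
# Great Expectations writes under great_expectations/expectations/.
suite = {
    "expectation_suite_name": "example_suite",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "YEAR MFR"},
        }
    ],
    "meta": {"great_expectations_version": "0.15.46"},
}

print(json.dumps(suite, indent=2))
```

Each entry in `expectations` names an expectation type plus its kwargs; editing the suite (as done later in this walkthrough) is mostly adding, removing, or tweaking these entries.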
Refer: Getting started with Great Expectations
pip install great_expectations
great_expectations --version
Output: great_expectations, version 0.15.46
great_expectations init
Change the working directory to the newly created great_expectations directory:
cd great_expectations
Copy the csv into great_expectations/data
Files:
faa_registration.csv
great_expectations datasource new
Input the following at the prompts:
- 1 (Local File)
- 1 (Pandas)
- data (relative path to the datasets)
This opens a Jupyter notebook.
- Change the datasource_name var to nyc_yellow_taxi_trip_data
- Update example_yaml to ignore all non-csv files:

```python
example_yaml = f"""
name: {datasource_name}
class_name: Datasource
execution_engine:
  class_name: PandasExecutionEngine
data_connectors:
  default_inferred_data_connector_name:
    class_name: InferredAssetFilesystemDataConnector
    base_directory: data
    default_regex:
      group_names:
        - data_asset_name
      pattern: (.*)\.csv
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    assets:
      my_runtime_asset_name:
        batch_identifiers:
          - runtime_batch_identifier_name
"""
print(example_yaml)
```
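The `default_regex` block is what makes the connector skip non-csv files: only filenames matching the pattern become data assets, and the captured group supplies the asset name. A stdlib-only sketch of that matching behavior (this is just the regex logic, not Great Expectations code):

```python
import re

# A pattern like the one in example_yaml: capture the base name,
# match only files ending in .csv.
pattern = re.compile(r"(.*)\.csv")

files = ["faa_registration.csv", "notes.txt", "planes_2020.csv"]

# Files that do not match the pattern are ignored by the data connector;
# the captured group becomes the data asset name.
assets = [m.group(1) for f in files if (m := pattern.fullmatch(f))]
print(assets)  # ['faa_registration', 'planes_2020']
```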
- Save the datasource configuration
- Close the Jupyter notebook
- Wait for the terminal to show: Saving file at /datasource_new.ipynb
great_expectations suite new
Input the following at the prompts:
- 3 (Automatically, using a profiler)
- 1 (index of the file faa_registration.csv)
- faa_registration_suite (suite name)
This opens a Jupyter notebook.
- Change the datasource_name var to spy_plane_data
- Update exclude_column_names to:

```python
exclude_column_names = [
    "N-NUMBER",
    "SERIAL NUMBER",
    "MFR MDL CODE",
    "ENG MFR MDL",
    # "YEAR MFR",
    "TYPE REGISTRANT",
    "NAME",
    "STREET",
    "STREET2",
    "CITY",
    "STATE",
    "ZIP CODE",
    "REGION",
    "COUNTY",
    "COUNTRY",
    # "LAST ACTION DATE",
    # "CERT ISSUE DATE",
    "CERTIFICATION",
    "TYPE AIRCRAFT",
    "TYPE ENGINE",
    "STATUS CODE",
    "MODE S CODE",
    "FRACT OWNER",
    "AIR WORTH DATE",
    "OTHER NAMES(1)",
    "OTHER NAMES(2)",
    "OTHER NAMES(3)",
    "OTHER NAMES(4)",
    "OTHER NAMES(5)",
    # "EXPIRATION DATE",
    # "UNIQUE ID",
    "KIT MFR",
    "KIT MODEL",
    "MODE S CODE HEX",
    "X35",
]
```
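The profiler skips every column listed in exclude_column_names, so the commented-out entries (e.g. "YEAR MFR", "EXPIRATION DATE") are the ones that remain profiled. The effect, sketched in plain Python with a hypothetical subset of the FAA columns:

```python
# Hypothetical subset of the FAA registration columns (for illustration only).
all_columns = ["N-NUMBER", "YEAR MFR", "NAME", "EXPIRATION DATE", "UNIQUE ID"]

# Columns listed (uncommented) in exclude_column_names are skipped by the profiler.
exclude_column_names = ["N-NUMBER", "NAME"]

profiled = [c for c in all_columns if c not in exclude_column_names]
print(profiled)  # ['YEAR MFR', 'EXPIRATION DATE', 'UNIQUE ID']
```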
- Run all cells to create the default expectations and analyze the result
- Wait for the terminal to show: Saving file at /*.ipynb
- Modify the expectations as needed

Modified the JSON file great_expectations/expectations/faa_registration_suite.json and kept only the necessary expectations.

great_expectations suite edit faa_registration_suite
Input the following at the prompt (! SYS ERROR, COULD NOT LOAD THE NOTEBOOK):
- 1 (Manually, without interacting with a sample batch of data (default))

Updated to:
This Expectation suite currently contains 4 total Expectations across 1 columns.
great_expectations checkpoint new planes_features_checkpoint_v0.1
This opens a Jupyter notebook.
- Run all cells
- The validation report opens in a new page
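The checkpoint notebook centers on a YAML config roughly like the following (a sketch assuming Great Expectations 0.15.x SimpleCheckpoint defaults; the datasource and asset names follow this walkthrough and may differ in your generated notebook):

```yaml
name: planes_features_checkpoint_v0.1
config_version: 1.0
class_name: SimpleCheckpoint
validations:
  - batch_request:
      datasource_name: spy_plane_data
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: faa_registration
    expectation_suite_name: faa_registration_suite
```

Running the checkpoint validates the named batch against faa_registration_suite and refreshes Data Docs with the result.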
- great_expectations/data
- great_expectations/expectations/*.json
- great_expectations/uncommitted/data_docs/*
- great_expectations/uncommitted/*.ipynb
Resource: https://git-scm.com/docs/gitignore
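One plausible .gitignore for the paths listed above (an assumption about intent: raw data and generated notebooks/docs stay out of version control, while the suite JSON under expectations/ is usually committed so the team shares the expectations):

```
# Raw datasets and generated artifacts (adjust to your team's policy)
great_expectations/data/
great_expectations/uncommitted/data_docs/
great_expectations/uncommitted/*.ipynb
```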