Snowflake
Herminio Vazquez edited this page Feb 11, 2024 · 8 revisions
Working inside Snowflake means running data quality checks from Snowflake Python Worksheets. You will need the following:
- `stage` under a data schema in Snowflake
- `cuallee.whl` the wheel distribution from the PyPI index with the latest version of `cuallee`
- `wheel_loader.py` a Python script developed by Snowflake Labs
- `anaconda` dependencies added to your worksheet
- `warehouse` where to run the Python worksheet
- Make sure that you have enabled and accepted the Anaconda Python Packages terms and conditions under `Admin > Billing & Terms`
- Select a schema in your Snowflake instance and create a `stage`; it does not matter whether it is internal or external. Let's call it `DEMO_STAGE`
- Go to the PyPI index and download the built distribution of `cuallee`. At the time of this writing the file available is: `cuallee-0.8.5-py3-none-any.whl`
- Upload your `.whl` file into the `DEMO_STAGE` either via the `cli` or through the UI
- Download the `wheel_loader.py` available here
- Upload your `wheel_loader.py` file into the `DEMO_STAGE` either via the `cli` or through the UI
- Create a new worksheet using the `+` sign in Snowflake Worksheets and select `Python Worksheet`
- In the top right corner of your worksheet, select the warehouse that will execute this worksheet
- In the top left corner of your worksheet, select the database schema that contains the `DEMO_STAGE`
- Next to the schema selection and the settings drop-down menu, open the packages drop-down menu
- `2` tabs will be available: Anaconda Packages and Stage Packages
- In the Anaconda Packages tab, add the following library dependencies required by `cuallee`:
  - `colorama==0.4.6`
  - `pandas==1.5.3`
  - `pygments==2.15.1`
  - `requests==2.31.0`
  - `toolz==0.12.0`
  - `snowflake-snowpark-python==1.11.1`
- In the Stage Packages tab, add the following library dependencies to use `cuallee`:
  - `@demo_stage/cuallee-0.8.5-py3-none-any.whl`
  - `@demo_stage/wheel_loader.py`
- After completing the package setup for both Anaconda and Stage, the added libraries should appear at the bottom of the drop-down, under Installed Packages
- At this point you are ready to go! Below is a snippet to test the use of `cuallee` inside Snowflake:
```python
# cuallee
# checks inside snowflake demo
import snowflake.snowpark as snowpark
import wheel_loader


def main(session: snowpark.Session):
    # Your code goes here, inside the "main" handler.
    # Load the staged wheel before importing cuallee.
    wheel_loader.load('cuallee-0.8.5-py3-none-any.whl')
    from cuallee import Check, CheckLevel, Control

    check = Check(CheckLevel.WARNING, "Custom", session=session)
    tableName = 'snowflake_sample_data.tpch_sf100.lineitem'
    dataframe = session.table(tableName)
    check.is_greater_than("L_QUANTITY", 2)
    check.is_legit("L_COMMENT")
    # Return value will appear in the Results tab.
    return Control.completeness(dataframe, session=session).union(check.validate(dataframe))
```
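The two upload steps described earlier (the wheel and `wheel_loader.py` into `DEMO_STAGE`) could also be scripted instead of using the cli or the UI. The following is a minimal sketch, assuming a Snowpark `Session` that was created elsewhere and that `DEMO_STAGE` already exists. The helper name `upload_to_stage` is hypothetical and not part of `cuallee`. It relies on Snowpark's `session.file.put`, passing `auto_compress=False` so that the files keep their original names on the stage instead of being gzipped.

```python
from pathlib import Path
from typing import Iterable, List


def upload_to_stage(session, local_files: Iterable[str], stage: str = "DEMO_STAGE") -> List:
    """Hypothetical helper: upload local files to a stage without compressing them.

    `session` is assumed to be a snowflake.snowpark.Session created elsewhere.
    """
    # Accept the stage name with or without the leading "@".
    stage_location = f"@{stage.lstrip('@')}"
    results = []
    for local_file in local_files:
        path = Path(local_file)
        if not path.is_file():
            raise FileNotFoundError(f"File not found: {path}")
        # auto_compress=False keeps the .whl and .py file names unchanged on the stage.
        results.extend(
            session.file.put(
                str(path),
                stage_location,
                auto_compress=False,
                overwrite=True,
            )
        )
    return results
```

For example, `upload_to_stage(session, ["cuallee-0.8.5-py3-none-any.whl", "wheel_loader.py"])` would place both files in `DEMO_STAGE`, assuming they sit in the current working directory.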
- `pip install cuallee`
- `pip install cuallee[snowpark]` or `pip install snowflake-snowpark-python`
- Set environment variables to start a session
  - `SF_ACCOUNT` obtained by clicking the bottom left part of your Snowflake account and selecting `Copy account url`
    - Then remove the `https://` part and also the `.snowflakecomputing.com` part of the URL
    - It should end up looking something like this: `SF_ACCOUNT=1234567.region-name.cloud`
  - `SF_USER` your Snowflake username
  - `SF_PASSWORD` your Snowflake password
  - `SF_ROLE` your Snowflake role, e.g. `ACCOUNTADMIN`
  - `SF_WAREHOUSE` your designated warehouse for running data quality checks, e.g. `COMPUTE_WH`
  - `SF_DATABASE` your database selection for running checks, e.g. `SNOWFLAKE_SAMPLE_DATA`
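The `SF_ACCOUNT` derivation and the variables listed above can be sketched in Python. This is an illustration only, not how `cuallee` itself reads these variables: the helper names `account_from_url` and `connection_parameters` are hypothetical. The first strips the scheme and the `.snowflakecomputing.com` suffix from a copied account URL. The second collects the `SF_*` variables into a dictionary whose keys are the usual Snowpark connection parameter names.

```python
import os
from typing import Dict, Mapping, Optional


def account_from_url(url: str) -> str:
    """Hypothetical helper: turn a copied account URL into an SF_ACCOUNT value."""
    account = url.strip()
    # Drop the URL scheme if present.
    for scheme in ("https://", "http://"):
        if account.startswith(scheme):
            account = account[len(scheme):]
    account = account.rstrip("/")
    # Drop the Snowflake domain, keeping only the account identifier.
    suffix = ".snowflakecomputing.com"
    if account.endswith(suffix):
        account = account[: -len(suffix)]
    return account


def connection_parameters(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: map SF_* variables to Snowpark connection parameter names."""
    env = os.environ if env is None else env
    names = ("account", "user", "password", "role", "warehouse", "database")
    # Fail early and list every variable that is unset or empty.
    missing = [f"SF_{n.upper()}" for n in names if not env.get(f"SF_{n.upper()}")]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {n: env[f"SF_{n.upper()}"] for n in names}
```

With `snowflake-snowpark-python` installed, the resulting dictionary could be passed to `Session.builder.configs(connection_parameters()).create()` to open a session by hand.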
-