Commit

Adding CFN and script changes per olotu@ and new EMR table in README.
tjmaws committed Dec 24, 2024
1 parent b3e0cca commit 2fcb230
Showing 4 changed files with 26 additions and 25 deletions.
37 changes: 16 additions & 21 deletions README.md
@@ -151,30 +151,25 @@ Current list of available Regions with S3 Tables support can be found [here](htt

Provided below is the list of EMR Clusters that are deployable from the CloudFormation template. This decision is based off of the organization requirements.

Available Instance Classes for EMR based upon Size:

| Size | Primary Instance | Core Instances | Task Instances |
|--------|------------------|----------------|----------------|
-| Small | 1 x m5.4xlarge | 1 x i3.4xlarge | 1 x i3.4xlarge |
+| Small | 1 x m5.4xlarge | 1 x i3.4xlarge | 1 x i3.4xlarge |
+| | or | or | or |
+| | 1 x m5d.4xlarge | 1 x m5d.4xlarge | 1 x r5d.4xlarge |
+| | | | |
| Medium | 1 x m5.4xlarge | 4 x i3.4xlarge | 4 x i3.4xlarge |
-| Large | 1 x r5.4xlarge | 4 x i3.4xlarge | 8 x i3.4xlarge |
-| Xlarge | 1 x r5.4xlarge | 8 x i3.4xlarge | 12 x i3.4xlarge |
-
-Available Instance Classes for EMR based upon Size:
-
-1. Small:
-- Primary: m5.4xlarge or m5d.4xlarge
-- Core and Task: i3.4xlarge or r5d.4xlarge
-
-2. Medium:
-- Primary: m5.4xlarge or m5d.4xlarge
-- Core and Task: i3.4xlarge or r5d.4xlarge
-
-3. Large:
-- Primary: r5.4xlarge or i3.4xlarge
-- Core and Task: i3.4xlarge or r5d.4xlarge
-
-4. Xlarge:
-- Primary: r5.4xlarge or i3.4xlarge
-- Core and Task: i3.4xlarge or r5d.4xlarge
+| | or | or | or |
+| | 1 x m5d.4xlarge | 4 x r5d.4xlarge | 4 x r5d.xlarge |
+| | | | |
+| Large | 1 x r5.4xlarge | 4 x i3.4xlarge | 8 x i3.4xlarge |
+| | or | or | or |
+| | 1 x i3.4xlarge | 4 x r5d.4xlarge | 8 x r5d.4xlarge |
+| | | | |
+| Xlarge | 1 x r5.4xlarge | 4 x i3.4xlarge | 16 x i3.4xlarge |
+| | | or | or |
+| | 1 x i3.4xlarge | 4 x r5d.4xlarge | 16 x r5d.4xlarge |

### Cluster Performance Configuration

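The new README table gives each cluster size a default row and an "or" alternative for the primary, core, and task instance groups. As an illustration of how a deployment helper might consume it, the sketch below encodes that table as Python data. `NodeGroup`, `EMR_SIZES`, `cluster_layout`, and `total_nodes` are hypothetical names that do not appear in the CloudFormation template, and the values are copied as they appear in the diff (including `4 x r5d.xlarge` in the Medium alternative row).

```python
from typing import Dict, List, NamedTuple


class NodeGroup(NamedTuple):
    """One EMR instance group: number of nodes and their instance type."""
    count: int
    instance_type: str


def _layout(primary, core, task) -> Dict[str, NodeGroup]:
    """Build one table row from (count, instance_type) pairs."""
    return {
        "primary": NodeGroup(*primary),
        "core": NodeGroup(*core),
        "task": NodeGroup(*task),
    }


# Hypothetical mapping transcribed from the new README table.
# Index 0 is the first row for a size, index 1 is the "or" alternative row.
EMR_SIZES: Dict[str, List[Dict[str, NodeGroup]]] = {
    "Small": [
        _layout((1, "m5.4xlarge"), (1, "i3.4xlarge"), (1, "i3.4xlarge")),
        _layout((1, "m5d.4xlarge"), (1, "m5d.4xlarge"), (1, "r5d.4xlarge")),
    ],
    "Medium": [
        _layout((1, "m5.4xlarge"), (4, "i3.4xlarge"), (4, "i3.4xlarge")),
        _layout((1, "m5d.4xlarge"), (4, "r5d.4xlarge"), (4, "r5d.xlarge")),
    ],
    "Large": [
        _layout((1, "r5.4xlarge"), (4, "i3.4xlarge"), (8, "i3.4xlarge")),
        _layout((1, "i3.4xlarge"), (4, "r5d.4xlarge"), (8, "r5d.4xlarge")),
    ],
    "Xlarge": [
        _layout((1, "r5.4xlarge"), (4, "i3.4xlarge"), (16, "i3.4xlarge")),
        _layout((1, "i3.4xlarge"), (4, "r5d.4xlarge"), (16, "r5d.4xlarge")),
    ],
}


def cluster_layout(size: str, option: int = 0) -> Dict[str, NodeGroup]:
    """Return the node groups for a size; option 0 = first row, 1 = alternative."""
    if size not in EMR_SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {sorted(EMR_SIZES)}")
    return EMR_SIZES[size][option]


def total_nodes(size: str, option: int = 0) -> int:
    """Sum the primary, core, and task node counts for the chosen layout."""
    return sum(group.count for group in cluster_layout(size, option).values())


if __name__ == "__main__":
    print(total_nodes("Xlarge"))  # 1 + 4 + 16
    print(cluster_layout("Medium", option=1)["task"])
```

`total_nodes("Xlarge")` counts 1 + 4 + 16 = 21 nodes. Keeping the rows as data rather than prose makes the default and alternative layouts easy to compare or validate.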
4 changes: 3 additions & 1 deletion scripts/pyspark/mys3tablespysparkscript.py
@@ -118,14 +118,16 @@ def ctas_action(src_catalog, catalog, src_db, src_tbl, dst_db, dst_tbl, dst_part
`{catalog}`.`{dst_db}`.`{dst_tbl}`
USING iceberg
AS SELECT * FROM `{src_catalog}`.`{src_db}`.`{src_tbl}`
+LIMIT 0
"""
else:
sql_query_d = f"""
CREATE TABLE IF NOT EXISTS
`{catalog}`.`{dst_db}`.`{dst_tbl}`
USING iceberg
PARTITIONED BY {dst_partitions}
-AS SELECT * FROM `{src_catalog}`.`{src_db}`.`{src_tbl}`
+AS SELECT * FROM `{src_catalog}`.`{src_db}`.`{src_tbl}`
+LIMIT 0
"""

# Run the CTAS SQL query
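Appending `LIMIT 0` turns each CTAS into a schema-only copy: Spark creates the Iceberg table with the source table's columns (and, in the partitioned branch, the `PARTITIONED BY` spec) but writes no rows. A minimal sketch of the two branches as one pure function follows. `build_ctas_sql` is a hypothetical helper, and because the diff hides the `if` condition that selects the unpartitioned branch, treating "no partition spec supplied" as that condition is an assumption. The database and table names in the example are made up, and how the script loads data into the empty table afterwards is not visible in this hunk.

```python
def build_ctas_sql(src_catalog, catalog, src_db, src_tbl, dst_db, dst_tbl, dst_partitions=None):
    """Return an Iceberg CTAS statement that copies the source schema but no rows.

    Hypothetical helper mirroring the two f-strings in ctas_action. Using the
    partitioned branch whenever dst_partitions is truthy is an assumption,
    since the original `if` condition is not shown in the diff.
    """
    target = f"`{catalog}`.`{dst_db}`.`{dst_tbl}`"
    source = f"`{src_catalog}`.`{src_db}`.`{src_tbl}`"
    # Only the partitioned branch carries a PARTITIONED BY clause.
    partition_clause = f"PARTITIONED BY {dst_partitions}\n" if dst_partitions else ""
    # LIMIT 0 keeps the column list (and partition spec) but selects no rows,
    # so the destination table is created empty.
    return (
        "CREATE TABLE IF NOT EXISTS\n"
        f"{target}\n"
        "USING iceberg\n"
        f"{partition_clause}"
        f"AS SELECT * FROM {source}\n"
        "LIMIT 0"
    )


if __name__ == "__main__":
    print(build_ctas_sql("spark_catalog", "s3tablescatalog", "sales", "orders",
                         "sales", "orders_iceberg", "(order_date)"))
```

With an active `SparkSession` named `spark`, the resulting string would be executed with `spark.sql(...)`.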
6 changes: 4 additions & 2 deletions src/automated-migration-to-s3-tables-latest.yaml
@@ -211,7 +211,7 @@ Mappings:
csvwithversionid: restore-and-copy/csv-manifest/with-version-id/
Parameter:
catalogname: s3tablescatalog
-sparkcatalog: gdc_catalog
+sparkcatalog: spark_catalog

EMR:
Cluster:
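The mapping change points the source catalog at `spark_catalog`, the name Spark reserves for its built-in session catalog, instead of `gdc_catalog`. On an EMR cluster that uses the AWS Glue Data Catalog as its metastore, Glue tables are presumably reachable through that built-in name with no extra registration, so only the S3 Tables destination catalog needs `spark.sql.catalog.*` properties. The sketch assembles such properties. The helper names, the table bucket ARN, and passing the properties as `spark-submit --conf` arguments are assumptions; the property keys and class names follow the AWS documentation for using S3 Tables with Spark on EMR. The S3 Tables catalog client JAR must also be on the classpath, which this sketch does not cover.

```python
from typing import Dict, List

# Source tables are addressed through Spark's built-in session catalog name.
SOURCE_CATALOG = "spark_catalog"


def s3tables_catalog_confs(table_bucket_arn: str, catalog: str = "s3tablescatalog") -> Dict[str, str]:
    """Spark properties registering an Iceberg catalog backed by an S3 table bucket.

    spark_catalog is deliberately absent: it is Spark's built-in session
    catalog, so it needs no spark.sql.catalog.* entry of its own here.
    """
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        f"spark.sql.catalog.{catalog}.warehouse": table_bucket_arn,
    }


def to_spark_submit_args(confs: Dict[str, str]) -> List[str]:
    """Flatten properties into spark-submit '--conf key=value' arguments."""
    args: List[str] = []
    for key, value in confs.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


def qualified(catalog: str, db: str, table: str) -> str:
    """Backtick-quoted three-part name, in the style of the CTAS statements."""
    return f"`{catalog}`.`{db}`.`{table}`"


if __name__ == "__main__":
    # Hypothetical table bucket ARN.
    arn = "arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket"
    print(" ".join(to_spark_submit_args(s3tables_catalog_confs(arn))))
    print(qualified(SOURCE_CATALOG, "sales", "orders"))
```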
@@ -847,14 +847,16 @@ Resources:
`{{catalog}}`.`{{dst_db}}`.`{{dst_tbl}}`
USING iceberg
AS SELECT * FROM `{{src_catalog}}`.`{{src_db}}`.`{{src_tbl}}`
+LIMIT 0
"""
else:
sql_query_d = f"""
CREATE TABLE IF NOT EXISTS
`{{catalog}}`.`{{dst_db}}`.`{{dst_tbl}}`
USING iceberg
PARTITIONED BY {{dst_partitions}}
-AS SELECT * FROM `{{src_catalog}}`.`{{src_db}}`.`{{src_tbl}}`
+AS SELECT * FROM `{{src_catalog}}`.`{{src_db}}`.`{{src_tbl}}`
+LIMIT 0
"""
# Run the CTAS SQL query
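In the CloudFormation template (and in `UploadScriptFunction.py`), every brace in the embedded PySpark code is doubled. That is the Python escape for a literal brace inside an f-string or `str.format` template, which suggests the script body is rendered once before it is uploaded; the uploaded script then sees single-brace placeholders that its own f-strings fill at run time. A small demonstration of the two rounds follows, using a hypothetical `render` helper and made-up database and table names.

```python
def render(template: str, **values: str) -> str:
    """Apply one round of Python brace formatting: '{{' -> '{', '{name}' -> value."""
    return template.format(**values)


# One SQL line copied from the template / UploadScriptFunction.py hunks.
EMBEDDED = "AS SELECT * FROM `{{src_catalog}}`.`{{src_db}}`.`{{src_tbl}}`"

# Round 1 (hypothetical upload step): doubled braces collapse to single braces.
stage1 = render(EMBEDDED)

# Round 2: simulates the uploaded script's own f-string filling the placeholders.
stage2 = render(stage1, src_catalog="spark_catalog", src_db="sales", src_tbl="orders")

# An f-string applies the same escaping rule as str.format.
assert f"`{{src_db}}`" == "`{src_db}`"

print(stage1)  # AS SELECT * FROM `{src_catalog}`.`{src_db}`.`{src_tbl}`
print(stage2)  # AS SELECT * FROM `spark_catalog`.`sales`.`orders`
```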
4 changes: 3 additions & 1 deletion src/function_codes/UploadScriptFunction.py
@@ -172,14 +172,16 @@ def ctas_action(src_catalog, catalog, src_db, src_tbl, dst_db, dst_tbl, dst_part
`{{catalog}}`.`{{dst_db}}`.`{{dst_tbl}}`
USING iceberg
AS SELECT * FROM `{{src_catalog}}`.`{{src_db}}`.`{{src_tbl}}`
+LIMIT 0
"""
else:
sql_query_d = f"""
CREATE TABLE IF NOT EXISTS
`{{catalog}}`.`{{dst_db}}`.`{{dst_tbl}}`
USING iceberg
PARTITIONED BY {{dst_partitions}}
-AS SELECT * FROM `{{src_catalog}}`.`{{src_db}}`.`{{src_tbl}}`
+AS SELECT * FROM `{{src_catalog}}`.`{{src_db}}`.`{{src_tbl}}`
+LIMIT 0
"""
# Run the CTAS SQL query
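This commit had to make the same `LIMIT 0` edit in three copies of `ctas_action`: `scripts/pyspark/mys3tablespysparkscript.py`, the CloudFormation template, and `UploadScriptFunction.py`. A hypothetical consistency check can flag a copy that falls behind by undoing the brace doubling and comparing the SQL text. The sketch assumes the SQL of each copy has already been extracted into a string (the two constants stand in for that step); `normalize` and `copies_in_sync` are illustrative names, not part of the repository.

```python
import re

# Stand-in for the standalone script's copy, which uses single braces.
SCRIPT_COPY = """
AS SELECT * FROM `{src_catalog}`.`{src_db}`.`{src_tbl}`
LIMIT 0
"""

# Stand-in for the template and UploadScriptFunction.py copies, which double every brace.
EMBEDDED_COPY = """
AS SELECT * FROM `{{src_catalog}}`.`{{src_db}}`.`{{src_tbl}}`
LIMIT 0
"""


def normalize(sql_text: str) -> str:
    """Undo brace doubling and collapse whitespace so equivalent copies compare equal."""
    unescaped = sql_text.replace("{{", "{").replace("}}", "}")
    return re.sub(r"\s+", " ", unescaped).strip()


def copies_in_sync(*copies: str) -> bool:
    """True when every copy normalizes to the same SQL text."""
    return len({normalize(copy) for copy in copies}) == 1


if __name__ == "__main__":
    stale = EMBEDDED_COPY.replace("LIMIT 0", "")  # a copy that missed the edit
    print(copies_in_sync(SCRIPT_COPY, EMBEDDED_COPY, EMBEDDED_COPY))  # True
    print(copies_in_sync(SCRIPT_COPY, EMBEDDED_COPY, stale))  # False
```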