Feature: Default spark-infrastructure Helm charts to V2 and ensure migration #300

Closed · 10 tasks done
cwoods-cpointe opened this issue Aug 27, 2024 · 5 comments · Fixed by #321
Labels: enhancement (New feature or request)

cwoods-cpointe commented Aug 27, 2024

Description

We have a long-lived feature branch that needs to be merged in: feature/spark-infrastructure-v2-default. This branch makes the spark-infrastructure Helm deployment V2 the default, along with migration steps and other changes. Currently, the manual actions call out the aissemble-spark-infrastructure-deploy profile rather than the -v2 profile.

Technical Details:

The feature branch combines hive-metastore-db and hive-metastore-service into a single chart and adds that chart to the aissemble-spark-infrastructure chart.
The thrift server is also bundled into the aissemble-spark-infrastructure chart.
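
For illustration, a minimal sketch of what the combined v2 umbrella chart's dependency block could look like; the subchart names come from the migration notes below, while the versions and repository locations are placeholder assumptions, not the published values:

# Hypothetical Chart.yaml dependencies for the v2 umbrella chart; versions
# and repository locations are placeholders, not the published values.
apiVersion: v2
name: aissemble-spark-infrastructure-chart
version: 1.9.0-SNAPSHOT
dependencies:
  - name: aissemble-hive-metastore-service-chart  # now bundles the metastore DB and service
    version: 1.9.0-SNAPSHOT
    repository: file://../aissemble-hive-metastore-service-chart
  - name: aissemble-thrift-server-chart
    version: 1.9.0-SNAPSHOT
    repository: file://../aissemble-thrift-server-chart
  - name: aissemble-spark-history-chart
    version: 1.9.0-SNAPSHOT
    repository: file://../aissemble-spark-history-chart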

DoD

  • Rebase the old feature branch onto dev
  • Ensure the feature branch is in working order
  • Foundation MDA warns of v1 deprecation
  • Migration instructions are clearly outlined and follow our other v1 -> v2 migration strategies
    • aissemble-hive-metastore-service-chart
    • aissemble-spark-history-chart
    • aissemble-thrift-server-chart
  • Projects need to be able to transition smoothly from where they are (v1 or v2) to this new v2
    • Create a migration if there are breaking changes
  • Configuration store PVC is read-only (see the sketch below)
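
For reference, the read-only behavior comes down to Kubernetes volume settings along these lines; a minimal sketch with an assumed claim name, not the chart's actual template:

# Hypothetical pod spec fragment: both the claim reference and the container
# mount are flagged read-only, so writes under /configurations fail with EROFS.
volumes:
  - name: configurations
    persistentVolumeClaim:
      claimName: configuration-store-pvc
      readOnly: true
containers:
  - name: configuration-store
    volumeMounts:
      - name: configurations
        mountPath: /configurations
        readOnly: true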

Test Strategy/Script

Test 1: testing the default v2 behavior

  1. Generate a new project using the following command:
mvn archetype:generate -B -DarchetypeGroupId=com.boozallen.aissemble \
                          -DarchetypeArtifactId=foundation-archetype \
                          -DarchetypeVersion=1.9.0-SNAPSHOT \
                          -DartifactId=test-project \
                          -DgroupId=org.test \
                          -DprojectName='Test' \
                          -DprojectGitUrl=test.org/test-project \
&& cd test-project
  2. Add the following pipeline to test-project-pipeline-models/src/main/resources/pipelines/:
{
  "name": "PysparkPersist",
  "package": "com.boozallen",
  "type": {
    "name": "data-flow",
    "implementation": "data-delivery-pyspark"
  },
  "steps": [
    {
      "name": "PersistData",
      "type": "synchronous",
      "persist": {
        "type": "hive"
      }
    }
  ]
}
  3. Add the following record to test-project-pipeline-models/src/main/resources/records/:
{
  "name": "CustomRecord",
  "package": "com.boozallen.aiops.mda.pattern.record",
  "description": "Example custom record for Pyspark Data Delivery Patterns",
  "fields": [
    {
      "name": "customField",
      "type": {
        "name": "customType",
        "package": "com.boozallen.aiops.mda.pattern.dictionary"
      }
    }
  ]
}
  4. Add the following dictionary to test-project-pipeline-models/src/main/resources/dictionaries/:
{
  "name": "PysparkDataDeliveryDictionary",
  "package": "com.boozallen.aiops.mda.pattern.dictionary",
  "dictionaryTypes": [
    {
      "name": "customType",
      "simpleType": "string"
    }
  ]
}
  5. Execute mvn clean install repeatedly, resolving all presented manual actions until none remain.
  6. Verify the spark-infrastructure Helm chart is defaulted to V2:
    6.1. The profile aissemble-spark-infrastructure-deploy-v2 is in test-project-deploy/pom.xml
  7. Within test-project-pipelines/pyspark-persist/src/pyspark_persist/step/persist_data.py, define the implementation for execute_step_impl as follows:
    def execute_step_impl(self) -> None:
        # Import the record and schema classes generated from the MDA models
        from ..record.custom_record import CustomRecord
        from ..schema.custom_record_schema import CustomRecordSchema
        custom_record = CustomRecord.from_dict({"customField": "foo"})
        record2 = CustomRecord.from_dict({"customField": "bar"})
        # Build a two-row DataFrame and persist it to the my_new_table table
        df = self.spark.createDataFrame(
            [
                custom_record,
                record2
            ],
            CustomRecordSchema().struct_type
        )
        self.save_dataset(df, "my_new_table")
  8. Build the project with mvn clean install
  9. Deploy the project with tilt up
  10. Once all resources are ready, trigger the pyspark-persist pipeline
  11. Bash into the data access pod: kubectl exec -it <DATA_ACCESS_POD_NAME> -- bash
  12. Execute curl -X POST localhost:8080/graphql -H "Content-Type: application/json" -d '{ "query": "{ CustomRecord(table: \"my_new_table\") { customField } }" }' and ensure that data including two records is returned, e.g.: {"data":{"CustomRecord":[{"customField":null},{"customField":null}]}}
  13. Navigate to localhost:18080 and ensure that the Spark History Server is visible and has recorded the pipeline execution for pyspark-persist
  14. Tilt down and delete the created project.

Test 2: testing the v1 -> v2 migration

  1. Generate a new project using the following command:
mvn archetype:generate -B -DarchetypeGroupId=com.boozallen.aissemble \
                          -DarchetypeArtifactId=foundation-archetype \
                          -DarchetypeVersion=1.8.0 \
                          -DartifactId=test-project \
                          -DgroupId=org.test \
                          -DprojectName='Test' \
                          -DprojectGitUrl=test.org/test-project \
&& cd test-project
  2. Execute Test 1 steps 2 through 5 and 7
  3. Upgrade aiSSEMBLE to 1.9.0-SNAPSHOT
  4. Perform mvn clean install
    4.1. Update the pyproject.toml to include the snapshot repo:
[[tool.poetry.source]]
name = "devpypi"
url = "https://test.pypi.org/simple/"
priority = "supplemental"
  5. Verify the user is warned in the build output that the spark-infrastructure v1 chart is deprecated
  6. Follow the upgrade steps: https://github.com/boozallen/aissemble/tree/dev/extensions/extensions-helm/extensions-helm-spark-infrastructure
  7. Verify the V2 charts are correct
  8. Build the project with mvn clean install
  9. Deploy the project with tilt up
  10. Once all resources are ready, trigger the pyspark-persist pipeline
  11. Bash into the data access pod: kubectl exec -it <DATA_ACCESS_POD_NAME> -- bash
  12. Execute curl -X POST localhost:8080/graphql -H "Content-Type: application/json" -d '{ "query": "{ CustomRecord(table: \"my_new_table\") { customField } }" }' and ensure that data including two records is returned, e.g.: {"data":{"CustomRecord":[{"customField":null},{"customField":null}]}}
  13. Navigate to localhost:18080 and ensure that the Spark History Server is visible and has recorded the pipeline execution for pyspark-persist
  14. Tilt down and delete the project

Test 3: testing that the config-store PVC is a read-only file system

  1. Generate a new project using the following command:
mvn archetype:generate -B -DarchetypeGroupId=com.boozallen.aissemble \
                          -DarchetypeArtifactId=foundation-archetype \
                          -DarchetypeVersion=1.9.0-SNAPSHOT \
                          -DartifactId=test-project \
                          -DgroupId=org.test \
                          -DprojectName='Test' \
                          -DprojectGitUrl=test.org/test-project \
&& cd test-project
  2. Copy the src directory from the attached folder into the root of the project
    2.1. If you are on a WSL instance and the project is not on a path the Rancher instance can reach, save the src folder somewhere on your C drive and make note of the path
    2.2. 300-helper.zip
  3. Add the following execution to the fermenter-mda plugin executions in test-project-deploy/pom.xml:
<execution>
    <id>configuration-store</id>
    <phase>generate-sources</phase>
    <goals>
        <goal>generate-sources</goal>
    </goals>
    <configuration>
        <basePackage>com.boozallen.aissemble.test</basePackage>
        <profile>configuration-store-deploy-v2</profile>
        <!-- The property variables below are passed to the Generation Context and utilized
             to customize the deployment artifacts. -->
        <propertyVariables>
            <appName>configuration-store</appName>
        </propertyVariables>
    </configuration>
</execution>
  4. Add the config-store service to Tilt with the following (a values-file alternative is sketched after this list):
# For WSL users, the configuration files need to be in an accessible path. Update the project path to the root file system. Example: '/mnt/c' or '/mnt/wsl/rancher-desktop'
project_path = os.path.abspath('.')
# Update configuration_files_path to match the path of the config files to be loaded into the configuration store. Example 'my-project-deploy/src/main/resources/config'
configuration_files_path = 'src/main/resources/configurations'

load('ext://helm_resource', 'helm_resource')
helm_resource(
    name='configuration-store',
    release_name='configuration-store',
    chart='test-project-deploy/src/main/resources/apps/configuration-store',
    namespace='config-store-ns',
    flags=['--values=test-project-deploy/src/main/resources/apps/configuration-store/values.yaml',
           '--values=test-project-deploy/src/main/resources/apps/configuration-store/values-dev.yaml',
           '--set=aissemble-configuration-store-chart.configurationVolume.volumePathOnNode=' + project_path + '/' + configuration_files_path,
           '--create-namespace']
)
  5. For WSL users, change the following:
project_path = '/mnt/c/Users/YOUR_USER/PATH/TO/300-helper'
configuration_files_path = 'src/main/resources/configurations'
  6. Build the project
  7. Deploy the project
  8. Shell into the configuration-store pod
  9. Try to create a new file within the PVC with touch /configurations/test.txt
  10. Verify you are unable to do so
  11. Delete the project
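
As an aside to step 4, the --set flag in the Tiltfile can equivalently be expressed in values-dev.yaml; a hypothetical fragment using the same key path the Tiltfile sets, with the placeholder WSL-style path from step 5:

# Hypothetical values-dev.yaml override, equivalent to the Tiltfile --set flag;
# the path is a placeholder and must point at your configuration files.
aissemble-configuration-store-chart:
  configurationVolume:
    volumePathOnNode: /mnt/c/Users/YOUR_USER/PATH/TO/300-helper/src/main/resources/configurations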

References/Additional Context

N/A

cwoods-cpointe added the enhancement label Aug 27, 2024
cwoods-cpointe changed the title from "Feature: Default spark-infrastructure Helm charts to V2 and ensure allow migration" to "Feature: Default spark-infrastructure Helm charts to V2 and ensure migration" Aug 27, 2024

cwoods-cpointe commented Aug 28, 2024

Running into issues with the current implementation:

  • The hive-metastore-db/service can sometimes enter a crash loop, requiring a restart of the db pod
  • When starting the pyspark pipeline, it fails to find the configmap spark-config; the configmap is being created as {{ .Release.Name }}-spark-conf instead
  • When starting the Spark driver, it fails to find the spark.eventLog.dir location; it looks for /opt/spark/spark-events instead of the S3 path when we are using s3-local (see the sketch after this list)
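
For context, a hypothetical sparkConf fragment showing the intended shape of the event-log settings when using s3-local; the bucket name and endpoint are assumptions, not the chart's actual values:

# Hypothetical Spark event-log settings for the s3-local case; the observed bug
# left spark.eventLog.dir at the local default (/opt/spark/spark-events)
# instead of an s3a:// path like the one below.
sparkConf:
  spark.eventLog.enabled: "true"
  spark.eventLog.dir: "s3a://spark-infrastructure/spark-events"
  spark.hadoop.fs.s3a.endpoint: "http://s3-local:4566"
  spark.hadoop.fs.s3a.path.style.access: "true"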

cwoods-cpointe commented

While testing, I found a bug in our configuration store implementation. The PVC is supposed to be read-only but it is misconfigured. Will add the resolution to the DoD.

cwoods-cpointe commented

Currently running into an issue with the spark-events PVC that is being attached to the container. It is giving permission errors when writing to the path.

cwoods-cpointe commented

Resolved the spark-events PVC error by updating the node host path. Working on testing the migration steps and scripts.
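
For illustration, the kind of PersistentVolume fragment involved; a minimal sketch with an assumed path and size, not the chart's actual template:

# Hypothetical spark-events PersistentVolume; the fix amounted to pointing
# hostPath at a node directory the Spark user can write to.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: spark-events-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /tmp/spark-events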

cwoods-cpointe added a commit that referenced this issue Sep 6, 2024
cwoods-cpointe pushed a commit that referenced this issue Sep 6, 2024
Default all new applications to use the spark-infrastructure V2
charts. Mark any V1 users with a deprecation warning. Provide migration
instructions for going from V1 -> V2.

Update spark-events to be saved to a more dynamic PVC instead of an S3
bucket.

Signed-off-by: Peter McClonski <mcclonski_peter@bah.com>

#300 Resolve several issues with the V2 charts

Correct the spark-events PV host node mount point. Update the spark
configmap name to be in line with our documentation (-conf -> -config).

Corrected a bug in the configuration-store PVC being writable instead of
read-only.
cwoods-cpointe pushed commits that referenced this issue Sep 9, 2024
cwoods-cpointe linked a pull request Sep 9, 2024 that will close this issue

chang-annie commented Sep 10, 2024

All test steps are passing!

Test 1: testing the default v2 behavior

  • generated new test-project, added sample pipeline, dictionary, and record files, built successfully
  • confirmed spark-infrastructure helm chart is defaulted to v2
  • updated execute_step_impl
  • rebuilt, deployed through tilt, and successfully kicked off pyspark-persist pipeline
  • bashed into data-access pod and executed curl command to confirm it returns two records
  • confirmed that spark history server is visible and recorded the pyspark-persist pipeline execution

Test 2: testing the v1 -> v2 migration

  • generated new test-project, added sample pipeline, dictionary, and record files, built successfully
  • updated execute_step_impl
  • upgraded aiSSEMBLE to 1.9.0-SNAPSHOT and added supplementary sources
  • verified spark-inf v1 deprecation warnings exist in build output
  • followed upgrade steps for helm charts
  • verified spark-infrastructure contains the 3 subcharts - aissemble-hive-metastore-service-chart, aissemble-spark-history-chart, and aissemble-thrift-server-chart
  • rebuilt, deployed through tilt, and successfully kicked off pyspark-persist pipeline
  • bashed into data-access pod and executed curl command to confirm it returns two records
  • confirmed that spark history server is visible and recorded the pyspark-persist pipeline execution

Test 3: testing that the config-store PVC is a read-only file system

  • generated new test-project and added provided src folder
  • updated fermenter-mda plugin execution in test-project-deploy/pom.xml
  • added config-store-service to tilt
  • rebuilt and deployed via tilt
  • bashed into configuration-store pod and confirmed unable to create a new file (PVC is read-only)
