MatchingEngineIndex.create_tree_ah_index in Vertex Pipelines times out after 900 seconds #1870

Closed
chrisk447 opened this issue Dec 27, 2022 · 3 comments
Labels
api: vertex-ai Issues related to the googleapis/python-aiplatform API.

Comments

@chrisk447

Environment details

  • Running in Vertex Pipelines
  • OS type and version: gcr.io/deeplearning-platform-release/base-cu100
  • google-cloud-aiplatform version: 1.20.0
  • KFP version: 1.8.18

Steps to reproduce
Build a kfp component using aiplatform.MatchingEngineIndex.create_tree_ah_index.

Include google-cloud-aiplatform==1.20.0 as a package to install. Use gcr.io/deeplearning-platform-release/base-cu100 as the Docker image. Build the pipeline JSON using the component function from kfp.v2.dsl.

Create a Vertex Pipeline using the pipeline json.

Expected result
Pipeline should continue running until matching engine index is fully created.

Code Example

def create_tree_ah_index(
    display_name: str,
    jsonl_formatted_data_uri: str,
    dimensions: int = 100,
    approximate_neighbors_count: int = 150,
    distance_measure_type: str = "DOT_PRODUCT_DISTANCE",
    leaf_node_embedding_count: int = 500,
    leaf_nodes_to_search_percent: float = 7,
    description: str = "ANN index",
    labels: dict = {"label_name": "label_value"},
    sync: bool = False
):
    from google.cloud import aiplatform
    import logging
    import time
    logging.basicConfig(level=logging.INFO)

    logging.info("ANN Index")
    try:
        tree_ah_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
            display_name=display_name,
            contents_delta_uri=jsonl_formatted_data_uri,
            dimensions=dimensions,
            approximate_neighbors_count=approximate_neighbors_count,
            distance_measure_type=distance_measure_type,
            leaf_node_embedding_count=leaf_node_embedding_count,
            leaf_nodes_to_search_percent=leaf_nodes_to_search_percent,
            description=description,
            labels=labels,
            sync=sync
        )
        while True:
            if tree_ah_index._are_futures_done():
                index_resource_name = tree_ah_index.resource_name
                logging.info("Index successfully created with ID : %s ", index_resource_name)
                break
            logging.info("Polling the operation every 3 minutes to create index...")
            time.sleep(180)

    except Exception as e:
        logging.exception("The index creation failed: {}".format(e))
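
The `while True` loop above has no upper bound of its own. As a minimal sketch of a deadline-bounded alternative, the helper below polls a completion check until it succeeds or a caller-supplied timeout expires. The name `poll_until_done` and its parameters are hypothetical and not part of google-cloud-aiplatform; the `clock` and `sleep` arguments exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def poll_until_done(
    is_done: Callable[[], bool],
    timeout_seconds: float,
    poll_interval_seconds: float = 180.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once is_done() is true, or False if the deadline passes first."""
    deadline = clock() + timeout_seconds
    while True:
        if is_done():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            # The caller decides how to report a timeout (log, raise, retry).
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval_seconds, remaining))
```

In the component above, the check passed as `is_done` could be `tree_ah_index._are_futures_done`, with `timeout_seconds` set to however long index creation is expected to take. Note that this only bounds the loop itself; it does not change any timeout applied inside the library.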

Stack trace

[KFP Executor 2022-12-22 18:33:31,182 INFO]: Create MatchingEngineIndex backing LRO: projects/589820861215/locations/us-central1/indexes/8058285535598215168/operations/2805451540867842048
[KFP Executor 2022-12-22 18:36:30,704 INFO]: Polling the operation every 3 minutes to create index...
[KFP Executor 2022-12-22 18:36:30,704 INFO]: Polling the operation every 3 minutes to create index...
[KFP Executor 2022-12-22 18:36:30,704 INFO]: Polling the operation every 3 minutes to create index...
[KFP Executor 2022-12-22 18:36:30,704 INFO]: Polling the operation every 3 minutes to create index...
[KFP Executor 2022-12-22 18:48:31,057 ERROR]: The index creation failed: MatchingEngineIndex resource has not been created. Resource failed with: Operation did not complete within the designated timeout of 900 seconds.
@product-auto-label product-auto-label bot added the api: vertex-ai Issues related to the googleapis/python-aiplatform API. label Dec 27, 2022
@sasha-gitg
Member

@chrisk447 This seems to be the result of a breaking change in python-api-core that introduced a 900s default timeout on LROs: googleapis/python-api-core#462

Please try downgrading to google-api-core<2.11.0.
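
To tell whether an environment is on the affected side of that version boundary, a small check like the following may help. It is a sketch using only the standard library; the helper names are hypothetical, and the 2.11 threshold is taken from the comment above rather than verified independently. Pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '2.10.2' -> (2, 10, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_before_2_11(version: str) -> bool:
    """True if the version is older than 2.11, comparing major and minor only."""
    return parse_release(version)[:2] < (2, 11)


def installed_api_core_version() -> Optional[str]:
    """Return the installed google-api-core version, or None if it is absent."""
    try:
        return metadata.version("google-api-core")
    except metadata.PackageNotFoundError:
        return None
```

Calling `is_before_2_11(installed_api_core_version())` inside the component, after checking for `None`, would show which side of the boundary the container image ended up on.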

@vam-google

vam-google commented Jan 9, 2023

The old behavior was incorrect: core libraries should never be assumed to poll an LRO indefinitely. Please provide actual timeouts for any API-specific calls, for example result(timeout=60*60) for calls to PollingFuture. It is still possible to mimic the old infinite-timeout behavior by calling result(timeout=None), but please do not do so unless absolutely necessary: infinite timeouts are an anti-pattern and allow writing code that may run indefinitely.
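
As an illustration of the explicit-timeout pattern, here is a minimal sketch of a wrapper that always passes a finite timeout to `result()`. The function `wait_for_result` is hypothetical. It works with any future-like object exposing `result(timeout=...)`; `concurrent.futures.Future` from the standard library serves as a stand-in here so the snippet runs without Google Cloud dependencies. Which exception a real LRO future raises on timeout is not covered by this sketch.

```python
import concurrent.futures

ONE_HOUR_SECONDS = 60 * 60  # mirrors the result(timeout=60*60) example above


def wait_for_result(operation, timeout_seconds: float = ONE_HOUR_SECONDS):
    """Block on a future-like object, always with an explicit finite timeout."""
    if timeout_seconds is None or timeout_seconds <= 0:
        # Refuse infinite or nonsensical waits; callers must pick a real bound.
        raise ValueError("timeout_seconds must be a positive number")
    return operation.result(timeout=timeout_seconds)


if __name__ == "__main__":
    # Stand-in future that is already complete.
    finished = concurrent.futures.Future()
    finished.set_result("index created")
    print(wait_for_result(finished))

    # Stand-in future that never completes: the wait ends after 0.1 seconds.
    pending = concurrent.futures.Future()
    try:
        wait_for_result(pending, timeout_seconds=0.1)
    except concurrent.futures.TimeoutError:
        print("timed out after the explicit deadline")
```

Rejecting `None` in the wrapper is a design choice that follows the advice above: an unbounded wait then has to be written out deliberately at the call site rather than inherited from a default.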

@sasha-gitg
Member

Fixed in 1.21.

copybara-service bot pushed a commit to kubeflow/pipelines that referenced this issue Mar 20, 2023
…oid dataset creation 900s timeout and remove the workaround. Reference: googleapis/python-aiplatform#1870 (comment)

PiperOrigin-RevId: 518075999
copybara-service bot pushed a commit to kubeflow/pipelines that referenced this issue Mar 23, 2023
…oid dataset creation 900s timeout and remove the workaround. Reference: googleapis/python-aiplatform#1870 (comment)

PiperOrigin-RevId: 518075999
copybara-service bot pushed a commit to kubeflow/pipelines that referenced this issue Apr 4, 2023
…oid dataset creation 900s timeout and remove the workaround. Reference: googleapis/python-aiplatform#1870 (comment)

PiperOrigin-RevId: 518075999
copybara-service bot pushed a commit to kubeflow/pipelines that referenced this issue Apr 5, 2023
…oid dataset creation 900s timeout and remove the workaround. Reference: googleapis/python-aiplatform#1870 (comment)

PiperOrigin-RevId: 518075999
copybara-service bot pushed a commit to kubeflow/pipelines that referenced this issue Apr 5, 2023
…oid dataset creation 900s timeout and remove the workaround. Reference: googleapis/python-aiplatform#1870 (comment)

PiperOrigin-RevId: 522122033
rd-pong pushed a commit to rd-pong/pipelines that referenced this issue Apr 26, 2023
…oid dataset creation 900s timeout and remove the workaround. Reference: googleapis/python-aiplatform#1870 (comment)

PiperOrigin-RevId: 522122033