Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't loop over output from lightweight function #2692

Closed
ColinCampbell-GBI opened this issue Dec 3, 2019 · 7 comments · Fixed by #2711
Closed

Can't loop over output from lightweight function #2692

ColinCampbell-GBI opened this issue Dec 3, 2019 · 7 comments · Fixed by #2711

Comments

@ColinCampbell-GBI
Copy link

What happened:

When calling ParallelFor on the output of a lightweight python component, I get an error saying that the task is not defined.

@dsl.pipeline(
    name="Parallel Pipeline",
)
def parallel_pipeline(
        project_id,
        region,
        pipeline_root,
        job_config=''
):
    get_areas_op = func_to_container_op(get_active_areas)
    test_op = func_to_container_op(test)

    areas_task = get_areas_op()

    with dsl.ParallelFor(areas_task.output) as item:
        test_op(item)

Pipeline returns the error

invalid spec: templates.parallel-pipeline.tasks.for-loop-for-loop-c9def31b-1.dependencies[1] dependency 'Get active areas' not defined

This occurs when triggering a run through create_run_from_pipeline_func. If the pipeline is compiled and uploaded using the browser, attempting to view the pipeline leads to a blank screen with the following errors in the console.

react-dom.production.min.js:179 TypeError: Cannot read property 'y' of undefined
    at Graph.tsx:161
    at Array.forEach (<anonymous>)
    at t.render (Graph.tsx:136)
    at so (react-dom.production.min.js:163)
    at ao (react-dom.production.min.js:162)
    at po (react-dom.production.min.js:169)
    at Ko (react-dom.production.min.js:204)
    at Xo (react-dom.production.min.js:205)
    at Da (react-dom.production.min.js:220)
    at Ma (react-dom.production.min.js:219)
bo @ react-dom.production.min.js:179
To.n.callback @ react-dom.production.min.js:190
ci @ react-dom.production.min.js:128
ui @ react-dom.production.min.js:128
Pa @ react-dom.production.min.js:229
Da @ react-dom.production.min.js:220
Ma @ react-dom.production.min.js:219
Sa @ react-dom.production.min.js:216
ea @ react-dom.production.min.js:214
enqueueSetState @ react-dom.production.min.js:134
x.setState @ react.production.min.js:13
t.setStateSafe @ Page.tsx:80
(anonymous) @ PipelineDetails.tsx:341
(anonymous) @ PipelineDetails.tsx:1
(anonymous) @ PipelineDetails.tsx:1
a @ PipelineDetails.tsx:1
Promise.then (async)
l @ PipelineDetails.tsx:1
a @ PipelineDetails.tsx:1
Promise.then (async)
l @ PipelineDetails.tsx:1
(anonymous) @ PipelineDetails.tsx:1
F @ PipelineDetails.tsx:1
t.load @ PipelineDetails.tsx:243
(anonymous) @ PipelineDetails.tsx:240
(anonymous) @ PipelineDetails.tsx:1
(anonymous) @ PipelineDetails.tsx:1
(anonymous) @ PipelineDetails.tsx:1
F @ PipelineDetails.tsx:1
t.componentDidMount @ PipelineDetails.tsx:239
Pa @ react-dom.production.min.js:229
Da @ react-dom.production.min.js:220
Ma @ react-dom.production.min.js:219
La @ react-dom.production.min.js:231
Tn @ react-dom.production.min.js:85
Graph.tsx:161 Uncaught (in promise) TypeError: Cannot read property 'y' of undefined
    at Graph.tsx:161
    at Array.forEach (<anonymous>)
    at t.render (Graph.tsx:136)
    at so (react-dom.production.min.js:163)
    at ao (react-dom.production.min.js:162)
    at po (react-dom.production.min.js:169)
    at Ko (react-dom.production.min.js:204)
    at Xo (react-dom.production.min.js:205)
    at Da (react-dom.production.min.js:220)
    at Ma (react-dom.production.min.js:219)

What did you expect to happen:

ParallelFor to run a task for each entry in the list returned by get_active_areas

Anything else you would like to add:

Pipeline compiled with 0.1.36, run on kubeflow 0.7.0

@ColinCampbell-GBI
Copy link
Author

ColinCampbell-GBI commented Dec 3, 2019

Another error occurs if the python function compiled with func_to_container_op and passed into the ParallelFor has a single word name. e.g. def areas() -> list:

...
  File "/Users/colincampbell/.virtualenvs/venvs/recommendations/lib/python3.7/site-packages/kfp/_client.py", line 342, in create_run_from_pipeline_func
    compiler.Compiler().compile(pipeline_func, pipeline_package_path, pipeline_conf=pipeline_conf)
  File "/Users/colincampbell/.virtualenvs/venvs/recommendations/lib/python3.7/site-packages/kfp/compiler/compiler.py", line 846, in compile
    package_path=package_path)
  File "/Users/colincampbell/.virtualenvs/venvs/recommendations/lib/python3.7/site-packages/kfp/compiler/compiler.py", line 908, in _create_and_write_workflow
    pipeline_conf)
  File "/Users/colincampbell/.virtualenvs/venvs/recommendations/lib/python3.7/site-packages/kfp/compiler/compiler.py", line 790, in _create_workflow
    workflow = fix_big_data_passing(workflow)
  File "/Users/colincampbell/.virtualenvs/venvs/recommendations/lib/python3.7/site-packages/kfp/compiler/_data_passing_rewriter.py", line 155, in fix_big_data_passing
    upstream_template_name = task_name_to_template_name[upstream_task_name]
KeyError: 'Active'

@Ark-kun
Copy link
Contributor

Ark-kun commented Dec 4, 2019

There seems to be two bugs:

  1. PipelineParam.op_name is not getting sanitized for when added as dependency by ParallelFor. @numerology
  2. UX handling of invalid pipelines. @Bobgy

This also highlights the lack of validation in Backend's create_run and upload_pipeline methods. @IronPan @jingzhang36 @rmgogogo

@numerology
Copy link

@ColinCampbell-GBI do you mind sharing your get_active_areas and test functions so that we can easily verify the fix? Thanks!

@Ark-kun
Copy link
Contributor

Ark-kun commented Dec 5, 2019

I can reproduce the issue:

import kfp

produce_op = kfp.components.load_component_from_text('''\
name: Produce list
outputs:
- name: data_list
implementation:
  container:
    image: busybox
    command:
    - sh
    - -c
    - echo "[1, 2, 3]" > "$0"
    - outputPath: data_list
''')

consume_op = kfp.components.load_component_from_text('''\
name: Consume data
inputs:
- name: data
implementation:
  container:
    image: busybox
    command:
    - echo
    - inputValue: data
''')

def pip1():
    source_task = produce_op()
    with kfp.dsl.ParallelFor(source_task.output) as item:
        consume_op(item)

kfp.compiler.Compiler()._compile(pip1)
...
   {'dag': {'tasks': [{'arguments': {'parameters': [{'name': 'produce-list-data_list',
          'value': '{{item}}'}]},
       'dependencies': ['produce-list', 'Produce list'],
       'name': 'for-loop-for-loop-75795c46-1',
       'template': 'for-loop-for-loop-75795c46-1',
       'withParam': '{{tasks.Produce list.outputs.parameters.Produce list-data_list}}'},
      {'name': 'produce-list', 'template': 'produce-list'}]},
...

As you see, the task name is not sanitized here (has spaces).

@Bobgy
Copy link
Contributor

Bobgy commented Dec 5, 2019

Thanks Alexey! I will fix UI handling

@ColinCampbell-GBI
Copy link
Author

@ColinCampbell-GBI do you mind sharing your get_active_areas and test functions so that we can easily verify the fix? Thanks!

def get_active_areas() -> list:
    return [
        {"a": 1},
        {"b": 2},
    ]


def test(areas: dict) -> str:
    print(areas)
    return "test"

@numerology
Copy link

It seems like we need to sanitize the output names eagerly during compilation. Will put together a fix shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants