BUG #8013: Remove register_alter_op_layout example from dev/use_pass_infra.py #9076

mbs-octoml · 2021-09-22T16:54:14Z

This tutorial registers a global layout transformation for conv2d, applying to
all targets, and that transformation is not well-formed. Later uses of conv2d
in the tutorials pick up that layout and then fail an assertion in the conv2d
type relation.

Better would be to register a transform for an entirely fake target, but
that is beyond my current level of expertise.
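
To make the failure mode concrete, here is a minimal sketch of how a process-wide registration outlives the example that made it, and how a scoped registration avoids that. This is plain Python, not the TVM API: the registry `_ALTER_LAYOUT`, the helpers `register_alter_layout`, `scoped_alter_layout` and `conv2d_layout`, and the layout strings are all hypothetical stand-ins chosen for illustration.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a process-wide op-attribute registry.
_ALTER_LAYOUT = {}


def register_alter_layout(op_name, fn):
    """Register globally: stays in effect until the process exits."""
    _ALTER_LAYOUT[op_name] = fn


@contextmanager
def scoped_alter_layout(op_name, fn):
    """Register only for the duration of a `with` block, then restore."""
    missing = object()
    previous = _ALTER_LAYOUT.get(op_name, missing)
    _ALTER_LAYOUT[op_name] = fn
    try:
        yield
    finally:
        if previous is missing:
            _ALTER_LAYOUT.pop(op_name, None)
        else:
            _ALTER_LAYOUT[op_name] = previous


def conv2d_layout(default="NCHW"):
    """What a later, unrelated example sees when it builds a conv2d."""
    fn = _ALTER_LAYOUT.get("nn.conv2d")
    return fn(default) if fn else default


def to_nchw16c(layout):
    # Illustrative transform that ignores its input layout.
    return "NCHW16c"


# Scoped registration: visible inside the block, gone afterwards.
with scoped_alter_layout("nn.conv2d", to_nchw16c):
    inside = conv2d_layout()
after_scope = conv2d_layout()

# Global registration: every later caller in this process is affected.
register_alter_layout("nn.conv2d", to_nchw16c)
after_global = conv2d_layout()
```

The scoped form restores the previous entry even if the body raises, which is the property a tutorial would need so that it cannot affect whatever runs after it in the same process.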

In general our use of sphinx/sphinx_gallery for running and rendering the
tutorials is highly suspect since there is no inter-example isolation:

- Examples using tensorflow will gobble up GPU memory and not give it back.
- Any examples which use any of the (many!) global registration mechanisms
  need to ensure the registrant is safe across all tutorials.

I recall seeing a sphinx_gallery thread where they said they'd prefer not to
work on process-level isolation, but it's probably worth pinging again.
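
As a rough illustration of what process-level isolation buys, the sketch below runs two tiny examples first in separate interpreters and then in one shared interpreter. It does not reflect how sphinx_gallery executes examples; `EXAMPLE_A`, `EXAMPLE_B`, `run_isolated`, `run_shared` and the `sys.leaked_layout` attribute are hypothetical names used only to show the effect.

```python
import contextlib
import io
import subprocess
import sys

# Two hypothetical examples: the first mutates interpreter-wide state,
# the second reports whether it can see that mutation.
EXAMPLE_A = "import sys; sys.leaked_layout = 'NCHW16c'"
EXAMPLE_B = "import sys; print(getattr(sys, 'leaked_layout', 'clean'))"


def run_isolated(source):
    """Run one example in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def run_shared(source):
    """Run one example inside this interpreter and return its stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {"__name__": "__main__"})
    return buffer.getvalue().strip()


# Separate processes: the state set by A dies with A's interpreter.
run_isolated(EXAMPLE_A)
isolated = run_isolated(EXAMPLE_B)

# Shared process: B observes whatever A left behind.
run_shared(EXAMPLE_A)
shared = run_shared(EXAMPLE_B)
```

The cost of the isolated form is a fresh interpreter start (and any heavy imports) per example, which is presumably part of why a gallery runner might prefer not to do it by default.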

While digging into this I noticed we had a slicing cast in AlterOpLayout due
to a derived class of ObjectRef introducing virtuals. I moved the virtuals to
the corresponding Node classes. In this case we got away with it since the
ObjectRef happened to not get copied but we were on very thin ice.


jroesch · 2021-09-23T16:34:50Z

@mbs-octoml can we just put a backlog item on fixing the tutorial? going to merge for CI

junrushao · 2021-09-23T16:41:36Z

Thanks @mbs-octoml @jroesch!


mbs-octoml requested review from anijain2305, areusch, comaniac, jroesch, junrushao, jwfromm, MarisaKirisame, mbrookhart, merrymercy, slyubomirsky, tqchen, vinx13, wweic, yzhliu, zhiics and ZihengJiang as code owners September 22, 2021 16:54

mbs-octoml mentioned this pull request Sep 22, 2021

[Bug] tutorials do not build from a clean source tree #9013

Closed

[checkpoint] Woops, forgot there was an extra AlterOpLayout

bc96d94

I should have run locally, there goes 6hrs of CI.

This was referenced Sep 23, 2021

Move the allocates of AoT codegen to be TVMBAWs #9065

Merged

[4/6] Arm(R) Ethos(TM)-U NPU TIR to CS for Conv2D #8811

Merged

mbs-octoml mentioned this pull request Sep 23, 2021

[Relay] Prepare for merging context_analysis.cc and device_annotation.cc #9077

Merged

jroesch approved these changes Sep 23, 2021


junrushao merged commit e887286 into apache:main Sep 23, 2021

mbs-octoml deleted the mbs-issue-9013 branch September 23, 2021 21:03
