[backend] Metadata/Executions not written in 1.7.0-rc1 (New visualisations not working as a result) #6138
I've also just tested this issue with 1.7.0-rc2. I've tried kfp 1.4.0 and 1.6.6.
Downgrading the metadata images seems to have solved this for me:
I believe the …
Actually, reverting doesn't solve the problem completely: I still find that the Statistics and Evaluator artifacts don't display. In the markdown they have zeroed-out IDs:
There seems to be a problem with ml-metadata 1.0.0, which is required by tfx 1.0.0 and matches the gRPC server of 1.7.1. Downgrading the metadata-grpc-server doesn't solve anything. I've gone back down to tfx 0.30, which uses ml-metadata 0.30.0. Visualisations work when I downgrade gcr.io/tfx-oss-public/ml_metadata_store_server to …
Hello @vaskozl, what TFX version are you using when you are on KFP 1.4.0?
/cc @jiyongjung0
Possibly related: tensorflow/tensorflow#50045
When on KFP 1.4.0, everything from TFX 0.27 to 1.0.0 wrote metadata (visualizations didn't work, but they do in the 1.7.0 release). I do recall having to bump the grpcio version in requirements up and down while hopping up to TFX 1.0.0. With TFX 1.0.0 on Kubeflow 1.7.0 (with metadata), no metadata is written. As I said, I'm now using 1.7.0 with just the ml_metadata_store_server downgraded below 1.0.0.
I found a possible cause. It is related to changes in the way TFX stores its contexts since 1.0.0 (which in turn come from the new execution stack using TFX IR). In TFX 0.X, the contexts were:
However, in TFX 1.0 the contexts became:
So it seems like Kubeflow Pipelines cannot find the contexts (and artifacts) properly.
Thank you @jiyongjung0 for finding the difference between TFX 0.X and 1.0+! For backward compatibility, what should the KFP frontend do to detect whether a TFX pipeline is TFX 0.X or TFX 1.0+?
Unfortunately, it seems that there is no direct clue when finding executions. (Artifacts have …) I think that we can try to find the 1.0 context first, and fall back to the 0.X context if it is not found.
Makes sense to me to find the 1.0 context first and then fall back to 0.X.
Sounds good, I agree with the fallback strategy here.
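For illustration, a minimal sketch of this lookup with the fallback, using the ml-metadata Python client. The context type names ('pipeline_run' for TFX 1.0+, 'run' for TFX 0.X) and the gRPC service address are assumptions for the sketch, since the exact context lists are elided above.

```python
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Assumption: the in-cluster MLMD gRPC service address used by KFP.
config = metadata_store_pb2.MetadataStoreClientConfig(
    host='metadata-grpc-service.kubeflow', port=8080)
store = metadata_store.MetadataStore(config)

def find_run_context(run_id: str):
    """Try the TFX 1.0+ context type first, then fall back to TFX 0.X.

    The type names here are placeholders; substitute whatever context
    types the two TFX generations actually register.
    """
    context = store.get_context_by_type_and_name('pipeline_run', run_id)
    if context is None:
        context = store.get_context_by_type_and_name('run', run_id)
    return context
```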
Hello @jiyongjung0, currently we use this code logic to identify the execution for a specific node using the node ID (which is the corresponding pod ID). However, with the latest TFX integration we are unable to find this connection from the Execution; see the following example for a StatisticsGen execution. How do I fix the TFX integration so that I get the properties correctly, like the following (from an old deployment)?
Hi, this is a bug on the TFX side introduced in tensorflow/tfx@24fc5d1. It seems like we don't record pod names in TFX 1.0.0. I'll try to fix this ASAP, and will update the release plan.
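Once pod names are recorded again, the lookup described in the previous comment becomes a simple property match. A hedged sketch, reusing the MetadataStore client from the sketch above; the kfp_pod_name property key is taken from the fix commits referenced just below, and the helper name is hypothetical.

```python
def find_execution_for_pod(store, context_id: int, pod_name: str):
    """Return the execution in a run context whose recorded pod name matches.

    Relies on the kfp_pod_name custom property that the fix makes
    KubeflowDagRunner record on each execution.
    """
    for execution in store.get_executions_by_context(context_id):
        prop = execution.custom_properties.get('kfp_pod_name')
        if prop is not None and prop.string_value == pod_name:
            return execution
    return None
```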
Kubeflow Dag Runner should record additional execution property to store pod names. See also kubeflow/pipelines#6138. PiperOrigin-RevId: 391192276
Kubeflow Dag Runner should record additional execution property to store pod names. See also kubeflow/pipelines#6138. PiperOrigin-RevId: 391220221
I'm trying to include the above fix in TFX 1.2.0, which is expected to be released tomorrow. I'll update you when the release is finalized.
…4157) * Fixes missing kfp_pod_name execution property in Kubeflow Pipelines. Kubeflow Dag Runner should record additional execution property to store pod names. See also kubeflow/pipelines#6138. PiperOrigin-RevId: 391220221 Co-authored-by: jiyongjung <jiyongjung@google.com>
* Update RELEASE.md
* Update version.py
* Update version.py
* Update version.py
* Update dependencies.py
* Update RELEASE.md
* Fixes missing kfp_pod_name execution property in Kubeflow Pipelines. Kubeflow Dag Runner should record additional execution property to store pod names. See also kubeflow/pipelines#6138. PiperOrigin-RevId: 391220221
* Update RELEASE.md

Co-authored-by: jiyongjung <jiyongjung@google.com>
TFX 1.2.0 was released today and this should be fixed. Thank you again for reporting this!
Thanks a lot @jiyongjung0 for the quick fix and the TFX 1.2.0 release! The following is what I found:
Thank you so much!! For 2 (Transform output): it seems like an inconsistency in the TFX implementation. The artifact from Transform was added recently and I had never tried it before. I'll talk with other TFX folks and get back to you. For 3 (Evaluator): we need to change the …
Documenting the conversation: TFX now moves the information in the Execution to a Context with type …. KFP will consider the possibility of pulling the context for TFX.
@zijianjoy Was the TFX > 1.0.0 fix included in the KFP 1.7.0 release?
@ConverJens I can confirm it was. TFX 1.2.0 and Pipelines 1.7.0 work perfectly with no patches.
@vaskozl Great news, thank you for the information!
Yes @ConverJens, it is, as confirmed by @vaskozl. BTW, @Bobgy is working on a compatibility matrix for KFP and TFX (and more) which will show the working version combinations.
@zijianjoy Great! A compatibility matrix would be awesome!
Hello @ConverJens, you can now check out the compatibility matrix at https://www.kubeflow.org/docs/components/pipelines/installation/compatibility-matrix/.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Hi there. It also had an issue with getKfpPod using KFP_POD_NAME, because kfpRun wasn't written into the DB.
Closing this issue as there has been no activity since 2022. /close
@rimolive: Closing this issue.
/kind bug
I upgraded Kubeflow from 1.4.0 to 1.7.0-rc1 with the platform-agnostic manifests.
While I now see correct visualizations of statistics from runs that happened before upgrading to 1.7.0-rc1, new runs only display the markdown details.
The TFX pipelines I submit are exactly the same. On the new runs the ML Metadata tab of the components prints:
"Corresponding ML Metadata not found."
Furthermore, I don't see any new executions on the executions page, despite running many pipelines since upgrading.
I don't see anything special in the logs of the TFX pods except:
But that was present before upgrading to 1.7.0-rc1.
The only error I see in the metadata-grpc-deployment pod is:
Which I also think is normal?
Basically, I don't think executions and artifacts are getting written to the DB for some reason in 1.7.0-rc1, and I'm not sure how to debug this. As far as I can see, this is what causes the visualizations not to show up.
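One way to check this hypothesis directly is to query MLMD and see whether any executions exist at all. A minimal sketch with the ml-metadata Python client, assuming the metadata gRPC service has been port-forwarded to localhost:8080 (the service name is an assumption):

```python
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Assumes: kubectl -n kubeflow port-forward svc/metadata-grpc-service 8080:8080
config = metadata_store_pb2.MetadataStoreClientConfig(host='localhost', port=8080)
store = metadata_store.MetadataStore(config)

# If recent pipeline runs wrote metadata, new executions will show up here.
for execution in store.get_executions()[-10:]:
    print(execution.id, execution.type_id, execution.last_known_state)
```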
Metadata in the TFX pipelines is configured via the get_default_kubeflow_metadata_config function from tfx.orchestration.kubeflow.
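For reference, a minimal sketch of how that configuration is typically wired into a KubeflowDagRunner; the compile_pipeline wrapper name is hypothetical, but the TFX functions are the ones named above.

```python
from tfx.orchestration import pipeline as tfx_pipeline
from tfx.orchestration.kubeflow import kubeflow_dag_runner

def compile_pipeline(pipeline: tfx_pipeline.Pipeline) -> None:
    # Point TFX at the in-cluster metadata gRPC service (the default config).
    metadata_config = kubeflow_dag_runner.get_default_kubeflow_metadata_config()
    runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
        kubeflow_metadata_config=metadata_config)
    kubeflow_dag_runner.KubeflowDagRunner(config=runner_config).run(pipeline)
```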
Environment:
Kubeflow version: 1.4.0 -> 1.7.0-rc1
kfctl version: Not used. Using tfx.orchestration.kubeflow to submit pipelines.
Kubernetes platform: Upstream kubeadm: k8s v1.20.5
OS (e.g. from /etc/os-release): CentOS 8