posts/arc-a770-testing/part-3/ #40

Open
utterances-bot opened this issue Aug 17, 2023 · 6 comments

Comments

@utterances-bot

Christian Mills - Testing Intel’s Arc A770 GPU for Deep Learning Pt. 3

This post covers my findings from training style transfer models and running Stable Diffusion with the 🤗 Diffusers library on the Arc A770 with Intel’s PyTorch extension.

https://christianjmills.com/posts/arc-a770-testing/part-3/

Copy link

It looks like Intel finally released a PyTorch 2.x version of IPEX a few weeks ago (https://github.com/intel/intel-extension-for-pytorch/releases/tag/v2.0.110%2Bxpu), and it even looks like it natively supports Windows now (https://intel.github.io/intel-extension-for-pytorch/xpu/2.0.110+xpu/tutorials/installation.html)!

Are you still planning to re-run these tests?

@cj-mills
Owner

@ColonelPhantom Yep, I plan to try the new version on Ubuntu and Windows once I wrap up my current tutorial.

Copy link

How is it now? I'd like to know before buying. Thanks.

@cj-mills
Owner

@parthvzala,

The most recent 2.1.20+xpu release of Intel's PyTorch extension partially works, depending on what you want to use it for.

There was a definite drop in quality with the PyTorch 2.0+ versions of the extension, and I'm still not sure what the source of the issues is.

Enabling the IPEX_XPU_ONEDNN_LAYOUT environment variable now prevents model accuracy from improving during training.
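
For anyone who wants to compare training runs with and without this flag, here is a minimal sketch of toggling it from Python. The helper name set_onednn_layout is hypothetical, and two things are assumptions rather than confirmed behavior: that "1" is the enabling value, and that the variable has to be set before the extension is imported. Check Intel's documentation for your release.

```python
import os

ENV_VAR = "IPEX_XPU_ONEDNN_LAYOUT"


def set_onednn_layout(enabled: bool) -> None:
    """Set or clear the oneDNN layout flag for the current process.

    Hypothetical helper. Assumption: "1" enables the layout, and the
    variable must be in place before intel_extension_for_pytorch is imported.
    """
    if enabled:
        os.environ[ENV_VAR] = "1"
    else:
        # Remove the variable entirely so the extension falls back to its default.
        os.environ.pop(ENV_VAR, None)


# Leave the flag off for a baseline run; flip to True for the comparison run.
set_onednn_layout(False)

# Import the extension only after the flag is in its intended state, e.g.:
# import torch
# import intel_extension_for_pytorch as ipex
```

Running the same training script once with each setting and comparing the validation accuracy curves is one way to isolate the flag's effect.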

Setting models to evaluation mode (e.g., with model.eval()) causes them to produce completely different (and useless) results from those they produce in training mode (e.g., with model.train()), at least for the image classification, YOLOX, and Mask R-CNN models that I tested.
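
One way to put a number on that gap is to run the same batch through the model in both modes and measure how often the predicted classes match. The following is a framework-free sketch: prediction_agreement is a hypothetical helper, and the label lists are made-up stand-ins for the argmax outputs you would collect under model.train() and model.eval().

```python
from typing import Sequence


def prediction_agreement(train_preds: Sequence[int], eval_preds: Sequence[int]) -> float:
    """Return the fraction of samples where both modes predict the same class."""
    if len(train_preds) != len(eval_preds):
        raise ValueError("both prediction lists must have the same length")
    if not train_preds:
        raise ValueError("no predictions to compare")
    matches = sum(t == e for t, e in zip(train_preds, eval_preds))
    return matches / len(train_preds)


# Made-up labels for illustration only, not measurements from the A770.
train_mode_labels = [0, 2, 1, 1, 3]
eval_mode_labels = [0, 2, 1, 0, 0]

agreement = prediction_agreement(train_mode_labels, eval_mode_labels)
print(f"train/eval agreement: {agreement:.0%}")
```

Some disagreement is expected for models with dropout or batch normalization layers, since those behave differently in the two modes, so a low score is a prompt to investigate rather than proof of a bug.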

The evaluation mode issues aside, the image classification and YOLOX object detection models did improve to a usable point during training. However, while the Mask R-CNN model did improve during the training process, it failed to reach a usable accuracy.

Stable Diffusion inference with HuggingFace Diffusers still works, with the A770 able to produce 1024x1024 images with Stable Diffusion XL without issue.

Unfortunately, I don't have time to try running any LLMs, as I need to take the Arc GPU back out of my system today.

Copy link

@cj-mills
Hi Christian

It would be great if you could post the problems you faced on the Arc A770 as issues on the Intel Extension for PyTorch GitHub repo. That space is actively monitored by Intel engineers (like me).

Appreciate the detailed technical blog and looking forward to more interesting tutorials from you!

@cj-mills
Owner

cj-mills commented Apr 8, 2024

@vishnumadhu365

A reader opened a related issue in February, but I can open another one with more details when I have time to reinstall the Arc card in my desktop.
