posts/arc-a770-testing/part-3/ #40

utterances-bot · 2023-08-17T15:56:10Z

Christian Mills - Testing Intel’s Arc A770 GPU for Deep Learning Pt. 3

This post covers my findings from training style transfer models and running Stable Diffusion with the 🤗 Diffusers library on the Arc A770 with Intel’s PyTorch extension.

https://christianjmills.com/posts/arc-a770-testing/part-3/

ColonelPhantom · 2023-08-17T15:56:11Z

It looks like Intel finally released a PyTorch 2.x version of IPEX a few weeks ago: https://github.com/intel/intel-extension-for-pytorch/releases/tag/v2.0.110%2Bxpu and it even looks like it natively supports Windows now https://intel.github.io/intel-extension-for-pytorch/xpu/2.0.110+xpu/tutorials/installation.html!

Are you still planning to re-run these tests?

cj-mills · 2023-08-17T16:16:04Z

@ColonelPhantom Yep, I plan to try the new version on Ubuntu and Windows once I wrap up my current tutorial.

parthvzala · 2024-03-28T15:33:58Z

How is it now? I'd like to know before buying. Thanks!

cj-mills · 2024-03-31T22:26:43Z

@parthvzala,

The most recent 2.1.20+xpu release of Intel's PyTorch extension partially works, depending on what you want to use it for.

There was a definite drop in quality with the PyTorch 2.0+ versions of the extension, and I'm still not sure what the source of the issues is.

Enabling the IPEX_XPU_ONEDNN_LAYOUT environment variable now prevents model accuracy from improving during training.
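For anyone trying to reproduce this, here is a minimal sketch of toggling the variable from Python rather than the shell. The helper name set_onednn_layout is hypothetical, and two things are assumptions rather than confirmed behavior: that "1" and "0" are the on and off values, and that the extension reads the variable at import time, which is why it is set before the import.

```python
import os


def set_onednn_layout(enabled: bool) -> str:
    """Set IPEX_XPU_ONEDNN_LAYOUT for the current process and return the value used.

    Hypothetical helper: the "1"/"0" convention is an assumption.
    """
    value = "1" if enabled else "0"
    os.environ["IPEX_XPU_ONEDNN_LAYOUT"] = value
    return value


# Assumption: the variable must be set before the extension is imported.
set_onednn_layout(False)

# The imports would follow here on a machine with the extension installed:
# import torch
# import intel_extension_for_pytorch as ipex
```

Running the same training script once with the layout disabled and once with it enabled should show whether the variable is what stalls the accuracy.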

Setting models to evaluation mode (e.g., with model.eval()) causes them to produce completely different (and useless) results from those they produce in training mode (e.g., with model.train()), at least for the image classification, YOLOX, and Mask R-CNN models that I tested.
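One rough way to put a number on this discrepancy for a classifier is to run the same batch through the model in both modes and measure how often the predicted classes match. The sketch below is only illustrative: agreement_rate is a hypothetical helper, and the commented PyTorch lines assume a classification model that returns per-class logits.

```python
from typing import Sequence


def agreement_rate(train_preds: Sequence[int], eval_preds: Sequence[int]) -> float:
    """Return the fraction of samples where both modes predict the same class."""
    if len(train_preds) != len(eval_preds):
        raise ValueError("prediction lists must have the same length")
    if not train_preds:
        raise ValueError("prediction lists must not be empty")
    matches = sum(1 for t, e in zip(train_preds, eval_preds) if t == e)
    return matches / len(train_preds)


# Collecting the predictions with PyTorch might look like this (assumed names):
# with torch.no_grad():
#     model.train()
#     train_preds = model(batch).argmax(dim=1).tolist()
#     model.eval()
#     eval_preds = model(batch).argmax(dim=1).tolist()
# print(agreement_rate(train_preds, eval_preds))
```

Some disagreement is expected even on a healthy setup, since dropout is active and batch normalization uses batch statistics in training mode. A forward pass in training mode also updates the batch normalization running statistics, so this check is best run on a throwaway copy of the model.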

The evaluation mode issues aside, the image classification and YOLOX object detection models did improve to a usable point during training. However, while the Mask R-CNN model did improve during the training process, it failed to reach a usable accuracy.

Stable Diffusion inference with Hugging Face Diffusers still works, with the A770 able to produce 1024x1024 images with Stable Diffusion XL without issue.

Unfortunately, I don't have time to try running any LLMs, as I need to take the Arc GPU back out of my system today.

vishnumadhu365 · 2024-04-05T13:26:22Z

@cj-mills
Hi Christian,

For the issues you ran into on the Arc A770, it would be great if you could post them as issues on the Intel Extension for PyTorch GitHub repo. That space is actively monitored by Intel engineers (like me).

Appreciate the detailed technical blog and looking forward to more interesting tutorials from you!

cj-mills · 2024-04-08T04:33:23Z

@vishnumadhu365

A reader opened a related issue in February, but I can make another one with more details when I have time to reinstall the Arc card in my desktop.

TheMrCodes mentioned this issue May 10, 2024

Training loss does not improve when running the cifar10 sample intel/intel-extension-for-pytorch#537

Closed
