updated installation guide and blogs_publications (#773)
add wheel links

fine tune docs

jingxu10 authored May 19, 2022
1 parent 627f1ef commit 5d07b1b
Showing 4 changed files with 91 additions and 15 deletions.
1 change: 1 addition & 0 deletions docs/tutorials/blogs_publications.md
@@ -1,6 +1,7 @@
Blogs & Publications
====================

* [Accelerating PyTorch with Intel® Extension for PyTorch\*](https://medium.com/pytorch/accelerating-pytorch-with-intel-extension-for-pytorch-3aef51ea3722)
* [Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost’s new BFloat16 capability](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html)
* [Accelerate PyTorch with the extension and oneDNN using Intel BF16 Technology](https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f)
* *Note*: APIs mentioned in that article are deprecated.
81 changes: 67 additions & 14 deletions docs/tutorials/examples.md
@@ -7,12 +7,20 @@ Examples

#### Code Changes Highlight

Only a few lines of code changes are required to use Intel® Extension for PyTorch\* for training.

The recommended code changes are:
1. Apply `torch.channels_last` to both the model object and the input data to improve CPU resource usage efficiency.
2. Invoke the `ipex.optimize` function to apply optimizations to the model object, as well as to an optimizer object.


```
...
import torch
import intel_extension_for_pytorch as ipex
...
model = Model()
model = model.to(memory_format=torch.channels_last)
criterion = ...
optimizer = ...
model.train()
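# A minimal sketch of the remaining recommended change, assuming a typical
# training loop over a DataLoader named train_loader (placeholder names):
model, optimizer = ipex.optimize(model, optimizer=optimizer)
for data, target in train_loader:
    data = data.to(memory_format=torch.channels_last)
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()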
@@ -56,6 +64,7 @@ train_loader = torch.utils.data.DataLoader(
)
model = torchvision.models.resnet50()
model = model.to(memory_format=torch.channels_last)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
model.train()
@@ -104,6 +113,7 @@ train_loader = torch.utils.data.DataLoader(
)
model = torchvision.models.resnet50()
model = model.to(memory_format=torch.channels_last)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
model.train()
@@ -116,7 +126,7 @@ for batch_idx, (data, target) in enumerate(train_loader):
data = data.to(memory_format=torch.channels_last)
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
print(batch_idx)
torch.save({
@@ -193,6 +203,10 @@ torch.save({

## Inference

Channels last is a memory layout format that is friendlier to Intel architecture. It is recommended for computer vision workloads. Enabling it is as simple as invoking the `to(memory_format=torch.channels_last)` function on the model object and the input data.

Moreover, the `optimize` function of Intel® Extension for PyTorch\* applies optimizations to the model and can bring performance boosts. It is recommended to apply the `optimize` function to the model object for both computer vision and NLP workloads.
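
The following is a minimal sketch of how these two recommendations combine for inference, assuming a torchvision ResNet-50 and a random 224x224 input purely as placeholders:

```
import torch
import torchvision.models as models
#################### code changes ####################
import intel_extension_for_pytorch as ipex
######################################################

model = models.resnet50(pretrained=True)
model.eval()
# Recommendation 1: channels last memory layout for both the model and the data
model = model.to(memory_format=torch.channels_last)
data = torch.rand(1, 3, 224, 224).to(memory_format=torch.channels_last)

#################### code changes ####################
# Recommendation 2: apply the optimize function to the model object
model = ipex.optimize(model)
######################################################

with torch.no_grad():
  model(data)
```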

### Float32

#### Imperative Mode
@@ -244,7 +258,7 @@ with torch.no_grad():

#### TorchScript Mode

It is highly recommended for users to take advantage of Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.

##### Resnet50

@@ -301,6 +315,10 @@ with torch.no_grad():

### BFloat16

Similar to running with FP32, the `optimize` function also works for the BFloat16 data type. The only difference is setting the `dtype` parameter to `torch.bfloat16`.

Auto Mixed Precision (AMP) is recommended when working with the BFloat16 data type.
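
A minimal sketch for BFloat16 inference, assuming the same placeholder ResNet-50 scenario as in the FP32 sketch above, only adds the `dtype` argument and the CPU AMP context:

```
import torch
import torchvision.models as models
#################### code changes ####################
import intel_extension_for_pytorch as ipex
######################################################

model = models.resnet50(pretrained=True)
model.eval()
model = model.to(memory_format=torch.channels_last)
data = torch.rand(1, 3, 224, 224).to(memory_format=torch.channels_last)

#################### code changes ####################
model = ipex.optimize(model, dtype=torch.bfloat16)
######################################################

# Auto Mixed Precision context for BFloat16 on CPU
with torch.no_grad(), torch.cpu.amp.autocast():
  model(data)
```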

#### Imperative Mode

##### Resnet50
@@ -352,7 +370,7 @@ with torch.no_grad():

#### TorchScript Mode

It is highly recommended for users to take advantage of Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.

##### Resnet50

@@ -412,6 +430,18 @@ with torch.no_grad():

#### Calibration

To calibrate a model with the INT8 data type, the required code changes are highlighted in the code snippet below.

Please follow the steps below:

1. Utilize the `torch.fx.experimental.optimization.fuse` function to perform op folding for better performance.
2. Import `intel_extension_for_pytorch` as `ipex`.
3. Instantiate a configuration object with the `ipex.quantization.QuantConf` function to save configuration data during calibration.
4. Iterate through the calibration dataset within the `ipex.quantization.calibrate` scope to perform the calibration.
5. Save the calibration data into a `json` file.
6. Invoke the `ipex.quantization.convert` function to apply the calibration configuration object to the FP32 model object to get an INT8 model.
7. Save the INT8 model into a `pt` file.

```
import os
import torch
@@ -420,39 +450,50 @@ model = Model()
model.eval()
data = torch.rand(<shape>)
# Applying torch.fx.experimental.optimization.fuse against model performs
# conv-batchnorm folding for better performance.
import torch.fx.experimental.optimization as optimization
model = optimization.fuse(model, inplace=True)
#################### code changes ####################
import intel_extension_for_pytorch as ipex
conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_affine)
######################################################
for d in calibration_data_loader():
  # conf will be updated with observed statistics during calibrating with the dataset
  with ipex.quantization.calibrate(conf):
    model(d)
conf.save('int8_conf.json', default_recipe=True)
with torch.no_grad():
  model = ipex.quantization.convert(model, conf, torch.rand(<shape>))
model.save('quantization_model.pt')
```

#### Deployment

##### Imperative Mode

In imperative mode, the INT8 model conversion is done on the fly.

Please follow the steps below:

1. Utilize the `torch.fx.experimental.optimization.fuse` function to perform op folding for better performance.
2. Import `intel_extension_for_pytorch` as `ipex`.
3. Load the calibration configuration object from the saved file.
4. Invoke the `ipex.quantization.convert` function to apply the calibration configuration object to the FP32 model object to get an INT8 model.
5. Run inference.

```
import torch
model = Model()
model.eval()
data = torch.rand(<shape>)
# Applying torch.fx.experimental.optimization.fuse against model performs
# conv-batchnorm folding for better performance.
import torch.fx.experimental.optimization as optimization
model = optimization.fuse(model, inplace=True)
@@ -463,15 +504,25 @@ conf = ipex.quantization.QuantConf('int8_conf.json')
######################################################
with torch.no_grad():
  model = ipex.quantization.convert(model, conf, torch.rand(<shape>))
model(data)
```

##### Graph Mode

In graph mode, the INT8 model is loaded from the saved file and can be used directly for inference.

Please follow the steps below:

1. Import `intel_extension_for_pytorch` as `ipex`.
2. Load the INT8 model from the saved file.
3. Run inference.

```
import torch
#################### code changes ####################
import intel_extension_for_pytorch as ipex
######################################################
model = torch.jit.load('quantization_model.pt')
model.eval()
@@ -481,6 +532,8 @@ with torch.no_grad():
model(data)
```

oneDNN provides [oneDNN Graph Compiler](https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview4/doc#onednn-graph-compiler) as a prototype feature that could boost performance for selective topologies. No code change is required. Please install [a binary](https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/installation.html#installation_onednn_graph_compiler) with this feature enabled. We verified this feature with `Bert-large`, `bert-base-cased`, `roberta-base`, `xlm-roberta-base`, `google-electra-base-generator` and `google-electra-base-discriminator`.

## C++

To work with libtorch, the C++ library of PyTorch, Intel® Extension for PyTorch\* provides its C++ dynamic library as well. The C++ library is supposed to handle inference workloads only, such as service deployment. For regular development, please use the Python interface. Compared to the usage of libtorch, no specific code changes are required, except for converting input data into the channels last data format. Compilation follows the recommended methodology with CMake. Detailed instructions can be found in the [PyTorch tutorial](https://pytorch.org/tutorials/advanced/cpp_export.html#depending-on-libtorch-and-building-the-application).
@@ -582,4 +635,4 @@ $ ldd example-app
## Model Zoo
Use cases that have already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.11-models). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r1.11-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scripts in the Model Zoo.
9 changes: 8 additions & 1 deletion docs/tutorials/installation.md
@@ -43,6 +43,7 @@ Prebuilt wheel files availability matrix for Python versions

| Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 |
| :--: | :--: | :--: | :--: | :--: | :--: |
| 1.11.200 | | ✔️ | ✔️ | ✔️ | ✔️ |
| 1.11.0 | | ✔️ | ✔️ | ✔️ | ✔️ |
| 1.10.100 | ✔️ | ✔️ | ✔️ | ✔️ | |
| 1.10.0 | ✔️ | ✔️ | ✔️ | ✔️ | |
@@ -63,6 +64,11 @@ Alternatively, you can also install the latest version with the following command:
python -m pip install intel_extension_for_pytorch -f https://software.intel.com/ipex-whl-stable
```

For pre-built wheel files with [oneDNN Graph Compiler](#installation_onednn_graph_compiler), please use the following command to perform the installation.
```
python -m pip install intel_extension_for_pytorch -f https://developer.intel.com/ipex-whl-dev
```

**Note:** For versions prior to 1.10.0, please use the package name `torch_ipex`, rather than `intel_extension_for_pytorch`.

**Note:** To install a package with a specific version, please run the following command.
@@ -76,7 +82,7 @@ python -m pip install <package_name>==<version_name> -f https://software.intel.c
```bash
git clone --recursive https://github.com/intel/intel-extension-for-pytorch
cd intel-extension-for-pytorch
git checkout v1.11.200

# if you are updating an existing checkout
git submodule sync
@@ -119,6 +125,7 @@ docker pull intel/intel-optimized-pytorch:latest

|Version|Pre-cxx11 ABI|cxx11 ABI|
|--|--|--|
| 1.11.200 | [libintel-ext-pt-1.11.200+cpu.run](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/libtorch_zip/libintel-ext-pt-shared-with-deps-1.11.200%2Bcpu.run) | [libintel-ext-pt-cxx11-abi-1.11.200+cpu.run](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/libtorch_zip/libintel-ext-pt-cxx11-abi-shared-with-deps-1.11.200%2Bcpu.run) |
| 1.11.0 | [libintel-ext-pt-1.11.0+cpu.run](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/libtorch_zip/libintel-ext-pt-1.11.0%2Bcpu.run) | [libintel-ext-pt-cxx11-abi-1.11.0+cpu.run](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/libtorch_zip/libintel-ext-pt-cxx11-abi-1.11.0%2Bcpu.run) |
| 1.10.100 | [libtorch-shared-with-deps-1.10.0%2Bcpu-intel-ext-pt-cpu-1.10.100.zip](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/wheels/v1.10/libtorch-shared-with-deps-1.10.0%2Bcpu-intel-ext-pt-cpu-1.10.100.zip) | [libtorch-cxx11-abi-shared-with-deps-1.10.0%2Bcpu-intel-ext-pt-cpu-1.10.100.zip](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/wheels/v1.10/libtorch-cxx11-abi-shared-with-deps-1.10.0%2Bcpu-intel-ext-pt-cpu-1.10.100.zip) |
| 1.10.0 | [intel-ext-pt-cpu-libtorch-shared-with-deps-1.10.0+cpu.zip](https://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/wheels/v1.10/intel-ext-pt-cpu-libtorch-shared-with-deps-1.10.0%2Bcpu.zip) | [intel-ext-pt-cpu-libtorch-cxx11-abi-shared-with-deps-1.10.0+cpu.zip](https://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/wheels/v1.10/intel-ext-pt-cpu-libtorch-cxx11-abi-shared-with-deps-1.10.0%2Bcpu.zip) |
15 changes: 15 additions & 0 deletions docs/tutorials/releases.md
@@ -1,6 +1,21 @@
Releases
=============

## 1.11.200

### Highlights

- Enable more fused operators to accelerate particular models.
  - Fuse `Convolution` and `LeakyReLU` ([#648](https://github.com/intel/intel-extension-for-pytorch/commit/d7603133f37375b3aba7bf744f1095b923ba979e))
  - Support [`torch.einsum`](https://pytorch.org/docs/stable/generated/torch.einsum.html) and fuse it with `add` ([#684](https://github.com/intel/intel-extension-for-pytorch/commit/b66d6d8d0c743db21e534d13be3ee75951a3771d))
  - Fuse `Linear` and `Tanh` ([#685](https://github.com/intel/intel-extension-for-pytorch/commit/f0f2bae96162747ed2a0002b274fe7226a8eb200))
- In addition to the original installation methods, this release provides Docker installation from [DockerHub](https://hub.docker.com/).
- Provided the [evaluation wheel packages](https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/installation.html#installation_onednn_graph_compiler) that could boost performance for selective topologies on top of the oneDNN Graph Compiler prototype feature.
  ***NOTE***: This is still at an early development stage and not fully mature yet, but feel free to reach out through GitHub tickets if you have any suggestions.

**[Full Changelog](https://github.com/intel/intel-extension-for-pytorch/compare/v1.11.0...v1.11.200)**


## 1.11.0

We are excited to announce the Intel® Extension for PyTorch\* 1.11.0-cpu release, tightly following the PyTorch 1.11 release. Along with the 1.11 extension, we focused on continually improving the OOB user experience and performance. Highlights include:
