Transform

Neural Compressor supports built-in preprocessing methods on different framework backends. Refer to the HelloWorld example to see how a transform is configured in a dataloader.
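The Usage column in the tables below prints each yaml entry on one line, which hides the nesting. The following is a minimal sketch of how such an entry nests, written as a Python dict and rendered as indented text. The helper `to_yaml_lines` and the enclosing `transform` key are hypothetical and used only for illustration; the exact position of this block inside a Neural Compressor yaml file is not shown here.

```python
def to_yaml_lines(node, indent=0):
    """Hypothetical helper: render a nested dict as indented YAML-style lines."""
    lines = []
    pad = " " * indent
    for key, value in node.items():
        if isinstance(value, dict):
            if value:
                # Non-empty mapping: key on its own line, children indented below.
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml_lines(value, indent + 2))
            else:
                # Transforms without parameters are written as an empty mapping.
                lines.append(f"{pad}{key}: {{}}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Each transform name maps to its parameters, as in the Usage column.
transform = {
    "Resize": {"size": 256, "interpolation": "bilinear"},
    "CenterCrop": {"size": [224, 224]},
    "RandomHorizontalFlip": {},
}

# The outer "transform" key is an assumption made for this sketch.
yaml_text = "\n".join(to_yaml_lines({"transform": transform}))
print(yaml_text)
```

Read this way, a usage cell such as `Resize: size: 256 interpolation: bilinear` is a `Resize` key whose children are `size` and `interpolation`.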

Transform support list

TensorFlow

Transform	Parameters	Comments	Usage(In yaml file)
Resize(size, interpolation)	size (list or int): Size of the result interpolation (str, default='bilinear'):Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'	Resize the input image to the given size	Resize: size: 256 interpolation: bilinear
CenterCrop(size)	size (list or int): Size of the result	Crops the given image at the center to the given size	CenterCrop: size: [10, 10] # or size: 10
RandomResizedCrop(size, scale, ratio, interpolation)	size (list or int): Size of the result scale (tuple or list, default=(0.08, 1.0)): range of the crop size relative to the original size ratio (tuple or list, default=(3. / 4., 4. / 3.)): range of the crop aspect ratio relative to the original aspect ratio interpolation (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest'	Crop the given image to a random size and aspect ratio	RandomResizedCrop: size: [10, 10] # or size: 10 scale: [0.08, 1.0] ratio: [3. / 4., 4. / 3.] interpolation: bilinear
Normalize(mean, std)	mean (list, default=[0.0]): means for each channel; if len(mean)=1, mean is broadcast to each channel, otherwise its length should be the same as the length of the image shape std (list, default=[1.0]): stds for each channel; if len(std)=1, std is broadcast to each channel, otherwise its length should be the same as the length of the image shape	Normalize an image with mean and standard deviation	Normalize: mean: [0.0, 0.0, 0.0] std: [1.0, 1.0, 1.0]
RandomCrop(size)	size (list or int): Size of the result	Crop the image at a random location to the given size	RandomCrop: size: [10, 10] # or size: 10
Compose(transform_list)	transform_list (list of Transform objects): list of transforms to compose	Composes several transforms together	If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them. In user code: from neural_compressor.experimental.data import TRANSFORMS; preprocess = TRANSFORMS(framework, 'preprocess'); resize = preprocess["Resize"](args); normalize = preprocess["Normalize"](args); compose = preprocess["Compose"]([resize, normalize]); sample = compose(sample) # sample: image, label
CropResize(x, y, width, height, size, interpolation)	x (int):Left boundary of the cropping area y (int):Top boundary of the cropping area width (int):Width of the cropping area height (int):Height of the cropping area size (list or int): resize to new size after cropping interpolation (str, default='bilinear'):Desired interpolation type, support 'bilinear', 'nearest' and 'bicubic'	Crop the input image with given location and resize it	CropResize: x: 0 y: 5 width: 224 height: 224 size: [100, 100] # or size: 100 interpolation: bilinear
RandomHorizontalFlip()	None	Horizontally flip the given image randomly	RandomHorizontalFlip: {}
RandomVerticalFlip()	None	Vertically flip the given image randomly	RandomVerticalFlip: {}
DecodeImage()	None	Decode a JPEG-encoded image to a uint8 tensor	DecodeImage: {}
EncodeJped()	None	Encode image to a Tensor of type string	EncodeJped: {}
Transpose(perm)	perm (list): A permutation of the dimensions of input image	Transpose image according to perm	Transpose: perm: [1, 2, 0]
ResizeWithRatio(min_dim, max_dim, padding)	min_dim (int, default=800): Resizes the image such that its smaller dimension == min_dim max_dim (int, default=1365): Ensures that the image longest side does not exceed this value padding (bool, default=False): If true, pads image with zeros so its size is max_dim x max_dim	Resize image with aspect ratio and pad it to max shape(optional). If the image is padded, the label will be processed at the same time. The input image should be np.array or tf.Tensor.	ResizeWithRatio: min_dim: 800 max_dim: 1365 padding: True
CropToBoundingBox(offset_height, offset_width, target_height, target_width)	offset_height (int): Vertical coordinate of the top-left corner of the result in the input offset_width (int): Horizontal coordinate of the top-left corner of the result in the input target_height (int): Height of the result target_width (int): Width of the result	Crops an image to a specified bounding box	CropToBoundingBox: offset_height: 10 offset_width: 10 target_height: 224 target_width: 224
Cast(dtype)	dtype (str, default='float32'): A dtype to convert image to	Convert image to given dtype	Cast: dtype: float32
ToArray()	None	Convert PIL Image to numpy array	ToArray: {}
Rescale()	None	Scale the values of image to [0,1]	Rescale: {}
AlignImageChannel(dim)	dim (int): The channel number of result image	Align the image channels. Currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported. This transform is going to be deprecated.	AlignImageChannel: dim: 3
ParseDecodeImagenet()	None	Parse features in Example proto	ParseDecodeImagenet: {}
ResizeCropImagenet(height, width, random_crop, resize_side, random_flip_left_right, mean_value, scale)	height (int): Height of the result width (int): Width of the result random_crop (bool, default=False): whether to random crop resize_side (int, default=256):desired shape after resize operation random_flip_left_right (bool, default=False): whether to random flip left and right mean_value (list, default=[0.0,0.0,0.0]):means for each channel scale (float, default=1.0):std value	Combination of a series of transforms which is applicable to images in Imagenet	ResizeCropImagenet: height: 224 width: 224 random_crop: False resize_side: 256 random_flip_left_right: False mean_value: [123.68, 116.78, 103.94] scale: 0.017
QuantizedInput(dtype, scale)	dtype(str): desired image dtype, support 'uint8', 'int8' scale(float, default=None):scaling ratio of each point in image	Convert the dtype of input to quantize it	QuantizedInput: dtype: 'uint8'
LabelShift(label_shift)	label_shift(int, default=0): number of label shift	Convert label to label - label_shift	LabelShift: label_shift: 0
BilinearImagenet(height, width, central_fraction, mean_value, scale)	height(int): Height of the result width(int):Width of the result central_fraction(float, default=0.875):fraction of size to crop mean_value(list, default=[0.0,0.0,0.0]):means for each channel scale(float, default=1.0):std value	Combination of a series of transforms which is applicable to images in Imagenet	BilinearImagenet: height: 224 width: 224 central_fraction: 0.875 mean_value: [0.0,0.0,0.0] scale: 1.0
SquadV1(label_file, vocab_file, n_best_size, max_seq_length, max_query_length, max_answer_length, do_lower_case, doc_stride)	label_file (str): path of label file vocab_file (str): path of vocabulary file n_best_size (int, default=20): The total number of n-best predictions to generate in the nbest_predictions.json output file max_seq_length (int, default=384): The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded max_query_length (int, default=64): The maximum number of tokens for the question. Questions longer than this will be truncated to this length max_answer_length (int, default=30): The maximum length of an answer that can be generated. This is needed because the start and end predictions are not conditioned on one another do_lower_case (bool, default=True): Whether to lower case the input text. Should be True for uncased models and False for cased models doc_stride (int, default=128): When splitting up a long document into chunks, how much stride to take between chunks	Postprocess the predictions of BERT on SQuAD	SquadV1: label_file: /path/to/label_file vocab_file: /path/to/vocab_file n_best_size: 20 max_seq_length: 384 max_query_length: 64 max_answer_length: 30 do_lower_case: True doc_stride: 128
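The Compose row above chains transforms over a sample that is an (image, label) pair. The following NumPy sketch illustrates that calling pattern, along with the length-1 broadcast described for Normalize. The classes here are stand-ins written for this example, not the Neural Compressor implementations, and dividing by 255 in `Rescale` assumes a uint8 input image.

```python
import numpy as np


class Rescale:
    """Stand-in: scale pixel values to [0, 1], assuming uint8 input."""

    def __call__(self, sample):
        image, label = sample
        return image.astype(np.float32) / 255.0, label


class Normalize:
    """Stand-in: subtract mean and divide by std, leaving the label untouched."""

    def __init__(self, mean=(0.0,), std=(1.0,)):
        self.mean = np.asarray(mean, dtype=np.float32)
        self.std = np.asarray(std, dtype=np.float32)

    def __call__(self, sample):
        image, label = sample
        # A length-1 mean/std broadcasts over every channel of an HWC image.
        return (image - self.mean) / self.std, label


class Compose:
    """Stand-in: apply each transform in order to the (image, label) sample."""

    def __init__(self, transform_list):
        self.transform_list = transform_list

    def __call__(self, sample):
        for transform in self.transform_list:
            sample = transform(sample)
        return sample


image = np.full((2, 2, 3), 255, dtype=np.uint8)  # white 2x2 RGB image
compose = Compose([Rescale(), Normalize(mean=[0.5], std=[0.5])])
out, label = compose((image, 7))
print(out.shape, out.dtype, label)
```

Because every transform takes and returns the whole sample, a transform that also needs to adjust the label can do so without changing how Compose calls it.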

PyTorch

Transform	Parameters	Comments	Usage(In yaml file)
Resize(size, interpolation)	size (list or int): Size of the result interpolation (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic'	Resize the input image to the given size	Resize: size: 256 interpolation: bilinear
CenterCrop(size)	size (list or int): Size of the result	Crops the given image at the center to the given size	CenterCrop: size: [10, 10] # or size: 10
RandomResizedCrop(size, scale, ratio, interpolation)	size (list or int): Size of the result scale (tuple or list, default=(0.08, 1.0)): range of the crop size relative to the original size ratio (tuple or list, default=(3. / 4., 4. / 3.)): range of the crop aspect ratio relative to the original aspect ratio interpolation (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic'	Crop the given image to a random size and aspect ratio	RandomResizedCrop: size: [10, 10] # or size: 10 scale: [0.08, 1.0] ratio: [3. / 4., 4. / 3.] interpolation: bilinear
Normalize(mean, std)	mean (list, default=[0.0]): means for each channel; if len(mean)=1, mean is broadcast to each channel, otherwise its length should be the same as the length of the image shape std (list, default=[1.0]): stds for each channel; if len(std)=1, std is broadcast to each channel, otherwise its length should be the same as the length of the image shape	Normalize an image with mean and standard deviation	Normalize: mean: [0.0, 0.0, 0.0] std: [1.0, 1.0, 1.0]
RandomCrop(size)	size (list or int): Size of the result	Crop the image at a random location to the given size	RandomCrop: size: [10, 10] # or size: 10
Compose(transform_list)	transform_list (list of Transform objects): list of transforms to compose	Composes several transforms together	If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them. In user code: from neural_compressor.experimental.data import TRANSFORMS; preprocess = TRANSFORMS(framework, 'preprocess'); resize = preprocess["Resize"](args); normalize = preprocess["Normalize"](args); compose = preprocess["Compose"]([resize, normalize]); sample = compose(sample) # sample: image, label
RandomHorizontalFlip()	None	Horizontally flip the given image randomly	RandomHorizontalFlip: {}
RandomVerticalFlip()	None	Vertically flip the given image randomly	RandomVerticalFlip: {}
Transpose(perm)	perm (list): A permutation of the dimensions of input image	Transpose image according to perm	Transpose: perm: [1, 2, 0]
CropToBoundingBox(offset_height, offset_width, target_height, target_width)	offset_height (int): Vertical coordinate of the top-left corner of the result in the input offset_width (int): Horizontal coordinate of the top-left corner of the result in the input target_height (int): Height of the result target_width (int): Width of the result	Crops an image to a specified bounding box	CropToBoundingBox: offset_height: 10 offset_width: 10 target_height: 224 target_width: 224
ToTensor()	None	Convert a PIL Image or numpy.ndarray to tensor	ToTensor: {}
ToPILImage()	None	Convert a tensor or an ndarray to PIL Image	ToPILImage: {}
Pad(padding, fill, padding_mode)	padding (int or tuple or list): Padding on each border fill (int or str or tuple): Pixel fill value for constant fill. Default is 0 padding_mode (str): Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant	Pad the given image on all sides with the given “pad” value	Pad: padding: 0 fill: 0 padding_mode: constant
ColorJitter(brightness, contrast, saturation, hue)	brightness (float or tuple of python:float (min, max)): How much to jitter brightness. Default is 0 contrast (float or tuple of python:float (min, max)): How much to jitter contrast. Default is 0 saturation (float or tuple of python:float (min, max)): How much to jitter saturation. Default is 0 hue (float or tuple of python:float (min, max)): How much to jitter hue. Default is 0	Randomly change the brightness, contrast, saturation and hue of an image	ColorJitter: brightness: 0 contrast: 0 saturation: 0 hue: 0
ToArray()	None	Convert PIL Image to numpy array	ToArray: {}
CropResize(x, y, width, height, size, interpolation)	x (int):Left boundary of the cropping area y (int):Top boundary of the cropping area width (int):Width of the cropping area height (int):Height of the cropping area size (list or int): resize to new size after cropping interpolation (str, default='bilinear'):Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'	Crop the input image with given location and resize it	CropResize: x: 0 y: 5 width: 224 height: 224 size: [100, 100] # or size: 100 interpolation: bilinear
Cast(dtype)	dtype (str, default ='float32') :The target data type	Convert image to given dtype	Cast: dtype: float32
AlignImageChannel(dim)	dim (int): The channel number of result image	Align the image channels. Currently only [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported, and the input image must be a PIL Image. This transform is going to be deprecated.	AlignImageChannel: dim: 3
ResizeWithRatio(min_dim, max_dim, padding)	min_dim (int, default=800): Resizes the image such that its smaller dimension == min_dim max_dim (int, default=1365): Ensures that the image longest side does not exceed this value padding (bool, default=False): If true, pads image with zeros so its size is max_dim x max_dim	Resize image with aspect ratio and pad it to max shape(optional). If the image is padded, the label will be processed at the same time. The input image should be np.array.	ResizeWithRatio: min_dim: 800 max_dim: 1365 padding: True
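The ResizeWithRatio parameters interact: `min_dim` sets the target for the shorter side, and `max_dim` caps the longer side. The following sketch works through that arithmetic under stated assumptions. The function `resize_with_ratio_shape` is hypothetical, the rounding rule is a guess, and the library may handle edge cases (such as images already smaller than `min_dim`) differently.

```python
def resize_with_ratio_shape(height, width, min_dim=800, max_dim=1365, padding=False):
    """Return (resized_shape, output_shape) for an image of height x width."""
    # Scale so that the smaller side equals min_dim.
    scale = min_dim / min(height, width)
    # If that pushes the longer side past max_dim, scale by the longer side instead.
    if max(height, width) * scale > max_dim:
        scale = max_dim / max(height, width)
    # Rounding to the nearest integer is an assumption of this sketch.
    resized = (round(height * scale), round(width * scale))
    # With padding=True the result is zero-padded to max_dim x max_dim.
    output = (max_dim, max_dim) if padding else resized
    return resized, output


# A 480x640 image: the shorter side reaches 800 without exceeding the cap.
landscape = resize_with_ratio_shape(480, 640)
# A 400x1600 image: the cap on the longer side wins, then padding fills the square.
panorama = resize_with_ratio_shape(400, 1600, padding=True)
print(landscape, panorama)
```

For very elongated images the shorter side ends up below `min_dim`, which is why the padded output shape is defined by `max_dim` rather than `min_dim`.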

MXNet

Transform	Parameters	Comments	Usage(In yaml file)
Resize(size, interpolation)	size (list or int): Size of the result interpolation (str, default='bilinear'):Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'	Resize the input image to the given size	Resize: size: 256 interpolation: bilinear
CenterCrop(size)	size (list or int): Size of the result	Crops the given image at the center to the given size	CenterCrop: size: [10, 10] # or size: 10
RandomResizedCrop(size, scale, ratio, interpolation)	size (list or int): Size of the result scale (tuple or list, default=(0.08, 1.0)): range of the crop size relative to the original size ratio (tuple or list, default=(3. / 4., 4. / 3.)): range of the crop aspect ratio relative to the original aspect ratio interpolation (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic'	Crop the given image to a random size and aspect ratio	RandomResizedCrop: size: [10, 10] # or size: 10 scale: [0.08, 1.0] ratio: [3. / 4., 4. / 3.] interpolation: bilinear
Normalize(mean, std)	mean (list, default=[0.0]): means for each channel; if len(mean)=1, mean is broadcast to each channel, otherwise its length should be the same as the length of the image shape std (list, default=[1.0]): stds for each channel; if len(std)=1, std is broadcast to each channel, otherwise its length should be the same as the length of the image shape	Normalize an image with mean and standard deviation	Normalize: mean: [0.0, 0.0, 0.0] std: [1.0, 1.0, 1.0]
RandomCrop(size)	size (list or int): Size of the result	Crop the image at a random location to the given size	RandomCrop: size: [10, 10] # or size: 10
Compose(transform_list)	transform_list (list of Transform objects): list of transforms to compose	Composes several transforms together	If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them. In user code: from neural_compressor.experimental.data import TRANSFORMS; preprocess = TRANSFORMS(framework, 'preprocess'); resize = preprocess["Resize"](args); normalize = preprocess["Normalize"](args); compose = preprocess["Compose"]([resize, normalize]); sample = compose(sample) # sample: image, label
CropResize(x, y, width, height, size, interpolation)	x (int):Left boundary of the cropping area y (int):Top boundary of the cropping area width (int):Width of the cropping area height (int):Height of the cropping area size (list or int): resize to new size after cropping interpolation (str, default='bilinear'):Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'	Crop the input image with given location and resize it	CropResize: x: 0 y: 5 width: 224 height: 224 size: [100, 100] # or size: 100 interpolation: bilinear
RandomHorizontalFlip()	None	Horizontally flip the given image randomly	RandomHorizontalFlip: {}
RandomVerticalFlip()	None	Vertically flip the given image randomly	RandomVerticalFlip: {}
CropToBoundingBox(offset_height, offset_width, target_height, target_width)	offset_height (int): Vertical coordinate of the top-left corner of the result in the input offset_width (int): Horizontal coordinate of the top-left corner of the result in the input target_height (int): Height of the result target_width (int): Width of the result	Crops an image to a specified bounding box	CropToBoundingBox: offset_height: 10 offset_width: 10 target_height: 224 target_width: 224
ToArray()	None	Convert NDArray to numpy array	ToArray: {}
ToTensor()	None	Converts an image NDArray or batch of image NDArray to a tensor NDArray	ToTensor: {}
Cast(dtype)	dtype (str, default ='float32') :The target data type	Convert image to given dtype	Cast: dtype: float32
Transpose(perm)	perm (list): A permutation of the dimensions of input image	Transpose image according to perm	Transpose: perm: [1, 2, 0]
AlignImageChannel(dim)	dim (int): The channel number of result image	Align the image channels. Currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported. This transform is going to be deprecated.	AlignImageChannel: dim: 3
ToNDArray()	None	Convert np.array to NDArray	ToNDArray: {}
ResizeWithRatio(min_dim, max_dim, padding)	min_dim (int, default=800): Resizes the image such that its smaller dimension == min_dim max_dim (int, default=1365): Ensures that the image longest side does not exceed this value padding (bool, default=False): If true, pads image with zeros so its size is max_dim x max_dim	Resize image with aspect ratio and pad it to max shape(optional). If the image is padded, the label will be processed at the same time. The input image should be np.array.	ResizeWithRatio: min_dim: 800 max_dim: 1365 padding: True
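The CropToBoundingBox and Transpose rows in the table above describe index arithmetic that is easy to check on a plain array. The following NumPy sketch uses the same parameter values as the usage cells. It illustrates the semantics only; the function `crop_to_bounding_box` is a stand-in written for this example, and the library's MXNet transforms operate on its own array types.

```python
import numpy as np


def crop_to_bounding_box(image, offset_height, offset_width, target_height, target_width):
    """Stand-in: slice an HWC image starting at the given top-left corner."""
    return image[
        offset_height:offset_height + target_height,
        offset_width:offset_width + target_width,
    ]


# A 256x320 RGB image in height, width, channel (HWC) layout.
hwc_image = np.zeros((256, 320, 3), dtype=np.uint8)
cropped = crop_to_bounding_box(hwc_image, 10, 10, 224, 224)

# perm lists, for each output axis, which input axis it takes.
# With perm [1, 2, 0], a channel-first (CHW) array becomes channel-last (HWC).
chw_image = np.zeros((3, 224, 224), dtype=np.float32)
transposed = np.transpose(chw_image, [1, 2, 0])

print(cropped.shape, transposed.shape)
```

Going the other way, from HWC to CHW, would use `perm: [2, 0, 1]`.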

ONNXRT

Transform	Parameters	Comments	Usage(In yaml file)
Resize(size, interpolation)	size (list or int): Size of the result interpolation (str, default='bilinear'):Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'	Resize the input image to the given size	Resize: size: 256 interpolation: bilinear
CenterCrop(size)	size (list or int): Size of the result	Crops the given image at the center to the given size	CenterCrop: size: [10, 10] # or size: 10
RandomResizedCrop(size, scale, ratio, interpolation)	size (list or int): Size of the result scale (tuple or list, default=(0.08, 1.0)): range of the crop size relative to the original size ratio (tuple or list, default=(3. / 4., 4. / 3.)): range of the crop aspect ratio relative to the original aspect ratio interpolation (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest'	Crop the given image to a random size and aspect ratio	RandomResizedCrop: size: [10, 10] # or size: 10 scale: [0.08, 1.0] ratio: [3. / 4., 4. / 3.] interpolation: bilinear
Normalize(mean, std)	mean (list, default=[0.0]): means for each channel; if len(mean)=1, mean is broadcast to each channel, otherwise its length should be the same as the length of the image shape std (list, default=[1.0]): stds for each channel; if len(std)=1, std is broadcast to each channel, otherwise its length should be the same as the length of the image shape	Normalize an image with mean and standard deviation	Normalize: mean: [0.0, 0.0, 0.0] std: [1.0, 1.0, 1.0]
RandomCrop(size)	size (list or int): Size of the result	Crop the image at a random location to the given size	RandomCrop: size: [10, 10] # or size: 10
Compose(transform_list)	transform_list (list of Transform objects): list of transforms to compose	Composes several transforms together	If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them. In user code: from neural_compressor.experimental.data import TRANSFORMS; preprocess = TRANSFORMS(framework, 'preprocess'); resize = preprocess["Resize"](args); normalize = preprocess["Normalize"](args); compose = preprocess["Compose"]([resize, normalize]); sample = compose(sample) # sample: image, label
CropResize(x, y, width, height, size, interpolation)	x (int):Left boundary of the cropping area y (int):Top boundary of the cropping area width (int):Width of the cropping area height (int):Height of the cropping area size (list or int): resize to new size after cropping interpolation (str, default='bilinear'):Desired interpolation type, support 'bilinear', 'nearest'	Crop the input image with given location and resize it	CropResize: x: 0 y: 5 width: 224 height: 224 size: [100, 100] # or size: 100 interpolation: bilinear
RandomHorizontalFlip()	None	Horizontally flip the given image randomly	RandomHorizontalFlip: {}
RandomVerticalFlip()	None	Vertically flip the given image randomly	RandomVerticalFlip: {}
CropToBoundingBox(offset_height, offset_width, target_height, target_width)	offset_height (int): Vertical coordinate of the top-left corner of the result in the input offset_width (int): Horizontal coordinate of the top-left corner of the result in the input target_height (int): Height of the result target_width (int): Width of the result	Crops an image to a specified bounding box	CropToBoundingBox: offset_height: 10 offset_width: 10 target_height: 224 target_width: 224
ToArray()	None	Convert PIL Image to numpy array	ToArray: {}
Rescale()	None	Scale the values of image to [0,1]	Rescale: {}
AlignImageChannel(dim)	dim (int): The channel number of result image	Align the image channels. Currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported. This transform is going to be deprecated.	AlignImageChannel: dim: 3
ResizeCropImagenet(height, width, random_crop, resize_side, random_flip_left_right, mean_value, scale)	height (int): Height of the result width (int): Width of the result random_crop (bool, default=False): whether to random crop resize_side (int, default=256):desired shape after resize operation random_flip_left_right (bool, default=False): whether to random flip left and right mean_value (list, default=[0.0,0.0,0.0]):mean for each channel scale (float, default=1.0):std value	Combination of a series of transforms which is applicable to images in Imagenet	ResizeCropImagenet: height: 224 width: 224 random_crop: False resize_side: 256 random_flip_left_right: False mean_value: [123.68, 116.78, 103.94] scale: 0.017
Cast(dtype)	dtype (str, default ='float32') :The target data type	Convert image to given dtype	Cast: dtype: float32
ResizeWithRatio(min_dim, max_dim, padding)	min_dim (int, default=800): Resizes the image such that its smaller dimension == min_dim max_dim (int, default=1365): Ensures that the image longest side does not exceed this value padding (bool, default=False): If true, pads image with zeros so its size is max_dim x max_dim	Resize image with aspect ratio and pad it to max shape(optional). If the image is padded, the label will be processed at the same time. The input image should be np.array.	ResizeWithRatio: min_dim: 800 max_dim: 1365 padding: True
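CenterCrop accepts `size` either as a single int or as a two-element list, as its usage cell in the table above notes. The following NumPy sketch shows one way that argument could be interpreted. The function `center_crop` is a stand-in written for this example, and using floor division when the margin is odd is an assumption, not a documented behavior of the library.

```python
import numpy as np


def center_crop(image, size):
    """Stand-in: crop the central region of an HW or HWC array."""
    # size may be an int (square crop) or a [height, width] pair.
    if isinstance(size, int):
        crop_h, crop_w = size, size
    else:
        crop_h, crop_w = size
    height, width = image.shape[:2]
    if crop_h > height or crop_w > width:
        raise ValueError("crop size exceeds image size")
    # Floor division puts any odd leftover pixel on the bottom/right margin.
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]


# A 6x8 single-channel image whose values encode their own position.
image = np.arange(6 * 8).reshape(6, 8)
patch = center_crop(image, [2, 4])  # explicit [height, width]
square = center_crop(image, 4)      # int shorthand for [4, 4]
print(patch.shape, square.shape)
```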
