Quantization Operators

Quantization is a model optimization technique that stores values in lower-precision formats, reducing a model's size and memory footprint at the cost of a small loss in accuracy.

CUDA Operators
at::Tensor _float_to_bfloat16_gpu(const at::Tensor &input)

Converts a tensor of float values into a tensor of Brain Floating Point (bfloat16) values.

Parameters:
    input – A tensor of float values

Returns:
    A new tensor with values from the input tensor converted to bfloat16.
at::Tensor _bfloat16_to_float_gpu(const at::Tensor &input)

Converts a tensor of Brain Floating Point (bfloat16) values into a tensor of float values.

Parameters:
    input – A tensor of bfloat16 values

Returns:
    A new tensor with values from the input tensor converted to float.
Tensor _float_to_FP8rowwise_gpu(const Tensor &input, const bool forward)

Converts a tensor of float values into a tensor of fp8 values.

Parameters:
    input – A tensor of float values. The dtype can be either SparseType::FP32, SparseType::FP16, or SparseType::BF16
    forward –

Throws:
    c10::Error – if input.dtype is not one of SparseType::FP32, SparseType::FP16, or SparseType::BF16.

Returns:
    A new tensor with values from the input tensor converted to fp8.
at::Tensor _FP8rowwise_to_float_gpu(const at::Tensor &input, bool forward, const int64_t output_dtype)

Converts a tensor of fp8 values into a tensor of float values.

Parameters:
    input – A tensor of fp8 values
    forward –
    output_dtype – The target floating point type, specified as the integer representation of the SparseType enum

Throws:
    c10::Error – if output_dtype is not one of SparseType::FP32, SparseType::FP16, or SparseType::BF16.

Returns:
    A new tensor with values from the input tensor converted to float (with dtype of either SparseType::FP32, SparseType::FP16, or SparseType::BF16).
Tensor _float_to_fused8bitrowwise_gpu(const Tensor &input)

Converts a tensor of float values into a tensor of fused 8-bit rowwise values.

Parameters:
    input – A tensor of float values

Returns:
    A new tensor with values from the input tensor converted to fused 8-bit rowwise.
Tensor _half_to_fused8bitrowwise_gpu(const Tensor &input)

Converts a tensor of at::Half values into a tensor of fused 8-bit rowwise values.

Parameters:
    input – A tensor of at::Half values

Returns:
    A new tensor with values from the input tensor converted to fused 8-bit rowwise.
Tensor _single_or_half_precision_to_fused8bitrowwise_gpu(const Tensor &input)

Converts a tensor of float (single precision) or at::Half values into a tensor of fused 8-bit rowwise values.

Parameters:
    input – A tensor of float or at::Half values

Returns:
    A new tensor with values from the input tensor converted to fused 8-bit rowwise.
at::Tensor _fused8bitrowwise_to_float_gpu(const at::Tensor &input)

Converts a tensor of fused 8-bit rowwise values into a tensor of float values.

Parameters:
    input – A tensor of fused 8-bit rowwise values

Returns:
    A new tensor with values from the input tensor converted to float.
at::Tensor _fused8bitrowwise_to_half_gpu(const at::Tensor &input)

Converts a tensor of fused 8-bit rowwise values into a tensor of at::Half values.

Parameters:
    input – A tensor of fused 8-bit rowwise values

Returns:
    A new tensor with values from the input tensor converted to at::Half.
at::Tensor _fused8bitrowwise_to_single_or_half_precision_gpu(const at::Tensor &input, const int64_t output_dtype, const bool scale_bias_last, const bool quant_padding_float_type)

Converts a tensor of fused 8-bit rowwise values into a tensor of float, at::Half, or at::BFloat16 values.

Parameters:
    input – A tensor of fused 8-bit rowwise values
    output_dtype – The target floating point type, specified as the integer representation of the SparseType enum
    scale_bias_last –
    quant_padding_float_type –

Throws:
    c10::Error – if output_dtype is not one of SparseType::FP32, SparseType::FP16, or SparseType::BF16.

Returns:
    A new tensor with values from the input tensor converted to float, at::Half, or at::BFloat16.
at::Tensor _fused8bitrowwise_to_float_mixed_dim_gpu(const at::Tensor &input, const at::Tensor &D_offsets, const int64_t output_dtype)

Converts a tensor of fused 8-bit rowwise values into a tensor of at::kFloat or at::kHalf values.

Parameters:
    input – A tensor of fused 8-bit rowwise values
    D_offsets –
    output_dtype – The target floating point type, specified as the integer representation of the SparseType enum

Throws:
    c10::Error – if output_dtype is not one of SparseType::FP32 or SparseType::FP16.

Returns:
    A new tensor with values from the input tensor converted to at::kFloat or at::kHalf.
Tensor _float_to_fusednbitrowwise_gpu(const Tensor &input, const int64_t bit_rate)

Converts a tensor of float values into a tensor of fused N-bit rowwise values.

Parameters:
    input – A tensor of float values
    bit_rate –

Returns:
    A new tensor with values from the input tensor converted to fused N-bit rowwise.
at::Tensor _half_to_fusednbitrowwise_gpu(const at::Tensor &input, const int64_t bit_rate)

Converts a tensor of at::Half values into a tensor of fused N-bit rowwise values.

Parameters:
    input – A tensor of at::Half values
    bit_rate –

Returns:
    A new tensor with values from the input tensor converted to fused N-bit rowwise.
Tensor _single_or_half_precision_to_fusednbitrowwise_gpu(const Tensor &input, const int64_t bit_rate)

Converts a tensor of float or at::Half values into a tensor of fused N-bit rowwise values.

Parameters:
    input – A tensor of float or at::Half values
    bit_rate –

Returns:
    A new tensor with values from the input tensor converted to fused N-bit rowwise.
at::Tensor _fusednbitrowwise_to_float_gpu(const at::Tensor &input, const int64_t bit_rate)

Converts a tensor of fused N-bit rowwise values into a tensor of float values.

Parameters:
    input – A tensor of fused N-bit rowwise values
    bit_rate –

Returns:
    A new tensor with values from the input tensor converted to float.
at::Tensor _fusednbitrowwise_to_half_gpu(const at::Tensor &input, const int64_t bit_rate)

Converts a tensor of fused N-bit rowwise values into a tensor of at::Half values.

Parameters:
    input – A tensor of fused N-bit rowwise values
    bit_rate –

Returns:
    A new tensor with values from the input tensor converted to at::Half.
at::Tensor _fusednbitrowwise_to_single_or_half_precision_gpu(const at::Tensor &input, const int64_t bit_rate, const int64_t output_dtype)

Converts a tensor of fused N-bit rowwise values into a tensor of float, at::Half, or at::BFloat16 values.

Parameters:
    input – A tensor of fused N-bit rowwise values
    bit_rate –
    output_dtype – The target floating point type, specified as the integer representation of the SparseType enum

Throws:
    c10::Error – if output_dtype is not one of SparseType::FP32, SparseType::FP16, or SparseType::BF16.

Returns:
    A new tensor with values from the input tensor converted to float, at::Half, or at::BFloat16, depending on output_dtype.
at::Tensor _float_to_hfp8_gpu(const at::Tensor &input, const int64_t ebits, const int64_t exponent_bias, const double max_pos)

Converts a tensor of float values into a tensor of Hybrid 8-bit Floating Point (hfp8) values.

Parameters:
    input – A tensor of float values
    ebits –
    exponent_bias –
    max_pos –

Throws:
    c10::Error – if ebits <= 0 or exponent_bias <= 0 (both must be positive).

Returns:
    A new tensor with values from the input tensor converted to hfp8.
at::Tensor _hfp8_to_float_gpu(const at::Tensor &input, const int64_t ebits, const int64_t exponent_bias)

Converts a tensor of Hybrid 8-bit Floating Point (hfp8) values into a tensor of float values.

Parameters:
    input – A tensor of hfp8 values
    ebits –
    exponent_bias –

Throws:
    c10::Error – if ebits <= 0 or exponent_bias <= 0 (both must be positive).

Returns:
    A new tensor with values from the input tensor converted to float.
at::Tensor _float_to_msfp_gpu(const at::Tensor &input, const int64_t bounding_box_size, const int64_t ebits, const int64_t mbits, const int64_t bias, const double min_pos, const double max_pos)

Converts a tensor of float values into a tensor of Microsoft Floating Point (msfp) values.

Parameters:
    input – A tensor of float values
    bounding_box_size –
    ebits –
    mbits –
    bias –
    min_pos –
    max_pos –

Returns:
    A new tensor with values from the input tensor converted to msfp.
at::Tensor _msfp_to_float_gpu(const at::Tensor &input, const int64_t ebits, const int64_t mbits, const int64_t bias)

Converts a tensor of Microsoft Floating Point (msfp) values into a tensor of float values.

Parameters:
    input – A tensor of msfp values
    ebits –
    mbits –
    bias –

Returns:
    A new tensor with values from the input tensor converted to float.
Tensor _float_to_paddedFP8rowwise_gpu(const Tensor &input, const bool forward, const int64_t row_dim)

Converts a tensor of float values into a tensor of padded fp8 rowwise values.

Parameters:
    input – A tensor of float values. The dtype can be either SparseType::FP32, SparseType::FP16, or SparseType::BF16
    forward –
    row_dim –

Returns:
    A new tensor with values from the input tensor converted to padded fp8 rowwise.
at::Tensor _paddedFP8rowwise_to_float_gpu(const at::Tensor &input, const bool forward, const int64_t row_dim, const int64_t output_last_dim, const int64_t output_dtype)

Converts a tensor of padded fp8 rowwise values into a tensor of float values.

Parameters:
    input – A tensor of padded fp8 rowwise values
    forward –
    row_dim –
    output_last_dim –
    output_dtype – The target floating point type, specified as the integer representation of the SparseType enum

Throws:
    c10::Error – if output_dtype is not one of SparseType::FP32, SparseType::FP16, or SparseType::BF16.

Returns:
    A new tensor with values from the input tensor converted to float.
CPU Operators
Tensor &_fused8bitrowwise_to_float_cpu_out(Tensor &output, const Tensor &input)

Tensor &_float_to_fused8bitrowwise_cpu_out(Tensor &output, const Tensor &input)

Tensor float_to_fused8bitrowwise_cpu(const Tensor &input)

Tensor half_to_fused8bitrowwise_cpu(const Tensor &input)

Tensor float_or_half_to_fused8bitrowwise_cpu(const Tensor &input)

Tensor fused8bitrowwise_to_float_cpu(const Tensor &input)

Tensor fused8bitrowwise_to_half_cpu(const Tensor &input)

Tensor fused8bitrowwise_to_float_or_half_cpu(const Tensor &input, const int64_t output_dtype, const bool scale_bias_last, const bool quant_padding_float_type)

Tensor float_to_FP8rowwise_cpu(const Tensor &input, bool forward)

Tensor FP8rowwise_to_float_cpu(const Tensor &input, bool forward, const int64_t output_dtype)

Tensor fusednbitrowwise_to_float_cpu(const Tensor &input, const int64_t bit_rate)

Tensor fusednbitrowwise_sbfront_to_float_cpu(const Tensor &input, const int64_t bit_rate)

Dequantizes int4/int2 rows, with the scale and bias stored at the front of each row, into float32. The input tensor must have torch.quint4x2 or torch.quint2x4 dtype and the QuantizedCPU backend. This operator is recommended for testing purposes only, because its kernel is a reference implementation and is not optimized.

Parameters:
    input – Tensor of int4/int2 rows with scale and bias stored in the front.
    bit_rate – Bit rate of each element. Should be 4 or 2.

Returns:
    Tensor of float32, holding the dequantized numbers.
Tensor fusednbitrowwise_to_half_cpu(const Tensor &input, const int64_t bit_rate)

Tensor fusednbitrowwise_to_float_or_half_cpu(const Tensor &input, const int64_t bit_rate, const int64_t output_dtype)

void FloatToFP8Quantized_ref(const float *const input, const size_t nrows, const size_t ncols, uint8_t *const output, const int ebits, const int exponent_bias, const double max_pos)

void FP8QuantizedToFloat_ref(const uint8_t *const input, const size_t nrows, const size_t ncols, float *const output, const int ebits, const int exponent_bias)