ttnn.multiply_
- ttnn.multiply_(input_tensor_a: ttnn.Tensor, input_tensor_b: ttnn.Tensor, *, fast_and_approximate_mode: bool = False, sub_core_grids: ttnn.CoreRangeSet = None) → ttnn.Tensor
-
Performs an in-place multiplication operation on
input_tensor_a and input_tensor_b and returns the tensor with the same layout as input_tensor_a.\[\mathrm{input\_tensor\_a} = \verb|multiply_|(\mathrm{input\_tensor\_a}, \mathrm{input\_tensor\_b})\]- Parameters:
-
input_tensor_a (ttnn.Tensor) – the first input tensor, which is modified in place.
input_tensor_b (ttnn.Tensor) – the second input tensor.
- Keyword Arguments:
-
fast_and_approximate_mode (bool, optional) – Use the fast and approximate mode. Defaults to False.
sub_core_grids (ttnn.CoreRangeSet, optional) – sub-core grids to run the operation on. Defaults to None.
- Returns:
-
ttnn.Tensor – the output tensor.
Binary elementwise operations, C=op(A,B), support input tensors A and B in row major and tile layout, in interleaved or sharded format (height, width or block sharded), in DRAM or L1. A and B are completely independent, and can have different tensor specs.
Broadcast of A and B operands is supported up to dimension 5 (DNCHW). Any dimensions of size 1 in either A or B will be expanded to match the other input, and data will be duplicated along that dimension. For example, if the shape of A is [2,1,1,32] and B is [1,16,8,1], the output shape will be [2,16,8,32]. The size of dimensions higher than 5 must match between A and B.
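The broadcast rule above can be sketched in plain Python. This is a shape-only illustration of how size-1 dimensions expand, not ttnn code; the helper name broadcast_shape is ours:

```python
def broadcast_shape(a, b):
    """Compute the broadcast-matched shape of two shapes:
    size-1 dimensions expand to match the other operand."""
    # Right-align the shorter shape by padding with leading 1s.
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da != db and 1 not in (da, db):
            raise ValueError(f"incompatible dimensions {da} and {db}")
        out.append(max(da, db))
    return tuple(out)

# The example from the text: A is [2,1,1,32], B is [1,16,8,1].
print(broadcast_shape([2, 1, 1, 32], [1, 16, 8, 1]))  # -> (2, 16, 8, 32)
```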
The output C also supports row major and tile layout, interleaved or sharded format (height, width or block sharded), in DRAM or L1. The tensor spec of C is independent of A and B, and can be explicitly set using the optional output tensor input; if not provided, the operation will attempt to choose an appropriate tensor spec. The dimensions of C, or equivalently of the optional output tensor, must match the broadcast-matched size of A and B.
Performance considerations: elementwise operations operate natively on tile format, so tiled tensors are preferred as inputs; row-major tensors are tilized and untilized during the operation. L1 sharded layout is preferred, with no broadcast and matching tensor specs for A, B and C.
Note
Supported dtypes and layouts:
- Dtypes: BFLOAT16, FLOAT32, UINT16
- Layouts: TILE, ROW_MAJOR
If the input tensor is ROW_MAJOR layout, it will be internally converted to TILE layout.
When fast_and_approximate_mode is True for the bfloat16 datatype, the operation uses the FPU implementation for better performance. When fast_and_approximate_mode is False for the bfloat16 datatype, the operation uses the SFPU, with the result rounded to nearest even (RNE). The operation is not supported for INT32 inputs since the outputs are returned as FLOAT32.
Example
# Create two tensors for in-place multiplication
tensor1 = ttnn.from_torch(
    torch.tensor([[2, 2], [2, 2]], dtype=torch.bfloat16),
    layout=ttnn.TILE_LAYOUT,
    device=device,
)
tensor2 = ttnn.from_torch(
    torch.tensor([[1, 1], [1, 1]], dtype=torch.bfloat16),
    layout=ttnn.TILE_LAYOUT,
    device=device,
)

# Perform the in-place operation: tensor1 *= tensor2
ttnn.multiply_(tensor1, tensor2)
logger.info("Inplace multiplication completed")
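The round-to-nearest-even (RNE) behavior mentioned in the note above can be illustrated in plain Python by truncating a float32 value to bfloat16 precision with RNE. This is a host-side sketch of the rounding rule only, not the device SFPU implementation:

```python
import struct

def bfloat16_rne(x: float) -> float:
    """Round a float32 value to bfloat16 precision using round-to-nearest-even."""
    # Reinterpret the float32 as a 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    lsb = (bits >> 16) & 1                      # LSB of the surviving bfloat16 mantissa
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000   # add bias, then drop the low 16 bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(bfloat16_rne(3.14159265))  # pi rounds to 3.140625, the nearest bfloat16
```

Ties round toward the value with an even (zero) low mantissa bit, which is why the `lsb` term is added to the rounding bias.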