ttnn.multiply_
- ttnn.multiply_(input_tensor_a: ttnn.Tensor, input_tensor_b: ttnn.Tensor, *, fast_and_approximate_mode: bool = False, sub_core_grids: ttnn.CoreRangeSet = None) → ttnn.Tensor
-
Performs an in-place multiplication operation on
input_tensor_a and input_tensor_b and returns the tensor with the same layout as input_tensor_a.\[\mathrm{input\_tensor\_a} = \verb|multiply_|(\mathrm{input\_tensor\_a}, \mathrm{input\_tensor\_b})\]- Parameters:
-
input_tensor_a (ttnn.Tensor) – the first input tensor, which is modified in place.
input_tensor_b (ttnn.Tensor) – the second input tensor.
- Keyword Arguments:
-
fast_and_approximate_mode (bool, optional) – Use the fast and approximate mode. Defaults to False.
sub_core_grids (ttnn.CoreRangeSet, optional) – sub-core grids to run the operation on. Defaults to None.
- Returns:
-
ttnn.Tensor – the output tensor.
Binary elementwise operations, C=op(A,B), support input tensors A and B in row major and tile layout, in interleaved or sharded format (height, width or block sharded), in DRAM or L1. A and B are completely independent, and can have different tensor specs.
Broadcast of A and B operands is supported up to dimension 5 (DNCHW). Any dimensions of size 1 in either A or B will be expanded to match the other input, and data will be duplicated along that dimension. For example, if the shape of A is [2,1,1,32] and B is [1,16,8,1], the output shape will be [2,16,8,32]. The size of dimensions higher than 5 must match between A and B.
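The broadcast rule above can be sketched in plain Python. This is a shape-only illustration of how size-1 dimensions expand, not ttnn code; the helper name broadcast_shape is ours:

```python
def broadcast_shape(a, b):
    """Compute the broadcast-matched shape of two shapes:
    size-1 dimensions expand to match the other operand."""
    # Right-align the shorter shape by padding with leading 1s.
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da != db and 1 not in (da, db):
            raise ValueError(f"incompatible dimensions {da} and {db}")
        out.append(max(da, db))
    return tuple(out)

# The example from the text: A is [2,1,1,32], B is [1,16,8,1].
print(broadcast_shape([2, 1, 1, 32], [1, 16, 8, 1]))  # -> (2, 16, 8, 32)
```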
The output C also supports row major and tile layout, interleaved or sharded format (height, width or block sharded), in DRAM or L1. The tensor spec of C is independent of A and B, and can be explicitly set using the optional output tensor input; if not provided, the operation will attempt to choose an appropriate tensor spec. The dimensions of C, or equivalently of the optional output tensor, must match the broadcast-matched size of A and B.
Performance considerations: elementwise operations operate natively on tile format, so tiled tensors are preferred as inputs; row-major tensors are tilized and untilized during the operation. L1 sharded layout is preferred, with no broadcast and matching tensor specs for A, B and C.
Note
Supported dtypes and layouts:
- Dtypes: BFLOAT16, FLOAT32, UINT16
- Layouts: TILE, ROW_MAJOR
If the input tensor is ROW_MAJOR layout, it will be internally converted to TILE layout.
When fast_and_approximate_mode is True for the bfloat16 datatype, the operation uses the FPU implementation for better performance. When fast_and_approximate_mode is False for the bfloat16 datatype, the operation uses the SFPU, with the result rounded to nearest even (RNE). The operation is not supported for INT32 inputs since the outputs are returned as FLOAT32.
Example
# Create two tensors for in-place multiplication
tensor1 = ttnn.from_torch(
    torch.tensor([[2, 2], [2, 2]], dtype=torch.bfloat16),
    layout=ttnn.TILE_LAYOUT,
    device=device,
)
tensor2 = ttnn.from_torch(
    torch.tensor([[1, 1], [1, 1]], dtype=torch.bfloat16),
    layout=ttnn.TILE_LAYOUT,
    device=device,
)

# Perform the in-place operation: tensor1 *= tensor2
ttnn.multiply_(tensor1, tensor2)
logger.info("Inplace multiplication completed")
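The round-to-nearest-even (RNE) behavior mentioned in the note above can be illustrated in plain Python by truncating a float32 value to bfloat16 precision with RNE. This is a host-side sketch of the rounding rule only, not the device SFPU implementation:

```python
import struct

def bfloat16_rne(x: float) -> float:
    """Round a float32 value to bfloat16 precision using round-to-nearest-even."""
    # Reinterpret the float32 as a 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    lsb = (bits >> 16) & 1                      # LSB of the surviving bfloat16 mantissa
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000   # add bias, then drop the low 16 bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(bfloat16_rne(3.14159265))  # pi rounds to 3.140625, the nearest bfloat16
```

Ties round toward the value with an even (zero) low mantissa bit, which is why the `lsb` term is added to the rounding bias.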