Device Debug Print

Note

Tools are only fully supported on source builds.

Overview

DEVICE_PRINT is an experimental feature that is meant to replace DPRINT. For more info about DPRINT, see the kernel_print tool documentation.

Enabling

To enable DEVICE_PRINT you need to first enable DPRINT. Then, you should enable feature switch that will allow usage of DEVICE_PRINT.

export TT_METAL_DEVICE_PRINT=1                    # required, use new DEVICE_PRINT system instead of legacy DPRINT.

To generate device debug prints, include the api/debug/dprint.h header and use the APIs defined there. An example with the different features available is shown below:

#include "api/debug/dprint.h"  // required in all kernels using DPRINT

void kernel_main() {
    // Direct printing is supported for const char*/char/uint32_t/float
    DEVICE_PRINT("Test string {} {} {}\n", 'a', 5, 0.123456f);
    // BF16 type printing is supported via provided type
    bf16_t my_bf16_val(0x3dfb); // Equivalent to 0.122559
    DEVICE_PRINT("BF16 value: {}\n", my_bf16_val);

    // DEVICE_PRINT supports formatting options that are supported by fmtlib:
    DEVICE_PRINT("{:.5f}\n", 0.123456f);
    DEVICE_PRINT("{:>10}\n", 123); // right align in a field of width 10
    DEVICE_PRINT("{:<10}\n", 123); // left align in a field of width 10
    DEVICE_PRINT("{0:x} {0} {0:o} {0:b}\n", 15); // single argument print in hexadecimal, decimal, octal, and binary

    // The following prints only occur on a particular RISCV core:
    DEVICE_PRINT_MATH("this is the math kernel\n");
    DEVICE_PRINT_PACK("this is the pack kernel\n");
    DEVICE_PRINT_UNPACK("this is the unpack kernel\n");
    DEVICE_PRINT_DATA0("this is the data movement kernel on noc 0\n");
    DEVICE_PRINT_DATA1("this is the data movement kernel on noc 1\n");
}

Data from Circular Buffers can be printed using the TileSlice object. It can be constructed as described below, and fed directly to a DEVICE_PRINT call.

Argument

Type

Description

cb_id

uint8_t

Id of the Circular Buffer to print data from.

tile_idx

int

Index of tile inside the CB to print data from.

slice_range

SliceRange

A struct to describe starting index, ending index, and stride for data to print within the CB. Fields are h0, h1, hs, w0, w1, ws, all uint8_t.

cb_type

dprint_tslice_cb_t

Only used for Data Movement RISCs, specify TSLICE_INPUT_CB or TSLICE_OUTPUT_CB depending on if the CB to print from is input or output.

ptr_type

dprint_tslice_ptr_t

Only used for Data Movement RISCs, specify TSLICE_RD_PTR to read from the front of the CB, or TSLICE_WR_PTR to read from the back of the CB. UNPACK RISC only reads from the front of the CB, PACK RISC only reads from the back of the CB.

endl_rows

bool

Whether to add a newline between printed rows, default true.

print_untilized

bool

Whether to untilize the CB data while printing it (always done for block float formats), default true.

An example of how to print data from a CB (in this case, CBIndex::c_25) is shown below. Note that sampling happens relative to the current CB read or write pointer. This means that for printing a tile read from the front of the CB, the DEVICE_PRINT call has to occur between the cb_wait_front and cb_pop_front calls. For printing a tile from the back of the CB, the DEVICE_PRINT call has to occur between the cb_reserve_back and cb_push_back calls. Currently supported data formats for printing from CBs are DataFormat::Float32, DataFormat::Float16_b, DataFormat::Bfp8_b, DataFormat::Bfp4_b, DataFormat::Int8, DataFormat::UInt8, DataFormat::UInt16, DataFormat::Int32, and DataFormat::UInt32.

#include "api/debug/device_print.h"  // required in all kernels using DEVICE_PRINT

void kernel_main() {
    // Assuming the tile we want to print from CBIndex::c_25 is from the front the CB, print must happen after
    // this call. If the tile is from the back of the CB, then print must happen after cb_reserve_back().
    cb_wait_front(CBIndex::c_25, 1);
    ...

    // Extract a numpy slice `[0:32:16, 0:32:16]` from tile `0` from `CBIndex::c_25` and print it.
    DEVICE_PRINT("{}\n", TSLICE(CBIndex::c_25, 0, SliceRange::hw0_32_16()));
    // Note that since the MATH core does not have access to CBs, so this is an invalid print:
    DEVICE_PRINT_MATH("{}\n", TSLICE(CBIndex::c_25, 0, SliceRange::hw0_32_16())); // Invalid

    // Print a full tile
    for (int32_t r = 0; r < 32; ++r) {
        SliceRange sr = SliceRange{.h0 = r, .h1 = r+1, .hs = 1, .w0 = 0, .w1 = 32, .ws = 1};
        // On data movement RISCs, tiles can be printed from either the CB read or write pointers. Also need to specify whether
        // the CB is input or output.
        DEVICE_PRINT_DATA0("{} --READ--cin1-- {}\n", (uint)r, TileSlice(0, 0, sr, TSLICE_INPUT_CB, TSLICE_RD_PTR, true, false));
        DEVICE_PRINT_DATA1("{} --READ--cin1-- {}\n", (uint)r, TileSlice(0, 0, sr, TSLICE_OUTPUT_CB, TSLICE_WR_PTR, true, false));
        // Unpacker RISC only has rd_ptr and only input CBs, so no extra args
        DEVICE_PRINT_UNPACK("{} --READ--cin1-- {}\n", (uint)r, TileSlice(0, 0, sr, true, false));
        // Packer RISC only has wr_ptr
        DEVICE_PRINT_PACK("{} --READ--cin1-- {}\n", (uint)r, TileSlice(0, 0, sr, true, false));
    }

    ...
    cb_pop_front(CBIndex::c_25, 1);
}

Note

The DEVICE_PRINT buffer for a RISC is only flushed when new line character \n is read, or the device that the RISC belongs to is closed.