Device Debug Print
Note
Tools are only fully supported on source builds.
Overview
DEVICE_PRINT is an experimental feature that is meant to replace DPRINT.
For more info about DPRINT, see the kernel_print tool documentation.
Enabling
To enable DEVICE_PRINT, first enable DPRINT. Then enable the feature switch that allows DEVICE_PRINT to be used:
export TT_METAL_DEVICE_PRINT=1 # required, use new DEVICE_PRINT system instead of legacy DPRINT.
To generate device debug prints, include the api/debug/dprint.h header and use the APIs defined there.
An example with the different features available is shown below:
#include "api/debug/dprint.h" // required in all kernels using DPRINT
void kernel_main() {
// Direct printing is supported for const char*/char/uint32_t/float
DEVICE_PRINT("Test string {} {} {}\n", 'a', 5, 0.123456f);
// BF16 type printing is supported via provided type
bf16_t my_bf16_val(0x3dfb); // Equivalent to 0.122559
DEVICE_PRINT("BF16 value: {}\n", my_bf16_val);
// DEVICE_PRINT supports formatting options that are supported by fmtlib:
DEVICE_PRINT("{:.5f}\n", 0.123456f);
DEVICE_PRINT("{:>10}\n", 123); // right align in a field of width 10
DEVICE_PRINT("{:<10}\n", 123); // left align in a field of width 10
DEVICE_PRINT("{0:x} {0} {0:o} {0:b}\n", 15); // single argument print in hexadecimal, decimal, octal, and binary
// The following prints only occur on a particular RISCV core:
DEVICE_PRINT_MATH("this is the math kernel\n");
DEVICE_PRINT_PACK("this is the pack kernel\n");
DEVICE_PRINT_UNPACK("this is the unpack kernel\n");
DEVICE_PRINT_DATA0("this is the data movement kernel on noc 0\n");
DEVICE_PRINT_DATA1("this is the data movement kernel on noc 1\n");
}
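To see why the bit pattern 0x3dfb in the example above corresponds to roughly 0.122559, the conversion can be reproduced on the host. The sketch below is not part of the DEVICE_PRINT API: bf16_bits_to_float is a hypothetical helper name, and the code relies only on the fact that a bfloat16 value is the upper 16 bits of an IEEE-754 single-precision float.

```cpp
#include <cstdint>
#include <cstring>

// Host-side illustration only (hypothetical helper, not a tt-metal API).
// A bfloat16 value is the upper 16 bits of an IEEE-754 binary32 float,
// so widening it means shifting the bits into the high half of a 32-bit word.
float bf16_bits_to_float(std::uint16_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t widened = static_cast<std::uint32_t>(bits) << 16;
    float value;
    std::memcpy(&value, &widened, sizeof(value));  // bit-exact reinterpretation
    return value;
}
```

Calling bf16_bits_to_float(0x3dfb) yields 0.12255859375, which rounds to the 0.122559 quoted in the kernel comment.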
Data from Circular Buffers can be printed using the TileSlice object. It can be constructed as described below, and fed directly to a DEVICE_PRINT call.
| Argument | Type | Description |
|---|---|---|
| cb_id | uint8_t | Id of the Circular Buffer to print data from. |
| tile_idx | int | Index of tile inside the CB to print data from. |
| slice_range | SliceRange | A struct to describe starting index, ending index, and stride for data to print within the CB. Fields are |
| cb_type | dprint_tslice_cb_t | Only used for Data Movement RISCs, specify |
| ptr_type | dprint_tslice_ptr_t | Only used for Data Movement RISCs, specify |
| endl_rows | bool | Whether to add a newline between printed rows, default |
| print_untilized | bool | Whether to untilize the CB data while printing it (always done for block float formats), default |
An example of how to print data from a CB (in this case, CBIndex::c_25) is shown below. Note that sampling happens relative
to the current CB read or write pointer. This means that for printing a tile read from the front of the CB, the
DEVICE_PRINT call has to occur between the cb_wait_front and cb_pop_front calls. For printing a tile from the
back of the CB, the DEVICE_PRINT call has to occur between the cb_reserve_back and cb_push_back calls. Currently supported data
formats for printing from CBs are DataFormat::Float32, DataFormat::Float16_b, DataFormat::Bfp8_b, DataFormat::Bfp4_b,
DataFormat::Int8, DataFormat::UInt8, DataFormat::UInt16, DataFormat::Int32, and DataFormat::UInt32.
#include "api/debug/device_print.h" // required in all kernels using DEVICE_PRINT
void kernel_main() {
// Assuming the tile we want to print from CBIndex::c_25 is at the front of the CB, the print must happen after
// this call. If the tile is at the back of the CB, the print must happen after cb_reserve_back().
cb_wait_front(CBIndex::c_25, 1);
...
// Extract a numpy slice `[0:32:16, 0:32:16]` from tile `0` from `CBIndex::c_25` and print it.
DEVICE_PRINT("{}\n", TSLICE(CBIndex::c_25, 0, SliceRange::hw0_32_16()));
// The MATH core does not have access to CBs, so this is an invalid print:
DEVICE_PRINT_MATH("{}\n", TSLICE(CBIndex::c_25, 0, SliceRange::hw0_32_16())); // Invalid
// Print a full tile
for (int32_t r = 0; r < 32; ++r) {
SliceRange sr = SliceRange{.h0 = r, .h1 = r+1, .hs = 1, .w0 = 0, .w1 = 32, .ws = 1};
// On data movement RISCs, tiles can be printed from either the CB read or write pointers. Also need to specify whether
// the CB is input or output.
DEVICE_PRINT_DATA0("{} --READ--cin1-- {}\n", (uint)r, TileSlice(0, 0, sr, TSLICE_INPUT_CB, TSLICE_RD_PTR, true, false));
DEVICE_PRINT_DATA1("{} --READ--cin1-- {}\n", (uint)r, TileSlice(0, 0, sr, TSLICE_OUTPUT_CB, TSLICE_WR_PTR, true, false));
// Unpacker RISC only has rd_ptr and only input CBs, so no extra args
DEVICE_PRINT_UNPACK("{} --READ--cin1-- {}\n", (uint)r, TileSlice(0, 0, sr, true, false));
// Packer RISC only has wr_ptr
DEVICE_PRINT_PACK("{} --READ--cin1-- {}\n", (uint)r, TileSlice(0, 0, sr, true, false));
}
...
cb_pop_front(CBIndex::c_25, 1);
}
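The numpy-style slice notation in the example can be easier to follow once the selected positions are listed explicitly. The following host-side sketch is a model, not device code: SliceRangeModel and selected_elements are hypothetical names, only the field names (h0, h1, hs, w0, w1, ws) come from the example above, and the end-exclusive interpretation is an assumption inferred from the `[0:32:16, 0:32:16]` comment and the `.h1 = r+1` pattern in the full-tile loop.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for the device SliceRange struct.
struct SliceRangeModel {
    std::uint32_t h0, h1, hs;  // row start, row end (assumed exclusive), row stride
    std::uint32_t w0, w1, ws;  // column start, column end (assumed exclusive), column stride
};

// Lists the (row, column) positions that a numpy-style slice
// [h0:h1:hs, w0:w1:ws] would select, in row-major order.
std::vector<std::pair<std::uint32_t, std::uint32_t>> selected_elements(const SliceRangeModel& sr) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> out;
    if (sr.hs == 0 || sr.ws == 0) {
        return out;  // a zero stride would never advance; this model treats it as empty
    }
    for (std::uint32_t r = sr.h0; r < sr.h1; r += sr.hs) {
        for (std::uint32_t c = sr.w0; c < sr.w1; c += sr.ws) {
            out.emplace_back(r, c);
        }
    }
    return out;
}
```

Under these assumptions, a range of {0, 32, 16, 0, 32, 16} selects four elements at (0,0), (0,16), (16,0), and (16,16), while the per-row range used in the full-tile loop, {r, r+1, 1, 0, 32, 1}, selects all 32 elements of row r.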
Note
The DEVICE_PRINT buffer for a RISC is only flushed when a newline character \n is read, or when the device that the RISC belongs to is closed.
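The practical consequence is that output without a trailing \n may not appear until the device is closed. The sketch below is a simplified host-side model of that behavior, written only to illustrate the rule in the note. LineBufferedPrintModel is a hypothetical class and does not reflect how the real print server is implemented.

```cpp
#include <string>
#include <vector>

// Hypothetical model of a per-RISC print buffer (illustration only).
class LineBufferedPrintModel {
public:
    // Appends text; each time a '\n' is seen, the pending text becomes visible.
    void write(const std::string& text) {
        for (char ch : text) {
            pending_ += ch;
            if (ch == '\n') {
                flushed_.push_back(pending_);
                pending_.clear();
            }
        }
    }

    // Models closing the device: any text still pending is flushed.
    void close() {
        if (!pending_.empty()) {
            flushed_.push_back(pending_);
            pending_.clear();
        }
    }

    // Text that has become visible so far, one entry per flush.
    const std::vector<std::string>& flushed() const { return flushed_; }

private:
    std::string pending_;
    std::vector<std::string> flushed_;
};
```

In this model, write("partial") makes nothing visible, a following write(" line\n") makes "partial line\n" visible, and a later write("tail") stays pending until close() is called. Ending every DEVICE_PRINT format string with \n avoids the delay.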