
During his six-month tenure, Petar Milenkovic enhanced the tenstorrent/tt-llk and tt-metal repositories by developing robust low-level kernel features and improving data handling for embedded systems. He implemented int32 subtraction and 32-bit integer support, enabling more flexible arithmetic operations and direct register unpacking, which reduced data loss and improved correctness. Using C++ and Python, he refactored tensor tiling and packing logic to support arbitrary input sizes, introduced static assertions for error detection, and authored comprehensive documentation to streamline onboarding. Petar also strengthened kernel reliability in tt-metal by adding targeted unit tests and debugging tools for max pooling operations, ensuring maintainable code.

September 2025 (tt-metal): Focused on strengthening max pooling kernel reliability through testing and debugging enhancements. Delivered a new max pooling test and a debug-environment setup to improve diagnosis, reproducibility, and iteration speed. This work establishes the groundwork for upcoming performance optimizations and regression safety in the kernel.
June 2025 (tt-llk): Delivered foundational documentation and more robust input handling, improving developer onboarding, product reliability, and data processing throughput.
In May 2025, the focus was on stability and correctness of tensor tiling processing for the tt-llk repository. The primary deliverable was a targeted bug fix to pack_untilize that enables handling of input tensors of any size, along with the introduction of a new addressing mode to correctly process rows without unnecessary clearing of the y-counter. The work improves reliability for variable input shapes and lays groundwork for future performance and feature improvements.
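As a rough illustration of what untilizing means for an arbitrary number of tiles, the sketch below converts between a row-major matrix and a list of tiles. It is not the pack_untilize kernel and does not model the addressing mode or the y-counter. The function names, the flat row-major layout inside each tile, and the small tile size used in the usage note are assumptions made for the example.

```python
def tilize(matrix, tile_h=32, tile_w=32):
    """Split a row-major 2D list into tiles (hypothetical helper).

    Tiles are emitted in row-major tile order; each tile is a flat,
    row-major list. Face ordering inside a tile is ignored here.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % tile_h or cols % tile_w:
        raise ValueError("matrix dimensions must be multiples of the tile size")
    tiles = []
    for top in range(0, rows, tile_h):
        for left in range(0, cols, tile_w):
            tile = []
            for r in range(top, top + tile_h):
                tile.extend(matrix[r][left:left + tile_w])
            tiles.append(tile)
    return tiles


def untilize(tiles, tile_rows, tile_cols, tile_h=32, tile_w=32):
    """Rebuild the row-major matrix from tiles for any tile grid size."""
    if len(tiles) != tile_rows * tile_cols:
        raise ValueError("tile count does not match the tile grid")
    out = []
    for tr in range(tile_rows):
        for r in range(tile_h):
            row = []
            # One output row is stitched together from every tile in this tile row.
            for tc in range(tile_cols):
                tile = tiles[tr * tile_cols + tc]
                row.extend(tile[r * tile_w:(r + 1) * tile_w])
            out.append(row)
    return out
```

With a 2x2 tile size for readability, `untilize(tilize(m, 2, 2), 2, 3, 2, 2)` returns the original 4x6 matrix `m`; the same loops work for any tile grid, which is the property the fix is described as providing.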
April 2025 (tt-llk): Focused on feature delivery and code quality. Delivered 32-bit integer support in the Low-Level Kernel (LLK) for the Wormhole (WH) and Blackhole (BH) architectures, enabling Int32 and UInt32 inputs to be unpacked directly into the destination register. This bypasses the Source A/Source B limitations and reduces the risk of data loss.
March 2025: Delivered BH board narrow row data support in LLK by modifying packing/unpacking to accept a narrow_row parameter, enabling a single packer interface for data arriving in narrow row format (Faces 0 and 2; skip Faces 1 and 3). No major bugs reported. This work improves data path flexibility and reduces special-case handling, paving the way for broader data-format support.
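The face selection described above can be sketched as follows. This is only a conceptual model, not the LLK packer: it assumes a 32x32 tile split into four 16x16 faces numbered 0 (top-left), 1 (top-right), 2 (bottom-left), and 3 (bottom-right), and the helper names are hypothetical. Only the `narrow_row` parameter name comes from the summary.

```python
FACE_DIM = 16  # assumed face size: a 32x32 tile holds four 16x16 faces


def split_faces(tile):
    """Split a 32x32 tile (list of rows) into four faces.

    Assumed numbering: 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right.
    """
    faces = []
    for top in (0, FACE_DIM):
        for left in (0, FACE_DIM):
            face = [row[left:left + FACE_DIM] for row in tile[top:top + FACE_DIM]]
            faces.append(face)
    return faces


def faces_to_pack(narrow_row):
    """Return the face indices handled for the given mode."""
    # Narrow-row data occupies only the left half of the tile,
    # so faces 1 and 3 are skipped.
    return (0, 2) if narrow_row else (0, 1, 2, 3)


def pack_tile(tile, narrow_row=False):
    """Hypothetical single packer entry point covering both layouts."""
    faces = split_faces(tile)
    return [faces[i] for i in faces_to_pack(narrow_row)]
```

Calling `pack_tile(tile, narrow_row=True)` returns two faces that together cover the left 16 columns of the tile, while the default call returns all four. Keeping the choice behind one parameter mirrors the single-interface design the entry describes.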
February 2025: Delivered essential int32 subtraction support in the SFPU kernel across two repositories (tt-llk-wh-b0 and tt-llk-bh). Implementations include a new int32 subtraction header and core logic with cross-format data handling and hardware considerations, enabling broader arithmetic workloads and more consistent results across data formats.
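For readers unfamiliar with fixed-width integer arithmetic, the sketch below shows 32-bit subtraction with two's-complement wraparound. It assumes wraparound semantics purely for illustration; the summary does not state how the SFPU kernel represents int32 values or handles overflow, and the function names are hypothetical.

```python
def to_int32(value):
    """Reduce a Python int to a signed 32-bit value (two's complement, assumed)."""
    value &= 0xFFFFFFFF
    # Values with the top bit set represent negative numbers.
    return value - (1 << 32) if value & 0x80000000 else value


def sub_int32(a, b):
    """Subtract b from a and wrap the result into the int32 range."""
    return to_int32(a - b)
```

Under this assumption, `sub_int32(5, 7)` gives `-2`, and subtracting 1 from the smallest int32 value wraps around to the largest one. A real kernel may saturate or use a different integer encoding instead.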