
During seven months on tenstorrent/tt-mlir, Dusan Milinkovic engineered a robust CPU-hoisting and lowering pipeline, enabling TTIR workloads to execute efficiently on host CPUs. He refactored the EmitPy and TTIRToLinalg paths, introducing direct TTIR-to-Python conversion and templated reduction patterns to streamline performance and maintainability. Leveraging C++, MLIR, and Python, Dusan expanded support for complex tensor operations, improved memory management, and enhanced test coverage, addressing stability and reliability in model execution. His work reduced memory footprints, simplified host execution, and enabled multi-device support, reflecting deep expertise in compiler design, low-level optimization, and collaborative codebase evolution within a fast-moving environment.
April 2026 performance summary for tenstorrent/tt-mlir: Delivered a major refactor and performance improvements in the CPU-hoisted EmitPy pipeline, reducing Python emission-path complexity, alongside a targeted performance and maintainability overhaul of TTIR-to-Linalg reductions. These changes simplify host CPU execution and improve test stability, reducing downstream maintenance and enabling faster iteration on CPU-bound workloads. Key outcomes include direct TTIR-to-Python conversion via TTIRToEmitPyCPU, removal of the TTNN golden-path plumbing, and a new templated reduction pattern that consolidates common ops and mitigates previously long-running reductions (notably CumSum).
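To illustrate why CumSum is the op a generic reduction pattern handles worst, consider a purely conceptual sketch (plain Python, not tt-mlir code; both function names are hypothetical): a generic reduction instantiated per output element re-reduces the whole prefix each time, while a specialized lowering can carry a running accumulator in one pass.

```python
def cumsum_naive(xs):
    # O(n^2): each output element re-reduces its entire prefix,
    # which is how a generic per-element reduction pattern
    # would instantiate a cumulative sum.
    return [sum(xs[: i + 1]) for i in range(len(xs))]

def cumsum_running(xs):
    # O(n): a single pass carrying a running accumulator --
    # the kind of rewrite a specialized lowering can emit instead.
    out, acc = [], 0
    for x in xs:
        acc += x
        out.append(acc)
    return out
```

Both produce identical results; the difference is purely asymptotic cost, which is the plausible source of the "long-running reductions" mentioned above.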
March 2026 (2026-03) focused on stability, memory efficiency, and performance improvements across TT-MLIR and TT-Forge-FE, with a strong emphasis on CPU-hoisted workloads and TTIR->Linalg lowering.

Key features delivered and major fixes:
- Enabled CPU-hoisted constant evaluation by default and added safe fallbacks to skip const-eval when lowering to Linalg is impossible, improving reliability in corner cases.
- Fixed a memory-safety issue (double-free) in CPU-hoisted outputs by enabling TTNN tensor reuse during unpacking, preventing crashes in models reusing the same buffers.
- Major TTIR->Linalg pooling improvements: reworked MaxPool2d/AvgPool2d to support dilation, ceil mode, and flattened inputs; cleaned up reshape/pad flows and improved handling of flattened compat-info attributes.
- Added integer support for CPU-hoisted ArgMax and Mean, including tests, broadening the viability of CPU-hoisted reductions.
- Introduced a const-eval pass before the optimizer passes to stabilize layout decisions and reduce memory usage, complemented by a boolean-narrowing pass that shrinks boolean tensors in CPU-hoisted ops.
- Pipeline and codebase improvements: moved the CSE pass and hardened TTIR empty semantics by removing the Pure trait to avoid unintended merging; refactored support for CPU-hoisted unary/binary eltwise ops and expanded test coverage.
- CPU-hoisted op improvements in test coverage and performance: implicit broadcasting and related streamlining for binary ops, including WhereOp, with matching test coverage.

Business value and impact:
- Increased stability and reliability of the CPU-hoisted path, reducing runtime crashes and assertion failures across TT-XLA models.
- Reduced memory footprint and improved data locality for CPU-hoisted computations, contributing to better throughput and lower end-to-end latency in model workflows.
- Broader test coverage and more maintainable code paths, enabling faster future changes with safer defaults.
Technologies/skills demonstrated: MLIR/TTIR, TTNN, Linalg, and layout transformation pipelines; robust debugging and crash analysis; strengthened test automation (lit tests, golden tests) and CI validation.
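The boolean-narrowing idea above can be made concrete with a small, hypothetical sketch (plain Python, not the tt-mlir pass; the actual pass may narrow to a small integer type rather than packed bits): a boolean tensor stored one element per byte or wider can be narrowed to one bit per element, cutting the footprint by 8x or more for large masks.

```python
def pack_bools(mask):
    # Narrow a boolean tensor to one bit per element.
    packed = bytearray((len(mask) + 7) // 8)
    for i, b in enumerate(mask):
        if b:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)

def unpack_bools(packed, n):
    # Recover the original boolean values from the packed form.
    return [bool(packed[i // 8] >> (i % 8) & 1) for i in range(n)]
```

The round trip is lossless, so narrowing is safe whenever the consumer can read the narrowed form; the saving is purely in storage and bandwidth.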
February 2026 focused on delivering performance, stability, and developer-productivity improvements across TT-XLA and TT-MLIR. We completed significant CPU-hoisting enhancements for constant evaluation, modernized the hoisting pipeline, expanded TTIR->Linalg pattern coverage, and introduced memory-optimized transformations. These efforts improved model-execution reliability, reduced intermediate memory footprints, and streamlined build and CI workflows.
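The core of CPU-hoisted constant evaluation can be sketched generically (plain Python; the graph encoding and function name are hypothetical, not the tt-mlir implementation): any op whose inputs are all already-known constants can be folded once at compile time on the host, instead of being re-executed on every inference.

```python
def const_eval(graph, constants):
    # graph: name -> (fn, input names); `constants` seeds known values.
    # Repeatedly fold any op whose inputs are all known, so whole
    # constant subgraphs collapse to precomputed values -- the essence
    # of hoisting const-eval work to the host.
    values = dict(constants)
    changed = True
    while changed:
        changed = False
        for name, (fn, inputs) in graph.items():
            if name not in values and all(i in values for i in inputs):
                values[name] = fn(*(values[i] for i in inputs))
                changed = True
    return values
```

In a real compiler the folded subgraph is outlined into a host function rather than evaluated eagerly, but the reachability logic is the same.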
Month: 2026-01. This sprint focused on delivering high-value improvements to host-based execution paths, expanding compiler lowering capabilities, and tightening pipeline safety. The work enables broader hardware utilization, improved model throughput, and more predictable optimizations, with strong test coverage and measurable performance/reliability gains.
December 2025 focused on delivering a robust CPU-hoisted function ecosystem in TTIR/TTNN for tenstorrent/tt-mlir. The work enables const-eval subgraphs to run on CPU with Destination Passing Style (DPS) and supports hoisting multiple operations into a single CPU-hoisted function. The feature is toggleable via enable-cpu-hoisted-const-eval in the backend pipeline, and includes memory and return-value enhancements plus cross-target pipeline support and clearer naming across targets. In addition, the team improved memory handling for const-eval inputs, reduced complexity by enabling CPU-hoisted return values, and restructured TTNN pipelines to support CPU hoisting across targets. Flaky TTIR builder tests were stabilized by skipping problematic tests to improve CI reliability. Overall, these changes increase performance, memory efficiency, pipeline modularity, and test reliability, while laying the groundwork for broader CPU-based optimizations across TTNN targets.
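Destination Passing Style, mentioned above, can be shown with a minimal generic sketch (plain Python; the function name is hypothetical and this is not tt-mlir code): instead of each op allocating a fresh result, the caller provides the output buffer and the callee writes into it, which makes buffer reuse and memory planning explicit.

```python
def add_dps(lhs, rhs, out):
    # Destination Passing Style: the caller allocates `out`; the
    # callee writes into it and returns the same buffer. No hidden
    # allocation happens inside the op, so a pipeline can reuse
    # destinations across calls.
    for i in range(len(out)):
        out[i] = lhs[i] + rhs[i]
    return out
```

Because the result aliases the caller-owned destination, a sequence of DPS ops can cycle through a small pool of buffers rather than allocating per op.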
November 2025: Delivered CPU-hoisting for stablehlo.dynamic_update_slice in tt-mlir, broadened integer-type support, and generalized the hoist analysis/transform framework. Implemented targeted fixes for TTIR lowering and CPU module behavior to improve stability and CPU execution coverage. This work expands CPU offload opportunities, enhances TTIR compatibility, and delivers measurable performance and reliability gains.
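For reference, the semantics of stablehlo.dynamic_update_slice (the op hoisted above) in the 1-D case: the update is written into a copy of the operand at the given start index, with the start clamped so the update fits entirely inside the operand. A minimal sketch (plain Python, semantics only, not the tt-mlir lowering):

```python
def dynamic_update_slice_1d(operand, update, start):
    # stablehlo.dynamic_update_slice, 1-D case: clamp `start` to
    # [0, len(operand) - len(update)], then overwrite that slice
    # of a copy of the operand with `update`.
    start = max(0, min(start, len(operand) - len(update)))
    result = list(operand)
    result[start : start + len(update)] = update
    return result
```

The clamping is what makes the op total: even out-of-range start indices produce a well-defined result rather than an error.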
Month 2025-10 focused on delivering CPU-ready TTIR features and stabilizing CPU-hoist paths for tt-mlir, enabling translation to CPU binaries and more robust performance. Key outcomes:
- Introduced an affine lowering pass in TTIRToCPUPipeline to translate TTIR to CPU-friendly dialects, and updated tests to include the missing device parameter.
- Implemented NonContiguousMemrefCopyToLinalg to lower memref.copy for ttir.conv2d, and ensured tensor.extract_slice results are copied into the output buffer to support CPU-hoistability.
- Overall improvements in CPU translation readiness and test coverage.
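Why memref.copy needs a special lowering for non-contiguous sources can be shown with a generic sketch (plain Python; the function name is hypothetical, not the tt-mlir pattern): a strided view cannot be copied with one flat memcpy, so the copy is lowered to an explicit loop nest over the logical shape, which is what rewriting it as an elementwise Linalg copy achieves.

```python
def copy_strided_to_contiguous(buf, offset, strides, shape):
    # Copy a non-contiguous (strided) 2-D view out of a flat buffer
    # into a contiguous list. The loop nest walks logical indices and
    # maps each to a physical address via offset + dot(index, strides),
    # handling arbitrary strides where a flat memcpy cannot.
    out = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            out.append(buf[offset + i * strides[0] + j * strides[1]])
    return out
```

The same addressing scheme covers transposed or sliced views: only the offset and strides change, not the copy logic.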
