
Milan Vasiljevic developed and optimized machine learning compiler infrastructure across tenstorrent/tt-mlir, tt-forge-fe, and related repositories, focusing on deep learning model support, performance, and reliability. He engineered cross-framework integrations, such as PaddlePaddle and PyTorch interoperability, and implemented advanced MLIR-based fusion and sharding patterns to accelerate convolutional and transformer workloads. Using C++, Python, and MLIR, Milan delivered automated benchmarking, nightly validation for vision and OCR models, and robust regression testing. His work included targeted bug fixes, code refactoring, and enhancements to CI/CD pipelines, resulting in improved model throughput, reduced runtime risk, and maintainable, test-driven codebases for production deployments.
February 2026 monthly summary (2026-02) for tenstorrent/tt-forge and tenstorrent/tt-mlir. Focused on delivering business value through reliable benchmarking, targeted performance optimizations, and clear governance. Key outcomes include stronger regression detection for transformer benchmarks, stability improvements for perf runs, and safer, more maintainable pattern-matching and fusing components. Demonstrated proficiency across benchmarking automation, MLIR pattern matching, low-level IR optimizations, and governance practices for code ownership.
January 2026 monthly summary covering two core repos (tenstorrent/tt-mlir and tenstorrent/tt-xla). Deliverables include a critical bug fix that improves pattern matching reliability, enhanced performance regression testing, and improved artifact traceability for IR exports. Together these changes support faster debugging, more reliable model patterns, and clearer test artifacts.
December 2025 performance highlights for tenstorrent/tt-mlir: delivered multiple TTIRFusing patterns to accelerate inference, improved numerical stability, and hardened the pipeline against edge-cases and upstream changes. Implemented robust type handling and sharding considerations, expanded test coverage, and aligned with upstream tt-metal fixes. These efforts increased model throughput, reduced crash risk, and broadened support for large-scale LMs and vision models.
November 2025 monthly summary for tenstorrent/tt-mlir focused on delivering performance-oriented MLIR pattern optimizations and scheduler hardening. Major work consolidated tensor-op optimizations across CNNs and vision models, with a shift to MeanOp for mean reductions and always-on global pooling. Introduced commuting patterns for reduce-permute interactions and reshape-slice interactions to unlock deeper fusion opportunities in Yolov9 and EfficientNet. Strengthened the TTIRFusing flow with targeted improvements to reduce unnecessary reshapes and enable more aggressive fusion. Tightened scheduler correctness for multi-output ops and added regression tests to protect correctness in model components like SplitQueryKeyValueAndSplitHeadsOp. All changes are covered by tests, with clear ticket mappings in commit messages.
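The idea behind a reduce-permute commuting pattern can be illustrated with a small pure-Python sketch. It is not the tt-mlir implementation; the helper names (`transpose`, `mean_axis`, `remap_reduce_dim`) are hypothetical and the example is limited to 2D data. It shows that a mean taken after a permute equals a permute taken after a mean, provided the reduction dimension is remapped through the permutation, which is what lets a compiler move the permute out of the way of later fusion.

```python
def transpose(m):
    """Permute a 2D list with perm = [1, 0]."""
    return [list(col) for col in zip(*m)]


def mean_axis(m, axis):
    """Mean over `axis` of a 2D list, keeping the reduced dimension (size 1)."""
    if axis == 0:
        return [[sum(col) / len(col) for col in zip(*m)]]
    return [[sum(row) / len(row)] for row in m]


def remap_reduce_dim(perm, dim):
    """Output axis `dim` of a permute reads input axis perm[dim]."""
    return perm[dim]


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
perm = [1, 0]

# Original order: permute first, then reduce over axis 0 of the permuted tensor.
permute_then_reduce = mean_axis(transpose(x), 0)

# Commuted order: reduce over the remapped axis of the input, then permute.
reduce_then_permute = transpose(mean_axis(x, remap_reduce_dim(perm, 0)))
```

Both orders produce the same values here; in the commuted form the permute operates on a much smaller tensor and no longer sits between the reduction and its producer.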
Month: 2025-10 — Focused on stabilizing performance, fixing benchmarking/configuration regressions, and tightening CI feedback loops across tt-xla and tt-forge. Delivered targeted fixes and configuration improvements that restore expected performance, improve benchmarking fidelity, and accelerate validation cycles.
September 2025 monthly summary for tenstorrent/tt-mlir focusing on fusion and pooling patterns that improve performance, memory efficiency, and reliability of MLIR-based optimizations.
August 2025 (2025-08) — Delivered performance and correctness improvements in the MLIR-based Conv2D path within tenstorrent/tt-mlir. Focused on optimizer-level sharding robustness and fusion opportunities through BatchNorm decomposition and ConvolutionOp fusion patterns. Implemented layout-aware sharding validation, introduced output layout overrides, and expanded test coverage to validate interleaved layouts and fused paths. These changes reduce risk of incorrect sharding decisions, unlock more aggressive fusion opportunities for Conv2D, and lay the groundwork for improved runtime parallelism and throughput on representative workloads.
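As background for why decomposing BatchNorm opens up fusion with a preceding convolution, here is a minimal sketch of the standard inference-time folding arithmetic. It assumes a per-output-channel linear layer as a stand-in for Conv2D and uses hypothetical helper names; it is not taken from tt-mlir and says nothing about how the actual patterns are written.

```python
import math


def linear(weights, bias, x):
    """Per-output-channel dot product plus bias (stand-in for a convolution)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization applied per channel."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into the preceding layer's weights and bias."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    folded_w = [[w * s for w in row] for row, s in zip(weights, scale)]
    folded_b = [(b - m) * s + bt
                for b, m, s, bt in zip(bias, mean, scale, beta)]
    return folded_w, folded_b


# Two output channels, three inputs; all values are illustrative.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
x = [1.0, 2.0, 3.0]

unfused = batchnorm(linear(weights, bias, x), gamma, beta, mean, var)
folded_w, folded_b = fold_batchnorm(weights, bias, gamma, beta, mean, var)
fused = linear(folded_w, folded_b, x)
```

The two results agree up to floating-point rounding, so the separate normalization step can be dropped once its scale and shift are absorbed into the weights.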
July 2025 monthly summary focusing on key technical developments and business impact for tenstorrent/tt-mlir. Delivered regression test coverage for Conv2D sharding to validate optimizer-driven memory layout changes in the TTNN backend, strengthening reliability for convolutional workloads and optimizer flags.
May 2025 summary for tenstorrent/tt-forge-fe focused on delivering automated validation capabilities for PaddleOCR and strengthening the forge test framework. Key deliverable: PaddleOCR Nightly Validation Tests added to the forge suite, covering detection, recognition, and end-to-end scenarios. Implemented necessary dependencies, image and character-set fetching utilities, and new test files to enable robust nightly validation. The work is traceable to commit 20747da70ce9058906c043c4f47c2056ab5617b9 (PR #1823). No major bugs fixed this month; emphasis was on feature delivery and test automation. This enhances validation reliability, reduces manual QA effort, and speeds feedback for PaddleOCR models.
Month: 2025-04 — This month focused on expanding testing coverage for Paddle-based models and stabilizing dependencies to enable broader ML capabilities. Key features delivered include nightly testing infrastructure for Paddle Vision and PaddleNLP models, and dependency upgrades to streamline datasets, packaging, and Paddle libraries. No major bug fixes were needed this month as the work prioritized test infrastructure and maintainability.
March 2025 highlights: Delivered cross-framework PaddlePaddle support by adapting the PyTorch compilation flow, enabling Paddle module wrapping, Paddle-specific parameter parsing and verification, and targeted testing. Also refactored PyTorch-Paddle tensor conversion utilities, improved type hints and naming, and extended tests to ensure correctness. Resolved a critical PaddlePaddle frontend bug in the tt-tvm converter related to padding index handling in lookup_table by ensuring proper weight conversion through NumPy array materialization and back to TVM NDArrays. These efforts enhanced interoperability, correctness, and reliability across two major repos with measurable business value.
