
Milan Stojkovic contributed to the tenstorrent/tt-forge-fe and tenstorrent/tt-xla repositories by building and optimizing core machine learning compiler infrastructure. He migrated key tensor operations and normalization routines from Python to C++ backends, improving execution determinism and performance. Milan enhanced MLIR generation, implemented advanced operator support, and streamlined attribute mapping to enable broader model compatibility and hardware-aware optimizations. His work included refactoring test suites for reliability, upgrading CI/CD pipelines, and integrating dynamic logging for better observability. Using C++, Python, and MLIR, Milan delivered robust, maintainable solutions that reduced technical debt and improved the reliability of model compilation and deployment workflows.
April 2026 (2026-04) performance summary: Focused on feature delivery and cross-repo improvements to strengthen attention pipelines and PyTorch compatibility. Key features delivered included composite scaled dot product attention (SDPA) support and testing in TT-MLIR, backed by a new conversion pattern for SDPA. Additionally, the Torch-XLA dependency in TT-XLA was upgraded to a newer version, improving PyTorch compatibility and runtime performance. Bugs fixed: no separate bug fixes were documented in this period; stability gains came from feature updates. Overall impact: enhanced attention computation paths and reliability, smoother integration with PyTorch workflows, and faster iteration and model throughput. Technologies/skills demonstrated: SDPA, MLIR-based integration, Torch-XLA integration, the PyTorch ecosystem, test coverage, and cross-repo collaboration with traceable changes.
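For reference, the computation that the new TT-MLIR conversion pattern targets is the standard scaled dot product attention formula. A minimal NumPy sketch of that formula (a reference definition, not the TT-MLIR implementation):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Reference SDPA: softmax(q @ k.T / sqrt(d)) @ v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # scaled similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # attention weights per query
    return weights @ v                              # weighted sum of values
```

Recognizing this whole pattern as one composite op, rather than its individual matmul/softmax pieces, is what lets the compiler lower it to a fused attention kernel.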
March 2026 — Key outcomes for tenstorrent/tt-xla include: (1) correctness and data integrity: PJRT_Buffer_ToHostBuffer size query fixed to return logical size; added logicalTensorSize() and updated copyToHost assertions; (2) performance and model support: RMS normalization fusion enhancements for GPT-OSS with additional patterns, tests, and docs; (3) reliability and coverage: enabled whole rms_norm pattern by simplifying pattern matching; (4) compatibility: torch-xla dependency uplift to a newer version to improve compatibility and stability across environments. Commits cited for traceability: e015c782a40040441542e82ec366187e0d50766a; 5f3929e09b4ecd1edf6aba3c9027abe72daabc0d; dede8c41fe323b8fdfaf9c712fd6e23a0c9eb35e; afe606d5fb6eb38bc09b36ec87a3d0abef8f54df.
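The logical-size fix above distinguishes the bytes a tensor logically holds from the (possibly larger) padded device allocation. A hypothetical Python sketch of that distinction (helper names and the tile size are illustrative, not the tt-xla C++ API):

```python
import math

def logical_tensor_size(shape, itemsize):
    """Bytes the tensor logically holds, ignoring any device padding."""
    return math.prod(shape) * itemsize

def padded_tensor_size(shape, itemsize, tile=32):
    """Bytes once the last two dims are rounded up to tile boundaries
    (illustrative of tile-based device layouts)."""
    padded = list(shape)
    for i in (-2, -1):
        if len(padded) >= -i:
            padded[i] = -(-padded[i] // tile) * tile  # ceil to a tile multiple
    return math.prod(padded) * itemsize
```

A host-copy path that reports the padded size would over-read or over-allocate on the host, which is why the size query must return the logical value.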
Worked on 2 features and fixed 1 bug in 1 repository.
January 2026 monthly summary focusing on stability, performance, and observability for the tt-xla workstream. Key emphasis was on enhancing PyTorch FX fusion capabilities, hardening stability across multi-chip configurations, and improving logging and observability to support faster debugging and more actionable metrics for customers deploying large-language-model workloads.
Month: 2025-12 — Summary: Focused on strengthening normalization workflows and fused-ops paths in tenstorrent/tt-xla, while stabilizing model test signals in CI. This period delivered foundational enhancements that improve model correctness, performance readiness, and developer productivity.
Key features delivered:
- Tensor normalization and composite operation enhancements: added RMSNorm, enabled composite operations by default (with a debugging toggle), and added LayerNorm support in nn modules. Also improved tests for normalization and composite functionality to increase confidence in fused-op paths.
- Testing/instrumentation improvements for composite ops: refactored composite handling for nn modules, expanded test infrastructure, and supported enabling/disabling composite ops via options. Included updates to tests to reflect valid PyTorch usage (e.g., LayerNorm scenarios).
Major bugs fixed:
- Improved test reliability for YOLO models by updating statuses from KNOWN_FAILURE_XFAIL to EXPECTED_PASSING for YOLOX and YOLOV9, leading to more stable CI signals and faster feedback loops.
Overall impact and accomplishments:
- Greater reliability of core normalization/composite pathways, enabling faster iteration and more robust deployment readiness.
- Reduced debugging effort through clearer, more accurate test signals and enhanced test coverage for normalization/composite behavior.
- Established groundwork for future performance gains via fused composite ops, with LayerNorm integration in nn modules.
Technologies/skills demonstrated:
- MLIR-based operator extensions (RMSNorm), composite op enablement, and nn.Module compatibility.
- Test infrastructure improvements, including targeted test refactors and CI signal stabilization.
- PyTorch ecosystem alignment (LayerNorm, normalization tests), code refactoring, and continuous integration discipline.
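The RMSNorm operation added here follows the standard root-mean-square normalization formula. A minimal NumPy sketch of that formula (a reference definition, not the tt-xla implementation):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """Standard RMSNorm: scale x by the reciprocal RMS of its last axis."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight
```

Unlike LayerNorm, RMSNorm skips mean-centering and the bias term, which is why recognizing it as a single composite op (rather than a chain of elementwise ops) enables a cheaper fused kernel.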
November 2025: Delivered measurable business value across tt-xla and tt-mlir by enhancing test visibility, CI reliability, and model optimization workflows. Highlights include dynamic BringupStatus logging in the test infrastructure and CI pipeline (removing static config dependencies and enabling logging via ENABLE_BRINGUP_STAGE_LOGGING=1), expanded testing coverage with restored NOT_STARTED handling and xfail model validation, and a critical Pytest crash fix when a result reason is not set. In tt-mlir, introduced a new RMSNormOp composite conversion pattern to improve translation to the target IR and model optimization workflows. These efforts reduce debugging time, increase deployment confidence, and demonstrate strong cross-repo collaboration and automation-driven quality improvements.
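The dynamic logging described above can be sketched as an environment-variable gate. A hypothetical Python sketch in that spirit (only the ENABLE_BRINGUP_STAGE_LOGGING variable name comes from the source; the function names are illustrative, not the tt-xla test-infra API):

```python
import logging
import os

logger = logging.getLogger("bringup")

def bringup_logging_enabled(env=os.environ):
    """Logging is opt-in via ENABLE_BRINGUP_STAGE_LOGGING=1."""
    return env.get("ENABLE_BRINGUP_STAGE_LOGGING") == "1"

def log_bringup_status(model_name, status, env=os.environ):
    """Log the bringup stage; return the message, or None when disabled."""
    if not bringup_logging_enabled(env):
        return None
    message = f"model={model_name} bringup_status={status}"
    logger.info(message)
    return message
```

Gating on an environment variable rather than a static config file means CI jobs can toggle the diagnostics per run without touching checked-in configuration.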
October 2025 monthly summary focusing on key accomplishments for TT projects. This month centered on enabling MLIR uplift in the tt-forge-fe repository by upgrading the Docker build environment to include the xxd utility, ensuring the required toolchain is available for MLIR-related builds. This change improves build reproducibility, accelerates MLIR integration, and sets the foundation for future optimizations and tooling improvements.
September 2025 performance summary focusing on artifact provenance, code cleanliness, and build reliability across tt-xla and tt-forge-fe. Delivered a fingerprint-based identification mechanism for compiled artifacts, streamlined the codebase, and stabilized development tooling to support faster iteration and safer releases.
August 2025: Focused on performance-driven migration, correctness hardening, and test reliability across Forge repositories. Completed a substantial C++ migration of core operators to Tenstorrent Forge, improved backward pass correctness for ReduceAvg, and hardened tests for deterministic outcomes, establishing a stronger foundation for Forge-scale workloads and future feature delivery.
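The ReduceAvg backward correctness work concerns the standard gradient of a mean reduction: the incoming gradient must be spread evenly across the reduced elements. A minimal NumPy sketch of that math (function names are illustrative, not the Forge implementation):

```python
import numpy as np

def reduce_avg_forward(x, dim):
    """Mean over `dim` (the dim is removed from the output shape)."""
    return x.mean(axis=dim)

def reduce_avg_backward(grad_out, in_shape, dim):
    """Gradient of the mean: each reduced element receives grad / n."""
    n = in_shape[dim]
    return np.broadcast_to(np.expand_dims(grad_out, dim) / n, in_shape).copy()
```

A common bug in this backward pass is dividing by the wrong element count (or not dividing at all), which silently skews training; deterministic tests over fixed shapes catch exactly that.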
July 2025 monthly summary for tenstorrent/tt-forge-fe: Delivered a major shift of core tensor operations to the C++ backend, enabling more deterministic and faster execution; improved autograd integration with constant tensor creation; and increased CI reliability through cleanup efforts. The work reduces the latency and framework overhead of the Python backend path and sets the stage for broader performance optimizations in subsequent releases.
June 2025 — Focused on MLIR generation enhancements and report output correctness in the tt-forge-fe project. Delivered MLIR-format report output, hardened MLIR attribute naming by replacing a hard-coded string with a constant, and added dynamic system descriptor selection to support current and future hardware architectures (e.g., wormhole, blackhole). These changes improve reliability, enable hardware-aware optimizations, and reduce maintenance burden.
Month: 2025-05 | Contributions to tenstorrent/tt-forge-fe concentrated on test reliability and model accuracy gating for DETR. The work enhanced CI signal fidelity and traceability for model-related changes, supporting higher confidence in production readiness.
April 2025 monthly summary focusing on key accomplishments across tt-tvm and tt-forge-fe, delivering core features, indexing and shape handling improvements, and platform readiness for production deployment. The work centered on embedding lookup enhancements for PaddlePaddle, expanded advanced indexing capabilities, support for negative dimensions, dynamic shapes handling, and robust softmax attribute handling to ensure correctness across TVM-backed models.
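The negative-dimension support mentioned above amounts to canonicalizing axis indices: frameworks accept -1 for the last axis, -2 for the second-to-last, and so on. A minimal sketch of that normalization (an illustrative helper, not the tt-tvm API):

```python
def normalize_dim(dim, rank):
    """Map a possibly negative axis index to its non-negative form.

    For a rank-4 tensor, valid dims are -4..3; -1 maps to 3.
    """
    if not -rank <= dim < rank:
        raise ValueError(f"dim {dim} out of range for rank {rank}")
    return dim % rank
```

Doing this once at the framework boundary lets every downstream pass assume non-negative axes, which simplifies shape and indexing logic.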
Monthly summary for 2025-03 focusing on business value and technical achievements for tenstorrent/tt-forge-fe. Highlighted work includes delivering new features, stabilizing the MLIR backend, and updating dependencies to strengthen backend compatibility and model support.
January 2025 focused on strengthening Forge-FE lowering reliability and expanding model support. Key outcomes include introducing an AttributeMapper with MLIRGenerator integration for flexible attribute renaming and type conversion during Forge-FE to MLIR lowering; extending the MLIR generator with repeat_interleave support to enable usage with models like Llama-3.2-1B; stabilizing compatibility by reverting the 'reduce_avg' attribute rename and updating op_mapping; expanding test coverage by removing an xfail in the embedding test to exercise meta-llama/Llama-3.2-1B across configured models; and documenting Pytest usage to standardize testing practices. Collectively these changes reduce integration risk, broaden model compatibility, and improve developer efficiency and test reliability.
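The attribute renaming and type conversion performed during lowering can be sketched as a lookup table of per-attribute rules. A hypothetical Python sketch of the idea (the table entries and function names are illustrative, not the actual AttributeMapper API):

```python
# Map a framework-side attribute name to its MLIR-side name plus a
# type converter; unknown attributes pass through unchanged.
ATTR_MAP = {
    # forge_attr_name: (mlir_attr_name, type_converter)
    "dim": ("dimension", int),
    "keep_dim": ("keep_dims", bool),
}

def map_attributes(op_attrs):
    """Rename and convert op attributes for lowering."""
    mapped = {}
    for name, value in op_attrs.items():
        mlir_name, convert = ATTR_MAP.get(name, (name, lambda v: v))
        mapped[mlir_name] = convert(value)
    return mapped
```

Centralizing the rules in one table is what makes renames like the 'reduce_avg' case revertible in a single place instead of across every op's lowering code.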
Monthly summary for 2024-12: Delivered key CI/CD and dependency management improvements for tt-forge-fe, and strengthened test suite reliability and code organization. These changes enhanced model compatibility and reliability of builds, reduced flaky tests across environments, and improved maintainability.
November 2024 performance summary for tenstorrent/tt-forge-fe: Delivered foundational MLIR generation improvements and targeted codebase cleanup that strengthen the product's reliability, performance, and maintainability. Key focus areas included expanding operation support for cosine and sine, stabilizing Llama 3b compilation, and removing deprecated graph-building primitives to streamline the compilation path and reduce maintenance overhead. The work accelerates model deployment readiness and reduces risk in future MLIR lowering changes.
