
Marko Radosavljevic developed fused-operation and testing infrastructure for the tenstorrent/tt-llk and tt-metal repositories, focusing on hardware-aware simulation, robust configuration, and extensible architecture support. He implemented tile-aware SFPU modeling and batch FPU fusion in C++ and Python, supporting more accurate hardware simulation and scalable test automation. He refactored the Fuser module for multi-architecture compatibility, introduced YAML-driven configuration with Pydantic validation, and made debug output more accessible. This work improved configuration safety, reduced CI noise, and shortened validation cycles across evolving hardware targets.
April 2026: tt-metal delivered a significant refactor of the Fuser module to enable first-class multi-architecture support, paired with testing robustness improvements and cleaner separation of concerns. The changes improve maintainability, extensibility for new architectures, and CI reliability across targets (including handling of unsupported architectures).
February 2026 monthly summary for tenstorrent/tt-llk: work focused on expanding fused compute capabilities, strengthening test infrastructure, and improving developer experience. Delivered multi-unpack FPU fusion across L1-to-L1 with inclusion of ReduceBlockMax, enhanced UnpackerA integration with a new reuse_dest option, expanded ReduceUnpacker/test support, and stricter fused-test validation using YAML configuration checked with Pydantic and block-based traversal. Improved debug output accessibility for colorblind users and made codegen more reliable by replacing MathFidelity ints with an enum. These changes increase test coverage and reliability and support faster validation and safer releases.
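The MathFidelity change can be illustrated with a minimal Python sketch. It assumes a hypothetical code generator that emits a C++ constant from a test configuration; the member names and values, `parse_fidelity`, and `emit_fidelity_define` are illustrative assumptions, not the actual tt-llk API.

```python
from enum import Enum


class MathFidelity(Enum):
    # Member names and values are assumptions for illustration only.
    LoFi = 0
    HiFi2 = 2
    HiFi3 = 3
    HiFi4 = 4


def parse_fidelity(name: str) -> MathFidelity:
    """Resolve a config string to an enum member, rejecting unknown names."""
    try:
        return MathFidelity[name]
    except KeyError:
        valid = ", ".join(m.name for m in MathFidelity)
        raise ValueError(
            f"unknown math fidelity {name!r}; expected one of: {valid}"
        ) from None


def emit_fidelity_define(fidelity: MathFidelity) -> str:
    """Hypothetical codegen step: emit a symbolic C++ constant, not a bare int."""
    if not isinstance(fidelity, MathFidelity):
        raise TypeError("fidelity must be a MathFidelity member")
    return f"constexpr auto MATH_FIDELITY = MathFidelity::{fidelity.name};"
```

Under these assumptions, a bare integer such as `4` is rejected before any code is generated, and a misspelled name in a config fails with a message listing the valid options.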
January 2026: Tenstorrent tt-llk fused test infrastructure expanded for configurability, performance, and maintainability. Key work focused on extending FPU/SFPU capabilities, global test configuration, and test infrastructure stability to accelerate validation of fused kernels and reduce run times.

Key features delivered:
- Fuser and SFPU enhancements with test configurability: Adds new FPU operations to the fuser (ReduceScalar, ReduceColumn, ReduceRow), introduces batch processing, and expands unary SFPU test configurability with custom output dimensions, starting tile index, and FAST_MODE for faster test sweeps; updates UnpackerAB support and related test configuration.
- Global fused tests configuration and broadcasting enhancements: Introduces a global fused tests configuration (performance mode, profiling fields, loop factor) and adds broadcast_type in fused YAML configuration, along with unifying operand dimension parameters for maintainability.
- Metal SFPU support in tests: Integrates log1p and tanh operations into testing infra and automates the download of SFPU headers from the tt-metal repo.
- Custom batch size and size management for fused operations: Adds batch_size to the fuser YAML config to permit processing multiple tiles at once; standardizes operand dimensions for easier maintenance.
- Additional quality and test infra improvements: Cleanup to remove outdated SFPU header includes and prevent compilation issues.

Major bugs fixed:
- Cleanup: Removed includes of deleted sfpu max header files to prevent compilation issues and streamline codebase maintenance.

Overall impact and accomplishments:
- Significantly improved test configurability and performance visibility for fused operations, enabling faster, more stable performance benchmarking and regression checks.
- Reduced maintenance burden through global configuration, standardized parameters, and streamlined test setup across fused tests and SFPU coverage.
- Strengthened code health by removing stale includes and ensuring test scaffolding remains aligned with current header revisions.

Technologies/skills demonstrated:
- Hardware-oriented FPU/SFPU testing, YAML-driven configuration, and test automation.
- Cross-repo integration with tt-metal for SFPU headers, and build/test infrastructure enhancements for faster iterations.
- Performance-minded validation through global config, perf mode, and loop_factor, plus FAST_MODE for rapid sweep coverage.
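
To make the `batch_size` option concrete, the following sketch shows one way a run of tiles could be split into batches. It is an illustration under stated assumptions: `FusedOpConfig`, `tile_count`, and `tile_batches` are hypothetical names, and plain dataclass checks stand in for whatever validation the real fuser YAML schema applies.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class FusedOpConfig:
    # Hypothetical fields; only batch_size is named in the summary above.
    tile_count: int
    batch_size: int = 1

    def __post_init__(self) -> None:
        if self.tile_count < 0:
            raise ValueError("tile_count must be non-negative")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")


def tile_batches(cfg: FusedOpConfig) -> Iterator[Tuple[int, int]]:
    """Yield (start_tile, tiles_in_batch) pairs covering every tile once."""
    for start in range(0, cfg.tile_count, cfg.batch_size):
        # The final batch may be shorter when tile_count is not a multiple.
        yield start, min(cfg.batch_size, cfg.tile_count - start)
```

With 10 tiles and `batch_size=4`, this yields batches of 4, 4, and 2 tiles, so a partial final batch is handled without a separate code path.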
December 2025 performance highlights for tenstorrent/tt-llk: Achieved hardware-aware SFPU behavior modeling, safer LLK kernel configuration, and expanded end-to-end LLK testing. Delivered tangible business value by aligning software simulations with hardware realities, stabilizing pack usage, and introducing scalable test automation for chained LLK operations. The combined work reduces silicon risk, accelerates validation cycles, and enables more reliable performance optimizations across the LLK stack.
