Exceeds
Jovan Serbedzija

PROFILE

Jovan Serbedzija developed advanced compiler and backend features for the tenstorrent/tt-mlir repository, focusing on MLIR-based optimizations for deep learning workloads. He engineered robust operator fusion pipelines, including SDPA and transformer attention fusing, and expanded dialect support for operations like Conv2d, ConvTranspose2d, and Mish activation. Using C++ and Python, Jovan implemented cross-device normalization, enhanced test infrastructure, and improved runtime stability by addressing edge cases and integrating validation logic. His work emphasized maintainability and performance, delivering code refactors, bug fixes, and modular test suites that increased model compatibility, throughput, and reliability across evolving hardware and software backends.

Overall Statistics

Feature vs Bugs

76% Features

Repository Contributions

Total: 62
Bugs: 10
Commits: 62
Features: 31
Lines of code: 35,527
Activity Months: 19

Work History

April 2026

3 Commits • 3 Features

Apr 1, 2026

April 2026 performance summary for tenstorrent/tt-mlir focused on delivering high-value MLIR-based transformer optimizations and strengthening test infrastructure. Implemented a two-level attention fusing pipeline that merges Q/K/V matmuls into a single matmul and optimizes attention decode with new ops (TTIR and TTNN). Expanded RoPE fusing with RoPEExpandedFusing and renamed RoPERotateHalfFusing to improve maintainability, covering GPT-OSS 20B decomposition. Introduced DeferredDevice to defer real device opening until after compilation, stabilizing CI and enabling cleanup after execution. Enabled artifact_dir return from compile_and_execute to support fusing verification. Refactored and modularized SDPA fusing tests to broaden coverage across model architectures, improving reliability of fusing validation. Overall impact includes measurable performance/throughput improvements, greater model compatibility, and enhanced testability and maintainability.
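
As a rough illustration of the Q/K/V merge described above, the sketch below shows why three projections that share an input can be computed with one matmul against a column-concatenated weight. It is not the tt-mlir pass itself; the helper names (`matmul`, `concat_columns`, `fused_qkv`) are hypothetical and the tensors are plain Python lists.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def concat_columns(*mats):
    """Concatenate matrices with equal row counts along the column axis."""
    return [sum(rows, []) for rows in zip(*mats)]


def fused_qkv(x, wq, wk, wv):
    """Compute Q, K, V with a single matmul against the concatenated weight."""
    w_qkv = concat_columns(wq, wk, wv)  # hypothetical fused weight layout
    out = matmul(x, w_qkv)
    nq, nk = len(wq[0]), len(wk[0])
    # Split the fused output back into the three projections.
    q = [row[:nq] for row in out]
    k = [row[nq:nq + nk] for row in out]
    v = [row[nq + nk:] for row in out]
    return q, k, v


x = [[1, 2], [3, 4]]
wq = [[1, 0], [0, 1]]
wk = [[2, 1], [1, 2]]
wv = [[0, 1], [1, 0]]
q, k, v = fused_qkv(x, wq, wk, wv)
```

The fused result matches three separate matmuls, which is the property a fusing pattern of this kind relies on.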

March 2026

7 Commits • 1 Feature

Mar 1, 2026

March 2026 monthly summary for tenstorrent/tt-mlir focusing on delivering robust runtime behavior and expanding SDPA-based fusion coverage for modern attention models. The work emphasizes business value through increased model throughput, stability, and broader compatibility with optimized fused kernels.

February 2026

4 Commits • 3 Features

Feb 1, 2026

February 2026: Focused on delivering cross-device normalization, SDPA fusing enhancements, and RoPE fusion robustness in tt-mlir, while stabilizing the test suite by reverting the DRAM storage default for Conv2d/ConvTranspose2d pending metal backend fixes. The work strengthens multi-device scalability, operator fusion effectiveness, and overall reliability across TTIR/TTNN/EmitPy, aligning with performance and hardware-backend goals.

January 2026

1 Commit • 1 Feature

Jan 1, 2026

January 2026: Strengthened the SDPA TTIR fusion path in tenstorrent/tt-mlir with a new reshape-broadcast-reshape fusion, enabling more efficient attention mechanisms and supporting GQA head expansion. Implemented SDPA fusing improvements with per-tensor scaling, centralized input preparation, and LoadCachedOp parsing. Added post-creation operation constraint validation to prevent invalid IR, while allowing recoverable issues to be addressed in TTNNWorkarounds. Refactored to remove obsolete patterns and consolidated related changes for maintainability. These changes deliver measurable performance and scalability gains in attention computations and set a solid foundation for future MLIR pattern optimizations.
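
For readers unfamiliar with the pattern, the following sketch shows what a reshape-broadcast-reshape sequence computes when it expands grouped-query-attention (GQA) key/value heads. It only illustrates the semantics under simple assumptions (heads as list elements, a hypothetical `expand_kv_heads` helper); it does not reproduce the fused TTIR op.

```python
def expand_kv_heads(kv_heads, group_size):
    """Repeat each KV head `group_size` times so it lines up with the query heads."""
    # reshape: [H_kv, ...] -> [H_kv, 1, ...]
    reshaped = [[head] for head in kv_heads]
    # broadcast: [H_kv, 1, ...] -> [H_kv, G, ...]
    broadcast = [group * group_size for group in reshaped]
    # reshape: [H_kv, G, ...] -> [H_kv * G, ...]
    return [head for group in broadcast for head in group]


expanded = expand_kv_heads(["k0", "k1"], 3)
```

Each KV head ends up adjacent to its copies, which is the layout query heads in the same group expect.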

December 2025

6 Commits • 3 Features

Dec 1, 2025

December 2025 performance highlights across tenstorrent/tt-mlir and tenstorrent/tt-xla. Delivered key features for improved transformer workloads and memory reliability, including SDPA fusing enhancements enabling fused scaled dot-product attention paths with optional masks and scaling, and support for newer PyTorch decomposition patterns. Introduced ConvTranspose2d preparation and DRAM-based config tensors to prevent OOM and boost reliability. Fixed a TTNNModel regression by making curPosTensor optional in SDPA decode, stabilizing LLaMA benchmarks. Added a tunable transpose-matmul fusion flag to TT-XLA to balance performance and memory usage. All changes include tests and docs to ensure maintainability. Business value: lower latency and higher throughput for transformer workloads, reduced memory pressure, and greater model compatibility across PyTorch versions and deployment scenarios.
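
A minimal reference for the computation that SDPA fusing targets, with the optional mask and scale mentioned above, might look like the sketch below. It is a plain-Python illustration for a single head; the function names and argument layout are assumptions, not the TTNN op signature.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(q, k, v, mask=None, scale=None):
    """softmax(scale * Q K^T + mask) V for one attention head."""
    if scale is None:
        scale = 1.0 / math.sqrt(len(q[0]))  # default 1/sqrt(head_dim)
    out = []
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        if mask is not None:
            # Additive mask: -inf removes a key position entirely.
            scores = [s + m for s, m in zip(scores, mask[i])]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
masked = sdpa([[1.0, 0.0]], k, v, mask=[[0.0, float("-inf")]])
uniform = sdpa([[0.0, 0.0]], k, v)
```

With the second key masked out, only the first value row contributes; with a zero query, both keys receive equal weight.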

November 2025

2 Commits • 1 Feature

Nov 1, 2025

November 2025: Activation-related enhancements in tt-mlir delivering faster fused kernels for matmul/linear and expanding operator coverage with Mish activation in TTIR. Focused on performance through fusion patterns, test coverage, and dialect completeness.
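
For reference, Mish is defined as x * tanh(softplus(x)). The sketch below is a scalar Python version of that formula, independent of the TTIR op definition; the stable `softplus` helper is an illustrative choice.

```python
import math


def softplus(x):
    """log(1 + exp(x)), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))
```

For large positive inputs Mish approaches the identity, and for large negative inputs it approaches zero.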

October 2025

3 Commits • 2 Features

Oct 1, 2025

October 2025 TTNN/MLIR workstream delivered three pivotal capabilities in tenstorrent/tt-mlir, driving code simplification, performance configurability, and broader operator coverage for models deployed on Metal and other backends.

September 2025

1 Commit

Sep 1, 2025

September 2025: Focused on stabilizing the tt-metal build by addressing a MemoryConfig compilation issue in the NLP decode path, delivering a targeted include directive fix that eliminates a blocking error and improves integration resilience.

August 2025

6 Commits • 1 Feature

Aug 1, 2025

August 2025 (2025-08) monthly summary for tenstorrent/tt-mlir focusing on backend unification, traceability, and runtime stability across TTNN and Chisel components. Key work delivered simplified cross-backend behavior by removing int32 workarounds in TTNN (affecting remainder, reshape, and binary ops) to align with Metal backend updates, reducing backend-specific special casing and maintenance. Location-aware reshape enhancements improved traceability by propagating the originating operation's location to new reshape operations, aiding debugging and verification workflows. Chisel runtime stability fixes addressed missing golden operations, correct data type casting for operator outputs, and robust input tensor handling with improved file path sorting to enhance reliability. These efforts collectively strengthen cross-backend consistency, debugging tooling, and overall code health.

Key achievements for August include:

- TTNN int32 workaround removals across reshape, remainder, repeat, and other binary ops (commits: c7b85bad7b5f44149a98f73a67619a32afd54501; 0fa2a70107118222b16ec17600fd2b1b9c337f8f; 94af4a5f6f9eaeda05a82b68245bc5e2e4434f17; 26c18fd602fbfb0f17efc5ad59952b1ec1af53e0).
- Improve reshape operation location tracing, enabling inherited location from originating operation (commit: 089c84e31147cc95c3f55da975ca1c2c6e3239d0).
- Chisel runtime stability and correctness fixes (commit: 8035493f246955748f6c2f6b2da8052856c9589d).
- Overall improvement in maintainability and cross-backend consistency through standardized behavior and enhanced traceability.

July 2025

3 Commits • 2 Features

Jul 1, 2025

2025-07 Monthly Summary across tt-metal and tt-forge-fe focused on robustness and performance improvements through new input handling capabilities and configuration-driven optimizations. In tt-metal, added conversion from optional TensorSpec to optional Tensor to support the PrepareConvBias constraint API, enabling handling of absent TensorSpecs and increasing flexibility and robustness of tensor input handling. Commits: 012325b78f7b8f960f0b869b67ce892bb7aa10df; 36b72a2ab65cf54bf11e09a43460d1bf5f4bbf83.

June 2025

1 Commit • 1 Feature

Jun 1, 2025

June 2025: focus on strengthening test coverage for Conv2dWeights unpadding in tt-metal. Implemented a new validation test for PrepareConv2dWeights unpadding, with debugging output to diagnose tensor shape issues. This work reduces risk of regressions in unpadding logic and accelerates diagnosis of shape issues in future changes.

May 2025

3 Commits • 2 Features

May 1, 2025

May 2025 monthly summary: Key features delivered include the ExplicateTMs pass to explicitly handle broadcasts and reshapes, improving optimization for EraseInverseOps and addressing implicit-op blockers (with a ttnn.zeros int32 workaround and a refactor of ttnn.repeat lowering). Also added a ComputeKernelConfig attribute to the TTNN conv2d dialect to enable fine-grained control over compute kernel execution parameters (math fidelity, approximation modes, and precision settings). Major bug fixed: documentation typo corrected for the PrepareConv2dWeights build flag, ensuring accurate guidance for enabling the op model library (-DTTMLIR_ENABLE_OPMODEL). Overall impact: enhanced optimization opportunities, greater configurability for conv2d workloads on Tenstorrent devices, and improved build reliability and developer experience. Technologies/skills demonstrated: MLIR dialect extensions, pass design and optimization, kernel configuration, dtype/workaround handling, and documentation hygiene.

April 2025

8 Commits • 4 Features

Apr 1, 2025

April 2025: In tenstorrent/tt-mlir, delivered substantial TTNN enhancements for Conv2D/ConvTranspose2D, improved stability of memory/config handling, and strengthened build/test ergonomics. Key outcomes include end-to-end support for Conv2D/ConvTranspose2D workflows, expanded test coverage, and API/build improvements that reduce integration risk and improve maintainability.

March 2025

3 Commits • 1 Feature

Mar 1, 2025

March 2025 summary for tenstorrent/tt-mlir: Stabilized Conv2D post-refactor, reintroduced TTNN verification with expanded tests, and added Conv2D to EmitC conversion to enable EmitC compatibility. These changes improve correctness, test coverage, and cross-toolchain interoperability, reducing risk and accelerating deployment of Conv2D workloads in TTNN/EmitC pipelines.

February 2025

2 Commits • 1 Feature

Feb 1, 2025

February 2025 monthly summary focusing on MLIR-based features in tenstorrent/tt-mlir. Key feature delivered this month centers on Conv2d/ConvTranspose2d dialect enhancements with asymmetric padding support and broader integration into TTIR/TTNN dialects, runtime utilities, and verification workflows. Also completed a Conv2d op refactor to align with the TTNN interface and introduced Conv2dConfig, enabling conv2d_config with more flexible attribute types.
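
To make the effect of asymmetric padding concrete, the sketch below computes a Conv2d output size when each side of the input is padded independently. The `(top, bottom, left, right)` ordering and the function name are assumptions chosen for illustration and may not match the dialect's attribute layout.

```python
def conv2d_output_hw(in_hw, kernel_hw, stride_hw=(1, 1),
                     padding=(0, 0, 0, 0), dilation_hw=(1, 1)):
    """Output (height, width) of a 2D convolution with per-side padding.

    padding is (top, bottom, left, right); this ordering is an assumption.
    """
    top, bottom, left, right = padding
    pads = ((top, bottom), (left, right))
    out = []
    for size, k, s, (p0, p1), d in zip(in_hw, kernel_hw, stride_hw, pads, dilation_hw):
        effective_kernel = d * (k - 1) + 1
        out.append((size + p0 + p1 - effective_kernel) // s + 1)
    return tuple(out)


same = conv2d_output_hw((5, 5), (3, 3), padding=(1, 1, 1, 1))
asym = conv2d_output_hw((5, 5), (3, 3), stride_hw=(2, 2), padding=(0, 1, 0, 1))
```

Symmetric padding is the special case where `top == bottom` and `left == right`.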

January 2025

1 Commit • 1 Feature

Jan 1, 2025

January 2025 monthly summary for tenstorrent/tt-mlir focused on expanding MLIR dialect capabilities and deployment readiness. Implemented ConvTranspose2d support for TTIR and TTNN dialects within MLIR, enabling transposed convolution operations, with full signature, attributes, and verification logic. Established end-to-end workflow including conversion patterns from TTIR to TTNN and emission of ConvTranspose2d operations to flatbuffers, aligning with deployment pipelines and flatbuffer-based tooling.
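
Verification logic for a transposed convolution typically checks the output shape against the standard formula, shown below for one spatial dimension. This is the formula documented for PyTorch's ConvTranspose2d; whether the tt-mlir verifier uses exactly this form is an assumption, and the function name is hypothetical.

```python
def conv_transpose_output_size(in_size, kernel, stride=1, padding=0,
                               output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial dimension."""
    return ((in_size - 1) * stride
            - 2 * padding
            + dilation * (kernel - 1)
            + output_padding
            + 1)


upsampled = conv_transpose_output_size(4, kernel=3, stride=2, padding=1, output_padding=1)
```

Applying the function once per spatial dimension gives the expected output height and width.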

December 2024

4 Commits • 1 Feature

Dec 1, 2024

Month: 2024-12. Concise monthly summary focusing on business value and technical achievements.

Key features delivered:

- TTIR operand_constraints removal and interface simplification: removed operand_constraints from TTIR operations and related GenericOp to simplify the interface and improve usability. Commits: f22c4160920275be35e3beb84c0017c60317cf18; 809b126291285915308dbf66f55bc50a4c930df0.
- FlatBuffer embedding serialization fixes: corrected data handling in FlatBuffer serialization for embedding and embedding backward operations to ensure correct output handling and reuse of IDs. Commits: 305ea4768685f8cf1852dad9df4b35d784d2bb2a; 1ce54d5c70ad6f9350f7350c592839654d48782e.

Major bugs fixed:

- Fixed embedding data mismatch in FlatBuffer serialization.
- Fixed flatbuffer serialization for embedding backward op.

Overall impact and accomplishments:

- Interface simplifications reduce maintenance costs and accelerate onboarding.
- Correctness and reliability improvements in embedding-related serialization; downstream components benefit from predictable outputs and ID reuse.

Technologies/skills demonstrated:

- TTIR and GenericOp interface design, FlatBuffer serialization, embedding operations, cross-repo coordination and commit-driven delivery.

November 2024

3 Commits • 2 Features

Nov 1, 2024

November 2024 monthly summary for tenstorrent/tt-mlir. Delivered core feature and documentation improvements that extend operator capabilities and streamline developer onboarding, with direct business value in model deployment readiness and maintainability. Implemented element-wise log operation across TTIR and TTNN dialects, including operation definition, conversions, TTNN emission, and test coverage. Enhanced TTIR operation decomposition documentation with step-by-step guidance and an Index->Slice example, and fixed code references to improve correctness and maintainability.

October 2024

1 Commit • 1 Feature

Oct 1, 2024

October 2024 monthly summary for tenstorrent/tt-mlir: Focused on delivering a new TTIR decomposition pass to simplify TTIR operations by decomposing IndexOp into SliceOp within the TTNN pipeline, improving both maintainability and downstream optimization opportunities. The pass includes header files, pass definitions, and full implementation, and is integrated into the TTNN pipeline for correct conversion.
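
The idea behind decomposing an index along one dimension into a general slice can be sketched as follows: every other dimension takes its full range, and only the indexed dimension carries the requested begin, end, and step. The parameter names here are hypothetical and do not necessarily match the IndexOp or SliceOp attributes.

```python
def index_to_slice(shape, dim, begin, end, step):
    """Build per-dimension (begins, ends, steps) equivalent to indexing one dim."""
    begins = [0] * len(shape)   # untouched dims start at 0
    ends = list(shape)          # and run to their full extent
    steps = [1] * len(shape)    # with unit stride
    begins[dim], ends[dim], steps[dim] = begin, end, step
    return begins, ends, steps


begins, ends, steps = index_to_slice([4, 8, 16], dim=1, begin=2, end=6, step=2)
```

Because the slice form already covers all dimensions, later passes only need to handle one op kind.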

Quality Metrics

Correctness: 93.0%
Maintainability: 86.2%
Architecture: 90.4%
Performance: 81.8%
AI Usage: 32.6%

Skills & Technologies

Programming Languages

C, C++, CMake, FlatBuffers, MLIR, Markdown, Python, TableGen, Tcl

Technical Skills

API development, Backend Development, Benchmarking, Bug Fixing, C++, C++ Development, C++ development, C++ programming, CMake, Code Generation, Code Refactoring, Code Reference Management, Compiler Design, Compiler Development, Compiler Optimization

Repositories Contributed To

4 repos

tenstorrent/tt-mlir

Oct 2024 to Apr 2026
16 Months active

Languages Used

C++, MLIR, Markdown, TableGen, C, CMake, FlatBuffers, Tcl

Technical Skills

Compiler Development, MLIR, Pass Development, Tensor Operations, Code Reference Management, Documentation

tenstorrent/tt-metal

Jun 2025 to Sep 2025
3 Months active

Languages Used

C++

Technical Skills

C++ development, tensor operations, unit testing, API development, Tensor manipulation, Tensor operations

tenstorrent/tt-forge-fe

Jul 2025
1 Month active

Languages Used

C++, Python

Technical Skills

Benchmarking, Compiler Optimization, MLIR, Performance Tuning

tenstorrent/tt-xla

Dec 2025
1 Month active

Languages Used

C++, Python

Technical Skills

C++ development, Python scripting, compiler design, performance optimization