Exceeds - Team AI Productivity Dashboard

Marko Vlahovic

PROFILE

Developed and delivered end-to-end hardware performance observability for the tenstorrent/tt-llk and tenstorrent/tt-metal repositories, working in C++ and Python. Built a C++ performance counter infrastructure, first with per-thread L1 buffers and later with a shared L1 buffer, to track and analyze hardware metrics across the UNPACK, MATH, and PACK threads. Extended data collection and reporting with Python scripts that produce automated summaries and derived metrics. Refactored the subsystem to cut its memory usage by 67% and improved thread synchronization. Simplified the metrics output to support reproducible analysis, targeted optimization, and a maintainable architecture in CI environments.

Overall Statistics

Feature vs Bugs: 100% Features

Repository Contributions: 2 total

Lines of code: 3,329

Activity Months: 2

Your Network

884 people

Same Organization (@tenstorrent.com): 377

Abhishek Agarwal (Member)

Alex Apostolu (Member)

Almeet Bhullar (Member)

Andjela Bogdanovic (Member)

Alex Buck (Member)

Adriel Bustamante (Member)

Brata Choudhury (Member)

Andrija Cicovic (Member)

Aleksandar Colic (Member)

Shared Repositories: 507

Nikola Velickovic (Member)

Ryan Zhu (Member)

Stephen Osborne (Member)

Lazar Premovic (Member)

Vishal Chaudhary (Member)

Matt Craighead (Member)

mcw-anasuya (Member)

Mouliraj Elamurugan (Member)

Nikola Cvetkovic (Member)

Work History

March 2026

1 Commit • 1 Feature

Mar 1, 2026

March 2026: Work focused on a high-impact optimization of the performance counter subsystem in tt-metal, with an emphasis on memory efficiency, data integrity, and maintainability.

February 2026

1 Commit • 1 Feature

Feb 1, 2026

February 2026: Delivered the Tensix Performance Counter System for tt-llk, establishing end-to-end hardware performance observability across the UNPACK, MATH, and PACK threads and enabling data-driven optimization. Core features include C++ PerfCounters plumbing with per-thread L1 buffers, Python tooling for configuration and readout, and derived metrics with automated summaries. Added matmul kernel instrumentation and kernel-level integration to provide side-by-side REQUESTS vs GRANTS analysis, improving visibility into arbitration, stalls, and bottlenecks. This work lays the foundation for reproducible performance analysis, targeted tuning, and better capacity planning.

Quality Metrics

Correctness: 100.0%

Maintainability: 80.0%

Architecture: 100.0%

Performance: 80.0%

AI Usage: 50.0%

Skills & Technologies

Programming Languages

C++, Python

Technical Skills

C++ development, Python scripting, data analysis, data visualization, hardware integration, multithreading, performance analysis, performance optimization, system architecture

Repositories Contributed To

2 repos

Overview of all repositories you've contributed to across your timeline

tenstorrent/tt-llk

Feb 2026 – Feb 2026

1 Month active

Languages Used

C++, Python

Technical Skills

C++ development, Python scripting, data visualization, hardware integration, performance analysis

tenstorrent/tt-metal

Mar 2026 – Mar 2026

1 Month active

Languages Used

C++, Python

Technical Skills

data analysis, multithreading, performance optimization, system architecture