Exceeds
Nikola Ostojic

PROFILE

Over four months, contributed to the tenstorrent/tt-metal and tenstorrent/tt-inference-server repositories, building and optimizing machine learning infrastructure. Focused on CI/CD improvements, expanded model testing, and tracing instrumentation that enables detailed execution analytics for generator models. Improved inference server stability by tuning memory management and trace region sizing, supporting large models such as llama-70b and qwen2.5-vl while reducing out-of-memory (OOM) errors. Worked in Python and YAML configuration to streamline workflows, improve test coverage, and ensure compatibility across devices. The work shortened feedback cycles, made deployments more reliable, and improved resource utilization for large-scale deep learning inference workloads.

Overall Statistics

Features vs. Bugs

83% Features

Repository Contributions

Total: 18
Bugs: 1
Commits: 18
Features: 5
Lines of code: 241
Activity months: 4

Work History

January 2026

1 Commit

Jan 1, 2026

January 2026 summary for tenstorrent/tt-inference-server: focused on stabilizing large-model inference and expanding model support. Implemented memory-management and trace-region sizing changes to mitigate out-of-memory (OOM) errors during large-LLM inference, fixed issues affecting qwen2.5-vl and llama-3.2-3b, and added support for llama-70b. These changes, delivered in commit f47ab4f8c2e601125e6bb19170273d2f6ff009f4, reduce the risk of OOM failures, make larger models deployable, and are intended to sustain throughput under heavy load.

October 2025

1 Commit • 1 Feature

Oct 1, 2025

October 2025 summary for tenstorrent/tt-inference-server. Key feature delivered: trace region sizing optimization for Llama-3.1-8B across multiple device specifications, improving performance and resource allocation. No major bug fixes were recorded in this repository this month. Overall impact: improved inference throughput and more efficient hardware utilization for Llama-3.1-8B. Skills demonstrated: performance optimization, cross-device compatibility, and change tracking via explicit commit references.

September 2025

1 Commit • 1 Feature

Sep 1, 2025

In September 2025, delivered initial tracing instrumentation for the generator model’s prefill path in tenstorrent/tt-metal. This change enables detailed execution analytics by capturing inputs, outputs, and timing, supporting faster debugging and data-driven performance profiling. The work establishes a foundation for optimization and troubleshooting in the prefill flow; testing and validation are planned for the next cycle to confirm trace accuracy and impact.

August 2025

15 Commits • 3 Features

Aug 1, 2025

August 2025: Delivered CI-focused improvements and expanded model testing in tenstorrent/tt-metal, improving reliability and test coverage and shortening feedback cycles for Gemma3 releases.

Quality Metrics

Correctness: 93.2%
Maintainability: 93.2%
Architecture: 93.2%
Performance: 93.2%
AI Usage: 28.8%

Skills & Technologies

Programming Languages

Python, YAML

Technical Skills

CI/CD, Configuration Management, Continuous Integration, Data Processing, Deep Learning, DevOps, Machine Learning, Model Optimization, Python, Testing, Workflow Automation, YAML, YAML Configuration, Device Configuration

Repositories Contributed To

2 repos

tenstorrent/tt-metal

Aug 2025 – Sep 2025
2 months active

Languages Used

Python, YAML

Technical Skills

CI/CD, Configuration Management, Continuous Integration, DevOps, Python, Testing

tenstorrent/tt-inference-server

Oct 2025 – Jan 2026
2 months active

Languages Used

Python

Technical Skills

Device Configuration, Model Optimization, Performance Tuning, Machine Learning, Python