Exceeds
Nikola Ostojic

PROFILE


Nikola Ostojic contributed to the tenstorrent/tt-metal and tt-inference-server repositories, focusing on model optimization, CI/CD reliability, and large-model inference stability. He enhanced continuous integration workflows by expanding test coverage and improving environment stability using Python and YAML, which reduced flakiness and accelerated feedback for Gemma3 releases. In tt-metal, he introduced tracing instrumentation for generator model prefill paths, enabling detailed execution analytics for performance profiling. For tt-inference-server, Nikola optimized trace region sizing and memory management, addressing out-of-memory issues and supporting larger models like llama-70b. His work demonstrated depth in DevOps, machine learning, and performance tuning across evolving model architectures.

Overall Statistics

Features vs Bugs

Features: 83%

Repository Contributions

18 Total

Bugs: 1
Commits: 18
Features: 5
Lines of code: 241
Activity months: 4

Work History

January 2026

1 Commit

Jan 1, 2026

January 2026 performance summary for the tt-inference-server: focused on stabilizing large-model inference and expanding model support. Implemented memory-management and trace-region optimizations to mitigate OOM during large LLM inference, improving reliability for models such as qwen2.5-vl and llama-3.2-3b, and added support for llama-70b. This work reduces outage risk, enables deployment of larger models, and enhances overall throughput under heavy load. The changes were delivered via targeted fixes and tuning associated with the commit f47ab4f8c2e601125e6bb19170273d2f6ff009f4, including trace region adjustments, fixes for qwen2.5-vl and llama-3.2-3b, and new llama-70b support.
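The trace-region work above amounts to picking a trace buffer size that fits the target model without starving device memory. The sketch below is purely illustrative: the function name, per-model footprints, and the one-quarter-of-DRAM cap are assumptions for this example, not the actual tt-inference-server logic.

```python
def pick_trace_region_size(model_name, device_dram_bytes):
    """Choose a trace region size (bytes) for a model, leaving enough
    DRAM headroom to avoid out-of-memory during trace capture.

    Model names and sizing constants are illustrative only."""
    # Rough per-model trace footprints (hypothetical values).
    requested = {
        "llama-3.2-3b": 64 * 1024 * 1024,
        "qwen2.5-vl": 96 * 1024 * 1024,
        "llama-70b": 256 * 1024 * 1024,
    }.get(model_name, 32 * 1024 * 1024)
    # Cap the trace region at a fraction of DRAM so activations still fit.
    cap = device_dram_bytes // 4
    return min(requested, cap)
```

The key design point is the cap: a larger model may request more trace space than the device can spare, and clamping to a DRAM fraction trades trace completeness for stability under load.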

October 2025

1 Commit • 1 Feature

Oct 1, 2025

October 2025 summary for tenstorrent/tt-inference-server. Key feature delivered: Llama-3.1-8B trace region sizing optimization across multiple device specs, enabling better performance and resource allocation. Per current records, no bugs were fixed in this repo this month. Overall impact: improved inference throughput and more efficient hardware utilization for the Llama-3.1-8B model. Technologies and skills demonstrated: performance optimization, cross-device compatibility, and change tracking via explicit commit references.

September 2025

1 Commit • 1 Feature

Sep 1, 2025

In September 2025, delivered initial tracing instrumentation for the generator model’s prefill path in tenstorrent/tt-metal. This change enables detailed execution analytics by capturing inputs, outputs, and timing, supporting faster debugging and data-driven performance profiling. The work establishes a foundation for optimization and troubleshooting in the prefill flow; testing and validation are planned for the next cycle to confirm trace accuracy and impact.
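Instrumentation of the kind described above typically wraps the prefill call and records its inputs, outputs, and wall-clock timing. This is a minimal hedged sketch of that pattern; the decorator name and the toy `prefill` function are hypothetical stand-ins, not the actual tt-metal instrumentation.

```python
import functools
import time

def trace_prefill(fn):
    """Record inputs and wall-clock timing for each call to a prefill
    function. Illustrative sketch only; real tracing differs."""
    records = []

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        out = fn(*args, **kwargs)
        records.append({
            "fn": fn.__name__,
            "args": args,
            "kwargs": kwargs,
            "elapsed_s": time.perf_counter() - start,
        })
        return out

    wrapper.records = records  # expose captured traces for analysis
    return wrapper

@trace_prefill
def prefill(tokens):
    # Stand-in for the generator model's prefill computation.
    return [t * 2 for t in tokens]
```

Capturing traces as plain dicts keeps the hook cheap and makes the data easy to feed into downstream profiling or debugging tools.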

August 2025

15 Commits • 3 Features

Aug 1, 2025

August 2025: Delivered CI-focused improvements and expanded model testing in the tt-metal repo, driving reliability, coverage, and faster feedback for Gemma3 releases.
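One common pattern behind CI flakiness reduction is retrying transient failures a bounded number of times before failing the job. The sketch below illustrates that pattern only; the function names and retry policy are assumptions for this example, not the actual tt-metal workflow logic.

```python
import time

def with_retries(fn, attempts=3, delay_s=0.0):
    """Re-run a flaky step up to `attempts` times before giving up,
    a common way to stabilize CI jobs. Illustrative sketch only."""
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:  # CI steps can fail in many ways
            last_exc = exc
            time.sleep(delay_s)
    raise last_exc

# A step that fails twice before succeeding, simulating flakiness.
calls = {"n": 0}

def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"
```

Bounding the attempt count matters: unbounded retries hide real regressions, while a small cap absorbs transient environment noise without masking genuine failures.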


Quality Metrics

Correctness: 93.2%
Maintainability: 93.2%
Architecture: 93.2%
Performance: 93.2%
AI Usage: 28.8%

Skills & Technologies

Programming Languages

Python, YAML

Technical Skills

CI/CD, Configuration Management, Continuous Integration, Data Processing, Deep Learning, DevOps, Machine Learning, Model Optimization, Python, Testing, Workflow Automation, YAML, YAML Configuration, Device Configuration

Repositories Contributed To

2 repos

Overview of all repositories contributed to across the timeline

tenstorrent/tt-metal

Aug 2025 – Sep 2025
2 Months active

Languages Used

Python, YAML

Technical Skills

CI/CD, Configuration Management, Continuous Integration, DevOps, Python, Testing

tenstorrent/tt-inference-server

Oct 2025 – Jan 2026
2 Months active

Languages Used

Python

Technical Skills

Device Configuration, Model Optimization, Performance Tuning, Machine Learning, Python