
Kosta Novokmet developed and enhanced core backend features for the tenstorrent/tt-inference-server repository over a two-month period, focusing on scalable API and service architecture for large language model inference. He delivered tokenization and embeddings APIs, modularized the embeddings service, and introduced dynamic batching and streaming to improve throughput and maintainability. Working in Python and C++ with the FastAPI framework, Kosta applied asynchronous and concurrent programming patterns, robust error handling, and targeted performance optimizations. His work enabled token-based prompts, end-to-end streaming, and support for new GPT-OSS models, yielding a more extensible, reliable, and performant backend for demanding deployment scenarios.
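As a rough illustration of the kind of tokenization API described above, the minimal FastAPI sketch below accepts a prompt and returns token IDs, which is what enables token-based prompting downstream. The route path, request/response schemas, and the use of a Hugging Face tokenizer are assumptions for illustration, not the repository's actual interface.

```python
# Minimal sketch of a tokenization endpoint. The "/tokenize" route, the
# schemas, and the "gpt2" tokenizer are hypothetical placeholders, not
# taken from tt-inference-server itself.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model

class TokenizeRequest(BaseModel):
    prompt: str

class TokenizeResponse(BaseModel):
    tokens: list[int]
    count: int

@app.post("/tokenize", response_model=TokenizeResponse)
async def tokenize(req: TokenizeRequest) -> TokenizeResponse:
    # Encoding is CPU-bound but fast; very large prompts could be
    # offloaded to a thread pool to keep the event loop responsive.
    ids = tokenizer.encode(req.prompt)
    return TokenizeResponse(tokens=ids, count=len(ids))
```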
February 2026 monthly summary for tenstorrent/tt-inference-server: Delivered streaming and performance enhancements for the LLM server, modularized embeddings into a separate service layer, and expanded GPT-OSS model support with refined resource configuration. These changes improve throughput in high-demand scenarios and simplify maintenance and future model integrations.
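To make the streaming enhancements concrete, here is a minimal sketch of token streaming over Server-Sent Events with FastAPI. The route, the event format, and the stand-in generator are assumptions; a real decode loop would yield tokens as the model produces them, which is what keeps time-to-first-token low.

```python
# Toy sketch of end-to-end streaming. The "/generate/stream" route and the
# fake token generator are illustrative, not the server's actual code.
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def fake_token_stream(prompt: str):
    # Stand-in for a real decode loop: emit one SSE event per token
    # instead of buffering the whole completion.
    for token in prompt.split():
        await asyncio.sleep(0.01)
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"

@app.post("/generate/stream")
async def generate_stream(prompt: str):
    return StreamingResponse(
        fake_token_stream(prompt),
        media_type="text/event-stream",
    )
```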
January 2026 summary: Delivered tokenization API enhancements, vLLMRunner core improvements with batching and dynamic sampling, and a dedicated embeddings service that improves modularity and extensibility. Implemented critical bug fixes and stability improvements across tests and linting, contributing to higher reliability and throughput. The changes enable token-based prompts, faster request processing, and a scalable pipeline suited to larger models and deployments, with clearer API boundaries and improved build health.
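The batching work described above typically follows a dynamic batching pattern: incoming requests queue up, and a worker flushes a batch either when it is full or when a short wait window expires, trading a little latency for much better throughput. The sketch below is a self-contained toy version of that pattern using asyncio; the names, thresholds, and the uppercase "model call" are assumptions, not the vLLMRunner internals.

```python
# Toy dynamic batching loop: MAX_BATCH and MAX_WAIT_S are hypothetical
# tuning knobs, and the batched "model call" is simulated.
import asyncio

MAX_BATCH = 8
MAX_WAIT_S = 0.05

async def batch_worker(queue: asyncio.Queue) -> None:
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]  # block until the first request arrives
        deadline = loop.time() + MAX_WAIT_S
        # Keep pulling requests until the batch fills or the window closes.
        while len(batch) < MAX_BATCH:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        # Stand-in for a single batched inference call.
        results = [prompt.upper() for prompt, _ in batch]
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)

async def submit(queue: asyncio.Queue, prompt: str) -> str:
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))
    return await fut

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(batch_worker(queue))
    out = await asyncio.gather(*(submit(queue, p) for p in ["a", "b", "c"]))
    print(out)  # ['A', 'B', 'C'] — all three served by one batched call
    worker.cancel()

asyncio.run(main())
```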
