Exceeds
Djordje Madic

PROFILE


During a three-month period, Djordje Madic developed and enhanced the tenstorrent/tt-inference-server, focusing on scalable language-model inference. He delivered a production-grade C++ LLM engine with paged attention, prefix caching, and a sequence scheduler to improve throughput and latency for long-context tasks. His work included OpenAI-compatible API endpoints, vLLM plugin integration, and Docker-based deployment, with an emphasis on maintainability and testability through code refactoring, Ruff formatting, and expanded unit testing. By addressing dependency management, CI stability, and performance testing, Djordje ensured reliable deployments and robust backend performance, leveraging Python, C++, and Docker to support evolving AI workloads and developer productivity.
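The summary mentions paged attention with prefix caching. As a rough illustration of the prefix-caching idea only (a toy sketch of block reuse, not the engine's actual C++ implementation; all names here are hypothetical), prompts that share a token prefix can be mapped to the same KV-cache blocks:

```python
import hashlib

BLOCK_SIZE = 4  # tokens per KV-cache block (toy value)

class PrefixCachingAllocator:
    """Toy paged-KV allocator: identical token-block prefixes share one block."""

    def __init__(self):
        self.blocks = {}         # content hash -> physical block id
        self.next_block_id = 0

    def allocate(self, token_ids):
        """Return the block-id sequence for a prompt, reusing cached prefixes."""
        block_ids = []
        prefix = ()
        for start in range(0, len(token_ids), BLOCK_SIZE):
            chunk = tuple(token_ids[start:start + BLOCK_SIZE])
            prefix = prefix + chunk
            # Hash the full prefix so a block is shared only when everything
            # before it (and the block itself) matches exactly.
            key = hashlib.sha256(repr(prefix).encode()).hexdigest()
            if key not in self.blocks:          # cache miss: new physical block
                self.blocks[key] = self.next_block_id
                self.next_block_id += 1
            block_ids.append(self.blocks[key])
        return block_ids

alloc = PrefixCachingAllocator()
a = alloc.allocate([1, 2, 3, 4, 5, 6, 7, 8])   # two fresh blocks
b = alloc.allocate([1, 2, 3, 4, 9, 9, 9, 9])   # reuses the first block of `a`
```

Two prompts with the same first block of tokens end up pointing at one shared physical block, which is what lets a server skip recomputing attention state for repeated prompt prefixes.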

Overall Statistics

Features vs Bugs

86% Features

Repository Contributions

Total contributions: 42
Commits: 42
Features: 24
Bugs: 4
Lines of code: 8,697
Activity months: 3

Work History

February 2026

1 Commit • 1 Feature

Feb 1, 2026

February 2026: Delivered a production-grade C++ LLM engine within tenstorrent/tt-inference-server featuring paged attention, prefix caching, and a sequence scheduler. Implemented end-to-end testing, fixed scheduler bugs, and stabilized CI. This work improves inference throughput, reduces latency for long-context tasks, and positions the platform for scalable language-model workloads.

January 2026

5 Commits • 4 Features

Jan 1, 2026

January 2026: Delivered stability and usability improvements to the Tenstorrent Inference Server. Implemented dependency upgrades and MTEB compatibility fixes to prevent Torch downgrades, introduced clearer model spec environment variables, added a UI badge for visibility, and expanded LLM performance testing with a dedicated runner and better observability. These changes reduce deployment risk, improve test reliability, and enhance product credibility across deployments.

December 2025

36 Commits • 19 Features

Dec 1, 2025

December 2025: Tenstorrent tt-inference-server monthly summary, focused on delivering business value through OpenAI API compatibility, streaming performance, reliability, and developer tooling. Key features delivered include: an OpenAI-compatible completions API; improved TT member detection and username-based identification; parallel streaming decoding with SSE streaming; vLLM plugin integration and Docker deployment; LLM settings moved to a separate config; a demo UI and profiling tooling; a test runner; and broad code-quality improvements. Major bug fixes include Slack notification JSON formatting, TT member detection corrections, performance test stability, and unit tests. These efforts resulted in improved API compatibility, higher throughput, more predictable performance under test, and a stronger, more maintainable codebase. Technologies demonstrated include OpenAI API compatibility, vLLM plugin integration, SSE streaming, Docker, Ruff formatting, test-driven development, and coding guidelines.
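The entry above names an OpenAI-compatible completions API with SSE streaming. A minimal sketch of how a client consumes such a stream (assuming the standard OpenAI-style `data:` framing and `[DONE]` terminator; the sample payloads are illustrative, not captured from this server):

```python
import json

def parse_sse_chunks(raw: str):
    """Yield decoded JSON payloads from an OpenAI-style SSE stream body."""
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue                      # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":          # OpenAI-style stream terminator
            break
        yield json.loads(payload)

# Sample stream body as an OpenAI-compatible server might emit it
stream = (
    'data: {"choices": [{"text": "Hello"}]}\n\n'
    'data: {"choices": [{"text": ", world"}]}\n\n'
    "data: [DONE]\n\n"
)
text = "".join(c["choices"][0]["text"] for c in parse_sse_chunks(stream))
# text == "Hello, world"
```

Each `data:` event carries one incremental completion chunk, so the client can render tokens as they arrive rather than waiting for the full response.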


Quality Metrics

Correctness: 91.4%
Maintainability: 88.2%
Architecture: 89.0%
Performance: 87.6%
AI Usage: 34.8%

Skills & Technologies

Programming Languages

C++, Dockerfile, HTML, JavaScript, Markdown, Python, Shell, YAML

Technical Skills

AI Integration, API Development, API Integration, Asynchronous Programming, Backend Development, C++ Development, CI/CD, Code Formatting, Code Refactoring, Containerization, Continuous Integration, Dependency Management, DevOps, Docker

Repositories Contributed To

1 repo

Overview of all repositories contributed to across the timeline

tenstorrent/tt-inference-server

Dec 2025 – Feb 2026
3 months active

Languages Used

Dockerfile, HTML, JavaScript, Markdown, Python, Shell, YAML, C++

Technical Skills

AI Integration, API Development, API Integration, Asynchronous Programming, Backend Development