Exceeds
Djordje Madic

PROFILE

Worked on tenstorrent/tt-inference-server, delivering a production-ready C++ LLM engine with paged attention, prefix caching, and a sequence scheduler to improve inference throughput and latency for long-context tasks. Improved API compatibility by implementing an OpenAI-compatible completions API and integrating vLLM plugins for flexible model deployment. Focused on backend development in Python and C++, introduced test infrastructure with unit tests and a dedicated LLM test runner, and stabilized CI pipelines. Also handled dependency management, Docker deployment, and performance testing, and improved code maintainability through refactoring, documentation, and coding guidelines. These efforts strengthened the reliability and scalability of language model workloads.

Overall Statistics

Features vs. Bugs

86% Features

Repository Contributions

Commits: 42
Features: 24
Bugs: 4
Lines of Code: 8,697
Activity Months: 3

Work History

February 2026

1 Commit • 1 Feature

Feb 1, 2026

February 2026: Delivered a production-grade C++ LLM engine within tenstorrent/tt-inference-server featuring paged attention, prefix caching, and a sequence scheduler. Implemented end-to-end testing, fixed scheduler bugs, and stabilized CI. This work improves inference throughput, reduces latency for long-context tasks, and positions the platform for scalable language-model workloads.

January 2026

5 Commits • 4 Features

Jan 1, 2026

January 2026: Delivered stability and usability improvements to the Tenstorrent Inference Server. Upgraded dependencies and fixed MTEB compatibility issues to prevent Torch downgrades, introduced clearer model spec environment variables, added a UI badge for visibility, and expanded LLM performance testing with a dedicated runner and better observability. These changes reduce deployment risk, improve test reliability, and strengthen confidence in the product across deployments.

December 2025

36 Commits • 19 Features

Dec 1, 2025

December 2025: Work on tenstorrent/tt-inference-server focused on OpenAI API compatibility, streaming performance, reliability, and developer tooling. Key features delivered: an OpenAI-compatible completions API; improved TT member detection with username-based identification; two-streaming-decoding in parallel with SSE streaming; vLLM plugin integration and Docker deployment; LLM settings moved to a separate config; a demo UI and profiling tooling; a test runner; and broad code quality improvements. Major bug fixes addressed Slack notification JSON formatting, TT member detection, performance test stability, and unit tests. These efforts improved API compatibility, increased throughput, made performance more predictable under test, and left a stronger, more maintainable codebase. Technologies demonstrated include OpenAI API compatibility, vLLM plugin integration, SSE streaming, Docker, Ruff formatting, test-driven development, and coding guidelines.

Quality Metrics

Correctness: 91.4%
Maintainability: 88.2%
Architecture: 89.0%
Performance: 87.6%
AI Usage: 34.8%

Skills & Technologies

Programming Languages

C++, Dockerfile, HTML, JavaScript, Markdown, Python, Shell, YAML

Technical Skills

AI Integration, API Development, API Integration, Asynchronous Programming, Backend Development, C++ Development, CI/CD, Code Formatting, Code Refactoring, Containerization, Continuous Integration, Dependency Management, DevOps, Docker

Repositories Contributed To

1 repo

tenstorrent/tt-inference-server

Dec 2025 to Feb 2026
3 months active

Languages Used

Dockerfile, HTML, JavaScript, Markdown, Python, Shell, YAML, C++

Technical Skills

AI Integration, API Development, API Integration, Asynchronous Programming, Backend Development