Exceeds
Sai Gopal Reddy Kovvuri

PROFILE

Worked on the apache/tvm repository to gate WebGPU subgroup shuffle primitives on device capability: the primitives are emitted only for hardware that supports subgroups, and every other target defaults to shared memory reductions. The gating logic lives in the C++ backend, is driven by updated target attributes, and is exposed to users through a new CLI flag, which preserves broad compatibility and reduces the risk of runtime failures on devices without subgroup support. The work combined C++ development, GPU programming, and Python-based end-to-end testing, with validation on Llama-3.2-1B-q4f16_1B models. The result is a maintainable mechanism for toggling advanced primitives that enables a faster reduction path on compatible devices without sacrificing universality.

Overall Statistics

Feature vs Bugs

100% Features

Repository Contributions

Total: 1
Bugs: 0
Commits: 1
Features: 1
Lines of code: 231
Activity months: 1

Work History

April 2026

1 Commit • 1 Feature

Apr 1, 2026

April 2026 monthly highlights for apache/tvm:

- WebGPU subgroup shuffle gating delivered: subgroup shuffle primitives are now generated only when the target device supports subgroups; otherwise, code paths fall back to shared memory reductions. This preserves compatibility across a broad range of devices while enabling performance on capable hardware.
- Key delivery items:
  1) TVM target integration: UpdateWebGPUAttrs() now sets thread_warp_size=32 when supports_subgroups=true, gating subgroup reductions at the source.
  2) CLI and user surface: Added --enable-subgroups flag in mlc-llm to surface the gating option to users.
  3) Reduction-path gating: IsWarpReduction() logic in lower_thread_allreduce.cc ensures subgroup ops are generated only when explicitly enabled, with safe defaults to shared-memory reductions.
  4) Validation: End-to-end tests on Llama-3.2-1B-q4f16_1B demonstrate baseline (no subgroups) and subgroup-enabled paths, confirming correct gating behavior and measurable performance opportunities on compatible devices.
- Overall impact: Improves runtime performance for WebGPU deployments on capable devices while maintaining universal compatibility, reduces risk of runtime incompatibilities, and provides a maintainable mechanism to toggle advanced primitives.
- Technologies/skills demonstrated: TVM WebGPU backend, gating logic design, compile-time flag handling, target attribute manipulation, CLI tool integration, reduction-path instrumentation, end-to-end validation.
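The gating flow described above can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions, not the TVM or mlc-llm implementation: `WebGPUTargetAttrs`, `update_webgpu_attrs`, `is_warp_reduction`, `select_reduction_path`, and `parse_args` are hypothetical names, the fallback warp size of 1 is assumed, and the "extent is a multiple of the warp size" test is a simplified stand-in for whatever IsWarpReduction() actually checks. Only the `--enable-subgroups` flag name and the `thread_warp_size=32` rule come from the summary.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class WebGPUTargetAttrs:
    """Hypothetical stand-in for the WebGPU target attributes (not a TVM class)."""
    supports_subgroups: bool = False
    thread_warp_size: int = 1  # assumed default: no warp-level primitives


def update_webgpu_attrs(supports_subgroups: bool) -> WebGPUTargetAttrs:
    # Rule from the summary: warp size becomes 32 only when subgroups are supported.
    return WebGPUTargetAttrs(
        supports_subgroups=supports_subgroups,
        thread_warp_size=32 if supports_subgroups else 1,
    )


def is_warp_reduction(attrs: WebGPUTargetAttrs, reduce_extent: int) -> bool:
    # Simplified, assumed criterion: a usable warp size and an extent that
    # divides evenly into warps. The real check may involve more conditions.
    warp = attrs.thread_warp_size
    return warp > 1 and reduce_extent % warp == 0


def select_reduction_path(attrs: WebGPUTargetAttrs, reduce_extent: int) -> str:
    # Subgroup shuffles only when gated on; shared memory is the safe default.
    if is_warp_reduction(attrs, reduce_extent):
        return "subgroup_shuffle"
    return "shared_memory"


def parse_args(argv):
    # Models the user-facing switch; off by default so every device stays supported.
    parser = argparse.ArgumentParser(description="Sketch of subgroup gating")
    parser.add_argument("--enable-subgroups", action="store_true")
    return parser.parse_args(argv)


if __name__ == "__main__":
    for argv in ([], ["--enable-subgroups"]):
        attrs = update_webgpu_attrs(parse_args(argv).enable_subgroups)
        print(argv, attrs.thread_warp_size, select_reduction_path(attrs, 64))
```

Keeping the decision in one attribute (`thread_warp_size`) means the reduction-lowering step needs no separate knowledge of the CLI flag: without the flag the warp size stays at its default and the shared-memory path is chosen automatically.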

Quality Metrics

Correctness: 100.0%
Maintainability: 80.0%
Architecture: 100.0%
Performance: 80.0%
AI Usage: 40.0%

Skills & Technologies

Programming Languages

C++, Python

Technical Skills

C++ development, GPU programming, Python testing, WebGPU

Repositories Contributed To

1 repo

Overview of all repositories you've contributed to across your timeline

apache/tvm

Apr 2026 - Apr 2026
1 Month active

Languages Used

C++, Python

Technical Skills

C++ development, GPU programming, Python testing, WebGPU