
Heslami contributed to the nvidia-cosmos/cosmos-rl repository by delivering experimental support for Deepseek V3 GRPO in August 2025, integrating the model into the reinforcement learning framework and updating the data packing and rollout mechanisms to accommodate the new model type. Working in Python and TOML, Heslami focused on configuration management, distributed training, and LLM integration to support RL experimentation and evaluation workflows. In September 2025, Heslami removed the legacy Deepseek-V3 code and configurations, fixed a GRPO bug related to MoE rollout, and cleaned up unused logic in the weight mapper. These changes improved stability, reduced runtime risk, and streamlined onboarding through updated documentation and clearer model configuration.

In September 2025, delivered targeted cleanup and stability enhancements for Deepseek-V3 in nvidia-cosmos/cosmos-rl. Removed the legacy Deepseek-V3 implementation and its related configuration to reduce confusion following the addition of the new Deepseek V3 and R1 support. Fixed a GRPO bug that occurred in Deepseek-V3 when using EP by removing it from the MoE rollout, and cleaned up unused logic in the weight mapper to improve compatibility and prevent assertion errors. Updated the documentation accordingly. These changes reduce runtime risk, align with the latest Deepseek-V3/R1 features, and simplify onboarding and usage.
Month: 2025-08 – Key accomplishments include delivering experimental Deepseek V3 GRPO support in nvidia-cosmos/cosmos-rl with a new configuration, integrating the model into the reinforcement learning framework, and updating the data packing and rollout mechanisms to accommodate the Deepseek V3 GRPO model type. No major bugs were fixed during this period. This work enhances RL experimentation capabilities and accelerates evaluation of Deepseek V3 within the Cosmos RL pipeline.