
Max Picca contributed to the OpenXiangShan project by engineering robust memory subsystem and prefetching enhancements across repositories such as CoupledL2 and XiangShan. He refactored the uncache subsystem for improved throughput, introduced dynamic L2 prefetch control, and strengthened memory protection through PMP validation. Leveraging SystemVerilog, Chisel, and Python, Max implemented CI/CD pipelines for performance testing and automated multi-server orchestration. His work included latency tuning for CHI protocol adaptation and cross-repo debugging to ensure correctness in multi-core environments. These efforts resulted in more reliable, configurable, and performant hardware designs, demonstrating depth in low-level systems, hardware simulation, and performance optimization.

For 2025-10, OpenXiangShan/CoupledL2 delivered enhanced prefetch observability and dynamic performance tuning, strengthening data-driven decisions and overall efficiency of the L2 prefetch pipeline. The work improves metric accuracy, enables finer control of prefetch behavior, and lays groundwork for future performance optimizations across targets such as Berti.
May 2025 – OpenXiangShan/CoupledL2 delivered a performance-focused feature to tune BestOffsetPrefetch latency in response to CHI version changes. By increasing the default and maximum latency values for the data queue, the feature improves throughput and tail latency stability when CHI characteristics shift. The change is committed as 09ed27ede9bd56a1f59d5100d84f802644eb4bac with the message 'perf(BOP): use large fixed latency to adapt CHI version (#413)'. Overall impact: smoother data-path performance, greater stability, and better scalability under CHI updates. Technologies/skills demonstrated: latency tuning, performance engineering, low-level data-path optimization, and CHI protocol awareness.
April 2025 focused on memory-subsystem enhancements and reliability improvements across two OpenXiangShan repositories. The work delivered targeted features for testing memory access patterns and fixed critical memory-handling bugs, improving test fidelity and stability and reducing production risk.
March 2025 highlights across OpenXiangShan projects: targeted features and critical fixes to improve stability, reliability, and cross-repo consistency. Key outcomes include a type-safety and unsigned-semantics fix in MSHRCtl.scala, a robust artifact-clean step for multi-config builds, PBMT-based validation for AMO and misaligned memory accesses, and an update aligning the NEMU reference with the RISC-V interpreter build configurations. These efforts reduce runtime errors, improve memory-access correctness, and streamline build and validation pipelines across CoupledL2, NEMU, and ready-to-run, giving clearer baselines and faster, more predictable releases.
January 2025: OpenXiangShan/CoupledL2 delivered foundational enhancements to the unified prefetching subsystem, improving memory safety, predictability, and per-core configurability. The work focused on consolidating prefetching improvements, validating physical memory ranges for vaddr prefetching, refactoring BOP logic to rely on tlbcmd.read, and introducing PrefetchCtrlFromCore for fine-grained control of prefetch components at the core level. These changes reduce risk, improve cache efficiency, and enable more dynamic performance tuning across workloads.
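As a rough illustration of the range validation and per-core gating described above, the following Python sketch models the decision in software. The names `PrefetchCtrl`, `VALID_RANGES`, and `should_issue_prefetch`, along with the address range itself, are hypothetical and not taken from CoupledL2; the real logic lives in hardware and may differ substantially.

```python
from dataclasses import dataclass

# Hypothetical physical address ranges: (inclusive start, exclusive end).
VALID_RANGES = [(0x8000_0000, 0x1_0000_0000)]


@dataclass
class PrefetchCtrl:
    """Hypothetical per-core switches, loosely inspired by PrefetchCtrlFromCore."""
    enable: bool = True       # master switch for L2 prefetching
    enable_bop: bool = True   # switch for the Best-Offset prefetcher only


def in_valid_range(paddr: int) -> bool:
    """Return True if the physical address falls in any allowed range."""
    return any(lo <= paddr < hi for lo, hi in VALID_RANGES)


def should_issue_prefetch(paddr: int, ctrl: PrefetchCtrl) -> bool:
    """Issue a prefetch only if the core enables it and the target is valid."""
    if not (ctrl.enable and ctrl.enable_bop):
        return False
    return in_valid_range(paddr)
```

Checking the control bits before the address keeps a disabled prefetcher from doing any range lookups at all, which is the kind of ordering a per-core switch is meant to allow.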
December 2024 Monthly Summary (OpenXiangShan project): Key features delivered and major bugs fixed, with emphasis on business value and technical excellence.
Key features delivered:
- Performance Testing CI Workflow Enhancements for XiangShan: configurable server selection, extended run-time, and resilience via continue-on-error. Ensured multi-server execution uses the correct server list via environment variables.
Major bugs fixed:
- PMP load access violation handling in BestOffsetPrefetch (CoupledL2): added PMP address-filter checks for load operations to prevent incorrect behavior in the virtual buffer of physical memory, improving memory protection and system stability.
Overall impact and accomplishments:
- Strengthened memory protection and memory subsystem stability, reducing risk of PMP-related violations during prefetch operations.
- Expanded performance testing coverage and reliability, enabling longer runs and more representative measurements across multiple servers.
- Improved CI reliability and feedback loops, accelerating performance tuning and release readiness.
Technologies/skills demonstrated:
- Memory protection and prefetcher safety (PMP, BestOffsetPrefetch)
- CI automation and performance testing pipelines
- Environment-driven configuration for multi-server orchestration
- Multi-server orchestration and robust error handling in CI workflows
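The environment-driven server selection and continue-on-error behavior mentioned above could look roughly like the sketch below. This is only an illustration: the variable name `PERF_SERVERS`, the default server names, and the helper functions are assumptions, not the project's actual CI configuration.

```python
import os

# Hypothetical fallback list used when the environment gives no servers.
DEFAULT_SERVERS = ["server-a", "server-b"]


def select_servers(env=None):
    """Read a comma-separated server list from PERF_SERVERS (assumed name)."""
    env = os.environ if env is None else env
    raw = env.get("PERF_SERVERS", "")
    servers = [s.strip() for s in raw.split(",") if s.strip()]
    return servers or list(DEFAULT_SERVERS)


def run_all(servers, run_one):
    """Run a job on every server, recording failures instead of aborting.

    This mirrors the spirit of a CI step marked continue-on-error: one
    failing server does not prevent the remaining servers from running.
    """
    results = {}
    for server in servers:
        try:
            results[server] = run_one(server)
        except Exception as exc:  # keep going, but remember what failed
            results[server] = exc
    return results
```

Passing the environment in as a parameter keeps the selection logic testable without touching the real process environment.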
November 2024 — OpenXiangShan/XiangShan: major uncache subsystem overhaul and introduction of non-coherent outstanding operations to strengthen memory path reliability and performance. Replaced queue-based uncache with a buffer-based design via a unified LoadQueueUncache, consolidating MMIO and NC paths, improving address matching, data forwarding, and overall correctness and throughput. Introduced non-coherent outstanding operation support and removed MMIO store outstanding handling; refactored outstanding management and forwarding logic. Enabled cross-path sharing of LQUncache (MMIO/NC) with NC data writeback to ldu1-2, and performed targeted style cleanup and standardization. These changes reduce critical-path bugs, provide a solid foundation for future coherence work, and increase memory path reliability and throughput across the XiangShan memory subsystem.
Month: 2024-10 — Focused on reliability, observability, and correctness of the L2 prefetch and address handling in OpenXiangShan/CoupledL2, with targeted fixes and enhanced debugging to enable safer production deployments and faster diagnosis.