
Max Picca contributed to the OpenXiangShan project by engineering robust memory subsystem features and performance optimizations across repositories such as XiangShan and CoupledL2. He overhauled uncache logic, unified MMIO and non-coherent memory paths, and enhanced prefetching mechanisms to improve throughput and reliability. Using Chisel, Scala, and SystemVerilog, Max refactored RTL components for timing closure, introduced dynamic prefetch controls, and strengthened CI/CD pipelines with Python and shell scripting. His work addressed memory protection, observability, and cross-repo consistency, resulting in more accurate hardware models, streamlined validation, and maintainable code. The depth of his contributions advanced both system stability and test coverage.
December 2025 monthly summary: Four key deliverables across OpenXiangShan/XiangShan and OpenXiangShan/CoupledL2 focusing on clarity of performance results, repository hygiene, enhanced prefetch analytics, and CI reliability. These changes improve user-facing readability of performance tests, maintain a clean repository, provide richer memory-access metrics, and strengthen build stability with submodule checks and modularization. No major customer-facing bugs detected; included a minor edge-case fix for prefetch option default values as part of the refactor. Technologies demonstrated include environment-script alignment, Git hygiene practices, vectorized metric collection, and CI workflow enhancements. Business value: clearer decision data, cleaner codebase, better prefetch visibility, and more reliable deployments.
November 2025 monthly summary for OpenXiangShan/CoupledL2 focusing on delivery of PrefetchReqBuffer Timing and Path Length Optimization to improve prefetching performance and timing closure in the L2 cache subsystem. The work refactors handling of request signals to reduce the critical path affected by clock gating, improving throughput and reliability. Key outcomes include improved signal propagation timing, reduced long-path risk, and enhanced maintainability of the prefetch logic.
For 2025-10, OpenXiangShan/CoupledL2 delivered enhanced prefetch observability and dynamic performance tuning, strengthening data-driven decisions and overall efficiency of the L2 prefetch pipeline. The work improves metric accuracy, enables finer control of prefetch behavior, and lays groundwork for future performance optimizations across targets such as Berti.
May 2025 – OpenXiangShan/CoupledL2 delivered a performance-focused feature to tune BestOffsetPrefetch latency in response to CHI version changes. By increasing the default and maximum latency values for the data queue, the feature improves throughput and tail latency stability when CHI characteristics shift. The change is committed as 09ed27ede9bd56a1f59d5100d84f802644eb4bac with the message 'perf(BOP): use large fixed latency to adapt CHI version (#413)'. Overall impact: smoother data-path performance, greater stability, and better scalability under CHI updates. Technologies/skills demonstrated: latency tuning, performance engineering, low-level data-path optimization, and CHI protocol awareness.
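The idea of a large fixed latency on a data queue can be illustrated with a small behavioral model. This is only a sketch under assumptions: the class name `FixedLatencyQueue`, its parameters, and the cycle-based `tick` interface are hypothetical and are not taken from the CoupledL2 sources, which are written in Chisel.

```python
from collections import deque


class FixedLatencyQueue:
    """Hypothetical model: each entry becomes visible a fixed number
    of cycles after it was enqueued."""

    def __init__(self, latency, max_latency):
        # The configured latency must fit within the supported maximum.
        if not 0 <= latency <= max_latency:
            raise ValueError("latency must be within [0, max_latency]")
        self.latency = latency
        self._entries = deque()  # (ready_cycle, item), in enqueue order
        self._cycle = 0

    def enqueue(self, item):
        # Record the cycle at which this item may leave the queue.
        self._entries.append((self._cycle + self.latency, item))

    def tick(self):
        """Advance one cycle and return the items that became ready."""
        self._cycle += 1
        ready = []
        while self._entries and self._entries[0][0] <= self._cycle:
            ready.append(self._entries.popleft()[1])
        return ready
```

In this model, raising `latency` simply delays when queued entries are released, which is one way to picture why a larger fixed value can tolerate a slower interconnect.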
April 2025 focused on memory-subsystem enhancements and reliability improvements across two OpenXiangShan repositories. The work added targeted features for testing memory access patterns and fixed critical memory handling bugs, providing business value through stronger test fidelity, stability, and reduced production risk.
March 2025 performance highlights across OpenXiangShan projects. Implemented targeted features and critical fixes to boost stability, reliability, and cross-repo consistency. Key outcomes include a type-safety and unsigned-semantics fix in MSHRCtl.scala, a robust artifact-clean step for multi-config builds, the introduction of PBMT-based validation for AMO and misaligned memory accesses, and alignment of the NEMU reference with the RISC-V interpreter build configurations. These efforts reduce runtime errors, improve memory-operation correctness, and streamline build and validation pipelines across CoupledL2, NEMU, and ready-to-run, delivering clearer baselines and faster, more predictable releases.
January 2025: OpenXiangShan/CoupledL2 delivered foundational enhancements to the unified prefetching subsystem, improving memory safety, predictability, and per-core configurability. The work focused on consolidating prefetching improvements, validating physical memory ranges for vaddr prefetching, refactoring BOP logic to rely on tlbcmd.read, and introducing PrefetchCtrlFromCore for fine-grained control of prefetch components at the core level. These changes reduce risk, improve cache efficiency, and enable more dynamic performance tuning across workloads.
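As an illustration of validating a prefetch target against known physical memory ranges, combined with a per-core enable switch, a minimal behavioral sketch might look like the following. The names `AddrRange` and `should_issue_prefetch`, and the example range, are hypothetical and do not reflect the actual CoupledL2 or PrefetchCtrlFromCore interfaces.

```python
from typing import Iterable, NamedTuple


class AddrRange(NamedTuple):
    """Hypothetical half-open physical address range [base, base + size)."""
    base: int
    size: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def should_issue_prefetch(paddr: int,
                          valid_ranges: Iterable[AddrRange],
                          enabled: bool = True) -> bool:
    """Issue a prefetch only if the prefetcher is enabled for this core
    and the translated address falls inside a valid memory range."""
    return enabled and any(r.contains(paddr) for r in valid_ranges)
```

For example, with a single assumed range `AddrRange(0x8000_0000, 0x1000)`, an address of `0x8000_0800` would be accepted, while `0x8000_1000` (one past the end) would be dropped.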
December 2024 Monthly Summary (OpenXiangShan project): Key features delivered and major bugs fixed, with emphasis on business value and technical excellence.
Key features delivered:
- Performance Testing CI Workflow Enhancements for XiangShan: configurable server selection, extended run-time, and resilience via continue-on-error. Ensured multi-server execution uses the correct server list via environment variables.
Major bugs fixed:
- PMP load access violation handling in BestOffsetPrefetch (CoupledL2): added PMP address-filter checks for load operations to prevent incorrect behavior in the virtual buffer of physical memory, improving memory protection and system stability.
Overall impact and accomplishments:
- Strengthened memory protection and memory subsystem stability, reducing risk of PMP-related violations during prefetch operations.
- Expanded performance testing coverage and reliability, enabling longer runs and more representative measurements across multiple servers.
- Improved CI reliability and feedback loops, accelerating performance tuning and release readiness.
Technologies/skills demonstrated:
- Memory protection and prefetcher safety (PMP, BestOffsetPrefetch)
- CI automation and performance testing pipelines
- Environment-driven configuration for multi-server orchestration
- Multi-server orchestration and robust error handling in CI workflows
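The environment-driven server selection and continue-on-error behavior described above can be sketched in a few lines. This is an assumption-laden illustration, not the project's workflow: the variable name `PERF_SERVERS`, the default host, and both helper functions are hypothetical.

```python
import os


def parse_server_list(env=None, var="PERF_SERVERS", default=("localhost",)):
    """Read a comma-separated server list from an environment mapping.

    `PERF_SERVERS` is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    raw = env.get(var, "")
    servers = [s.strip() for s in raw.split(",") if s.strip()]
    return servers or list(default)


def run_on_servers(servers, job):
    """Run `job(server)` on every server, recording failures instead of
    aborting, in the spirit of a continue-on-error CI step."""
    results = {}
    for server in servers:
        try:
            results[server] = ("ok", job(server))
        except Exception as exc:  # keep going so other servers still run
            results[server] = ("failed", str(exc))
    return results
```

Passing the environment as a plain mapping keeps the parsing step easy to exercise without touching the real process environment.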
November 2024 — OpenXiangShan/XiangShan: major uncache subsystem overhaul and introduction of non-coherent outstanding operations to strengthen memory path reliability and performance. Replaced queue-based uncache with a buffer-based design via a unified LoadQueueUncache, consolidating MMIO and NC paths, improving address matching, data forwarding, and overall correctness and throughput. Introduced non-coherent outstanding operation support and removed MMIO store outstanding handling; refactored outstanding management and forwarding logic. Enabled cross-path sharing of LQUncache (MMIO/NC) with NC data writeback to ldu1-2, and performed targeted style cleanup and standardization. These changes reduce critical-path bugs, provide a solid foundation for future coherence work, and increase memory path reliability and throughput across the XiangShan memory subsystem.
Month: 2024-10 — Focused on reliability, observability, and correctness of the L2 prefetch and address handling in OpenXiangShan/CoupledL2, with targeted fixes and enhanced debugging to enable safer production deployments and faster diagnosis.
September 2024: Delivered core memory subsystem and ISA-simulation enhancements across OpenXiangShan/XiangShan and OpenXiangShan/riscv-isa-sim. Implemented Weak Memory Ordering (WMO) support for Non-Cacheable (NC) memory, including an NCBuffer for NC load requests and refined StoreQueue handling for NC stores. Added Svpbmt support in the RISC-V ISA simulator to broaden memory-operation coverage. These workstreams improve memory ordering correctness, verification coverage, and readiness for future features, delivering business value through more accurate hardware behavior models and faster validation cycles.
