
Andrew Richey engineered robust data automation and integration pipelines for the NYCPlanning/data-engineering repository, focusing on scalable data delivery, governance, and reliability. He designed modular connectors and refactored ingestion workflows to support multi-destination distribution, versioned datasets, and reproducible pipelines, leveraging Python, SQL, and Pydantic for data modeling and validation. His work included CLI tooling for Excel automation, geospatial data packaging, and XML-to-Pydantic model generation, all underpinned by CI/CD and configuration-driven deployment. By modernizing metadata handling, error logging, and storage abstraction, Andrew delivered maintainable, testable systems that improved data quality, operational efficiency, and deployment safety across complex urban datasets.

October 2025 Highlights for NYCPlanning/data-engineering: Delivered cross-source version management tooling, a refactored ingestion pipeline for unified storage paths, and XML-to-Pydantic model tooling. Fixed critical reliability issues in sitemap loading, dataset version fetch, and OpenData publishing race conditions, enhancing data freshness, consistency, and publication stability. Technologies demonstrated include Python, pandas, cloudpathlib, pydantic, and integration tests.
September 2025: Focused on stabilizing data pipelines and hardening CI/CD for NYCPlanning/data-engineering. Delivered dataset ingestion stability with version pinning modernization, introduced a temporary path-handling fix for edm.publishing to prevent misinterpreting directories as file extensions, strengthened the distribution workflow with explicit failure handling and CI defaults, and modernized sitemap processing with Pydantic-based validation and JSON configuration with versioned filenames. These efforts reduced manual corrections, improved build reliability, and increased reproducibility across environments. Technologies demonstrated include Python refactoring, data validation with Pydantic, JSON configuration, and CI/CD parameterization.
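As an illustration of the path-handling issue mentioned for September 2025, the sketch below shows how a dotted directory name can be mistaken for a file with an extension. The helper `looks_like_file` and its list of known extensions are hypothetical and are not taken from the repository; only the behavior of `pathlib` is standard.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list; the real project may use a different rule entirely.
KNOWN_EXTENSIONS = frozenset({".csv", ".json", ".zip"})


def looks_like_file(path: str) -> bool:
    """Treat a path as a file only when its suffix is a known extension."""
    return PurePosixPath(path).suffix.lower() in KNOWN_EXTENSIONS


# A naive "has a suffix" check misreads the dotted directory name:
naive_suffix = PurePosixPath("edm.publishing").suffix  # ".publishing"

# The allow-list check does not:
is_dir_file = looks_like_file("edm.publishing")            # False
is_csv_file = looks_like_file("edm.publishing/data.csv")   # True
```

Checking against an explicit set of extensions is one possible workaround; the summary describes the actual fix only as temporary and does not specify its approach.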
For August 2025, delivered significant data-engineering improvements for the NYCPlanning data ecosystem, focusing on data integration, ingestion reliability, and publishing modernization across SEDAT, NYS, PLUTO, product distribution, and Socrata. The work enhances data completeness, quality, and deployment safety, enabling richer planning analytics and faster, safer data product releases.
July 2025 focused on delivering two high-impact data-engineering capabilities for NYC Planning: analytics enablement for critical infrastructure segments and reproducible data pipelines through versioned datasets. The work improves reporting, operational analytics, and data governance for NYC datasets while reinforcing the reliability and scalability of data operations.
April 2025 monthly summary for NYCPlanning/data-engineering: Delivered a major overhaul of the Data Pipeline Build and Publish lifecycle. Introduced BuildsConnector and integrated it into the lifecycle management of data products, refactoring the publish flow. Modernized the build output folder structure and added new CLI commands for planning and building data pipelines. Updated recipe configurations to support stage-specific settings and environment variable resolution for build notes. Major bugs fixed: none reported this month. Overall impact: established a scalable, reproducible deployment workflow that reduces manual steps, lowers risk of deploy-time errors, and accelerates data product releases. Demonstrated technologies/skills: architectural refactor, modular component design (BuildsConnector), CLI tooling, configuration-driven deployment, stage-aware settings, and environment variable resolution.
March 2025 monthly summary for NYCPlanning/data-engineering focused on reliability, data quality, and governance enhancements that enable faster, more trustworthy planning analytics. Delivered robust geospatial data quality improvements for COLUP/PLUTO, and expanded JSON-based ingestion for Population Fact Finder with metadata/versioning, CLI modernization, and QA/QC reporting to strengthen data governance across ACS and Decennial datasets. These efforts reduced ingestion errors, improved nightly pipeline stability, and provided clearer data quality signals for downstream analytics.
February 2025 performance summary for NYCPlanning/data-engineering: Focused on delivering automated data integration and robust geospatial packaging capabilities that drive data accuracy and operational efficiency.
Key features delivered:
- Automated Excel data merge utility: automates updating a target Excel from a source Excel using keyed row matching; includes a CLI to apply changes and tests for missing/duplicate keys. Implemented in commit 0c3f2d56a0efe9a4c0df258b14ba4fe90aa6eead (Enable Excel cross-file keyed updates, #1441). Business value: reduces manual reconciliation and ensures data consistency across Excel-based workflows.
- Multilayer shapefile packaging and error logging: enables assembling/distributing data from multilayer shapefiles with per-layer processing, improved error logging, and clearer argument names. Implemented in commit 507f3f74bf9f3eff795c10295d71cf7743fa157c (Enable Assembling/Distributing from a Multilayer shapefile, #1435). Business value: more reliable geospatial data packaging and easier troubleshooting.
Major bugs fixed:
- No major bugs fixed this month; the focus was on feature delivery and reliability improvements through enhanced tests and error logging.
Overall impact and accomplishments:
- Strengthened data automation and geospatial data distribution capabilities; improved data integrity, pipeline efficiency, and maintainability; delivered via verifiable commits and test coverage.
Technologies/skills demonstrated:
- Python tooling for data processing, CLI design, test-driven development, geospatial data handling (multilayer shapefiles), and robust error logging.
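The keyed row matching described for the Excel merge utility can be sketched roughly as follows. This is a minimal illustration on plain row dictionaries, not the code from commit 0c3f2d5: the function name `apply_keyed_updates` is hypothetical, reading and writing the workbooks is left out, and the choice to raise on duplicate or unmatched keys is an assumption based on the summary's mention of tests for missing/duplicate keys.

```python
from collections import Counter


def apply_keyed_updates(target_rows, source_rows, key):
    """Return a copy of target_rows with values overwritten from source_rows.

    Rows are matched on the `key` column. Hypothetical helper; the error
    handling below is an assumption, not the repository's behavior.
    """
    # Reject duplicate keys on either side, since matching would be ambiguous.
    for name, rows in (("target", target_rows), ("source", source_rows)):
        counts = Counter(row[key] for row in rows)
        dupes = sorted(k for k, n in counts.items() if n > 1)
        if dupes:
            raise ValueError(f"duplicate keys in {name}: {dupes}")

    # Reject source rows that have no matching row in the target.
    target_keys = {row[key] for row in target_rows}
    missing = sorted(row[key] for row in source_rows if row[key] not in target_keys)
    if missing:
        raise KeyError(f"source keys missing from target: {missing}")

    # Overlay source values onto the matching target rows.
    updates = {row[key]: row for row in source_rows}
    return [{**row, **updates.get(row[key], {})} for row in target_rows]


target = [{"id": "a", "value": 1}, {"id": "b", "value": 2}]
source = [{"id": "b", "value": 20}]
merged = apply_keyed_updates(target, source, "id")
```

Returning a new list rather than mutating the target keeps the function easy to test; a CLI wrapper would then write the merged rows back to the target workbook.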
January 2025: Implemented Dynamic Connector Dispatcher for multi-destination dataset distribution in NYCPlanning/data-engineering. Added SFTP as a new distribution connector, refactored packaging/distribution scripts for generic support, and updated workflows and internal file organization to improve maintainability and scalability. This work establishes groundwork for additional connectors and broader data delivery automation, delivering measurable business value through expanded delivery options and operational efficiencies.
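One common way to build a dispatcher for multi-destination distribution is a registry that maps a destination type to a connector function. The sketch below assumes that design; the names `register`, `distribute`, `push_sftp`, and `push_s3` and the shape of the destination dictionary are hypothetical, and the connector bodies are placeholders that only build a target URI instead of transferring anything.

```python
from typing import Callable, Dict

Connector = Callable[[str, dict], str]

# Registry of connector functions keyed by destination type (hypothetical).
_CONNECTORS: Dict[str, Connector] = {}


def register(name: str) -> Callable[[Connector], Connector]:
    """Decorator that adds a connector function to the registry."""
    def decorator(fn: Connector) -> Connector:
        _CONNECTORS[name] = fn
        return fn
    return decorator


@register("sftp")
def push_sftp(dataset_path: str, options: dict) -> str:
    # Placeholder: a real connector would open an SFTP session and upload.
    return f"sftp://{options['host']}/{dataset_path}"


@register("s3")
def push_s3(dataset_path: str, options: dict) -> str:
    # Placeholder: a real connector would upload to the bucket.
    return f"s3://{options['bucket']}/{dataset_path}"


def distribute(dataset_path: str, destination: dict) -> str:
    """Look up the connector for destination['type'] and invoke it."""
    kind = destination["type"]
    try:
        connector = _CONNECTORS[kind]
    except KeyError:
        raise ValueError(f"unknown destination type: {kind!r}") from None
    return connector(dataset_path, destination)


result = distribute("dataset.zip", {"type": "sftp", "host": "example.org"})
```

With this layout, adding a further destination only requires registering one more function, which matches the summary's stated goal of laying groundwork for additional connectors.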
December 2024: Strengthened data publication and governance for NYCPlanning/data-engineering. Delivered a cohesive data dictionary system and automated distribution pipeline, enabling reliable publication of datasets to Socrata and AWS S3, improved data discoverability via Excel data dictionaries, and consolidated metadata handling to reduce maintenance and confusion.
November 2024 - NYCPlanning/data-engineering: Reliability and configuration enhancements to the NYCOC Checkbook Script delivered more robust data retrieval, secure configuration via environment variables, and improved debugging. Groundwork was laid to address an invalid date range bug.