
Over thirteen months, Frank Van Krieken engineered and maintained the NYCPlanning/data-engineering repository, delivering robust data pipelines and scalable ETL workflows for New York City planning datasets. He modernized ingestion systems, unified data models, and enhanced data quality controls using Python, SQL, and dbt. Frank implemented automated CI/CD with GitHub Actions, improved geospatial data handling, and introduced modular connectors for sources like ArcGIS and Socrata. His work addressed data validation, versioning, and export reliability, supporting analytics and governance needs. By refactoring pipelines and strengthening test coverage, Frank ensured maintainable, high-quality data operations that accelerated onboarding and improved downstream data trust.

November 2025 monthly summary for NYCPlanning/data-engineering: Delivered a cross-product dbt_utils upgrade, with selective test exclusions to stabilize CI, and removed the dbt-checkpoint pre-commit hook to simplify local development. Fixed gft 25v3 failing tests (commit ab44ae7dd20a68a84f757adba2bb9cdb1ae5db54), restoring CI reliability across products. Result: more reliable data pipelines, faster feedback, and a maintainable upgrade path for dbt utilities across the portfolio.
October 2025 monthly summary for NYCPlanning/data-engineering: Delivered a focused set of data-pipeline enhancements that improve cross-source data accuracy, maintainability, and pipeline reliability. Key work included deduplication and cross-source matching for charter schools, unification of the LION data product pipeline, and modernization of the ingest system, complemented by enhanced data QA and a Geodatabase export fix. These efforts reduce data mismatches between DOE LCGMS and NYSED Active Institutions, enable easier onboarding of new data sources, and strengthen validation parity between development and production environments. The month also introduced a foundational experimental structure for entity data management and tightened CI/CD practices to improve reliability and deployment consistency.
September 2025 monthly summary for NYCPlanning/data-engineering. Delivered a major modernization of the data ingestion pipeline across PLUTO, CAMA, and ArcGIS sources, enabling generic connectors, ArcGIS feature servers, improved geocoding, and streamlined CI workflows. Implemented PLUTO data quality updates for version 25v3 and expanded data export/reporting capabilities to include budget_request_title with clearer export scripts. Completed FacDB data processing refinements, including QC metrics and DOE mapping alignment. Strengthened infrastructure through dependency updates and refined ingestion scheduling for stability and security. Fixed a critical issue in Landmarks data retrieval with more robust string matching. These changes collectively improved data freshness, accuracy, and reliability, enabling faster, more trustworthy planning decisions and streamlined reporting. Technologies demonstrated include Python dependency management, CI automation, ArcGIS integration, data quality controls, and robust export tooling.
August 2025 monthly summary for NYCPlanning/data-engineering, focusing on key features delivered, major improvements, and business impact. This period delivered reliability, ingestion, and QA enhancements across the data-engineering stack, with emphasis on CI stability, compatibility (GDAL/pyarrow), robust ingestion workflows, and thorough documentation. Key outcomes include more reliable test coverage, easier data ingestion, improved data validation, and better onboarding support for the Lion product.
July 2025 Monthly Summary for NYCPlanning/data-engineering focusing on delivering scalable CI/CD improvements, data quality gains across Lion, NOW, and PLUTO datasets, and robust data ingestion modernization. Business value delivered includes more reliable multi-product builds, accurate centerline interpolation, improved data integrity, and stronger validation tests, enabling faster and more trustworthy analytics for planning and policy decisions. Tech stack highlights include GitHub Actions for multi-product builds with dev buckets, SQL data modeling and refactors, Python dependency management, and enhanced GitHub API error handling.
June 2025 monthly summary for NYCPlanning/data-engineering. Delivered a focused set of data pipeline enhancements and governance updates that improve public data accessibility, data correctness, and deployment reliability across core datasets (Zoning, PLUTO, LION). These changes support faster, higher-quality releases with clearer ownership and auditability.
May 2025: NYCPlanning/data-engineering monthly summary highlighting key features delivered, major fixes, impact, and technologies demonstrated.
Key features delivered:
- Center of curvature and curve attribute support for LION (refactored SQL, extended LION model); commit 1cfc8662336427d2f3adca8b978af3e31cd25f85
- SEDAT data ingestion templates and split_election_district_flag calculation; commit 22eb2c93ee707c9360dbfbb5dc5b8d2a299d3f15
- NYPD service area data integration (NYPD beat areas ingestion, link to centerlines within LION); commit 438c5cd8caf40435e9fd01f9415dad1a59192c39
- Community Board Budget Requests (CBBR) data pipeline enhancements (new source data, templates, refactor for input changes and geocoding); commit 1a4f537001ca86b78bcefebf654dfbabb7b7ae7f
- LION PoC data export and validation enhancements (row-length validation scripts, improved data transformations, loading utility improvements); commit 94710552ba40e4fcf9511a36984fdd23931c2d10
Major bugs fixed:
- Ingestion model filename to filepath alignment in edm-publishing ingest model; commit f6d6983b9e0fff6e6f89675fa8b30424c8783a93
- Library/archive version check refactor via private _is_library method to ensure correct validation; commit d91dd5b73e483db0146d88339419f9badd42282a
- CEQR DEP workflow path fix (correct working directory and clarified workflow name); commit 3f05df42e279a4267b905941dc8d818ba09d3e4c
- Data file type handling and CRS for archived datasets (pin to pg_dump for missing CRS; unpin some CEQR source file types); commits 87c763a3684e9287ab576d27d7c6e3c82589bbf4, 34b2d969e4a7b179c559b594e23d9aa4565aa0bc
- Dependencies, URL updates, and scraping improvements (Python dependencies, SQL macros; Geosupport URL updates; ingest template URLs and container usage); commits db3b2f6feb66e8516c500c9a38fbccc99898d77b, 60be8d8cd1789849d3bb22edab5c2fbd3786da4f, 3cdfc9dca3f7e2340da1e3bc90285321a8539439
Overall impact and accomplishments:
- Expanded data sources and robust validation improved data availability, quality, and governance. Production readiness increased through enhanced exports, validation, and dataset preservation, enabling reliable analytics for curved-road, LION, and related workflows. Reduced maintenance by consolidating checks and aligning ingestion-model references with connectors.
Technologies/skills demonstrated:
- SQL refactors and data-model evolution for LION curves; Python-based ingestion templates and workflow fixes; data QC/validation enhancements; Geosupport URL management; dependency management and containerized pipelines.
April 2025 monthly summary for NYCPlanning/data-engineering: Delivered core Lion ETL enhancements, IPIS data maintenance, and general pipeline reliability improvements that collectively raise data quality, governance, and downstream analytics readiness. The month focused on delivering feature-rich ETL modeling, robust ingestion capabilities, and rigorous quality controls across the data stack, aligned with business goals of accurate geographic reporting, faster build cycles, and scalable data operations.
March 2025: Delivered major build/CI modernization, ingest/catalog enhancements, and data hygiene improvements across NYCPlanning/data-engineering. This suite of changes reduces build times, strengthens data freshness and reliability, and enhances observability and governance, delivering tangible business value for planning analytics and downstream consumers. The work demonstrates strong Python/Docker CI, data ingestion engineering, and tooling automation capabilities.
February 2025 monthly summary for NYCPlanning/data-engineering: Delivered targeted data platform improvements that increase reliability, maintainability, and publishing speed for NYC planning datasets. The work spans ingestion modernization, metadata publishing enhancements, quality assurance for PLUTO data, a deterministic fix for CAMA row_numbers, and proactive dependency and environment upgrades. These changes reduce data-related incidents, improve release readiness, and support scalable data operations across the team.
January 2025 monthly summary for NYCPlanning/data-engineering focused on delivering business value through data platform enhancements including CDBG analysis, data ingestion reliability, format modernization, and CI/CD efficiency. Key outcomes include new borough-level CDBG outputs, improved data traceability, Parquet adoption, and streamlined eligibility workloads.
December 2024 (NYCPlanning/data-engineering) — Delivered a set of data-engineering pipeline enhancements focused on geospatial data availability, ingestion lifecycle efficiency, versioning governance, security, and CI/CD reliability. Key features delivered include geospatial ingestion templates with quarterly ingestion integration; ingestion lifecycle modernization for zoning tax lots with an ingest/library CLI target and archival workflow improvements; COLP and KPDB ingestion enhancements with a first_of_month versioning strategy and recipe-driven loading; Socrata integration with environment-based authentication and updated credentials; CI/CD and Docker image testing enhancements to reduce build/test failures and speed delivery. Additional improvements include boolean representation standardization and data validation, with a development database data correction to prevent QA issues.
November 2024 performance highlights for NYCPlanning/data-engineering: Delivered data ingestion reliability, data quality improvements, and feature integrations across the data pipeline. Implemented base dataset alignment for key sources, automated maintenance tasks, and targeted bug fixes to reduce downstream data issues and accelerate onboarding of new data sources. The work enhances data trust for analytics and supports faster feature delivery and governance across the DevDB and ingest pipelines.