
Developed a Python-based web scraping tool for the professor-jon-white/COSC_352_FALL_2025 repository, focused on extracting and organizing tabular data from HTML sources. The solution loads HTML from URLs or local files, parses tables, cleans extracted text, and exports each table to a separate CSV file. Docker was used to containerize the workflow, ensuring reproducibility, while a shell script automated crawling across multiple URLs and output organization. The project emphasized modularity with helper functions for HTML parsing and data cleaning, and delivered a repeatable data extraction pipeline. Work demonstrated practical application of Python scripting, Docker, and CSV handling for data engineering tasks.
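The load, parse, clean, and export steps described above can be illustrated with a minimal sketch. This is not the repository's code: the names `load_html`, `clean_text`, `TableCollector`, `extract_tables`, and `write_tables` are hypothetical, and the standard-library `html.parser` is used here only to keep the example dependency-free (the summary does not say which parsing library the project uses). The sketch assumes well-formed markup with explicitly closed `tr`, `td`, and `th` tags.

```python
import csv
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


def load_html(source: str) -> str:
    """Return HTML text from a URL or a local file path."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            return response.read().decode("utf-8", errors="replace")
    return Path(source).read_text(encoding="utf-8", errors="replace")


def clean_text(text: str) -> str:
    """Drop footnote markers such as [1] and collapse runs of whitespace."""
    text = re.sub(r"\[\d+\]", "", text)
    return " ".join(text.split())


class TableCollector(HTMLParser):
    """Collect each top-level <table> as a list of rows of cell strings.

    Text inside a nested table is folded into the enclosing outer cell.
    """

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None
        self._depth = 0     # current <table> nesting depth

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self._depth = max(0, self._depth - 1)
        elif self._depth == 1 and tag in ("td", "th") and self._cell is not None:
            self._row.append(clean_text("".join(self._cell)))
            self._cell = None
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no cells
                self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html: str) -> list:
    """Parse HTML and return a list of tables (each a list of rows)."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables


def write_tables(tables, out_dir, prefix="table"):
    """Write each table to its own CSV file and return the created paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, rows in enumerate(tables, start=1):
        path = out_dir / f"{prefix}_{index}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        paths.append(path)
    return paths
```

Under these assumptions, a call such as `write_tables(extract_tables(load_html(source)), "output")` would produce one CSV file per table found in `source`; the `output` directory name is likewise illustrative.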
Month: 2025-10 - Delivered a Python-based web scraping tool for tabular data extraction, containerized with Docker for reproducible runs, with a shell script that crawls multiple URLs and organizes the outputs. No major bugs were reported; minor fixes improved parsing robustness and packaging stability.
Month: 2025-09 - COSC_352_FALL_2025: Delivered a Python-based HTML table scraper that exports to CSV. The tool loads HTML from URLs or local files, parses HTML tables, cleans the extracted data, and writes each table to a separate CSV file. Helper utilities handle loading HTML, identifying tables, and preparing clean data; optional Docker containerization is provided along with a requirements file. The workflow was demonstrated on representative pages such as language comparison tables. Delivered across three commits: f00ead83043b049506910e38ea929130da8a7148 ("The first project submission"), 582f785eb26cb9f6f65ae0d0d023ebe7bcc29b86 ("project_2 commit"), and 9e58f3f600780a9910156f225cec55bbe19fa9fa ("just committed all files").
