Region 9 US EPA Data Scraper

This is a piece of software I conceived and helped develop, along with an engineer from Facebook. It lets an activist access the Region 9 EPA website and download all publicly available PDFs for every Superfund toxic waste site covered by EPA Region 9. Rather than manually going from site to site to collect these PDFs, you enter the name of one directory and download the entire cache of records with one click.

You will need to have Python 2.7 installed on your system. Download and unzip this file to the directory of your choice. Open the directory and double-click “main.py”. A command prompt window will open, followed by a small interface. Enter the name of the directory and click “go”.

EPA Superfund Region 9 PDF Scraper V1.2 (beta)

Your system will now scrape the US EPA website and download the entire PDF document Superfund cache for California, Nevada, and Arizona.
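
For the curious, here is a rough sketch of the general idea behind a scraper like this: read a page, pick out every link that ends in .pdf, and save each file into the directory you chose. This is not the code in main.py. It is written for Python 3 rather than 2.7, it uses only the standard library, and every name in it (PdfLinkParser, find_pdf_links, download_pdf, the example.org addresses) is made up for illustration.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class PdfLinkParser(HTMLParser):
    """Collects the targets of <a> tags that point at .pdf files.

    Hypothetical helper for illustration; not taken from main.py.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Turn relative links into absolute ones before checking them.
        absolute = urljoin(self.base_url, href)
        # Look only at the path, so "file.pdf?v=1" still counts as a PDF.
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.pdf_urls:
            self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Return the unique PDF URLs found in an HTML string, in page order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


def download_pdf(url, target_dir):
    """Save one PDF into target_dir and return the path it was written to."""
    os.makedirs(target_dir, exist_ok=True)
    filename = os.path.basename(urlparse(url).path)
    path = os.path.join(target_dir, filename)
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path


# Placeholder page; a real run would fetch each site page with urlopen().
sample_html = (
    '<a href="/docs/a.pdf">Report</a>'
    '<a href="page.html">Not a PDF</a>'
    '<a href="http://example.org/x/B.PDF?v=1">Map</a>'
    '<a href="/docs/a.pdf">Report again</a>'
)
pdf_links = find_pdf_links(sample_html, "http://example.org/site/")
```

In a real run you would loop over pdf_links and call download_pdf(link, your_directory) for each one. A sketch like this does nothing about retries or proxies, so a single failed download would stop it.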

Notes:

This is a very early beta. It may stop with an error partway through.

This version does not work with a proxy.

The estimated download time at average speeds is about eight hours.

There has been some question as to the legality of this scraper, but I remind you that data mining publicly available records is not a crime, yet.

 

 
