Posts

Webscraping Starter Kit

I often want to scrape a website, but I don’t know which packages will be required to scrape a given website. Some websites require a headless browser for some part of the webscraping process, often to acquire a cookie, token, or header that is rendered by JavaScript or network requests that are …

Exploring ETL Options for …

I have been scraping millions of court records for the past year. These records are scraped and saved as individual JSON files in Amazon S3, a cheap and durable storage solution on AWS. These records are then catalogued in AWS Glue so analysts and data scientists can analyze the data using SQL …

Solved - AWS Lambda, …

I recently discovered that a couple of Lambda functions were not failing properly. These Lambda functions were configured to send messages to a queue, which were then relayed to a Slack channel; however, I was finding that data was missing with no clear explanation. Looking upstream, I found that the …