Posts
Why Can't I Connect to my …
Problem: I recently received an error while attempting to SSH into an EC2 instance. The error specifically said “ssh: connect to host ec2-XX-XXX-XX-XXX.compute-1.amazonaws.com port 22: Operation timed out.” I checked the usual problem areas (connected to internet gateway and no unhealthy status …
Webscraping Starter Kit
I often want to scrape a website, but I don’t know which packages will be required to scrape it. Some websites require a headless browser for part of the scraping process, often to acquire a cookie, token, or header that is rendered by JavaScript or network requests that are …
Exploring ETL Options for …
I have been scraping millions of court records for the past year. These records are saved as individual JSON files in Amazon S3, a cheap and durable storage solution on AWS. They are then catalogued in AWS Glue so analysts and data scientists can analyze the data using SQL …