This repo contains the steps and files I used to build two EMR clusters with Spark and Hive apps (Spark only on standard EMR, Hive on EMR Serverless). See step 9 for issues I encountered that you should look out for if ...
This guide walks you through submitting a Scala Spark application to EMR that queries 500k job URLs from Common Crawl and saves the results to an S3 bucket in CSV format. Running the application on ...
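
To give a rough idea of the shape of such an application, here is a minimal sketch. It is not the code in this repo. It assumes the public Common Crawl columnar index (Parquet files under `s3://commoncrawl/cc-index/table/cc-main/warc/`, partitioned by `crawl` and `subset`, with `url` and `url_path` columns). The object name `JobUrlsSketch`, the crawl ID `CC-MAIN-2023-50`, the "path contains `job`" filter, and the output path argument are all placeholders I picked for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

// Hypothetical sketch: names, crawl ID, and filter are illustrative assumptions.
object JobUrlsSketch {
  def main(args: Array[String]): Unit = {
    // Output location is passed as the first argument, e.g. a prefix in your own bucket.
    val outputPath = args(0)

    val spark = SparkSession.builder()
      .appName("job-urls-sketch")
      .getOrCreate()

    // Assumed location and schema of the Common Crawl columnar index (Parquet).
    val index = spark.read.parquet("s3://commoncrawl/cc-index/table/cc-main/warc/")

    val jobUrls = index
      // Restrict to one crawl and the "warc" subset so only those partitions are scanned.
      .filter(col("crawl") === "CC-MAIN-2023-50" && col("subset") === "warc")
      // Crude heuristic for "job" pages: the URL path mentions "job".
      .filter(lower(col("url_path")).contains("job"))
      .select("url")
      // Cap the result at 500k rows.
      .limit(500000)

    // Spark writes a directory of part files under outputPath, not a single CSV file.
    jobUrls.write
      .option("header", "true")
      .mode("overwrite")
      .csv(outputPath)

    spark.stop()
  }
}
```

Filtering on the partition columns first keeps the scan limited to a single crawl, which matters for both runtime and cost. The `s3://` scheme assumes the job runs on EMR, where EMRFS handles it. Outside EMR you would likely need `s3a://` and the Hadoop AWS connector instead.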