Apache Spark Installation on EC2
Installation Steps
1. Update System
# Update system packages
# Follow the instructions at https://docs.aws.amazon.com/linux/al2023/ug/updating.html to find the latest release version
sudo dnf --releasever=2023.8.20250818 update
2. Install Java and Scala
# Install Java
sudo dnf install java-17-amazon-corretto-headless
# Install Scala via Coursier (this launcher build targets x86_64 instances)
curl -fL https://github.com/coursier/coursier/releases/latest/download/cs-x86_64-pc-linux.gz | gzip -d > cs && chmod +x cs && ./cs setup
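Before moving on, it can help to confirm that both tools are reachable from the shell. The sketch below uses a hypothetical helper, `check_tool` (the name is made up for this example, not part of any package), that reports whether a command is on `PATH`. It assumes `cs setup` added its bin directory to `PATH` through the shell profile, which usually only takes effect in a new login shell.

```shell
# check_tool is a hypothetical helper, not part of Java, Scala, or Coursier.
# It reports whether a command is on PATH and returns non-zero if it is missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1 -> $(command -v "$1")"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the tools installed in this step; "|| true" keeps the loop going
# so every tool is reported even if one is missing.
for tool in java scala; do
  check_tool "$tool" || true
done
```

If `scala` is reported as missing right after installation, opening a new shell session (or re-sourcing the profile) is the first thing to try.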
3. Install pip
# python3 is already installed on Amazon Linux 2023, so only pip needs to be added
sudo dnf install python3-pip
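A quick way to confirm the result is to ask the interpreter itself whether pip is importable. This is a minimal sketch; `PIP_STATUS` is a variable name chosen for this example only.

```shell
# PIP_STATUS is a hypothetical variable used only in this sketch.
# "python3 -m pip" runs pip through the interpreter, so the check covers
# the same Python that later tools will use.
if python3 -m pip --version >/dev/null 2>&1; then
  PIP_STATUS="available"
else
  PIP_STATUS="missing"
fi
echo "pip for python3: ${PIP_STATUS}"
```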
4. Install Spark
# Create directory for Spark
sudo mkdir -p /opt/spark
cd /opt/spark
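# Download Spark (adjust the version as needed; dlcdn.apache.org only hosts current releases,
# older versions are kept under https://archive.apache.org/dist/spark/)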
sudo wget https://dlcdn.apache.org/spark/spark-4.0.1/spark-4.0.1-bin-hadoop3.tgz
# Unpack Spark
sudo tar -xzf spark-4.0.1-bin-hadoop3.tgz
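Since the version string appears several times in the commands above, one option is to derive the file name, download URL, and install path from a single variable. The sketch below only builds and prints those strings; it does not download anything. The names `SPARK_VERSION`, `HADOOP_PROFILE`, and `SPARK_DIST` are assumptions made for this example. `SPARK_HOME` is the variable name conventionally used for the Spark install directory, set here on the assumption that the archive unpacks into a directory named after the tarball.

```shell
# Hypothetical variable names; the values mirror the commands in this step.
SPARK_VERSION="4.0.1"
HADOOP_PROFILE="hadoop3"

# Name shared by the tarball and the directory it unpacks into.
SPARK_DIST="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}"

# Download URL and resulting install location under /opt/spark.
SPARK_URL="https://dlcdn.apache.org/spark/spark-${SPARK_VERSION}/${SPARK_DIST}.tgz"
SPARK_HOME="/opt/spark/${SPARK_DIST}"

echo "Download URL: ${SPARK_URL}"
echo "Unpacked to:  ${SPARK_HOME}"
```

With these set, the download and unpack commands can use `"${SPARK_URL}"` and `"${SPARK_DIST}.tgz"` in place of the hard-coded names, and running `"${SPARK_HOME}/bin/spark-submit" --version` afterwards should confirm that the unpacked distribution starts.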