Machine Learning with Apache Spark

Day 1 Schedule

Morning:

Intro to Spark + DataFrames

Lunch: 12 - 1 pm

Afternoon:

Built-in Functions

User Defined Functions (UDFs)

Caching + Partitioning

Day 2 Schedule

Morning:

Data Cleansing & EDA

Linear Regression

Lunch: 12 - 1 pm

Afternoon:

Transformer, Estimator, Pipeline API

MLflow Tracking

MLflow Model Registry

Day 3 Schedule

Morning:

Decision Trees

Model Tuning, Cross-Validation, and Grid Search

MLlib Deployment Options

Lunch: 12 - 1 pm

Afternoon:

XGBoost & 3rd Party Libraries

Pandas UDFs & Koalas

Capstone Project & Course Recap

Course Objectives

RDDs, DataFrames, Datasets

When/where to use Spark and SparkML

Track, version, and deploy models with MLflow

Use Spark to scale the inference or hyperparameter tuning of single-node models

Types of common ML problems and gotchas

Survey

Spark before?

Machine Learning?

Language: Python? Scala?

Introductions

  1. Professional: Name + Responsibilities
  2. Personal: Interests/Fun fact
  3. Expectations?

Let's get started!