Keras and Neural Network Fundamentals
MLflow and Spark UDFs
Hyperparameter Tuning with Hyperopt
Horovod: Distributed Model Training
LIME, SHAP & Model Interpretability
CNNs and ImageNet
Transfer Learning
Object Detection
Generative Adversarial Networks (GANs)
Pandas/Spark?
Machine Learning? Deep Learning?
Expectations?
Performs well on complex datasets like images, sequences, and natural language
Scales better as data size increases
Theoretically can learn any shape (universal approximation theorem)
Composing representations of data in a hierarchical manner
High-level Python API to build neural networks
Official high-level API of TensorFlow
Has over 250,000 users
Released by François Chollet in 2015
GPUs are preferred for training due to their computation speed, but data transfer to and from them can be a bottleneck
CPUs are generally acceptable for inference
Input layer
Zero or more hidden layers
Output layer
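A minimal sketch of this structure in Keras (the layer sizes and the 10-feature input are illustrative, not from the slides):

from tensorflow import keras
from tensorflow.keras import layers

# Input layer -> one hidden layer -> output layer
model = keras.Sequential([
    keras.Input(shape=(10,)),             # input layer: 10 features (illustrative)
    layers.Dense(32, activation="relu"),  # hidden layer
    layers.Dense(1),                      # output layer: single regression value
])
model.summary()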
Measure "closeness" between label and prediction
Evaluation metrics:
$Error_{i} = (y_{i} - \hat{y_{i}})$
$SE_{i} = (y_{i} - \hat{y_{i}})^2$
$SSE = \sum_{i=1}^n (y_{i} - \hat{y_{i}})^2$
$MSE = \frac{1}{n}\sum_{i=1}^n (y_{i} - \hat{y_{i}})^2$
$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^n (y_{i} - \hat{y_{i}})^2}$
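A quick sketch of these metrics in NumPy (the label and prediction values are made up for illustration):

import numpy as np

y = np.array([3.0, 5.0, 2.5, 7.0])      # true labels (toy values)
y_hat = np.array([2.5, 5.0, 4.0, 8.0])  # predictions (toy values)

errors = y - y_hat
sse = np.sum(errors ** 2)    # sum of squared errors
mse = np.mean(errors ** 2)   # mean squared error
rmse = np.sqrt(mse)          # root mean squared error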
Calculate gradients to update weights
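A small sketch of gradient descent with TensorFlow's GradientTape, fitting a toy linear model (the data and learning rate are illustrative):

import tensorflow as tf

x = tf.constant([[1.0], [2.0], [3.0]])   # toy inputs
y = tf.constant([[2.0], [4.0], [6.0]])   # toy labels

w = tf.Variable([[0.0]])
b = tf.Variable([0.0])
learning_rate = 0.1

for step in range(100):
    with tf.GradientTape() as tape:
        y_pred = tf.matmul(x, w) + b
        loss = tf.reduce_mean(tf.square(y - y_pred))   # MSE loss
    dw, db = tape.gradient(loss, [w, b])               # gradients via backpropagation
    w.assign_sub(learning_rate * dw)                   # update weights
    b.assign_sub(learning_rate * db)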
Provide non-linearity in our neural networks to learn more complex relationships
Sigmoid
Hyperbolic tangent (tanh)
ReLU
Leaky ReLU
PReLU
ELU
Saturates and kills gradients
Not zero-centered
Zero centered!
BUT, like the sigmoid, its activations saturate
BUT, gradients can still go to zero
$$f(x) = \begin{cases} \alpha x & \text{for } x < 0 \\ x & \text{for } x \geq 0 \end{cases}$$
These functions are not differentiable at 0, so in practice we set the derivative at 0 to 0 or to the average of the left and right derivatives
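A small NumPy sketch of a few of these activations (the alpha value is illustrative):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # saturates for large |x|

def relu(x):
    return np.maximum(0.0, x)             # zero gradient for x < 0

def leaky_relu(x, alpha=0.01):
    return np.where(x < 0, alpha * x, x)  # small slope alpha for x < 0

x = np.linspace(-5, 5, 11)
print(sigmoid(x), relu(x), leaky_relu(x))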
Choosing a proper learning rate can be difficult
Easy to get stuck in local minima
Accelerates SGD: Like pushing a ball down a hill
Take an average of the direction we’ve been heading (current velocity plus the new gradient as acceleration)
Limits oscillating back and forth, gets out of local minima
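One common formulation of the momentum update (this notation is ours, not from the slides):

$$v_{t} = \gamma v_{t-1} + \eta \nabla_{\theta} J(\theta)$$
$$\theta = \theta - v_{t}$$

where $\gamma$ (e.g., 0.9) weights the accumulated velocity and $\eta$ is the learning rate.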
Adaptive Moment Estimation (Adam)
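A sketch of selecting these optimizers in Keras (the model, learning rates, and momentum value are illustrative):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([keras.Input(shape=(10,)), layers.Dense(1)])

sgd = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)  # SGD with momentum
adam = keras.optimizers.Adam(learning_rate=0.001)             # adaptive moment estimation

model.compile(optimizer=adam, loss="mse")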
Which dataset should we use to select hyperparameters? Train? Test?
Split the dataset into three!
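One common way to produce the three splits, sketched with scikit-learn (assumes a feature matrix X and labels y; the 60/20/20 ratios are illustrative):

from sklearn.model_selection import train_test_split

# Carve off the test set first, then split the remainder into train/validation
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)
# Result: 60% train, 20% validation, 20% test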
Created by Alexander Sergeev of Uber, open-sourced in 2017
Simplifies distributed neural network training
Supports TensorFlow, Keras, PyTorch, and Apache MXNet
# Only one line of code change!
optimizer = hvd.DistributedOptimizer(optimizer)
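A slightly fuller sketch of where that line fits in a Horovod + Keras script (following the general pattern in the Horovod documentation; the model, data, and learning rate are placeholders):

import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()  # initialize Horovod across all workers

# Scale the learning rate by the number of workers (common convention)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001 * hvd.size())

# The one-line change: average gradients across workers
optimizer = hvd.DistributedOptimizer(optimizer)

model.compile(optimizer=optimizer, loss="mse")
model.fit(
    X_train, y_train,
    epochs=10,
    # Broadcast initial weights from rank 0 so all workers start identically
    callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
)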
Focus on Local Connectivity (fewer parameters to learn)
Filter/kernel slides across input image (often 3x3)
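A minimal Keras CNN sketch with 3x3 filters sliding over the input (input size, filter count, and class count are illustrative):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                             # e.g. 28x28 grayscale images
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),   # 32 filters, each 3x3, shared across the image
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                     # e.g. 10 classes
])
model.summary()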
Image Kernels Visualization
Classify images in one of 1000 categories
2012 Deep Learning breakthrough with AlexNet: 15.3% top-5 test error rate (next closest was 26.2%)
One of the most widely used architectures for its simplicity
IDEA: Intermediate representations learned for one task may be useful for other related tasks
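A hedged transfer-learning sketch in Keras: reuse a network pre-trained on ImageNet as a frozen feature extractor and train only a new head (the choice of VGG16, the head layers, and the 5-class output are illustrative):

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

# Convolutional base pre-trained on ImageNet, without its classifier head
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the representations learned on ImageNet

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),   # new task-specific output layer
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])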
Estimates generative models
Simultaneously trains two models
G: a generative model that captures the data distribution
D: a discriminative model that predicts the probability of a sample coming from G
Used in generating art, deep fakes, up-scaling graphics, and astronomy research
G takes noise as input, outputs a counterfeit
D takes counterfeits and real values as input, outputs P(counterfeit)
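A very compact sketch of the two networks in Keras, following this description (the layer sizes, the flattened 28x28 output, and the 100-dimensional noise vector are illustrative):

from tensorflow import keras
from tensorflow.keras import layers

noise_dim = 100  # size of the noise vector fed to G (illustrative)

# G: maps noise to a counterfeit sample (here a flattened 28x28 image)
generator = keras.Sequential([
    keras.Input(shape=(noise_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(28 * 28, activation="sigmoid"),
])

# D: takes a real or counterfeit sample, outputs a probability
discriminator = keras.Sequential([
    keras.Input(shape=(28 * 28,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])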
To prevent overfitting...