Machine Learning Guide


This series aims to teach you the high-level fundamentals of machine learning from A to Z. I'll teach you the basic intuition, algorithms, and math. We'll discuss languages and frameworks, deep learning, and more. Audio may be an inferior medium for the task; but with all our exercise, commute, and chore hours of the day, not having audio supplementary education would be a missed opportunity. And where your other resources provide the machine-learning trees, I'll provide the forest. Additionally, consider me your syllabus: at the end of every episode I'll provide the best-of-the-best resources curated from around the web for you to learn each episode's details.

Episodes

23. Deep NLP 2  

RNN review, bi-directional RNNs, LSTM & GRU cells.

## Resources
- Overview Articles:
** Unreasonable Effectiveness of RNNs (http://karpathy.github.io/2015/05/21/rnn-effectiveness/) `article:easy`
** Deep Learning, NLP, and Representations (http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/) `article:medium`
** Understanding LSTM Networks (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) `article:medium`
- Stanford cs224n: Deep NLP (https://www.youtube.com/playlist?list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6) `course:medium` (replaces cs224d)
- TensorFlow Tutorials (https://www.tensorflow.org/tutorials/word2vec) `tutorial:medium` (start at Word2Vec + next 2 pages)
- The usual DL resources (pick one):
** Deep Learning Book (http://amzn.to/2tXgCiT) (Free HTML version (http://www.deeplearningbook.org/)) `book:hard` comprehensive DL bible; highly mathematical
** Fast.ai (http://course.fast.ai/) `course:medium` practical DL for coders
** Neural Networks and Deep Learning (http://neuralnetworksanddeeplearning.com/) `book:medium` shorter online "book"


## Episode

RNN Review
** Vanilla: when words + running context are sufficient
** POS, NER, stocks, weather
** Bidirectional RNN (BiLSTM): when context from the right helps too
** Encoder/decoder or Seq2seq: when the model should consume the whole input before emitting output in a different form
** Classification, sentiment, translation
** Now with word embeddings

Train: backprop through time (toy forward step sketched below)
** Vanishing/exploding gradient
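
A minimal numpy sketch of one vanilla-RNN step (all sizes and variable names here are illustrative, not from any particular library). The same `W_hh` multiplies the hidden state at every step, which is exactly why gradients vanish or explode through time:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, embed = 16, 8
W_xh = rng.normal(scale=0.1, size=(hidden, embed))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))  # hidden -> hidden (the "loop")
b_h = np.zeros(hidden)

def rnn_step(x_t, h_prev):
    """One time step: new hidden state from current input + previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden)
for x_t in rng.normal(size=(5, embed)):  # a 5-token "sentence" of random embeddings
    h = rnn_step(x_t, h)

# Backprop through time multiplies by W_hh (times tanh') once per step, so
# gradients shrink or blow up exponentially with sequence length.
```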

LSTMs (http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
** ReLU vs Sigmoid vs TanH (Nonlinearities future episode)
** Forget gate layer
** Input gate layer: decides which values to update
** Tanh layer: creates new candidate values
** Output gate layer: decides what parts of the cell state to output (all four gates sketched in code below)
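
A toy numpy sketch of the gate layout above, following colah's formulation (one weight matrix over the concatenated `[h_prev, x_t]`, split four ways; sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: W maps [h_prev, x_t] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)            # forget gate: what to erase from cell state
    i = sigmoid(i)            # input gate: which values to update
    g = np.tanh(g)            # tanh layer: new candidate values
    o = sigmoid(o)            # output gate
    c = f * c_prev + i * g    # new cell state: additive path eases gradient flow
    h = o * np.tanh(c)        # new hidden state
    return h, c

hidden, embed = 8, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + embed))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=embed), h, c, W, b)
```

The additive cell-state update (`f * c_prev + i * g`) is the key difference from the vanilla RNN: gradients can flow through it without repeated matrix multiplication, which mitigates the vanishing-gradient problem.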

22. Deep NLP 1  

Recurrent Neural Networks (RNNs) and Word2Vec.

## Resources
- Overview Articles:
** Unreasonable Effectiveness of RNNs (http://karpathy.github.io/2015/05/21/rnn-effectiveness/) `article:easy`
** Deep Learning, NLP, and Representations (http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/) `article:medium`
** Understanding LSTM Networks (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) `article:medium`
- Stanford cs224n: Deep NLP (https://www.youtube.com/playlist?list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6) `course:medium` (replaces cs224d)
- TensorFlow Tutorials (https://www.tensorflow.org/tutorials/word2vec) `tutorial:medium` (start at Word2Vec + next 2 pages)
- Deep Learning Resources (http://ocdevel.com/podcasts/machine-learning/9)


## Episode

Deep NLP pros
- Language complexity & nuances
** Feature engineering / learning
** Salary = degree × field (a multiplicative interaction), not degree + field
** Multiple layers: pixels => lines => objects
** Multiple layers of language
- One model to rule them all; end-to-end (E2E) models

Sequence vs non-sequence
- DNN = ANN = MLP = Feed Forward
- RNNs for sequence (time series)

RNNs
- Looped hidden layers, learns nuances by combined features
- Carries info through time: language model
- Translation, sentiment, classification, POS, NER, ...
- Seq2seq, encode/decode

Word2Vec (https://www.tensorflow.org/tutorials/word2vec)
- One-hot (sparse) vectors don't capture similarity (plus sparse = computationally expensive)
- Word embeddings
** Euclidean distance for synonyms / similarity, cosine for "projections": king - man + woman ≈ queen (sketched in code after this list)
** t-SNE (t-distributed stochastic neighbor embedding)
- Vector Space Models (VSMs). Learn from context, predictive vs count-based
- Predictive methods (neural probabilistic language models) - Learn model parameters which predict contexts
** Word2vec
** CBOW / Skip-Gram (CBOW predicts the center word from its context; skip-gram predicts context from the center word. CBOW suits smaller datasets, skip-gram larger ones)
** DNN, Softmax hypothesis fn, NCE loss (noise contrastive estimation)
- Count-based methods / Distributional Semantics - (compute the statistics of how often some word co-occurs with its neighbor words in a large text corpus, and then map these count-statistics down to a small, dense vector for each word)
** GloVe
** Linear algebra stuff (PCA, LSA, SVD)
** Pros (?): faster, more accurate, incremental fitting. Cons (?): data hungry, more RAM. More info (http://blog.aylien.com/overview-word-embeddings-history-word2vec-cbow-glove/)
- DNN for POS, NER (or RNNs)
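
To make the embedding geometry concrete, here's a minimal cosine-similarity sketch with hand-made 4-d vectors (purely illustrative; real embeddings are learned and typically 50-300 dimensional):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: how aligned two vectors are, ignoring their lengths."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 4-d "embeddings", made up so the analogy below works out.
vec = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
    "man":   np.array([0.1, 0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
}

# The classic analogy: king - man + woman should land nearest "queen".
target = vec["king"] - vec["man"] + vec["woman"]
best = max(vec, key=lambda w: cosine(vec[w], target))
print(best, cosine(vec[best], target))  # queen, ~0.99
```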

21. Update  

Update on Patreon and resources.

Keep the podcast alive, donate on Patreon (https://www.patreon.com/machinelearningguide)

20. Natural Language Processing 3  

Natural Language Processing classical/shallow algorithms.

## Episode
- Parsing
** Constituents
** Grammar: Context Free Grammars (CFGs), Probabilistic CFGs (PCFGs), Cocke–Younger–Kasami (CYK)
** Dependency Tree: Greedy transition-based parsing (stack/buffer)
** SyntaxNet (English = Parsey McParseface)
- Relationship Extraction
- Question Answering / Textual Entailment (TF-IDF+Cosine Similarity; Parsing; NER)
- Automatic summarization (TF-IDF; TextRank)
- Machine Translation (details here (https://www.youtube.com/watch?v=QuELiw8tbx8&list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6&index=9))
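
A minimal sketch of the TF-IDF + cosine-similarity retrieval step mentioned above for question answering, using scikit-learn (toy passages, illustrative only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Candidate "answer" passages and a question (toy data).
docs = [
    "The capital of France is Paris.",
    "Python is a popular programming language.",
    "The Eiffel Tower is in Paris, France.",
]
question = "What is the capital of France?"

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)   # term-frequency x inverse-doc-frequency
q_vec = vectorizer.transform([question])

# Rank passages by cosine similarity to the question; highest wins.
scores = cosine_similarity(q_vec, doc_vecs)[0]
print(docs[scores.argmax()])
```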

## Resources
- Speech and Language Processing (http://amzn.to/2uZaNyg)
- Stanford NLP YouTube (https://www.youtube.com/playlist?list=PL6397E4B26D00A269)
** Setup youtube-dl (https://github.com/rg3/youtube-dl) and run `youtube-dl -x --audio-format mp3 https://www.youtube.com/playlist?list=PL6397E4B26D00A269`
- NLTK Book (http://www.nltk.org/book)

19. Natural Language Processing 2  

Natural Language Processing classical/shallow algorithms.

## Episode

- Edit distance: Levenshtein distance
- Stemming/lemmatization: Porter Stemmer
- N-grams, Tokens: regex
- Language models
** Machine translation, spelling correction, speech recognition
- Classification / Sentiment Analysis: SVM, Naive bayes
- Information Extraction (POS, NER): Models: MaxEnt, Hidden Markov Models (HMM), Conditional Random Fields (CRF)
- Generative vs Discriminative models
** Generative: HMM, Bayes, LDA
** Discriminative: SVMs, MaxEnt / LogReg, ANNs
** Pros/Cons
** Generative models need less data (NLP datasets tend to be small)
** MaxEnt vs Naive Bayes: Naive Bayes's independence assumption breaks on collocations like "Hong Kong"
- Topic Modeling and keyword extraction: Latent Dirichlet Allocation (LDA)
** LDA ~= LSA ~= LSI: Latent Dirichlet Allocation, Latent Semantic Analysis, Latent Semantic Indexing
- Search / relevance / document-similarity: Bag-of-words, TF-IDF
- Similarity: Jaccard, Cosine, Euclidean
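
Several of the building blocks above (edit distance, Porter stemming, regex tokens, n-grams) in one minimal NLTK sketch (toy sentence; a real pipeline would use a proper tokenizer):

```python
import re
from nltk import edit_distance
from nltk.stem import PorterStemmer
from nltk.util import ngrams

text = "The runners were running quickly"

# Tokens via a simple regex (crude but dependency-free).
tokens = re.findall(r"\w+", text.lower())

# Stemming: suffix-chopping back to a shared root.
stemmer = PorterStemmer()
print([stemmer.stem(t) for t in tokens])  # ['the', 'runner', 'were', 'run', 'quickli']

# Bigrams: the n-gram building block of count-based language models.
print(list(ngrams(tokens, 2)))

# Levenshtein edit distance: insertions + deletions + substitutions.
print(edit_distance("cat", "car"))  # 1
```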

## Resources
- Speech and Language Processing (http://amzn.to/2uZaNyg)
- Stanford NLP YouTube (https://www.youtube.com/playlist?list=PL6397E4B26D00A269)
** Setup youtube-dl (https://github.com/rg3/youtube-dl) and run `youtube-dl -x --audio-format mp3 https://www.youtube.com/playlist?list=PL6397E4B26D00A269`
- NLTK Book (http://www.nltk.org/book)

18. Natural Language Processing 1  

Introduction to Natural Language Processing (NLP) topics.

## Errata
22:21 "cat & car different by one word" should be "different by one letter"

## Episode
Syntax vs Semantics

Parts
- Corpus
- Lexicon
- Morphology
** Lemmas & Stems (reduce morphological variation; lemmatization more sophisticated)
** Tokens
** Stop words
** Edit-distance
** Word sense disambiguation

Syntax / Tasks
- Info Extraction (POS, NER, Relationship extraction)
- Parsing

Goals
- Spell check
- Classification
** Tagging (topic modeling / keyword extraction)
** Sentiment analysis
- Search / relevance, document similarity
- Natural language understanding
** Question answering
** Textual entailment
** Machine Translation (AI-complete)
** NLU vs NLP
- Natural language generation
** Image captioning
** Chatbots
** Automatic summarization
- Won't cover
** Optical character recognition (OCR)
** Speech (TTS, STT, Segmentation, Diarization)

## Resources
- Speech and Language Processing (https://web.stanford.edu/~jurafsky)
- Stanford NLP YouTube (https://www.youtube.com/watch?v=nfoudtpBV68&list=PL6397E4B26D00A269)
** Setup youtube-dl (https://github.com/rg3/youtube-dl) and run `youtube-dl -x --audio-format mp3 https://www.youtube.com/playlist?list=PL6397E4B26D00A269`
- NLTK Book (http://www.nltk.org/book)

17. Checkpoint  

Checkpoint - learn the material offline!

## Resources
45m/d ML
- Coursera (https://www.coursera.org/learn/machine-learning) `course` (last time mentioning)
- Python (http://amzn.to/2mVgtJW) `book`
- TensorFlow (https://www.tensorflow.org/get_started/get_started) `tutorial`
- Deep Learning (http://www.deeplearningbook.org/) `book`
- Go deeper on shallow algos
** Elements of Statistical Learning (https://statweb.stanford.edu/~tibs/ElemStatLearn/) `book`
** Pattern Recognition and Machine Learning (http://www.springer.com/us/book/9780387310732) `book` (Free PDF (https://goo.gl/aX038j)?)

15m/d Math (KhanAcademy) `courses`
- LinAlg (https://www.khanacademy.org/math/linear-algebra)
- Stats (https://www.khanacademy.org/math/statistics-probability)
- Calc (https://www.khanacademy.org/math/calculus-home)

Audio
- CS229 - Machine Learning (https://see.stanford.edu/Course/CS229)
- The Master Algorithm (http://amzn.to/2kLOQjW)
- Mathematical Decision Making (https://goo.gl/V75I49)
- Statistics 1 (https://goo.gl/sIBOjw) 2 (https://goo.gl/b15Aug)
- Calculus 1 (https://goo.gl/fcLP3l) 2 (https://goo.gl/sBpljN) 3 (https://goo.gl/8Hdwuh)

Kaggle.com (https://www.kaggle.com/)

16. Consciousness  

Can AI be conscious?

## Episode

Inspirations for AI
- economic automation
- singularity
- consciousness

Definitions
- cogsci: neuroscience, neuro-X (neurobiology, neurophysiology, computational neuroscience, etc.), psychology, philosophy, AI
** computational neuroscience => perceptron
** Frank Rosenblatt, Warren McCulloch, Walter Pitts - all brain guys (neurobiology, neurophysiology, computational neuroscience respectively)
- intelligence (computation) vs consciousness (soul); intelligence in scale (animals); brain in scale; consciousness in scale?
- perception, self-identity, memory, attention; (self-reflection is just a human-special component)
- awareness (qualia / sentience / subjective experience); modified by attention? (driving, dreams, coma)
- missing: emotions; just a built-in goal reinforcer. Plus we don't know how machines experience reinforcement (floor-is-lava)

Hard vs soft problem
** soft problem = neuroscience
** hard problem = philosophy
** dualism: pineal gland, issue with physical->metaphysical; society of mind / connected intelligences
** maybe definitively non-science, since subjective
** maybe a matter of time; philosophy is pre-science at each juncture; science turns magic into the known (e.g. sickness). Either the hard problem is unscientific (philosophy) or it's around the corner

Emergence (emergent property)

Computational theory of mind
- intelligence & consciousness connected / same
- think: word2vec = understanding?
- consciousness in scale; does this mean every layer has its own consciousness? Panpsychism. I don't know - just concerned with that which does exhibit intelligence
- integrated information theory
- free will; the conscious/awareness center activates after the decision is made, all the information in place beforehand; Westworld

Biological plausibility
- planes, brains
- sans bio-plausibility: functionalism; zombies; Turing test; Searle's Chinese Room

## Resources

Philosophy of Mind: Brains, Consciousness, and Thinking Machines (http://amzn.to/2kQGgk5) `audio`

15. Performance  

Performance evaluation & improvement

## Episode

Performance evaluation

- Performance measures: accuracy, precision, recall, F1/F2 score
- Cross validation: split your data into train, validation, and test sets
** Training set is for training your algorithm
** Validation set is for testing your algorithm's performance; it can be used to inform changes to your model (i.e., hyperparameters)
** Test set is used for your final score; it can't be used to inform changes to your model
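
A minimal scikit-learn sketch of the split-and-score workflow above (dataset and model choices are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Train / validation / test: tune on validation, report once on test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
preds = model.predict(X_val)  # validation feedback may inform hyperparameter changes
print(precision_score(y_val, preds), recall_score(y_val, preds), f1_score(y_val, preds))
```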

Performance improvement

- Modify hyperparameters
- Data: collect more, fill in missing cells, normalize fields
- Regularize: balance overfitting (high variance) against underfitting (high bias)

14. Shallow Algos 3  

Speed run of Anomaly Detection, Recommenders (Content Filtering vs Collaborative Filtering), and Markov Chain Monte Carlo (MCMC)

## Episode
- Anomaly Detection algorithm
- Recommender Systems (Content Filtering, Collaborative Filtering)
- Markov Chains & Monte Carlo
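
As a taste of the Monte Carlo half, here's a minimal Metropolis sampler (one common MCMC flavor, not necessarily the episode's exact algorithm); the target density is a standard normal, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    """Unnormalized density of N(0, 1); MCMC never needs the normalizer."""
    return np.exp(-0.5 * x * x)

x, samples = 0.0, []
for _ in range(10_000):
    proposal = x + rng.normal(scale=1.0)             # random-walk proposal
    if rng.random() < target(proposal) / target(x):  # accept with ratio p(new)/p(old)
        x = proposal
    samples.append(x)

print(np.mean(samples), np.std(samples))  # should approach 0 and 1
```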

## Resources
- Andrew Ng Week 9 (https://www.coursera.org/learn/machine-learning/resources/szFCa)

13. Shallow Algos 2  

Speed run of Support Vector Machines (SVMs) and Naive Bayes Classifier.

## Episode
- Support Vector Machines (SVM)
- Naive Bayes Classifier
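
Both classifiers side by side in a minimal scikit-learn sketch (tiny made-up sentiment data, illustrative only):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

texts = ["great movie", "loved it", "terrible film", "hated it", "great acting", "awful plot"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = positive, 0 = negative

X = CountVectorizer().fit_transform(texts)  # bag-of-words counts

for clf in (MultinomialNB(), LinearSVC()):  # probabilistic vs max-margin
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X))
```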

## Resources
- Andrew Ng Week 7 (https://www.coursera.org/learn/machine-learning/resources/Es9Qo)
- Machine Learning with R (http://amzn.to/2n5fSUF)
- Mathematical Decision Making (https://goo.gl/V75I49)
- Which algo to use?
** Pros/cons table for algos (https://blog.recast.ai/machine-learning-algorithms/2/)
** Decision tree of algos (http://scikit-learn.org/stable/tutorial/machine_learning_map/) `article`

12. Shallow Algos 1  

Speed-run of some shallow algorithms: K Nearest Neighbors (KNN); K-means; Apriori; PCA; Decision Trees

## Episode
KNN (supervised)

Unsupervised
- Clustering -> K-Means
- Association rule learning / Market basket -> Apriori
- Dimensionality Reduction -> PCA

Decision Trees (supervised, classify/regress)
- Random Forests
- Gradient Boost
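
All four algorithm families above in one minimal scikit-learn sketch (Iris dataset; training-set scores shown only to demo the APIs, not as real evaluation):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# KNN (supervised): label a point by majority vote of its k nearest neighbors.
print(KNeighborsClassifier(n_neighbors=5).fit(X, y).score(X, y))

# K-means (unsupervised): group points into k clusters, no labels used.
print(KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)[:10])

# PCA: project 4 features down to 2 while keeping the most variance.
print(PCA(n_components=2).fit_transform(X).shape)  # (150, 2)

# Decision tree (supervised): learned if/else splits over the features.
print(DecisionTreeClassifier(random_state=0).fit(X, y).score(X, y))
```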

## Resources
- Andrew Ng Week 8 (https://www.coursera.org/learn/machine-learning/resources/kGWsY)
- Tour of ML Algos (http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/) `article`
- Decision Tree of algos (http://scikit-learn.org/stable/tutorial/machine_learning_map/) `article`
- Elements of Statistical Learning (https://statweb.stanford.edu/~tibs/ElemStatLearn/) `book`
- Machine Learning with R (http://amzn.to/2n5fSUF)
- Pattern Recognition and Machine Learning (http://www.springer.com/us/book/9780387310732) `book`
** Free PDF (https://goo.gl/aX038j)?

11. Checkpoint  

Checkpoint - start learning the material offline!

## Resources
45m/d ML
- Coursera (https://www.coursera.org/learn/machine-learning)
- Deep Learning (http://www.deeplearningbook.org/)
- Python (http://amzn.to/2mVgtJW)
- TensorFlow (https://www.tensorflow.org/get_started/get_started)

15m/d Math (KhanAcademy)
- LinAlg (https://www.khanacademy.org/math/linear-algebra)
- Stats (https://www.khanacademy.org/math/statistics-probability)
- Calc (https://www.khanacademy.org/math/calculus-home)

Audio
- (Very optional)
** Philosophy of Mind: Brains, Consciousness, and Thinking Machines (http://amzn.to/2kQGgk5)
** The Singularity Is Near (http://amzn.to/2lzCqKk)
- The Master Algorithm (http://amzn.to/2kLOQjW)
- Mathematical Decision Making (https://goo.gl/V75I49)
- Statistics 1 (https://goo.gl/sIBOjw) 2 (https://goo.gl/b15Aug)
- Calculus 1 (https://goo.gl/fcLP3l) 2 (https://goo.gl/sBpljN) 3 (https://goo.gl/8Hdwuh)
- Great Courses Plus (http://thegreatcoursesplus.7eer.net/c/358692/225371/3896) subscription, might save $

10. Languages & Frameworks  

Languages & frameworks comparison. Languages: Python, R, MATLAB/Octave, Julia, Java/Scala, C/C++. Frameworks: Hadoop/Spark, Deeplearning4J, Theano, Torch, TensorFlow.

## Episode
Languages
- C/C++
** Performance
** GPU (CUDA/cuDNN)
- Math Langs
** R
** MATLAB / Octave
** Julia
- Java / Scala
** Data mining
** Hadoop + Mahout / Spark + SparkML
** Deeplearning4j
- Python
** R => Pandas
** MATLAB => numpy
** C/C++/GPU => TensorFlow (or other symbolic graph)
** Data Mining => PySpark
** Server (Flask, Django)
- Analogy: Data => Analytics (biz intelligence, etc) => Adsense
- Other languages like Node, Go, Rust (forgot to mention) see my answer (https://goo.gl/9d21xE) for why NOT to use them.
- Articles
** Best Programming Language for Machine Learning (http://machinelearningmastery.com/best-programming-language-for-machine-learning)
** Data Science Job Report 2017 (http://r4stats.com/2017/02/28/r-passes-sas)

Frameworks
- ML libraries
** Numpy, Pandas, scikit-learn
- Computational/symbolic graphs
** Automatic differentiation
- Theano
** Math layer
** Blocks/Lasagne ML layer
** Keras DL layer
- Torch
** CNNs
** note about RNNs
- TensorFlow
** Perf over time
** Mobile etc
** Keras
- Others
** Caffe (old-n-dying, C++)
** CNTK (MS)
** mxnet (Amazon)
** DL4J
** OpenCV (vision only)
- Articles
** An Overview of Python Deep Learning Frameworks (http://www.kdnuggets.com/2017/02/python-deep-learning-frameworks-overview.html)
** Evaluation of Deep Learning Toolkits (https://github.com/zer0n/deepframeworks/blob/master/README.md)
** Comparing Frameworks: Deeplearning4j, Torch, Theano, TensorFlow, Caffe, Paddle, MxNet, Keras & CNTK (https://deeplearning4j.org/compare-dl4j-torch7-pylearn) - grain of salt, it's super heavy DL4J propaganda (written by them)

## Resources
- Python (http://amzn.to/2mVgtJW)
- TensorFlow Tutorials (https://www.tensorflow.org/get_started/get_started)

9. Deep Learning  

Deep learning and neural networks. How to stack our logistic regression units into a multi-layer perceptron.

## Episode
- Value
** Represents brain? Magic black-box
** Feature learning (layer removed from programmer)
** Subsumes AI
- Stacked shallow learning
** Logistic regression = lego, Neural Network = castle
- Deep Learning => ANNs => MLPs (& RNNs, CNNs, DQNs, etc)
** MLP: Perceptron vs LogReg / sigmoid activation
- Architecture
** (Feed forward) Input => Hidden Layers => Hypothesis fn
** "Feed forward" vs recursive (RNNs, later)
** (Loss function) Cross entropy
** (Learn) Back Propagation
- Price ~ smoking + obesity + age^2
** 1-layer MLP
- Face? ~ pixels
** Extra layer = hierarchical breakdown
** Inputs => Employees => Supervisors => Boss
- Backprop / Gradient descent
** Optimizers: adagrad, adam, ... vs gradient descent
- Silver bullet, but don't abuse
** linear (housing market)
** features don't combine
** expensive: like hiring a company when the boss h(x) does all the work
- Brain comparison (dendrites, axons); early pioneers as neuroscientists / cogsci
- Different types
** vs brain
** RNNs
** CNNs
- Activation fns
** Activation units / neurons (hidden layer)
** Relu, TanH
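
A minimal numpy sketch of the feed-forward pass described above, in the inputs => employees => boss framing (weights are random here; backprop would learn them; all sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=3)                         # features: e.g. smoking, obesity, age
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input -> hidden (the "employees")
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # hidden -> output (the "boss" h(x))

hidden = relu(W1 @ x + b1)         # each hidden unit is a little activation unit
y_hat = sigmoid(W2 @ hidden + b2)  # hypothesis fn

# Cross-entropy loss against a label; backprop would push this error
# backward through both layers to update W2, b2, W1, b1.
y = 1.0
loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(y_hat, loss)
```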

## Resources
- Deep Learning Simplified (https://www.youtube.com/watch?v=b99UVkWzYTQ) `video` quick series to get a lay-of-the-land.
- ☞ Deep Learning Book (http://www.deeplearningbook.org) `book`
- You'll also see Neural Networks and Deep Learning (http://neuralnetworksanddeeplearning.com/) recommended, but DL Book is more thorough and updated.

8. Math  

Introduction to the branches of mathematics used in machine learning. Linear algebra, statistics, calculus.

## Episode
- Linear Algebra = Matrix (or "Tensor") math. Wx + b. Chopping in our analogy.
- Stats = Probability/inference, the heart of machine learning. Recipes/cookbook.
- Calculus = Learning. Moving our error dot to the bottom of the valley. Baking, the actual "cook" step.
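
The three branches in one toy numpy step (made-up data; one gradient-descent step on squared error):

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]])  # 3 examples, 2 features
y = np.array([5.0, 4.0, 8.0])
w, b = np.zeros(2), 0.0

preds = X @ w + b                   # linear algebra: the Wx + b matrix product
error = preds - y                   # stats: residuals under squared-error loss
grad_w = 2 * X.T @ error / len(y)   # calculus: derivative of mean squared error
grad_b = 2 * error.mean()
w, b = w - 0.1 * grad_w, b - 0.1 * grad_b  # one step down the valley
```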

## Resources
Come back here after you've finished Ng's course; or learn these resources in tandem with ML (say 1 day a week).

☞ KhanAcademy:
- LinAlg (https://www.khanacademy.org/math/linear-algebra)
- Stats (https://www.khanacademy.org/math/statistics-probability)
- Calc (https://www.khanacademy.org/math/calculus-home)

Primers (PDFs)
- See "Section Notes" of cs229 (http://cs229.stanford.edu/materials.html)

Books
- "Linear Algebra Done Right"
- "All of statistics"
- (Not sure on Calc, comment if you know a good one)

The Great Courses `audio` highly recommend audio supplementary material
- Stats (https://goo.gl/sIBOjw)
- Calc 1 (https://goo.gl/fcLP3l) 2 (https://goo.gl/sBpljN) 3 (https://goo.gl/8Hdwuh)
- ☞ Mathematical Decision Making (https://goo.gl/V75I49) `recommended` basically an ML course ("Operations Research", a very similar field)
- Relevant others:
** Game Theory (https://goo.gl/yEEOG1)
** Discrete Math (https://goo.gl/CBKCJE)
- Conversion Script: `for f in *.mp4; do ffmpeg -i "$f" "${f%.mp4}.mp3" && rm "$f"; done`

7. Logistic Regression  

Your first classifier: Logistic Regression. That plus Linear Regression, and you're a 101 supervised learner!

## Episode
See Andrew Ng Week 3 Lecture Notes (https://www.coursera.org/learn/machine-learning/resources/Zi29t)
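
Pending Ng's full treatment, here's a minimal sketch of the hypothesis itself (weights are made up; training would fit them by minimizing cross-entropy):

```python
import numpy as np

def sigmoid(z):
    """Squashes any real number into (0, 1): a probability for classification."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothesis: h(x) = sigmoid(w.x + b); predict class 1 when h(x) >= 0.5.
w, b = np.array([1.5, -2.0]), 0.5   # illustrative weights, not learned
x = np.array([2.0, 1.0])
prob = sigmoid(w @ x + b)
print(prob, int(prob >= 0.5))
```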

## Resources
You've started Ng's Coursera course (https://www.coursera.org/learn/machine-learning), right? Riight?

6. Certificates & Degrees  

Discussion on certificates and degrees from Udacity to a Masters degree.

## Episode
Self-edify
- Coursera Specialization - flat $500
- Udacity Nanodegree - $200/m (discount if timely completion)
** Great for self-teaching, not recognized degree
** Machine Learning (https://www.udacity.com/course/machine-learning-engineer-nanodegree--nd009)
** Self Driving Car (https://www.udacity.com/drive)
** Artificial Intelligence (https://www.udacity.com/ai)

OMSCS (https://www.omscs.gatech.edu/): Great & cheap online masters degree

Portfolio: Most important for getting a job

## Resources
- Discussions: 1 (http://canyon289.github.io/DSGuide.html#DSGuide) 2 (https://news.ycombinator.com/item?id=13654127) 3 (http://cole-maclean.github.io/blog/Self%20Taught%20AI/) 4 (https://news.ycombinator.com/item?id=12516441)

5. Linear Regression  

Introduction to the first machine-learning algorithm, the 'hello world' of supervised learning - Linear Regression

## Episode
See Andrew Ng Week 2 Lecture Notes (https://www.coursera.org/learn/machine-learning/resources/QQx8l)
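
As a minimal preview of what Ng covers, here's plain gradient descent fitting y = wx + b on made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])  # roughly y = 2x + 1, with noise

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    preds = w * x + b
    grad_w = 2 * ((preds - y) * x).mean()  # d(MSE)/dw
    grad_b = 2 * (preds - y).mean()        # d(MSE)/db
    w, b = w - lr * grad_w, b - lr * grad_b

print(w, b)  # should approach ~2 and ~1
```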

## Resources
- ☞ Andrew Ng's Machine Learning Coursera course (https://www.coursera.org/learn/machine-learning) `mooc`
No question about it, this is the most essential, important, recommended resource in my entire series _period_. Consider it required, not optional.

4. Algorithms - Intuition  

Overview of machine learning algorithms. Infer/predict -> error/loss -> train/learn. Supervised, unsupervised, reinforcement learning.

## Episode
Learning (ML)
- 3-step process
** Infer / Predict
** Error / Loss
** Train / Learn
- First as batch from spreadsheet, then "online" going forward
** Pre-train your "model"
** "Examples"
** "Weights"
- Housing cost example
** "Features"
** Infer cost based on num_rooms, sq_foot, etc
** Error / Loss function
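
The three steps as one toy training loop on the housing example (numbers and learning rate are made up for illustration):

```python
import numpy as np

features = np.array([[3, 1500.0], [4, 2000.0], [2, 900.0]])  # num_rooms, sq_foot
prices = np.array([300.0, 400.0, 180.0])                     # in $1000s
weights = np.zeros(2)

for _ in range(200):
    predictions = features @ weights                 # 1. infer / predict
    loss = ((predictions - prices) ** 2).mean()      # 2. error / loss
    gradient = 2 * features.T @ (predictions - prices) / len(prices)
    weights -= 1e-7 * gradient                       # 3. train / learn (tiny step: big feature scale)
```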

Categories
- Supervised learning
** Vision (CNN)
** Speech (RNN)
- Unsupervised
** Market segmentation
- Reinforcement & Semi-Supervised
** Planning (DQN): Games (chess, Mario); Robot movement

## Resources
- Tour of Machine Learning Algorithms (http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms) `article`
- The Master Algorithm (http://amzn.to/2kLOQjW) `audio` Semi-technical overview of ML basics & main algorithms
