Episodes
-
Support my new podcast: Lefnire's Life Hacks
Discussing Databricks with Ming Chang from Raybeam (part of DEPT®)
-
Conversation with Dirk-Jan about Kubeflow (vs. cloud-native solutions like SageMaker)
Dirk-Jan Verdoorn - Data Scientist at Dept Agency
Kubeflow. (From the website:) The Machine Learning Toolkit for Kubernetes. The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
TensorFlow Extended (TFX). If using TensorFlow with Kubeflow, combine with TFX for maximum power. (From the website:) TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines. When you're ready to move your models from research to production, use TFX to create and manage a production pipeline.
Alternatives:
Airflow, MLflow
-
Chatting with co-workers about the role of DevOps in a machine learning engineer's life
Expert coworkers at Dept
Matt Merrill - Principal Software Developer
Jirawat Uttayaya - DevOps Lead
The Ship It Podcast (where Matt features often)
DevOps tools
Terraform
Ansible
Pictures (funny and serious)
Which AWS container service should I use?
A visual guide on troubleshooting Kubernetes deployments
Public Cloud Services Comparison
Killed by Google
aCloudGuru AWS curriculum
-
(Optional episode) just showcasing a cool application using machine learning
Dept uses Descript for some of their podcasting. I'm using it like a maniac, I think they're surprised at how into it I am. Check out the transcript & see how it performed.
Descript
The Ship It Podcast: How to ship software, from the front lines. We talk with software developers about their craft, developer tools, developer productivity and what makes software development awesome. Hosted by your friends at Rocket Insights. AKA shipit.io
Brandbeats Podcast by BASIC: An agency podcast with views on design, technology, art, and culture. Explore the new microsite at www.brandbeats.basicagency.com
-
Show notes: ocdevel.com/mlg/mla-17
Developing on AWS first (SageMaker or other)
Consider developing against AWS as your local development environment, rather than only your cloud deployment environment. Solutions:
Stick to AWS Cloud IDEs (Lambda, SageMaker Studio, Cloud9)
Connect to deployed infrastructure via Client VPN (Terraform example, YouTube tutorial, Creating the keys)
LocalStack
Infrastructure as Code
Terraform
CDK
Serverless
-
Part 2 of deploying your ML models to the cloud with SageMaker (MLOps)
MLOps is deploying your ML models to the cloud. See MadeWithML for an overview of tooling (also generally a great ML educational run-down.)
SageMaker: JumpStart, Deploy, Pipelines, Monitor, Kubernetes, Neo
-
Part 1 of deploying your ML models to the cloud with SageMaker (MLOps)
MLOps is deploying your ML models to the cloud. See MadeWithML for an overview of tooling (also generally a great ML educational run-down.)
SageMaker: Data Wrangler, Feature Store, Ground Truth, Clarify, Studio, Autopilot, Debugger, Distributed Training
And I forgot to mention JumpStart; I'll mention it next time.
-
Server-side ML. Training & hosting for inference, with a goal towards serverless. AWS SageMaker, Batch, Lambda, EFS, Cortex.dev
-
Client, server, database, etc.
-
Use Docker for env setup on localhost & cloud deployment, instead of pyenv / Anaconda. I recommend Windows for your desktop.
-
Show notes at ocdevel.com/mlg/32.
L1/L2 norm, Manhattan, Euclidean, cosine distances, dot product
Normed distances
A norm is a function that assigns a non-negative length to each vector in a vector space; only the zero vector has length zero.
Minkowski is the generalized form: p_root(sum(abs(xi-yi)^p)). "p" = 1, 2, ... for the cases below.
L1: Manhattan/city-block/taxicab. abs(x2-x1)+abs(y2-y1). Grid-like distance (triangle legs). Often preferred for high-dim space.
L2: Euclidean. sqrt((x2-x1)^2+(y2-y1)^2), i.e. the square root of the difference vector's dot product with itself. Straight-line distance; min distance (Pythagorean triangle edge, the hypotenuse).
Others: Mahalanobis, Chebyshev (p=inf), etc.
Dot product
A type of inner product.
Outer product: lies outside the involved planes.
Inner product: the dot product lies inside the planes/axes involved.
Dot product: inner product on a finite-dimensional Euclidean space.
Cosine (normalized dot)
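The distances in these notes can be sketched in a few lines of plain Python. This is only an illustration, not code from the episode: the function names (minkowski, dot, cosine_similarity) and the sample points are made up for the example. It shows L1 and L2 as the p=1 and p=2 cases of Minkowski, Chebyshev as the p=inf case, and cosine similarity as the dot product divided by the two vector lengths.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if math.isinf(p):
        # p = inf gives Chebyshev: the largest single-coordinate difference.
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def dot(x, y):
    """Dot product: sum of coordinate-wise products."""
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Dot product normalized by both vector lengths (their L2 norms)."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


# Hypothetical sample points: a 3-4-5 right triangle from the origin.
a, b = (0.0, 0.0), (3.0, 4.0)
l1 = minkowski(a, b, 1)            # Manhattan: sum of the triangle legs
l2 = minkowski(a, b, 2)            # Euclidean: the hypotenuse
linf = minkowski(a, b, math.inf)   # Chebyshev: the longest leg

# Cosine ignores magnitude: parallel vectors score 1, perpendicular ones 0.
parallel = cosine_similarity((2.0, 0.0), (5.0, 0.0))
perpendicular = cosine_similarity((1.0, 0.0), (0.0, 1.0))
```

Because cosine similarity divides out the lengths, (2, 0) and (5, 0) count as identical in direction even though their Euclidean distance is 3.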
-
KMeans (sklearn vs FAISS), finding n_clusters via inertia/silhouette, Agglomerative, DBSCAN/HDBSCAN
-
NLTK: swiss army knife. Gensim: LDA topic modeling, n-grams. spaCy: linguistics. transformers: high-level business NLP tasks.
-
matplotlib, Seaborn, Bokeh, D3, Tableau, Power BI, QlikView, Excel
-
EDA + charting. DataFrame info/describe, imputing strategies. Useful charts like histograms and correlation matrices.
-
Run your code + visualizations in the browser: IPython / Jupyter Notebooks.
-
Salary based on location, gender, age, tech... from O'Reilly.
-
Dimensions, size, and shape of NumPy ndarrays / TensorFlow tensors, and methods for transforming those.
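As a rough illustration of those three terms (not taken from the episode itself), here is a small NumPy sketch. The array t and its 2x3x4 layout are arbitrary choices for the example; TensorFlow tensors expose similar shape information, but only the NumPy calls are shown here.

```python
import numpy as np

# A hypothetical 3-D array: 2 blocks, each with 3 rows and 4 columns.
t = np.arange(24).reshape(2, 3, 4)

dims = t.ndim     # number of axes (dimensions)
shape = t.shape   # length of each axis
size = t.size     # total number of elements (product of the shape)

# Common transformations; none of them change the total element count.
flat = t.reshape(-1)                     # collapse to 1-D, -1 infers the length
swapped = t.transpose(2, 0, 1)           # reorder axes: columns first
expanded = np.expand_dims(t, axis=0)     # add a leading axis of length 1
squeezed = np.squeeze(expanded, axis=0)  # drop that length-1 axis again
```

Adding and dropping a length-1 axis like this is the usual way to turn a single sample into a batch of one and back.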
-
Comparison of different data storage options when working with your ML models.
-
Some numerical data nitty-gritty in Python.