Chapter 1. ML Roles and the Interview Process
ML Interview Process
Skills | Data scientist (DS) | ML engineer (MLE) | MLOps engineer | Data engineer | Data analyst |
---|---|---|---|---|---|
Data visualization, communication | ★★★ | ★★ | ★ | ★ | ★★★ |
Data exploration, cleaning, intuition | ★★★ | ★★★ | ★ | ★★★ | ★★★ |
ML theory, statistics | ★★★ | ★★★ | ★★ | ★ | ★ |
Programming tools (Python, SQL) | ★★★ | ★★★ | ★★★ | ★★★ | ★ |
Software infrastructure (Docker, Kubernetes, CI/CD) | ★ | ★ to ★★★ | ★★★ | ★ | ★ |

Skills mapped to job titles
ML Lifecycle
Common ML job titles and how they correspond to the ML lifecycle.
- (A) Data
- (B) Machine learning development
- (C.1) ML/software infrastructure
- (C.2) ML hypothesis testing/monitoring
- (D) Reports and dashboards
A strong referral can shorten the interview process.
Chapter 2. ML Job Application and Resume
ML Job Application Guide
Applications × Effectiveness per application (EPA) → Interview invites
Job applications and their effectiveness per application
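
To make the relationship above concrete, here is a minimal sketch of the arithmetic. It assumes EPA is expressed as a fraction between 0 and 1 (invites per application); the function name and the numbers are hypothetical and only illustrate why a few well-targeted applications can beat many low-effort ones.

```python
def expected_invites(applications: int, epa: float) -> float:
    """Estimate interview invites as applications x effectiveness per application."""
    if applications < 0:
        raise ValueError("applications must be non-negative")
    if not 0.0 <= epa <= 1.0:
        raise ValueError("epa must be a fraction between 0 and 1")
    return applications * epa


# Hypothetical numbers, for illustration only.
mass_apply = expected_invites(applications=100, epa=0.02)  # many generic applications
targeted = expected_invites(applications=20, epa=0.25)     # fewer, referral-backed ones
print(f"mass apply: {mass_apply:.1f} invites, targeted: {targeted:.1f} invites")
```

Under these made-up rates, 20 targeted applications yield more expected invites than 100 generic ones, which is the point of raising EPA rather than only raising volume.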
Asking for a referral
A sample template:
Hi XYZ,
Hope you are doing well. I saw that ABC is hiring for DEF position and also just noticed that you are working there.
I am curious to learn about your working experience at ABC and if you would recommend applying? Thanks.
- State a connection.
  - They stated where they had met me before. In some cases, job seekers mention reading my blog or seeing me speak.
  - They may mention something as simple as seeing one of my LinkedIn posts (it’s important to be specific about which one).
- Be specific.
  - They linked the job posting or mentioned details about why they were reaching out.
  - Sometimes I get very broad questions, such as “How do I enter data science?” In those situations, even if I have a coffee chat with them, I’ll be duplicating and repeating information that they could get in one of my blog posts, or from this book! A call or meeting should be meant for a deeper conversation.
- Politeness goes a long way.
  - They weren’t pushy or rude and were very respectful of my time.
A significant amount of hiring occurs through channels such as cold-emailing managers, warm introductions via referrals, or networking events.
In fact, I advise my mentees to never apply through the job board/company website unless it is absolutely necessary.
-Suhas Pai, CTO of Bedrock AI
Experience Writing
Here are some more tips to improve your initial bullet points:
- Start the experience writing points with action verbs.
- Specify your impact, ideally in a way that’s quantified and easy to understand.
- Add tools and programming languages you used.
Resume Resources
- “Resume Checklist” to make sure your resume looks polished (University of Waterloo)
- Action verbs for when you run out of ideas: “Action Verbs for Resume Writing” (University of Washington)
- Resume format and checklist via CareerCup (North America focused)
- Resume templates on Overleaf (LaTeX): The one I’ve used for the last five years is AltaCV (two-column). I personalized the template by removing the graphics, leaving only text. A popular single-column template is Modern-Deedy.
Table 2-2. Spreadsheet example of tracking applications and interviews
Application date | Company | Job posting URL | Interview type | Interview date | Interviewers | Emails | Notes | Results |
---|---|---|---|---|---|---|---|---|
2023-08-02 | ARI Corp | https://[url-to-job-description] | Hiring manager: behavioral and past project deep dive | 2023-08-15 | Xue-La (hiring manager) | [email protected] | Recruiter says this is the ad revenue ML team | Pending |
2023-08-03 | Taipaw AI | https://[url-to-job-description] | Recruiter screen | 2023-08â5 | Max (recruiter) | [email protected] | Asked about PyTorch exp | Passed |
Chapter 3. Technical Interview: Machine Learning Algorithms
As a rule of thumb, it’s vital to explain algorithms and ML concepts at two levels: on a simple “explain like I’m five years old” level and at a deeper, technical level, one more appropriate for a college course. A second rule of thumb is to be prepared to answer follow-up questions to these ML algorithm interview questions. This is so the interviewer knows that you didn’t just memorize and then regurgitate the answer but that you can apply it to various real-life scenarios on the job.
Read: https://huyenchip.com/ml-interviews-book/
Chapter 4. Technical Interview: Model Training and Evaluation
- Data Preprocessing
- EDA
- Feature Engineering
- Model Training
Simplified ML Task selection
- ML Algorithms
  Here are some algorithms and libraries that can be used as simple starting points for each task. Note that many libraries are versatile and can be used for multiple purposes (e.g., decision trees can be used for both classification and regression), but I list some simplified examples for understanding:
  - Classification: Algorithms include logistic regression, decision trees, random forest, and the like. Example Python libraries to start with include scikit-learn, CatBoost, and LightGBM.
  - Regression: Algorithms include linear regression, decision trees, and the like. Example Python libraries to start with are scikit-learn and statsmodels.
  - Clustering (unsupervised learning): Algorithms include k-means clustering, DBSCAN, and the like. An example Python library to start with is scikit-learn.
  - Time-series prediction: Algorithms include ARIMA, LSTM, and the like. Example Python libraries to start with include statsmodels, Prophet, Keras/TensorFlow, and so on.
  - Recommender systems: Algorithms include collaborative filtering techniques such as matrix factorization. Example libraries and tools to start with include Spark’s MLlib or Amazon Personalize on AWS.
  - Reinforcement learning: Algorithms include multi-armed bandits, Q-learning, and policy gradient. Example libraries to start with include Vowpal Wabbit, TorchRL (PyTorch), and TensorFlow-RL.
  - Computer vision: Deep learning techniques are common starting points for computer vision tasks. OpenCV is an important computer vision library that also supports some ML models. Popular deep learning frameworks include TensorFlow, Keras, PyTorch, and Caffe.
  - Natural language processing: All the deep learning frameworks mentioned before can also be used for NLP. In addition, it’s common to try out transformer-based methods or find something on Hugging Face. Nowadays, using the OpenAI API and GPT models is also common. LangChain is a fast-growing library for NLP workflows. There is also Google’s recently launched Bard.
- Model Evaluation
  Common metrics for model evaluation, by task:
  - Classification: accuracy, precision, recall, F1 score, AUC (area under the ROC curve), confusion matrix
  - Regression: MSE (mean squared error), RMSE (root mean squared error), R-squared
  - Clustering: silhouette coefficient, Calinski-Harabasz index
  - Ranking: precision at K, mean reciprocal rank (MRR), normalized discounted cumulative gain (NDCG)
- Model Versioning
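
Interviewers often ask you to define the evaluation metrics listed above precisely, so it can help to write a few of them out by hand. The following is a minimal plain-Python sketch using the standard textbook definitions; the function names and toy data are my own, and in practice you would normally use a library such as scikit-learn instead.

```python
import math


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def rmse(y_true, y_pred):
    """Root mean squared error for regression."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def precision_at_k(relevant, ranked, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def mean_reciprocal_rank(relevant_sets, rankings):
    """Average of 1/rank of the first relevant item per query (0 if none is found)."""
    total = 0.0
    for relevant, ranked in zip(relevant_sets, rankings):
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data, for illustration only.
print(precision_recall_f1([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
print(rmse([3.0, 5.0], [1.0, 5.0]))
print(precision_at_k({"a", "c"}, ["a", "b", "c", "d"], k=2))
print(mean_reciprocal_rank([{"a"}, {"z"}], [["b", "a"], ["z", "y"]]))
```

A common follow-up question is when to prefer precision over recall; having the counts (true positives, false positives, false negatives) spelled out as above makes that trade-off easier to explain.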
Chapter 5. Technical Interview: Coding
Overall, for data scientists and ML engineers, the coding interview material falls into the following categories:
- A learning roadmap if you don’t know Python
- Python questions related to data
- Python brainteaser questions
- SQL questions related to data
Resources
Basic
- CatBoost: “Tutorial”
- NumPy: “The Absolute Basics for Beginners”
- pandas: “10 Minutes to pandas”
Data and ML Interview Questions
Here are some resources for further practicing data- and ML-related interview questions:
- NumPy exercises with solutions (10.6k stars on GitHub at the time of writing)
- pandas exercises (9.2k stars on GitHub at the time of writing)
- pandas practice with Google Colab (UC Berkeley)
Brainteasers
Here are some of the patterns to look out for:
- Array and string manipulation
- Sliding window
- Two pointers
- Fast and slow pointers
- Merge intervals
- Graph traversal, such as depth-first search (DFS) and breadth-first search (BFS)
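
To show what a few of the patterns above look like in code, here is a short sketch of three of them: sliding window, two pointers, and BFS. The problems are generic practice-style examples that I chose for illustration, not questions taken from any particular company or from the book.

```python
from collections import deque


def max_window_sum(nums, k):
    """Sliding window: largest sum of any k consecutive elements."""
    if k <= 0 or k > len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    window = sum(nums[:k])
    best = window
    for i in range(k, len(nums)):
        # Slide by one: add the entering element, drop the leaving one.
        window += nums[i] - nums[i - k]
        best = max(best, window)
    return best


def pair_with_sum(sorted_nums, target):
    """Two pointers: indices of two values in a sorted list that add up to target."""
    left, right = 0, len(sorted_nums) - 1
    while left < right:
        total = sorted_nums[left] + sorted_nums[right]
        if total == target:
            return left, right
        if total < target:
            left += 1   # need a larger sum
        else:
            right -= 1  # need a smaller sum
    return None


def bfs_shortest_path_length(graph, start, goal):
    """BFS: fewest edges from start to goal in an unweighted graph, or None."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == goal:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(max_window_sum([2, 1, 5, 1, 3, 2], k=3))
print(pair_with_sum([1, 2, 4, 7, 11], target=9))
print(bfs_shortest_path_length(graph, "A", "E"))
```

Each function runs in linear time in the size of its input (list length, or nodes plus edges for BFS), which is usually the complexity the interviewer is looking for with these patterns.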
Practice platforms for coding interviews
The following are common platforms for practicing LeetCode-style or brainteaser-style coding questions:
- LeetCode: An online platform offering coding challenges and interview preparation resources for software engineers and developers
- HackerRank: An online platform offering online coding tests and technical interviews
- Pramp: A free, online peer-to-peer platform for practicing technical interviews
- Interviewing.io: Anonymous mock interviews with engineers from tech companies
Curated study resources for coding interviews
The following are popular and useful guides for this type of question; basically, they are the same resources you’d use for a regular software engineer interview loop:
- Cracking the Coding Interview by Gayle Laakmann McDowell: This book is considered one of the most popular introductions to big-tech-style coding interviews. It’s focused on the software engineering interview loop, but if you are interviewing for ML roles with a lot of overlap with software engineering loops (such as some MLE roles, ML software engineer, etc.), then you will need to prepare for those interview loops as well.
- “How to Stand Out in a Python Coding Interview” by James Timmins.
Curated practice problems for coding interviews
For more patterns, you can check resources such as the LeetCode 75 Study Plan for a full list of categories and read about them in resources such as the blog post “7 of the Most Important LeetCode Patterns for Coding Interviews” by Hunter Johnson.
Resources for SQL Coding Interview Questions
Basic SQL interview questions often test joins. More advanced questions may involve more tables, more complex tables, window functions, subqueries, and so on. Use these resources to further your preparation:
- Learn SQL Basics (Coursera, UC Davis).
- SQL questions on LeetCode: as usual, you can (and should) start with the free questions first; there are a lot of them.
- “Advanced SQL Queries: Window Function Practice” (from O’Reilly; requires a sign-in).
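
As a rough idea of what join-style and subquery-style SQL questions look like, here is a small sketch that runs SQL through Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are hypothetical, made up for this example. Window functions are left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 20.0), (3, 2, 15.0);
""")

# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL total into 0.
join_rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Subquery: customers whose total spend exceeds the average order amount.
above_avg = conn.execute("""
    SELECT c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
    ORDER BY c.id
""").fetchall()

for row in join_rows:
    print(row)
print(above_avg)
conn.close()
```

A typical follow-up is to ask what changes if the `LEFT JOIN` becomes an inner `JOIN`: the customer with no orders would disappear from the first result.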
Chapter 6. Technical Interview: Model Deployment and End-to-End ML
Kubernetes Books
- Kubernetes: Up and Running by Brendan Burns et al.
- Kubernetes Best Practices by Brendan Burns et al.
In terms of implementation, here are some common tools for visualization and monitoring:
- Custom dashboards: Seaborn, Plotly, Matplotlib, and Bokeh
- End-to-end platforms: Amazon SageMaker dashboards and Google’s Vertex AI monitoring
- Other business intelligence (BI) tools: Microsoft Power BI, Tableau, and Looker
Tools for data checks or data unit tests include:
- Great Expectations
- deequ
- dbt (pipelines can include tests)
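
To illustrate what a data check or data unit test does conceptually, here is a minimal plain-Python sketch. It deliberately does not use the API of Great Expectations, deequ, or dbt; the `check_rows` helper, the column names, and the value ranges are all hypothetical.

```python
def check_rows(rows, required, ranges):
    """Return human-readable failures; an empty list means every check passed."""
    failures = []
    for i, row in enumerate(rows):
        # Completeness check: required columns must not be missing or None.
        for col in required:
            if row.get(col) is None:
                failures.append(f"row {i}: missing value for '{col}'")
        # Validity check: numeric columns must fall inside an expected range.
        for col, (low, high) in ranges.items():
            value = row.get(col)
            if value is not None and not (low <= value <= high):
                failures.append(f"row {i}: '{col}'={value} outside [{low}, {high}]")
    return failures


# Hypothetical batch of records, for illustration only.
rows = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": -5},
    {"user_id": None, "age": 40},
]
failures = check_rows(rows, required=["user_id", "age"], ranges={"age": (0, 120)})
for failure in failures:
    print(failure)
```

In a pipeline, a non-empty `failures` list would typically block the batch or raise an alert before bad data reaches training or serving.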
For greater depth on this subject, I recommend the following resources:
- âML Systems Design Interview Guideâ by Patrick Halina
- Machine Learning System Design Interview by Ali Aminian and Alex Xu (ByteByteGo)
- Search YouTube videos on example system design interviews for ML; this is a good example: âHarmful Content Removal: Machine Learning (System Design) Staff Level Mentorshipâ by Interviewing.io. (This question is aimed at the L7 staff position.)
Source
- https://www.youtube.com/watch?v=h4wb3YktQCY
- https://learning.oreilly.com/library/view/machine-learning-interviews/9781098146535/ch01.html