Amazon is one of the world’s largest internet companies and the biggest online retailer. With so many products and services, Amazon is constantly in the market for motivated and innovative data scientists to meet its ever-growing data needs.
As a company, Amazon prides itself on disrupting well-established industries through technological innovation. Its vision is to become the Earth’s most customer-centric company.
What is the Data Science Role?
The role of a data scientist at Amazon depends on the specific team. Amazon is a large conglomerate corporation with many teams working on different products and services.
These teams include AWS (Amazon Web Services), Alexa, the forecasting team in Supply Chain Optimization Technologies (SCOT), the NASCO (North America Supply Chain Organization) team, the Middle Mile Planning Research and Optimization Science (mmPROS) team, and many more.
General requirements are:
- Design, develop, evaluate, deploy, and update data-driven models and analytical solutions for machine learning (ML) and natural language (NL) applications.
- Develop cutting-edge data pipelines, build accurate predictive models, and deploy automated software solutions to provide forecasting insights.
- Research, design, and improve models with business impact in mind.
Required Skills
- Master’s degree in a quantitative field (such as Statistics, Quantitative Finance, Economics, Computer Science, Mathematics, Physics, Computational Biology, or Operational Research) or equivalent practical experience.
- 2+ years of work experience (4+ years for Senior Data Scientist) in an analytical role involving machine learning techniques, data extraction, analysis, and communication.
- Proficiency (4+ years’ experience for Senior Data Scientist) in statistical software packages and programming languages such as R, Stata, Matlab, Python, SQL, C++, or Java.
- Experience in designing and implementing machine learning algorithms tailored to specific business needs and tested on large data sets.
- Experience in data mining and using databases in a business environment with large-scale, complex datasets.
- Excellent verbal and written communication skills with the ability to effectively advocate technical solutions to research scientists, engineering teams and business audiences.
What Are the Types of Data Scientists?
Amazon has several main types of job roles related to data science. Here’s what the Amazon data science skills graph looks like for the interview and the day-to-day roles:

The types of data science teams at Amazon are listed below:
Data Analytics/Business Intelligence
This role focuses mainly on creating forecasts, identifying strategic opportunities, and providing informed, business-related insights. Data visualization tools like Tableau and data warehousing skills are often required.
Machine Learning Research Scientists
This role focuses mainly on cutting-edge research in areas like NLP, deep learning, video recommendations, streaming data analysis, and social networks. Generally, the position spans from PhDs up to internationally renowned researchers.
Read more on the Amazon machine learning interview.
Data/Applied Scientists
In the most popular and generalized role, data scientists dive into big data sets in order to build simulations and experimentation systems at scale, build optimization algorithms, and leverage cutting-edge technologies across Amazon.
Data Engineer
This is the team that builds tools or products that are used inside and outside the company. Think AWS or Alexa. The role significantly overlaps with ML Engineer positions. Skills in object-oriented languages like C++ and Java are often required.
The Amazon Interview

The interview process at Amazon is similar to that of other tech companies. The first thing to note is that Amazon does not do take-home challenges. Instead, the interview process consists of an initial phone screen by a recruiter or hiring manager, then a technical phone screen, and finally an onsite interview, which is usually done in five stages with an informal interview over lunch.
Initial Screen
The initial phone interview is usually conducted by a recruiter or hiring manager at Amazon. This is a resume-based phone interview that normally goes over your resume as well as the position on the team. Given that Amazon is one of the largest organizations in the world, the hiring manager will explain what their team does and where it falls in the organization.
The Technical Screen
The technical screen comes after the initial phone screen. This interview will involve coding, statistics, and machine learning. You can expect at least two coding questions: one involving SQL and the other an algorithm-style coding question. The coding portion is done over a shared code editor. Remember to take the time to go over your thought process with the interviewer. There is also a section on “approach”, detailing how you got to the solution and why you took the steps you did.
Additionally, the interviewer will ask a machine learning concept question. This is generally pretty conversational, but remember to brush up on general ML concepts.
Example Technical Screen Question
We’re given two tables. Table A has one million records with fields ID and AGE. Table B has 100 records with two fields as well, ID and SALARY.
Let’s say in Table B, the mean salary is 50K and the median salary is 100K.
SELECT A.ID,A.AGE,B.SALARY
FROM A
LEFT JOIN B
ON A.ID = B.ID
WHERE B.SALARY > 50000
If the query above is run, about how many records would be returned?
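One way to reason about this, assuming IDs are unique in both tables and every ID in Table B also appears in Table A: the WHERE clause filters on a column from Table B, so rows of Table A without a match (where B.SALARY is NULL) are dropped and the LEFT JOIN behaves like an INNER JOIN. The result can therefore hold at most 100 records, and a median of 100K means at least half of the salaries in Table B are above 50K, which puts the answer somewhere between roughly 50 and 100. The Python sketch below checks the join behavior with sqlite3 on toy tables. The table sizes and salary values are made up for illustration and do not reproduce the mean and median in the question.

```python
import sqlite3

# Toy stand-ins for Table A and Table B (sizes and salaries are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (ID INTEGER PRIMARY KEY, AGE INTEGER)")
conn.execute("CREATE TABLE B (ID INTEGER PRIMARY KEY, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?)",
    [(i, 20 + i % 40) for i in range(1, 1001)],
)
salaries = [0, 0, 0, 0, 100000, 100000, 100000, 120000, 150000, 200000]
conn.executemany(
    "INSERT INTO B VALUES (?, ?)",
    [(i, s) for i, s in enumerate(salaries, start=1)],
)

query = """
SELECT A.ID, A.AGE, B.SALARY
FROM A
LEFT JOIN B ON A.ID = B.ID
{where}
"""

# Without the filter, the LEFT JOIN keeps every row of A.
unfiltered = conn.execute(query.format(where="")).fetchall()

# With the filter, rows where B.SALARY is NULL fail the comparison and vanish.
filtered = conn.execute(query.format(where="WHERE B.SALARY > 50000")).fetchall()

high_earners = sum(s > 50000 for s in salaries)

print(len(unfiltered))               # one row per record in A
print(len(filtered), high_earners)   # both count the B rows above 50000
```

In an interview, it is worth stating the assumptions about unique and matching IDs out loud, since duplicate or unmatched IDs would change the count.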
Onsite Interview
After passing the technical phone screen, the recruiter will arrange an onsite interview. Most of the topics will involve A/B testing, more machine learning conceptual questions, exploratory data analysis, and some coding questions.
This stage comprises five or six back-to-back interviews, each either one-on-one or with two people (a manager and a junior data scientist). In total, the onsite interview will last for around six hours.
This is what the process looks like:
- A behavioral interview to assess culture fit
- A technical interview involving data analysis and A/B testing
- SQL-based interview with a data scientist
- Algorithms and optimizations
- Machine learning and modeling case study interview
Each stage will likely test your knowledge of Amazon leadership principles and your critical thinking and problem-solving ability.
Tips and Tricks
- Amazon cares about technical ability a lot when it comes to the data science roles! Remember to brush up on solving algorithm problems, optimizing queries, and reviewing how many of the machine learning algorithms work under the hood.
- Amazon assesses every applicant based on its 14 leadership principles. Try to memorize all 14 leadership principles, as you will be expected to exhibit them in the behavioral interview. The tip here is to think about your past projects or experiences and how you have demonstrated those principles.
- Practice modeling, machine learning, and business case questions. Amazon will likely ask many types of vague case questions in which they’ll expect you to apply machine learning to a business scenario. Check out how we approach a case question on Amazon’s products.
Amazon Data Science Interview Questions
- Given a large string and a smaller string, write a program to find out if the smaller string can be generated from letters from the larger string.
- The probability that an item is at location A is 0.6, and 0.8 at location B. What is the probability that the item would be found on the Amazon website?
- Implement the union and intersection of two arrays (in an efficient way). Note that elements of the two given arrays may be repeated but cannot be repeated in the union and intersection arrays.
- You have two files in HDFS. One has a date range with two columns: start date and end date. The other file has two columns: date and number of visitors. Write Spark code that gives the date range with the highest number of visitors.
- Implement a circular queue using an array.
- What’s the difference between Lasso and Ridge Regression?
- When users are navigating through the Amazon website, they are performing many actions, clicking on buttons, doing searches, etc. What is the best way to model whether their next action will be a purchase?
- How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic ranges?
- What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithm is usually used and why?
- How would you modify a table with over a billion rows? (Check out the solution in a mock interview).
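As a rough illustration of the coding-style questions in the list above, here is a minimal Python sketch of three of them: checking whether a smaller string can be built from the letters of a larger one, computing a duplicate-free union and intersection of two arrays, and a circular queue backed by an array. The function and class names are hypothetical, the array elements are assumed to be hashable, and these are just one possible approach rather than the answers Amazon expects.

```python
from collections import Counter


def can_build(small: str, large: str) -> bool:
    """True if every letter of `small` is available in `large`, counting repeats."""
    need, have = Counter(small), Counter(large)
    return all(have[ch] >= n for ch, n in need.items())


def union_and_intersection(a, b):
    """Return (union, intersection) without duplicates, in first-seen order."""
    set_b = set(b)
    union = list(dict.fromkeys(list(a) + list(b)))
    intersection = [x for x in dict.fromkeys(a) if x in set_b]
    return union, intersection


class CircularQueue:
    """Fixed-capacity FIFO queue stored in a preallocated list."""

    def __init__(self, capacity: int):
        self._items = [None] * capacity
        self._head = 0  # index of the oldest element
        self._size = 0  # number of stored elements

    def enqueue(self, value) -> bool:
        if self._size == len(self._items):
            return False  # queue is full, value is rejected
        tail = (self._head + self._size) % len(self._items)
        self._items[tail] = value
        self._size += 1
        return True

    def dequeue(self):
        if self._size == 0:
            raise IndexError("dequeue from empty queue")
        value = self._items[self._head]
        self._items[self._head] = None
        self._head = (self._head + 1) % len(self._items)
        self._size -= 1
        return value
```

The string check and the set operations run in time linear in the input sizes, and both queue operations are constant time because the head index wraps around instead of shifting elements.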