A/B testing is the single most common experimentation topic in data science interviews. If you're interviewing at Meta, Google, Netflix, Airbnb, or any company with a product analytics function, expect at least one round focused on experimentation.
This isn't a topic you can bluff through. Interviewers push past textbook definitions into messy real-world scenarios: what happens when your test has network effects, when a PM wants to peek at results on day 3, or when your metric moves in one direction but revenue doesn't budge.
Here are the top 30 A/B testing interview questions organized by the five areas that come up repeatedly, with real questions asked at top companies.
Experiment Design
This is where most interviews start. Before you touch a p-value, can you set up the experiment correctly? Interviewers assess whether you think about randomization units, sample size, runtime, and guardrail metrics before jumping into analysis. The most common mistake candidates make is diving into statistical testing without nailing the design.
One question you'll almost certainly face: "how would you determine the sample size for this experiment?" The answer depends on the relationship between effect size, significance level, and statistical power: required sample size drops sharply as your minimum detectable effect grows. Detecting a shift of Cohen's d = 0.2 needs about 400 users per group at 80% power, while a d = 0.5 shift needs only about 64.
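A quick way to sanity-check those numbers, using the standard normal-approximation formula for a two-sample test, n per group = 2((z_{1-α/2} + z_{1-β}) / d)². This is a sketch; an exact t-based calculation (e.g., statsmodels' TTestIndPower) gives nearly identical results.

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sample test, normal approximation."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * (z / d) ** 2)

print(sample_size_per_group(0.2))  # 393 -- the "about 400" figure
print(sample_size_per_group(0.5))  # 63
```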

How would you design an A/B test for a new homepage layout?
Sample Answer
Start by defining the primary metric (e.g., streaming minutes per user per day) and guardrail metrics (e.g., revenue, crash rate). Randomize at the user level, not the session level, because layout changes affect long-term behavior. Use a 50/50 control/treatment split. Run a power analysis: with a 2% MDE on streaming minutes (baseline ~45 min/day, SD ~30 min), you need roughly 17.5K users per arm (about 35K total) at 80% power. Run for at least 2 weeks to capture weekly seasonality.
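Plugging the answer's hypothetical numbers into a normal-approximation power calculation:

```python
from math import ceil
from scipy.stats import norm

baseline, sd, rel_mde = 45.0, 30.0, 0.02   # hypothetical homepage numbers
delta = baseline * rel_mde                 # absolute MDE: 0.9 min/day
z = norm.ppf(0.975) + norm.ppf(0.80)       # alpha = 0.05 two-sided, 80% power

n_per_arm = ceil(2 * (z * sd / delta) ** 2)
print(n_per_arm)  # ~17.4K per arm, ~35K users total
```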
How do you determine sample size for an A/B test?
What is the difference between an A/B test and an A/B/n test?
What are guardrail metrics and why do they matter?
A PM wants to run an experiment on the checkout flow but worries about revenue loss. How do you address this?
How do you decide how long to run an A/B test?
Statistical Foundations
You don't need to derive the CLT on a whiteboard. But you need crisp explanations of p-values, confidence intervals, and error types, especially when the interviewer says "explain this to a non-technical stakeholder." The real test is whether you understand what these concepts mean in practice, not whether you can recite formulas.
Have the full hypothesis-testing framework internalized: the two-tailed test distribution, rejection regions at α/2, and the distinction between Type I errors (false positives, rejecting a true null) and Type II errors (false negatives, missing a real effect). Interviewers love asking "what's worse, a Type I or a Type II error?" The answer always depends on the business context.

How would you explain a p-value to a non-technical product manager?
Sample Answer
A p-value answers: 'If the new feature had zero real effect, how likely would we see data this extreme just by chance?' If p = 0.03, there's a 3% chance of seeing a result at least this extreme if the feature truly does nothing. We set a threshold (usually 5%); below it, we call the result statistically significant. A p-value does NOT mean there's a 3% chance the feature doesn't work. And it does NOT tell you how large the effect is.
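You can demonstrate what the 5% threshold means directly: simulate many A/A tests (where no real effect exists) and count how often p falls below 0.05. A minimal sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims = 2_000
false_positives = 0
for _ in range(n_sims):
    a = rng.normal(0, 1, 500)  # "control": no real effect exists
    b = rng.normal(0, 1, 500)  # "treatment": identical distribution
    _, p = stats.ttest_ind(a, b)
    false_positives += p < 0.05
rate = false_positives / n_sims
print(rate)  # close to 0.05, the Type I error rate we chose
```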
What is a Type I error vs. a Type II error in A/B testing?
What is the difference between statistical significance and practical significance?
Your A/B test shows p = 0.048. The PM says 'ship it.' What do you do?
What is the multiple comparisons problem and how do you handle it?
Explain the difference between a one-tailed and two-tailed test. When would you use each?
Metric Selection & Analysis
Choosing the wrong metric is the fastest way to fail an experimentation question. Interviewers want to see that you can translate a vague business goal ("improve user engagement") into a specific, measurable primary metric with appropriate guardrails. Senior candidates get pushed on ratio metrics, the delta method, and variance reduction.
Once your experiment concludes, you'll need to interpret the results, and that means understanding confidence intervals. A 95% CI doesn't mean "95% chance the true value is in this range." It means that if you repeated the experiment many times, about 95% of the intervals would contain the true parameter: most would capture the true mean μ, but some would miss it entirely. Interviewers test whether you can explain this distinction clearly.
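The coverage claim is easy to verify by simulation: draw many samples from a known distribution, build a 95% CI from each, and count how often the interval contains the true mean. A sketch using scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, n_reps = 10.0, 2.0, 50, 2_000
covered = 0
for _ in range(n_reps):
    sample = rng.normal(mu, sigma, n)
    lo, hi = stats.t.interval(0.95, n - 1,
                              loc=sample.mean(), scale=stats.sem(sample))
    covered += lo <= mu <= hi
coverage = covered / n_reps
print(coverage)  # close to 0.95
```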

How do you choose the primary metric for an A/B test?
Sample Answer
The primary metric should capture long-term user value, not just immediate behavior. For a new onboarding flow, D7 retention (% of users active 7 days later) beats 'completed onboarding' because completion is too easy to game (shorten onboarding = higher completion, worse retention). Pick one primary metric to make the ship/no-ship decision. Use secondary metrics to understand why, and guardrails to make sure nothing breaks.
What's the difference between a ratio metric and a mean metric?
Your experiment shows a 5% increase in clicks but no change in revenue. What happened?
What is a novelty effect and how do you detect it?
How would you handle outliers in A/B test metrics like revenue?
What is an Overall Evaluation Criterion (OEC) and how do you construct one?
Common Pitfalls
This is where interviewers separate candidates who have actually run experiments from those who've only studied them. Peeking, sample ratio mismatch, novelty effects, interference: these are the failure modes that cost companies real money. If you can identify and explain these without prompting, you'll stand out.
What is peeking in A/B testing and why is it a problem?
Sample Answer
Peeking is checking experiment results before the planned end date and making a ship/kill decision based on early data. The problem: statistical tests assume you look at the data once. Each peek inflates your false positive rate — with daily checks over 2 weeks, your effective α can exceed 25%. Solutions: (1) use sequential testing methods like always-valid p-values that account for multiple looks, (2) use a Bayesian framework with a loss function, or (3) commit to not looking until the test is complete.
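You can see the inflation described above with a simulation: run A/A tests (no true effect), test after every simulated "day," and count how often any look crosses p < 0.05. The setup here (14 days, 100 users/day/arm) is hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, days, users_per_day = 1_000, 14, 100
rejections = 0
for _ in range(n_sims):
    a = rng.normal(0, 1, days * users_per_day)  # control, no true effect
    b = rng.normal(0, 1, days * users_per_day)  # treatment, identical
    for d in range(1, days + 1):                # peek once per day
        n = d * users_per_day
        _, p = stats.ttest_ind(a[:n], b[:n])
        if p < 0.05:
            rejections += 1
            break
peek_rate = rejections / n_sims
print(peek_rate)  # well above the nominal 0.05
```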
What is a sample ratio mismatch (SRM) and what causes it?
Your randomization has a bug — 60% of traffic went to treatment. Is the experiment salvageable?
What is Simpson's paradox and how can it affect A/B test results?
How do network effects complicate A/B testing? Give an example.
You discover that your control and treatment groups have different distributions of iOS vs. Android users. What do you do?
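For the SRM question above, the standard diagnostic is a chi-square goodness-of-fit test comparing observed assignment counts to the intended split. The counts here are hypothetical:

```python
from scipy import stats

# Hypothetical counts: we intended a 50/50 split over 100K users
observed = [50_550, 49_450]   # control, treatment
expected = [50_000, 50_000]

chi2, p_srm = stats.chisquare(observed, f_exp=expected)
print(p_srm)  # a tiny p-value signals a sample ratio mismatch
```

An imbalance of about half a percent looks harmless, but at this scale it is wildly unlikely under correct randomization; investigate the assignment pipeline before trusting any metric.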
Advanced Methods
Staff+ and senior data science roles get questions on experimentation infrastructure and cutting-edge methodology. CUPED, switchback experiments, Bayesian testing, and multi-armed bandits show up frequently at companies that run thousands of experiments per year. You won't get these at entry level, but they're expected for L5+ at Meta or Google.
What is CUPED and how does it help A/B testing?
Sample Answer
CUPED (Controlled-experiment Using Pre-Experiment Data) reduces metric variance by adjusting for pre-experiment behavior. If a user's metric during the experiment is correlated with their pre-experiment value, you subtract out that predictable component: Y_adjusted = Y - θ·(X - E[X]), where θ = Cov(Y,X)/Var(X). This can reduce variance by 50%+ for metrics like revenue or sessions, meaning you need roughly half the sample size or can detect smaller effects. Use it when you have stable pre-experiment data.
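The adjustment in the answer above is a few lines of NumPy; a minimal sketch with simulated data (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
x = rng.normal(50, 20, n)                # pre-experiment metric (e.g., past sessions)
y = 2 + 0.8 * x + rng.normal(0, 10, n)   # in-experiment metric, correlated with x

# theta = Cov(Y, X) / Var(X); subtract the predictable component
theta = np.cov(y, x)[0, 1] / np.var(x)
y_cuped = y - theta * (x - x.mean())

print(np.var(y) / np.var(y_cuped))  # variance reduction factor, roughly 3-4x here
```

Note that the adjustment leaves the mean of Y unchanged, so treatment effect estimates are unbiased; only the variance shrinks.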
What is the difference between frequentist and Bayesian A/B testing?
Explain switchback experiments. When are they better than user-level randomization?
What are multi-armed bandits and when would you use them instead of a traditional A/B test?
How would you design an experimentation platform from scratch?
What is the delta method and when do you need it in A/B testing?
Watch: A/B Testing Interview Walkthrough
See how a Google Data Scientist approaches A/B testing interview questions, from experiment design through statistical analysis and decision-making.
How to Prepare for A/B Testing Interviews
Think in frameworks, not formulas
Every A/B testing question follows the same skeleton: define the hypothesis, choose metrics, design the experiment, analyze results, make a decision. Practice walking through this framework out loud until it's automatic. The interviewer is evaluating your structure as much as your statistical knowledge.
Practice the "explain to a PM" questions
At least one question will be: "explain [statistical concept] to a non-technical stakeholder." Practice explaining p-values, confidence intervals, and statistical power in plain language. If you find yourself saying "probability of observing the data given the null hypothesis," simplify further.
Build end-to-end muscle memory
Don't study concepts in isolation. Take a scenario like "we want to test a new checkout flow" and walk through the entire lifecycle: metric selection, power analysis, randomization, runtime, analysis, edge cases, decision. Timed practice at 30 minutes per question builds the speed you need for live interviews.
Frequently Asked Questions
How much statistics do I need for A/B testing interviews?
You need working intuition, not textbook proofs. Know what a p-value means (and doesn't), how to size an experiment, what drives power, and common pitfalls like peeking and multiple comparisons. If you can explain confidence intervals to a PM without jargon, you're in good shape.
Do I need to know Bayesian A/B testing?
For most data science roles, frequentist methods are sufficient. Bayesian testing comes up at companies that use it in production, such as some teams at Netflix and Spotify. Know the conceptual difference: Bayesian gives you P(hypothesis | data) and tolerates continuous monitoring; frequentist gives you P(data | null hypothesis). If interviewing at a Bayesian-forward company, study credible intervals and loss functions.
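One common version of the Bayesian approach uses Beta-Binomial conjugacy; the conversion counts below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical results: conversions / users per arm
conv_a, n_a = 200, 4_000   # control: 5.0% conversion
conv_b, n_b = 240, 4_000   # treatment: 6.0% conversion

# Beta(1, 1) uniform prior -> Beta posterior over each arm's rate
post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

prob_b_better = (post_b > post_a).mean()
print(prob_b_better)  # P(treatment rate > control rate | data)
```

The output is a direct probability statement about the hypothesis, which is exactly what a frequentist p-value does not give you.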
Which companies ask the most A/B testing questions?
Meta, Google, Netflix, Microsoft, Airbnb, Uber, Spotify, and LinkedIn have the heaviest experimentation interview focus. Any company with a mature experimentation platform will ask these questions for data science roles. Quant trading firms and pure ML roles typically don't focus on A/B testing.
Should I know how to code A/B test analyses?
Yes — expect to write Python or SQL during interviews. Common asks include computing a t-test from raw data, writing a power calculation, aggregating metrics by experiment arm in SQL, or simulating a sequential test. You won't need to implement CUPED from scratch, but know the concept well enough to explain it.
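The "compute a t-test from raw data" ask usually amounts to a few lines; here is a sketch with simulated per-user metrics (numbers hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
control = rng.normal(10.0, 4.0, 5_000)     # e.g., sessions/user in control
treatment = rng.normal(10.5, 4.0, 5_000)   # simulated +5% lift

t, p = stats.ttest_ind(treatment, control, equal_var=False)  # Welch's t-test
lift = treatment.mean() - control.mean()
print(f"lift={lift:.2f}, t={t:.2f}, p={p:.2g}")
```

Welch's version (equal_var=False) is the safer default in interviews, since it doesn't assume equal variances between arms.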
How are A/B testing questions different for Data Analyst vs. Data Scientist roles?
Data Analyst interviews focus on metric definition, interpreting results, and communicating tradeoffs to stakeholders. Data Scientist interviews go deeper into power analysis, variance reduction (CUPED), ratio metrics and the delta method, sequential testing, and network effects. Staff+ roles add experimentation platform design.
What if I've never actually run a real A/B test?
Practice with public datasets like Kaggle A/B test datasets and walk through the full lifecycle: hypothesis, metric selection, power analysis, randomization, analysis, and decision. Frame personal or academic projects as experiments. Interviewers care about your reasoning process, not your job title.

