Top 30 A/B Testing Interview Questions (2026)

Dan Lee, Data & AI Lead
Last updated: March 13, 2026

A/B testing is the single most common experimentation topic in data science interviews. If you're interviewing at Meta, Google, Netflix, Airbnb, or any company with a product analytics function, expect at least one round focused on experimentation.

This isn't a topic you can bluff through. Interviewers push past textbook definitions into messy real-world scenarios: what happens when your test has network effects, when a PM wants to peek at results on day 3, or when your metric moves in one direction but revenue doesn't budge.

Here are the top 30 A/B testing interview questions organized by the five areas that come up repeatedly, with real questions asked at top companies.

Intermediate · 30 questions

A/B Testing Interview Questions

A/B testing is the most common experimentation topic in data science interviews. These questions cover experiment design, statistical foundations, metric selection, common pitfalls, and advanced methods asked at top tech companies.

Roles: Data Scientist, Data Analyst, Product Data Scientist
Companies: Meta, Google, Netflix, Airbnb, Uber, Spotify, Microsoft, LinkedIn

Experiment Design

This is where most interviews start. Before you touch a p-value, can you set up the experiment correctly? Interviewers assess whether you think about randomization units, sample size, runtime, and guardrail metrics before jumping into analysis. The most common mistake candidates make is diving into statistical testing without nailing the design.

One question you'll almost certainly face: "how would you determine the sample size for this experiment?" The answer depends on the relationship between effect size, significance level, and statistical power. As the chart below shows, required sample size drops sharply as your minimum detectable effect grows. Detecting a 0.2 Cohen's d shift needs about 400 users per group at 80% power, but a 0.5 shift only needs about 64.

[Figure: required sample size vs. effect size at different power levels, with formulas for significance level and statistical power]
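The relationship in the chart is easy to verify yourself. Here is a minimal sketch using the standard normal approximation (the exact t-test answer is one or two users larger per arm):

```python
from math import ceil
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Sample size per arm for a two-sample test (normal approximation).
    d is Cohen's d, the standardized mean difference."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value
    z_beta = norm.ppf(power)           # power requirement
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.2))  # 393 (exact t-test: ~394, "about 400")
print(n_per_group(0.5))  # 63  (exact t-test: ~64)
```

Notice the quadratic relationship: halving the minimum detectable effect quadruples the required sample size, which is why small MDEs are so expensive.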


How would you design an A/B test for a new homepage layout?

Spotify · Medium · Experiment Design

Sample Answer

Start by defining the primary metric (e.g., streaming minutes per user per day) and guardrail metrics (e.g., revenue, crash rate). Randomize at the user level, not the session level, because layout changes affect long-term behavior. Use a 50/50 control/treatment split. Run a power analysis: with a 2% MDE on streaming minutes (baseline ~45 min/day, SD ~30 min), you need roughly 17K users per arm (about 35K total) at 80% power. Run for at least 2 weeks to capture weekly seasonality.
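Plugging the baseline numbers from an answer like this into the standard two-sample formula is worth practicing out loud. A sketch using the normal approximation:

```python
from math import ceil
from scipy.stats import norm

baseline, sd = 45.0, 30.0      # streaming minutes/day: mean and SD
mde = 0.02 * baseline          # 2% relative lift -> 0.9 min absolute
z = norm.ppf(0.975) + norm.ppf(0.80)  # alpha=0.05 two-sided, 80% power

n_per_arm = ceil(2 * (sd * z / mde) ** 2)
print(n_per_arm)  # roughly 17.4K per arm, ~35K users total
```

The driver here is the noise-to-signal ratio: the SD (30) is over 30 times the absolute effect (0.9), and sample size scales with the square of that ratio.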

Practice more Experiment Design questions

Statistical Foundations

You don't need to derive the CLT on a whiteboard. But you need crisp explanations of p-values, confidence intervals, and error types, especially when the interviewer says "explain this to a non-technical stakeholder." The real test is whether you understand what these concepts mean in practice, not whether you can recite formulas.

The visual below breaks down the full hypothesis testing framework you should have internalized: the two-tailed test distribution, rejection regions at α/2, and the distinction between Type I errors (false positives, rejecting a true null) and Type II errors (false negatives, missing a real effect). Interviewers love asking "what's worse, a Type I or Type II error?" The answer always depends on the business context.

[Figure: two-tailed test distribution with rejection regions at α/2, Type I and Type II error definitions, p-value interpretation, and common test variants]


How would you explain a p-value to a non-technical product manager?

Google · Easy · Statistics

Sample Answer

A p-value answers: 'If the new feature had zero real effect, how likely would we see data this extreme just by chance?' If p = 0.03, there's a 3% chance of seeing this result if the feature truly does nothing. We set a threshold (usually 5%) — below it, we call the result statistically significant. A p-value does NOT mean there's a 3% chance the feature doesn't work. And it does NOT tell you how large the effect is.
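A quick way to internalize this definition is an A/A simulation: when the null is actually true, about 5% of tests still come out "significant" at α = 0.05, by construction. A minimal sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
p_values = []
for _ in range(2000):
    a = rng.normal(0, 1, 500)  # control and treatment drawn from the
    b = rng.normal(0, 1, 500)  # same distribution: the null is true
    p_values.append(stats.ttest_ind(a, b).pvalue)

false_positive_rate = np.mean(np.array(p_values) < 0.05)
print(false_positive_rate)  # close to 0.05
```

Under the null, p-values are uniformly distributed on [0, 1], which is exactly why the threshold α controls the false positive rate.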

Practice more Statistical Foundations questions

Metric Selection & Analysis

Choosing the wrong metric is the fastest way to fail an experimentation question. Interviewers want to see that you can translate a vague business goal ("improve user engagement") into a specific, measurable primary metric with appropriate guardrails. Senior candidates get pushed on ratio metrics, the delta method, and variance reduction.

Once your experiment concludes, you'll need to interpret the results, and that means understanding confidence intervals. A 95% CI doesn't mean "95% chance the true value is in this range." It means that if you repeated the experiment many times, about 95% of the intervals would contain the true parameter. The repeated sampling visualization below makes this concrete: most intervals (blue) capture the true mean μ, but some (red) miss it entirely. Interviewers test whether you can explain this distinction clearly.

[Figure: repeated sampling demonstration showing 15 samples with confidence intervals that capture or miss the true mean, plus margin of error and t-interval formulas]
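The coverage interpretation can be demonstrated with a short simulation: build 95% t-intervals over many repeated samples and count how often they contain the true mean (the population parameters below are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, trials = 10.0, 2.0, 30, 5000
covered = 0
for _ in range(trials):
    sample = rng.normal(mu, sigma, n)
    lo, hi = stats.t.interval(0.95, df=n - 1,
                              loc=sample.mean(),
                              scale=stats.sem(sample))
    covered += (lo <= mu <= hi)

coverage = covered / trials
print(coverage)  # ~0.95: long-run coverage, not a probability for any one interval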


How do you choose the primary metric for an A/B test?

Airbnb · Medium · Metrics

Sample Answer

The primary metric should capture long-term user value, not just immediate behavior. For a new onboarding flow, D7 retention (% of users active 7 days later) beats 'completed onboarding' because completion is too easy to game (shorten onboarding = higher completion, worse retention). Pick one primary metric to make the ship/no-ship decision. Use secondary metrics to understand why, and guardrails to make sure nothing breaks.

Practice more Metric Selection & Analysis questions

Common Pitfalls

This is where interviewers separate candidates who have actually run experiments from those who've only studied them. Peeking, sample ratio mismatch, novelty effects, interference: these are the failure modes that cost companies real money. If you can identify and explain these without prompting, you'll stand out.


What is peeking in A/B testing and why is it a problem?

Medium · Pitfalls

Sample Answer

Peeking is checking experiment results before the planned end date and making a ship/kill decision based on early data. The problem: statistical tests assume you look at the data once. Each peek inflates your false positive rate — with daily checks over 2 weeks, your effective α can exceed 25%. Solutions: (1) use sequential testing methods like always-valid p-values that account for multiple looks, (2) use a Bayesian framework with a loss function, or (3) commit to not looking until the test is complete.
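The inflation is easy to reproduce with an A/A simulation peeked at daily (the batch sizes and number of looks below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
sims, days, per_day = 2000, 14, 100  # 14 daily peeks, 100 users/arm/day

# Daily arm means under a true null (both arms ~ N(0, 1))
a = rng.normal(0, 1, (sims, days, per_day)).mean(axis=2)
b = rng.normal(0, 1, (sims, days, per_day)).mean(axis=2)

# Running difference of cumulative means, z-scored with known sigma = 1
t = np.arange(1, days + 1)
diff = (np.cumsum(a, axis=1) - np.cumsum(b, axis=1)) / t
z = diff / np.sqrt(2 / (t * per_day))

final_only = (np.abs(z[:, -1]) > 1.96).mean()          # look once, at the end
ever_significant = (np.abs(z) > 1.96).any(axis=1).mean()  # ship at first "win"
print(final_only, ever_significant)  # ~0.05 vs. ~0.2+: peeking inflates alpha
```

Each additional look is another chance for noise to cross the threshold, which is why the any-look rate climbs far above the nominal 5%.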

Practice more Common Pitfalls questions

Advanced Methods

Staff+ and senior data science roles get questions on experimentation infrastructure and cutting-edge methodology. CUPED, switchback experiments, Bayesian testing, and multi-armed bandits show up frequently at companies that run thousands of experiments per year. You won't get these at entry level, but they're expected for L5+ at Meta or Google.


What is CUPED and how does it help A/B testing?

Microsoft · Hard · Advanced

Sample Answer

CUPED (Controlled-experiment Using Pre-Experiment Data) reduces metric variance by adjusting for pre-experiment behavior. If a user's metric during the experiment is correlated with their pre-experiment value, you subtract out that predictable component: Y_adjusted = Y - θ·(X - E[X]), where θ = Cov(Y,X)/Var(X). This can reduce variance by 50%+ for metrics like revenue or sessions, meaning you need roughly half the sample size or can detect smaller effects. Use it when you have stable pre-experiment data.
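The adjustment itself is only a few lines. A sketch on simulated data (the data-generating process here is an assumption, chosen so pre- and in-experiment values are strongly correlated):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.gamma(2.0, 10.0, n)        # pre-experiment metric (e.g., sessions)
y = 0.8 * x + rng.normal(0, 5, n)  # in-experiment metric, correlated with x

theta = np.cov(y, x)[0, 1] / np.var(x)
y_cuped = y - theta * (x - x.mean())  # Y_adjusted = Y - theta*(X - E[X])

print(np.var(y) / np.var(y_cuped))  # variance reduction factor (> 1)
```

The reduction factor is 1/(1 − ρ²), where ρ is the pre/post correlation, so CUPED pays off most for sticky per-user metrics and does nothing for brand-new users with no history.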

Practice more Advanced Methods questions

Watch: A/B Testing Interview Walkthrough

See how a Google Data Scientist approaches A/B testing interview questions, from experiment design through statistical analysis and decision-making.

How to Prepare for A/B Testing Interviews

Think in frameworks, not formulas

Every A/B testing question follows the same skeleton: define the hypothesis, choose metrics, design the experiment, analyze results, make a decision. Practice walking through this framework out loud until it's automatic. The interviewer is evaluating your structure as much as your statistical knowledge.

Practice the "explain to a PM" questions

At least one question will be: "explain [statistical concept] to a non-technical stakeholder." Practice explaining p-values, confidence intervals, and statistical power in plain language. If you find yourself saying "probability of observing the data given the null hypothesis," simplify further.

Build end-to-end muscle memory

Don't study concepts in isolation. Take a scenario like "we want to test a new checkout flow" and walk through the entire lifecycle: metric selection, power analysis, randomization, runtime, analysis, edge cases, decision. Timed practice at 30 minutes per question builds the speed you need for live interviews.


Frequently Asked Questions

How much statistics do I need for A/B testing interviews?

You need working intuition, not textbook proofs. Know what a p-value means (and doesn't), how to size an experiment, what drives power, and common pitfalls like peeking and multiple comparisons. If you can explain confidence intervals to a PM without jargon, you're in good shape.

Do I need to know Bayesian A/B testing?

For most data science roles, frequentist is sufficient. Bayesian comes up at companies that use it in production, like some teams at Netflix and Spotify. Know the conceptual difference: Bayesian gives you P(hypothesis | data) and, paired with a decision rule such as expected loss, is more tolerant of peeking; frequentist gives you P(data | null hypothesis). If you're interviewing at a Bayesian-forward company, study credible intervals and loss functions.
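For the conceptual side, a minimal Beta-Binomial sketch is enough (the conversion counts below are hypothetical): it estimates P(treatment rate > control rate | data) by sampling from the two posteriors.

```python
import numpy as np

rng = np.random.default_rng(3)
conv_a, n_a = 520, 10_000  # control: conversions / users (hypothetical)
conv_b, n_b = 570, 10_000  # treatment

# Beta(1, 1) prior -> Beta(successes + 1, failures + 1) posterior
post_a = rng.beta(conv_a + 1, n_a - conv_a + 1, 100_000)
post_b = rng.beta(conv_b + 1, n_b - conv_b + 1, 100_000)

prob_b_beats_a = (post_b > post_a).mean()
print(prob_b_beats_a)  # P(treatment rate > control rate | data)
```

Note the output is a direct probability statement about the hypothesis, which is exactly what a frequentist p-value is not.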

Which companies ask the most A/B testing questions?

Meta, Google, Netflix, Microsoft, Airbnb, Uber, Spotify, and LinkedIn have the heaviest experimentation interview focus. Any company with a mature experimentation platform will ask these questions for data science roles. Quant trading firms and pure ML roles typically don't focus on A/B testing.

Should I know how to code A/B test analyses?

Yes — expect to write Python or SQL during interviews. Common asks include computing a t-test from raw data, writing a power calculation, aggregating metrics by experiment arm in SQL, or simulating a sequential test. You won't need to implement CUPED from scratch, but know the concept well enough to explain it.
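For the "t-test from raw data" ask, it helps to have written Welch's two-sample test from scratch at least once and checked it against scipy (the input arrays below are simulated):

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Welch's two-sample t statistic and two-sided p-value from raw arrays."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch–Satterthwaite degrees of freedom
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

rng = np.random.default_rng(5)
a = rng.normal(10.0, 2.0, 400)  # control
b = rng.normal(10.3, 2.5, 400)  # treatment with a real lift
t, p = welch_t(a, b)
# Should match scipy's built-in: stats.ttest_ind(a, b, equal_var=False)
```

Welch's version is the safer default in interviews because it doesn't assume equal variances between arms.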

How are A/B testing questions different for Data Analyst vs. Data Scientist roles?

Data Analyst interviews focus on metric definition, interpreting results, and communicating tradeoffs to stakeholders. Data Scientist interviews go deeper into power analysis, variance reduction (CUPED), ratio metrics and the delta method, sequential testing, and network effects. Staff+ roles add experimentation platform design.

What if I've never actually run a real A/B test?

Practice with public datasets like Kaggle A/B test datasets and walk through the full lifecycle: hypothesis, metric selection, power analysis, randomization, analysis, and decision. Frame personal or academic projects as experiments. Interviewers care about your reasoning process, not your job title.


Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.
