A geo experiment is a controlled test that measures the incremental impact of advertising by comparing business outcomes (revenue, orders, signups) between geographic markets where a campaign ran and matched markets where it didn't.
It's one of the cleanest methods for measuring whether advertising is actually driving business results.
In effect, the geographic markets themselves serve as the treatment and control groups.
The Logic Behind Geo Experiments
The idea is simple: run your campaign in some markets (test), pause it or go dark in others (control), and measure the difference in outcomes.
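If, after adjusting for the pre-test baseline, your test markets show 14% higher weekly revenue than your control markets during the campaign period, that gap can be attributed to the advertising. Apply the lift to the revenue the test markets would have been expected to earn without ads to get incremental revenue, then divide incremental revenue by ad spend to get incremental return on ad spend (iROAS).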
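To make the arithmetic concrete, here is a minimal Python sketch of one simple estimate: the control markets' campaign-period revenue is scaled by the pre-period test/control ratio to stand in for what the test markets would have earned without ads. This is a difference-in-differences-style simplification, not the synthetic control method used by tools such as GeoLift, and the `lift_and_iroas` helper and all figures are hypothetical.

```python
def lift_and_iroas(test_pre, control_pre, test_during, control_during, spend):
    """Estimate lift, incremental revenue, and iROAS with a simple
    ratio adjustment (hypothetical helper, not a library API).

    test_pre / control_pre: total revenue in each group before the campaign.
    test_during / control_during: total revenue in each group during it.
    spend: ad spend in the test markets during the campaign.
    """
    # How much larger the test group normally is than the control group.
    baseline_ratio = test_pre / control_pre
    # Counterfactual: what the test markets would likely have earned without ads.
    counterfactual = control_during * baseline_ratio
    incremental_revenue = test_during - counterfactual
    lift = incremental_revenue / counterfactual
    iroas = incremental_revenue / spend
    return lift, incremental_revenue, iroas


# Hypothetical numbers: test markets normally earn 1.25x the control markets.
lift, incremental, iroas = lift_and_iroas(
    test_pre=500_000, control_pre=400_000,
    test_during=584_250, control_during=410_000,
    spend=50_000,
)
print(f"lift={lift:.1%}  incremental revenue=${incremental:,.0f}  iROAS={iroas:.2f}")
```

With these made-up inputs the counterfactual is $512,500, so the $71,750 gap corresponds to a 14% lift and an iROAS of about 1.4.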
The reason geo experiments work well is that geographic markets are relatively self-contained. A consumer in Denver is unlikely to be directly influenced by advertising running only in Austin. This geographic separation creates a reasonably clean experimental boundary, something that's hard to achieve when you're splitting a digital audience, where the same person might appear on multiple devices, browsers, or platforms.
When to Use a Geo Experiment
Geo experiments are particularly useful in situations where audience-level holdout tests aren't possible:
Channels you can't suppress by audience: Television, out-of-home, radio, podcast, and even some forms of digital video can't be turned off for a specific subset of users. You can turn them off by market.
Measuring total business lift: Attribution tools report only the conversions they can observe. A geo experiment measures actual revenue in your backend system, so it captures online purchases, in-store sales, and any other channel that flows through your real business data.
Calibrating other measurement approaches: A geo experiment provides ground-truth causal data that you can use to validate MMM outputs or calibrate your attribution model.
Key Tools for Running Geo Experiments
Google GeoX / Google Ads Experiments: Built into Google Ads, GeoX helps you design matched market experiments directly in the platform. It includes a market selection tool that identifies which markets are similar to each other based on historical conversion data.
Meta GeoLift: An open-source R library from Meta that uses synthetic control methodology to estimate lift from geo experiments. It handles market matching, power analysis, and result interpretation. Free to use, requires R skills.
CausalImpact: A general-purpose Bayesian structural time-series library (also in R) developed at Google. It isn't marketing-specific, but it is flexible and widely used for geo experiment analysis.
How Geo Experiments Differ from Holdout Tests
A holdout test splits an audience: you show ads to 80% of your retargeting list and suppress them for the other 20%. A geo experiment splits markets: you run ads in some cities or regions and not others.
The practical difference:
- Holdout tests work when you can control ad delivery at the user level (retargeting, email, in-app). They're faster to set up and require less spend to run.
- Geo experiments work when you can't control at the user level, or when you want to measure total business outcomes across all channels simultaneously.
Geo experiments are generally considered more robust for measuring brand or reach campaigns. Holdout tests are better for measuring direct-response digital campaigns where audience-level suppression is possible.
What a Good Geo Experiment Requires
Market matching: Before the test starts, you need to verify that test and control markets behave similarly. Use at least 4–8 weeks of pre-period data to confirm that the markets move together on your key metric. Mismatched markets will produce biased results.
Sufficient duration: Minimum 4 weeks. Less than that and you'll pick up day-of-week noise, short-term spikes, or promotional effects that obscure the true signal. For high-carryover channels like TV, 6–8 weeks is better.
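A common way to keep day-of-week patterns out of the comparison is to analyze complete weeks only. The sketch below sums daily revenue into Monday-to-Sunday weeks and drops partial weeks at either edge of the window; the `weekly_totals` helper and the data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily):
    """Sum daily revenue into complete Monday-to-Sunday weeks, dropping any
    partial week (hypothetical helper).

    daily: dict mapping datetime.date -> revenue for that day.
    Returns a dict mapping each week's Monday -> total revenue.
    """
    totals = defaultdict(float)
    days_seen = defaultdict(int)
    for day, revenue in daily.items():
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        totals[week_start] += revenue
        days_seen[week_start] += 1
    # Keep only full weeks so each weekday is counted exactly once per week.
    return {week: totals[week] for week in sorted(totals) if days_seen[week] == 7}


# Hypothetical: 14 days of revenue starting on Wednesday, 3 January 2024.
daily = {date(2024, 1, 3) + timedelta(days=i): 100.0 for i in range(14)}
print(weekly_totals(daily))
```

Here only the week starting Monday 8 January survives; the partial weeks at either end are dropped rather than compared against full weeks.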
Enough markets: You need statistical power. With only 3 test markets and 3 control markets, you have very limited ability to detect moderate effects, so treat 6 total markets as a bare minimum and use more wherever you can. Calculate the minimum detectable effect (MDE) before you start.
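As a rough planning check, the sketch below uses a textbook normal-approximation MDE for comparing two group means: MDE = (z for 1 - alpha/2 + z for power) × sigma × sqrt(1/n_test + 1/n_control), where `sigma` is the standard deviation of your market-level outcome (for example, each market's percentage deviation from its own forecast). It assumes independent markets with equal variance; with this few markets, a t-distribution or a simulation-based power analysis such as GeoLift's would give a larger, more realistic MDE. The helper name and the 10% figure are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(sigma, n_test, n_control, alpha=0.05, power=0.8):
    """Normal-approximation MDE for a two-sided comparison of group means
    (a rough planning sketch, not GeoLift's power analysis).

    sigma: standard deviation of the market-level outcome, e.g. each market's
    percentage deviation from its own forecast, estimated from pre-period data.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_power = NormalDist().inv_cdf(power)  # about 0.84 for 80% power
    return (z_alpha + z_power) * sigma * sqrt(1 / n_test + 1 / n_control)


# Hypothetical: markets typically deviate from forecast with a 10% standard deviation.
for n in (3, 6, 12):
    mde = minimum_detectable_effect(sigma=0.10, n_test=n, n_control=n)
    print(f"{n} test + {n} control markets: MDE of about {mde:.1%}")
```

Under these assumptions, 3 test and 3 control markets can only reliably detect a lift of roughly 23%, which illustrates why small designs tend to miss moderate effects.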
Business outcome measurement: Measure actual revenue or orders from your backend system, not platform-attributed conversions. Attributed conversions in the test group typically look inflated compared to what actually happened, because they also credit the ads with conversions that would have occurred anyway.
Common Mistakes
Not pre-testing market comparability is the most common error. Two markets may look similar on population and demographics but behave very differently on your specific metric. Always verify with historical data first.
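A quick way to verify comparability with historical data is a placebo (A/A) check: pretend the campaign started partway through the pre-period and run the same lift calculation. Because nothing changed at that point, a well-matched pair should show a placebo lift near zero. The `placebo_lift` helper and data below are hypothetical, the ratio adjustment is a simplification rather than a library method, and what counts as "near zero" is a judgment call to set against the effect size you hope to detect.

```python
def placebo_lift(test_series, control_series, split):
    """Treat index `split` of pre-period weekly data as a fake campaign start
    and return the ratio-adjusted "lift" (hypothetical helper)."""
    test_before, test_after = sum(test_series[:split]), sum(test_series[split:])
    control_before, control_after = sum(control_series[:split]), sum(control_series[split:])
    # Counterfactual for the fake "after" window, scaled by the fake baseline ratio.
    counterfactual = control_after * (test_before / control_before)
    return test_after / counterfactual - 1


test_market = [100, 110, 105, 120, 115, 130, 125, 140]
matched = [80, 88, 84, 96, 92, 104, 100, 112]  # moves with the test market
flat = [100] * 8  # similar size, but ignores the test market's trend

print(f"matched control: {placebo_lift(test_market, matched, split=4):+.1%}")
print(f"flat control:    {placebo_lift(test_market, flat, split=4):+.1%}")
```

The matched control yields a placebo lift of 0%, while the flat control produces a spurious lift of about 17% even though no advertising changed.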
Running the test for too short a period is the second most common mistake. Four weeks feels like a long time to pause advertising in some markets, but two weeks is rarely enough to separate signal from noise.
Measuring the wrong thing is the third: using platform-attributed conversions instead of backend revenue. The whole point of a geo experiment is to get outside the attribution system and measure real business impact.