Should You Let AI Run Your Meta, TikTok and Google Ads, or Just Analyse Them?
Meta says Advantage+ delivers 22% higher ROAS than manual campaigns. An independent analysis of 640 geo-holdout incrementality experiments found it under-performed manual by 12% by the time campaigns wrapped, with post-treatment lift of +17% for Advantage+ versus +32% for manual (Haus, 2024). Both numbers are real. The gap between them is the entire point of this post.
Key Takeaways
- The evidence on AI-run ad campaigns is mixed, not settled. Platform-reported wins consistently overstate true incrementality once you measure it properly.
- Analysis-only is the most under-used AI lane in paid media, and the one with the cleanest ROI. No black-box problem, no incentive misalignment, no guessing.
- AI creative works, but only when it doesn't look AI-made. Perception is the variable most teams ignore.
- Account shape (monthly conversions, creative velocity, measurement rigour) decides which side of the split you sit on. There isn't one right answer for everyone.
What does the honest evidence say about Advantage+, Performance Max, and Smart+?
Platform-reported wins look strong. Meta's own number for Advantage+ is a 22% ROAS lift. Google has published a 14% conversion uplift for AI Max. TikTok claims Smart+ outperforms manual setups in more than 80% of early tests. The problem with all three is that the platforms are marking their own homework, using their own attribution, inside their own walls. That's not fraud; it's a methodology choice that flatters the model they're selling.
Independent measurement tells a more nuanced story. The Haus study above, which ran geo-holdout experiments averaging $14M in annual spend per client, found Advantage+ beat manual by 9% at campaign midpoint, then slipped 12% below manual by the end. Post-treatment lift (the number your CFO actually cares about because it measures whether the spend changed behaviour) was almost twice as high for manual campaigns (Haus, 2024).
Why the gap? Because AI-run campaigns optimise toward whatever the platform can measure today, and the things they can measure are disproportionately lower-funnel. Upper-funnel spend gets starved. In-platform conversions look good; long-term brand lift looks worse. That's the pattern to remember whenever you read a platform's "AI delivered X% more" headline.
Our take: The measurement window is the whole argument. If you judge Advantage+ on what it does inside the platform during the flight, it looks like a win. If you judge it on whether it moved genuine demand after the flight ended, manual still wins on average. Pick the KPI your CFO actually cares about, then decide.
Where does AI actually win in paid media?
The single most under-used AI lane in paid media is weekly performance analysis. Pattern spotting across campaigns, anomaly detection on spend and CPA, cross-platform reconciliation against CRM data, creative post-mortems clustered by hook and format. This is where the evidence is unambiguous and the risk is close to zero: you can see exactly what the model is doing and check its work. No black box, no incentive misalignment.
The reason analysis wins cleanly is the same reason execution gets messy. When AI is running your bids, it's optimising toward a signal the platform controls. When AI is analysing your data, it's reading inputs you control. The difference is subtle in theory and enormous in practice. A model analysing last week's spend across Google, Meta, and TikTok can tell you what happened. A model bidding inside Google's walled garden can only tell you what Google thinks happened.
The specific workflows that work are narrow and reliable: weekly readouts that summarise spend vs target by campaign, creative clusters tagged by hook and format, anomaly alerts when a campaign drifts beyond 15% of its seven-day baseline, and conversion reconciliation that lines up GA4, platform-reported conversions, and CRM closed-won. None of these require the AI to decide anything. They just require it to read, summarise, and flag.
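To make that concrete, here's a minimal sketch of the anomaly alert, assuming a daily spend export with `date`, `campaign`, and `spend` columns (the column names and file name are ours, not any platform's export format; adapt them to whatever your exports actually produce):

```python
import pandas as pd

THRESHOLD = 0.15  # flag drift beyond 15% of the seven-day baseline

def flag_spend_anomalies(df: pd.DataFrame) -> pd.DataFrame:
    """Flag campaigns whose latest daily spend drifts more than
    THRESHOLD from their trailing seven-day average."""
    df = df.sort_values("date")
    alerts = []
    for campaign, grp in df.groupby("campaign"):
        if len(grp) < 8:
            continue  # not enough history for a seven-day baseline
        baseline = grp["spend"].iloc[-8:-1].mean()  # the seven days before today
        latest = grp["spend"].iloc[-1]
        if baseline == 0:
            continue
        drift = (latest - baseline) / baseline
        if abs(drift) > THRESHOLD:
            alerts.append({"campaign": campaign,
                           "baseline": round(baseline, 2),
                           "latest": latest,
                           "drift_pct": round(100 * drift, 1)})
    return pd.DataFrame(alerts)

# Usage: point it at a platform CSV export; what comes back is the
# short list of campaigns that need a human look this week.
spend = pd.read_csv("meta_daily_spend.csv", parse_dates=["date"])
print(flag_spend_anomalies(spend))
```

Note the shape of the thing: it reads, it flags, and it never touches a bid.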
When does letting AI run the buying actually make sense?
Full AI-run campaigns work reliably in a narrow but real set of conditions: accounts with at least 30 monthly conversions of stable signal, broad-catalogue ecommerce where creative velocity is the bottleneck, app install with clean events, and retargeting pools where signal density is already high. Outside those conditions, the independent data gets uglier fast.
The 30-conversions-per-month threshold is the single most important number nobody talks about. Below it, the optimisation signal is too noisy for the algorithm to exit the noise floor. It'll spend, it'll report conversions, and the numbers will drift. Above it, the algorithm has enough to work with and the pattern matching starts to compound. This is why the same Advantage+ campaign delivers for a DTC brand pushing 2,000 orders a week and ruins a B2B account pushing 12 MQLs a month.
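A back-of-envelope way to see why, assuming conversions arrive roughly Poisson-distributed (our simplification, not a platform spec): relative month-to-month noise scales as one over the square root of the count.

```python
import math

# If conversions are ~Poisson, std dev / mean = 1 / sqrt(n),
# so the monthly total swings this much from pure chance alone.
for n in (12, 30, 300, 2000):
    print(f"{n:>5} conversions/month -> ±{1 / math.sqrt(n):.0%} noise")

#    12 conversions/month -> ±29% noise
#    30 conversions/month -> ±18% noise
#   300 conversions/month -> ±6% noise
#  2000 conversions/month -> ±2% noise
```

At 12 MQLs a month, almost a third of the movement the algorithm sees is chance. No amount of machine learning optimises noise.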
The second filter is the vertical. Broad DTC ecommerce with dozens of SKUs, subscription offers with a clean activation event, app installs with deterministic attribution, and direct-response lead-gen in unregulated categories all work well. B2B with long sales cycles, regulated verticals with creative constraints, and any account where brand safety or channel placement matters do not.
Our rule: When a client is above the 30-conversion threshold in a vertical where AI-run actually works, we still only hand over 40-60% of spend to the algorithm. The rest stays manual, on the same offer, running as a structural holdout. That way, if Advantage+ or PMax starts drifting, we catch it in a comparison we already own, not a quarterly incrementality study we have to commission.
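Here's a hedged sketch of the comparison we run weekly, assuming both arms sit on the same offer and you can export `week`, `arm` (`ai` or `manual`), `spend`, and `conversions` (the column names and the 20% alert threshold are ours):

```python
import pandas as pd

def holdout_drift(df: pd.DataFrame, alert_pct: float = 0.20) -> pd.Series:
    """Weekly CPA for the AI-run fence vs the manual holdout on the
    same offer. Returns the weeks where the AI arm's CPA runs more
    than alert_pct above manual, i.e. where the algorithm is drifting."""
    agg = df.groupby(["week", "arm"])[["spend", "conversions"]].sum()
    cpa = (agg["spend"] / agg["conversions"]).unstack("arm")
    gap = cpa["ai"] / cpa["manual"] - 1
    return gap[gap > alert_pct]
```

The 20% threshold is a starting point, not gospel; tighten it as your weekly conversion counts grow and the comparison gets less noisy.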
Where is letting AI run your ads actually a trap?
Three account shapes lose to a competent human almost every time: low-conversion-volume accounts below the signal threshold, accounts where creative diversity matters more than media optimisation, and accounts where incrementality (not platform-reported conversions) is the real KPI. There's also a fourth, less-discussed one: the convergence trap.
The convergence trap is this. If every advertiser in your category runs Advantage+ or Performance Max on similar signals, the algorithm optimises everyone toward the same audiences, the same time slots, and the same winning creative shapes. Advantage in 2023 came from being early. In 2026, with 71% of Google Ads advertisers using Performance Max and Meta's Advantage+ adoption climbing every quarter, early is over. You're paying a premium to bid against yourself through the same black box as your competitors.
The numbers back it up. Wicked Reports analysed 55,661 Meta campaigns across a year and found Advantage+ new-customer acquisition cost rose from $257 in May 2024 to $528 by May 2025, while manual nCAC stayed flat or declined over the same window (Wicked Reports, 2025). That's not the AI getting worse. That's the AI hitting saturation as everyone turns it on.
The black-box problem compounds the saturation one. When PMax hides placement data, you can't see which surfaces are eating your budget. When Advantage+ hides audience composition, you can't see whether it's trading prospects for retargets. Recent transparency updates helped, but neither platform lets you actually act on the insight at channel level. You can see it. You can't change it. That's not automation; that's hostage-taking with nicer dashboards.
Does AI creative actually perform, or just feel modern?
AI-generated creative is the one execution lane where the evidence is genuinely encouraging, with a large asterisk. Early field studies on large native ad networks show AI-generated ads clearing human ads on click-through rate, but only when consumers can't tell the difference. Ask the same consumers whether an ad looks AI-made and purchase intent drops by double digits. The performance gap isn't about the image. It's about the perception.
Practically, this means AI creative is a volume play, not a quality play. Top-performing accounts now ship 50-70 ad variants a week to fight creative fatigue. Humans can't produce that many briefs, let alone finished assets. AI can, and most of the variants don't need to be brilliant; they need to be fresh and indistinguishable from human work. The teams that win with AI creative treat it as a variant engine sitting behind a human brief, not as a generator that ships to production untouched.
The rule we follow on client work is unglamorous: every AI variant goes through a human editor before it ships. Bad copy gets rewritten. Stock-feeling images get replaced. Anything that screams "AI made this" gets killed. What remains tends to outperform the all-human baseline, because volume plus human taste beats volume alone and quality alone. If that sounds like more work than the vendor demos suggest, that's because it is.
What does an analyst-only AI stack actually look like?
The stack we run across client accounts uses AI exclusively for analysis and creative variant generation. Every platform decision — bid, budget, audience, placement — stays with a human. The weekly workflow takes under two hours, replaces roughly 10-15 hours of manual reporting and hypothesis work, and produces outputs we can defend in a board meeting without hedging.
The concrete pieces are: Claude Code with a custom audit skill that runs a cross-platform readout against CSV exports, a Python + Claude pipeline that tags ad creative by hook, format, and theme and clusters winners versus losers, a Claude-driven anomaly check on GA4 and platform spend, and a library of prompts that take the weekly report and output a ranked test queue for the following Monday. Every step is human-reviewable. Nothing pushes changes into the ad platforms.
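For flavour, here's a minimal sketch of the creative-tagging step using the Anthropic Python SDK. The model string and prompt are illustrative, not prescriptive, and production code should validate the JSON rather than trusting the model to comply:

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TAG_PROMPT = """Tag this ad. Reply with bare JSON only, keys:
hook (short phrase), format (ugc|static|motion|carousel), theme (one word).
Ad copy: {copy}"""

def tag_creative(copy: str) -> dict:
    """Ask Claude to classify one ad by hook, format, and theme."""
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative; pin whatever your account runs
        max_tokens=200,
        messages=[{"role": "user", "content": TAG_PROMPT.format(copy=copy)}],
    )
    return json.loads(resp.content[0].text)  # assumes the model obeyed "bare JSON only"

# Downstream, the tags join onto CTR and CPA by ad ID, and the
# winners-vs-losers clustering falls out of a groupby.
```

The division of labour is the point: the model labels, the human decides.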
So why keep execution human when the rest of the stack is AI? Not nostalgia. It's that the failure modes are asymmetric. A missed analysis costs you an hour of catch-up next week. A wrong bid change, pushed automatically, can burn a month of budget before anyone notices. Until autonomous agents earn a track record on the budgets that matter, the governance answer is the same answer every engineering team has converged on: AI for diagnostics, humans on the lever. The audit-skill workflow we use to run the weekly readout is the deep dive on the diagnostic side, and the cross-platform reporting pipeline is what feeds it. If you want the human-in-the-loop creative half of this stack as its own workflow, Claude design use cases for marketing covers the variant-generation side in detail.
How do you decide which side of the split your account sits on?
Four variables settle the question: monthly conversion volume, creative velocity, measurement rigour, and account complexity. Plot your account on those four axes and the right split falls out. It's not a tier list; it's a decision tree you can run in ten minutes before your next planning call, and compact enough to encode in code, as sketched after the framework below.
So what's the actual difference between the two lanes when you strip out the marketing? The table below is how we brief new clients before we touch a campaign.
| Dimension | AI runs the buying | AI on analysis only |
|---|---|---|
| Data visibility | Platform decides what you see | You control inputs and outputs |
| Incentive alignment | Optimised for platform-reported KPIs | Optimised for your KPIs |
| Failure mode | Month of burnt budget before noticed | One hour of catch-up next week |
| Signal floor | 30+ conversions/month minimum | Works at any volume |
| Best account shape | Broad DTC, high-volume lead gen, app install | B2B, regulated, small budget, complex |
| Biggest risk | Convergence trap, saturation, drift | Under-using the tool you already pay for |
| Review cadence | Weekly, plus quarterly incrementality test | Weekly readout, adjust prompts monthly |
Decision framework by account shape
- Under 30 monthly conversions. Manual campaigns only. Use AI for analysis, anomaly checks, and creative variants. The algorithm can't exit the noise floor on your signal, and handing it spend is an expensive way to learn that.
- 30-300 monthly conversions, single-vertical DTC. Hybrid. Fence one Advantage+ or Performance Max campaign at 40-60% of spend, run the rest manually on the same offer as a structural holdout. AI for analysis and creative variants across the whole account.
- 300+ monthly conversions, broad catalogue. AI can run the majority of spend, but always with a manual holdout you can compare against. Commission or run an incrementality test every six months. Never trust the platform's own lift number.
- B2B, regulated, or brand-sensitive. Manual execution, AI for analysis and creative variant generation only. The creative diversity matters more than the media math, and the black box will cost you more in brand risk than it saves in CPA.
- Any account below 30 conversions AND new to AI. Start with analysis-only. Add creative variants in month two. Add a small fenced AI-run campaign in month three if (and only if) the conversion volume has climbed.
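The same framework, encoded so you can run it before a planning call. The function name and flags are ours; the thresholds come straight from the list above:

```python
def recommend_split(monthly_conversions: int,
                    broad_catalogue: bool,
                    regulated_or_b2b: bool) -> str:
    """Encode the decision framework above; thresholds per the list."""
    if regulated_or_b2b:
        return "Manual execution; AI for analysis and creative variants only"
    if monthly_conversions < 30:
        return "Manual campaigns only; AI for analysis, anomalies, variants"
    if monthly_conversions >= 300 and broad_catalogue:
        return "AI runs majority of spend, manual holdout, six-monthly incrementality test"
    return "Hybrid: fence AI at 40-60% of spend, manual holdout on the same offer"

print(recommend_split(12, False, True))
# -> Manual execution; AI for analysis and creative variants only
```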
Common mistake: Teams skip the holdout. They turn Advantage+ on across 100% of spend, watch the in-platform ROAS go up, and conclude the AI is working. Without a manual baseline running against the same offer, you can't tell whether you won or whether the platform just re-attributed spend that would have converted anyway. Keep the holdout. It's the cheapest measurement you'll ever run.
Frequently Asked Questions
Is Meta Advantage+ actually better than manual campaigns?
Sometimes, and less than the marketing suggests. Meta's own figure is a 22% ROAS lift. The independent Haus analysis of 640 incrementality experiments found Advantage+ under-performed manual by 12% by campaign end, with post-treatment lift of +17% versus +32% for manual. Both are true. The answer depends on what you measure and how long you wait.
When should you not use Performance Max?
Below 30 monthly conversions, in low-volume B2B, in regulated verticals, or anywhere you need channel-level reporting to actually act on. Performance Max added transparency in 2025, but it still doesn't let you push levers per channel. Seeing the data and being able to change the data are different things, and PMax only gives you the first.
Does AI-generated ad creative perform as well as human creative?
Sometimes better, but only when consumers can't tell it's AI. Field studies on large native networks show AI ads clearing human ads on CTR when perception is neutral. When the same consumers identify an ad as AI-generated, premium rating and purchase intent drop by double digits. Use AI to generate variants; edit out anything that screams AI.
Can small budgets use Advantage+ or Performance Max effectively?
Generally not below 30-50 monthly conversions. The algorithm needs that much signal to exit the noise floor. Small-budget accounts get far more from AI on the analysis side — weekly readouts, anomaly detection, creative clustering — than from handing spend decisions to a signal-starved model. Start with the analyst workflow and add execution AI later, if at all.
What's the highest-ROI AI use case in paid media today?
Weekly performance analysis and creative variant generation, hands down. The first saves roughly 10-15 hours a week on a five-platform book with no black-box risk. The second fixes the creative-volume problem that kills campaigns long before the media math does. Both are under-adopted in 2026, which is why the returns are still outsized.
What should you actually do this week?
Pick one of the three lanes and commit. If your account sits under 30 monthly conversions, start the analyst-only stack this Monday. Run one AI-driven weekly readout on your own data, compare it to what you'd normally write by hand, and measure the time difference honestly. If you're over 300 conversions a month and still running everything manually, fence a single Advantage+ or Performance Max campaign at 30% of spend and measure against a manual holdout for 30 days. Either way, make the decision explicitly. Drifting into AI-run campaigns because your platform rep asked nicely is how the nCAC charts above happen.
Is AI "good" or "bad" at paid media? Wrong question. The wider pattern is familiar from every other corner of applied AI. Analysis is easy. Execution is hard. The teams that win in 2026 will be the ones that match those shapes to the workflows where they actually work, instead of arguing about the label. It's neither good nor bad. It's a tool with a specific envelope, and the whole job is learning the edges. For the tooling side of the analyst-only stack, the Claude Ads audit workflow is the deep companion; for the reporting half, see how to automate marketing reporting with Windsor.ai. For the landing-page side of the same AI-vs-human split, our Optimeleon review covers where AI does and doesn't belong in CRO.