How to Evaluate an AI-Native ESP: 7 Questions to Ask Before Switching

The category of AI-native email platforms is new enough that most enterprise buyers don't have an evaluation framework for it. Traditional ESP evaluations focus on template editors, flow builders, integration catalogs, and pricing tiers. Those criteria don't capture what matters about an AI-native platform, because the product works fundamentally differently.
If you're evaluating a switch from a traditional ESP (like Klaviyo, Mailchimp, or Salesforce Marketing Cloud) to an AI-native platform, these are the seven questions that will reveal whether the platform is genuinely AI-native, whether it's right for your brand, and whether the vendor is being honest with you.
1. Does the AI initiate campaigns, or only assist with campaigns you've already decided to build?
This is the single most important question, and it immediately separates AI-native from AI-assisted.
AI-assisted: You decide to build a campaign. You open the editor. The AI helps you write a subject line, suggests a segment, or optimizes send time. The AI made your existing workflow faster.
AI-native: The system identifies that a campaign opportunity exists (based on product data, customer behavior, performance signals, or calendar events) and generates the full campaign: subject line, copy, design, product selection, audience targeting. You review and approve.
Ask the vendor to demo the campaign creation flow from the very beginning. If the demo starts with a marketer opening an editor, the platform is AI-assisted regardless of how sophisticated the assistance is. If the demo starts with the system presenting a completed campaign for review, it's AI-native.
Why it matters: The operational cost savings of AI-native come from replacing the initiation and creation work, not just accelerating it. If the AI only assists, you still need the same team size to initiate and direct every campaign.
2. How does the AI personalize: at the segment level or the individual level?
Ask specifically: "If I send a campaign to 100,000 customers, how many unique versions of that email exist?"
Segment-level: The platform creates 5-10 versions based on audience segments. Each version goes to thousands or tens of thousands of people. This is how Klaviyo, Omnisend, and most traditional ESPs work.
Individual-level: The platform generates a computationally unique email for each of the 100,000 recipients. Different copy, product selections, imagery, and offers based on each person's behavioral profile. This is how LTV.ai works.
The data is clear: companies that excel at personalization generate 40% more revenue from personalization than average performers. Most of that gap lives in the difference between segment-level and individual-level targeting.
Red flag: If the vendor says "individual personalization" but describes it as "dynamic content blocks that change based on segment membership," that's segment personalization with dynamic elements. Not the same thing.
3. What does the AI know about each customer, and does that knowledge grow over time?
Traditional ESPs store customer data as flat attributes and event logs: name, email, purchase history, click events. Ask the AI-native vendor: what does your system actually know about each customer beyond attributes and events?
The answer should describe something like a behavioral profile or customer memory that captures preferences, response patterns, content affinities, price sensitivity, purchase occasion types, and engagement cadence. LTV.ai calls this customer memory: persistent, evolving profiles that accumulate context over every interaction.
The compounding test: Ask "will the AI's output be better for a customer it's interacted with for 12 months versus a new customer?" If the answer is yes (and it should be), the system has a learning mechanism that compounds over time. If the output quality is the same regardless of history length, the AI is stateless and doesn't learn from interactions.
Why it matters: A compounding learning mechanism is what creates the LTV flywheel. Each interaction makes future interactions more relevant, which extends customer lifespan, which generates more interactions. Without compounding, the AI provides a one-time improvement rather than an accelerating advantage.
4. How do you measure incrementality?
This question will reveal more about the vendor's integrity than any feature demo. Ask: "How do you prove that your platform generates revenue my previous platform would not have?"
The right answer: Holdout-based testing. The platform suppresses a control group from receiving AI-generated campaigns and compares revenue per customer between the test group (received AI campaigns) and the control group (received no campaigns or received previous-platform campaigns). The difference is the incremental revenue.
The wrong answer: "We measure attributed revenue." Or: "We show you the revenue from campaigns we sent." Attribution tells you which revenue email touched. Incrementality tells you which revenue email created. The difference is typically 30-60%. A platform that can only report attribution is hiding behind a flattering number.
Red flag: If the vendor resists holdout testing or says it's "not necessary," they're either not confident in their incremental impact or their platform doesn't support the methodology.
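The holdout comparison above reduces to a simple calculation. Here is a minimal sketch with made-up numbers (not any platform's actual methodology or real results): compare revenue per customer in the group that received AI-generated campaigns against the suppressed control group.

```python
def incremental_lift(test_revenue, test_size, control_revenue, control_size):
    """Revenue-per-customer lift of the test group (received AI campaigns)
    over the holdout control group (suppressed from AI campaigns)."""
    rpc_test = test_revenue / test_size
    rpc_control = control_revenue / control_size
    incremental_per_customer = rpc_test - rpc_control
    lift_pct = incremental_per_customer / rpc_control * 100
    return incremental_per_customer, lift_pct

# Hypothetical figures: 95,000 customers received AI campaigns,
# 5,000 were held out as a control.
per_customer, lift = incremental_lift(
    test_revenue=1_425_000, test_size=95_000,
    control_revenue=60_000, control_size=5_000,
)
print(f"${per_customer:.2f} incremental revenue per customer ({lift:.0f}% lift)")
```

Attribution, by contrast, would count every dollar of the test group's $1,425,000 that an email touched; only the holdout comparison isolates the revenue the campaigns actually created. At enterprise scale, a vendor should also report statistical significance on this difference, not just the point estimate.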
5. What does brand safety look like when AI generates creative autonomously?
Autonomous generation is only as good as the guardrails. If the AI is writing copy, selecting products, and designing emails without a human building each one, the risk of off-brand output is real. Ask specifically:
How do I define brand guidelines? The vendor should describe a process for inputting your brand voice, visual identity, approved copy patterns, forbidden language, and tone parameters. This should be more sophisticated than uploading a brand guide PDF.
Can I approve campaigns before they send? The answer must be yes. Any platform that sends AI-generated content without human review is a risk you shouldn't accept at enterprise scale. The workflow should be: AI generates, human reviews and approves (or edits), then the system sends.
What happens when the AI gets it wrong? Ask for examples of when the AI produced suboptimal output and how the system learned from it. A vendor that claims their AI never makes mistakes is lying. A vendor that describes a feedback loop where corrections improve future output is being honest.
Show me real output, not demo screenshots. Ask to see actual AI-generated campaigns from real customers (with permission). The quality of real output tells you more than a curated demo environment.
6. What does the migration look like, and what happens during the transition?
ESP migrations are historically painful. AI-native migrations involve an additional layer of complexity because the operating model changes, not just the tool. Ask:
How long from contract to first AI-generated campaign? LTV.ai positions this as days, not months (Shopify app install + DNS records). Verify this with reference customers.
Can the platform run alongside my current ESP during evaluation? You should be able to test the AI-native platform on a portion of your list while your current ESP continues running the rest. A parallel holdout test is the safest evaluation methodology.
What data do you need from my current platform? Understand what customer history, campaign history, and behavioral data the AI-native platform ingests to train its models. The more historical context it can access, the faster the AI reaches its performance potential.
What if I want to switch back? A confident vendor will make it easy to leave. If the platform locks you in through data portability restrictions or long-term contracts, that's a red flag about whether their performance can stand on its own.
7. What results can I realistically expect, and on what timeline?
"We'll increase your email revenue by X%" is a sales claim. "Here's what similar brands in your vertical saw, measured through holdout testing, over what period" is evidence. Ask for:
Published case studies with specific metrics. LTV.ai publishes results like 79% conversion rate increase (Fresh Clean Threads), 435% conversion uplift (Spongellé), and 28% AOV increase (The Sill). These are specific, attributed to named brands, and measured through incrementality testing.
Reference customers you can speak with. Any vendor should provide 2-3 reference customers at a similar scale and vertical to yours. If they can't, the customer base is either too small or the results aren't consistent enough to share.
A realistic timeline. Initial results from an AI-native platform typically appear within 30-60 days (the AI starts generating campaigns quickly). Full performance potential takes 3-6 months (the customer memory system needs time to accumulate context and learn from interactions). Be skeptical of claims of dramatic results in the first week.
What's the worst case? Ask what happens if the platform doesn't outperform your current ESP during the evaluation. A good vendor should say: "You go back to your current platform with no harm done." A bad vendor will avoid this question.
The meta-question
After asking all seven questions, step back and ask yourself one more: did the vendor's answers make me feel like I understand how the platform works, or like I've been marketed to?
AI-native email is a genuinely different approach to a problem enterprise brands are familiar with. The best vendors explain the difference clearly, acknowledge the trade-offs, and make it easy to validate claims through testing. The worst vendors hide behind buzzwords, resist measurement, and push for commitment before you've seen results.
Evaluate accordingly.
LTV.ai answers all seven questions with data, not marketing. Holdout-based incrementality testing. Published customer results. No long-term contracts. See for yourself →

Asad Rehman
Asad Rehman is the founder and CEO of LTV.ai, the first autonomous AI email and SMS platform for enterprise ecommerce brands.