Trial Conversion Engine

The problem this solved

A mid-market SaaS company was drowning in trial signups that nobody knew what to do with. The product was good enough to attract volume, but the sales team couldn’t tell which trials were worth chasing, and by the time they got around to the ones that were, the moment was gone. Product analytics showed engagement. The CRM showed contact info. Neither answered the actual question: which of these trials is my next customer, and how fast do I need to move?

The Trial Conversion Engine takes a raw trial signup and turns it into a scored, routed, sales-ready opportunity. It enriches the record with firmographic data, extracts behavioral signals from product usage, scores against ICP cohorts, routes to the right sales motion based on confidence tier, generates an intelligence dossier for the AE, and drafts a personalized first-touch email — all before the trial is 90 seconds old. An independent quality gate reviews every output before sales sees it.

The result: the sales team stopped guessing which trials to prioritize, and the trials that routed to an AE arrived with a full context brief already written.

Architecture

The system is an 8-stage pipeline. Each stage has a defined input, output, and failure mode. Three stages are deterministic: pure business logic, auditable and reproducible. Four are AI-powered and handle tasks that require judgment or synthesis. One is a hybrid.

Intake Validation (Deterministic): schema validation, deduplication, ICP pre-filter
Enrichment (AI · Claude): firmographic research, industry classification, company intelligence
Product Signal Extraction (Deterministic): behavioral events → engagement classification (active / exploring / ghosted / churned)
ICP Scoring + Cohort Matching (Hybrid): deterministic score (0–100) + AI cohort rationale across 7 segments
Routing Engine (Deterministic): ICP tier × engagement → sales motion (AE fast-track / BDR / nurture / suppress)
Lead Dossier Generation (AI · Claude): intelligence brief with lookalike customers, pain points, talking points
Email Generation (AI · Claude): personalized first-touch email matched to routing context
Quality Gate (AI · Separate): independent review against 6 criteria with numeric thresholds, max 2 regen attempts

Legend: Deterministic = auditable, reproducible. AI-powered = judgment and synthesis. Hybrid = deterministic formula + AI rationale.

The design principle: deterministic where the decision needs to be auditable, AI where the task requires judgment. Routing is never left to the LLM.

The stages

Intake Validation

Deterministic

In: Raw signup data → Out: Validated, deduplicated record

◆ Schema enforcement with typed defaults — malformed data gets defaults, not rejection

Validates schema, deduplicates against existing records, and enforces required fields. Malformed data gets sensible defaults rather than pipeline rejection — the system degrades gracefully.
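A minimal Python sketch of the defaults-over-rejection idea. The field names (`email`, `company`, `title`, `source`), the `"unknown"` default, and the choice to treat email as the one field that cannot be defaulted are assumptions for illustration, not the client's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Signup:
    email: str
    company: str = "unknown"   # hypothetical default values
    title: str = "unknown"
    source: str = "unknown"


def validate_signup(raw: dict, seen_emails: set) -> Signup | None:
    """Return a normalized record, or None for a duplicate or unusable signup."""
    email = str(raw.get("email") or "").strip().lower()
    if "@" not in email:
        return None  # assumed: email is the only field a default cannot repair
    if email in seen_emails:
        return None  # duplicate of a record already in the pipeline
    seen_emails.add(email)

    def text(key: str) -> str:
        # Wrong types and blank strings fall back to the default instead of failing.
        value = raw.get(key)
        return value.strip() if isinstance(value, str) and value.strip() else "unknown"

    return Signup(email=email, company=text("company"),
                  title=text("title"), source=text("source"))
```

A signup with a numeric `company` value still passes through with `company="unknown"`, so downstream stages always receive a complete record.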

Enrichment

AI · Claude

In: Company name + email domain → Out: Firmographic profile (industry, size, tech stack)

◆ Multi-source research, not just email domain lookup

Goes beyond email domain lookup. The AI agent researches the company across multiple sources, returning industry classification, employee count, technology stack, and sub-industry categorization. This matters because industry match is the highest-weighted factor in ICP scoring — getting it wrong poisons every downstream stage.

Product Signal Extraction

Deterministic

In: Product analytics events → Out: Engagement classification + feature map

◆ Distinguishes system-seeded activity from real user behavior

This is where most workflow-based approaches fall apart. The system analyzes product analytics events to classify engagement as active, exploring, ghosted, or churned, but it first separates real user behavior from system-seeded demo data. If the product pre-populates trial accounts with sample assets and work orders, those interactions don't count. Only deliberate user actions map to engagement signals.
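One way the seeded-versus-real distinction could look in code. This is a sketch under stated assumptions: the `actor` field on each event, the 7-day recency window, and the 5-event cutoff for "active" are all hypothetical, since the source does not give the actual thresholds.

```python
from datetime import datetime, timedelta


def classify_engagement(events, now, recent_days=7, active_min=5):
    """Classify a trial from its events, ignoring anything the system seeded."""
    # Assumed: each event carries an "actor" field of "user" or "system".
    user_events = [e for e in events if e.get("actor") == "user"]
    if not user_events:
        return "ghosted"  # signed up, never acted on their own
    cutoff = now - timedelta(days=recent_days)
    recent = [e for e in user_events if e["timestamp"] >= cutoff]
    if not recent:
        return "churned"  # real activity once, none inside the window
    return "active" if len(recent) >= active_min else "exploring"
```

An account with ten seeded events and no user events classifies as ghosted here, which is the case a naive event count would get wrong.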

ICP Scoring + Cohort Matching

Hybrid

In: Enriched profile + engagement signals → Out: Numeric score (0–100) + cohort assignment

◆ Deterministic formula for the score, LLM for cohort rationale

Runs against defined cohorts, not a single generic score. A logistics company and a hospitality chain get scored against different baselines — different industry weights, different size expectations, different behavioral benchmarks. The scoring formula is deterministic and auditable: firmographic fit (0–40 points), behavioral engagement (0–30), and intent signals from acquisition context (0–30). The LLM adds cohort rationale — a natural-language explanation of why this lead matches a specific cohort — but doesn't control the score.
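The three-component formula can be sketched as a capped sum. The caps (40 / 30 / 30) come from the description above. The point keys are hypothetical labels, and the example values mirror the Shearer's Foods breakdown in the sample dossier.

```python
# Component caps are from the text; the individual point keys are hypothetical.
CAPS = {"firmographic": 40, "behavioral": 30, "intent": 30}


def icp_score(points: dict) -> dict:
    """Sum each component's points, clamp to its cap, and return the breakdown."""
    breakdown = {}
    for component, cap in CAPS.items():
        raw = sum(points.get(component, {}).values())
        breakdown[component] = max(0, min(cap, raw))
    breakdown["total"] = sum(breakdown.values())
    return breakdown


example = {
    "firmographic": {"industry_match": 20, "company_size": 10, "title_match": 10},
    "behavioral": {"usage": 27},
    "intent": {"acquisition_channel": 15, "onboarding_milestones": 11},
}
print(icp_score(example))
# {'firmographic': 40, 'behavioral': 27, 'intent': 26, 'total': 93}
```

Because the function is pure, the same inputs always produce the same breakdown, which is what makes the score auditable. Per-cohort weights would presumably change the point values fed in, not the formula.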

Routing Engine

Deterministic

In: ICP tier + engagement tier → Out: Sales motion assignment

◆ Confidence-tiered routing with auditable rules

Pure business rules. ICP tier crossed with engagement tier maps to a specific sales motion. High-confidence Tier 1 leads with active engagement get fast-tracked directly to an AE. Tier 1 leads who signed up and ghosted get priority BDR outreach. Tier 2 leads with active engagement get standard BDR sequences. Tier 2 leads who ghosted enter nurture. Non-ICP leads are suppressed — no BDR time wasted.
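The rules in that paragraph fit in a lookup table. A sketch, with assumed identifiers: the tier and route strings are illustrative, and the `needs_rule` fallback for combinations the text does not cover (for example, exploring or churned engagement) is an assumption, not the system's documented behavior.

```python
# Only the combinations spelled out in the text are encoded here.
ROUTES = {
    ("tier_1", "active"): "ae_fast_track",
    ("tier_1", "ghosted"): "bdr_priority",
    ("tier_2", "active"): "bdr_standard",
    ("tier_2", "ghosted"): "nurture",
}


def route(tier: str, engagement: str) -> str:
    """Map ICP tier x engagement to a sales motion with no model in the loop."""
    if tier not in ("tier_1", "tier_2"):
        return "suppress"  # non-ICP leads never reach a BDR
    # Assumed fallback: surface uncovered combinations instead of guessing a route.
    return ROUTES.get((tier, engagement), "needs_rule")
```

Keeping the table explicit means a missing rule shows up as a visible gap during calibration and not as a silent misroute.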

Lead Dossier Generation

AI · Claude

In: Full enriched profile + scoring data → Out: Intelligence brief for sales

◆ Includes lookalike customers from the matched cohort

Produces everything the AE or BDR needs in one document. Company context from enrichment. ICP match reasoning from scoring. Behavioral signals from product extraction. Suggested talking points derived from the matched cohort's known pain points. And critically — lookalike customers from the same cohort, drawn from real customer evidence. When the dossier says "similar companies in your space have seen $500K/year in maintenance cost savings," that's backed by actual customer data.

Email Generation

AI · Claude

In: Dossier + routing context → Out: Personalized first-touch email

◆ Tone matched to routing context, not a template with merge fields

Generates a personalized first-touch email from the dossier data. The tone matches the routing context — consultative and direct for a fast-track lead, value-oriented and educational for a nurture lead. These aren't templates with merge fields. Each email references the prospect's specific industry, likely pain points, and relevant features.

Quality Gate

AI · Separate

In: Dossier + email + source data → Out: Pass/fail + feedback

◆ Independent agent with 6 grading criteria and numeric thresholds

Architecturally separate from the agents that produce the dossier and email. It reviews the output against 6 specific criteria — accuracy, personalization, tone, actionability, brevity, and completeness — with numeric thresholds. If the quality score falls below threshold, the system regenerates automatically, up to twice. After two failed attempts, it flags the lead for human review. The quality gate doesn't grade its own homework.
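The regenerate-then-escalate flow could be structured like this. It is a sketch only: `generate` and `grade` stand in for the producing agent and the separate grading agent, and the 0.8 threshold is a hypothetical value.

```python
def run_quality_gate(generate, grade, threshold=0.8, max_regens=2):
    """Grade a draft; regenerate with feedback up to max_regens times, then escalate.

    generate(feedback) -> draft         (hypothetical producing agent)
    grade(draft) -> (score, feedback)   (hypothetical, separately prompted grader)
    """
    draft = generate(None)
    for attempt in range(max_regens + 1):
        score, feedback = grade(draft)
        if score >= threshold:
            return {"status": "pass", "draft": draft, "regens": attempt}
        if attempt < max_regens:
            draft = generate(feedback)  # retry using the grader's feedback
    return {"status": "human_review", "draft": draft, "regens": max_regens}
```

Passing the two agents in as separate callables keeps the grader from sharing a prompt or state with the generator.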

Design principles

Deterministic routing over probabilistic

When a lead scores Tier 1 with active engagement, it routes to an AE. There’s no “maybe” layer, no probabilistic scoring that might send it somewhere else on a different day. AI adds judgment in enrichment and dossier generation. Routing is rules-based and auditable.

This is a deliberate architectural choice. Sales teams don’t trust black boxes. When an AE asks “why did this lead get fast-tracked?” the system shows the score breakdown, the cohort match, and the behavioral signals. Every routing decision is reproducible — run the same lead through twice, get the same route.

Pre-mortem methodology

Every failure mode is identified before implementation, not discovered after. Five specific failure modes are documented and mitigated by design:

API rate limits during bulk processing — mitigated by pre-computed results with file-based persistence. The pipeline can process ahead of time; results survive server restarts.
Garbage-in-garbage-out cascade — if enrichment hallucinates a wrong industry, every downstream stage produces incorrect output. Mitigated by confidence scoring at every AI stage. When confidence drops below threshold, the system flags the lead rather than silently passing bad data downstream.
LLM output schema violations — the LLM returns malformed JSON, wrong field names, or conversational preamble instead of structured data. Mitigated by a robust extraction layer with typed defaults. The pipeline degrades gracefully — it never crashes on a schema violation.
Quality gate theater — the gate rubber-stamps everything, undermining the review layer. Mitigated by 6 specific grading criteria with numeric thresholds, and architectural separation — the gate is a different LLM instance with a different prompt.
State desynchronization — the dashboard shows stale data or crashes on missing fields. Mitigated by file-based persistence and defensive rendering.
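The schema-violation mitigation, for instance, might look like the following extraction layer. It is a sketch under assumptions: the field names and defaults in `DEFAULTS` are hypothetical, and a production version would likely log each fallback so the confidence layer can see it.

```python
import json

# Hypothetical enrichment fields and their typed defaults.
DEFAULTS = {"industry": "unknown", "employee_count": 0, "confidence": 0.0}


def extract_json(text: str) -> dict:
    """Pull the outermost {...} span out of an LLM reply, ignoring any preamble."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def coerce(parsed: dict) -> dict:
    """Keep known fields only, casting each to its default's type or falling back."""
    result = {}
    for key, default in DEFAULTS.items():
        value = parsed.get(key)
        try:
            result[key] = default if value is None else type(default)(value)
        except (TypeError, ValueError):
            result[key] = default
    return result
```

A reply such as `Sure! Here is the profile: {"employee_count": "450"}` comes back as a complete record with `employee_count` cast to `450` and the missing fields defaulted, so the next stage never sees a partial dictionary.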

Confidence scoring

Every AI stage reports its confidence as a numeric value. Low-confidence enrichment gets flagged with a visual indicator, not passed through silently. The dashboard shows warnings on any stage where confidence dropped below threshold.

This matters most in enrichment, where the LLM is synthesizing company information that may be incomplete or ambiguous. A confidence score of 0.4 on industry classification means “I’m guessing” — and the pipeline treats it accordingly.
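A small sketch of how that flagging might be wired. The 0.6 cutoff and the `warnings` / `needs_review` field names are hypothetical.

```python
REVIEW_THRESHOLD = 0.6  # hypothetical cutoff; the real value would be tuned in calibration


def apply_confidence_flags(lead: dict, confidences: dict) -> dict:
    """Attach a warning for every AI stage whose confidence fell below threshold."""
    flagged = sorted(stage for stage, c in confidences.items() if c < REVIEW_THRESHOLD)
    return {**lead, "warnings": flagged, "needs_review": bool(flagged)}


print(apply_confidence_flags({"id": "lead-1"}, {"enrichment": 0.4, "dossier": 0.85}))
# {'id': 'lead-1', 'warnings': ['enrichment'], 'needs_review': True}
```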

Independent quality gate

The agent that checks the output is architecturally separate from the agents that produce it. Different LLM instance. Different system prompt. Different evaluation criteria.

The gate scores against accuracy, personalization, tone, actionability, brevity, and completeness. Each criterion has a numeric weight. The aggregate score determines pass, regenerate, or flag-for-human.
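A sketch of the weighted aggregate and the three-way decision. The six criteria are from the text. The specific weights, the 0.8 pass threshold, and the 0-to-1 score scale are assumptions chosen only to make the example concrete.

```python
# Criteria are from the text; these weights are hypothetical.
WEIGHTS = {
    "accuracy": 3, "personalization": 2, "tone": 1,
    "actionability": 2, "brevity": 1, "completeness": 1,
}


def aggregate_score(scores: dict) -> float:
    """Weighted mean of per-criterion scores, each assumed to lie in [0, 1]."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


def gate_decision(scores: dict, regens_used: int,
                  pass_threshold: float = 0.8, max_regens: int = 2) -> str:
    """Return 'pass', 'regenerate', or 'flag_for_human'."""
    if aggregate_score(scores) >= pass_threshold:
        return "pass"
    return "regenerate" if regens_used < max_regens else "flag_for_human"
```

With these weights, a draft that scores 1.0 everywhere except 0.2 on accuracy aggregates to 0.76 and fails, so heavily weighting accuracy keeps a polished but wrong email from passing.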

Calibration as ongoing work

The first pipeline run processed 12 sample leads. 8 routed correctly. 4 were misrouted. That’s not a failure — that’s calibration data.

Root cause analysis on the 4 misroutes revealed three independent issues: a hospitality industry gap in the ICP cohort data, a churned-behavior threshold that was too restrictive, and missing engagement sub-rules for Tier 2 leads. Each issue had a documented fix scoped to a specific file and line number. The architecture supports this — scoring logic is isolated, routing rules are explicit, and every decision is traceable. Real systems need calibration. The architecture is designed for it.

Tech approach

Key implementation choices for this build:

Claude and OpenRouter for AI stages: enrichment, dossier generation, email drafting, and quality gate. Claude Sonnet for stages that require synthesis and nuance. Cost-optimized models for classification and grading. Full pipeline cost: about $0.04 per lead ($0.49 across the 12-lead sample run).
Direct CRM API for intake and routing — the system reads from and writes to the CRM through its API, which is what made sub-minute processing possible.
File-based persistence for auditability and debugging — every stage’s output is stored as a JSON artifact. Results survive server restarts. Atomic writes prevent corruption. During calibration, you can inspect exactly what each stage produced and why.
Dashboard with three views: Pipeline (all leads with stage-by-stage status and routing), Dossier (deep dive into a single lead’s intelligence brief with interactive score breakdown), and Observability (aggregate metrics, cost tracking, confidence distributions).
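The atomic-write behavior mentioned in the persistence item can be sketched with a temp file and a rename. The directory layout (`<root>/<lead_id>/<stage>.json`) and the function name are assumptions. `os.replace` is atomic when source and target are on the same filesystem, which is why the temp file is created in the target directory.

```python
import json
import os
import tempfile
from pathlib import Path


def write_artifact(root: Path, lead_id: str, stage: str, payload: dict) -> Path:
    """Persist one stage's output so readers never observe a half-written file."""
    target = root / lead_id / f"{stage}.json"   # hypothetical layout
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())           # force bytes to disk before the rename
        os.replace(tmp_name, target)            # atomic swap into place
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)                 # never leave stray temp files behind
        raise
    return target
```

If the process dies mid-write, the dashboard sees either the previous artifact or none, never a truncated JSON file.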

Trial Conversion Pipeline

12 leads enriched, scored, and routed through an 8-stage pipeline

12 leads | $0.49 total cost | 46.7s avg

Company	Industry	Tier	Engagement	Route	Score
Shearer's Foods	Food Manufacturing	Tier 1	active	ae fast track	93
Tri-City Medical Center	Healthcare	Tier 1	active	ae fast track	95
Mueller Water Products	Water Infrastructure	Tier 1	active	ae fast track	85
Clearway Energy	Energy	Tier 1	churned	bdr priority	80
Drury Hotels	Hospitality	Tier 2	ghosted	nurture	57
Pretium Packaging	Manufacturing	Tier 2	churned	nurture	73
Tijuana Flats	Food Service	Tier 1	exploring	ae fast track	81
Cabot Creamery	Food & Beverage	Tier 2	churned	nurture	78
Salvation Army Kroc Centers	Non-profit	Tier 2	ghosted	nurture	62
Alum Rock Union School District	Education	Tier 2	churned	nurture	60
Braze	Technology	Tier 3	active	low priority	25
Plaid	Financial Technology	Tier 3	churned	low priority	25

Shearer's Foods

Food Manufacturing

Employees: 450

Routing Decision

Tier 1 ICP match (Manufacturing & Industrial Plants), score 93 with active engagement → ae fast track

ICP Score: 93/100

Firmographic 40/40

✓ Industry match (Manufacturing) +20 pts
✓ Company size (450 employees) +10 pts
✓ Title match (Reliability Engineer) +10 pts

Behavioral 27/30

✓ 8 active sessions over 2 weeks
✓ Created assets, work orders, PMs
✓ Invited 8 team members

Intent 26/30

✓ Google Ads acquisition (high intent) +15 pts
✓ Onboarding completed + milestones +11 pts

Company Summary

Shearer's Foods is a mid-market contract manufacturer and private label producer of snack foods serving major retail and foodservice brands. With 450 employees and complex food production lines, they likely face significant maintenance challenges around equipment downtime, FDA/cGMP compliance documentation, and inventory management across their manufacturing operations.

Pain Points

Production line downtime directly impacts contract fulfillment for major retail brands, potentially costing thousands per hour
FDA and cGMP compliance requirements demand detailed maintenance documentation and traceability
Contract manufacturing model requires consistent quality and uptime to maintain relationships with major retail partners

BDR Talking Points

Already exploring actively with 8 sessions — created work orders and assets. Ask about their maintenance team experience so far.
Companies like Water Lilies Food (who also serves Walmart and Target) reduced their downtime from 2–4 hours to just 8 minutes per shift.
With FDA and cGMP compliance requirements, digital maintenance records and traceability are crucial.

Pipeline Observability

Aggregate performance metrics, cost tracking, and quality monitoring across all processed leads

Total Leads 12

Total API Cost $0.49

Avg Confidence 91.3%

Total Processing 560.2s

Route Distribution: nurture (5), ae fast track (4), low priority (2), bdr priority (1)

Tier Distribution: Tier 1 (5), Tier 2 (5), Tier 3 (2)

Stage Performance

Stage	Type	Avg Confidence	Pass Rate	Avg Duration	Cost / Lead	Total Cost
intake	deterministic	100.0%	100%	0ms	—	—
enrichment	claude-sonnet-4	80.0%	100%	5.3s	0.69¢	$0.083
product signals	deterministic	100.0%	100%	0ms	—	—
icp scoring	deepseek-v3	80.0%	100%	3.4s	0.03¢	$0.004
routing	deterministic	100.0%	100%	0ms	—	—
dossier	claude-sonnet-4	85.0%	100%	18.2s	2.10¢	$0.252
email	claude-sonnet-4	90.0%	100%	9.7s	1.05¢	$0.126
quality gate	deepseek-v3	88.0%	100%	5.1s	0.15¢	$0.024

This is one approach

A standalone scoring-and-routing system was the right answer for this client because their trial volume was high enough to matter, their existing product analytics didn’t connect to the CRM in any useful way, and the sales team needed more than notifications — they needed context. For a smaller trial volume, a lighter touch might be enough: better HubSpot scoring properties, a smarter Slack alert, a routing rule on top of an enrichment tool the team already pays for. The diagnosis decides how much system the situation actually justifies.

Where an engagement starts

Not every trial conversion problem needs a standalone pipeline. Most engagements start by figuring out whether it does.

Start with an audit. Look at the trial flow end-to-end: what data is captured, where it lives, which signals the sales team actually uses, and where leads are falling through. Sometimes the gap is instrumentation — events that should be firing but aren’t. Sometimes it’s routing logic that exists but isn’t trusted. Sometimes it’s an ICP model that was never written down. The audit tells you which of those is the real bottleneck.

When the audit points at a scoring-and-routing build, the engagement looks like this:

ICP and data source design — define scoring cohorts from the existing customer base, map the product analytics events worth using, identify enrichment signals that actually differentiate.
Architecture scoped to your stack — pipeline stages tailored to your CRM and data sources, routing rules mapped to your actual sales motions.
Staged build with checkpoints — each pipeline stage delivered and reviewed independently. You see working output at every checkpoint, not just at the end.
Calibration against real data — run the pipeline against actual trial signups, diagnose misroutes, tune scoring thresholds.
Handoff with documentation — the system is yours. Full code, architecture docs, calibration playbook.

Ongoing calibration is available as needed — new cohorts as the ICP evolves, scoring adjustments as the product does.

Most of your trial signups never talk to sales

Building a Trial Conversion Engine for a Mid-Market SaaS Platform