Trial Conversion Engine
Behavioral signal extraction, ICP scoring, routing, follow-up generation
The problem this solved
A mid-market SaaS company was drowning in trial signups that nobody knew what to do with. The product was good enough to attract volume, but the sales team couldn’t tell which trials were worth chasing, and by the time they got around to the ones that were, the moment was gone. Product analytics showed engagement. The CRM showed contact info. Neither answered the actual question: which of these trials is my next customer, and how fast do I need to move?
The Trial Conversion Engine takes a raw trial signup and turns it into a scored, routed, sales-ready opportunity. It enriches the record with firmographic data, extracts behavioral signals from product usage, scores against ICP cohorts, routes to the right sales motion based on confidence tier, generates an intelligence dossier for the AE, and drafts a personalized first-touch email — all before the trial is 90 seconds old. An independent quality gate reviews every output before sales sees it.
The result: the sales team stopped guessing which trials to prioritize, and the trials that routed to an AE arrived with a full context brief already written.
Architecture
The system is an 8-stage pipeline. Each stage has a defined input, output, and failure mode. Three stages are deterministic: pure business logic, auditable and reproducible. Four are AI-powered, handling tasks that require judgment or synthesis. One is a hybrid.
- Intake validation: schema validation, deduplication, ICP pre-filter
- Enrichment: firmographic research, industry classification, company intelligence
- Product signal extraction: behavioral events → engagement classification (active / exploring / ghosted / churned)
- ICP scoring + cohort matching: deterministic score (0–100) + AI cohort rationale across 7 segments
- Routing engine: ICP tier × engagement → sales motion (AE fast-track / BDR / nurture / suppress)
- Lead dossier generation: intelligence brief with lookalike customers, pain points, talking points
- Email generation: personalized first-touch email matched to routing context
- Quality gate: independent review with 6 criteria, numeric thresholds, max 2 regeneration attempts
The design principle: deterministic where the decision needs to be auditable, AI where the task requires judgment. Routing is never left to the LLM.
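To make the deterministic/AI split concrete, here is a minimal sketch of a stage registry. The stage names, the `Stage` type, and the placeholder bodies are hypothetical; only the order and the kind of each stage come from the stage descriptions in this document.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Kind = Literal["deterministic", "ai", "hybrid"]


@dataclass(frozen=True)
class Stage:
    name: str
    kind: Kind
    run: Callable[[dict], dict]


def placeholder(name: str) -> Callable[[dict], dict]:
    # Hypothetical stand-in: records that the stage ran; real stages do the work.
    def run(lead: dict) -> dict:
        return {**lead, "stages_run": lead.get("stages_run", []) + [name]}
    return run


# Order and kind per "The stages"; the identifiers are illustrative only.
STAGE_SPECS = [
    ("intake_validation", "deterministic"),
    ("enrichment", "ai"),
    ("product_signals", "deterministic"),
    ("icp_scoring", "hybrid"),
    ("routing", "deterministic"),
    ("dossier", "ai"),
    ("email", "ai"),
    ("quality_gate", "ai"),
]
STAGES = [Stage(name, kind, placeholder(name)) for name, kind in STAGE_SPECS]


def run_pipeline(lead: dict) -> dict:
    # Each stage takes the lead record and returns an enriched copy.
    for stage in STAGES:
        lead = stage.run(lead)
    return lead
```

Tagging each stage with its kind keeps the split checkable in code: a test can assert that routing stays deterministic.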
The stages
Intake Validation
Deterministic
Validates schema, deduplicates against existing records, and enforces required fields. Malformed data gets sensible defaults rather than pipeline rejection, so the system degrades gracefully.
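A minimal sketch of intake validation, assuming signups arrive as plain dicts and that email is the only hard-required field. The field names and defaults in `FIELD_DEFAULTS` are hypothetical; the source does not publish its schema.

```python
from typing import Any, Optional

# Hypothetical optional fields and their defaults; the real intake schema is not given.
FIELD_DEFAULTS = {"company": "unknown", "source": "organic", "employees": None}


def validate(raw: dict, seen_emails: set) -> Optional[dict]:
    """Normalize one signup; return None for duplicates or unusable records."""
    email = str(raw.get("email") or "").strip().lower()
    if "@" not in email:
        return None  # assumed hard requirement: no usable email, no lead
    if email in seen_emails:
        return None  # duplicate of an existing record
    seen_emails.add(email)
    # Keep supplied values; fall back to defaults for missing or blank fields.
    supplied: dict[str, Any] = {k: v for k, v in raw.items() if v not in (None, "")}
    lead = {**FIELD_DEFAULTS, **supplied}
    lead["email"] = email
    lead["domain"] = email.split("@", 1)[1]
    return lead
```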
Enrichment
AI · Claude
Goes beyond email domain lookup. The AI agent researches the company across multiple sources and returns industry classification, employee count, technology stack, and sub-industry categorization. This matters because industry match is the highest-weighted factor in ICP scoring: getting it wrong poisons every downstream stage.
Product Signal Extraction
Deterministic
This is where most workflow-based approaches fall apart. The system analyzes product analytics events to classify engagement as active, exploring, ghosted, or churned, and it distinguishes real user behavior from system-seeded demo data. If the product pre-populates trial accounts with sample assets and work orders, those interactions don't count. Only deliberate user actions map to engagement signals.
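The seeded-data filter is the part most worth seeing in code. This sketch assumes each event is a dict with `type`, `ts`, and a `seeded` flag set by the product when it pre-populates demo data; the flag name and both thresholds are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical cutoffs; the source names the four classes but not the thresholds.
ACTIVE_MIN_ACTIONS = 5
CHURN_AFTER = timedelta(days=7)


def classify_engagement(events: list, now: datetime) -> str:
    """Classify a trial from events shaped like {"type": str, "ts": datetime, "seeded": bool}."""
    # Seeded demo data (sample assets, sample work orders) never counts as engagement.
    user_events = [e for e in events if not e.get("seeded", False)]
    if not user_events:
        return "ghosted"  # signed up, never took a deliberate action
    last_action = max(e["ts"] for e in user_events)
    if now - last_action > CHURN_AFTER:
        return "churned"  # took actions, then went quiet
    if len(user_events) >= ACTIVE_MIN_ACTIONS:
        return "active"
    return "exploring"  # recent but light usage
```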
ICP Scoring + Cohort Matching
Hybrid
Scoring runs against defined cohorts, not a single generic score. A logistics company and a hospitality chain get scored against different baselines: different industry weights, different size expectations, different behavioral benchmarks. The scoring formula is deterministic and auditable: firmographic fit (0–40 points), behavioral engagement (0–30), and intent signals from acquisition context (0–30), for a total of 0–100. The LLM adds cohort rationale, a natural-language explanation of why this lead matches a specific cohort, but doesn't control the score.
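A minimal sketch of the deterministic half of the scoring, assuming a cohort is defined by a set of industries plus an employee range, and that each cohort sets its own split of the 40 firmographic points. The point allocations inside each band and the tier cutoffs are hypothetical; only the 40/30/30 bands come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    name: str
    industries: frozenset
    min_employees: int
    max_employees: int
    industry_points: int = 25  # hypothetical; the rest of the 40 goes to size fit


def icp_score(lead: dict, cohort: Cohort) -> dict:
    # Firmographic fit, 0-40: cohort-specific split between industry and size.
    industry = cohort.industry_points if lead["industry"] in cohort.industries else 0
    size_ok = cohort.min_employees <= lead["employees"] <= cohort.max_employees
    size = (40 - cohort.industry_points) if size_ok else 0
    # Behavioral engagement, 0-30: hypothetical points per engagement class.
    behavioral = {"active": 30, "exploring": 20, "churned": 10, "ghosted": 0}[lead["engagement"]]
    # Intent from acquisition context, 0-30: hypothetical 10 points per signal, capped.
    intent = min(30, 10 * len(lead.get("intent_signals", [])))
    return {"cohort": cohort.name, "firmographic": industry + size,
            "behavioral": behavioral, "intent": intent,
            "total": industry + size + behavioral + intent}


def best_match(lead: dict, cohorts: list) -> dict:
    # Score against every cohort and keep the strongest fit.
    return max((icp_score(lead, c) for c in cohorts), key=lambda s: s["total"])


def tier(total: int) -> int:
    # Hypothetical cutoffs, not taken from the source.
    return 1 if total >= 80 else 2 if total >= 50 else 3
```

Because the arithmetic is plain code, the breakdown can be shown to an AE band by band; the LLM's cohort rationale would sit alongside this result rather than feed into it.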
Routing Engine
Deterministic
Pure business rules. ICP tier crossed with engagement tier maps to a specific sales motion. High-confidence Tier 1 leads with active engagement get fast-tracked directly to an AE. Tier 1 leads who signed up and then ghosted get priority BDR outreach. Tier 2 leads with active engagement get standard BDR sequences. Tier 2 leads who ghosted enter nurture. Non-ICP leads are suppressed, so no BDR time is wasted.
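The rules above fit in a lookup table. In this sketch the route names are illustrative, treating Tier 3 as "non-ICP" is an assumption, and the `FALLBACK` for combinations the text doesn't list (such as Tier 1 exploring) is a hypothetical placeholder that calibration would replace.

```python
# Rules stated in the text; anything outside this table is an assumption.
ROUTES = {
    (1, "active"): "ae_fast_track",
    (1, "ghosted"): "bdr_priority",
    (2, "active"): "bdr_standard",
    (2, "ghosted"): "nurture",
}
FALLBACK = "nurture"  # hypothetical placeholder for unlisted combinations


def route(tier: int, engagement: str) -> str:
    """Map ICP tier x engagement to a sales motion with a plain lookup."""
    if tier not in (1, 2):
        return "suppress"  # non-ICP (assumed here to mean Tier 3): no BDR time
    return ROUTES.get((tier, engagement), FALLBACK)
```

Because the function is a pure lookup, the same tier and engagement always yield the same route, which is what makes a routing decision reproducible.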
Lead Dossier Generation
AI · Claude
Produces everything the AE or BDR needs in one document: company context from enrichment, ICP match reasoning from scoring, behavioral signals from product extraction, and suggested talking points derived from the matched cohort's known pain points. Critically, it also includes lookalike customers from the same cohort, drawn from real customer evidence. When the dossier says "similar companies in your space have seen $500K/year in maintenance cost savings," that claim is backed by actual customer data.
Email Generation
AI · Claude
Generates a personalized first-touch email from the dossier data. The tone matches the routing context: consultative and direct for a fast-track lead, value-oriented and educational for a nurture lead. These aren't templates with merge fields. Each email references the prospect's specific industry, likely pain points, and relevant features.
Quality Gate
AI · Separate
Architecturally separate from the agents that produce the dossier and email. It reviews the output against 6 specific criteria (accuracy, personalization, tone, actionability, brevity, and completeness) with numeric thresholds. If the quality score falls below threshold, the system regenerates automatically, up to twice. If the output still fails after the second regeneration, the lead is flagged for human review. No agent grades its own homework.
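A minimal sketch of the regenerate-then-flag loop, assuming the generator and the reviewer are passed in as two separate callables (hypothetical hooks standing in for two different model calls). The threshold is a parameter because the source doesn't state its value.

```python
from typing import Callable

MAX_REGENERATIONS = 2  # per the text: regenerate up to twice, then flag


def gated_generate(generate: Callable[[int], str],
                   review: Callable[[str], float],
                   threshold: float) -> dict:
    """Draft with one agent, score with a separate reviewer, retry a bounded number of times."""
    regenerations = 0
    draft = generate(regenerations)
    score = review(draft)
    while score < threshold and regenerations < MAX_REGENERATIONS:
        regenerations += 1
        draft = generate(regenerations)
        score = review(draft)
    status = "pass" if score >= threshold else "flag_for_human"
    return {"draft": draft, "score": score,
            "regenerations": regenerations, "status": status}
```

The bound matters as much as the review: without it, a draft that can never pass would loop forever and keep spending on model calls.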
Design principles
Deterministic routing over probabilistic
When a lead scores Tier 1 with active engagement, it routes to an AE. There’s no “maybe” layer, no probabilistic scoring that might send it somewhere else on a different day. AI adds judgment in enrichment and dossier generation. Routing is rules-based and auditable.
This is a deliberate architectural choice. Sales teams don’t trust black boxes. When an AE asks “why did this lead get fast-tracked?” the system shows the score breakdown, the cohort match, and the behavioral signals. Every routing decision is reproducible — run the same lead through twice, get the same route.
Pre-mortem methodology
Failure modes are identified before implementation, not discovered after. Five specific failure modes are documented and mitigated by design:
- API rate limits during bulk processing — mitigated by pre-computed results with file-based persistence. The pipeline can process ahead of time; results survive server restarts.
- Garbage-in-garbage-out cascade — if enrichment hallucinates a wrong industry, every downstream stage produces incorrect output. Mitigated by confidence scoring at every AI stage. When confidence drops below threshold, the system flags the lead rather than silently passing bad data downstream.
- LLM output schema violations — the LLM returns malformed JSON, wrong field names, or conversational preamble instead of structured data. Mitigated by a robust extraction layer with typed defaults. The pipeline degrades gracefully — it never crashes on a schema violation.
- Quality gate theater — the gate rubber-stamps everything, undermining the review layer. Mitigated by 6 specific grading criteria with numeric thresholds, and architectural separation — the gate is a different LLM instance with a different prompt.
- State desynchronization — the dashboard shows stale data or crashes on missing fields. Mitigated by file-based persistence and defensive rendering.
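For the schema-violation failure mode, here is a minimal sketch of an extraction layer with typed defaults: pull the JSON span out of a chatty reply, then coerce each expected field to the type of its default. The field names in `ENRICHMENT_DEFAULTS` are hypothetical; the code uses only the standard-library `json` and `re` modules.

```python
import json
import re
from typing import Any

# Hypothetical enrichment fields; each default also fixes the expected type.
ENRICHMENT_DEFAULTS = {"industry": "unknown", "employees": 0, "confidence": 0.0}


def extract_json(text: str, defaults: dict) -> dict:
    """Recover a typed record from an LLM reply without ever raising."""
    # Take the span from the first "{" to the last "}", skipping any preamble.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    data: Any = {}
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = {}
    if not isinstance(data, dict):
        data = {}
    record = {}
    for key, default in defaults.items():
        value = data.get(key)
        try:
            # Missing keys keep the default; present ones are coerced to its type.
            record[key] = default if value is None else type(default)(value)
        except (TypeError, ValueError):
            record[key] = default
    return record
```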
Confidence scoring
Every AI stage reports its confidence as a numeric value. Low-confidence enrichment gets flagged with a visual indicator, not passed through silently. The dashboard shows warnings on any stage where confidence dropped below threshold.
This matters most in enrichment, where the LLM is synthesizing company information that may be incomplete or ambiguous. A confidence score of 0.4 on industry classification means “I’m guessing” — and the pipeline treats it accordingly.
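A minimal sketch of threshold-based flagging, assuming each AI stage returns its output together with a self-reported confidence between 0 and 1. `StageResult`, `needs_review`, and the 0.6 cutoff are hypothetical names and values.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.6  # hypothetical cutoff; the source doesn't state one


@dataclass
class StageResult:
    stage: str
    output: dict
    confidence: float  # 0.0-1.0, reported by the AI stage itself
    warnings: list = field(default_factory=list)


def check_confidence(result: StageResult, threshold: float = CONFIDENCE_THRESHOLD) -> StageResult:
    """Attach a warning instead of silently passing low-confidence output downstream."""
    if result.confidence < threshold:
        result.warnings.append(
            f"{result.stage}: confidence {result.confidence:.2f} below {threshold:.2f}"
        )
    return result


def needs_review(results: list) -> bool:
    # A lead is flagged if any AI stage raised a confidence warning.
    return any(r.warnings for r in results)
```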
Independent quality gate
The agent that checks the output is architecturally separate from the agents that produce it. Different LLM instance. Different system prompt. Different evaluation criteria.
The gate scores against accuracy, personalization, tone, actionability, brevity, and completeness. Each criterion has a numeric weight. The aggregate score determines pass, regenerate, or flag-for-human.
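A minimal sketch of the weighted aggregate and the three-way decision. The six criteria come from the text; the weights, the pass mark, and the `regenerations_left` counter are assumptions for illustration.

```python
# Hypothetical weights (summing to 1.0) and pass mark; the source names the
# six criteria but not their values.
WEIGHTS = {"accuracy": 0.25, "personalization": 0.20, "tone": 0.15,
           "actionability": 0.15, "brevity": 0.10, "completeness": 0.15}
PASS_MARK = 0.80


def gate_decision(scores: dict, regenerations_left: int) -> tuple:
    """Weight per-criterion scores (each 0-1) into one aggregate and decide."""
    # A criterion the reviewer didn't score counts as 0, so incomplete reviews can't pass.
    aggregate = sum(w * scores.get(c, 0.0) for c, w in WEIGHTS.items())
    if aggregate >= PASS_MARK:
        return aggregate, "pass"
    if regenerations_left > 0:
        return aggregate, "regenerate"
    return aggregate, "flag_for_human"
```

Scoring a missing criterion as zero is one defense against quality gate theater: a reviewer that skips part of its rubric fails rather than rubber-stamping.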
Calibration as ongoing work
The first pipeline run processed 12 sample leads. 8 routed correctly. 4 were misrouted. That’s not a failure — that’s calibration data.
Root cause analysis on the 4 misroutes revealed three independent issues: a hospitality industry gap in the ICP cohort data, a churned-behavior threshold that was too restrictive, and missing engagement sub-rules for Tier 2 leads. Each issue had a documented fix scoped to a specific file and line number. The architecture supports this — scoring logic is isolated, routing rules are explicit, and every decision is traceable. Real systems need calibration. The architecture is designed for it.
Tech approach
Key implementation choices for this build:
- Claude models, plus other models via OpenRouter, for the AI stages (enrichment, dossier generation, email drafting, quality gate) and for the cohort rationale in ICP scoring. Claude Sonnet handles stages that require synthesis and nuance; cost-optimized models handle classification and grading. Full pipeline cost: about $0.04 per lead.
- Direct CRM API for intake and routing — the system reads from and writes to the CRM through its API, which is what made sub-minute processing possible.
- File-based persistence for auditability and debugging — every stage’s output is stored as a JSON artifact. Results survive server restarts. Atomic writes prevent corruption. During calibration, you can inspect exactly what each stage produced and why.
- Dashboard with three views: Pipeline (all leads with stage-by-stage status and routing), Dossier (deep dive into a single lead’s intelligence brief with interactive score breakdown), and Observability (aggregate metrics, cost tracking, confidence distributions).
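A minimal sketch of an atomic artifact write using the standard-library `tempfile.mkstemp` and `os.replace`, assuming one JSON file per lead per stage. The directory layout `<root>/<lead_id>/<stage>.json` is hypothetical. On POSIX filesystems `os.replace` is an atomic rename, so a reader sees either the old artifact or the new one.

```python
import json
import os
import tempfile
from pathlib import Path


def write_artifact(root: Path, lead_id: str, stage: str, payload: dict) -> Path:
    """Write one stage's output to <root>/<lead_id>/<stage>.json atomically."""
    target_dir = root / lead_id
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{stage}.json"
    # Write to a temp file in the same directory, then rename it over the target,
    # so a crash mid-write never leaves a half-written artifact behind.
    fd, tmp_name = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def read_artifact(root: Path, lead_id: str, stage: str) -> dict:
    return json.loads((root / lead_id / f"{stage}.json").read_text(encoding="utf-8"))
```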
Trial Conversion Pipeline
12 leads enriched, scored, and routed through an 8-stage pipeline
| Company | Industry | Tier | Engagement | Route | Score |
|---|---|---|---|---|---|
| Shearer's Foods | Food Manufacturing | Tier 1 | active | ae fast track | 93 |
| Tri-City Medical Center | Healthcare | Tier 1 | active | ae fast track | 95 |
| Mueller Water Products | Water Infrastructure | Tier 1 | active | ae fast track | 85 |
| Clearway Energy | Energy | Tier 1 | churned | bdr priority | 80 |
| Drury Hotels | Hospitality | Tier 2 | ghosted | nurture | 57 |
| Pretium Packaging | Manufacturing | Tier 2 | churned | nurture | 73 |
| Tijuana Flats | Food Service | Tier 1 | exploring | ae fast track | 81 |
| Cabot Creamery | Food & Beverage | Tier 2 | churned | nurture | 78 |
| Salvation Army Kroc Centers | Non-profit | Tier 2 | ghosted | nurture | 62 |
| Alum Rock Union School District | Education | Tier 2 | churned | nurture | 60 |
| Braze | Technology | Tier 3 | active | low priority | 25 |
| Plaid | Financial Technology | Tier 3 | churned | low priority | 25 |
Company Summary
Shearer's Foods is a mid-market contract manufacturer and private-label producer of snack foods serving major retail and foodservice brands. With 450 employees and complex food production lines, they likely face significant maintenance challenges around equipment downtime, FDA/cGMP compliance documentation, and inventory management across their manufacturing operations.
Pain Points
- Production line downtime directly impacts contract fulfillment for major retail brands, potentially costing thousands per hour
- FDA and cGMP compliance requirements demand detailed maintenance documentation and traceability
- Contract manufacturing model requires consistent quality and uptime to maintain relationships with major retail partners
BDR Talking Points
- Already exploring actively with 8 sessions, including created work orders and assets. Ask how their maintenance team's experience has been so far.
- Companies like Water Lilies Food (which also serves Walmart and Target) reduced their downtime from 2–4 hours to just 8 minutes per shift.
- With FDA and cGMP compliance requirements, digital maintenance records and traceability are crucial.
Pipeline Observability
Aggregate performance metrics, cost tracking, and quality monitoring across all processed leads
Stage Performance
| Stage | Type / Model | Avg Confidence | Pass Rate | Avg Duration | Cost / Lead | Total Cost |
|---|---|---|---|---|---|---|
| intake | deterministic | 100.0% | 100% | 0ms | — | — |
| enrichment | claude-sonnet-4 | 80.0% | 100% | 5.3s | 0.69¢ | $0.083 |
| product signals | deterministic | 100.0% | 100% | 0ms | — | — |
| icp scoring | deepseek-v3 | 80.0% | 100% | 3.4s | 0.03¢ | $0.004 |
| routing | deterministic | 100.0% | 100% | 0ms | — | — |
| dossier | claude-sonnet-4 | 85.0% | 100% | 18.2s | 2.10¢ | $0.252 |
| email | claude-sonnet-4 | 90.0% | 100% | 9.7s | 1.05¢ | $0.126 |
| quality gate | deepseek-v3 | 88.0% | 100% | 5.1s | 0.15¢ | $0.024 |
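The per-lead cost of a full run follows from the Cost / Lead column. Summing the five model-backed stages gives about 4 cents per lead, with the dossier stage accounting for roughly half. The values below are copied from the table; the dictionary keys are illustrative.

```python
# Cost / Lead values from the Stage Performance table, in US cents.
# Deterministic stages (intake, product signals, routing) cost nothing.
COST_PER_LEAD_CENTS = {
    "enrichment": 0.69,
    "icp_scoring": 0.03,
    "dossier": 2.10,
    "email": 1.05,
    "quality_gate": 0.15,
}

total = round(sum(COST_PER_LEAD_CENTS.values()), 2)
dossier_share = COST_PER_LEAD_CENTS["dossier"] / total
print(f"{total:.2f} cents per lead; dossier share {dossier_share:.0%}")
# prints: 4.02 cents per lead; dossier share 52%
```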
This is one approach
A standalone scoring-and-routing system was the right answer for this client because their trial volume was high enough to matter, their existing product analytics didn’t connect to the CRM in any useful way, and the sales team needed more than notifications — they needed context. For a smaller trial volume, a lighter touch might be enough: better HubSpot scoring properties, a smarter Slack alert, a routing rule on top of an enrichment tool the team already pays for. The diagnosis decides how much system the situation actually justifies.
Where an engagement starts
Not every trial conversion problem needs a standalone pipeline. Most engagements start by figuring out whether it does.
Start with an audit. Look at the trial flow end-to-end: what data is captured, where it lives, which signals the sales team actually uses, and where leads are falling through. Sometimes the gap is instrumentation — events that should be firing but aren’t. Sometimes it’s routing logic that exists but isn’t trusted. Sometimes it’s an ICP model that was never written down. The audit tells you which of those is the real bottleneck.
When the audit points at a scoring-and-routing build, the engagement looks like this:
- ICP and data source design — define scoring cohorts from the existing customer base, map the product analytics events worth using, identify enrichment signals that actually differentiate.
- Architecture scoped to your stack — pipeline stages tailored to your CRM and data sources, routing rules mapped to your actual sales motions.
- Staged build with checkpoints — each pipeline stage delivered and reviewed independently. You see working output at every checkpoint, not just at the end.
- Calibration against real data — run the pipeline against actual trial signups, diagnose misroutes, tune scoring thresholds.
- Handoff with documentation — the system is yours. Full code, architecture docs, calibration playbook.
Ongoing calibration is available as needed — new cohorts as the ICP evolves, scoring adjustments as the product does.
Case study
Building a Trial Conversion Engine for a Mid-Market SaaS Platform
A $30M B2B SaaS company with 4,000+ customers in asset-intensive industries