System 04

Trial Conversion Engine

Behavioral signal extraction, ICP scoring, routing, follow-up generation

The problem this solved

A mid-market SaaS company was drowning in trial signups that nobody knew what to do with. The product was good enough to attract volume, but the sales team couldn’t tell which trials were worth chasing, and by the time they got around to the ones that were, the moment was gone. Product analytics showed engagement. The CRM showed contact info. Neither answered the actual question: which of these trials is my next customer, and how fast do I need to move?

The Trial Conversion Engine takes a raw trial signup and turns it into a scored, routed, sales-ready opportunity. It enriches the record with firmographic data, extracts behavioral signals from product usage, scores against ICP cohorts, routes to the right sales motion based on confidence tier, generates an intelligence dossier for the AE, and drafts a personalized first-touch email — all before the trial is 90 seconds old. An independent quality gate reviews every output before sales sees it.

The result: the sales team stopped guessing which trials to prioritize, and the trials that routed to an AE arrived with a full context brief already written.

Architecture

The system is an 8-stage pipeline. Each stage has a defined input, output, and failure mode. Three stages are deterministic: pure business logic, auditable, reproducible. Four are AI-powered and handle tasks that require judgment or synthesis. One is a hybrid.

01 Intake Validation (Deterministic)
Schema validation, deduplication, ICP pre-filter

02 Enrichment (AI · Claude)
Firmographic research, industry classification, company intelligence

03 Product Signal Extraction (Deterministic)
Behavioral events → engagement classification (active / exploring / ghosted / churned)

04 ICP Scoring + Cohort Matching (Hybrid)
Deterministic score (0–100) + AI cohort rationale across 7 segments

05 Routing Engine (Deterministic)
ICP tier × engagement → sales motion (AE fast-track / BDR / nurture / suppress)

06 Lead Dossier Generation (AI · Claude)
Intelligence brief with lookalike customers, pain points, talking points

07 Email Generation (AI · Claude)
Personalized first-touch email matched to routing context

08 Quality Gate (AI · Separate)
Independent review: 6 criteria, numeric thresholds, max 2 regen attempts

Deterministic: auditable, reproducible
AI-powered: judgment and synthesis
Hybrid: deterministic formula + AI rationale

The design principle: deterministic where the decision needs to be auditable, AI where the task requires judgment. Routing is never left to the LLM.

The stages

01

Intake Validation

Deterministic
In Raw signup data
Out Validated, deduplicated record
Schema enforcement with typed defaults — malformed data gets defaults, not rejection

Validates schema, deduplicates against existing records, and enforces required fields. Malformed data gets sensible defaults rather than pipeline rejection — the system degrades gracefully.
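As a rough illustration of schema enforcement with typed defaults, the sketch below coerces a raw signup into a fixed record shape and deduplicates by email. The `Signup` type, the field names, and the choice of email as both the only hard requirement and the dedup key are assumptions made for this example, not details of the production system.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Signup:
    # Hypothetical record shape; the real schema is not shown in this write-up.
    email: str
    company: str = "unknown"
    title: str = "unknown"
    acquisition_source: str = "unknown"


def _clean_text(raw: dict, key: str, default: str) -> str:
    """Return a stripped string, or the default when the value is missing or malformed."""
    value = raw.get(key)
    if isinstance(value, str) and value.strip():
        return value.strip()
    return default


def validate(raw: dict) -> Optional[Signup]:
    """Coerce a raw signup into a Signup; only a missing email is treated as fatal."""
    email = _clean_text(raw, "email", "").lower()
    if "@" not in email:
        return None
    return Signup(
        email=email,
        company=_clean_text(raw, "company", "unknown"),
        title=_clean_text(raw, "title", "unknown"),
        acquisition_source=_clean_text(raw, "acquisition_source", "unknown"),
    )


def intake(raw_signups: Iterable[dict]) -> List[Signup]:
    """Validate each signup and keep the first record seen for each email."""
    seen = set()
    accepted = []
    for raw in raw_signups:
        signup = validate(raw)
        if signup is None or signup.email in seen:
            continue
        seen.add(signup.email)
        accepted.append(signup)
    return accepted
```

A signup with a numeric `company` value would come through with `company="unknown"` rather than stopping the pipeline, which is the graceful degradation described above.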

02

Enrichment

AI · Claude
In Company name + email domain
Out Firmographic profile (industry, size, tech stack)
Multi-source research, not just email domain lookup

Goes beyond email domain lookup. The AI agent researches the company across multiple sources, returning industry classification, employee count, technology stack, and sub-industry categorization. This matters because industry match is the highest-weighted factor in ICP scoring — getting it wrong poisons every downstream stage.

03

Product Signal Extraction

Deterministic
In Product analytics events
Out Engagement classification + feature map
Distinguishes system-seeded activity from real user behavior

This is where most workflow-based approaches fall apart. The system analyzes product analytics events to classify engagement as active, exploring, ghosted, or churned, but it distinguishes real user behavior from system-seeded demo data. If the product pre-populates trial accounts with sample assets and work orders, interactions with that seeded data don't count. Only deliberate user actions map to engagement signals.
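One way to picture the seeded-versus-real distinction is a classifier that drops system-seeded events before looking at anything else. This is a minimal sketch: the `ProductEvent` shape, the `system_seeded` flag, and both thresholds are hypothetical, since the write-up does not state how the four classes are actually defined.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class ProductEvent:
    name: str
    timestamp: datetime
    system_seeded: bool  # True for sample data the product created itself


def classify_engagement(
    events: Iterable[ProductEvent],
    now: datetime,
    churn_after: timedelta = timedelta(days=7),  # hypothetical threshold
    active_min_actions: int = 5,                 # hypothetical threshold
) -> str:
    """Classify a trial using only deliberate user actions."""
    user_events = [e for e in events if not e.system_seeded]
    if not user_events:
        # Signed up, but every recorded event was seeded demo data.
        return "ghosted"
    last_action = max(e.timestamp for e in user_events)
    if now - last_action > churn_after:
        return "churned"
    if len(user_events) >= active_min_actions:
        return "active"
    return "exploring"
```

Under these assumptions, an account with ten seeded work orders and no user actions is classified as ghosted, not active.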

04

ICP Scoring + Cohort Matching

Hybrid
In Enriched profile + engagement signals
Out Numeric score (0–100) + cohort assignment
Deterministic formula for the score, LLM for cohort rationale

Runs against defined cohorts, not a single generic score. A logistics company and a hospitality chain get scored against different baselines — different industry weights, different size expectations, different behavioral benchmarks. The scoring formula is deterministic and auditable: firmographic fit (0–40 points), behavioral engagement (0–30), and intent signals from acquisition context (0–30). The LLM adds cohort rationale — a natural-language explanation of why this lead matches a specific cohort — but doesn't control the score.
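The three-part formula can be sketched as a capped sum. The caps (40, 30, 30) come from the description above; the function and type names are illustrative, and how individual points are awarded for each cohort is not shown here. The usage example reuses the Shearer's Foods breakdown from the dashboard section of this page.

```python
from dataclasses import dataclass
from typing import Sequence

FIRMOGRAPHIC_MAX = 40
BEHAVIORAL_MAX = 30
INTENT_MAX = 30


@dataclass(frozen=True)
class ScoreBreakdown:
    firmographic: int
    behavioral: int
    intent: int

    @property
    def total(self) -> int:
        return self.firmographic + self.behavioral + self.intent


def _capped(points: Sequence[int], cap: int) -> int:
    """Sum the awarded points and clamp the result to the component's range."""
    return max(0, min(cap, sum(points)))


def score_lead(
    firmographic_points: Sequence[int],
    behavioral_points: Sequence[int],
    intent_points: Sequence[int],
) -> ScoreBreakdown:
    """Deterministic score: the same inputs always produce the same breakdown."""
    return ScoreBreakdown(
        firmographic=_capped(firmographic_points, FIRMOGRAPHIC_MAX),
        behavioral=_capped(behavioral_points, BEHAVIORAL_MAX),
        intent=_capped(intent_points, INTENT_MAX),
    )


# Shearer's Foods: industry +20, size +10, title +10; behavioral 27; intent +15 and +11.
shearers = score_lead([20, 10, 10], [27], [15, 11])
print(shearers.total)  # 93
```

Because the score is a plain sum of capped components, the breakdown itself is the audit trail: each component can be shown to the AE alongside the total.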

05

Routing Engine

Deterministic
In ICP tier + engagement tier
Out Sales motion assignment
Confidence-tiered routing with auditable rules

Pure business rules. ICP tier crossed with engagement tier maps to a specific sales motion. High-confidence Tier 1 leads with active engagement get fast-tracked directly to an AE. Tier 1 leads who signed up and ghosted get priority BDR outreach. Tier 2 leads with active engagement get standard BDR sequences. Tier 2 leads who ghosted enter nurture. Non-ICP leads are suppressed — no BDR time wasted.
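The rules in this paragraph fit in a lookup table, which is part of what makes them auditable. The sketch below encodes only the combinations stated above; the identifier spellings and the `manual_review` fallback for combinations the paragraph does not cover are assumptions for illustration.

```python
from typing import Dict, Tuple

# (ICP tier, engagement) -> sales motion, as described in the paragraph above.
ROUTING_RULES: Dict[Tuple[str, str], str] = {
    ("tier_1", "active"): "ae_fast_track",
    ("tier_1", "ghosted"): "bdr_priority",
    ("tier_2", "active"): "bdr_standard",
    ("tier_2", "ghosted"): "nurture",
}


def route(icp_tier: str, engagement: str) -> str:
    """Return the sales motion for a lead; no model call is involved."""
    if icp_tier == "non_icp":
        return "suppress"
    # Hypothetical fallback: surface uncovered combinations instead of guessing.
    return ROUTING_RULES.get((icp_tier, engagement), "manual_review")
```

An explicit fallback makes a missing rule visible as soon as a lead hits it, so gaps in the table surface during calibration.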

06

Lead Dossier Generation

AI · Claude
In Full enriched profile + scoring data
Out Intelligence brief for sales
Includes lookalike customers from the matched cohort

Produces everything the AE or BDR needs in one document. Company context from enrichment. ICP match reasoning from scoring. Behavioral signals from product extraction. Suggested talking points derived from the matched cohort's known pain points. And critically — lookalike customers from the same cohort, drawn from real customer evidence. When the dossier says "similar companies in your space have seen $500K/year in maintenance cost savings," that's backed by actual customer data.

07

Email Generation

AI · Claude
In Dossier + routing context
Out Personalized first-touch email
Tone matched to routing context, not a template with merge fields

Generates a personalized first-touch email from the dossier data. The tone matches the routing context — consultative and direct for a fast-track lead, value-oriented and educational for a nurture lead. These aren't templates with merge fields. Each email references the prospect's specific industry, likely pain points, and relevant features.

08

Quality Gate

AI · Separate
In Dossier + email + source data
Out Pass/fail + feedback
Independent agent with 6 grading criteria and numeric thresholds

Architecturally separate from the agents that produce the dossier and email. It reviews the output against 6 specific criteria — accuracy, personalization, tone, actionability, brevity, and completeness — with numeric thresholds. If the quality score falls below threshold, the system regenerates automatically, up to twice. After two failed attempts, it flags the lead for human review. The quality gate doesn't grade its own homework.
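The regenerate-then-escalate behavior can be sketched as a bounded loop. Here `generate` and `grade` are stand-ins for the producing agent and the separate grading agent, and the 0.8 threshold is a placeholder; none of these names or values come from the actual build.

```python
from typing import Callable, Dict, Tuple

MAX_REGENERATIONS = 2  # from the description: regenerate up to twice


def run_quality_gate(
    generate: Callable[[str], str],             # takes gate feedback, returns a draft
    grade: Callable[[str], Tuple[float, str]],  # returns (score, feedback)
    threshold: float = 0.8,                     # hypothetical pass threshold
) -> Dict[str, object]:
    """Grade a draft, regenerate on failure, and escalate after the last attempt."""
    feedback = ""
    draft = generate(feedback)
    for regenerations in range(MAX_REGENERATIONS + 1):
        score, feedback = grade(draft)
        if score >= threshold:
            return {"status": "pass", "draft": draft, "regenerations": regenerations}
        if regenerations < MAX_REGENERATIONS:
            # Feed the grader's feedback into the next attempt.
            draft = generate(feedback)
    return {"status": "human_review", "draft": draft, "regenerations": MAX_REGENERATIONS}
```

Keeping `generate` and `grade` as separate callables mirrors the architectural separation: the loop never asks the producer to score its own output.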

Design principles

Deterministic routing over probabilistic

When a lead scores Tier 1 with active engagement, it routes to an AE. There’s no “maybe” layer, no probabilistic scoring that might send it somewhere else on a different day. AI adds judgment in enrichment and dossier generation. Routing is rules-based and auditable.

This is a deliberate architectural choice. Sales teams don’t trust black boxes. When an AE asks “why did this lead get fast-tracked?” the system shows the score breakdown, the cohort match, and the behavioral signals. Every routing decision is reproducible — run the same lead through twice, get the same route.

Pre-mortem methodology

Every failure mode is identified before implementation, not discovered after. Five specific failure modes are documented and mitigated by design:

  • API rate limits during bulk processing — mitigated by pre-computed results with file-based persistence. The pipeline can process ahead of time; results survive server restarts.
  • Garbage-in-garbage-out cascade — if enrichment hallucinates a wrong industry, every downstream stage produces incorrect output. Mitigated by confidence scoring at every AI stage. When confidence drops below threshold, the system flags the lead rather than silently passing bad data downstream.
  • LLM output schema violations — the LLM returns malformed JSON, wrong field names, or conversational preamble instead of structured data. Mitigated by a robust extraction layer with typed defaults. The pipeline degrades gracefully — it never crashes on a schema violation.
  • Quality gate theater — the gate rubber-stamps everything, undermining the review layer. Mitigated by 6 specific grading criteria with numeric thresholds, and architectural separation — the gate is a different LLM instance with a different prompt.
  • State desynchronization — the dashboard shows stale data or crashes on missing fields. Mitigated by file-based persistence and defensive rendering.
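For the schema-violation failure mode in the list above, an extraction layer might look roughly like the following. It uses the standard library's `json.JSONDecoder.raw_decode` to find the first JSON object inside conversational text, then coerces each expected field to the type of its default. The field names and default values are hypothetical.

```python
import json

# Hypothetical enrichment fields; each default also defines the expected type.
ENRICHMENT_DEFAULTS = {"industry": "unknown", "employee_count": 0, "confidence": 0.0}


def extract_json_object(text: str) -> dict:
    """Return the first JSON object embedded in the text, or {} if none parses."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            parsed, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return {}


def apply_typed_defaults(raw: dict, defaults: dict) -> dict:
    """Keep only expected fields, coercing each to its default's type."""
    result = {}
    for key, default in defaults.items():
        value = raw.get(key)
        if value is None:
            result[key] = default
            continue
        try:
            result[key] = type(default)(value)
        except (TypeError, ValueError):
            result[key] = default
    return result


def parse_enrichment(llm_text: str) -> dict:
    """Never raises on malformed model output; falls back to defaults instead."""
    return apply_typed_defaults(extract_json_object(llm_text), ENRICHMENT_DEFAULTS)
```

In this sketch a reply with no parseable JSON yields the defaults, including a confidence of 0.0, so a downstream confidence check can flag the lead rather than letting a crash or a silent guess through.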

Confidence scoring

Every AI stage reports its confidence as a numeric value. Low-confidence enrichment gets flagged with a visual indicator, not passed through silently. The dashboard shows warnings on any stage where confidence dropped below threshold.

This matters most in enrichment, where the LLM is synthesizing company information that may be incomplete or ambiguous. A confidence score of 0.4 on industry classification means “I’m guessing” — and the pipeline treats it accordingly.
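A minimal sketch of the flagging step, assuming each stage emits a confidence between 0 and 1. The `StageResult` type and the 0.7 cutoff are illustrative; the actual threshold is not stated in this write-up.

```python
from dataclasses import dataclass
from typing import Iterable, List

CONFIDENCE_THRESHOLD = 0.7  # hypothetical cutoff


@dataclass(frozen=True)
class StageResult:
    stage: str
    confidence: float  # 0.0 to 1.0; deterministic stages report 1.0


def stages_needing_review(
    results: Iterable[StageResult],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[str]:
    """List the stages whose confidence fell below the threshold, in pipeline order."""
    return [r.stage for r in results if r.confidence < threshold]
```

With these assumptions, an enrichment result at 0.4 would appear in the returned list, and the dashboard could render its warning from that list.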

Independent quality gate

The agent that checks the output is architecturally separate from the agents that produce it. Different LLM instance. Different system prompt. Different evaluation criteria.

The gate scores against accuracy, personalization, tone, actionability, brevity, and completeness. Each criterion has a numeric weight. The aggregate score determines pass, regenerate, or flag-for-human.
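The weighted aggregate and its three outcomes can be sketched as follows. The six criteria are the ones named above; the specific weights, the 0.75 pass threshold, and the function names are placeholders chosen for the example.

```python
from typing import Dict

# Hypothetical weights; they sum to 1.0 so the aggregate stays on a 0-1 scale.
CRITERIA_WEIGHTS: Dict[str, float] = {
    "accuracy": 0.30,
    "personalization": 0.20,
    "tone": 0.10,
    "actionability": 0.20,
    "brevity": 0.10,
    "completeness": 0.10,
}


def aggregate_score(scores: Dict[str, float]) -> float:
    """Weighted sum over all six criteria; a missing criterion is an error."""
    missing = set(CRITERIA_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in CRITERIA_WEIGHTS.items())


def gate_decision(
    scores: Dict[str, float],
    regenerations_used: int,
    pass_threshold: float = 0.75,  # hypothetical
    max_regenerations: int = 2,
) -> str:
    """Map the aggregate score to pass, regenerate, or flag-for-human."""
    if aggregate_score(scores) >= pass_threshold:
        return "pass"
    if regenerations_used < max_regenerations:
        return "regenerate"
    return "flag_for_human"
```

Raising on a missing criterion is a deliberate choice in this sketch: a grader that skips a criterion should fail loudly instead of passing on five out of six.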

Calibration as ongoing work

The first pipeline run processed 12 sample leads. 8 routed correctly. 4 were misrouted. That’s not a failure — that’s calibration data.

Root cause analysis on the 4 misroutes revealed three independent issues: a hospitality industry gap in the ICP cohort data, a churned-behavior threshold that was too restrictive, and missing engagement sub-rules for Tier 2 leads. Each issue had a documented fix scoped to a specific file and line number. The architecture supports this — scoring logic is isolated, routing rules are explicit, and every decision is traceable. Real systems need calibration. The architecture is designed for it.

Tech approach

Key implementation choices for this build:

  • Claude and OpenRouter for AI stages: enrichment, dossier generation, email drafting, and quality gate. Claude Sonnet for stages that require synthesis and nuance. Cost-optimized models for classification and grading. Full pipeline cost: about $0.04 per lead ($0.49 across the 12-lead sample run).
  • Direct CRM API for intake and routing — the system reads from and writes to the CRM through its API, which is what made sub-minute processing possible.
  • File-based persistence for auditability and debugging — every stage’s output is stored as a JSON artifact. Results survive server restarts. Atomic writes prevent corruption. During calibration, you can inspect exactly what each stage produced and why.
  • Dashboard with three views: Pipeline (all leads with stage-by-stage status and routing), Dossier (deep dive into a single lead’s intelligence brief with interactive score breakdown), and Observability (aggregate metrics, cost tracking, confidence distributions).
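The file-based persistence bullet above mentions atomic writes. A common way to get them is to write to a temporary file in the same directory and then rename it over the target with `os.replace`. The sketch below assumes one directory per lead and one JSON file per stage, which is a guess at the layout, not a description of it.

```python
import json
import os
import tempfile
from pathlib import Path


def write_stage_artifact(root: Path, lead_id: str, stage: str, payload: dict) -> Path:
    """Persist one stage's output so readers never see a half-written file."""
    lead_dir = root / lead_id
    lead_dir.mkdir(parents=True, exist_ok=True)
    target = lead_dir / f"{stage}.json"

    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=lead_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)  # atomic swap of the old file for the new one
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def read_stage_artifact(root: Path, lead_id: str, stage: str) -> dict:
    """Load a previously written artifact, for example after a server restart."""
    with open(root / lead_id / f"{stage}.json", encoding="utf-8") as handle:
        return json.load(handle)
```

If the process dies mid-write, the target file still holds the previous complete version, which is what keeps the dashboard from reading a truncated artifact.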

Trial Conversion Pipeline

12 leads enriched, scored, and routed through an 8-stage pipeline

12 leads | $0.49 total cost | 46.7s avg
Company | Industry | Tier | Engagement | Route | Score
Shearer's Foods | Food Manufacturing | Tier 1 | active | ae fast track | 93
Tri-City Medical Center | Healthcare | Tier 1 | active | ae fast track | 95
Mueller Water Products | Water Infrastructure | Tier 1 | active | ae fast track | 85
Clearway Energy | Energy | Tier 1 | churned | bdr priority | 80
Drury Hotels | Hospitality | Tier 2 | ghosted | nurture | 57
Pretium Packaging | Manufacturing | Tier 2 | churned | nurture | 73
Tijuana Flats | Food Service | Tier 1 | exploring | ae fast track | 81
Cabot Creamery | Food & Beverage | Tier 2 | churned | nurture | 78
Salvation Army Kroc Centers | Non-profit | Tier 2 | ghosted | nurture | 62
Alum Rock Union School District | Education | Tier 2 | churned | nurture | 60
Braze | Technology | Tier 3 | active | low priority | 25
Plaid | Financial Technology | Tier 3 | churned | low priority | 25

Shearer's Foods

Food Manufacturing

Employees: 450

Tier 1 ICP match (Manufacturing & Industrial Plants), score 93 with active engagement → ae fast track

Firmographic 40/40
  • Industry match (Manufacturing) +20 pts
  • Company size (450 employees) +10 pts
  • Title match (Reliability Engineer) +10 pts
Behavioral 27/30
  • 8 active sessions over 2 weeks
  • Created assets, work orders, PMs
  • Invited 8 team members
Intent 26/30
  • Google Ads acquisition (high intent) +15 pts
  • Onboarding completed + milestones +11 pts

Shearer's Foods is a mid-market contract manufacturer and private label producer of snack foods serving major retail and foodservice brands. With 450 employees and complex food production lines, they likely face significant maintenance challenges around equipment downtime, FDA/cGMP compliance documentation, and inventory management across their manufacturing operations.

  • Production line downtime directly impacts contract fulfillment for major retail brands, potentially costing thousands per hour
  • FDA and cGMP compliance requirements demand detailed maintenance documentation and traceability
  • Contract manufacturing model requires consistent quality and uptime to maintain relationships with major retail partners
  • Already exploring actively with 8 sessions — created work orders and assets. Ask about their maintenance team experience so far.
  • Companies like Water Lilies Food (who also serves Walmart and Target) reduced their downtime from 2–4 hours to just 8 minutes per shift.
  • With FDA and cGMP compliance requirements, digital maintenance records and traceability are crucial.

Pipeline Observability

Aggregate performance metrics, cost tracking, and quality monitoring across all processed leads

Total Leads 12
Total API Cost $0.49
Avg Confidence 91.3%
Total Processing 560.2s

Route Distribution

nurture: 5
ae fast track: 4
low priority: 2
bdr priority: 1

Tier Distribution

Tier 1: 5
Tier 2: 5
Tier 3: 2

Stage Performance

Stage | Type | Avg Confidence | Pass Rate | Avg Duration | Cost / Lead | Total Cost
intake | deterministic | 100.0% | 100% | 0ms | - | -
enrichment | claude-sonnet-4 | 80.0% | 100% | 5.3s | 0.69¢ | $0.083
product signals | deterministic | 100.0% | 100% | 0ms | - | -
icp scoring | deepseek-v3 | 80.0% | 100% | 3.4s | 0.03¢ | $0.004
routing | deterministic | 100.0% | 100% | 0ms | - | -
dossier | claude-sonnet-4 | 85.0% | 100% | 18.2s | 2.10¢ | $0.252
email | claude-sonnet-4 | 90.0% | 100% | 9.7s | 1.05¢ | $0.126
quality gate | deepseek-v3 | 88.0% | 100% | 5.1s | 0.15¢ | $0.024

This is one approach

A standalone scoring-and-routing system was the right answer for this client because their trial volume was high enough to matter, their existing product analytics didn’t connect to the CRM in any useful way, and the sales team needed more than notifications — they needed context. For a smaller trial volume, a lighter touch might be enough: better HubSpot scoring properties, a smarter Slack alert, a routing rule on top of an enrichment tool the team already pays for. The diagnosis decides how much system the situation actually justifies.

Where an engagement starts

Not every trial conversion problem needs a standalone pipeline. Most engagements start by figuring out whether it does.

Start with an audit. Look at the trial flow end-to-end: what data is captured, where it lives, which signals the sales team actually uses, and where leads are falling through. Sometimes the gap is instrumentation — events that should be firing but aren’t. Sometimes it’s routing logic that exists but isn’t trusted. Sometimes it’s an ICP model that was never written down. The audit tells you which of those is the real bottleneck.

When the audit points at a scoring-and-routing build, the engagement looks like this:

  1. ICP and data source design — define scoring cohorts from the existing customer base, map the product analytics events worth using, identify enrichment signals that actually differentiate.
  2. Architecture scoped to your stack — pipeline stages tailored to your CRM and data sources, routing rules mapped to your actual sales motions.
  3. Staged build with checkpoints — each pipeline stage delivered and reviewed independently. You see working output at every checkpoint, not just at the end.
  4. Calibration against real data — run the pipeline against actual trial signups, diagnose misroutes, tune scoring thresholds.
  5. Handoff with documentation — the system is yours. Full code, architecture docs, calibration playbook.

Ongoing calibration is available as needed — new cohorts as the ICP evolves, scoring adjustments as the product does.

Case study

Building a Trial Conversion Engine for a Mid-Market SaaS Platform

A $30M B2B SaaS company with 4,000+ customers in asset-intensive industries

