A vocabulary,
not adjectives.

Twenty-eight-plus signals in a controlled, editable taxonomy. Pitches can only fire from observed signals. The model can't compliment a lead on a strength they don't have.

Sample taxonomy

Real signal names from the live registry.

multi_system_scatter, manual_po_entry, marketplace_dependency, no_online_booking, portfolio_wp_heavy, portfolio_bloat, slow_load_>3s, stale_blog_>6mo, no_dealer_locator, no_request_quote, single_channel_amazon, low_review_velocity, missing_oss_repo, support_email_only, no_changelog_visible, team_page_outdated, single_payment_gateway, no_intl_shipping, manual_inventory_sync, plus 9 more in the registry.

Verticals add their own. The Devtools pack adds missing_oss_repo. The Coatings pack adds no_dealer_locator. You can add and rename your own, and the renames propagate through every solution template that referenced them.

In / Out

What goes in, what comes out.

Inputs

Audit + Taxonomy

lead.webAudit — Perplexity structured extraction (channels, traffic, ecommerce maturity, content recency)
lead.htmlSnapshot — homepage + key page captures
lead.socialResearch — follower counts, post recency
taxonomy.signals[] — controlled vocabulary, per-vertical extensions
taxonomy.derivationRules[] — declarative rules (e.g. no_dealer_locator if HTML lacks "locator")

Outputs

PainSignal[]

painSignal.name — must match taxonomy entry
painSignal.confidence — 0–1, with derivation source
painSignal.evidence — quoted text or URL anchor
painSignal.derivedFrom — rule | llm | manual
painSignal.frequencyCap — max signals retained per lead
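As a rough illustration of the output shape, the fields above could map to a record like the one below. The class name, the Provenance enum, and the validation checks are assumptions made for this sketch, not the product's actual types. The per-lead frequency cap is left out because it applies to the list, not to a single signal.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    """Where a signal came from (mirrors painSignal.derivedFrom)."""
    RULE = "rule"
    LLM = "llm"
    MANUAL = "manual"


@dataclass(frozen=True)
class PainSignal:
    name: str                 # must match a taxonomy entry
    confidence: float         # 0..1
    evidence: str             # quoted text or URL anchor
    derived_from: Provenance  # rule | llm | manual

    def __post_init__(self) -> None:
        # Reject records that could not be shown to an operator with a reason.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.evidence.strip():
            raise ValueError("evidence is required")
```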

How it works

Rules first.
LLM second.
Operator last.

Signal extraction is a three-pass operation. The first pass runs declarative rules over the structured audit and HTML snapshot. no_dealer_locator fires if the HTML has no anchor or section labelled "locator", "find a dealer", or similar, and the audit records no equivalent. slow_load_>3s fires from the Lighthouse-style timing in the audit. These rules are deterministic, free, and explain themselves.
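A minimal sketch of what the rule pass might look like, assuming each rule is a plain function that returns an evidence string when it fires and None when it does not. The audit field name loadSeconds, the regex, and the fixed confidence of 1.0 are illustrative assumptions. For brevity, the locator rule here checks only the HTML snapshot.

```python
import re
from typing import Callable, Optional

# Assumed label patterns; the real rule set may match more phrasings.
LOCATOR_PATTERN = re.compile(r"locator|find a dealer", re.IGNORECASE)


def no_dealer_locator(audit: dict, html: str) -> Optional[str]:
    """Fire when the HTML has no locator-style label."""
    if LOCATOR_PATTERN.search(html):
        return None
    return "no 'locator' or 'find a dealer' label in HTML snapshot"


def slow_load(audit: dict, html: str) -> Optional[str]:
    """Fire when the audit's load timing exceeds 3 seconds."""
    seconds = audit.get("loadSeconds")  # hypothetical audit field
    if seconds is not None and seconds > 3:
        return f"audit load time {seconds:.1f}s"
    return None


RULES: dict[str, Callable[[dict, str], Optional[str]]] = {
    "no_dealer_locator": no_dealer_locator,
    "slow_load_>3s": slow_load,
}


def run_rules(audit: dict, html: str) -> list[dict]:
    """Run every rule once; each fired signal carries its own evidence."""
    signals = []
    for name, rule in RULES.items():
        evidence = rule(audit, html)
        if evidence is not None:
            signals.append({
                "name": name,
                "confidence": 1.0,  # assumption: rule hits are treated as certain
                "evidence": evidence,
                "derivedFrom": "rule",
            })
    return signals
```

Because each rule returns its own evidence string, the "explains itself" property comes for free: the operator sees exactly which check fired and why.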

The second pass is LLM-driven extraction (Perplexity Sonar with structured output) over the same inputs. It can name signals the rule pass missed — most signals about strategy and positioning fall into this bucket. The model is bound to the taxonomy: it can only emit signal names that already exist. New signal proposals go into a separate candidates table for human curation.
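The taxonomy binding can also be enforced after the model responds, rather than trusted to the prompt alone. The sketch below is one way to do that, assuming the model output has already been parsed into a list of dicts. The function name and the choice to reuse the evidence as the rationale are assumptions.

```python
def bind_to_taxonomy(
    raw_signals: list[dict], taxonomy: set[str]
) -> tuple[list[dict], list[dict]]:
    """Split model output into accepted signals and candidates for curation."""
    accepted: list[dict] = []
    candidates: list[dict] = []
    for signal in raw_signals:
        if signal.get("name") in taxonomy:
            accepted.append(signal)
        else:
            # Unknown names never reach the pitch; they wait for human review.
            candidates.append({
                "proposedName": signal.get("name"),
                "rationale": signal.get("evidence", ""),
            })
    return accepted, candidates
```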

The third pass is the operator. Every signal carries provenance — rule, LLM, or manual — and an evidence quote. You can override a signal that fired wrong, add a signal the system missed, or downvote a signal that the model overuses. Overrides feed back into the prompt registry as few-shot examples on the next version bump. The taxonomy itself is editable: rename, deprecate, merge, or split signals, and every solution template that referenced the old name updates by reference.
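Updating "by reference" suggests that templates point at a stable identifier rather than at the display name. The sketch below shows that idea under that assumption. The Taxonomy and SolutionTemplate classes and the sig_017 identifier are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    # Stable signal id -> current display name.
    names: dict[str, str] = field(default_factory=dict)

    def add(self, signal_id: str, name: str) -> None:
        self.names[signal_id] = name

    def rename(self, signal_id: str, new_name: str) -> None:
        if signal_id not in self.names:
            raise KeyError(signal_id)
        self.names[signal_id] = new_name


@dataclass
class SolutionTemplate:
    title: str
    signal_ids: list[str]  # references, not copies of the names

    def signal_names(self, taxonomy: Taxonomy) -> list[str]:
        # Names are resolved at read time, so a rename needs no template edit.
        return [taxonomy.names[i] for i in self.signal_ids]
```

With this layout a rename touches one dictionary entry, and every template that holds the id sees the new name the next time it is rendered.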

The prompt

Bound to the vocabulary.

--- system ---
You extract pain signals from a B2B company's web presence.
You may ONLY emit signal names from the provided taxonomy. If you
identify a pain that does not match any taxonomy entry, write it to
candidates[] for human review — never invent a name in the output.

--- inputs ---
audit:        {{lead.webAudit | json}}
htmlSnapshot: {{lead.htmlSnapshot | truncate(8000)}}
social:       {{lead.socialResearch | json}}
taxonomy:     {{taxonomy.signals[] | json}}      # 28 entries

--- output schema ---
{
  "signals": [
    {
      "name":       string,  # MUST be in taxonomy
      "confidence": 0..1,
      "evidence":   string   # quoted text or URL anchor
    }
  ],
  "candidates": [
    { "proposedName": string, "rationale": string }
  ],
  "missing_required": [ string ]  # names of required inputs that were absent
}

--- guards ---
- frequency_cap: max 6 signals per lead. Drop lowest confidence first.
- evidence required: signals without an evidence quote are dropped.
- on missing input: do NOT emit. Write field name to missing_required.

# <!-- PLACEHOLDER — full prompt registry available in app -->
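The first two guards are simple enough to apply in code as well as in the prompt. A minimal sketch, assuming signals arrive as dicts with the schema fields above. The function name is hypothetical and the default cap of 6 is taken from the guard text.

```python
def apply_guards(signals: list[dict], cap: int = 6) -> list[dict]:
    """Drop evidence-free signals, then keep the `cap` most confident ones."""
    with_evidence = [
        s for s in signals if str(s.get("evidence") or "").strip()
    ]
    # Highest confidence first, so truncation drops the lowest-confidence signals.
    ranked = sorted(with_evidence, key=lambda s: s["confidence"], reverse=True)
    return ranked[:cap]
```

Filtering on evidence before applying the cap means a high-confidence signal with no quote cannot push out a lower-confidence signal that the operator can actually verify.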

Failure modes & safeguards

What can break.
And what catches it.

Risk

Signal over-detection

Every lead gets the same eight signals, and pitches start to look templated.

Mitigation

Frequency cap per lead (default 6). Lowest-confidence signals drop first. Per-signal saturation alerts in the registry.
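A saturation alert can be as simple as measuring how often each signal fires across recent leads. The sketch below assumes one list of signal names per lead. The function name and the 80% threshold are illustrative assumptions, not registry defaults.

```python
from collections import Counter


def saturated_signals(
    leads: list[list[str]], threshold: float = 0.8
) -> dict[str, float]:
    """Return signals that fire on at least `threshold` of leads, with their rate."""
    if not leads:
        return {}
    # Count each signal at most once per lead.
    counts = Counter(name for names in leads for name in set(names))
    total = len(leads)
    return {
        name: count / total
        for name, count in counts.items()
        if count / total >= threshold
    }
```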

Risk

Hallucinated signal name

Model emits a plausible-sounding signal that no template references.

Mitigation

Schema-bound output. Names not in the taxonomy are removed from signals[] and routed to candidates[] for curation.

Risk

Evidence-free claims

A signal fires but the operator can't see why.

Mitigation

Every signal must carry an evidence quote or URL anchor. Signals without evidence are dropped server-side.

Where it sits

02 Audit
03 Social
05 Signals
07 Enrich
08 Draft
09 Review
10 Sent
11 Funnel

Extract pain signals on a real lead.
In about 12 seconds.

Drop a domain. Watch the rules and the LLM agree (or argue) in the sandbox.

