Twenty-eight-plus signals in a controlled, editable taxonomy. Pitches can only fire from observed signals. The model can't compliment a lead on a strength they don't have.
Verticals add their own. The Devtools pack adds missing_oss_repo. The Coatings pack adds no_dealer_locator. You can add and rename your own, and the renames propagate through every solution template that referenced them.
lead.webAudit: Perplexity structured extraction (channels, traffic, ecommerce maturity, content recency)
lead.htmlSnapshot: homepage + key page captures
lead.socialResearch: follower counts, post recency
taxonomy.signals[]: controlled vocabulary, per-vertical extensions
taxonomy.derivationRules[]: declarative rules (e.g. no_dealer_locator if HTML lacks "locator")
painSignal.name: must match taxonomy entry
painSignal.confidence: 0–1, with derivation source
painSignal.evidence: quoted text or URL anchor
painSignal.derivedFrom: rule | llm | manual
painSignal.frequencyCap: max signals retained per lead
Signal extraction is a three-pass operation. The first pass runs declarative rules over the structured audit and HTML snapshot. no_dealer_locator fires if the HTML has no anchor or section labelled "locator", "find a dealer", or an equivalent, and the audit reports none either. slow_load_>3s fires from the Lighthouse-style timing in the audit. These rules are deterministic, free, and explain themselves.
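A minimal sketch of what the rule pass might look like, assuming rules are plain functions over the audit and the HTML snapshot. The `loadSeconds` audit field, the `Signal` class, and the regex of locator labels are hypothetical stand-ins, and a raw-text search is a simplification of checking anchors and sections.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Signal:
    name: str
    confidence: float
    evidence: str
    derived_from: str = "rule"


# Assumed set of locator labels; the real rule may recognise more equivalents.
LOCATOR_PATTERN = re.compile(r"locator|find a dealer", re.IGNORECASE)


def no_dealer_locator(audit: dict, html: str) -> Optional[Signal]:
    """Fire when the snapshot contains no locator-style label."""
    if LOCATOR_PATTERN.search(html):
        return None
    return Signal("no_dealer_locator", 1.0, "no locator label found in HTML snapshot")


def slow_load(audit: dict, html: str) -> Optional[Signal]:
    """Fire when the audit timing exceeds 3 seconds."""
    load_seconds = audit.get("loadSeconds")  # hypothetical audit field name
    if load_seconds is None or load_seconds <= 3.0:
        return None
    return Signal("slow_load_>3s", 1.0, f"audit reports load time of {load_seconds:.1f}s")


Rule = Callable[[dict, str], Optional[Signal]]
RULES: List[Rule] = [no_dealer_locator, slow_load]


def run_rule_pass(audit: dict, html: str) -> List[Signal]:
    """Run every rule in order and keep the signals that fired."""
    fired = []
    for rule in RULES:
        signal = rule(audit, html)
        if signal is not None:
            fired.append(signal)
    return fired
```

Because each rule writes its own evidence string, the operator can see why a signal fired without replaying the rule.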
The second pass is LLM-driven extraction (Perplexity Sonar with structured output) over the same inputs. It can name signals the rule pass missed — most signals about strategy and positioning fall into this bucket. The model is bound to the taxonomy: it can only emit signal names that already exist. New signal proposals go into a separate candidates table for human curation.
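The taxonomy binding could be enforced after the model responds, roughly as below. This is a sketch under assumptions: `bind_to_taxonomy` is a hypothetical helper, and reusing the evidence text as the candidate rationale is an illustrative choice, not something the source specifies.

```python
from typing import Iterable, List, Tuple


def bind_to_taxonomy(
    raw_signals: List[dict], taxonomy: Iterable[str]
) -> Tuple[List[dict], List[dict]]:
    """Split model output into taxonomy-valid signals and curation candidates."""
    allowed = set(taxonomy)
    accepted: List[dict] = []
    candidates: List[dict] = []
    for signal in raw_signals:
        name = signal.get("name")
        if name in allowed:
            accepted.append(signal)
        else:
            # Unknown names never reach the signal list; a human reviews them later.
            candidates.append(
                {"proposedName": name, "rationale": signal.get("evidence", "")}
            )
    return accepted, candidates
```

Validating in code as well as in the prompt means a model that ignores the instruction still cannot put an unknown name into the signal list.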
The third pass is the operator. Every signal carries provenance — rule, LLM, or manual — and an evidence quote. You can override a signal that fired wrong, add a signal the system missed, or downvote a signal that the model overuses. Overrides feed back into the prompt registry as few-shot examples on the next version bump. The taxonomy itself is editable: rename, deprecate, merge, or split signals, and every solution template that referenced the old name updates by reference.
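One way renames could propagate by reference is for templates to store a stable signal ID rather than the display name. The sketch below assumes that design; the `Taxonomy` and `SolutionTemplate` classes and the `sig_017` ID format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Taxonomy:
    # Maps a stable signal ID to its current display name.
    names: Dict[str, str] = field(default_factory=dict)

    def add(self, signal_id: str, name: str) -> None:
        if signal_id in self.names:
            raise ValueError(f"duplicate signal id: {signal_id}")
        self.names[signal_id] = name

    def rename(self, signal_id: str, new_name: str) -> None:
        if signal_id not in self.names:
            raise KeyError(signal_id)
        if new_name in self.names.values():
            raise ValueError(f"name already in use: {new_name}")
        self.names[signal_id] = new_name


@dataclass
class SolutionTemplate:
    title: str
    signal_ids: List[str]  # references by ID, never by name

    def signal_names(self, taxonomy: Taxonomy) -> List[str]:
        """Resolve names at read time, so renames show up with no template edit."""
        return [taxonomy.names[signal_id] for signal_id in self.signal_ids]


taxonomy = Taxonomy()
taxonomy.add("sig_017", "no_dealer_locator")
template = SolutionTemplate("Dealer finder build-out", ["sig_017"])
taxonomy.rename("sig_017", "missing_dealer_locator")
print(template.signal_names(taxonomy))
```

Resolving the name at read time is what makes the rename a single write: no template rows are touched.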
--- system ---
You extract pain signals from a B2B company's web presence. You may ONLY emit signal names from the provided taxonomy. If you identify a pain that does not match any taxonomy entry, write it to candidates[] for human review; never invent a name in the output.

--- inputs ---
audit: {{lead.webAudit | json}}
htmlSnapshot: {{lead.htmlSnapshot | truncate(8000)}}
social: {{lead.socialResearch | json}}
taxonomy: {{taxonomy.signals[] | json}}  # 28 entries

--- output schema ---
{
  "signals": [
    {
      "name": string,        # MUST be in taxonomy
      "confidence": 0..1,
      "evidence": string     # quoted text or URL anchor
    }
  ],
  "candidates": [
    { "proposedName": string, "rationale": string }
  ]
}

--- guards ---
- frequency_cap: max 6 signals per lead. Drop lowest confidence first.
- evidence required: signals without an evidence quote are dropped.
- on missing input: do NOT emit. Write field name to missing_required.
Every lead gets the same eight signals, and pitches start to look templated.
Frequency cap per lead (default 6). Lowest-confidence signals drop first. Per-signal saturation alerts in the registry.
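The cap and the saturation check can be sketched as two small functions. This assumes signals are dicts with `name` and `confidence` keys; the 0.8 saturation threshold is an illustrative value, not one taken from the registry.

```python
from collections import Counter
from typing import List


def apply_frequency_cap(signals: List[dict], cap: int = 6) -> List[dict]:
    """Keep the `cap` highest-confidence signals; the lowest drop first."""
    ranked = sorted(signals, key=lambda s: s["confidence"], reverse=True)
    return ranked[:cap]


def saturated_signals(per_lead: List[List[dict]], threshold: float = 0.8) -> List[str]:
    """Names that fire on at least `threshold` of all leads (assumed alert rule)."""
    total = len(per_lead)
    if total == 0:
        return []
    counts = Counter(
        name for lead in per_lead for name in {s["name"] for s in lead}
    )
    return sorted(name for name, count in counts.items() if count / total >= threshold)
```

`sorted` is stable, so signals with equal confidence keep their original order when the cap is applied.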
Model emits a plausible-sounding signal that no template references.
Schema-bound output. Names not in the taxonomy are dropped from the signal list and routed to candidates[] for curation.
A signal fires but the operator can't see why.
Every signal must carry an evidence quote or URL anchor. Signals without evidence are dropped server-side.
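The server-side evidence guard might look like the following sketch. The function name is hypothetical, and treating whitespace-only evidence as missing is an assumption.

```python
from typing import List, Tuple


def drop_unevidenced(signals: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (kept, dropped); a signal is kept only if its evidence is non-empty."""
    kept: List[dict] = []
    dropped: List[dict] = []
    for signal in signals:
        evidence = (signal.get("evidence") or "").strip()
        if evidence:
            kept.append(signal)
        else:
            dropped.append(signal)
    return kept, dropped
```

Returning the dropped signals alongside the kept ones lets the caller log what was removed instead of discarding it silently.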
Drop a domain. Watch the rules and the LLM agree (or argue) in the sandbox.