Perplexity structured extraction over the homepage and key sub-pages. Channels, ecommerce maturity, traffic proxy, content recency — turned into typed fields the next nine stages can reason on.
lead.domain — the only required fieldlead.packId — controls per-pack field weights and required fieldslead.socialHandles[] — optional seed handles if knownpack.auditFields[] — vertical-specific fields to extract (e.g. has_oss_repo for devtools, has_dealer_locator for logistics)pack.minMaturity — gate value below which the audit short-circuitsweb_audit.channels[] — b2c | b2b | distributor | marketplaceweb_audit.ecommerce — shopify | woocommerce | custom | noneweb_audit.trafficLevel — proxy bucket: low | mid | highweb_audit.has_request_quote, has_dealer_locatorweb_audit.marketplaceDeps[] — Amazon, Etsy, ThomasNet, etc.web_audit.socialFollowers — per-network countsweb_audit.contentRecency — newest blog post age in daysweb_audit.hiringSignals[] — open roles by teamlogistics pack adds has_dealer_locator; devtools pack adds has_oss_repo
The audit is one model call, not a crawl. Perplexity Sonar is given the lead's domain, a list of structured fields to extract, and the schema each field must match. It reads the home page, the about page, the products or pricing page if present, and the most recent blog or news entry. The output comes back as JSON, validated against the schema, and written into the lead's web_audit column. No HTML scraping in our infra. No flaky DOM selectors that break on a Webflow redesign.
Field requirements are per-pack. The logistics pack needs has_dealer_locator and marketplaceDeps because the pitches downstream lean on those. The devtools pack needs has_oss_repo and recent_release_age. A pack declares which audit fields it needs, and the prompt is composed at run-time from that declaration. Add a field to a pack and every lead in that pack re-audits on the next refresh.
Traffic level is a proxy, not a measurement — we do not have access to the lead's analytics. The model estimates a bucket from public signals: visible review counts, marketplace listing rank, social follower distribution, and the cadence of dated content. Buckets are intentionally coarse. low | mid | high is a useful axis to sort on; a fake-precise visitor count is not. Where the bucket is uncertain, the field returns null and the lead waits for a human spot-check before qualification can fire.
--- system --- You audit a B2B company's web presence. Read only what is publicly available at the given domain. NEVER infer a field if the page does not provide a basis. Return null and add the field to missing[]. --- inputs --- domain: {{lead.domain}} pack: {{lead.packId}} # logistics fields: {{pack.auditFields | json}} --- output schema --- { "channels": ("b2c"|"b2b"|"distributor"|"marketplace")[], "ecommerce": "shopify"|"woocommerce"|"custom"|"none"|null, "trafficLevel": "low"|"mid"|"high"|null, "has_request_quote": boolean|null, "has_dealer_locator": boolean|null, "marketplaceDeps": string[], "socialFollowers": { linkedin?:int, x?:int, instagram?:int, facebook?:int }, "contentRecency": int|null, # days since newest dated post "hiringSignals": string[], # role · team "missing": string[] } --- guards --- - Each enum value MUST come from the declared list. Unknown → null. - trafficLevel is a coarse bucket; do not invent a visitor number. - Quote the source URL anchor for every non-null field in _evidence. # <!-- PLACEHOLDER — full prompt registry available in app -->
Model invents an ecommerce platform or a dealer-locator that isn't on the site.
Every non-null field carries a URL-anchor quote in _evidence. Fields without evidence are dropped to null server-side before write.
The page is behind a JS challenge or auth wall, audit returns nothing useful.
All required fields land in missing[]. Lead is tagged audit_blocked and never enters Qualification — it surfaces in a manual triage queue.
Lead's site is rebuilt; the audit on file no longer reflects reality.
Per-pack TTL on audit rows. Re-audit triggers automatically when a lead is promoted out of cold storage or when the pack's audit-field list changes.
Drop a domain. Watch eight typed fields appear with quoted evidence.