← Product / Stage 02 · Web Audit

Pipeline · Stage 02

Read the website
like a senior rep would.

Perplexity structured extraction over the homepage and key sub-pages. Channels, ecommerce maturity, traffic proxy, content recency — turned into typed fields the next nine stages can reason on.

In / Out

What goes in, what comes out.

Inputs

Lead seed

lead.domain — the only required field
lead.packId — controls per-pack field weights and required fields
lead.socialHandles[] — optional seed handles if known
pack.auditFields[] — vertical-specific fields to extract (e.g. has_oss_repo for devtools, has_dealer_locator for logistics)
pack.minMaturity — gate value below which the audit short-circuits

Outputs

web_audit

web_audit.channels[] — b2c | b2b | distributor | marketplace
web_audit.ecommerce — shopify | woocommerce | custom | none
web_audit.trafficLevel — proxy bucket: low | mid | high
web_audit.has_request_quote, has_dealer_locator
web_audit.marketplaceDeps[] — Amazon, Etsy, ThomasNet, etc.
web_audit.socialFollowers — per-network counts
web_audit.contentRecency — newest blog post age in days
web_audit.hiringSignals[] — open roles by team

Schema-bound: strict_json: true · missing → null

How it works

One Perplexity call.
Typed back into a row.

channels→ enum[]

ecommerce→ enum

trafficLevel→ low | mid | high

has_request_quote→ bool | null

has_dealer_locator→ bool | null

marketplaceDeps→ string[]

socialFollowers→ map<net, int>

contentRecency→ days_since

— logistics pack adds has_dealer_locator; devtools pack adds has_oss_repo

The audit is one model call, not a crawl. Perplexity Sonar is given the lead's domain, a list of structured fields to extract, and the schema each field must match. It reads the home page, the about page, the products or pricing page if present, and the most recent blog or news entry. The output comes back as JSON, validated against the schema, and written into the lead's web_audit column. No HTML scraping in our infra. No flaky DOM selectors that break on a Webflow redesign.

Field requirements are per-pack. The logistics pack needs has_dealer_locator and marketplaceDeps because the pitches downstream lean on those. The devtools pack needs has_oss_repo and recent_release_age. A pack declares which audit fields it needs, and the prompt is composed at run-time from that declaration. Add a field to a pack and every lead in that pack re-audits on the next refresh.

Traffic level is a proxy, not a measurement — we do not have access to the lead's analytics. The model estimates a bucket from public signals: visible review counts, marketplace listing rank, social follower distribution, and the cadence of dated content. Buckets are intentionally coarse. low | mid | high is a useful axis to sort on; a fake-precise visitor count is not. Where the bucket is uncertain, the field returns null and the lead waits for a human spot-check before qualification can fire.

The prompt

Schema-bound. Per-pack composed.

--- system ---
You audit a B2B company's web presence. Read only what is publicly
available at the given domain. NEVER infer a field if the page does
not provide a basis. Return null and add the field to missing[].

--- inputs ---
domain:    {{lead.domain}}
pack:      {{lead.packId}}            # logistics
fields:    {{pack.auditFields | json}}

--- output schema ---
{
  "channels":           ("b2c"|"b2b"|"distributor"|"marketplace")[],
  "ecommerce":          "shopify"|"woocommerce"|"custom"|"none"|null,
  "trafficLevel":       "low"|"mid"|"high"|null,
  "has_request_quote":  boolean|null,
  "has_dealer_locator": boolean|null,
  "marketplaceDeps":    string[],
  "socialFollowers":    { linkedin?:int, x?:int, instagram?:int, facebook?:int },
  "contentRecency":     int|null,         # days since newest dated post
  "hiringSignals":      string[],         # role · team
  "missing":            string[]
}

--- guards ---
- Each enum value MUST come from the declared list. Unknown → null.
- trafficLevel is a coarse bucket; do not invent a visitor number.
- Quote the source URL anchor for every non-null field in _evidence.

# <!-- PLACEHOLDER — full prompt registry available in app -->

Failure modes & safeguards

What can break.
And what catches it.

Risk

Hallucinated facts

Model invents an ecommerce platform or a dealer-locator that isn't on the site.

Mitigation

Every non-null field carries a URL-anchor quote in _evidence. Fields without evidence are dropped to null server-side before write.

Risk

Cloudflare or login walls

The page is behind a JS challenge or auth wall, audit returns nothing useful.

Mitigation

All required fields land in missing[]. Lead is tagged audit_blocked and never enters Qualification — it surfaces in a manual triage queue.

Risk

Stale audit drift

Lead's site is rebuilt; the audit on file no longer reflects reality.

Mitigation

Per-pack TTL on audit rows. Re-audit triggers automatically when a lead is promoted out of cold storage or when the pack's audit-field list changes.

Where it sits

02
Audit

Audit a real domain.
In a sandbox, in 2 minutes.

Drop a domain. Watch eight typed fields appear with quoted evidence.

Try the Sandbox See Pricing

Read the website like a senior rep would.

What goes in, what comes out.

One Perplexity call. Typed back into a row.

Schema-bound. Per-pack composed.

What can break. And what catches it.

Audit a real domain. In a sandbox, in 2 minutes.

Read the website
like a senior rep would.

One Perplexity call.
Typed back into a row.

What can break.
And what catches it.

Audit a real domain.
In a sandbox, in 2 minutes.