Discovery turns a few sentences of GTM intent into a candidate pool. Not your lead database — a parking lot you can throw away and rerun.
searchRun.pack: one of devtools / fintech / clinical / recruiting / logistics / marketing / legal
searchRun.market: country or region
searchRun.startingChannel: Amazon / Shopify / WooCommerce / Wix / etc.
searchRun.vertical: kubernetes, payments, dental, fitness, …
searchRun.subcategory: narrows vertical (e.g. b2c | b2b | distributor_network)
searchRun.pitchType: what angle you're approaching with

discovery.companyName: string
discovery.domain: resolvable, validated
discovery.lightFitScore: 0–100, pre-audit estimate
discovery.lightSignals[]: initial pain hypothesis
discovery.searchRunId: back-link for prompt attribution
discovery.status: pending | promoted | rejected

Promoted candidates become Lead rows. Rejected ones stay in the discovery pool and never pollute pipeline metrics.
You don't write a database query. You write a brief — pack, market, channel, vertical, subcategory, pitchType. The discovery prompt builder composes those into a structured instruction for a structured-extraction LLM (Gemini Flash by default), which returns candidate companies with light-fit scoring and an initial pain hypothesis.
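As a rough illustration of how a brief could be composed into prompt text, here is a minimal sketch. The `SearchRun` class, `BRIEF_TEMPLATE`, and `build_brief` are hypothetical names chosen for this example, not the product's actual prompt builder; only the six field names come from the brief described above.

```python
from dataclasses import dataclass, asdict
from string import Template


@dataclass(frozen=True)
class SearchRun:
    """Hypothetical container for the six brief fields."""
    pack: str
    market: str
    starting_channel: str
    vertical: str
    subcategory: str
    pitch_type: str


# Assumed layout: one "key: value" line per brief field.
BRIEF_TEMPLATE = Template(
    "pack: $pack\n"
    "market: $market\n"
    "startingChannel: $starting_channel\n"
    "vertical: $vertical\n"
    "subcategory: $subcategory\n"
    "pitchType: $pitch_type\n"
)


def build_brief(run: SearchRun) -> str:
    """Render the brief section of a discovery prompt from a SearchRun."""
    return BRIEF_TEMPLATE.substitute(asdict(run))


run = SearchRun(
    pack="devtools",
    market="US-NE + Canada",
    starting_channel="GitHub / Docker Hub",
    vertical="kubernetes",
    subcategory="multi-cluster",
    pitch_type="infra_complexity",
)
brief = build_brief(run)
print(brief)
```

Keeping the brief as a small immutable record makes it cheap to rerun with one field changed, for example a different starting channel.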
Results land in a pre-promote pool — a separate table from your real leads. You can rerun the same brief with a different starting channel, a different subcategory, or a tweaked prompt and compare results without polluting your pipeline. Anything that looks worth real research gets a one-click promote into the Lead table; the rest stays parked.
Promotion is the only path that consumes contact enrichment, web audit, or pitch-brief tokens. That keeps your spend on AI proportional to the quality of your input brief, not to the size of the candidate pool. A bad brief produces 200 cheap discoveries and zero promotions — fine. A good brief produces 60 discoveries and 40 promotions — also fine, and an order of magnitude more useful.
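The promotion gate can be pictured with a small in-memory sketch. Everything here (`Discovery`, `Workspace`, the `paid_steps` counter) is a hypothetical stand-in for illustration; the real system stores these in separate tables and calls real enrichment, audit, and pitch-brief services.

```python
from dataclasses import dataclass, field


@dataclass
class Discovery:
    company_name: str
    domain: str
    light_fit_score: int
    status: str = "pending"  # pending | promoted | rejected


@dataclass
class Workspace:
    pool: list = field(default_factory=list)   # pre-promote discovery pool
    leads: list = field(default_factory=list)  # stand-in for the Lead table
    paid_steps: int = 0                        # stand-in for enrichment/audit/pitch-brief spend

    def promote(self, d: Discovery) -> dict:
        """Create a Lead row; the only place paid steps are triggered."""
        if d.status != "pending":
            raise ValueError(f"cannot promote a {d.status} discovery")
        d.status = "promoted"
        self.paid_steps += 1
        lead = {"companyName": d.company_name, "domain": d.domain}
        self.leads.append(lead)
        return lead

    def reject(self, d: Discovery) -> None:
        """Mark as rejected; the row stays in the pool and costs nothing."""
        d.status = "rejected"


good = Discovery("Alpha Clusters", "alpha.example", 82)
weak = Discovery("Beta Widgets", "beta.example", 31)
ws = Workspace(pool=[good, weak])
ws.promote(good)
ws.reject(weak)
print(len(ws.pool), len(ws.leads), ws.paid_steps)
```

In this sketch, spend grows with the number of promotions rather than with the size of the pool, which is the property the paragraph above describes.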
A real (shortened) discovery prompt body. The full registry lives in your workspace.
--- system ---
You are a B2B lead discovery assistant. Return a JSON array of
company candidates that fit the brief below.
NEVER invent companies that do not have a resolvable public domain.
NEVER repeat companies the operator has already rejected
(rejected_domains list provided).

--- brief ---
pack: {{pack}}                          # devtools
market: {{market}}                      # US-NE + Canada
startingChannel: {{startingChannel}}    # GitHub / Docker Hub
vertical: {{vertical}}                  # kubernetes
subcategory: {{subcategory}}            # multi-cluster
pitchType: {{pitchType}}                # infra_complexity

--- output schema ---
[
  {
    "companyName": string,
    "domain": string,             # must resolve
    "lightFitScore": 0..100,
    "lightSignals": string[],     # names from taxonomy only
    "rationale": string           # why this company fits brief
  }
]

--- guards ---
- max_results: {{cap}}
- exclude_domains: {{rejected_domains[]}}
- if a candidate cannot meet ALL of {has_website, has_observable_channel, matches_vertical}, omit it. Never fill with weaker matches.
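A prompt can state its guards, but the response still has to be checked on the way back. The sketch below shows one way the output schema and guards might be enforced in code. The function name `filter_candidates` and the specific type checks are assumptions for illustration; only the field names, the 0 to 100 score range, the result cap, and the exclude list come from the prompt.

```python
import json

# Field names follow the output schema; the Python types are an assumption.
REQUIRED_FIELDS = {
    "companyName": str,
    "domain": str,
    "lightFitScore": int,
    "lightSignals": list,
    "rationale": str,
}


def filter_candidates(raw_json, exclude_domains, cap):
    """Keep schema-valid, non-excluded candidates, up to `cap` results."""
    excluded = {d.lower() for d in exclude_domains}
    kept = []
    for item in json.loads(raw_json):
        if len(kept) >= cap:
            break
        if not isinstance(item, dict):
            continue
        if any(not isinstance(item.get(k), t) for k, t in REQUIRED_FIELDS.items()):
            continue  # missing or mistyped field
        score = item["lightFitScore"]
        if isinstance(score, bool) or not 0 <= score <= 100:
            continue  # out-of-range score
        if item["domain"].lower() in excluded:
            continue  # already rejected by the operator
        kept.append(item)
    return kept


def _row(name, domain, score, **extra):
    base = {"companyName": name, "domain": domain, "lightFitScore": score,
            "lightSignals": [], "rationale": "example"}
    base.update(extra)
    return base


no_rationale = _row("Delta", "delta.example", 50)
del no_rationale["rationale"]
sample = json.dumps([
    _row("Alpha", "alpha.example", 80),
    _row("Old", "old.example", 70),        # on the exclude list
    _row("Gamma", "gamma.example", 140),   # score out of range
    no_rationale,                          # missing field
    _row("Beta", "beta.example", 65),
    _row("Extra", "extra.example", 60),    # dropped by the cap
])
result = filter_candidates(sample, ["old.example"], cap=2)
print([c["domain"] for c in result])
```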
Source LLM invents a plausible-sounding company that doesn't exist.
Hard-input check: domain must resolve via DNS before the row is even written to the discovery pool.
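A DNS gate of this kind might look like the following sketch. The helper names `dns_resolves` and `write_if_resolvable` are hypothetical, and the list used as a pool stands in for the real discovery table. The resolver is passed in as a parameter so the gate can be exercised without network access.

```python
import socket


def dns_resolves(domain: str) -> bool:
    """Return True if the system resolver finds at least one address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is an OSError; UnicodeError covers malformed names.
        return False


def write_if_resolvable(candidates, pool, resolves=dns_resolves):
    """Append only candidates whose domain resolves; silently drop the rest."""
    for candidate in candidates:
        if resolves(candidate["domain"]):
            pool.append(candidate)
    return pool
```

For example, `write_if_resolvable(candidates, pool)` uses the live resolver, while a test can pass `resolves=lambda d: d in known_good` to avoid network calls.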
Same companies keep surfacing across runs, wasting LLM tokens.
Rejected and promoted domains feed back into the prompt as exclude_domains on every run.
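The feedback loop could be as simple as the sketch below, which derives the exclude list from pool rows before each run. The function name `build_exclude_domains` and the dict-shaped rows are assumptions for illustration; the status values match `discovery.status`.

```python
def build_exclude_domains(pool_rows):
    """Collect domains already rejected or promoted, normalised and deduplicated."""
    seen = {
        row["domain"].strip().lower()
        for row in pool_rows
        if row["status"] in ("rejected", "promoted")
    }
    return sorted(seen)


rows = [
    {"domain": "Acme.example", "status": "rejected"},
    {"domain": "beta.example", "status": "promoted"},
    {"domain": "gamma.example", "status": "pending"},   # still open, not excluded
    {"domain": "acme.example", "status": "rejected"},   # duplicate after lowercasing
]
exclude_domains = build_exclude_domains(rows)
print(exclude_domains)
```

Pending rows are left out of the list in this sketch, since the operator has not yet made a decision on them.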
Bad discoveries flow downstream and skew funnel reporting.
The pre-promote pool is a separate table. Only an explicit operator promote creates a Lead row.
Try discovery on a real vertical in the sandbox. No signup.