A byline in TechCrunch does more for AI citation probability than 50 blog posts. How press coverage creates citation pathways, which publications have the highest Common Crawl frequency, and how to target them.
Earned media (coverage in third-party publications) is one of the most effective and most underused levers for AI citation. A single article in a high-frequency publication can create more citation pathways than months of on-site content work.
Why press coverage creates citation pathways
LLM pretraining data is disproportionately sourced from publications with high Common Crawl frequency. TechCrunch, Forbes, Wired, VentureBeat, The Verge, and similar publications appear in Common Crawl's regular (roughly monthly) crawls and are heavily represented in training datasets. When these publications mention your brand, the mention can enter the training data of models trained on later crawls and become a citation pathway.
- A journalist publishes an article mentioning your company
- Common Crawl captures the article in a subsequent crawl
- Models trained on that crawl include the mention in their pretraining corpus
- When a user asks the model about your category, your brand can appear as a referenced entity
This is why brands with strong press coverage are cited even when their own website is technically citation-hostile (for example, a robots.txt that blocks AI crawlers, or no structured data). Off-site coverage compensates.
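The second step of the pathway is the one you can check directly for a specific article. Common Crawl exposes a public URL index at index.commoncrawl.org that can be queried one crawl at a time. The sketch below, using only Python's standard library, builds that query and parses the line-delimited JSON it returns. The crawl ID `CC-MAIN-2024-10` and the article path are placeholders (current crawl IDs are listed at index.commoncrawl.org/collinfo.json), and treating a 404 as "no captures" reflects how the index server typically responds rather than a documented guarantee.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Common Crawl's public URL index (CDX-style API).
INDEX_SERVER = "https://index.commoncrawl.org"


def build_query_url(crawl_id: str, page_url: str) -> str:
    """Build an index query for one crawl, e.g. crawl_id="CC-MAIN-2024-10" (placeholder)."""
    params = urlencode({"url": page_url, "output": "json"})
    return f"{INDEX_SERVER}/{crawl_id}-index?{params}"


def parse_index_response(body: str) -> list[dict]:
    """The index returns one JSON object per line; blank lines are skipped."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def captures_for(crawl_id: str, page_url: str) -> list[dict]:
    """Return capture records for page_url in one crawl (makes a network request)."""
    try:
        with urlopen(build_query_url(crawl_id, page_url), timeout=30) as resp:
            return parse_index_response(resp.read().decode("utf-8"))
    except HTTPError as err:
        # Assumption: the index server answers 404 when a URL has no captures.
        if err.code == 404:
            return []
        raise
```

For a hypothetical article, `captures_for("CC-MAIN-2024-10", "techcrunch.com/2024/03/01/example-article/")` returns a list of capture records (with fields such as `timestamp`, `url`, and `status`), or an empty list if that crawl did not capture the page. A capture shows the article is in the crawl; it does not show that any particular model was trained on it.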
Publication tiers by Common Crawl frequency
Not all press is equal for AI citation. Coverage in smaller publications has minimal impact because they are crawled infrequently.
Tier 1 (crawled monthly, high corpus weight): TechCrunch, The Verge, Wired, Forbes, VentureBeat, Ars Technica, MIT Technology Review, Bloomberg Tech, Reuters, BBC News
Tier 2 (crawled quarterly, moderate corpus weight): Entrepreneur, Fast Company, Inc., Business Insider, ZDNet, CNET
Tier 3 (crawled annually, low corpus weight): trade publications, niche B2B press, regional business journals
As a rule of thumb for AI citation purposes, one Tier 1 article outweighs 20 Tier 3 articles.
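To make the tier comparison concrete, here is a minimal scoring sketch for comparing press campaigns. The 20:1 ratio between Tier 1 and Tier 3 encodes the rule of thumb above; the Tier 2 weight of 5, the domain list, and the function names are illustrative assumptions, not measured values.

```python
from urllib.parse import urlparse

# Illustrative weights: Tier 1 = 20x Tier 3 encodes the rule of thumb above.
# The Tier 2 weight is an assumption chosen to sit between the two.
TIER_WEIGHTS = {1: 20.0, 2: 5.0, 3: 1.0}

# Hypothetical domain-to-tier map built from the tier lists above.
DOMAIN_TIERS = {
    **dict.fromkeys(
        ["techcrunch.com", "theverge.com", "wired.com", "forbes.com",
         "venturebeat.com", "arstechnica.com", "technologyreview.com",
         "bloomberg.com", "reuters.com", "bbc.com", "bbc.co.uk"], 1),
    **dict.fromkeys(
        ["entrepreneur.com", "fastcompany.com", "inc.com",
         "businessinsider.com", "zdnet.com", "cnet.com"], 2),
}


def tier_for(article_url: str) -> int:
    """Map an article URL to a tier by exact host match, ignoring a leading "www."."""
    host = (urlparse(article_url).hostname or "").lower().removeprefix("www.")
    # Anything not listed is treated as Tier 3 (trade, niche, regional press).
    return DOMAIN_TIERS.get(host, 3)


def coverage_score(article_urls: list[str]) -> float:
    """Sum the tier weights of all articles that mention the brand."""
    return sum(TIER_WEIGHTS[tier_for(url)] for url in article_urls)
```

Under these weights one TechCrunch article scores 20.0, the same as twenty trade-press articles. The score is only useful for relative comparison between campaigns; it is not an estimate of citation probability.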
What triggers press coverage
The most reliable press triggers in B2B tech:
- Fundraising announcements (Series A and above often get coverage in TechCrunch and Bloomberg)
- Original research (data studies, benchmark reports — journalists want numbers)
- Product launches on Product Hunt (a Featured placement often generates press coverage, and Product Hunt is itself a high-Common-Crawl-frequency citation source)
- Executive hires (VP-level and above get covered in trade press)
- Partnerships with known brands (press releases with recognizable co-brands get picked up)
The research angle
The highest-ROI route to sustainable AI citation is publishing original research. A data study gets covered, gets cited by other publications, and creates a durable citation pathway. The study should be data-driven (original data), counterintuitive (surprises drive coverage), timely (tied to a trend the press is already tracking), and easy to excerpt (a clear headline stat is mandatory).
Citation clustering
When multiple high-Common-Crawl-frequency publications mention your brand within a short window, the model's confidence in your brand as a distinct entity increases. Citation clustering (earning coverage in several publications at once) is more effective than one article followed by a long gap. This is why product launches, fundraising announcements, and research report drops should be timed for the widest possible press distribution.
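One way to check whether past coverage actually clustered is to find the largest number of distinct publications that mentioned you inside any short window. The sketch below assumes a hypothetical 14-day window and counts distinct publications rather than articles; the dates and domains in the example are invented. It measures your coverage pattern only and makes no claim about how a model weighs mentions internally.

```python
from datetime import date, timedelta


def max_cluster(mentions: list[tuple[date, str]], window_days: int = 14) -> int:
    """Largest number of distinct publications mentioning the brand within any
    window of window_days consecutive days. Each mention is (date, publication)."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    events = sorted(mentions)
    best = 0
    start = 0
    for end in range(len(events)):
        # Advance the left edge until the window spans fewer than window_days days.
        while events[end][0] - events[start][0] >= timedelta(days=window_days):
            start += 1
        publications = {pub for _, pub in events[start:end + 1]}
        best = max(best, len(publications))
    return best


# Hypothetical examples: a clustered launch versus the same outlets spread over months.
launch = [
    (date(2024, 3, 1), "techcrunch.com"),
    (date(2024, 3, 3), "wired.com"),
    (date(2024, 3, 5), "theverge.com"),
    (date(2024, 6, 1), "forbes.com"),
]
spread = [(date(2024, 1, 1), "techcrunch.com"),
          (date(2024, 3, 1), "wired.com"),
          (date(2024, 5, 1), "theverge.com")]
print(max_cluster(launch), max_cluster(spread))  # 3 1
```

The two-pointer window keeps the check linear in the number of mentions apart from building each window's set, which is plenty for a press log of a few hundred entries.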
Integration with your citation strategy
Press is most effective when combined with structural work:
- Wikidata entity graph: off-site press mentions map onto your Wikidata entity, reinforcing it
- Citation velocity: Tier 1 press is the highest-velocity citation accelerator available
- Wikipedia presence strategy: press coverage in reliable sources is a prerequisite for Wikipedia notability
Run a free scan to see your current off-site citation score — how many high-frequency sources reference your domain, and where the gaps are.
Measure your current position
Veezow scans your domain for the signals covered in this playbook — robots.txt access, structured data, Common Crawl presence, bot permissions, and off-site mentions — and scores them in one report.
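The robots.txt part of that check can be approximated by hand with Python's standard-library `urllib.robotparser`. The sketch parses a robots.txt body you have already fetched and reports which crawler tokens it blocks for a given URL. The token list (Common Crawl's `CCBot`, plus OpenAI's `GPTBot` and Google's `Google-Extended`) is an assumption to extend. `urllib.robotparser` uses simple first-match prefix rules, so on files with wildcards or overlapping Allow/Disallow lines its answer can differ from how a given crawler interprets them.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens to check (assumption; extend as needed). CCBot is Common Crawl's
# crawler; GPTBot and Google-Extended are tokens published by OpenAI and Google.
AI_AGENTS = ("CCBot", "GPTBot", "Google-Extended")


def blocked_agents(robots_txt: str, url: str, agents: tuple[str, ...] = AI_AGENTS) -> list[str]:
    """Return the agents that the given robots.txt body disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks Common Crawl but allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE_ROBOTS, "https://example.com/pricing"))  # ['CCBot']
```

A site in this state is invisible to Common Crawl, which is exactly the case where off-site press coverage has to carry the citation signal.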