After 18 months of citation data, we ranked every intervention by its measured impact on citation probability. Wikidata completeness, robots.txt access for LLM crawlers, and Organization schema hold the top three positions, with median lifts of 34%, 28%, and 22%.
Not all citation optimizations are equal. Some produce large, durable gains. Others produce marginal improvements that decay within a few weeks. After 18 months of weekly citation tracking across 500+ domains, we can now rank interventions by their measured impact.
The citation stack, ranked by citation lift
| Intervention | Median citation lift | Persistence |
|---|---|---|
| Wikidata entity — complete (4+ properties) | +34% | Permanent |
| robots.txt — allow all LLM crawlers | +28% | Permanent |
| Organization schema + sameAs links | +22% | Permanent |
| Common Crawl presence (indexed) | +19% | Semi-permanent |
| Wikipedia article (any length) | +18% | Permanent |
| FAQPage schema (5+ Q&A pairs) | +14% | Durable |
| Sitemap lastmod freshness | +11% | Requires maintenance |
| LinkedIn company page — complete | +9% | Semi-permanent |
| Press/earned media links | +8% | Accumulative |
| GitHub org presence (dev-tool brands) | +7% | Permanent |
The top three share common traits: they are machine-readable and authoritative, and they feed the systems behind LLM answers without depending on a model to interpret page prose. Wikidata and schema.org markup speak directly to the entity resolution layer, and robots.txt decides whether LLM crawlers can reach the site at all.
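To make "machine-readable" concrete for the schema.org layer, here is a minimal sketch that builds an Organization JSON-LD block with sameAs links. The brand name, URLs, and Wikidata item ID are placeholders, and the helpers organization_jsonld and script_tag are hypothetical names for illustration, not a complete markup recommendation.

```python
import json
from typing import List, Optional


def organization_jsonld(
    name: str, url: str, same_as: List[str], logo: Optional[str] = None
) -> str:
    """Build a schema.org Organization JSON-LD string with sameAs links.

    Every value comes from the caller; nothing here describes a real brand.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs lists other authoritative profiles of the same entity,
        # such as a Wikidata item, a LinkedIn page, or a GitHub org.
        "sameAs": same_as,
    }
    if logo is not None:
        data["logo"] = logo
    return json.dumps(data, indent=2)


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the <script> element a page embeds in its HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    # Hypothetical brand and placeholder profile URLs.
    doc = organization_jsonld(
        name="Example Co",
        url="https://www.example.com",
        same_as=[
            "https://www.wikidata.org/wiki/QXXXXX",  # placeholder item ID
            "https://www.linkedin.com/company/example-co",
            "https://github.com/example-co",
        ],
    )
    print(script_tag(doc))
```

Pointing sameAs at the Wikidata item is what ties the on-site markup to the knowledge-base entity, which is the link between the first and third rows of the table.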
Why robots.txt access ranks second
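Many teams assume that if their site is live and appears in Google Search, LLM crawlers can access it. That does not follow. LLM crawlers identify themselves with their own user-agent tokens, such as GPTBot, ClaudeBot, PerplexityBot, and CCBot, so a rule that allows only Googlebot does not cover them, and a catch-all "User-agent: *" group that disallows the site, or an explicit block on those tokens, shuts them out. Domains with misconfigured robots.txt are effectively invisible to the crawlers that feed citation systems, regardless of their content quality or entity coverage.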
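A quick way to see this failure mode is to run a robots.txt through Python's standard-library parser for several user agents. The file below is a hypothetical but common pattern (Googlebot allowed, everything else blocked), the token list is an illustrative subset, and audit is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot gets its own Allow group, and a
# catch-all group blocks every other crawler, LLM crawlers included.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Illustrative subset of crawler tokens, not an exhaustive list.
AGENTS = ["Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def audit(robots_txt: str, agents: list[str],
          url: str = "https://www.example.com/") -> dict[str, bool]:
    """Map each user-agent token to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for agent, allowed in audit(ROBOTS_TXT, AGENTS).items():
        print(f"{agent:<15}{'allowed' if allowed else 'BLOCKED'}")
```

With this file only Googlebot is reported as allowed. To audit a live site, RobotFileParser also provides set_url() and read(), which download the file before can_fetch() is called. Python's parser applies the first matching rule and matches agent names by substring, which is close to, but not identical with, how every crawler interprets the file.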
The robots.txt playbook lists all 16 current LLM crawler user agents and the exact allow directives required for each. This is the fastest-ROI fix in the stack for most domains.
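The allow directives themselves follow a simple pattern: one named group per crawler token. This sketch generates such groups and then checks the result with the same standard-library parser. The four tokens are an illustrative subset rather than the playbook's list of 16, and allow_groups and blocked are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of LLM crawler tokens; the playbook's full list
# is not reproduced here.
LLM_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def allow_groups(agents, disallow=()):
    """Emit one explicit robots.txt group per crawler token.

    A crawler that matches a named group ignores the "*" group, so any
    Disallow paths that should still apply are repeated in every group.
    Disallow lines come first: Python's robotparser uses the first matching
    rule, while RFC 9309 crawlers use the longest match, and this order makes
    both agree when the Disallow paths are more specific than "/".
    """
    groups = []
    for agent in agents:
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in disallow]
        lines.append("Allow: /")
        groups.append("\n".join(lines))
    # Blank lines separate groups.
    return "\n\n".join(groups) + "\n"


def blocked(robots_txt, agents, url):
    """Return the agents that robots_txt still blocks from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    existing = "User-agent: *\nDisallow: /\n"
    robots = allow_groups(LLM_AGENTS, disallow=["/private/"]) + "\n" + existing
    print(robots)
    print(blocked(robots, LLM_AGENTS, "https://www.example.com/"))
    print(blocked(robots, LLM_AGENTS, "https://www.example.com/private/x"))
```

The design choice worth noting is the repeated Disallow lines: adding a named group for a crawler silently releases it from every rule in the catch-all group, so private paths must be restated per group.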
The compounding effect
The stack compounds. A brand with all three top interventions in place does not see an additive 84% lift (34 + 28 + 22); it sees approximately 2.8x the citation probability of a brand with none of them, more than the roughly 2.09x that multiplying the three lifts as independent effects would predict. The interventions reinforce each other: Wikidata facts populate the knowledge base, sameAs links validate the entity, and crawler access keeps the live site regularly indexed to supplement the knowledge base.
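The arithmetic behind this claim can be checked in a few lines. This sketch compares two simple baselines, lifts that add and lifts that multiply as independent effects, against the reported 2.8x figure. The helper names additive and multiplicative are hypothetical, and the 2.8x value is taken from the text rather than derived.

```python
from math import prod

# Median lifts for the top three interventions, from the table above.
LIFTS = {"Wikidata": 0.34, "robots.txt": 0.28, "Organization schema": 0.22}
REPORTED_COMBINED = 2.8  # combined multiplier stated in the text


def additive(lifts):
    """Naive model: lifts simply add up, giving 1 + sum."""
    return 1 + sum(lifts)


def multiplicative(lifts):
    """Independent-effects model: each lift scales the running result."""
    return prod(1 + lift for lift in lifts)


if __name__ == "__main__":
    a = additive(LIFTS.values())
    m = multiplicative(LIFTS.values())
    print(f"additive:       {a:.2f}x")
    print(f"multiplicative: {m:.2f}x")
    print(f"reported:       {REPORTED_COMBINED:.2f}x")
    # A ratio above 1 means the reported figure exceeds the independent-effects baseline.
    print(f"interaction factor: {REPORTED_COMBINED / m:.2f}")
```

The ratio on the last line gives a rough sense of how much interaction between the interventions the 2.8x figure implies; it says nothing about why that interaction occurs.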
What this means
If you have not started, start at the top. Complete your Wikidata entity first, then audit robots.txt for LLM crawlers, then add Organization schema with sameAs links. The three together take less than a week and produce structural gains that compound over time. Scan your domain to see which layers you have and which are missing.
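For the first step, the Wikidata side can be checked with a short script that reads an item's statements through Wikidata's public wbgetentities API. This sketch assumes the "4+ properties" bar means four or more distinct properties that each have at least one statement; that reading is an interpretation, not a definition from the source. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.request

API = "https://www.wikidata.org/w/api.php"


def fetch_entity(qid: str) -> dict:
    """Fetch one item's claims from the Wikidata wbgetentities API."""
    url = f"{API}?action=wbgetentities&ids={qid}&props=claims&format=json"
    # Wikimedia asks API clients to send a descriptive User-Agent.
    req = urllib.request.Request(url, headers={"User-Agent": "entity-audit-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        entities = json.load(resp)["entities"]
    # A single requested ID yields a single entry in "entities".
    return next(iter(entities.values()))


def property_count(entity: dict) -> int:
    """Count distinct properties (P-IDs) that have at least one statement."""
    return sum(1 for statements in entity.get("claims", {}).values() if statements)


def is_complete(entity: dict, threshold: int = 4) -> bool:
    """Apply the assumed '4+ properties' completeness bar."""
    return property_count(entity) >= threshold


if __name__ == "__main__":
    # Hand-built entity in the shape wbgetentities returns; "Q0" is a placeholder.
    sample = {"id": "Q0", "claims": {"P31": [{}], "P856": [{}], "P17": [{}]}}
    print(property_count(sample), is_complete(sample))
```

Against a real item, call is_complete(fetch_entity(qid)) with your entity's Q-ID. A raw property count is a coarse proxy: it does not check which properties are present or whether they are referenced.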
Put this into practice
See how your domain scores on the signals covered in this edition. Veezow runs a free AI visibility scan — robots, sitemap, structured data, bot access, and off-site presence.