VEEZOW

03 / EDITIONS · 2026.07.07

The citation stack: which entity layer produces the highest ROI for AI visibility

After 18 months of citation data, we ranked every intervention by its measured impact on citation probability. Wikidata completeness, robots.txt access, and Organization schema hold the top three positions — by a wide margin.

Not all citation optimizations are equal. Some produce large, durable gains. Others produce marginal improvements that decay within a few weeks. After 18 months of weekly citation tracking across 500+ domains, we can now rank interventions by their measured impact.

The citation stack, ranked by citation lift

Intervention                                | Median citation lift | Persistence
Wikidata entity, complete (4+ properties)   | +34%                 | Permanent
robots.txt: allow all LLM crawlers          | +28%                 | Permanent
Organization schema + sameAs links          | +22%                 | Permanent
Common Crawl presence (indexed)             | +19%                 | Semi-permanent
Wikipedia article (any length)              | +18%                 | Permanent
FAQPage schema (5+ Q&A pairs)               | +14%                 | Durable
Sitemap lastmod freshness                   | +11%                 | Requires maintenance
LinkedIn company page, complete             | +9%                  | Semi-permanent
Press/earned media links                    | +8%                  | Accumulative
GitHub org presence (dev-tool brands)       | +7%                  | Permanent

The top three share common traits: each is machine-readable, authoritative, and consumed by LLM retrieval systems without any interpretation of page prose. Wikidata and schema.org markup speak directly to the entity resolution layer, and robots.txt is the file a compliant crawler reads before it requests anything else on the domain.
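To make "machine-readable" concrete, here is a minimal sketch of Organization markup with sameAs links, generated as JSON-LD. The brand name, URLs, and the Wikidata ID are placeholders invented for illustration, and the helper name organization_jsonld is hypothetical; only the schema.org vocabulary itself (Organization, name, url, sameAs) is standard.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Return a schema.org Organization description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs lists other authoritative profiles of the same entity.
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2)


markup = organization_jsonld(
    name="Example Co",  # hypothetical brand
    url="https://www.example.com",
    same_as=[
        "https://www.wikidata.org/wiki/Q00000000",  # placeholder entity ID
        "https://www.linkedin.com/company/example-co",  # placeholder profile
    ],
)

# JSON-LD is normally embedded in the page inside a script element.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

The sameAs array is where the entity layers connect: pointing it at the Wikidata item and other official profiles gives a consumer of the markup several identifiers for one organization.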

Why robots.txt access ranks second

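Many teams assume that if their site is live and appears in Google Search, LLM crawlers can access it. That assumption does not hold. LLM crawlers identify themselves with their own user-agent strings, so rules written for search crawlers such as Googlebot say nothing about them, and a restrictive wildcard group or a group that names an LLM crawler can block them even while the site ranks normally in search. Domains with misconfigured robots.txt are effectively invisible to the crawlers that feed citation systems, regardless of their content quality or entity coverage.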

The robots.txt playbook lists all 16 current LLM crawler user agents and the exact allow directives required for each. This is the fastest-ROI fix in the stack for most domains.
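A quick way to audit this is to test each crawler token against the robots.txt rules. The sketch below uses Python's standard urllib.robotparser on an inline sample file. The four tokens in LLM_AGENTS are an illustrative subset chosen for this example, not the playbook's list of 16, and the helper name blocked_agents is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of LLM-related crawler tokens (an assumption for this
# sketch, not the full list from the playbook).
LLM_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def blocked_agents(robots_txt: str, agents: list[str], path: str = "/") -> list[str]:
    """Return the agents that the given robots.txt text forbids from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


# Sample file: open to everyone by default, but one crawler is singled out.
SAMPLE = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

print(blocked_agents(SAMPLE, LLM_AGENTS))
```

For the sample above the check reports GPTBot as blocked, while the other tokens fall back to the wildcard group and are allowed. To audit a live domain, the same function can be fed the text of its robots.txt file.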

The compounding effect

The stack compounds. A brand with all top-three interventions in place does not see a 34 + 28 + 22 = 84% citation lift, which would be 1.84x. It sees approximately 2.8x the overall citation probability of a brand with none of them. The interventions reinforce each other: Wikidata data surfaces in the knowledge base, sameAs links validate the entity, and crawler access ensures the live site is regularly indexed to supplement the knowledge base.
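The arithmetic behind that comparison can be sketched as follows. The lifts are the medians from the ranking and 2.8x is the figure quoted above. Treating each median lift as an independent multiplier is an assumption made only to provide a baseline, not a model taken from the data.

```python
from math import prod

# Median lifts for the top three interventions, taken from the ranking.
lifts = {"wikidata": 0.34, "robots_txt": 0.28, "organization_schema": 0.22}

# Baseline 1: lifts simply add up.
additive = 1 + sum(lifts.values())

# Baseline 2 (assumption): lifts act as independent multipliers.
multiplicative = prod(1 + lift for lift in lifts.values())

# Combined figure quoted in the text.
reported = 2.8

print(f"additive baseline:       {additive:.2f}x")
print(f"multiplicative baseline: {multiplicative:.2f}x")
print(f"reported:                {reported:.2f}x")
print(f"reported / multiplicative: {reported / multiplicative:.2f}")
```

The additive baseline comes to 1.84x and the independent-multiplier baseline to roughly 2.09x. The reported 2.8x sits above both, which is consistent with the claim that the interventions reinforce each other rather than merely stacking.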

What this means

If you have not started, start at the top. Complete your Wikidata entity first, then audit robots.txt for LLM crawlers, then add Organization schema with sameAs links. The three together take less than a week and produce structural gains that compound over time. Scan your domain to see which layers you have and which are missing.

