VEEZOW

AI Crawlers

Google-Extended

Google's robots.txt token for opting out of Gemini AI training, separate from regular search crawling.

Definition

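Google-Extended is a robots.txt product token Google introduced to give site owners control over whether content crawled from their sites is used to train Gemini models and to ground Gemini answers. It is not a separate crawler and has no HTTP user-agent string of its own: Google's existing crawlers do the fetching, and the token acts only as a control in robots.txt. Blocking it does not affect standard Google Search indexing or ranking, which are controlled by Googlebot. AI features inside Search, such as AI Overviews, also follow Googlebot rules rather than Google-Extended. Because access is allowed by default, Google-Extended works as an opt-out: content is eligible for AI training unless you disallow the token.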

Why it matters for AI visibility

If you block Google-Extended, your content may be underrepresented in Gemini training data and in Gemini's grounded answers. Because its rules are separate from Googlebot's, you can allow one while restricting the other, for example staying indexed in Google Search while opting out of Gemini training.
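
As a rough illustration of allowing one token while restricting the other, the sketch below checks a sample robots.txt with Python's standard urllib.robotparser module. The robots.txt contents, the example.com URL, and the is_allowed helper are hypothetical and exist only for this example. Python's parser is also a simplified matcher, so treat the result as a sanity check rather than a statement of how Google evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended (Gemini training)
# while leaving Googlebot (Search indexing) unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if the rules in robots_txt let `token` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


page = "https://example.com/pricing"  # placeholder URL
googlebot_ok = is_allowed(ROBOTS_TXT, "Googlebot", page)
extended_ok = is_allowed(ROBOTS_TXT, "Google-Extended", page)

print(f"Googlebot allowed: {googlebot_ok}")
print(f"Google-Extended allowed: {extended_ok}")
```

With this sample file, Googlebot is allowed and Google-Extended is not, which matches the "indexed in Search, opted out of AI training" setup. Swapping the two rule groups would express the opposite policy.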

Related

GPTBot: OpenAI's web crawler that fetches content to train and update its models.
ClaudeBot: Anthropic's crawler used to collect content for training and grounding Claude models.
PerplexityBot: Perplexity AI's crawler that indexes content for real-time answer generation.
robots.txt: A plain-text file at the root of your domain that tells crawlers which paths they may or may not fetch.
Checklist: AI crawler access. GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and other AI crawlers need clear permission to fetch important pages.

Check your site

The free scan checks crawler access, robots.txt, sitemap, structured data, and discoverability — and turns the results into a prioritized fix list.
