Technical Signals
robots.txt
A plain-text file at the root of your domain that tells crawlers which paths they may or may not fetch.
Definition
robots.txt is a standard file at `yourdomain.com/robots.txt` that web crawlers check before requesting other pages. It groups `Allow` and `Disallow` rules under `User-agent` lines to specify which paths each crawler may fetch. The file is advisory rather than enforced: nothing technically prevents a crawler from ignoring it, although most AI crawlers choose to honor it.
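To make the grouping of directives concrete, here is a minimal sketch that uses Python's standard-library `urllib.robotparser` as a stand-in for a compliant crawler. The file contents, the `ExampleBot` and `OtherBot` names, and the paths are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for a named crawler, one catch-all group.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /drafts/

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# ExampleBot matches its own group, so only /drafts/ is off limits for it.
print(parser.can_fetch("ExampleBot", "https://yourdomain.com/drafts/post"))     # False
print(parser.can_fetch("ExampleBot", "https://yourdomain.com/private/report"))  # True

# OtherBot has no group of its own, so it falls back to the "*" group.
print(parser.can_fetch("OtherBot", "https://yourdomain.com/private/report"))    # False
print(parser.can_fetch("OtherBot", "https://yourdomain.com/drafts/post"))       # True
```

Note that a crawler with its own group ignores the `*` group entirely, which is why `ExampleBot` may fetch `/private/` in this sketch. Real crawlers may resolve edge cases, such as overlapping `Allow` and `Disallow` rules, differently from Python's parser.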
Why it matters for AI visibility
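A misconfigured or overly broad robots.txt is one of the most common reasons AI crawlers cannot access important pages. A single `Disallow: /` rule under `User-agent: *` blocks every compliant crawler from every page, unless the file also contains a more specific `User-agent` group for a given crawler, in which case that crawler follows its own group instead.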
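The effect of a blanket block can be checked with a short script. The sketch below again relies on Python's standard-library `urllib.robotparser`; the helper name `blocked_agents`, the list of crawler names, and the `/pricing` URL are assumptions made for this example, not a complete or authoritative list of AI crawlers.

```python
from urllib.robotparser import RobotFileParser

# Illustrative crawler names and URL; adjust both to the site being checked.
AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]
URL = "https://yourdomain.com/pricing"

# A catch-all group that disallows the whole site.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

# The same catch-all block, plus a specific group that re-opens the site for one crawler.
WITH_EXCEPTION = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> list[str]:
    """Return the agents that this robots.txt forbids from fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


print(blocked_agents(BLANKET_BLOCK, AGENTS, URL))
# ['GPTBot', 'ClaudeBot', 'PerplexityBot']
print(blocked_agents(WITH_EXCEPTION, AGENTS, URL))
# ['ClaudeBot', 'PerplexityBot']
```

Running a check like this against a copy of your own robots.txt, with the paths you most want crawled, is a quick way to spot an unintended blanket block.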