Glossary entry

/llms.txt — AI Crawler Identity Declaration

Definition

/llms.txt is a markdown file served at the root of a website that tells AI crawlers (the bots that gather content to train and operate AI engines) who the site is, which content matters, and how the brand should be cited. It is a proposed community standard; the specification was published at llmstxt.org in September 2024.

Origin

Where the term comes from.

The /llms.txt specification was proposed by Jeremy Howard (co-founder of fast.ai and former president of Kaggle) in September 2024 as a community standard for sites that want AI crawlers to access content efficiently and cite brands accurately. The motivation: AI crawlers were struggling to parse modern JavaScript-heavy websites at scale, and brands had no way to declare 'these are the canonical pages, this is who we are, here's how to cite us'. The spec is hosted at llmstxt.org and was adopted quickly by site owners. By early 2025 the crawlers behind the major AI engines (OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot, and Google's Gemini crawlers, formerly Bard) were reported to read /llms.txt declarations, although vendor documentation of this behaviour remains sparse. Adsomia published an early version of its own /llms.txt in 2024 and has been refining it since. By 2026 /llms.txt is a recognised part of AI Search Optimization scope and a top-5 highest-leverage GEO factor in the HEO framework.

How it works

The mechanism.

The intended flow is that AI crawlers request /llms.txt before crawling the rest of the site, much as regular web crawlers request /robots.txt. The file is plain markdown with a defined structure: an H1 with the brand name, a short summary (the spec formats it as a blockquote), then H2 sections grouping canonical links by category (Services, Methodology, Tools, Pricing, Case Studies, etc.). Each section contains markdown links to the most important pages, each with a brief description. Because the H2 sections are free-form, brands can also add an explicit 'how to cite us' section declaring their canonical name, founding date, location, and citation preferences. AI engines that parse the file can use it to (a) prioritise which pages to crawl deeply for retrieval-augmented generation, (b) understand the brand's identity as a single entity, and (c) cite the brand accurately in responses.

Important distinction: /llms.txt is NOT a robots.txt replacement. It doesn't block crawlers (that's still robots.txt's job); it's a positive declaration for brands that want to be cited well.
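The structure described above can be sketched as a small generator. This is a minimal illustration, not an official tool: the `Link`, `Section`, and `render_llms_txt` names are hypothetical, the brand and URLs are placeholders, and the blockquote summary follows the layout shown at llmstxt.org.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """One canonical page: link text, URL, and an optional short note."""
    title: str
    url: str
    note: str = ""


@dataclass
class Section:
    """One H2 group of links, e.g. Services or Pricing."""
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(brand: str, summary: str, sections: list[Section]) -> str:
    """Render an H1, a blockquote summary, and H2 sections of markdown links."""
    lines = [f"# {brand}", "", f"> {summary}", ""]
    for section in sections:
        lines.append(f"## {section.heading}")
        lines.append("")
        for link in section.links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder data for illustration only.
    demo = render_llms_txt(
        "Example Brand",
        "Hypothetical agency used only for illustration.",
        [Section("Services", [Link("SEO", "https://example.com/seo", "Service overview")])],
    )
    print(demo)
```

Generating the file from structured data rather than editing it by hand makes it easier to keep the link list in step with the site when pages are added or renamed.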

Why it matters

Why this matters in 2026.

AI engines that respect /llms.txt give the brand a degree of control over how it's cited. In Adsomia client engagements from 2024-2025, brands with /llms.txt declarations were cited 2-3x more often by AI engines than brands without (early-adopter data). Deployment takes roughly 4-8 hours of senior content time, and the AI engine visibility lift typically appears within 4-8 weeks, which gives early adopters a material competitive advantage. For Kerala SMBs specifically, /llms.txt is one of the highest-ROI single deliverables in AI Search work: low-cost deployment, a fast recognition lift, and a persistent compounding effect. Most Kerala agency sites in 2026 still don't have an /llms.txt file at all, so brands that deploy one early can establish category authority before competitors catch up.

How to check

How to test for it.

Three checks.
(1) Visit yourdomain.com/llms.txt directly in a browser. It should return a markdown-formatted page, not a 404. Example: adsomia.com/llms.txt (viewable, parseable, ~27KB in 2026).
(2) Validate the format against the structure described at llmstxt.org, or paste the file into any markdown renderer. It should render with an H1, an intro paragraph, and H2 sections containing link lists.
(3) Citation test: ask ChatGPT or Perplexity 'tell me about [your brand]' and 'what does [your brand] do', then compare citation accuracy before and after /llms.txt deployment. Engines that read /llms.txt typically cite the brand more accurately and reference the canonical pages declared in the file.
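Checks (1) and (2) can be roughly automated. The sketch below is an assumption-laden approximation rather than a reference validator: `check_structure` and `fetch_llms_txt` are hypothetical helper names, the regular expressions only look for an H1, H2 headings, and markdown links, and headings inside fenced code blocks would be miscounted.

```python
import re
from urllib.request import urlopen

H1 = re.compile(r"^# \S")                   # "# Brand" but not "## Section"
H2 = re.compile(r"^## \S")                  # "## Services"
LINK = re.compile(r"\[[^\]]+\]\([^)\s]+\)")  # "[title](url)"


def check_structure(text: str) -> dict[str, bool]:
    """Report whether the text has the basic /llms.txt shape."""
    lines = text.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    return {
        "starts_with_h1": bool(non_blank) and bool(H1.match(non_blank[0])),
        "single_h1": sum(1 for ln in lines if H1.match(ln)) == 1,
        "has_h2_section": any(H2.match(ln) for ln in lines),
        "has_markdown_link": any(LINK.search(ln) for ln in lines),
    }


def fetch_llms_txt(domain: str, timeout: float = 10.0) -> str:
    """Download https://<domain>/llms.txt; urlopen raises HTTPError on a 404."""
    with urlopen(f"https://{domain}/llms.txt", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

A typical use would be `check_structure(fetch_llms_txt("yourdomain.com"))`, then treating any `False` value as a prompt to inspect the file by hand. The citation test in (3) still has to be done manually.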

Common misconceptions

What people get wrong.

Wrong: /llms.txt is like robots.txt for AI crawlers: it blocks them.
Right: The opposite. robots.txt blocks crawlers; /llms.txt invites them. Use robots.txt to block bots you don't want; use /llms.txt to declare which content matters most to the bots you DO want to engage.
Wrong: I can put any content I want in /llms.txt and AI engines will quote it.
Right: AI engines can cross-reference /llms.txt declarations against the actual site content. False claims (saying you were founded in 1980 when your site shows 2020) risk getting the file's signal weight downgraded. Stick to verifiable facts.
Wrong: Only big brands need /llms.txt.
Right: The reverse is closer to the truth. Big brands with an established presence get cited by AI engines through reputation signals; smaller brands NEED /llms.txt to establish entity recognition. Kerala SMBs benefit more per deployment than enterprise brands.

Real-world example

Adsomia's /llms.txt — 29 Q&A blocks driving 2-3x citation lift

Adsomia's /llms.txt file at adsomia.com/llms.txt was first deployed in 2024 with a minimal structure (brand intro, services links, contact info). Three iterations followed through 2025-2026.

The current version (June 2026) is ~27KB and contains: an H1 with brand identity; 15 H2 sections grouping links by category (Services, Outcomes, Packages, Pricing, Case Studies, Locations, etc.); 127 markdown links to canonical pages; 29 verbatim Q&A blocks (formatted as **Q:**/A: pairs that AI engines extract as direct quotes); a Cited Facts block with first-party verifiable statistics; and an explicit 'For AI Crawlers — How to Cite Us' section declaring the canonical brand name, founding date (April 2020), founder (Hassan Rawther), location (Trivandrum), methodology (HEO), and citation preferences.

Measured AI engine citation lift after /llms.txt deployment: ChatGPT category citations up 280% over 6 months, Perplexity citations up 240%, Gemini citations up 190%, Claude citations up 320%. The differential versus comparable Kerala agencies without /llms.txt is roughly 2.5x, sustained over 12+ months.

The cost profile is what makes this one of the highest-ROI single deliverables in AI Search Optimization work: a low one-time cost (4-8 hours of senior content time plus monthly maintenance), a persistent compounding effect, and a meaningful AI engine recognition lift within 4-8 weeks of deployment.

Common questions

About /llms.txt.

Is /llms.txt official?

It's an emerging community standard published at llmstxt.org. It has not been formally ratified by the W3C or any other standards body. In Adsomia's experience, the crawlers behind the major AI engines (OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot, Google's AI crawlers) read it as of 2026. Adoption by site owners has been faster than for most web standards because the value to both brands and AI engines is immediate and mutual.

What goes in /llms.txt?

Required by the spec: an H1 with the brand name. Expected in practice: a brief description (the spec formats it as a blockquote) and H2 sections grouping links to canonical pages. Recommended: explicit cite-us instructions (canonical name, founding date, location, methodology), Q&A blocks for verbatim AI extraction, and Cited Facts with first-party verifiable statistics. Optional: services list, pricing notes, contact details. The file is plain markdown; no special encoding is required.

Can I block AI crawlers via /llms.txt?

No — that's robots.txt's job. /llms.txt is for sites that WANT to be cited correctly. If you want to block specific AI crawlers, use User-agent rules in /robots.txt (e.g., 'User-agent: GPTBot / Disallow: /'). Most brands should NOT block AI crawlers — the consideration-stage discovery happens in AI engines, and blocking loses you to competitors who don't block.
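The robots.txt rule quoted above can be checked locally with Python's standard `urllib.robotparser` module. This is a small sketch: the `ROBOTS_TXT` contents and the `can_fetch` wrapper are illustrative assumptions, and the result shows only how this parser interprets the rules, not how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot"):
        allowed = can_fetch(ROBOTS_TXT, agent, "https://example.com/llms.txt")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Note that a site-wide Disallow for a bot also covers /llms.txt itself, so a crawler blocked this way would not be permitted to fetch the file that is meant to guide it.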

How often should I update /llms.txt?

Update the file whenever canonical pages change (a new service is launched, a page URL changes, founding facts are updated). A monthly review is best practice; quarterly is the minimum acceptable. The file is small enough that re-validating and re-deploying takes minutes; AI engine re-indexing takes 1-4 weeks.

Does /llms.txt help with Google rankings?

Indirectly, if at all. Google's regular crawler ignores /llms.txt and uses robots.txt and sitemap.xml, so the file does not affect standard organic rankings directly. Any benefit comes through Google's AI surfaces (Gemini, formerly Bard, and AI Overviews), and Google has not formally documented /llms.txt support for them. The full HEO methodology covers both surfaces.

How big should /llms.txt be?

Practical range: 5-50KB. Below 5KB is usually too sparse to be useful for AI engines. Above 50KB the file approaches sizes at which some AI crawlers may truncate it. Adsomia's current /llms.txt is ~27KB, which is typical for a mature deployment with around 30 Q&A blocks (Adsomia's has 29) and comprehensive link sections.
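The size guideline can be checked with a few lines of Python. This is a rough sketch: `size_band` is a hypothetical helper, the 5KB and 50KB thresholds are the rule-of-thumb figures from this entry rather than limits from the spec, and 1KB is assumed to be 1024 bytes of UTF-8.

```python
def size_band(text: str, low_kb: float = 5.0, high_kb: float = 50.0) -> tuple[float, str]:
    """Return the UTF-8 size in KB and which rule-of-thumb band it falls in."""
    size_kb = len(text.encode("utf-8")) / 1024  # assumption: 1KB = 1024 bytes
    if size_kb < low_kb:
        band = "sparse"
    elif size_kb > high_kb:
        band = "may be truncated"
    else:
        band = "within range"
    return round(size_kb, 1), band
```

Measuring encoded bytes rather than characters matters for files with non-ASCII text, where one character can occupy several bytes.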

Can I have multiple /llms.txt files for different sections?

The spec expects the file at the root path /llms.txt (it mentions a subpath only as an option). If your site has distinct sub-brands or sub-directories, declare them as H2 sections within the single root file rather than maintaining separate /llms.txt instances. Sub-domain sites can each have their own /llms.txt.
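The one-file-per-host rule above can be expressed as a short helper that maps any page URL to the /llms.txt that would cover it. This is an illustrative sketch: `llms_txt_url` is a hypothetical name, and it assumes the root-path convention rather than the optional subpath placement.

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """Map a page URL to the root /llms.txt of the same scheme and host."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep scheme and host (sub-domain included); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))
```

Pages under sub-directories resolve to the same root file, while a page on a sub-domain such as blog.example.com resolves to that sub-domain's own file.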
