What Is a Robots.txt File?




It’s easy to overlook the importance of robots.txt. However, this text file is a critical component of your on-page SEO. Why? Because a robots.txt file is effectively a roadmap for search engine bots, telling them which pages they can – and cannot – access on your website.  

What Is Robots.txt?

So, what is a robots.txt file, and why is it important? In essence, a robots.txt file supplies search engine bots with a set of instructions. Bots such as web crawlers read these instructions and manage their activities accordingly. Each instruction typically boils down to one of two things: a bot can, or cannot, crawl a given part of your website.

The bots themselves are automated computer programs, and well-behaved bots follow the rules you set. Now you might be wondering, why do you need to manage the activities of web crawlers? One reason is to avoid indexing any pages not intended for public view; another is to ensure your site’s web server isn’t being overtaxed.

Below is a robots.txt example:

User-Agent: * 

Disallow: /*__* 

Sitemap: https://www.yourwebsite.com/sitemap_index.xml 
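To see how a crawler interprets rules like these, you can use Python’s standard-library parser, `urllib.robotparser`. One caveat: it implements the original robots exclusion protocol and doesn’t understand wildcard patterns like the `/*__*` rule above, so this sketch uses a plain path (`/private/`, a hypothetical example) instead.

```python
# Minimal sketch: checking robots.txt rules with Python's standard library.
import urllib.robotparser

# A simplified version of the example above, using a plain path because
# urllib.robotparser does not support "*" wildcards inside rules.
rules = """\
User-Agent: *
Disallow: /private/
Sitemap: https://www.yourwebsite.com/sitemap_index.xml
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Ordinary pages may be crawled...
print(parser.can_fetch("*", "https://www.yourwebsite.com/blog/"))        # True
# ...but anything under the disallowed path may not.
print(parser.can_fetch("*", "https://www.yourwebsite.com/private/page")) # False
```

The same `can_fetch` check is what a polite crawler runs before requesting each URL.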

How Does a Robots.txt File Function? 

Now you know what robots.txt is, the next step is understanding how it functions. As you can likely gather from the .txt extension, robots.txt is simply a text file – no HTML code included. Nevertheless, robots.txt is still hosted like any other file found on a web server.

If you don’t believe us, simply take a website’s homepage URL and add “/robots.txt” to the end. Try it out for yourself; you can take a look at our robots.txt file here: https://www.clickintelligence.co.uk/robots.txt

Even though it’s there for everyone to see and access, users won’t stumble upon your website’s robots.txt file. Why? The answer is simple: there’s no reason to link to it on your site – unless you’re doing a demonstration as we did above! Nevertheless, this file is often the first stop for crawler bots before they crawl the rest of your site’s content.

Crawler bots do this so they can follow your instructions and know which pages to access. Note that if the robots.txt file features contradictory commands, the bot follows the more granular command. Also, each subdomain requires its own robots.txt file.

What Is a User Agent?  

In the example of a robots.txt file above, you may have been confused by the part that says “User-Agent: *”.

In a robots.txt file, the user-agent line allows you to address instructions to specific agents – aka bots. As an example, you could decide to have a certain webpage appear in search results for Bing, but not in Google results. In this case, you can set a disallow command under “User-agent: Googlebot”, and the opposite under “User-agent: Bingbot”.

If the text file uses the asterisk option (“User-agent: *”), the instructions that follow apply to all bots – not just one specific crawler you’ve highlighted.
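The Googlebot-versus-Bingbot scenario above can be sketched with Python’s standard-library parser. The `/landing-page/` path is a hypothetical example.

```python
# Per-bot rules: block a page for Googlebot while leaving Bingbot free.
import urllib.robotparser

rules = """\
User-agent: Googlebot
Disallow: /landing-page/

User-agent: Bingbot
Disallow:
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

url = "https://www.yourwebsite.com/landing-page/"
print(parser.can_fetch("Googlebot", url))  # False - blocked for Google
print(parser.can_fetch("Bingbot", url))    # True  - allowed for Bing
```

Each named user-agent group only binds the bot it names; a crawler not matched by any group falls back to the “User-agent: *” rules, if present.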

How Does the Disallow Command Work?  

When it comes to the robots exclusion protocol (REP), one of the most prevalent commands is “Disallow”. In a robots.txt file, this directive tells bots not to access the specific webpage – or collection of webpages – that follows the command.

Is a disallowed page hidden completely? Not necessarily. However, you might decide to opt against a webpage showing up in search results because it’s not useful for your audience. If need be, a user will be able to find this page if they know how to navigate to it on your site.  

With the robots.txt disallow directive, there are various command options available. The most common include:

  • Single webpage block: The disallow directive allows you to block a single webpage. For instance, this could be a blog post or an “about us” page.
  • Full website block: If you don’t want your website to appear in search results, you can disallow the entire site. This is achieved with a “Disallow: /” command, as the “/” represents the root of the site’s entire hierarchy.
  • Full access: What if you don’t want to disallow any of your web pages? In this situation, you can go with an empty “Disallow:” command. That lets the bots know they can crawl your whole website.
  • Block a directory: Rather than disallowing your full site, you can block individual directories. This is a more efficient approach when you want to block numerous pages in one go.
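The blocking options above can be sketched with Python’s standard-library parser; the paths used here (`/about-us.html`, `/drafts/`) are hypothetical examples.

```python
# The four common Disallow patterns, checked with urllib.robotparser.
import urllib.robotparser

def build(rules: str) -> urllib.robotparser.RobotFileParser:
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(rules.splitlines())
    return parser

site = "https://www.yourwebsite.com"

# Single webpage block: only this one page is off-limits.
page_block = build("User-agent: *\nDisallow: /about-us.html")
print(page_block.can_fetch("*", f"{site}/about-us.html"))  # False
print(page_block.can_fetch("*", f"{site}/blog/"))          # True

# Full website block: "/" matches the root of the entire hierarchy.
full_block = build("User-agent: *\nDisallow: /")
print(full_block.can_fetch("*", f"{site}/blog/"))          # False

# Full access: an empty Disallow lets bots crawl everything.
full_access = build("User-agent: *\nDisallow:")
print(full_access.can_fetch("*", f"{site}/blog/"))         # True

# Directory block: every page under /drafts/ is excluded in one go.
dir_block = build("User-agent: *\nDisallow: /drafts/")
print(dir_block.can_fetch("*", f"{site}/drafts/post-1"))   # False
```

Note that rule paths are prefix matches: blocking a directory automatically covers every URL beneath it.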

What Other Commands Are There? 

Even though it’s by far the most frequently used command, disallow isn’t the only one. Here are the other two commands available in robots.txt:

Allow  

Not much explanation is required with this command. Rather than disallowing a page from being available to bots, you can “Allow” a specific webpage or directory to be accessed. This is useful if you’re, for example, disallowing an entire directory, but you want to allow the bots to access a particular webpage within said directory.  
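That directory-plus-exception pattern can be sketched as follows, with a hypothetical `/private/` directory and `faq.html` page. One caveat: Python’s standard-library parser applies rules in order (first match wins), whereas Google uses the most specific match – so the Allow line is listed first here to get the same effect.

```python
# Allow in action: block a directory, but keep one page inside it crawlable.
import urllib.robotparser

rules = """\
User-agent: *
Allow: /private/faq.html
Disallow: /private/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

site = "https://www.yourwebsite.com"
print(parser.can_fetch("*", f"{site}/private/faq.html"))  # True  - explicitly allowed
print(parser.can_fetch("*", f"{site}/private/report"))    # False - rest of directory blocked
```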

Crawl-Delay 

The crawl-delay command helps prevent bots from putting too much strain on your server. The delay makes a bot wait a specified number of seconds between each request. If you want bots to wait 12 seconds, this is how to enter the command:

Crawl-delay: 12 

Note that while some other search engines recognise the crawl delay, Google doesn’t. However, if required, you can alter the crawl frequency within Google Search Console.
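For crawlers that do honour the directive, Python’s standard-library parser can read the value back, which is a quick way to sanity-check your file:

```python
# Reading a Crawl-delay value with urllib.robotparser.
import urllib.robotparser

parser = urllib.robotparser.RobotFileParser()
parser.parse("""\
User-agent: *
Crawl-delay: 12
""".splitlines())

# A polite crawler would sleep this many seconds between requests.
print(parser.crawl_delay("*"))  # 12
```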

Using the Robots.txt Tester 

Do you want to ensure your robots.txt file is functioning as expected? The good news is that validating its crawl directives is an easy task, thanks to the robots.txt tester tool provided by none other than Google itself, which you can find in Google Search Console. Google also provides a handy rundown, enabling users to know what to do when utilising the tester.

As you might expect, there is a notable limitation with this tool: it can only be used to test your robots txt file with Google’s web crawler. Google “cannot predict how other web crawlers interpret” this file.  

Need Help? 

Those with experience and a decent amount of knowledge should find using the robots.txt file and robots.txt tester tool fairly simple, especially if the intention is to use it to check SEO performance. However, not everyone’s an expert – and that’s okay. If you find your mind boggled by the process – particularly when it comes to SEO strategy – don’t panic; our gurus at Click Intelligence are here to help. Simply get in touch for assistance! 
