Robots.txt for use with Umbraco

    The Robots Exclusion Protocol has been around for many years, yet many web developers are unaware of why they should have a robots.txt file in the root of their websites.

    There have been many rumours about whether the major search engine crawlers (e.g. Googlebot) consider a website amateurish if it has no robots.txt. Handled badly, a robots.txt can also leave your site invisible on SERPs.

    If you are happy for a crawler to crawl all of your website's content, then you can use the following:

    User-agent: *
    Disallow:
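    To check how a crawler reads this file, here is a minimal sketch using Python 3's standard-library urllib.robotparser module. It is my choice for illustration and is not part of Umbraco. The sample paths and the "Googlebot" user-agent string are assumptions. An empty Disallow value blocks nothing, so every path should come back as allowed.

```python
from urllib.robotparser import RobotFileParser

# The "allow everything" robots.txt shown above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory, without fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


allow_all = build_parser(ALLOW_ALL)

# Hypothetical paths, chosen only to illustrate the check.
for path in ["/", "/about-us/", "/umbraco/"]:
    print(path, allow_all.can_fetch("Googlebot", path))
```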

    However, when using Umbraco to power my websites, I prefer to define which folders crawlers may access. Personally, I would not want the contents of my /umbraco/ folder to appear in Google's SERPs.

    Here is an example of the robots.txt that I have used on several Umbraco-powered websites.

    # robots.txt for Umbraco
    User-agent: *
    Disallow: /aspnet_client/
    Disallow: /bin/
    Disallow: /config/
    Disallow: /css/
    Disallow: /data/
    Disallow: /macroScripts/
    Disallow: /scripts/
    Disallow: /umbraco/
    Disallow: /umbraco_client/
    Disallow: /usercontrols/
    Disallow: /xslt/

    From my perspective, there is no reason for a search engine crawler to crawl or index files in any of the above folders. If your perspective differs, amend your robots.txt accordingly.
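    Before deploying a file like this, it can be worth checking which URLs it actually blocks. The following is a rough sketch that assumes Python 3's standard-library urllib.robotparser. The page paths are hypothetical examples, not URLs from a real site. Python's parser uses the simple prefix matching of the original standard, so search engines that support extensions such as wildcards may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The Umbraco robots.txt from above; the parser ignores the comment line.
UMBRACO_ROBOTS = """\
# robots.txt for Umbraco
User-agent: *
Disallow: /aspnet_client/
Disallow: /bin/
Disallow: /config/
Disallow: /css/
Disallow: /data/
Disallow: /macroScripts/
Disallow: /scripts/
Disallow: /umbraco/
Disallow: /umbraco_client/
Disallow: /usercontrols/
Disallow: /xslt/
"""

parser = RobotFileParser()
parser.parse(UMBRACO_ROBOTS.splitlines())

# Hypothetical site paths: content pages versus files under blocked folders.
sample_paths = [
    "/",
    "/about-us/",
    "/umbraco/umbraco.aspx",
    "/css/site.css",
    "/macroscripts/nav.cshtml",  # lower-case: not matched, path matching is case-sensitive
]

for path in sample_paths:
    allowed = parser.can_fetch("Googlebot", path)
    print(f"{path:28} {'allowed' if allowed else 'blocked'}")
```

    The last sample path is a reminder that the Disallow values must use the same casing as the URLs on your site.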

    For more information about the robots.txt standard, please refer to the official website: http://www.robotstxt.org/robotstxt.html

    Note, however, that listing folders you do NOT want indexed also tells robots that they exist, which poses a potential security risk.
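    To illustrate the point: robots.txt is plain text served to anyone who requests it, so pulling the blocked folders out of it takes only a few lines. The helper below, `disallowed_paths`, is a hypothetical name of my own and not a library function. It is a simplified sketch that ignores User-agent grouping and lists every non-empty Disallow value.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Hypothetical helper: list every non-empty Disallow value, in file order.

    Simplified on purpose: User-agent groups are ignored.
    """
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


sample = """\
User-agent: *
Disallow: /config/
Disallow: /umbraco/
"""

print(disallowed_paths(sample))  # ['/config/', '/umbraco/']
```

    This is why robots.txt should never be relied on to hide anything. It only asks well-behaved crawlers to stay away, so folders such as /umbraco/ still need proper access control.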