Canonical URLs
Sites built with a CMS like Umbraco often expose the same content at more than one URL. There is usually both a friendly URL and a direct URL that uses the node ID to access the content.
It is not always possible (or easy) to control how a search engine like Google, Yahoo or Bing indexes your site, so you never know which version of a URL it has indexed. In some cases both URLs get indexed, which splits the page rank between them. To help the search bots, you can tell them which version of the URL you prefer.
UPDATE: There is now a package that automatically inserts the NiceUrl as the canonical link whenever that is not the URL the page was requested through - our.umbraco.org/.../canonical-meta-link-package
To state your preference you specify a canonical URL. This is a link tag that you place inside the HEAD element of the page, like this:
<HEAD>
<link rel="canonical" href="http://our.umbraco.org/wiki/how-tos/create-canonical-urls-using-xslt-macro" />
</HEAD>
You can read a lot more about this on the official Google Webmaster Central blog here:
googlewebmastercentral.blogspot.com/.../...al.html
Inserting this into your Umbraco site
Inserting this tag on all pages of your Umbraco site is quite easy. Just create an XSLT macro using the code below. I have named the XSLT file CanonicalNames.xslt.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#x00A0;"> ]>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:msxml="urn:schemas-microsoft-com:xslt"
  xmlns:umbraco.library="urn:umbraco.library"
  exclude-result-prefixes="msxml umbraco.library">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:param name="currentPage"/>
  <xsl:template match="/">
    <!-- Build the scheme and host part from the host name of the current request -->
    <xsl:variable name="url"
      select="concat('http://',umbraco.library:RequestServerVariables('HTTP_HOST'))" />
    <!-- Append the friendly URL of the current page to form the canonical URL -->
    <link rel="canonical" href="{$url}{umbraco.library:NiceUrl($currentPage/@id)}" />
  </xsl:template>
</xsl:stylesheet>
I then created a macro based on this XSLT file, named Canonical Names, with the alias CanonicalNames.
Then in my master template I insert the macro like this:
<%@ Master Language="C#" MasterPageFile="/umbraco/masterpages/default.master" AutoEventWireup="true" %>
<asp:Content ContentPlaceHolderID="ContentPlaceHolderDefault" runat="server">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Canonical Site!</title>
    <umbraco:Macro Alias="CanonicalNames" runat="server"></umbraco:Macro>
  </head>
  <body>
    <form id="MasterForm" runat="server">
      <asp:ContentPlaceHolder ID="MasterContent" runat="server"></asp:ContentPlaceHolder>
    </form>
  </body>
</html>
</asp:Content>
And presto, I have defined a canonical URL for every page on my site, and the search engines are happy!
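Be aware: this macro works fine for a single domain. If you have assigned multiple domains to the same content, the HTTP_HOST value changes with the domain the visitor used, so you should replace the url variable in the XSLT file with the one domain you prefer.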
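As a rough illustration of replacing the url variable, the sketch below hard-codes the preferred host instead of reading HTTP_HOST. The variable name preferredHost and the domain www.example.com are placeholders, not part of the original macro. It also assumes that NiceUrl returns a site-relative path, as the macro above does; if your installation returns absolute URLs for pages with assigned hostnames, the concatenation would need adjusting.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:umbraco.library="urn:umbraco.library"
  exclude-result-prefixes="umbraco.library">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:param name="currentPage"/>
  <!-- Assumption: www.example.com stands in for the single domain you want indexed -->
  <xsl:variable name="preferredHost" select="'http://www.example.com'"/>
  <xsl:template match="/">
    <!-- Every domain serving this page now emits the same canonical URL -->
    <link rel="canonical" href="{$preferredHost}{umbraco.library:NiceUrl($currentPage/@id)}" />
  </xsl:template>
</xsl:stylesheet>
```

With this version the canonical link no longer depends on which of the assigned domains the request came in on.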
Deprecated:
To BeanAnimal: You should redirect your users to only one version of the URL. If you prefer that they use the 'www.' version of the URL, then set up the 'non www' version to redirect to the 'www.' version. That way you don't have to worry about this either.
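One possible way to set up such a redirect, sketched here as a web.config fragment, is a rule for the IIS URL Rewrite module. This assumes the site runs on IIS 7 or later with that module installed, which the page does not state; example.com and the rule name RedirectToWww are placeholders for illustration.

```xml
<!-- Fragment for web.config, placed inside the <configuration> element -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="RedirectToWww" stopProcessing="true">
        <!-- Capture the requested path so it can be reused in the target URL -->
        <match url="(.*)" />
        <conditions>
          <!-- Only act on requests that arrive without the www. prefix -->
          <add input="{HTTP_HOST}" pattern="^example\.com$" />
        </conditions>
        <!-- Permanent (301) redirect to the same path on the www. host -->
        <action type="Redirect" url="http://www.example.com/{R:1}" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

A permanent redirect is used in this sketch because it tells search engines that the 'www.' address is the lasting one.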
BeanAnimal here... I have slightly changed the above XSLT code (I am new to XSLT, so please forgive me if this could have been done more easily) to add "www." to the canonical link if it is not already there. Part of the idea of the canonical link is to fix the problem of www.beananimal.com and beananimal.com not being seen as the same page by the search engines, which splits the page rank. The code posted above takes care of things like www.engineeredbyme.com, www.engineeredbyme.com/home.aspx and www.engineeredbyme.com/default.asp all pointing to the same page, but does not do anything about the lack of 'www'.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#x00A0;"> ]>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:msxml="urn:schemas-microsoft-com:xslt"
  xmlns:umbraco.library="urn:umbraco.library"
  exclude-result-prefixes="msxml umbraco.library">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:param name="currentPage"/>
  <xsl:template match="/">
    <xsl:choose>
      <!-- Host already starts with "www.": use it unchanged -->
      <xsl:when test="starts-with(umbraco.library:RequestServerVariables('HTTP_HOST'),'www.')">
        <xsl:variable name="url" select="concat('http://',umbraco.library:RequestServerVariables('HTTP_HOST'))" />
        <link rel="canonical" href="{$url}{umbraco.library:NiceUrl($currentPage/@id)}" />
      </xsl:when>
      <!-- Otherwise prepend "www." to the host -->
      <xsl:otherwise>
        <xsl:variable name="url" select="concat('http://www.',umbraco.library:RequestServerVariables('HTTP_HOST'))" />
        <link rel="canonical" href="{$url}{umbraco.library:NiceUrl($currentPage/@id)}" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>