1. Glossary
  2. Crawlers
  3. Sitemaps

Sitemaps

A sitemap is essentially a resource you create for your site to enable discovery by Google and other search engines. Providing resource metadata in the form of sitemaps is a key way to position your content to appear in search.

As a normal course of business, search engine web crawlers, also known as bots, robots or spiders crawl your pages and other website pages that link to your content to retrieve content. The concern of these crawlers is simply ‘Can I access this content?’ and ‘Is there any further content I can access?’ The content that crawlers retrieve from your site is what forms a given search engine’s index, which is what they use to return search results.

Relying on these ‘natural’ means of discovery can be adequate if you have a simple website and don’t rely on users discovering your site in search results, but not if you’re looking to improve your ranking in the search engine results page (SERP). Taking such a passive approach can cause problems for new content that you are eager to see in search, and obscure content you mark up especially for inclusion in rich results.

A sitemap provides search engine systems a direct list of URLs to your content, improving a crawler’s ability to find your pages as it no longer relies solely on your page’s relationship to other referring pages within your site and on the wider web.

What is a sitemap?

There are two types of sitemaps: user sitemaps and sitemaps for search engines. A user sitemap is a page on a website that helps users navigate it by listing and linking to important areas and content.

User sitemaps can be a useful way for visitors to quickly and easily find the information they need. Ideally, however, your website shouldn’t have to rely on a sitemap to improve overall user experience. Clear user menus and breadcrumb navigation should allow users to navigate content without having to reference a map of the website to orient themselves. User sitemaps tend to be in html format. You may still see a link to a user sitemap on older websites.

Search engines on the other hand, still rely strongly on sitemaps. Since they have limited time (or crawl budget) to index a site, search engine crawlers like Googlebot, Yahoo Slurp or MSNbot read the sitemap file first to more intelligently crawl your site. These sitemaps are in xml format and designed strictly for bots (this means they don’t look particularly pretty). A sitemap can usually be located at:

www.yourdomain.com/sitemap.xml

A search engine sitemap should detail all important URLs and exclude any pages with robots instructions not to index a page.

A search engine sitemap is a highly important aspect of any SEO campaign, because it reduces the chances of search engine crawlers overlooking new or recently updated content. Your sitemap connects the dots from page to page to page, and dictates the hierarchy of each of those pages and the importance of the page relative to other URLs within your site. It’s easiest to think of it as a flowchart:

sitemap-flowchart

For most sites, your navigation and internal links will be more complicated than this, which is why it’s lucky a sitemap doesn’t have to be generated manually—your content management system (CMS) or a number of third-party tools can auto-generate one for you. Still, it’s important to be aware of your site structure: A well optimised information architecture—organisation of subdomains and folders—allows search crawlers to take the path of least resistance through your site.

Your sitemap can also provide valuable metadata associated with the pages you list in your sitemap: Metadata is information about a web page, such as when the page was last updated, how often the page is changed and how it relates to other pages on your site. You can also use a sitemap to provide Google with metadata about specific types of content on your pages, including multimedia content. It’s all about feeding search engine crawlers the information they need to understand your website. The more they understand, the more likely you are to rank highly.

Why do you need a sitemap?

If your website pages are properly linked, search engine crawlers can usually discover most of your site without one. But a sitemap can improve the crawling of your site if your site is really large, contains an archive of content pages that are isolated or not linked well to one another, your site is new and has few external links to it (read more on this in link building), or your site uses multimedia and hard-to-parse content a search crawler might not otherwise be able to ‘read’.

To get an idea of what Google sees from your site perform the following Google search:

Site:yourwebsite.com

By putting “site:” in front of your domain name you will be requesting Google to list the pages Google has indexed for your site. If you see fewer results than you expect, it means there’s something about your site structure that’s blocking the Googlebot from ‘seeing’ your site.

This goes both ways:

Creating a sitemap applies too if there is content on your site you prefer not to register in search. Say, duplicate content that might be considered spammy, or a news page that competes in search with your homepage. If you create a website without providing a sitemap, search engines will attempt to find and index all the content on your site unless you specifically both exclude these from your sitemap and place instructions on the pages to noindex them. The contents of your robots.txt file can also tell crawlers which parts of a website they should and should not visit.

Submitting your sitemap to Google

Sitemaps can be submitted to Google via the Search Console Sitemaps tool. Your agency will either do this for you as part of your SEO campaign or will show you how to do it. This further encourages Google’s crawlers to reference your sitemap when it visits.

Google supports several sitemap formats, including XML, RSS, HTML and text files, but it must be UTF-8 encoded. You can submit multiple sitemaps and/or sitemap index files (if your site is large and requires multiple sitemaps).

Using a sitemap doesn’t guarantee search engines will crawl and index all the items as and how you tell them to, but your site will benefit hugely and you’ll never be penalised for having one.