Breaking Down Google Sitemaps XML

Since Danny Sullivan already covered the overview of Google Sitemaps, I’m going to take some time to explain the Sitemap protocol.

What is the Google Sitemap Protocol?

The Google Sitemap Protocol lets you tell Google which URLs on your web site are ready to be crawled. A Sitemap contains a list of URLs and some metadata about each one, such as when it was last modified, how frequently its content changes, and its priority relative to other pages.

A Google Sitemap is an XML file built from a handful of very simple tags, so if you know how to edit HTML files, you will be fine with Google Sitemaps. XML is stricter than HTML, though, so remember to escape your data values (fix those &’s by writing them as &amp;!).
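To make the reminder about encoding data values concrete, here is a minimal sketch in Python that escapes a URL before it goes into a Sitemap. The helper name encode_loc and the example.com URL are my own placeholders, not part of the protocol.

```python
from xml.sax.saxutils import escape

def encode_loc(url: str) -> str:
    """Entity-escape a URL so it is safe to place inside an XML element."""
    # escape() replaces &, < and > by default; the extra mapping
    # also covers single and double quotes.
    return escape(url, {"'": "&apos;", '"': "&quot;"})

# Hypothetical URL with a query string that contains a bare ampersand.
print(encode_loc("http://www.example.com/view?id=1&page=2"))
```

Only run this on raw URLs. A URL that already contains &amp; would be escaped a second time.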

What does a Google Sitemap look like?

A Google Sitemap uses 6 XML tags:

  • changefreq
  • lastmod
  • loc
  • priority
  • url
  • urlset

This is how part of my Google Sitemap looks:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://www.socialpatterns.com</loc>
<lastmod>2005-06-03T04:20:36Z</lastmod>
<changefreq>always</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://www.socialpatterns.com/new-post/</loc>
<lastmod>2005-06-02T20:20:36Z</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
</urlset>
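
If you would rather generate a file like the one above than type it by hand, something along these lines should work. It is only a sketch using Python's standard xml.etree.ElementTree module; the function name build_sitemap and the dictionary layout are assumptions of mine, and the values are copied from my example.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <urlset> line in the example above.
NS = "http://www.google.com/schemas/sitemap/0.84"

def build_sitemap(entries):
    """Return the Sitemap XML as UTF-8 bytes for a list of entry dicts."""
    urlset = ET.Element("urlset", {"xmlns": NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Write the child tags in the same order as the example.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

entries = [
    {"loc": "http://www.socialpatterns.com", "lastmod": "2005-06-03T04:20:36Z",
     "changefreq": "always", "priority": "1.0"},
    {"loc": "http://www.socialpatterns.com/new-post/", "lastmod": "2005-06-02T20:20:36Z",
     "changefreq": "daily", "priority": "0.8"},
]
print(build_sitemap(entries).decode("utf-8"))
```

ElementTree escapes characters such as & in the text for you, so pass raw URLs to this function rather than pre-escaped ones.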

What does that all mean?

I’ll explain each section so you can get a better idea of what each line means.

<?xml version="1.0" encoding="UTF-8"?>

This defines the file as XML and sets the correct UTF-8 encoding.

<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">

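This line opens the urlset element and declares the XML namespace (the Sitemap schema, version 0.84) that the rest of the file uses. It is the root element that wraps everything else, much like the <html> tag in an HTML page.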

<url>

This starts a URL entry. For each URL that you will include in your sitemap, you will need this tag and its closing tag.

<loc>http://www.socialpatterns.com</loc>

This tells Google what URL to crawl. One thing to note here: the URL must be less than 2049 characters.

<lastmod>2005-06-03T04:20:36Z</lastmod>

This tells Google when this URL’s content was last modified, which helps Google determine how recent the page is. The timestamp needs to follow the ISO 8601 format; the trailing Z means the time is in UTC.
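
For the lastmod value, a small helper like the following can produce a timestamp in the same shape as the one above. Treat it as a sketch: the name to_lastmod and the UTC-7 offset in the example are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

def to_lastmod(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC timestamp ending in Z."""
    # astimezone() converts to UTC first; a naive datetime would be
    # interpreted as local time, so pass an aware one.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Hypothetical local time seven hours behind UTC.
local_zone = timezone(timedelta(hours=-7))
print(to_lastmod(datetime(2005, 6, 2, 21, 20, 36, tzinfo=local_zone)))
```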

<changefreq>always</changefreq>

This tells Google how often this URL is updated. You can define this as always, hourly, daily, weekly, monthly, yearly, or never.

<priority>1.0</priority>

This shows Google how important this URL is compared to the rest of the URLs on your site. The value can range from 0.0 to 1.0. The higher the number, the more priority you are assigning.

</url>

This is the closing tag for one URL entry. If you have more URLs you want to include, you would repeat the above for as many URLs as you want.

</urlset>

This closes off your URL set and finishes up your Google Sitemap.
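
Before submitting, it can help to sanity-check the finished file against the rules described above. The following is a rough sketch, not an official validator: check_sitemap is a name I made up, the length limit is the figure quoted in this post, and the changefreq set includes never, which the protocol also defines.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.google.com/schemas/sitemap/0.84}"
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def check_sitemap(xml_bytes, max_loc_chars=2048):
    """Return a list of human-readable problems found in a Sitemap."""
    problems = []
    root = ET.fromstring(xml_bytes)
    if root.tag != NS + "urlset":
        problems.append("root element is not <urlset> in the 0.84 namespace")
    for i, url in enumerate(root.findall(NS + "url"), start=1):
        loc = url.findtext(NS + "loc")
        if not loc:
            problems.append(f"entry {i}: missing <loc>")
        elif len(loc) > max_loc_chars:
            problems.append(f"entry {i}: <loc> is longer than {max_loc_chars} characters")
        freq = url.findtext(NS + "changefreq")
        if freq is not None and freq not in CHANGEFREQ:
            problems.append(f"entry {i}: unknown changefreq {freq!r}")
        priority = url.findtext(NS + "priority")
        if priority is not None:
            try:
                in_range = 0.0 <= float(priority) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"entry {i}: priority {priority!r} is outside 0.0-1.0")
    return problems

# Deliberately broken sample to show the checks firing.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://www.socialpatterns.com</loc>
<changefreq>sometimes</changefreq>
<priority>1.5</priority>
</url>
</urlset>"""
for problem in check_sitemap(sample):
    print(problem)
```

A file that is not well-formed XML (an unescaped & for instance) will raise ET.ParseError instead of returning a list.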

OK, so how do I submit the darn thing?

After you are done creating your Google Sitemap, you will need to submit it. First, head over to the Google Sitemaps homepage and sign in (you’ll need a Google Account). After logging in, you will land on a screen where you can add a Sitemap.

Click on Add a Sitemap.

Enter the URL of your Google Sitemap and click Submit URL. And that’s it!

Update: Added a line about UTF-8 encoding. Also, if you need example code for generating a Sitemap, head over to my Google Sitemaps with WordPress article.
