XML sitemap

What is a sitemap or sitemap.xml?

A sitemap is a file in which you provide information about the pages, videos, and other files on your site and the relationships between them. Search engines such as Google, Bing, DuckDuckGo, and others read this file to crawl your site more efficiently. A sitemap tells them which pages and files you consider important and provides useful information about those files, for example when a page was last updated, how often it should be re-crawled, and whether it has alternate language versions.

You can use a sitemap to show which content search engine bots should crawl. Conversely, leaving URLs or files out of the sitemap indirectly signals which content you do not want crawled. Content marked with the “noindex” tag is usually left out of the sitemap.

An XML sitemap can speed up content discovery and indexing by search engine crawlers, even for pages that the internal URL structure and navigation do not point to and that would otherwise be hard to find.

Google was the first search engine to support sitemaps, introducing Sitemaps 0.84 in June 2005. In November 2006, Google, Yahoo! and Microsoft announced joint support for the Sitemaps protocol. The schema version then changed to Sitemap 0.90, published on sitemaps.org, and it remains current today. The Sitemap 0.90 protocol is offered under the Attribution-ShareAlike Creative Commons License and is widely adopted, including by Google, Yahoo!, Bing, Baidu, Yandex, DuckDuckGo, Ask.com and others.

What markup language is used for a website sitemap?

An XML (Extensible Markup Language) sitemap is a file that lists all or selected pages of a website so that search engines can find and crawl them. XML sitemaps are an easy way for web admins to inform search engines about the pages on their sites that are available for crawling.
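
To make the file format concrete, here is a minimal Python sketch that builds a small sitemap with the standard library. The example.com URLs, the dates, and the helper name build_sitemap are assumptions made up for illustration; the element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the Sitemap 0.90 protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap 0.90 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as bytes) for an iterable of page dicts.

    Each dict needs "loc"; "lastmod", "changefreq" and "priority" are optional.
    """
    ET.register_namespace("", SITEMAP_NS)  # serialize without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(page[tag])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages, for illustration only.
example_pages = [
    {
        "loc": "https://www.example.com/",
        "lastmod": "2024-01-15",
        "changefreq": "weekly",
        "priority": "1.0",
    },
    {"loc": "https://www.example.com/about/"},
]

sitemap_xml = build_sitemap(example_pages)
print(sitemap_xml.decode("utf-8"))
```

Only loc is required for each entry; the other three elements are optional hints for the crawler.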

The XML Sitemaps protocol is based on ideas from “Crawler-friendly Web Servers”, with improvements including auto-discovery through robots.txt and the ability to specify the priority and change frequency of pages.

An XML sitemap can include information about pages, videos, and other files on your website, as well as the relationships between pages and other pieces of content. It can also be divided into separate XML sitemaps for different types of content, for example:

  • posts (articles)
  • news
  • pages
  • media
  • video
  • features
  • other

When a site has separate sitemaps per content type (or split by any other criterion), the best practice is to create a sitemap index that links to the individual sitemaps.

XML sitemap index example
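
A sitemap index is itself a small XML file: a sitemapindex root element with one sitemap entry (holding a loc child) per individual sitemap. The Python sketch below generates one such index. The file names and the example.com domain are hypothetical, build_sitemap_index is an illustrative helper rather than part of any tool, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap 0.90 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (as a string) linking to the given sitemap URLs."""
    ET.register_namespace("", SITEMAP_NS)  # serialize without a prefix
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
    ET.indent(index)  # pretty-print for readability (Python 3.9+)
    return ET.tostring(index, encoding="unicode")


# Hypothetical per-content-type sitemaps, for illustration only.
sitemap_files = [
    "https://www.example.com/posts-sitemap.xml",
    "https://www.example.com/pages-sitemap.xml",
    "https://www.example.com/video-sitemap.xml",
]

index_xml = build_sitemap_index(sitemap_files)
print(index_xml)
```

Search engines then only need the URL of the index file to discover every sitemap it lists.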

The Sitemaps protocol was created based on ideas from “Crawler-friendly Web Servers,” with a few improvements. One meaningful improvement was auto-discovery of an XML sitemap through the robots.txt file. Another was the ability to specify the crawl priority and change frequency of pages. You can find the protocol details on sitemaps.org and learn more about sitemaps on the Google Developers website.
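
Auto-discovery works through a Sitemap: line in robots.txt that gives the full URL of the sitemap or sitemap index. As a rough illustration, the sketch below reads such a line with Python's standard urllib.robotparser module (its site_maps() method is available from Python 3.8). The robots.txt content and the example.com URL are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://www.example.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of sitemap URLs, or None if no Sitemap line exists.
sitemaps = parser.site_maps()
print(sitemaps)
```

A crawler that reads robots.txt this way can find the sitemap without it being submitted separately.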

You can generate an XML sitemap with one of the many available tools, including online ones, and publish it to the website’s main folder via FTP. You can also prepare it manually in any text editor or spreadsheet, but you need to make sure the file follows the rules published on sitemaps.org.

WordPress users can choose from several plugins that automatically generate an XML sitemap, set priorities for selected types of content, include or exclude content from indexing, and offer other features.
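
If you prepare the file by hand, a quick automated check can catch the most common mistakes. The sketch below tests a few of the rules from sitemaps.org: well-formed XML, a urlset root in the protocol namespace, at most 50,000 URLs, at most 50 MB uncompressed, and a loc value under 2,048 characters in every entry. It is not a full schema validation, and check_sitemap is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, uncompressed
MAX_LOC_LENGTH = 2048  # <loc> must be shorter than this


def check_sitemap(xml_bytes):
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50 MB uncompressed")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not <urlset> in the Sitemap 0.9 namespace")
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(urls) > MAX_URLS:
        problems.append("more than 50,000 URLs")
    for url in urls:
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
        if not loc or not loc.strip():
            problems.append("a <url> entry has no <loc>")
        elif len(loc.strip()) >= MAX_LOC_LENGTH:
            problems.append(f"<loc> too long: {loc.strip()[:60]}...")
    return problems


# Hypothetical hand-written sitemap, for illustration only.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://www.example.com/</loc></url>"
    b"</urlset>"
)
print(check_sitemap(sample))
```

A file that exceeds the URL or size limit should be split into several sitemaps that are linked from a sitemap index.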

You can also read about what an HTML sitemap is.