Bloggers who publish with WordPress should take special care to optimize their sites for search engines by uploading a custom robots.txt file. The robots.txt file is a plain-text file that tells search engine crawlers which pages they should and should not crawl.
You may be wondering why you should prevent search engines from indexing certain parts of your blog. After all, isn’t getting your posts ranked the point of SEO? In this case, not necessarily. An important part of search engine optimization is making sure you are not publishing duplicate content. Search engines don’t like duplicate content, and it can hurt your rankings.
Where do your words go?
You need to know where your words end up after you hit the “Publish” button. Think of all the different places a post will appear within your WordPress blog: permalinks, category archives, dated archives, search results, and possibly more, depending on your chosen plugins. If the full text of a post appears on its own permanent page, on a category archive page, and on a dated archive page, search engines see three separate copies of the same post; in other words, duplicate content.
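To see this duplication on your own blog, one rough audit is to check which of your pages contain a post's full text. The minimal Python sketch below assumes you have already downloaded each page's visible text into a dictionary keyed by path; the paths, page contents, and the helper names normalize and pages_containing are all hypothetical, and a real check would also need to strip HTML and deal with excerpts.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so differences in line wrapping don't hide a match."""
    return re.sub(r"\s+", " ", text).strip()


def pages_containing(post_text: str, pages: dict[str, str]) -> list[str]:
    """Return, in sorted order, the paths whose page text includes the full post text."""
    needle = normalize(post_text)
    return sorted(path for path, body in pages.items() if needle in normalize(body))


if __name__ == "__main__":
    # Hypothetical post and pages; in practice you would fetch these from your blog.
    post = "Full text of a post about robots.txt."
    pages = {
        "/2008/04/robots-txt-for-wordpress/": "Site header\n" + post + "\nComments",
        "/zcategories/wordpress/": "Newer post text\n" + post,
        "/2008/04/": post + "\nAnother post from April",
        "/about/": "About this blog",
    }
    print(pages_containing(post, pages))
```

Three of the four hypothetical pages come back: one post, three copies.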
My approach is to point search engines to the permanent link for each of my posts. This is the “permalink” page, which WordPress calls the “single post” page. Categories and search results are tools for my visitors, not for search engines. I also keep search engines out of my RSS feeds, since they are full-text feeds that repeat every post in its entirety.
robots.txt goes in the root directory of your site, alongside index.php. Here are some robots.txt rules that implement these suggestions for Of Zen and Computing’s URL structure:
User-agent: *
Disallow: /feed/
Disallow: /*/feed/
Disallow: /zcategories/
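To check which URLs a set of rules actually blocks, here is a minimal Python sketch of the matching described in RFC 9309, the robots.txt standard: "*" matches any run of characters, a trailing "$" anchors the end of the path, and the longest matching pattern wins, with Allow winning a tie. It assumes every path pattern starts with "/", as the standard requires, so the catch-all feed rule is written as "/feed/" plus "/*/feed/". The paths are hypothetical, the parser only reads the group whose User-agent is exactly "*", and it skips details such as percent-encoding normalization, so treat it as an illustration rather than a faithful copy of any search engine's parser.

```python
import re

# The rules from this post, written so that every pattern starts with "/".
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /*/feed/
Disallow: /zcategories/
"""


def parse_rules(robots_txt: str, agent: str = "*") -> list[tuple[str, str]]:
    """Collect (directive, pattern) pairs from groups whose User-agent equals `agent`.

    Simplification: real crawlers pick the group naming them and fall back
    to "*"; this sketch only does an exact comparison.
    """
    rules = []
    applies = False
    in_agent_lines = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # directive names are case-insensitive
        if field == "user-agent":
            if not in_agent_lines:
                applies = False  # a User-agent line after rules starts a new group
            in_agent_lines = True
            applies = applies or value == agent
        elif field in ("allow", "disallow"):
            in_agent_lines = False
            if applies:
                rules.append((field, value))
    return rules


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern: '*' matches anything, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply the longest matching rule; on a tie between Allow and Disallow, Allow wins."""
    best_length = -1
    allowed = True  # if nothing matches, the path may be crawled
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if pattern_to_regex(pattern).match(path):  # match() anchors at the path start
            longer = len(pattern) > best_length
            tie_allow = len(pattern) == best_length and directive == "allow"
            if longer or tie_allow:
                best_length = len(pattern)
                allowed = directive == "allow"
    return allowed


if __name__ == "__main__":
    rules = parse_rules(ROBOTS_TXT)
    # Hypothetical paths modeled on a date-based permalink structure.
    for path in ["/2008/04/my-post/", "/2008/04/my-post/feed/", "/feed/", "/zcategories/wordpress/"]:
        print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

Of these sample paths, only the single-post permalink comes back allowed; the feeds and the category archive are blocked.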
Here is my actual robots.txt file. In addition to directives for controlling WordPress, it also has a few lines that tell search engines how to handle the forums.
More tips
For more tips on optimizing robots.txt for WordPress, check out Shoemoney’s suggestions. And keep in mind that, like Shoemoney, I am not an SEO. I’ve just been using this method for a while, and it has worked well for me. If you have anything to add, especially a better way to handle feeds, feel free to comment.

1 response
April 24th, 2008
Ross says:
The robots.txt file is important for sure. But I’ve found the single most helpful thing to getting your stuff indexed is a valid sitemap.xml submitted on a regular basis to Google, Yahoo etc.