When Panda hit, thin or sub-optimal (read: duplicate or not useful) pages became liabilities. Sites have spent the last 3 years either updating, deleting or consolidating their content.
Big publishers and those dependent on search engines for revenue have been aggressive about cleaning out their inventories. I’ve also found that regular bloggers — either business or personal — have done a good job of assessing their content, but many sites are still carrying dead weight despite their best efforts.
While everyone rushed to remove or refine pages, many sites ignored their taxonomy and hierarchies or did cosmetic work to bulk them up. This means that a lot of weak content (from Google’s perspective) could still be on your site.
Category & Tag Pages Are Content Too
It makes sense that taxonomy pages would be ignored by some. We spend a lot of time on adding content, but organizing is usually done when sites launch and only re-addressed during redesigns. You also have to think about prioritization: ecoms depend on category structure while publishers often get the most hits on posts. It makes sense that you’d make sure that specific posts/entries were in good shape before even thinking about your taxonomy/hierarchy. Last is the issue of visibility: while top categories can be seen in navigation, tags are often only seen as nondescript links in headers or footers of posts, and sub-categories can languish unless a decent content strategy is in place.
The problem with ignoring category and tag pages is that they are places where duplication can naturally occur AND new ones have a tendency to sprout as different authors and admins join a site.
Unintentional Duplication of Categories & Tags
When I say duplicate categories I’m talking about the duplication of content in them or the use of names that are synonymous. There is nothing wrong with having the same content in multiple places if it helps your visitors, but if all the content is the same among categories, then you’re not giving search engines or users clear guidance on content value.
There are 3 common ways that I’ve seen taxonomy content get duplicated:
- Plurals and Typos: Someone creates a category called “chairs” and then someone creates a category called “chair” or “cahirs”. – We hope that an editor catches this and fixes it, but often it sits — and then gets used repeatedly. You end up with multiple categories with nearly identical names and content.
- Synonyms across taxonomies: You create a category called Human Resources Management, then someone creates a tag called HR Management. While they are different phrases, they mean the same thing and will often be used interchangeably. Adding to the problem is that well-intentioned writers might choose to use the category and the tag on the same piece of content.
- Over-specialization: Let’s imagine you interview an author or review their new book. You use books for the category, but then you create tags for the author’s name and book title. The category level is fine, but each tag has only one piece of content and it’s exactly the same.
All 3 of these issues are easy to avoid, but they are common if you have a multi-author site with weak permissions/user role rules or if you are creating categories, tags or terms without an over-arching content strategy.
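If you already have a list of term names, the plural and typo cases can be flagged automatically. Here is a minimal sketch using Python's standard difflib module. The function names and the 0.8 similarity threshold are my own assumptions for illustration, so tune the threshold against your own terms. Note that string similarity will not catch true synonyms like Human Resources Management vs. HR Management; those still need a human eye.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase and collapse whitespace so 'Chairs ' and 'chairs' compare equal."""
    return " ".join(name.lower().split())


def near_duplicates(terms, threshold=0.8):
    """Return (term_a, term_b, score) for pairs whose similarity meets the threshold.

    The threshold is an assumed starting point, not a recommended value.
    """
    pairs = []
    for a, b in combinations(sorted(set(terms)), 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


if __name__ == "__main__":
    terms = ["chairs", "chair", "cahirs", "tables", "Human Resources Management"]
    for a, b, score in near_duplicates(terms):
        print(a, "<->", b, score)
```

Treat the output as a review list, not a merge list: some similar-looking terms are legitimately different.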
Duplicate & Thin
You might have noticed that the last example leads to both duplicate content and thin content. This is the type of thing that Google targets by default: it’s bad site architecture and useless to visitors. Don’t freak out, but you should be looking to fix it. If it only happens in a couple of places, then it might not even matter. But if you know you’ve got lax editorial control, then it’s time to take a closer look.
With the other 2 examples, you might “get away with it” (avoid any type of penalty). But you’re basically competing against yourself on plurals and synonyms. Search engines will see that the categories are the same for all intents and purposes, then either discount both or treat one as less valuable (this usually depends on the link profile).
Casually Adding to Your Taxonomy
Sometimes you write a post and then realize that no category or tag really fits. This even happens to me since I will sometimes rant or digress from marketing or project management topics. You end up with content that doesn’t fit the site, and you decide to add a tag. What’s the big deal?
With categories, maybe you’re trying to boost your SEO by going after a new phrase or maybe you’ve realized that there are some very popular sub-topics that aren’t getting much visibility. So you add a category and move some content into it. Again, what’s the big deal?
The Big Deal: You Are Creating Empty, Ugly Holes in Your Site
Best case scenario, you start writing more to fill the new categories and tags. This is possible and maybe likely if you’re feeling good about the new topics. But what if you don’t write anything for these areas again? Or, if you write so sporadically that Google only sees an update once per quarter?
Google’s crawling of any site is finite. You create new areas to crawl and Google is going to spend time there. If the category or tag doesn’t have value to you or the search engine, then why waste a finite resource? You also have to consider how it looks that you are creating new thin content: how should Google interpret a site that regularly adds new pages with very little on them? Doing this occasionally shouldn’t hurt (it happens to everyone), yet if it happens regularly, then you start looking like a content mill chasing long-tail rankings. Google isn’t your friend and it doesn’t give sites the benefit of the doubt: if your site starts looking spammy, then expect a decline in traffic even if you don’t officially get a penalty.
Keeping Your Taxonomy Tight
The issues I’ve mentioned can be avoided by using tighter editorial guidelines, user permissions and a little self-discipline.
- Create a rigid hierarchy/information architecture for your site: Decide what your categories will be, what tags will be and how you will decide to add or remove them.
- Restrict the ability to add categories and tags: You might already be doing this by default, but user roles might have more rights than they need. Depending on your CMS, you might have editors or authors that can create new taxonomy items.
- Use an Editorial Calendar: Not only will it help you improve your production, it will help you track category/tag/topic usage and you’ll know what to use before you start writing. You can get a great one by Lee Odden over at his book site.
- Set Up Workflows: Make sure that content can’t go live until someone with a grasp of your taxonomy has approved the content.
- Track traffic to taxonomy pages: Build a report that filters for the taxonomy pages. This might be tricky if there are no obvious identifiers in your URLs (e.g. /category/), but it’s worth it. Try using page titles, parameters in URLs or a tag injection to get the necessary reporting.
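For the traffic report in the last item, here is a rough sketch of the filtering step in Python. It assumes your analytics tool can export a CSV with `page` and `pageviews` columns and that your taxonomy URLs start with /category/ or /tag/. Both the column names and the URL patterns are assumptions, so swap in whatever your setup actually produces.

```python
import csv
import io
import re
from collections import Counter

# Assumed URL patterns; adjust to whatever your CMS actually uses.
TAXONOMY_PATTERN = re.compile(r"^/(category|tag)/([^/?#]+)")


def taxonomy_pageviews(csv_text):
    """Sum pageviews per (taxonomy type, slug) from an analytics CSV export.

    Paginated URLs such as /category/seo/page/2/ are rolled up into the
    same term as /category/seo/.
    """
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = TAXONOMY_PATTERN.match(row["page"])
        if match:
            totals[(match.group(1), match.group(2))] += int(row["pageviews"])
    return totals


if __name__ == "__main__":
    # Made-up sample rows, only to show the expected input shape.
    sample = (
        "page,pageviews\n"
        "/category/seo/,120\n"
        "/category/seo/page/2/,15\n"
        "/tag/panda/,4\n"
        "/2014/05/some-post/,300\n"
    )
    for (kind, slug), views in taxonomy_pageviews(sample).most_common():
        print(kind, slug, views)
```

If your URLs have no obvious identifier, the same idea works against page titles: replace the regular expression with whatever marker your templates put on taxonomy pages.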
How to Find Existing Taxonomy Issues
- XML Sitemaps: Depending on your platform, you might have a sitemap of tags already. Pull this into Excel and then sort Ascending (a-z) to see if you’ve got duplicates.
- Export your article database: If the info is spread across multiple tables, then you may have to figure out exactly what info you need (try url, title to start). If you can get article counts with this export, then life will be much easier.
- Manually review your taxonomy lists in your admin: You can do this in almost every CMS and it will usually show you item counts too. The only problem is that these pages aren’t always that easy to navigate. Be prepared to do a lot of clicking around or cut n’ paste each page into Word or Excel to read more easily.
- Compare your taxonomy lists to your analytics reports: You can start with a VLOOKUP to get them side by side with performance data. Then look for ones that have no traffic. Note that you’ll want to use a long date range, since some areas won’t get traffic regularly.
- Crawl the Site: Xenu’s Link Sleuth and Screaming Frog are both great tools. SEO tools like Moz also have built-in crawlers. Use the reports the same way you’d use the XML file or database export. Sort, then review for synonyms or duplicates.
- Compare reports for different taxonomy types: Pull everything into Excel, then clean the URLs down to their last part. You should then be able to compare the 2 (or more) lists and spot places where you’ve got a category and a tag with the same name.
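If you would rather script the last couple of comparisons than do them in Excel, here is a minimal Python sketch. It trims each URL down to its last part, finds names used as both a category and a tag, and lists terms with no recorded traffic. The function names and the example.com URLs are hypothetical, and the pageview dictionary is assumed to come from your own analytics export.

```python
from urllib.parse import urlparse


def last_segment(url):
    """Return the final non-empty path segment of a URL, lowercased."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return parts[-1].lower() if parts else ""


def shared_slugs(category_urls, tag_urls):
    """Slugs that appear in both the category list and the tag list."""
    categories = {last_segment(u) for u in category_urls}
    tags = {last_segment(u) for u in tag_urls}
    return sorted((categories & tags) - {""})


def without_traffic(slugs, pageviews_by_slug):
    """Slugs with zero or missing pageviews over the chosen date range."""
    return sorted(s for s in slugs if pageviews_by_slug.get(s, 0) == 0)


if __name__ == "__main__":
    # Hypothetical URLs standing in for a sitemap or crawl export.
    category_urls = [
        "https://example.com/category/hr-management/",
        "https://example.com/category/chairs/",
    ]
    tag_urls = [
        "https://example.com/tag/HR-Management",
        "https://example.com/tag/panda/",
    ]
    print(shared_slugs(category_urls, tag_urls))
    print(without_traffic(["chairs", "panda"], {"chairs": 12}))
```

Lowercasing the slug is a deliberate choice here: it treats HR-Management and hr-management as the same name, which is usually what you want when hunting for overlap.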
How to Fix Existing Taxonomy Issues
If you’re a masochist, then you can spend a lot of time manually deleting categories and moving content around. But you’re better off using a tool — either something built-in or an add-on. Here are a few I know about.
SiteCore: Taxonomy Module
I’ve heard that the latest version builds in more user-friendly taxonomy management, but this module is what I know. You’ll need to be familiar with item relationships to use this, but it’s got several nice features: it shows related tags/categories, lets you weight tags and lets you live edit them in Editor view.
Drupal: Manage Multiple Terms
This tool gives you a straightforward view of your terms and allows you to easily edit, delete and create. As always, know what version of Drupal you’re using before installing.
WordPress: Term Management Tools
I use this tool quarterly to clean things up on my site. It allows you to merge, delete, switch taxonomy and rename from the category and tag pages of WordPress.
SharePoint: Use the Built-in Tools
This may seem strange, but I think the OOB term management tools in SharePoint 2013 are your best bet. Terms and Enterprise keywords are used all over the place and you’ll want a firm grasp of your term sets whenever you mess with anything. Try this article.
Architecture is Opportunity
How you build and maintain your site hierarchies and taxonomies, and how you organize info in general, is a key factor in how easily search engines crawl your site, determine relevance and rank you. And this is just as important for users as it is for search engines. A well-architected site is easier to navigate and lets you leverage tools like Related Articles and site search better.
Taking the time to clean out duplicate or thin taxonomy types will pay off for you, as will giving consistent attention to architecture.
Featured image from Wiki Commons
Double Impact Poster from AlphaCoders
Black Hole image from UniverseToday (the article is very cool).