Sites have been using the rel=canonical link element for a while now to help offset duplicate content issues that cannot be eliminated through URL consolidation.
While not a perfect solution, it is often the best option available in cases such as duplicate content caused by appending tracking codes to URLs (e.g. ?source=rss, ?xid=newsletter, ?nav=home).
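To make the duplication concrete, here is a minimal Python sketch that strips tracking parameters so that several coded URLs collapse to one address. The parameter names come from the examples above; the `TRACKING_PARAMS` set, the `strip_tracking` helper and the example.com URLs are hypothetical, and a real site would need its own list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these are the only tracking parameters in use (taken from the examples above).
TRACKING_PARAMS = {"source", "xid", "nav"}


def strip_tracking(url: str) -> str:
    """Return the URL with known tracking parameters removed from the query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Three coded variants of one (hypothetical) article URL.
variants = [
    "http://example.com/story?source=rss",
    "http://example.com/story?xid=newsletter",
    "http://example.com/story?nav=home",
]
unique_pages = {strip_tracking(u) for u in variants}
```

All three variants reduce to the same page, which is the URL a rel=canonical element would typically point to. Parameters that are not in the list, such as an article ID, are left untouched.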
Marketing teams like to track referrals from newsletters, RSS, partnerships, internal navigation, etc. and they typically don’t want to give that up. So rel=canonical has been a welcome tactic and it works relatively well at least for Google (c’mon Bing, you can do it!).
But you still end up with lots of inbound links to non-canonical URLs because users will naturally link to and share the URLs they see in their browsers. While rel=canonical in theory sorts out the indexation and link popularity issues, it doesn’t work perfectly, and not every engine supports it or processes it effectively.
Furthermore, if a link coded with “newsletter” gets shared widely on Twitter or somewhere else, the data provided by that tracking code is no longer an accurate representation of newsletter traffic.
Here’s an idea: why not let browsers help out?
For pages with a rel=canonical link element, browsers could display the canonical URL instead of whatever duplicate URL the user may have clicked on to arrive at the page. The referral to the coded URL would still be passed to the site’s analytics software but users would see the canonical URL. Thus any resulting new links would point to the canonical URLs.
Users would get cleaner, simpler, consistent URLs and over time there would be fewer links to duplicate URLs and less work for the engines to do in processing duplicate content.
There may be technical reasons why this wouldn’t work and it could only be applied when the canonical URL is on the same domain (though cross-domain implementation of rel=canonical is still rare). Plus not all causes of duplicate content are as straightforward as tracking codes so there might be situations in which displaying the canonical URL in the browser could create a confusing user experience. But it seems like an interesting idea.
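As a rough illustration of the proposal, the Python sketch below picks the URL a browser might show for a page: the canonical URL when the page declares one on the same host, and the requested URL otherwise. This is only a sketch under stated assumptions, not how any browser works; the `CanonicalFinder` class, the `display_url` function and the example.com URLs are hypothetical, and comparing hostnames is just one possible reading of "same domain".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def display_url(requested_url: str, html: str) -> str:
    """Hypothetical rule for choosing the address-bar URL."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return requested_url  # no canonical declared: show what was requested
    canonical = urljoin(requested_url, finder.canonical)  # resolve relative hrefs
    # Assumption: only swap when the canonical URL is on the same host.
    if urlsplit(canonical).hostname != urlsplit(requested_url).hostname:
        return requested_url
    return canonical


page = '<html><head><link rel="canonical" href="http://example.com/story"></head></html>'
shown = display_url("http://example.com/story?source=rss", page)
```

In this sketch the request still goes to the coded URL, so the site's analytics would record the tracking code; only the displayed address changes.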
Google could test it with Google Chrome to see if it can help itself out; Microsoft could do the same with Internet Explorer to ease the burden on Bing. And Firefox, Safari and other browsers could follow suit just to be helpful.
Michael Dub says
Great idea; it would make the data much cleaner. Now to get the browsers to cooperate.
Adam Sherk says
Thanks, it’s probably not possible but it seems like a good idea and it would certainly simplify things with all those coded URLs. Browser cooperation? Sure, no problem 🙂
I think this is a great idea… Gets my vote! Incidentally, didn’t Matt Cutts ask for ideas to make Google better? I think you should make him aware of this one because it’s good (and probably better than the majority of crud he must be receiving right now!).
Adam Sherk says
Thanks Amelia. I almost added this idea in the comments of that post, but I wasn’t sure it qualified as a “big idea.” But relative to browsers and dealing with canonical URLs, I suppose it is.
Tad Chef says
Yes, they should, at least clean up the obvious nonsense by third parties like Feedburner. Until they do I created a small script for bloggers to remove unwanted additions in their URLs:
I don’t think it matters too much whether canonical URLs are shown in the address bar or not. We extensively use canonical URLs for petzooey on every geo-location home page. As long as the content isn’t duplicated, and a user is cookied for that canonical, it shouldn’t make much of a difference.
Adam Sherk says
Thanks for the script suggestion Tad.
Kevin, glad you have a solution that is working for you. Publishers tend to add tracking parameters to URLs quite a bit so it’s something that news and content sites are constantly trying to deal with. Rel=canonical has been helpful but it would be nice to be able to reduce links to non-canonical URLs.