Preventing duplicate content
I looked at Google Analytics and some of the other webmaster tools Google provides, and saw that we had some duplicate content.
Duplicate domain
(domain.com vs http://www.domain.com)
To prevent the entire domain from being duplicated, I uncommented the following lines in .htaccess so that the http://www.domain.com and domain.com variants of my website are not treated as duplicates:
Code:
# Permanently redirect domain.com to www.domain.com ($ anchors the whole host name)
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
You can also set a preferred domain in Google's own tools (Webmaster Tools has a setting for this). By the way, I prefer the domain.com version, so I moved the www. part into the first line (the RewriteCond) and out of the RewriteRule, which reverses the default redirect.
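For reference, the reversed rule described above (www. moved into the RewriteCond) might look like the sketch below. It assumes domain.com is the preferred host and that the site is served over plain http:

```apache
# Sketch: permanently redirect www.domain.com to domain.com,
# the reverse of the default rule (domain.com is a placeholder)
RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [R=301,L]
```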
Home page
The home page (in my case "home.html") duplicates the website root: it is exactly the same page, but Contao treats it specially and serves it under both URLs. To prevent this, I added an extra rule to the .htaccess:
Code:
##
# Custom redirect to prevent duplicate content of the home page
# (anchored so only /home.html is redirected, not every URL containing "home")
##
RedirectMatch 301 ^/home\.html$ /
I'm by no means an .htaccess expert, so I'd appreciate it if someone could confirm that this is the best method.
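If you'd rather keep every redirect in mod_rewrite, like the domain rule above, an equivalent rule might look like the sketch below. It assumes the home page alias is home, the URL suffix is .html, the site lives in the document root, and the rule is placed after RewriteEngine On but before the rule that hands requests to Contao's index.php:

```apache
# Sketch: send /home.html to the site root with a permanent redirect.
# In .htaccess the pattern is matched without the leading slash.
# After a request has been rewritten to index.php this pattern
# no longer matches, so it only fires on a direct /home.html request.
RewriteRule ^home\.html$ / [R=301,L]
```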
URL parameters
In Google Analytics you can define which URL GET parameters should not be treated as part of a unique page URL. These parameters can appear as ?foo=bar or as /foo/bar, and on another website they produced dozens of duplicates for every product in a catalog.
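If a parameter never changes the page content, the duplicates could also be removed on the server by redirecting to the clean URL. A minimal sketch, assuming a hypothetical parameter foo that is the only one in the query string, a site in the document root, and placement before Contao's index.php rule; it covers only the ?foo=bar form, not /foo/bar:

```apache
# Sketch: hypothetical parameter "foo" carries no content, so
# /page.html?foo=bar is redirected permanently to /page.html.
# The trailing "?" in the target discards the query string.
RewriteCond %{QUERY_STRING} ^foo=[^&]*$ [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]
```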
Additional duplication
Are there any other kinds of duplicate content I should be aware of? And are there better ways to achieve any of the above?
Re: Preventing duplicate content
This is what I use; it can be adapted to suit your own purposes:
# ENFORCE USE OF WWW
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^[a-z-]+\.(eu|co\.uk|me\.uk|org\.uk)$ [NC]
# In .htaccess the matched path has no leading slash, so add one after the host
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
I use this because the domain does not need to be changed for each site.
Re: Preventing duplicate content
Quote:
Originally Posted by Doublespark
...
# ENFORCE USE OF WWW
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^[a-z-]+\.(eu|co\.uk|me\.uk|org\.uk)$ [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
...
I see, so it is easier to reuse. Good one!
Couldn't we make the top-level domain generic as well? Then I'd never have to change it. Or would I run into trouble with UK-style domains (.co.uk)? For example, if I had xx.nl and the generic rule failed, it might rewrite http://www.xx.nl to http://www.www.xx.nl. Other subdomains must not be rewritten either, so a rule that catches testing or any other host not starting with www. won't do. I'm not sure what regular expressions can do in .htaccess.
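mod_rewrite in Apache 2.x uses Perl-compatible regular expressions, so a generic rule should be possible as long as it only matches bare domains. One possible sketch, not tested on a live server: because `[a-z0-9-]+` cannot contain a dot, the condition matches xx.nl or xx.co.uk but never www.xx.nl or testing.xx.nl, so it cannot produce www.www.xx.nl or touch other subdomains. The two-part suffixes (co.uk, me.uk, org.uk) are taken from the rule above; others such as .com.au would need adding, and unlisted ones are simply not redirected:

```apache
# Sketch: add "www." only to bare domains, whatever the TLD.
# [a-z0-9-]+ cannot match a dot, so hosts with a subdomain
# (www., testing., ...) fail the condition and are left alone.
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^[a-z0-9-]+\.(co\.uk|me\.uk|org\.uk|[a-z]+)$ [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```

Once a request has been redirected to www.xx.nl, the host no longer matches the condition, so the rule cannot loop.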