Why Is Cleaning URLs with HTACCESS Necessary?
In many cases, URLs on a website are appended with various symbols, leading to multiple versions of the same content being accessible under different links. This creates duplicate content, which can harm your website’s SEO ranking.
Who Causes This?
- Malicious Actors: Competitors or attackers may intentionally create problematic links to harm your website.
- Web Crawlers: Internet bots and crawlers may inadvertently generate invalid links while scanning your website.
- Misconfigurations: Lack of proper redirects for variations like “www” vs. “non-www” can contribute to the issue.
How Does It Happen?
Some common scenarios that cause this issue:
- Symbols like ?, //, and /?/
- Query strings, e.g., ?id=123&ref=abc
- Inconsistent “www” and “non-www” versions of your site can confuse search engines.
These issues cause duplicate URLs to be indexed and flagged as errors in Google Search Console, which harms your site’s credibility.
Solution: Redirecting URLs in HTACCESS
To resolve these issues, you can use the .htaccess file to set up redirects.
Redirect All Traffic to “www” Version
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
Redirect All Traffic to “non-www” Version
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ https://example.com/$1 [L,R=301]
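If you would rather not hard-code the domain, the non-www redirect above can be written generically. This is only an illustrative variant, not one of the original rules. It assumes HTTPS is available on the bare host and that the Host header carries no port number.

```apache
RewriteEngine On
# Capture everything after "www." as %1 (back-reference to the RewriteCond)
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# Redirect to the bare host and keep the requested path;
# the query string is passed through by default
RewriteRule ^ https://%1%{REQUEST_URI} [L,R=301]
```

Use only one direction (www or non-www) on a given site. Enabling both would send visitors back and forth between the two hosts.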
Improving Feeds and Redirecting Links to the Parent URL
If you’re not using feeds (such as RSS, Atom, etc.), it’s advisable to set up a redirect for these links to the parent URL. This can prevent the creation of excessive links that may impact page rankings and help you maintain a clean URL structure.
Here’s an example of the .htaccess code:
# BEGIN Feed redirect
RewriteEngine on
RewriteRule ^(.*/)?feed(/rss|/rss2|/atom|/rdf)?/?$ /$1 [R=301,NC,L]
RewriteCond %{QUERY_STRING} (^|&)feed=
RewriteRule (.*) $1/? [R=301,NC,L]
# END Feed redirect
This code, used in .htaccess, redirects feed links to the parent URL.
Additional Redirects for Clean URLs
Maintaining a clean and optimized URL structure is crucial for your site’s SEO. Below are some useful .htaccess rules.
1. Force End Slash
This rule adds a trailing slash at the end of URLs that don’t have one, helping maintain consistency across your site.
# Force end slash
RewriteCond %{REQUEST_URI} /+[^\.]+$
RewriteRule ^(.+[^/])$ %{REQUEST_URI}/ [R=301,L]
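The rule above decides by looking for a dot in the last path segment, so an extensionless file such as /LICENSE would also receive a trailing slash. A hedged alternative sketch checks the filesystem instead. It assumes the .htaccess file sits in the document root, which is why the target starts with a slash.

```apache
RewriteEngine On
# Skip requests that map to a real file on disk
RewriteCond %{REQUEST_FILENAME} !-f
# Only act when the path does not already end in a slash
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.+)$ /$1/ [R=301,L]
```

Pick one of the two variants rather than stacking them, since both target the same URLs.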
2. Redirect Double Slash to Single Slash
This rule redirects any URLs with double slashes (e.g., https://example.com//page) to their single-slash equivalent.
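# Redirect double slash to single slash
# Check only the path part of the request line, so "//" inside a query string
# (e.g. ?url=http://...) does not cause a redirect loop
RewriteCond %{THE_REQUEST} \s[^?]*//
RewriteRule ^.*$ /$0 [R=301,L,NE]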
3. Redirect index.html to Clean URL
If your site still includes index.html in its URLs, this rule redirects them to the clean URL without index.html.
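# Redirect index.html to the clean URL
# THE_REQUEST holds the original request line, so the internal DirectoryIndex
# lookup of index.html does not trigger a redirect loop
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/([^?\s]*/)?index\.html[?\s]
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]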
4. Redirect /index.php/ to / (With Slash)
This rule ensures that any URL beginning with /index.php/ is redirected to the site root (/), so index.php no longer appears in the address.
# Redirect /index.php/ to / (with slash)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php/
RewriteRule ^index\.php/ / [R=301,L]
5. Redirect index.xml to Clean URL
If your site uses index.xml in its URLs, this rule redirects them to the clean URL.
# Redirect index.xml to the clean URL
RewriteCond %{REQUEST_URI} /index\.xml$
RewriteRule ^(.*)index\.xml$ /$1 [R=301,L]
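The query-string case mentioned earlier (for example ?id=123&ref=abc) has no rule of its own in this list. A minimal sketch follows, assuming Apache 2.4 or later (needed for the QSD flag) and assuming the ref parameter carries no meaning for your pages. It only handles URLs where ref is the sole parameter.

```apache
RewriteEngine On
# Match a query string that consists of nothing but ref=...
RewriteCond %{QUERY_STRING} ^ref=[^&]*$ [NC]
# Redirect to the same path; QSD discards the query string
RewriteRule ^ %{REQUEST_URI} [QSD,R=301,L]
```

With this in place, /page?ref=abc would redirect to /page, while /page?id=123&ref=abc is left untouched because other parameters are present.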
By implementing these .htaccess rules, you can reduce duplicate URLs and keep a clean, consistent URL structure.