Why Is Cleaning URLs with HTACCESS Necessary?

In many cases, extra symbols get appended to a website’s URLs, so the same content becomes reachable under several different links. Search engines can treat these as duplicate content, which can harm your website’s SEO ranking.

Who Causes This?

  • Malicious Actors: Competitors or attackers may intentionally create problematic links to harm your website.
  • Web Crawlers: Internet bots and crawlers may inadvertently generate invalid links while scanning your website.
  • Misconfigurations: Lack of proper redirects for variations like “www” vs. “non-www” can contribute to the issue.

How Does It Happen?

Some common scenarios that cause this issue:

  • Symbols like ?, //, and /?/ can create duplicate versions of the same page.
  • Query strings, e.g., ?id=123&ref=abc, are often exploited.
  • Inconsistent “www” and “non-www” versions of your site can confuse search engines.

As a result, duplicate URLs get indexed, are flagged as errors in Google Search Console, and harm your site’s credibility.

Solution: Redirecting URLs in HTACCESS

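To resolve these issues, you can use the .htaccess file to set up proper redirects. Pick one canonical host, either “www” or “non-www”, and apply only the matching rule below; enabling both would bounce every request back and forth in a redirect loop. Here’s how: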

Redirect All Traffic to “www” Version

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]

Redirect All Traffic to “non-www” Version

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ https://example.com/$1 [L,R=301]
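
The host rules above only fire when the host name differs, so a request to http://www.example.com would pass the “www” rule untouched and stay on plain HTTP. A minimal sketch that folds the scheme check and the host check into one redirect follows. It assumes www.example.com is the canonical host and that Apache itself terminates TLS, so %{HTTPS} reflects the real scheme; behind a proxy or load balancer this check may need adapting.

```apache
RewriteEngine On
# Redirect when the request did not arrive over HTTPS ...
RewriteCond %{HTTPS} !=on [OR]
# ... or when the host is anything other than the canonical one (assumed here)
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
```

Because both conditions lead to the same target, a visitor reaches the canonical URL in a single hop instead of two.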

Improving Feeds and Redirecting Links to the Parent URL

If you’re not using feeds (such as RSS, Atom, etc.), it’s advisable to redirect these links to the parent URL. This prevents the creation of excess links that may hurt page rankings and helps you maintain a clean URL structure.

Here’s an example of the .htaccess code to redirect feeds to the parent URL:

# BEGIN Feed redirect
RewriteEngine on
RewriteRule ^(.*/)?feed(/rss|/rss2|/atom|/rdf)?/?$ /$1 [R=301,NC,L]
# Also catch query-string feeds such as ?feed=rss2; the trailing ? drops the query string
RewriteCond %{QUERY_STRING} (^|&)feed=
RewriteRule ^(.*)$ /$1? [R=301,L]
# END Feed redirect

Placed in .htaccess, this code keeps feeds from creating separate URLs by sending both users and crawlers straight to the main version of the page. That avoids duplicate content and keeps your URL structure clean.

Additional Redirects for Clean URLs

Maintaining a clean and optimized URL structure is crucial for your site’s SEO. Below are some useful .htaccess rules you can implement to ensure your URLs are properly formatted, avoiding unnecessary redirects or duplicate content.

1. Force End Slash

This rule adds a trailing slash to URLs that don’t have one, helping maintain consistency across your site. Paths whose last segment contains a dot, such as file names, are left alone.

# Force end slash
RewriteCond %{REQUEST_URI} /+[^\.]+$
RewriteRule ^(.+[^/])$ %{REQUEST_URI}/ [R=301,L]
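
The rule above decides by looking for a dot in the path, which can misfire for extensionless files or for directory names that contain dots. As a hedged alternative, the sketch below asks Apache whether the request maps to a real file instead. It assumes the .htaccess file sits in the document root and that RewriteEngine On appears earlier in the file.

```apache
# Alternative sketch: skip anything that exists as a regular file on disk
RewriteCond %{REQUEST_FILENAME} !-f
# Only act when the path does not already end in a slash
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.+)$ /$1/ [R=301,L]
```

Use one variant or the other, not both, so a single request is never matched twice.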

2. Redirect Double Slash to Single Slash

This rule redirects any URL with double slashes (e.g., https://example.com//page) to the clean version with a single slash, preventing duplicate content and issues with search engines.

# Redirect double slash to single slash
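# Only look at the path part of the request line, so a // inside a query string does not cause a loop
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*//
RewriteRule ^.*$ /$0 [R=301,L,NE]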

3. Redirect index.html to Clean URL

If your site still includes index.html in URLs, this rule will remove it and redirect visitors to the cleaner URL structure (without index.html).

# Redirect index.html to the clean URL
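# Test the original request line so the internal DirectoryIndex lookup of index.html does not trigger a redirect loop
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]*/)?index\.html[?\s]
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]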

4. Redirect /index.php/ to / (With Slash)

This rule redirects any URL that begins with /index.php/ to the site root, removing index.php from the URL. Note that everything after /index.php/ is discarded as well.

# Redirect /index.php/ to / (with slash)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php/
RewriteRule ^index\.php/ / [R=301,L]
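
Sending every /index.php/... request to the site root loses the page the visitor asked for. If the same content is also served without the index.php prefix (an assumption about your setup, typical of front-controller applications), a variant along these lines keeps the rest of the path. It is a sketch to use instead of the rule above, not in addition to it.

```apache
# Variant sketch: /index.php/about/team -> /about/team
# THE_REQUEST is checked so that internal rewrites to index.php do not loop
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php/
RewriteRule ^index\.php/(.*)$ /$1 [R=301,L]
```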

5. Redirect index.xml to Clean URL

If your site uses index.xml (perhaps for feeds), this rule will redirect it to the clean version of the page, ensuring that search engines don’t index duplicate content.

# Redirect index.xml to the clean URL
RewriteCond %{REQUEST_URI} /index\.xml$
RewriteRule ^(.*)index\.xml$ /$1 [R=301,L]
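
One case from the earlier list of problem symbols that the numbered rules do not cover is a bare trailing ? with nothing after it (for example /page?). A minimal sketch for that case follows, assuming the .htaccess file sits in the document root and RewriteEngine On appears earlier in the file. URLs that carry a real query string are left untouched.

```apache
# Match only when the ? is immediately followed by the space before "HTTP/x.x",
# i.e. the query string is empty
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*\?\s
# The trailing ? in the target tells mod_rewrite to drop the query string
RewriteRule ^(.*)$ /$1? [R=301,L]
```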

By implementing these .htaccess rules, you’ll be helping to prevent duplicate content issues, improve site performance, and ensure that search engines crawl and index your site properly.