For quite a while my VPS was misconfigured – it redirected every HTTP request that it couldn’t assign to a vhost to the main website, avris.it. I didn’t think it was a big deal, until I recently found out that my post Ungoogling is indexed by Google under https://askara.avris.it/blog/ungoogling

This subdomain hasn’t existed for a long time and my server doesn’t serve a certificate for it anymore, but it still requires HSTS, so browsers end up showing users a scary error message.

I had to do something about it.

Certificate

First of all, users have to see something other than a security warning. For that, I need a wildcard certificate.

Fortunately, Let’s Encrypt offers them now, and they’re totally free. I just followed the instructions to obtain one, and then configured Apache to serve the /www/default directory with the *.avris.it certificate for all requests that don’t match any vhost.
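A minimal sketch of what such a catch-all vhost could look like. Apache treats the first vhost defined for an address and port as the default for requests whose Host header matches no other vhost, so this block has to be loaded before the others. The ServerName is just a placeholder and the certificate paths assume certbot’s default layout; both are assumptions for illustration. Also note that a *.avris.it certificate only covers one level of subdomains, so a.b.avris.it would still trigger a warning.

```apache
# Sketch of a catch-all vhost; it has to be the first one Apache loads for *:443.
<VirtualHost *:443>
    # Placeholder name (assumption): unmatched requests land here regardless.
    ServerName catch-all.avris.it
    DocumentRoot /www/default

    SSLEngine on
    # Assumed paths: certbot's default location for a certificate named "avris.it".
    SSLCertificateFile /etc/letsencrypt/live/avris.it/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/avris.it/privkey.pem

    <Directory /www/default>
        # Needed so that the .htaccess in this directory is honoured.
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>
```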

Redirect for users

Once the users can see the website, I can show them things. Ideally, just a 404 with a note that it’s not a valid domain, and a suggestion of where they might have wanted to go (the same request string, but on the base domain). Easy.
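Building that suggestion is mostly string concatenation, with one thing to watch out for: the request string comes from the visitor, so it should be escaped before it’s printed into the page. A small sketch, where suggestedUrl is a hypothetical helper name made up for this example:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: same request string, but on the base domain.
function suggestedUrl(string $requestUri, string $baseDomain = 'avris.it'): string
{
    // Collapse any leading slashes into exactly one, so the request string
    // always ends up in the path and never in the host part.
    return 'https://' . $baseDomain . '/' . ltrim($requestUri, '/');
}

$suggestion = suggestedUrl('/blog/ungoogling?ref=a&b=1');

// Escape for the HTML context, because the request string is user-controlled.
$safe = htmlspecialchars($suggestion, ENT_QUOTES, 'UTF-8');
echo '<a href="' . $safe . '">' . $safe . '</a>', PHP_EOL;
```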

Btw, I used Water.css, a ridiculously simple CSS framework – I just added two lines, no classes, and the page already looks way better!

Redirect for bots

But that doesn’t solve the root cause: bots are confused about which domains they should be using. They don’t care whether my certificate is working or not, and they don’t understand the message I left there for the users.

They need a proper HTTP 301 (Moved Permanently). So I had to add a simple check of whether I’m serving a bot or a user, and adjust the response for each of them.
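One pitfall with this kind of check: most crawlers, Googlebot included, also put “Mozilla/5.0” in their User-Agent, so a browser marker alone says very little. A minimal sketch of the idea, using a hypothetical helper called looksLikeBot (the name and the shortened pattern list are assumptions for illustration):

```php
<?php
declare(strict_types=1);

// Hypothetical helper: decide whether a User-Agent string most likely
// belongs to an automated client.
function looksLikeBot(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === '') {
        // No User-Agent at all is typical for scripts, not for browsers.
        return true;
    }

    // Bot markers are checked on their own, because many crawlers also
    // announce themselves as "Mozilla/5.0 (compatible; ...)".
    $pattern = '#bot|crawler|spider|slurp|curl|wget|python|facebookexternalhit#i';

    return preg_match($pattern, $userAgent) === 1;
}

$samples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'curl/8.5.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0',
];

foreach ($samples as $ua) {
    // Bots get the permanent redirect, everyone else gets the 404 page.
    echo (looksLikeBot($ua) ? 301 : 404), "\t", $ua, PHP_EOL;
}
```

User-Agent strings are trivial to spoof, so this is only good enough for choosing between two harmless responses.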

Source

So, here’s what I ended up with:

.htaccess

DirectoryIndex index.php

<IfModule mod_rewrite.c>
    RewriteEngine On

    RewriteCond %{REQUEST_URI}::$1 ^(/.+)/(.*)::\2$
    RewriteRule ^(.*) - [E=BASE:%1]

    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^index\.php(/(.*)|$) %{ENV:BASE}/$2 [R=301,L]

    RewriteCond %{REQUEST_FILENAME} -f
    RewriteRule .? - [L]

    RewriteRule .? %{ENV:BASE}/index.php [L]
</IfModule>

<IfModule !mod_rewrite.c>
    <IfModule mod_alias.c>
        RedirectMatch 302 ^/$ /index.php/
    </IfModule>
</IfModule>

index.php

<?php

function isBrowser(?string $ua): bool
{
    if (!$ua) {
        return false;
    }

    // The "i" flag already makes both patterns case-insensitive.
    $isProbablyBot = (bool) preg_match('#bot|crawler|baiduspider|80legs|ia_archiver|voyager|curl|wget|yahoo! slurp|mediapartners-google|facebookexternalhit|twitterbot|whatsapp|php|python#i', $ua);
    $isProbablyBrowser = (bool) preg_match('#mozilla|msie|gecko|firefox|edge|opera|safari|netscape|konqueror|android#i', $ua);

    // Crawlers such as Googlebot also send "Mozilla/5.0 (compatible; ...)",
    // so a bot marker has to win over a browser marker.
    return $isProbablyBrowser && !$isProbablyBot;
}

// REQUEST_URI is user-controlled: the raw value is only used for the Location
// header, while the HTML and the script below get escaped copies.
$target = 'https://avris.it' . $_SERVER['REQUEST_URI'];

if (!isBrowser($_SERVER['HTTP_USER_AGENT'] ?? null)) {
    http_response_code(301);
    header('Location: ' . $target);
    die;
}

$url = htmlspecialchars($target, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
$jsUrl = json_encode($target, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_INVALID_UTF8_SUBSTITUTE);

http_response_code(404);

echo <<<HTML

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>404 -- Not found</title>
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <link rel="shortcut icon" href="https://avris.it/assetic/gfx/favicon.png" />
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/kognise/water.css@latest/dist/light.min.css">
</head>
<body>
<h1>
    404 -- Not found
</h1>
<p>
    This is not a valid subdomain.
</p>
<p>
    Did you mean <a href="$url">$url</a> ?
</p>
<hr/>
<p>
    <small>
        You'll be redirected there in <span id="countdown">15</span> seconds anyway... 🤷
    </small>
</p>
<script>
    const \$el = document.getElementById('countdown');
    let seconds = 15;
    const timer = setInterval(() => {
        seconds--;
        \$el.textContent = seconds;
        if (seconds === 0) {
            clearInterval(timer);
            window.location.href = $jsUrl;
        }
    }, 1000);
</script>
</body>
</html>
HTML;