For quite a while my VPS was misconfigured: any HTTP request it couldn’t assign to a vhost got redirected to the main website, avris.it. I didn’t think it would be a big deal, until I recently found out that my post Ungoogling is indexed by Google under https://askara.avris.it/blog/ungoogling.
This subdomain hasn’t existed for a long time and my server doesn’t serve a certificate for it anymore, but HSTS is still required for it, so browsers end up showing users a scary error message.
I had to do something about it.
Certificate
First of all, users have to see something other than a security warning. I need a wildcard certificate.
Fortunately, Let’s Encrypt offers them now, and it’s totally free. I just followed a guide to obtain one, and then configured Apache to serve the /www/default directory with the *.avris.it certificate for all requests that don’t match any vhost.
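A catch-all vhost of this kind might look roughly like the sketch below. It is not the exact config from my server: the ServerName, the certificate paths (the usual certbot layout) and the Directory options are assumptions for illustration. What it relies on is that Apache treats the first vhost defined for a given address and port as the default whenever no ServerName or ServerAlias matches the request.

```apache
# Sketch of a default (catch-all) vhost; it has to be loaded before all other *:443 vhosts.
<VirtualHost *:443>
    # Hypothetical name, only there so that the vhost has one
    ServerName default.avris.it
    DocumentRoot /www/default

    SSLEngine on
    # Assumed paths to the wildcard certificate (typical certbot layout)
    SSLCertificateFile    /etc/letsencrypt/live/avris.it/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/avris.it/privkey.pem

    <Directory /www/default>
        # Let the .htaccess in /www/default take effect
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>
```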
Redirect for users
Once the users can see the website, I can show them things. Ideally, just a 404 with a note that it’s not a valid domain, and a suggestion where they might have wanted to go (same request string, but with the base domain). Easy.
Btw, I used Water.css, a ridiculously simple CSS framework – I just added two lines, no classes, and the page already looks way better!
Redirect for bots
But that doesn’t solve the root cause: bots are confused about which domains they should be using. They don’t care whether my certificate is working or not, and they don’t understand the message I left there for the users.
They need a proper HTTP 301 Moved Permanently. So I had to add a simple check for whether I’m serving a bot or a user, and adjust the response for each of them.
Source
So, here’s what I ended up with:
.htaccess
DirectoryIndex index.php
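<IfModule mod_rewrite.c>
    RewriteEngine On

    # Work out the base path this directory is served from and store it in BASE
    RewriteCond %{REQUEST_URI}::$1 ^(/.+)/(.*)::\2$
    RewriteRule ^(.*) - [E=BASE:%1]

    # Redirect direct requests for index.php to the same URL without it
    # (REDIRECT_STATUS is empty only on the initial request, which prevents a loop)
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^index\.php(/(.*)|$) %{ENV:BASE}/$2 [R=301,L]

    # Serve existing files as they are
    RewriteCond %{REQUEST_FILENAME} -f
    RewriteRule .? - [L]

    # Send everything else to index.php
    RewriteRule .? %{ENV:BASE}/index.php [L]
</IfModule>

# Fallback without mod_rewrite: send the root URL to index.php
<IfModule !mod_rewrite.c>
    <IfModule mod_alias.c>
        RedirectMatch 302 ^/$ /index.php/
    </IfModule>
</IfModule>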
index.php
<?php
// Decide whether the User-Agent looks like a person's browser rather than a bot.
function isBrowser($ua): bool
{
    if (!$ua) {
        // No User-Agent at all: treat it as a bot
        return false;
    }
    // A bot match has to win over any browser keyword, because most crawlers
    // (Googlebot included) also send "Mozilla" in their User-Agent string.
    $isProbablyBot = (bool) preg_match('#bot|crawler|baiduspider|80legs|ia_archiver|voyager|curl|wget|yahoo! slurp|mediapartners-google|facebookexternalhit|twitterbot|whatsapp|php|python#i', mb_strtolower($ua));
    return !$isProbablyBot;
}
$url = 'https://avris.it' . $_SERVER['REQUEST_URI'];
if (!isBrowser($_SERVER['HTTP_USER_AGENT'] ?? null)) {
    http_response_code(301);
    header('Location: ' . $url);
    die;
}
// REQUEST_URI is user-controlled, so escape the URL for the HTML and JavaScript contexts below
$urlHtml = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
$urlJs = json_encode($url, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
http_response_code(404);
echo <<<HTML
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>404 – Not found</title>
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <link rel="shortcut icon" href="https://avris.it/assetic/gfx/favicon.png" />
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/kognise/water.css@latest/dist/light.min.css">
    </head>
    <body>
        <h1>
            404 – Not found
        </h1>
        <p>
            This is not a valid subdomain.
        </p>
        <p>
            Did you mean <a href="$urlHtml">$urlHtml</a>?
        </p>
        <hr/>
        <p>
            <small>
                You'll be redirected there in <span id="countdown">15</span> seconds anyway... 🤷
            </small>
        </p>
        <script>
            const \$el = document.getElementById('countdown');
            let seconds = 15;
            setInterval(_ => {
                if (seconds === 0) {
                    window.location.href = $urlJs;
                    return;
                }
                seconds--;
                \$el.innerHTML = seconds;
            }, 1000);
        </script>
    </body>
</html>
HTML;