How to IP Ban a Bot


Disclaimer: I'm not a security hardening expert. I am aware of basic security principles, but this advice may or may not be best practice. This post is just about how I stopped a handful of bots and other unwanted clients from accessing my server.

I found so many bots in my nginx logs

So, I'm sitting in my favorite cafe on a chilly September evening. I decide to check my email, as one does before starting any real work. There are new messages! ... But it's all spam. Sigh.

I decided to go check my web server logs to see if there was any suspicious activity there, too: connections with weird user agent strings, bots or crawlers I didn't want, exploit attempts, stuff like that. Of course, I found all three, in decently large quantities:

...
194.38.20.13 - - [22/Sep/2024:01:58:43 +0000] "GET /admin/php-ofc-library/ofc_upload_image.php HTTP/1.1" 404 162 "-" "ALittle Client" @afterlight.dev
...
5.188.86.25 - - [22/Sep/2024:02:32:37 +0000] "GET /.git/config HTTP/1.1" 500 42 "-" "Go-http-client/1.1" @164.90.154.173
...
51.222.253.13 - - [22/Sep/2024:03:04:57 +0000] "GET /repository/crucible/commit/f19ac218c5cca52179a7db77cf3e8b38d9b7b036/blob/executables/identify/CMakeLists.txt HTTP/1.1" 200 957 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)" @git.echowritescode.dev
...

Fortunately, these clients had to tell me their IP addresses if they expected my machine to respond to them. So I had enough information to cut them off at the source.
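One way to pull repeat offenders out of a log like this is to count requests per client address. The following is a rough sketch, not necessarily how the excerpt above was sifted: top_ips is a made-up helper name, and it assumes the client IP is the first whitespace-separated field on each line, as in the log lines shown.

```shell
# top_ips LOGFILE [N]: print the N busiest client addresses in an access log.
# Hypothetical helper; assumes the client IP is the first field on each line.
top_ips() {
	# Pull out the first field, count duplicates, then sort by count, highest first.
	awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

Calling it as top_ips /var/log/nginx/access.log 20 would list the 20 addresses with the most requests, each prefixed by its request count. A high count alone doesn't make a client hostile, so it's still worth reading what those addresses actually asked for.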

Blocking IP addresses with ufw

Basic security says to block Known Bad Stuff™ as close to the network boundary as possible. nginx can deny individual IP addresses, and some bots might respect a robots.txt, but a much better solution is to tell unwelcome guests that not only is nobody home, there isn't even a home here.

The best way to accomplish this is with a firewall. Firewalls are pretty simple in concept: based on a list of rules you specify, connections to your machine will be allowed or disallowed at the kernel level. In practice, configuring a firewall can be very complicated, but blocking IP addresses is usually not too hard.

Since I'm using a DigitalOcean droplet with Ubuntu 22.04 installed, the firewall I have is ufw (which is really a wrapper around iptables, which in turn configures netfilter, the Linux kernel's packet filtering framework). Here's how to block an IP with ufw:

$ sudo ufw insert 1 deny from <the ip you want to block> comment '<the reason you want to block it>'

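There are a few parts to this command:

insert 1 puts the new rule at position 1, the top of the rule list. ufw checks rules in order and stops at the first match, so a deny rule has to sit above any allow rule that would otherwise let the connection in.

deny from <ip> drops every packet from that address without sending a reply.

comment '...' stores a note with the rule, so you can tell later why you added it.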
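Those pieces can be wrapped in a small helper with a sanity check and a dry-run switch. This is only a sketch under stated assumptions: block_ip and the DRY_RUN variable are invented for illustration and are not part of ufw, and the address check is deliberately crude and handles IPv4 only.

```shell
# block_ip IP REASON: hypothetical wrapper around the ufw command shown above.
# Set DRY_RUN=1 to print the command instead of running it.
block_ip() {
	ip="$1"
	reason="$2"
	# Crude IPv4 shape check: four dot-separated groups of 1-3 digits.
	# It does not reject out-of-range values such as 999.1.1.1.
	if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
		echo "block_ip: '$ip' does not look like an IPv4 address" >&2
		return 1
	fi
	# insert 1 -> top of the rule list; deny from -> drop that address;
	# comment  -> note stored alongside the rule.
	if [ "${DRY_RUN:-0}" = 1 ]; then
		printf "sudo ufw insert 1 deny from %s comment '%s'\n" "$ip" "$reason"
	else
		sudo ufw insert 1 deny from "$ip" comment "$reason"
	fi
}
```

For example, DRY_RUN=1 block_ip 194.38.20.13 'php upload probe' prints the command without touching the firewall. Afterwards, sudo ufw status numbered lists the rules with their positions and comments, and sudo ufw delete followed by a rule number removes one again.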

Bad and good things about this solution

Blocking IPs this way is a blunt instrument. Arguably, if you're working on a service that needs to be highly visible, you can't really afford to block every IP that annoys you, because some of them might still access your site legitimately later. The same host that keeps trying to GET /eval.php?command=/bin/bash could easily be a real user whose computer has a virus, or who shares a router or VPN with someone whose computer does, or a legitimate server with a floating IP.

It also scales very poorly if you're dealing with a very high ratio of unwanted traffic to legitimate users. If I'm coffee'd up and in the zone, I can block maybe 10 IPs by hand in about 2 minutes. There are over 4 billion IPv4 addresses; even 1% of that is a lot of trips to the coffee shop.
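To put a rough number on that, here is the back-of-the-envelope arithmetic in shell form. It assumes the pace mentioned above (10 addresses every 2 minutes) holds around the clock with no breaks, which is generous, and it relies on the shell doing 64-bit integer arithmetic.

```shell
# How long would blocking 1% of the IPv4 space take at 10 addresses per 2 minutes?
total=4294967296                     # 2^32 possible IPv4 addresses
one_percent=$((total / 100))         # integer division, rounds down
minutes=$((one_percent * 2 / 10))    # 2 minutes for every 10 addresses
years=$((minutes / 60 / 24 / 365))   # nonstop, no sleep, no refills
echo "$one_percent addresses, about $years years of nonstop blocking"
# prints: 42949672 addresses, about 16 years of nonstop blocking
```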

On the other hand, it's good to have a hand on the root shutoff switch for any public-facing service. Knowing how to say No, Stop It to requests that you don't want means that ultimately, you decide who and what is allowed to access you and your work.

It's also a dead simple solution that doesn't require setting up any other services or keeping any data besides the comments in your firewall rules. If you're only dealing with a small number of persistent delinquents, Just Block Them is a neat, tidy, uncomplicated answer.

For my server, which at the moment is just a small machine hosting a handful of services for me and my friends, I decided this solution was exactly the right level of effectiveness for how lazy it allowed me to be. (That may change in the future, in which case, expect an article about how to set up fail2ban or something similar.)

Addendum: configuring nginx to log hostnames

A minor annoyance about nginx is that its default combined log format, used for access.log, doesn't show any information about the virtual host that handled the request. Here's a simple configuration change you can apply to /etc/nginx/nginx.conf to add the hostname to the end of the log line:

http {
	...

	log_format combined_with_host
		'$remote_addr - $remote_user [$time_local] '
		'"$request" $status $body_bytes_sent '
		'"$http_referer" "$http_user_agent" '
		'@$host';

	access_log /var/log/nginx/access.log combined_with_host;

	...
}

Note that you can't apply this to error.log too; the error_log directive doesn't accept custom formats.
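With the hostname on the end of every access log line, it becomes easy to see which virtual host is attracting the traffic. A minimal sketch, assuming the combined_with_host format above, where the last field of each line is an @ followed by the host; requests_per_host is a made-up helper name.

```shell
# requests_per_host LOGFILE: count requests per virtual host.
# Hypothetical helper; assumes the last field of each line is "@<host>".
requests_per_host() {
	# Take the last field, strip the leading "@", then count and rank.
	awk '{ host = $NF; sub(/^@/, "", host); print host }' "$1" | sort | uniq -c | sort -rn
}
```

Running requests_per_host /var/log/nginx/access.log prints one line per host with its request count, busiest first. After editing nginx.conf, sudo nginx -t checks the syntax and sudo systemctl reload nginx applies the change, assuming nginx is managed by systemd as it is on stock Ubuntu.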