{"author":"jahenner","category":"Website","content":"<h2>Malicious Bots</h2>\n<p>It was only a matter of time before bots started scanning my website. I noticed some strange traffic hitting the server, and it wasn't coming from a single IP, so it was more likely a botnet. I collected the logs, filtered them down to the suspicious traffic, and wrote a small Python script to analyze it.</p>\n<h3>Python Script</h3>\n<pre><code class=\"language-python\">import datetime\nimport pytz\n\nLOG_FILE = &quot;.local/weird_logs.txt&quot;\nLOCAL_TMZ = &quot;America/New_York&quot;\n\ndef convert_timestamp(timestamp, local_timezone_str):\n    &quot;&quot;&quot;Converts an RFC 5424-like timestamp to local timezone.&quot;&quot;&quot;\n\n    try:\n        dt_utc = datetime.datetime.strptime(timestamp, &quot;%Y-%m-%dT%H:%M:%S.%f%z&quot;)\n    except ValueError:\n        return None\n\n    local_timezone = pytz.timezone(local_timezone_str)\n    dt_local = dt_utc.astimezone(local_timezone)\n\n    return dt_local\n\n\ndef print_ip_hits(ips: dict, uas: dict, speed: dict[str, list[datetime.datetime]]) -&gt; None:\n    for ip, hits in ips.items():\n        # speed[ip] holds [first_seen, last_seen]; an IP with a single hit only has one entry\n        start, end = speed[ip][0], speed[ip][-1]\n        total_time = end - start\n        print(f&quot;IP: {ip} requested {hits} pages in {total_time.seconds} and used these {len(uas[ip])} UserAgents:&quot;)\n        for ua in uas[ip]:\n            print(f&quot;\\t{ua}&quot;)\n        print()\n\n\ndef print_paths(paths: set) -&gt; None:\n    print(f&quot;{len(paths)} Unique paths:&quot;)\n    for path in paths:\n        print(path)\n\n\nwith open(LOG_FILE, 'r') as log_file:\n    logs = log_file.readlines()\n\nunique_ips = {}\nunique_uas = set()\nunique_uas_per_ip = {}\nclean_logs = []\nspeed = {}\nunique_paths = set()\nfor log in logs:\n    # Split the Heroku prefix (timestamp + app) from the router message;\n    # maxsplit=1 keeps any later ': ' inside the message intact\n    heroku, msg = log.strip().split(': ', 1)\n    timestamp, app = heroku.split()\n    timestamp = convert_timestamp(timestamp, LOCAL_TMZ)\n    ip, msg = msg.split(&quot; - 
- &quot;)\n    _, request, status, _, ua = [x for x in msg.split('&quot;') if len(x.strip()) != 0]\n    status = status.strip().split()[0]\n    unique_uas.add(ua)\n    clean_logs.append((timestamp, ip, request, status, ua))\n    unique_paths.add(request.split()[1])\n    unique_ips[ip] = unique_ips.get(ip, 0) + 1\n    unique_uas_per_ip.setdefault(ip, set()).add(ua)\n    # Track the first and most recent timestamp seen per IP\n    if not speed.get(ip, None):\n        speed[ip] = [timestamp]\n    elif len(speed.get(ip, None)) == 1:\n        speed[ip].append(timestamp)\n    else:\n        speed[ip][-1] = timestamp\n\nprint_ip_hits(unique_ips, unique_uas_per_ip, speed)\nprint('-'*60)\nprint_paths(unique_paths)\nprint('-'*60)\nprint(f&quot;{len(unique_ips)} unique IPs and {len(unique_uas)} unique UserAgents&quot;)\n</code></pre>\n<h3>Output</h3>\n<pre><code>IP: 206.189.225.181 requested 40 pages in 3 and used these 4 UserAgents:\n        -\n        Go-http-client/1.1\n        Mozilla/5.0 (Linux; Android 6.0; HTC One M9 Build/MRA94049) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2343.98 Mobile Safari/537.3\n        Mozilla/5.0 (l9scan/2.0.7383e2233313e2834323e23313; +https://leakix.net)\n\nIP: 143.110.217.244 requested 20 pages in 3 and used these 4 UserAgents:\n        -\n        Mozilla/5.0 (Linux; Android 6.0; HTC One M9 Build/MRA426033) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.1666.98 Mobile Safari/537.3\n        Go-http-client/1.1\n        Mozilla/5.0 (l9scan/2.0.1373e2135313e23383e29393; +https://leakix.net)\n\nIP: 46.101.1.225 requested 20 pages in 15 and used these 4 UserAgents:\n        -\n        Mozilla/5.0 (Linux; Android 6.0; HTC One M9 Build/MRA450617) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.1258.98 Mobile Safari/537.3\n        Mozilla/5.0 (l9scan/2.0.9373e27393e223e25373; +https://leakix.net)\n        Go-http-client/1.1\n\nIP: 165.227.173.41 requested 20 pages in 32 and used these 4 UserAgents:\n        -\n        Mozilla/5.0 (l9scan/2.0.130313e2534313e21373e25333; +https://leakix.net)\n   
     Go-http-client/1.1\n        Mozilla/5.0 (Linux; Android 6.0; HTC One M9 Build/MRA96271) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.5918.98 Mobile Safari/537.3\n\nIP: 64.227.32.66 requested 40 pages in 3 and used these 6 UserAgents:\n        Mozilla/5.0 (Linux; Android 6.0; HTC One M9 Build/MRA96271) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.5918.98 Mobile Safari/537.3\n        -\n        Mozilla/5.0 (l9scan/2.0.1373e2135313e23383e29393; +https://leakix.net)\n        Mozilla/5.0 (l9scan/2.0.130313e2534313e21373e25333; +https://leakix.net)\n        Go-http-client/1.1\n        Mozilla/5.0 (Linux; Android 6.0; HTC One M9 Build/MRA426033) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.1666.98 Mobile Safari/537.3\n\nIP: 138.197.191.87 requested 20 pages in 17 and used these 4 UserAgents:\n        -\n        Mozilla/5.0 (Linux; Android 6.0; HTC One M9 Build/MRA450617) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.1258.98 Mobile Safari/537.3\n        Mozilla/5.0 (l9scan/2.0.9373e27393e223e25373; +https://leakix.net)\n        Go-http-client/1.1\n\n------------------------------------------------------------\n22 Unique paths:\n/.git/config\n/server-status\n/ecp/Current/exporttool/microsoft.exchange.ediscovery.exporttool.application\n/server\n/telescope/requests\n/.env\n/debug/default/view?panel=config\n/actuator/env\n/\n/s/130313e2534313e21373e25333/_/;/META-INF/maven/com.atlassian.jira/jira-webapp-dist/pom.properties\n/s/7383e2233313e2834323e23313/_/;/META-INF/maven/com.atlassian.jira/jira-webapp-dist/pom.properties\n/.DS_Store\n/s/1373e2135313e23383e29393/_/;/META-INF/maven/com.atlassian.jira/jira-webapp-dist/pom.properties\n/?rest_route=/wp/v2/users/\n/s/9373e27393e223e25373/_/;/META-INF/maven/com.atlassian.jira/jira-webapp-dist/pom.properties\n/login.action\n/info.php\n/_all_dbs\n/about\n/config.json\n/.vscode/sftp.json\n/v2/_catalog\n------------------------------------------------------------\n6 unique IPs and 10 unique 
UserAgents\n</code></pre>\n<h3>Analysis</h3>\n<p>Looking at the output, we can quickly pick out some interesting pieces of information. First, there were 6 IP addresses associated with these scans, using a total of 10 unique UserAgents. Every IP used the same two UserAgents: &quot;-&quot; and &quot;Go-http-client/1.1&quot;, with the &quot;-&quot; UserAgent appearing on each IP's initial request. Looking closer, we notice that every IP also used a UserAgent containing the string &quot;l9scan&quot;. A UserAgent containing &quot;HTC One&quot; appeared in at least one request from each IP as well, differing only in the Build number and Chrome version number. IP addresses 165.227.173.41 and 64.227.32.66 both used identical &quot;HTC One&quot; UserAgents, which suggests these strings aren't being generated programmatically and are instead drawn from a list.</p>\n<p>Each bot requested 20 or 40 paths. There are 22 unique paths in total, and a quick glance at the logs shows that each IP address was requesting the same set of them. These paths typically probe for known vulnerabilities in different frameworks. For instance, <code>/.env</code> looks for environment variables the developer may have left exposed (this file usually contains secret keys). There was also a path traversal attempt associated with the Jira vulnerability <a href=\"https://nvd.nist.gov/vuln/detail/cve-2021-26086\" target=\"_blank\" rel=\"noopener noreferrer\">CVE-2021-26086</a>: <code>/s/9373e27393e223e25373/_/;/META-INF/maven/com.atlassian.jira/jira-webapp-dist/pom.properties</code>.</p>\n<h3>Mitigation</h3>\n<p>I decided to mitigate these bots myself by feeding them false information. 
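</p>\n<p>First, I needed a list of scanner signatures to match against. A minimal sketch of such a list, with illustrative substrings taken from the UserAgents above (the name <code>MALICIOUS_UA</code> and the exact values shown are assumptions; the real list is maintained separately):</p>\n<pre><code class=\"language-python\"># Illustrative scanner signatures pulled from the captured traffic;\n# any UserAgent containing one of these substrings gets flagged.\nMALICIOUS_UA = ['l9scan', 'leakix.net', 'Go-http-client']\n</code></pre>\n<p>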
I added a UserAgent check that runs before my backend handles the request:</p>\n<pre><code class=\"language-python\">if any(mal_ua in ua for mal_ua in MALICIOUS_UA):\n    current_app.logger.info(f&quot;Sending dummy data to: {request.remote_addr}&quot;)\n    return jsonify({&quot;msg&quot;: &quot;Penis&quot;}), 200\n</code></pre>\n<p>Since I send an HTTP 200 response status, the dummy data will taint their databases and waste their time.</p>\n<h2>&quot;Kind&quot; Bots</h2>\n<p>Since I was already dealing with bots, I decided to handle the bots that follow the rules of the <code>robots.txt</code> file, such as Googlebot. I created a <code>robots.txt</code> file along with a <code>sitemap.xml</code> file, which is important for SEO. Instead of making the <code>sitemap.xml</code> static, I have it built programmatically so it updates automatically as I publish new blog posts; that way I don't have to update it manually (I'd forget).</p>\n<h2>Conclusion</h2>\n<p>I hate bots.</p>\n","id":67,"image_url":"","summary":"Investigating bot traffic and mitigation solutions","tags":["Web","Website","Bots"],"timestamp":"Thu, 03 Apr 2025 17:11:19 GMT","title":"Me, Myself, I... and Bots?"}
