I added this map in nginx.conf (it has to sit in the `http` block, since `map` isn't valid elsewhere):
map $http_user_agent $blocked_user_agent {
default 0;
"~*AI2Bot" 1;
"~*AI2Bot-Dolma" 1;
"~*Amazonbot" 1;
"~*anthropic-ai" 1;
"~*anthropic.com" 1;
"~*Applebot" 1;
"~*Applebot-Extended" 1;
"~*AwarioBot" 1;
"~*AwarioRssBot" 1;
"~*AwarioSmartBot" 1;
"~*Bytespider" 1;
"~*CCBot" 1;
"~*ChatGPT-User" 1;
"~*ClaudeBot" 1;
"~*Claude-Web" 1;
"~*cohere-ai" 1;
"~*cohere-training-data-crawler" 1;
"~*DataForSeoBot" 1;
"~*Diffbot" 1;
"~*DuckAssistBot" 1;
"~*FacebookBot" 1;
"~*FriendlyCrawler" 1;
"~*Googlebot-Extended" 1;
"~*Google-CloudVertexBot" 1;
"~*Google-Extended" 1;
"~*GoogleOther" 1;
"~*GoogleOther-Image" 1;
"~*GoogleOther-Video" 1;
"~*GPTBot" 1;
"~*iaskspider/2.0" 1;
"~*ICC-Crawler" 1;
"~*ImagesiftBot" 1;
"~*img2dataset" 1;
"~*ISSCyberRiskCrawler" 1;
"~*Kangaroo Bot" 1;
"~*Meltwater" 1;
"~*Meta-ExternalAgent" 1;
"~*Meta-ExternalFetcher" 1;
"~*OAI-SearchBot" 1;
"~*Omgili" 1;
"~*Omgilibot" 1;
"~*openai.com" 1;
"~*PanguBot" 1;
"~*peer39_crawler" 1;
"~*PerplexityBot" 1;
"~*PetalBot" 1;
"~*Scrapy" 1;
"~*Seekr" 1;
"~*SemrushBot" 1;
"~*SemrushBot-OCOB" 1;
"~*Sentibot" 1;
"~*Sidetrade indexer bot" 1;
"~*Timpibot" 1;
"~*TurnitinBot" 1;
"~*VelenPublicWebCrawler" 1;
"~*webmeup-crawler.com" 1;
"~*Webzio-Extended" 1;
"~*YouBot" 1;
}
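The `~*` prefix makes each entry a case-insensitive PCRE match against the User-Agent header, so e.g. "claudebot" is caught too. A quick sanity check of that matching behaviour in Python (purely illustrative, with a small subset of the patterns above):

```python
import re

# A few of the user-agent substrings from the nginx map above.
# nginx's "~*" means case-insensitive regex match, which re.IGNORECASE mirrors.
BLOCKED_PATTERNS = ["AI2Bot", "ClaudeBot", "GPTBot", "Kangaroo Bot", "iaskspider/2.0"]

def is_blocked(user_agent: str) -> bool:
    """Return True if any blocked pattern matches anywhere in the UA string."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in BLOCKED_PATTERNS)

print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"))  # True
print(is_blocked("claudebot"))                                                         # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0; rv:120.0) Firefox/120.0"))             # False
```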
and then, in each site's config, at the top of the `location` block:

location / {
    if ($blocked_user_agent) {
        access_log /var/log/nginx/blockedbot.log ncsa;
        return 401;
    }
    # ... rest of the location block
}
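Note that `ncsa` is not a predefined nginx log format (only `combined` is built in), so the `access_log` line above assumes a matching `log_format` directive in the `http` block. Something roughly like the NCSA common log format would do; the name `ncsa` here is just whatever your config declares:

```nginx
# Assumed definition in the http block; adjust fields to taste.
log_format ncsa '$remote_addr - $remote_user [$time_local] '
                '"$request" $status $body_bytes_sent';
```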
It's far from perfect. https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blo... is more thorough, but it was a tad too much for my needs.