Analyzing access.log with Awk/Python: counting requests per minute and total requests per IP

A CDN bandwidth alert fired, so I pulled the logs to analyze them.

Counting requests per minute

Analyzing access.log with the shell

awk '{timestamp = substr($4, 14, 5);ip_count[timestamp]++}END{for (time in ip_count){print time,ip_count[time]}}' access.log|sort -rn -k2

substr(string, start, length)

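string: the original string to extract a substring from. Here I use the fourth column ($4).
start: the position at which to start extracting (counting from 1).
length: optional; the number of characters to extract.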

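In the log example, the fourth field is [02/Aug/2023:15:00:01. The leading [ is part of $4 and occupies position 1, so substr($4, 14, 5) returns 15:00, the hour and minute. To include the seconds as well, drop the length argument: substr($4, 14).

ip_count[timestamp]++ uses timestamp as the key and records in the ip_count array how many times each timestamp appears. Despite its name, the array here counts requests per minute, not per IP.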
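For readers more used to Python slicing, here is a minimal sketch of the same extraction. The helper name awk_substr is hypothetical and only mimics awk's substr() for start positions of 1 or more; it is not a full emulation of awk's edge cases.

```python
def awk_substr(s, start, length=None):
    """Mimic awk's substr() for start >= 1 (hypothetical helper, not a full emulation)."""
    begin = start - 1  # awk counts from 1, Python from 0
    if length is None:
        return s[begin:]
    return s[begin:begin + length]


# $4 as awk sees it for the sample log line: note the leading "["
field4 = "[02/Aug/2023:15:00:01"

print(awk_substr(field4, 14, 5))  # 15:00
print(awk_substr(field4, 14))     # 15:00:01
```

The off-by-one shift is the point of the sketch: awk's position 14 corresponds to Python index 13.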

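sort -rn -k2 sorts the output by the numeric value of the second column, in descending order.

sort: sorts lines

-r: reverse order, i.e. largest first. Without this option, sort defaults to ascending order (smallest first)

-n: compare by numeric value rather than as text

-k2: use the second column as the sort key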
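The effect of these options can be sketched in Python. The rows and the function name sort_by_count_desc are made up for illustration; the sketch only mirrors the ordering that sort -rn -k2 produces on two-column output.

```python
# (minute, count) pairs; the values are made up for illustration
rows = [("15:00", 120), ("15:01", 340), ("15:02", 95)]


def sort_by_count_desc(rows):
    """Order rows by the numeric second column, largest first (like sort -rn -k2)."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


for minute, count in sort_by_count_desc(rows):
    print(minute, count)
```

Because the counts are integers, 95 sorts below 120. A text comparison would place "95" above "120", which is why the shell version needs -n.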


The output looks like this:

[Figure: image-3, screenshot of the command output]

Analyzing access.log with Python

A Python script gives the same counts, although this version does not sort them.

#!/usr/bin/env python
import re
from collections import defaultdict

log_file = "access.log"
# Log format:
# 111.30.225.236 - - [02/Aug/2023:15:00:01 +0800] "GET https://xxx.xxx.com/32/74/17/32741735-2cff2d322s6213d7733b2a964972503bc983.jpg" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36" "image/webp" 0 Hit "U/200, G/200" Static "max-age=691200" 0.003 117.161.124.67

# Maps "HH:MM" to the number of requests seen in that minute
minute_count = defaultdict(int)
with open(log_file, 'r') as file:
    for log_line in file:
        match = re.search(r'\[(\d{2}/\w+/\d{4}:\d{2}:\d{2}:\d{2}) \+\d{4}\] ".*?" (\d+)', log_line)
        if match:
            timestamp = match.group(1)
            minute_timestamp = timestamp[12:17]  # HH:MM part of the timestamp
            minute_count[minute_timestamp] += 1

print("Requests per minute:")
for minute_timestamp, count in minute_count.items():
    print(f"{minute_timestamp}: {count}")
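If the Python output should be ordered like the shell pipeline's, one option is collections.Counter, whose most_common() returns entries from highest to lowest count. This is only a sketch: the function name count_per_minute, the simplified regex, and the shortened sample lines are assumptions made for illustration, not part of the original script.

```python
import re
from collections import Counter

# Captures HH:MM from a timestamp such as [02/Aug/2023:15:00:01 +0800]
MINUTE_RE = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}:\d{2}):\d{2} ")


def count_per_minute(lines):
    """Return a Counter mapping 'HH:MM' to the number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = MINUTE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened, made-up log lines in the same layout as the real log
sample = [
    '192.0.2.1 - - [02/Aug/2023:15:00:01 +0800] "GET /a.jpg"',
    '192.0.2.1 - - [02/Aug/2023:15:01:10 +0800] "GET /b.jpg"',
    '192.0.2.7 - - [02/Aug/2023:15:01:42 +0800] "GET /c.jpg"',
]

# most_common() yields (minute, count) pairs, busiest minute first
for minute, count in count_per_minute(sample).most_common():
    print(minute, count)
```

To run it against a real log, pass the open file object instead of sample, since the function accepts any iterable of lines.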


Counting requests per IP

Shell analysis

awk '{print $1}' access.log|sort|uniq -c|sort -rn|head

[Figure: ip count]

One IP had made nearly 110,000 requests, so I blocked it right away.

Python analysis

#!/usr/bin/env python

from collections import defaultdict

# Dictionary that counts the requests made by each IP address
ip_count = defaultdict(int)

# Read access.log
with open('access.log', 'r') as file:
    for line in file:
        fields = line.split()
        if not fields:  # skip blank lines, which have no first column
            continue
        # The IP address is the first column of each log line
        ip_count[fields[0]] += 1

# Sort by request count, largest first.
# The lambda takes an (ip, count) tuple x and returns x[1], the count.
sorted_ips = sorted(ip_count.items(), key=lambda x: x[1], reverse=True)

# Print the ranking
for ip, count in sorted_ips:
    print(ip, count)

The result matches the shell version.
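The shell pipeline ends in head, so it shows only the first ten lines, while the script prints every IP. A minimal sketch of limiting the Python output in the same way, assuming collections.Counter is acceptable here; the function name top_ips and the sample lines are made up for illustration.

```python
from collections import Counter


def top_ips(lines, n=10):
    """Return the n most frequent first fields as (ip, count) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # ignore blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)


# Made-up log lines; only the first column matters for this count
sample = [
    "192.0.2.1 - - [02/Aug/2023:15:00:01 +0800]",
    "192.0.2.7 - - [02/Aug/2023:15:00:02 +0800]",
    "192.0.2.1 - - [02/Aug/2023:15:00:03 +0800]",
    "",
    "192.0.2.9 - - [02/Aug/2023:15:00:04 +0800]",
    "192.0.2.7 - - [02/Aug/2023:15:00:05 +0800]",
    "192.0.2.1 - - [02/Aug/2023:15:00:06 +0800]",
]

# Print count first, then IP, to match the layout of uniq -c
for ip, count in top_ips(sample, n=2):
    print(count, ip)
```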