Chapter 4: Disk I/O Analysis — Deep Dive
Overview
Disk I/O is often the bottleneck that masquerades as other problems. When a database is slow, it's usually waiting for disk. When an application has high latency, it might be a log file write blocking the event loop.
melisai's DiskCollector (internal/collector/disk.go) uses two-point sampling of /proc/diskstats and enriches the data with sysfs device properties.
Source File: disk.go
- Lines: 174
- Functions: 6
- Data Sources:
/proc/diskstats,/sys/block/*/queue/
Function Walkthrough
Collect() — Two-Point I/O Sampling
func (c *DiskCollector) Collect(ctx context.Context, cfg CollectConfig) (*model.Result, error) {
sample1 := c.readDiskStats()
// Wait 1 second
sample2 := c.readDiskStats()
// Compute deltas for each device
for name, s2 := range sample2 {
s1, ok := sample1[name]
dev := model.DiskDevice{
Name: name,
ReadOps: int64(s2.readOps - s1.readOps),
WriteOps: int64(s2.writeOps - s1.writeOps),
ReadBytes: int64(s2.readBytes - s1.readBytes),
WriteBytes: int64(s2.writeBytes - s1.writeBytes),
IOInProgress: int64(s2.ioInProg),
IOTimeMs: int64(s2.ioTimeMs - s1.ioTimeMs),
WeightedIOMs: int64(s2.wIOTimeMs - s1.wIOTimeMs),
}
// Enrich with sysfs data
dev.Scheduler = c.readScheduler(basePath)
dev.QueueDepth = c.readQueueDepth(basePath)
dev.Rotational = c.readFile(...) == "1"
dev.ReadAheadKB, _ = strconv.Atoi(c.readFile(...))
}
}
readDiskStats() — Parsing /proc/diskstats
func (c *DiskCollector) readDiskStats() map[string]diskStatsRaw {
// /proc/diskstats has 14+ fields per line
// Fields: major minor name reads_completed reads_merged sectors_read
// time_reading writes_completed writes_merged sectors_written
// time_writing ios_in_progress time_doing_io weighted_time_io
readOps, _ := strconv.ParseUint(fields[3], 10, 64)
readSectors, _ := strconv.ParseUint(fields[5], 10, 64)
// ...
result[name] = diskStatsRaw{
readBytes: readSectors * 512, // sectors are always 512 bytes
// ...
}
}
The 512-byte sector convention: Regardless of the actual disk sector size (512 or 4096 bytes), /proc/diskstats always counts in 512-byte sectors. This is a Linux convention.
readScheduler() — I/O Scheduler Detection
func (c *DiskCollector) readScheduler(basePath string) string {
// /sys/block/sda/queue/scheduler
// "[mq-deadline] kyber bfq none" ← active scheduler in brackets
if idx := strings.Index(data, "["); idx >= 0 {
end := strings.Index(data[idx:], "]")
return data[idx+1 : idx+end] // "mq-deadline"
}
}
I/O Schedulers explained:
| Scheduler | Best For | How It Works |
|---|---|---|
mq-deadline |
General purpose, databases | Guarantees max latency per request |
bfq |
Desktops, interactive | Fair bandwidth sharing between processes |
kyber |
Fast SSDs | Minimal overhead, latency targets |
none |
NVMe, fast SSDs | No scheduling, direct dispatch |
Tuning rule: NVMe drives should use none or kyber. Rotational drives benefit from mq-deadline.
Key Metrics
IOTimeMs — Utilization
IOTimeMs counts milliseconds during which the device had I/O in progress. Over a 1-second sample:
If IOTimeMs = 1000 (1 second), the disk was 100% busy.
WeightedIOMs — Latency × Depth
WeightedIOMs = sum of (time spent × number of I/Os). It captures both latency and queue depth. High weighted IO time with low IO count = each operation is slow.
IOInProgress — Queue Depth
The number of I/O operations currently in flight. This is a saturation indicator:
| Queue Depth | Assessment |
|---|---|
| 0-1 | Normal |
| 2-8 | Active but healthy |
| 8-32 | Getting saturated |
| > 32 | Heavily saturated |
Diagnostic Examples
Fast NVMe Drive
{
"name": "nvme0n1",
"read_ops": 150,
"write_ops": 3200,
"io_time_ms": 45,
"io_in_progress": 1,
"scheduler": "none",
"rotational": false,
"queue_depth": 1023
}
none scheduler — perfect for NVMe.
Struggling HDD
{
"name": "sda",
"read_ops": 280,
"write_ops": 50,
"io_time_ms": 920,
"io_in_progress": 12,
"scheduler": "mq-deadline",
"rotational": true,
"read_ahead_kb": 128
}