Chapter 4: Disk I/O Analysis — Deep Dive

Overview

Disk I/O is often the bottleneck that masquerades as other problems. A slow database is frequently waiting on disk, and high application latency can come from something as mundane as a log file write blocking the event loop.

melisai's DiskCollector (internal/collector/disk.go) uses two-point sampling of /proc/diskstats and enriches the data with sysfs device properties.

Source File: disk.go

  • Lines: 174
  • Functions: 6
  • Data Sources: /proc/diskstats, /sys/block/*/queue/

Function Walkthrough

Collect() — Two-Point I/O Sampling

func (c *DiskCollector) Collect(ctx context.Context, cfg CollectConfig) (*model.Result, error) {
    sample1 := c.readDiskStats()
    // Wait 1 second
    sample2 := c.readDiskStats()
    // Compute deltas for each device
    for name, s2 := range sample2 {
        s1, ok := sample1[name]
        if !ok {
            continue // device appeared between samples; no baseline to diff against
        }
        dev := model.DiskDevice{
            Name:         name,
            ReadOps:      int64(s2.readOps - s1.readOps),
            WriteOps:     int64(s2.writeOps - s1.writeOps),
            ReadBytes:    int64(s2.readBytes - s1.readBytes),
            WriteBytes:   int64(s2.writeBytes - s1.writeBytes),
            IOInProgress: int64(s2.ioInProg),
            IOTimeMs:     int64(s2.ioTimeMs - s1.ioTimeMs),
            WeightedIOMs: int64(s2.wIOTimeMs - s1.wIOTimeMs),
        }
        // Enrich with sysfs data
        dev.Scheduler = c.readScheduler(basePath)
        dev.QueueDepth = c.readQueueDepth(basePath)
        dev.Rotational = c.readFile(...) == "1"
        dev.ReadAheadKB, _ = strconv.Atoi(c.readFile(...))
    }
}

readDiskStats() — Parsing /proc/diskstats

func (c *DiskCollector) readDiskStats() map[string]diskStatsRaw {
    // /proc/diskstats has 14+ fields per line
    // Fields: major minor name reads_completed reads_merged sectors_read
    //         time_reading writes_completed writes_merged sectors_written
    //         time_writing ios_in_progress time_doing_io weighted_time_io
    readOps, _ := strconv.ParseUint(fields[3], 10, 64)
    readSectors, _ := strconv.ParseUint(fields[5], 10, 64)
    // ...
    result[name] = diskStatsRaw{
        readBytes:  readSectors * 512,   // diskstats sectors are 512 bytes, whatever the device's real sector size
        // ...
    }
}

The 512-byte sector convention: Regardless of the actual disk sector size (512 or 4096 bytes), /proc/diskstats always counts in 512-byte sectors. This is a Linux convention.
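
As an illustration of the field layout and the sector convention, here is a small self-contained sketch that parses one line and converts sectors_read to bytes. The function name parseReadBytes and the sample line are made up for this example. Unlike the excerpt, it reports parse errors instead of discarding them.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sectorSize is the fixed unit used by /proc/diskstats, independent of
// the device's physical or logical sector size.
const sectorSize = 512

// parseReadBytes extracts the device name and the bytes read from one
// /proc/diskstats line. Hypothetical helper for illustration only.
func parseReadBytes(line string) (string, uint64, error) {
	fields := strings.Fields(line)
	if len(fields) < 14 {
		return "", 0, fmt.Errorf("expected at least 14 fields, got %d", len(fields))
	}
	// fields[2] is the device name, fields[5] is sectors_read.
	sectors, err := strconv.ParseUint(fields[5], 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("sectors_read: %w", err)
	}
	return fields[2], sectors * sectorSize, nil
}

func main() {
	// Made-up sample line in the 14-field layout.
	line := "   8       0 sda 1200 30 20480 800 500 10 8192 400 0 900 1200"
	name, n, err := parseReadBytes(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s read %d bytes\n", name, n)
}
```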

readScheduler() — I/O Scheduler Detection

func (c *DiskCollector) readScheduler(basePath string) string {
    // /sys/block/sda/queue/scheduler
    // "[mq-deadline] kyber bfq none"  ← active scheduler in brackets
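    // data holds the contents of the scheduler file (read elided)
    if idx := strings.Index(data, "["); idx >= 0 {
        // Guard against a missing "]" so the slice bounds stay valid
        if end := strings.Index(data[idx:], "]"); end > 0 {
            return data[idx+1 : idx+end] // "mq-deadline"
        }
    }
    return "" // no bracketed entry found
}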

I/O Schedulers explained:

| Scheduler | Best For | How It Works |
|---|---|---|
| mq-deadline | General purpose, databases | Attaches an expiry deadline to each request to cap how long it can wait |
| bfq | Desktops, interactive | Fair bandwidth sharing between processes |
| kyber | Fast SSDs | Low overhead; throttles queue depth to meet latency targets |
| none | NVMe, fast SSDs | No scheduling, direct dispatch |

Tuning rule: NVMe drives should use none or kyber. Rotational drives benefit from mq-deadline.
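
The rule can be written as a small check. This is a sketch under stated assumptions, not melisai's logic: the names recommendedSchedulers and followsRule are hypothetical, NVMe devices are recognized by the kernel's "nvme" name prefix, and non-rotational devices that are not NVMe (SATA or SAS SSDs) get no recommendation because the rule above does not cover them.

```go
package main

import (
	"fmt"
	"strings"
)

// recommendedSchedulers returns the schedulers the rule of thumb suggests
// for a device, or nil when the rule says nothing about it.
func recommendedSchedulers(name string, rotational bool) []string {
	switch {
	case strings.HasPrefix(name, "nvme"):
		return []string{"none", "kyber"}
	case rotational:
		return []string{"mq-deadline"}
	default:
		return nil // SATA/SAS SSDs are outside the rule
	}
}

// followsRule reports whether the active scheduler agrees with the rule.
// Devices without a recommendation always pass.
func followsRule(name string, rotational bool, scheduler string) bool {
	rec := recommendedSchedulers(name, rotational)
	if rec == nil {
		return true
	}
	for _, s := range rec {
		if s == scheduler {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(followsRule("nvme0n1", false, "none")) // true
	fmt.Println(followsRule("sda", true, "bfq"))       // false
}
```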

Key Metrics

IOTimeMs — Utilization

IOTimeMs counts milliseconds during which the device had I/O in progress. Over a 1-second sample:

Utilization% = IOTimeMs / 10

If IOTimeMs = 1000 (1 second), the disk was 100% busy.

WeightedIOMs — Latency × Depth

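WeightedIOMs accumulates, for every millisecond of I/O activity, the number of requests in flight at that moment, so it captures both latency and queue depth. Dividing the delta by the sample interval in milliseconds gives the average queue depth (the figure iostat reports as aqu-sz). A high weighted I/O time with a low operation count means each operation is slow.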

IOInProgress — Queue Depth

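The number of I/O operations in flight at the moment of the second sample. Unlike the other fields it is an instantaneous snapshot, not a delta, so a single reading can be noisy. It serves as a saturation indicator:

| Queue Depth | Assessment |
|---|---|
| 0-1 | Normal |
| 2-8 | Active but healthy |
| 9-32 | Getting saturated |
| > 32 | Heavily saturated |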
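
The bands translate directly into a lookup. In this sketch the function name assessQueueDepth is hypothetical, and a depth of exactly 8 is counted as "Active but healthy", which is an assumption about the intended boundary.

```go
package main

import "fmt"

// assessQueueDepth maps an in-flight I/O count to the assessment bands
// from the table.
func assessQueueDepth(inFlight int64) string {
	switch {
	case inFlight <= 1:
		return "Normal"
	case inFlight <= 8:
		return "Active but healthy"
	case inFlight <= 32:
		return "Getting saturated"
	default:
		return "Heavily saturated"
	}
}

func main() {
	for _, depth := range []int64{1, 12, 40} {
		fmt.Printf("%d in flight: %s\n", depth, assessQueueDepth(depth))
	}
}
```

These thresholds are a starting point; devices built for deep queues, such as NVMe drives, may tolerate higher values without added latency.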

Diagnostic Examples

Fast NVMe Drive

{
  "name": "nvme0n1",
  "read_ops": 150,
  "write_ops": 3200,
  "io_time_ms": 45,
  "io_in_progress": 1,
  "scheduler": "none",
  "rotational": false,
  "queue_depth": 1023
}
4.5% utilization, 1 in-flight IO, none scheduler — perfect for NVMe.

Struggling HDD

{
  "name": "sda",
  "read_ops": 280,
  "write_ops": 50,
  "io_time_ms": 920,
  "io_in_progress": 12,
  "scheduler": "mq-deadline",
  "rotational": true,
  "read_ahead_kb": 128
}
92% utilization, 12 in-flight IOs — this HDD is near saturation. Consider moving to SSD.
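
Readings like these can be produced mechanically from the JSON output. The sketch below decodes a record and prints a one-line summary. The diskRecord struct is a hypothetical stand-in that mirrors only the fields it needs, not melisai's model.DiskDevice, and the division by 10 assumes a one-second sample.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// diskRecord mirrors the subset of JSON fields used in the summary.
type diskRecord struct {
	Name         string `json:"name"`
	IOTimeMs     int64  `json:"io_time_ms"`
	IOInProgress int64  `json:"io_in_progress"`
}

// summarize decodes one device record and formats its utilization and
// in-flight count. Unknown JSON fields are ignored by encoding/json.
func summarize(raw string) (string, error) {
	var d diskRecord
	if err := json.Unmarshal([]byte(raw), &d); err != nil {
		return "", err
	}
	util := float64(d.IOTimeMs) / 10 // valid for a 1-second sample only
	return fmt.Sprintf("%s: %.1f%% utilization, %d in flight",
		d.Name, util, d.IOInProgress), nil
}

func main() {
	hdd := `{"name":"sda","read_ops":280,"write_ops":50,"io_time_ms":920,
	         "io_in_progress":12,"scheduler":"mq-deadline","rotational":true}`
	s, err := summarize(hdd)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(s) // sda: 92.0% utilization, 12 in flight
}
```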


Next: Chapter 5 — Network Analysis