Chapter 9: BCC Tools — Deep Kernel Tracing

Overview

Tier 1 collectors read counters from /proc — they tell you what is happening (CPU is 90% busy). But they can't tell you why (which process, which function, what latency distribution).

That's where BCC (BPF Compiler Collection) tools come in. They attach eBPF programs to kernel functions and trace events in real time, giving you histograms, per-event details, and stack traces.

melisai's executor package (internal/executor/) manages running these tools securely and parsing their output.

Source Files: executor/ (5 files)

| File | Lines | Purpose |
| --- | --- | --- |
| executor.go | 133 | BCCExecutor: runs external binaries with security checks |
| security.go | 133 | SecurityChecker: verifies binary integrity |
| registry.go | 181 | Registry: catalog of 20 BCC tools |
| parsers.go | 463 | Output parsers for histograms, tables, stacks |
| aggregate.go | 149 | Event aggregation (top-N, connections) |

The Executor: How BCC Tools Are Run

BCCExecutor.Run()

func (e *BCCExecutor) Run(ctx context.Context, toolName string, duration time.Duration) (*model.Result, error) {
    spec, ok := Registry[toolName] // Look up tool specification
    if !ok {
        return nil, fmt.Errorf("unknown tool %q", toolName)
    }
    binary := e.security.ResolveBinary(spec.Binary) // Find and verify path
    if binary == "" {
        return nil, fmt.Errorf("no trusted binary found for %q", toolName)
    }
    args := spec.BuildArgs(duration) // Build CLI arguments
    env := e.security.SanitizeEnv()  // Clean environment

    out := NewLimitedWriter(50 * 1024 * 1024) // Cap output at 50MB
    cmd := exec.CommandContext(ctx, binary, args...)
    cmd.Env = env
    cmd.Stdout = out

    if err := cmd.Start(); err != nil {
        return nil, err
    }
    // Wait for completion or context cancellation
    if err := cmd.Wait(); err != nil {
        return nil, err
    }

    // Parse output using tool-specific parser
    output := out.buf.String()
    result := spec.Parser(output)
    return result, nil
}

LimitedWriter — Output Protection

type LimitedWriter struct {
    buf   bytes.Buffer
    limit int
}

func (w *LimitedWriter) Write(p []byte) (int, error) {
    if w.buf.Len() + len(p) > w.limit {
        return 0, ErrOutputLimitExceeded  // Stop accepting data
    }
    return w.buf.Write(p)
}

Why limit output? Some BCC tools (like execsnoop, tcpdrop) produce per-event output. On a busy server, this can generate gigabytes. The 50MB cap prevents memory exhaustion.
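To see the cap in action, the sketch below reproduces the writer in a standalone program. Only the Write logic comes from the listing above; the constructor body, the error message, and the tiny 10-byte limit are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// ErrOutputLimitExceeded mirrors the sentinel used above; its message is assumed.
var ErrOutputLimitExceeded = errors.New("output limit exceeded")

type LimitedWriter struct {
	buf   bytes.Buffer
	limit int
}

// NewLimitedWriter is a guess at the constructor's shape.
func NewLimitedWriter(limit int) *LimitedWriter {
	return &LimitedWriter{limit: limit}
}

func (w *LimitedWriter) Write(p []byte) (int, error) {
	if w.buf.Len()+len(p) > w.limit {
		return 0, ErrOutputLimitExceeded // the whole chunk is rejected
	}
	return w.buf.Write(p)
}

func main() {
	w := NewLimitedWriter(10)

	n, err := w.Write([]byte("12345678")) // 8 bytes fit under the limit
	fmt.Println(n, err)

	n, err = w.Write([]byte("abcde")) // 8+5 > 10, so nothing is stored
	fmt.Println(n, err)

	fmt.Println(w.buf.Len()) // the buffer still holds the first 8 bytes
}
```

Note that a chunk crossing the limit is rejected whole, so the buffer can stop slightly short of the cap rather than filling it exactly.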

Security Model

Why Security Matters

BCC tools run as root and can trace any kernel function. A compromised binary could:

- Read arbitrary kernel memory
- Modify system calls
- Log sensitive data (passwords, crypto keys)

SecurityChecker

func (s *SecurityChecker) ResolveBinary(name string) string {
    for _, dir := range AllowedBinaryPaths {
        path := filepath.Join(dir, name)
        if s.VerifyBinary(path) { return path }
    }
    return ""  // Binary not found in allowed paths
}

func (s *SecurityChecker) VerifyBinary(path string) bool {
    info, err := os.Stat(path)
    // Missing or unreadable file: let ResolveBinary try the next directory
    if err != nil { return false }
    stat, ok := info.Sys().(*syscall.Stat_t)
    if !ok { return false }
    // Must be owned by root (UID 0)
    if stat.Uid != 0 { return false }
    // Must not be world-writable
    if info.Mode()&0002 != 0 { return false }
    return true
}
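The two rules can also be isolated as a pure function over metadata that has already been fetched, which makes them testable without creating root-owned files. This is only a sketch: `trustedOwnerAndMode` is a hypothetical helper name, not part of melisai.

```go
package main

import (
	"fmt"
	"io/fs"
)

// trustedOwnerAndMode is a hypothetical helper. It applies the two rules
// from VerifyBinary to a UID and a file mode obtained elsewhere.
func trustedOwnerAndMode(uid uint32, mode fs.FileMode) bool {
	if uid != 0 {
		return false // not owned by root
	}
	if mode.Perm()&0o002 != 0 {
		return false // world-writable
	}
	return true
}

func main() {
	fmt.Println(trustedOwnerAndMode(0, 0o755))    // root-owned, not world-writable
	fmt.Println(trustedOwnerAndMode(1000, 0o755)) // owned by a regular user
	fmt.Println(trustedOwnerAndMode(0, 0o757))    // root-owned but world-writable
}
```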

Allowed Binary Paths

var AllowedBinaryPaths = []string{
    "/usr/share/bcc/tools",      // bcc-tools install path (RHEL/Fedora packages, source builds)
    "/usr/sbin",                  // System binaries (Ubuntu/Debian bpfcc-tools installs *-bpfcc here)
    "/usr/bin",                   // Standard PATH
    "/sbin",                      // Legacy system binaries
    "/snap/bpftrace/current/bin", // Snap package
}

Environment Sanitization

func (s *SecurityChecker) SanitizeEnv() []string {
    // Build a fresh, minimal environment instead of inheriting the caller's.
    // LD_PRELOAD, LD_LIBRARY_PATH, and all other potentially dangerous
    // environment variables are therefore never passed to the tool.
    return []string{
        "PATH=/usr/sbin:/usr/bin:/sbin:/bin",
        "HOME=/root",
        "LANG=C",
    }
}

Tool Registry — The Complete Catalog

CPU Tools

| Tool | What It Traces | Output Type |
| --- | --- | --- |
| runqlat | Time spent in CPU run queue | Histogram (μs) |
| runqlen | Run queue length | Histogram (count) |
| cpudist | On-CPU time per process | Histogram (μs) |
| hardirqs | Hardware interrupt time | Histogram (μs) |
| softirqs | Software interrupt time | Histogram (μs) |
| profile | CPU stack sampling (flame graph) | Folded stacks |
| offcputime | Off-CPU stack traces (why blocked?) | Folded stacks |

Disk Tools

| Tool | What It Traces | Output Type |
| --- | --- | --- |
| biolatency | Block I/O latency per device | Histogram (μs) |
| biosnoop | Every block I/O operation | Table (per-event) |
| ext4slower | Slow ext4 filesystem operations | Table (per-event) |
| fileslower | Slow file reads/writes (>10ms) | Table (per-event) |
| bitesize | I/O size distribution | Histogram (bytes) |

Network Tools

| Tool | What It Traces | Output Type |
| --- | --- | --- |
| tcpconnlat | TCP connection establishment time | Table (per-event) |
| tcpretrans | TCP retransmissions | Table (per-event) |
| tcprtt | TCP round-trip time | Histogram (μs) |
| gethostlatency | DNS resolution time | Table (per-event) |
| tcpdrop | Dropped TCP packets (with reason) | Table (per-event) |
| tcpstates | TCP state transitions | Table (per-event) |

Other Tools

| Tool | What It Traces | Output Type |
| --- | --- | --- |
| cachestat | Page cache hit/miss ratio | Table (per-interval) |
| execsnoop | New process execution | Table (per-event) |

Output Parsers

ParseHistogram() — Power-of-2 Distributions

BCC histogram output looks like:

     usecs               : count    distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 15       |****                                    |
         8 -> 15         : 107      |*****************************           |
        16 -> 31         : 145      |****************************************|
        32 -> 63         : 83       |***********************                 |
        64 -> 127        : 12       |***                                     |
       128 -> 255        : 3        |*                                       |

The parser converts this to:

type Histogram struct {
    Name    string         // e.g. "runqlat"
    Unit    string         // e.g. "usecs"
    Buckets []HistBucket   // [{Low:0, High:1, Count:0}, {Low:2, High:3, Count:0}, ...]
    P50     float64        // 50th percentile (median)
    P90     float64        // 90th percentile
    P99     float64        // 99th percentile
    P999    float64        // 99.9th percentile
}
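Turning the text output shown earlier into buckets can be sketched with one regular expression per line. This is an illustration, not melisai's actual ParseHistogram: the `parseBuckets` name and the `int64` field types are assumptions, and only the `Low`, `High`, and `Count` field names come from the struct comment above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// HistBucket uses the field names from the comment above; the types are assumed.
type HistBucket struct {
	Low, High, Count int64
}

// bucketLine matches rows such as "  16 -> 31  : 145  |****|".
var bucketLine = regexp.MustCompile(`^\s*(\d+)\s*->\s*(\d+)\s*:\s*(\d+)`)

// parseBuckets is a hypothetical helper that extracts bucket rows and
// skips the header and any other non-matching lines.
func parseBuckets(output string) []HistBucket {
	var buckets []HistBucket
	for _, line := range strings.Split(output, "\n") {
		m := bucketLine.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		// The regexp guarantees digits; overflow errors are ignored in this sketch.
		low, _ := strconv.ParseInt(m[1], 10, 64)
		high, _ := strconv.ParseInt(m[2], 10, 64)
		count, _ := strconv.ParseInt(m[3], 10, 64)
		buckets = append(buckets, HistBucket{Low: low, High: high, Count: count})
	}
	return buckets
}

func main() {
	sample := "     usecs               : count    distribution\n" +
		"         4 -> 7          : 15       |****      |\n" +
		"         8 -> 15         : 107      |********* |\n"
	for _, b := range parseBuckets(sample) {
		fmt.Println(b.Low, b.High, b.Count)
	}
}
```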

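Percentile calculation: The parser iterates through the buckets, accumulating counts until the running total reaches the target percentage of all samples. The reported percentile is the midpoint of that bucket, so it is an approximation: with power-of-2 buckets the true value can lie anywhere between the bucket's low and high bounds.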
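The midpoint rule described above can be sketched as follows. The `percentile` function name and the `int64` bucket fields are assumptions for illustration; the bucket values are taken from the sample histogram earlier in this section.

```go
package main

import "fmt"

// HistBucket uses the field names shown above; the types are assumed.
type HistBucket struct {
	Low, High, Count int64
}

// percentile returns the midpoint of the first bucket whose cumulative
// count reaches p percent of all samples (hypothetical helper).
func percentile(buckets []HistBucket, p float64) float64 {
	var total int64
	for _, b := range buckets {
		total += b.Count
	}
	if total == 0 {
		return 0 // no samples recorded
	}
	target := p / 100 * float64(total)
	var cum int64
	for _, b := range buckets {
		cum += b.Count
		if float64(cum) >= target {
			return float64(b.Low+b.High) / 2
		}
	}
	last := buckets[len(buckets)-1]
	return float64(last.Low+last.High) / 2
}

func main() {
	buckets := []HistBucket{
		{0, 1, 0}, {2, 3, 0}, {4, 7, 15}, {8, 15, 107},
		{16, 31, 145}, {32, 63, 83}, {64, 127, 12}, {128, 255, 3},
	}
	// 365 samples in total: p50 falls in 16-31, p90 in 32-63, p99 in 64-127.
	fmt.Println(percentile(buckets, 50), percentile(buckets, 90), percentile(buckets, 99))
	// prints: 23.5 47.5 95.5
}
```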

ParseTabularEvents() — Per-Event Data

PID    COMM         LAT(ms) RADDR            RPORT
5234   curl         1.52    203.0.113.50     443
8901   python       23.41   10.0.0.5         5432

Parsed into:

type Event struct {
    Fields map[string]string  // {"PID":"5234", "COMM":"curl", "LAT":"1.52", ...}
}
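A whitespace-based version of this parsing step might look like the sketch below. It is not the real ParseTabularEvents: `parseTable` is a hypothetical name, and stripping the unit suffix so that `LAT(ms)` becomes the key `LAT` is an assumption inferred from the example map above.

```go
package main

import (
	"fmt"
	"strings"
)

type Event struct {
	Fields map[string]string
}

// parseTable is a hypothetical stand-in that maps each row's columns
// onto the header names from the first line.
func parseTable(output string) []Event {
	lines := strings.Split(strings.TrimSpace(output), "\n")
	header := strings.Fields(lines[0])
	for i, h := range header {
		// Assumption: drop a unit suffix such as "(ms)".
		if j := strings.Index(h, "("); j > 0 {
			header[i] = h[:j]
		}
	}
	var events []Event
	for _, line := range lines[1:] {
		cols := strings.Fields(line)
		if len(cols) != len(header) {
			continue // skip malformed or wrapped rows
		}
		fields := make(map[string]string, len(header))
		for i, h := range header {
			fields[h] = cols[i]
		}
		events = append(events, Event{Fields: fields})
	}
	return events
}

func main() {
	out := "PID    COMM         LAT(ms) RADDR            RPORT\n" +
		"5234   curl         1.52    203.0.113.50     443\n" +
		"8901   python       23.41   10.0.0.5         5432\n"
	for _, ev := range parseTable(out) {
		fmt.Println(ev.Fields["COMM"], ev.Fields["LAT"], ev.Fields["RADDR"])
	}
}
```

Splitting on whitespace is the simplest approach but misaligns a row whenever a column value, such as a process name, itself contains a space; this sketch drops such rows instead of guessing.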

ParseFoldedStacks() — Flame Graph Data

main;handleRequest;db.Query;net.Write 42
main;handleRequest;json.Marshal 15
main;handleRequest;log.Printf 3

Each line is a semicolon-separated stack, ordered from the bottom frame to the top frame, followed by a space and the sample count.
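Parsing one folded line can be sketched as below. The `StackSample` type and `parseFolded` function are hypothetical names, not melisai's; the count is split off at the last space because frame names can themselves contain spaces.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// StackSample is a hypothetical result type for one folded line.
type StackSample struct {
	Frames []string // bottom frame first, top frame last
	Count  int
}

// parseFolded splits "frame;frame;... N" into frames and a sample count.
func parseFolded(line string) (StackSample, error) {
	line = strings.TrimSpace(line)
	i := strings.LastIndex(line, " ")
	if i < 0 {
		return StackSample{}, fmt.Errorf("no sample count in %q", line)
	}
	n, err := strconv.Atoi(line[i+1:])
	if err != nil {
		return StackSample{}, fmt.Errorf("bad sample count in %q: %w", line, err)
	}
	return StackSample{Frames: strings.Split(line[:i], ";"), Count: n}, nil
}

func main() {
	s, err := parseFolded("main;handleRequest;db.Query;net.Write 42")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(s.Frames), s.Frames[len(s.Frames)-1], s.Count)
	// prints: 4 net.Write 42
}
```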

Aggregation

AggregateByField() — Top-N Analysis

func AggregateByField(events []model.Event, field string, topN int) []AggregatedEntry {
    // Groups events by field value, computes count, average, and total
    // Example: group tcpconnlat events by RADDR → top 10 slowest destinations
}
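The grouping described in the comments can be sketched as follows. The source does not show which field supplies the numeric value or what `AggregatedEntry` contains, so the extra `valueField` parameter, the struct fields, and the sort order (highest average first) are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

type Event struct {
	Fields map[string]string
}

// AggregatedEntry's fields are assumed for this sketch.
type AggregatedEntry struct {
	Key   string
	Count int
	Total float64
	Avg   float64
}

// aggregateByField groups events by field, sums valueField per group,
// and returns the topN groups with the highest average.
func aggregateByField(events []Event, field, valueField string, topN int) []AggregatedEntry {
	byKey := map[string]*AggregatedEntry{}
	for _, ev := range events {
		key, ok := ev.Fields[field]
		if !ok {
			continue
		}
		v, err := strconv.ParseFloat(ev.Fields[valueField], 64)
		if err != nil {
			continue // skip events without a numeric value
		}
		e := byKey[key]
		if e == nil {
			e = &AggregatedEntry{Key: key}
			byKey[key] = e
		}
		e.Count++
		e.Total += v
	}
	out := make([]AggregatedEntry, 0, len(byKey))
	for _, e := range byKey {
		e.Avg = e.Total / float64(e.Count)
		out = append(out, *e)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Avg != out[j].Avg {
			return out[i].Avg > out[j].Avg
		}
		return out[i].Key < out[j].Key // stable order for ties
	})
	if len(out) > topN {
		out = out[:topN]
	}
	return out
}

func main() {
	events := []Event{
		{Fields: map[string]string{"RADDR": "203.0.113.50", "LAT": "1.0"}},
		{Fields: map[string]string{"RADDR": "203.0.113.50", "LAT": "3.0"}},
		{Fields: map[string]string{"RADDR": "10.0.0.5", "LAT": "10.0"}},
	}
	for _, e := range aggregateByField(events, "RADDR", "LAT", 10) {
		fmt.Printf("%s count=%d avg=%.2f total=%.2f\n", e.Key, e.Count, e.Avg, e.Total)
	}
}
```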

AggregateConnections() — Network Summary

func AggregateConnections(events []model.Event) []ConnectionSummary {
    // Groups by source→destination, computes connection count and avg latency
}

Interpreting BCC Results

runqlat — How long do processes wait for CPU?

| Percentile | Good | Warning | Critical |
| --- | --- | --- | --- |
| p50 | < 10μs | 10-100μs | > 100μs |
| p99 | < 100μs | 100-1000μs | > 1ms |
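The runqlat thresholds can be applied mechanically, as in the sketch below. The `classify` helper is hypothetical, and treating a value exactly on a boundary as the milder level is an assumption, since the table does not say which side the boundaries belong to.

```go
package main

import "fmt"

// classify is a hypothetical helper. All values are in microseconds:
// below warnUs is good, above critUs is critical, anything between is a warning.
func classify(valueUs, warnUs, critUs float64) string {
	switch {
	case valueUs > critUs:
		return "critical"
	case valueUs >= warnUs:
		return "warning"
	default:
		return "good"
	}
}

func main() {
	// Thresholds from the runqlat table: p50 uses 10/100 μs, p99 uses 100/1000 μs.
	fmt.Println("p50:", classify(23.5, 10, 100))
	fmt.Println("p99:", classify(95.5, 100, 1000))
}
```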

biolatency — How long do disk operations take?

| Percentile | SSD | HDD | Critical |
| --- | --- | --- | --- |
| p50 | < 100μs | < 5ms | |
| p99 | < 1ms | < 20ms | > 50ms |

tcpconnlat — Connection time

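| Latency | Meaning |
| --- | --- |
| < 1ms | Same datacenter |
| 1-5ms | Same region |
| 5-50ms | Cross-region |
| > 100ms | Cross-continent or network issue |
| > 1000ms | SYN retransmission (packet loss) or routing problem; DNS time is not included in this measurement |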

Next: Chapter 10 — Native eBPF