Tracing Tokio in Production: tokio-console, Spans, and Finding the Blocked Await

A tokio service runs fine in development, ships to a real cluster, and the p99 latency chart grows a long tail. Requests that should finish in 20ms occasionally take 800ms. CPU is idle. The database is fast. The trace logs show the request entered the handler and then waited. Something is blocking a worker thread, but no log line names the culprit.

This article walks through the production wiring you need on a tokio service so the next time this happens you can find the culprit in under a minute. We set up tracing-subscriber with structured spans and JSON output for log shipping, add tokio-console behind a cargo feature flag so it stays out of release binaries, and then reproduce the textbook bug: a synchronous std::fs::read_to_string call inside an async handler. By the end you can see what the bug looks like in three places: tokio-console's task view, a tracing-subscriber FmtLayer log, and a cargo flamegraph profile. Each view is good at a different thing, and the article closes with a decision matrix so you know which tool to reach for based on the symptom.

The three observability tools

Reaching for any one of these as a default is wrong. Each surfaces a different layer of the system.

tokio-console is a runtime introspection tool. It connects to your binary over gRPC and shows the live state of every tokio task: how long each one has been polled, how long it has been idle waiting on a resource, and which spans are currently active. The project lives at github.com/tokio-rs/console. When a task pins a worker thread, tokio-console shows it as a task with a high busy duration and a near-zero idle time. The signal is loud.

tracing is a structured logging crate. The handler emits spans and events that get processed by a subscriber. With tracing-subscriber's EnvFilter you can dial verbosity per-module without recompiling, and with tracing_subscriber::fmt::json() the output becomes a line-delimited JSON stream that ships cleanly to Loki, Elasticsearch, or whatever log backend you run. The crate is documented at docs.rs/tracing.

cargo-flamegraph generates an SVG flamegraph from a profiling run. The tool wraps perf on Linux and dtrace on macOS. It shows you where CPU cycles go, which is great for a busy loop but useless for a task that is simply blocking on a syscall. The repository is at github.com/flamegraph-rs/flamegraph.

The decision rule, in one sentence: tokio-console for stuck async tasks, tracing for "what was this request doing", flamegraph for "where are CPU cycles going".

Production wiring for tracing-subscriber

The base layer of any tokio service should be tracing-subscriber configured for JSON output with EnvFilter driven by an environment variable. Add this to Cargo.toml:

[dependencies]
tokio = { version = "1.40", features = ["full", "tracing"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }

Then in main.rs:

use tracing_subscriber::{fmt, prelude::*, EnvFilter};

fn init_tracing() {
    let filter = EnvFilter::try_from_default_env()
        .unwrap_or_else(|_| EnvFilter::new("info,sqlx=warn"));

    tracing_subscriber::registry()
        .with(filter)
        .with(fmt::layer().json().with_current_span(true))
        .init();
}

#[tokio::main]
async fn main() {
    init_tracing();
    tracing::info!(version = env!("CARGO_PKG_VERSION"), "service starting");
    run_server().await;
}

The with_current_span(true) setting makes each log event carry its enclosing span as a JSON field. When you ship logs to a backend that indexes JSON fields, you can filter all events under a single request span by its request_id. The handler then looks like this:

use tracing::{info, instrument};

#[instrument(skip(state), fields(request_id = %uuid::Uuid::new_v4()))]
async fn handle_get_config(state: AppState, path: String) -> Result<String, Error> {
    info!("loading config");
    let body = load_config_file(&path).await?;
    info!(bytes = body.len(), "loaded config");
    Ok(body)
}

That is the baseline. Every handler is #[instrument]-ed, every log event is structured, and the environment variable RUST_LOG=debug,my_service=trace flips verbosity without a redeploy.

Adding tokio-console behind a feature flag

tokio-console is wonderful in staging and a liability in release builds. The instrumentation overhead is small but nonzero, the gRPC port is an attack surface, and most production servers do not need it running continuously. The pattern is a cargo feature that gates both the dependency and the subscriber layer.

[features]
default = []
console = ["dep:console-subscriber"]

[dependencies]
console-subscriber = { version = "0.4", optional = true }

Then in the init code:

fn init_tracing() {
    let filter = EnvFilter::try_from_default_env()
        .unwrap_or_else(|_| EnvFilter::new("info,sqlx=warn"));

    let registry = tracing_subscriber::registry()
        .with(filter)
        .with(fmt::layer().json().with_current_span(true));

    #[cfg(feature = "console")]
    let registry = registry.with(console_subscriber::spawn());

    registry.init();
}

console_subscriber::spawn() boots a background task that listens on 127.0.0.1:6669 by default. To enable it you also need the tokio_unstable cfg flag at compile time. The standard pattern is a .cargo/config.toml in the repo root:

[build]
rustflags = ["--cfg", "tokio_unstable"]

Or per-invocation:

RUSTFLAGS="--cfg tokio_unstable" cargo run --features console

Release binaries built without --features console skip both the dependency and the runtime overhead. The tokio_unstable cfg has no effect on the binary unless console-subscriber is also linked.

A useful pattern for staging: deploy two binaries, one built with --features console running on the canary slice, one built without on the rest of the fleet. When a bug shows up only on the canary, you SSH in and run tokio-console http://localhost:6669 from your laptop with port-forwarding.

Reproducing mystery tail latency

The bug we want to see is a blocked worker. Tokio's multi-threaded runtime defaults to one worker thread per logical CPU. When a worker is polling a future and that future calls a blocking syscall, the worker is pinned. Other tasks queued for that worker wait. Most of the time you do not notice because the runtime has more workers than load. Under bursty load you notice immediately.

Here is the reproduction:

use std::fs;
use tracing::instrument;

#[instrument]
async fn read_config_sync(path: &str) -> String {
    // BUG: this is std::fs, not tokio::fs.
    // The function never yields. The current worker is pinned
    // for the duration of the syscall, which on a slow disk
    // can be hundreds of milliseconds.
    fs::read_to_string(path).unwrap()
}

#[instrument]
async fn read_config_async(path: &str) -> String {
    tokio::fs::read_to_string(path).await.unwrap()
}

The sync variant is the kind of code that creeps into a real service. A junior engineer reaches for std::fs because it is the example in the Rust book. A senior engineer copies a snippet from a CLI tool. Code review passes because the function returns a String and is awaited by callers.

Now drive load against the handler with wrk or hey, hold concurrency above the worker count, and watch.

What the bug looks like in tokio-console

Connect to the running process:

tokio-console http://localhost:6669

The default view is a sortable table of every task. Sort by Busy descending. The read_config_sync task rises to the top with a busy duration in the hundreds of milliseconds and an Idle of near zero. That ratio is the signal. A healthy async task has busy in microseconds and idle in milliseconds. A blocked task has the inverse.

Drilling into the task shows its current span stack and which .await point it is sitting at. For a std::fs::read_to_string call there is no .await, so tokio-console shows the task as never having yielded since it started. That alone is the fingerprint: in async Rust, a task that never yields is a task blocking a worker.

What the bug looks like in a tracing log

The JSON log stream shows the request entered the handler and emitted the "loading config" event. Then nothing for 400ms. Then the "loaded config" event with the byte count. The gap between events is the symptom, but the log line does not tell you the worker was pinned. You see the latency, not the cause.

This is the limit of structured logging for runtime issues. Tracing tells you the story of a single request. It does not tell you the story of contention between requests. Two requests hitting the buggy handler at the same time both show the gap, but the log does not show that they were waiting on each other.

What the bug looks like in a flamegraph

Run the service under load and capture a profile:

cargo flamegraph --bin my-service -- --port 8080

After 30 seconds of load, stop the binary and open the SVG. The flame graph shows CPU samples grouped by stack frame. For a std::fs::read_to_string bug, the relevant stack appears under a tokio::runtime::worker::run frame and ends in a syscall. The flame is narrow because the bug is not CPU bound. The actual blocking happens in the kernel, and perf records most of those samples as kernel-mode time.

Flamegraphs lie about blocked async tasks. The graph shows where CPU cycles went, not where wall time went. A task blocked on a syscall is invisible in the user-mode flame and shows up only as kernel-mode samples that look like normal IO. If you reach for a flamegraph first when a service has tail latency, you will spend an hour looking at the wrong picture.

The decision matrix

Three symptoms, three tools.

Symptom	First tool	Why
p99 tail latency with idle CPU	tokio-console	Shows blocked tasks directly
One request slow, others fine	tracing JSON logs	Per-request timeline in one stream
CPU pinned at 100%, throughput dropping	cargo flamegraph	Shows where cycles go
Memory growth over hours	tracing + heap profiler	Logs tell you which spans allocate
Tasks completing 3× slower than benchmarks suggest	tokio-console	Compare `Busy` to expected `Idle` ratio

The order matters. Reaching for a flamegraph when CPU is idle wastes 30 minutes on a profile that will not show the bug. Reaching for tokio-console when CPU is at 100% wastes 5 minutes connecting to a process that is not actually stuck on async, it is stuck on compute.

What runs in production today

The baseline: tracing-subscriber with EnvFilter and JSON output, on every binary, always on. The console feature: behind a cargo feature, deployed only to canary, only when chasing a specific symptom. The flamegraph: ad-hoc, run from a developer machine against a load-replicating environment, never against a live binary.

Cost-wise, the tracing layer adds about 80ns per log event when filtered out by EnvFilter and a few microseconds when emitted. The console-subscriber layer adds a hundreds-of-bytes-per-task memory overhead and a small per-poll CPU cost. The flamegraph has no runtime cost because it only runs during profiling.

The bug we reproduced, a single std::fs::read_to_string in an async handler, is the canonical first bug every tokio service hits. The second is calling .lock() on a std::sync::Mutex across .await points. The third is spawning a long-running CPU task on the tokio runtime instead of using spawn_blocking. All three look the same in tokio-console: a task with high busy, low idle, never yields. All three are invisible in a flamegraph. All three show up as silent gaps in a tracing log.

Once the wiring is set up, finding the bug becomes a routine: connect tokio-console, sort by busy, look at the top three tasks. The whole investigation takes under a minute. The wiring takes an hour to set up once and never again.

References: