Tokio Task Cancellation: select! drop, CancellationToken, and Structured Concurrency

A practical tour of cancellation in Tokio: when dropping a future inside select! is enough, when to reach for CancellationToken, and how structured concurrency keeps daemons honest.

Cancelling an async task in Tokio sounds trivial until a long-running daemon needs to shut down without leaking sockets, half-written files, or zombie subprocesses. Rust's async model gives you no preemption: a task only stops at an .await point, and only if something tells it to. The "something" is what this article is about.

Three patterns cover almost every real shutdown scenario: dropping a future inside tokio::select!, propagating a CancellationToken, and scoping work with structured concurrency primitives. Each one trades cooperation for control in a different way, and picking the wrong one leaves you with tasks that either die mid-write or refuse to die at all.

Cancellation in Tokio is cooperative

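A future only makes progress while something polls it, and it can only be stopped between polls, at an .await. When a future is dropped instead of being polled again, its state machine is destroyed and the Drop impls of everything it held run. That's it. There is no equivalent of Java's Thread.interrupt(), no signal, no forced unwind. For a spawned task the runtime does the polling, so dropping the JoinHandle merely detaches the task; to stop it you call abort(). The Tokio docs make this explicit in the task module: JoinHandle::abort() only requests cancellation, which takes effect at the next yield point.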

The practical consequence: a task stuck in a CPU loop with no .await is uncancellable. A task awaiting a 10-second tokio::time::sleep cancels in microseconds because dropping the sleep future is instant. Cancellation latency is bounded by the time to the next yield, not by Tokio.

Pattern 1: select! drop

The cheapest cancellation primitive is tokio::select! racing your work against a shutdown signal. Whichever branch completes first wins; the other branch's future is dropped on the floor.

use tokio::sync::oneshot;
use tokio::time::{sleep, Duration};

async fn worker(mut shutdown: oneshot::Receiver<()>) {
    loop {
        tokio::select! {
            _ = &mut shutdown => {
                tracing::info!("shutdown received, exiting worker");
                return;
            }
            _ = sleep(Duration::from_secs(5)) => {
                do_one_unit_of_work().await;
            }
        }
    }
}

This is the right tool when:

  • You have one task and one shutdown channel.
  • The work inside the non-shutdown branch is safely droppable: the future holds no half-applied database transaction, no partially-flushed buffer, no acquired lock that needs explicit release beyond Drop.

Where it falls down: dropping a future cancels every nested .await at once. If your task was in the middle of db.execute("INSERT …"), the connection may go back to the pool with the statement in an indeterminate state. You can't run cleanup code after the cancellation point because there is no cancellation point: the future just stops existing, and only its Drop impls run. This is the classic cancellation safety footgun called out in the select! macro docs.

Rule of thumb: use select! drop when the inner future is either short, idempotent, or composed entirely of cancel-safe primitives like tokio::sync::mpsc::Receiver::recv and tokio::time::sleep.

Pattern 2: CancellationToken

For anything more than a single worker, tokio_util::sync::CancellationToken from the tokio-util crate is the workhorse. A token is a cheap clonable handle. Cancelling the parent flips a flag; every clone observes it.

use tokio_util::sync::CancellationToken;

async fn supervisor() {
    let root = CancellationToken::new();

    let ingest = tokio::spawn(ingest_loop(root.child_token()));
    let writer = tokio::spawn(writer_loop(root.child_token()));
    let metrics = tokio::spawn(metrics_loop(root.child_token()));

    tokio::signal::ctrl_c().await.unwrap();
    root.cancel();

    let _ = tokio::join!(ingest, writer, metrics);
}

async fn writer_loop(cancel: CancellationToken) {
    loop {
        tokio::select! {
            _ = cancel.cancelled() => {
                flush_pending_writes().await;
                return;
            }
            msg = next_message() => {
                handle(msg).await;
            }
        }
    }
}

Two properties matter here. First, cancel.cancelled() returns a future you can select! on, so cancellation composes with normal work. Second, the task chooses when to exit. After observing cancellation, writer_loop runs flush_pending_writes().await before returning. With select! drop alone, that flush would never happen.

CancellationToken shines over plain oneshot channels in three cases:

  • Multiple consumers: clone the token freely; oneshot can only be received once.
  • Hierarchical cancellation: child_token() produces a token cancelled when either the child or any ancestor is cancelled. A subsystem can shut down independently without taking the rest of the daemon with it.
  • Already-cancelled checks: token.is_cancelled() is a synchronous read, useful inside CPU-bound sections that yield occasionally.

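The cost: each token created with new() or child_token() allocates a node in the token tree (clone() only bumps a reference count), and cancelling wakes every task waiting on cancelled(). For a daemon with dozens of long-lived tasks, this is invisible. For a hot loop spawning thousands of micro-tasks per second, consider checking a shared Arc<AtomicBool> between units of work instead, accepting that a plain flag cannot be awaited inside select!.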

Pattern 3: Structured concurrency with JoinSet and TaskTracker

tokio::task::JoinSet and tokio_util::task::TaskTracker give you the third tool: scoped lifetime guarantees. A JoinSet owns its spawned tasks. When the set is dropped, every task still in it is aborted. This approximates structured concurrency in the Trio sense: no task is left running detached once the scope that spawned it is gone. TaskTracker, by contrast, does not abort anything on drop; it only lets you wait for tracked tasks to finish.

use tokio::task::JoinSet;
use tokio_util::sync::CancellationToken;

async fn process_batch(items: Vec<Item>) -> Vec<Result<Output, Error>> {
    let cancel = CancellationToken::new();
    let mut set = JoinSet::new();

    for item in items {
        let token = cancel.child_token();
        set.spawn(async move {
            tokio::select! {
                _ = token.cancelled() => Err(Error::Cancelled),
                res = process(item) => res,
            }
        });
    }

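    // Results arrive in completion order, not input order.
    let mut results = Vec::new();
    while let Some(joined) = set.join_next().await {
        // A JoinError means the task panicked or was aborted; both map to Error::Panicked here.
        let res = joined.unwrap_or(Err(Error::Panicked));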
        if matches!(res, Err(Error::Fatal(_))) {
            cancel.cancel();
        }
        results.push(res);
    }
    results
}

The advantage over manual Vec<JoinHandle<T>>: if process_batch is itself dropped because its caller cancelled, the JoinSet drops, every child task is aborted, and the cancellation propagates without bookkeeping. You can't accidentally leak a task by forgetting to await it: JoinSet's Drop impl aborts whatever is still running, whereas a dropped JoinHandle leaves its task detached.

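TaskTracker is the cousin you want when tasks must run to completion but must not start new work after a shutdown signal. It pairs naturally with CancellationToken: the token says "stop pulling new work," and tracker.close() followed by tracker.wait() says "let the in-flight work finish." Note that wait() resolves only once the tracker has been closed and every tracked task has finished.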

Choosing between the three

| Pattern | Best for | Cancellation latency | Cleanup-on-cancel |
|---|---|---|---|
| select! drop | Single task, cancel-safe inner future | Microseconds | Drop impls only |
| CancellationToken | Multi-task daemon, hierarchical shutdown | Microseconds + cooperation | Arbitrary async cleanup |
| JoinSet / TaskTracker | Bounded scope, fan-out/fan-in | Bounded by parent drop | Scope-based |

In practice, production Tokio daemons combine all three: a root CancellationToken for graceful shutdown, JoinSet for fan-out work units inside each subsystem, and select! drop at the leaves where the future is provably cancel-safe.

Two traps worth memorising

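Holding a MutexGuard across .await complicates cancellation cleanup. If a task is cancelled while holding a tokio::sync::Mutex lock, the guard is dropped and the lock is released, which is fine as long as the protected data was not left half-updated. std::sync::Mutex is a worse fit: its guard is not Send, so a future that holds it across an .await cannot be passed to tokio::spawn, and a panic while the guard is held poisons the mutex. Prefer tokio::sync::Mutex for any lock that must live across await points, and keep std::sync::Mutex or parking_lot::Mutex for short critical sections that contain no .await.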

spawn_blocking tasks are not cancellable. They run on a separate thread pool and cannot be interrupted once started. If a task may run for minutes inside spawn_blocking, give the closure a CancellationToken and check is_cancelled() between work units. The spawn_blocking docs call this out: aborting the JoinHandle of a blocking task that is already running has no effect on the work itself, which carries on to completion; only a task that has not started yet may be prevented from running.

Cancellation in Tokio rewards the operator who plans for it from day one. Pick the smallest pattern that gives you the cleanup guarantees you need, write the cancellation path in the same PR as the happy path, and your shutdown will be measured in milliseconds instead of kill -9.
