kedro-psutil-telemetry

A Kedro hook that continuously samples system resources in a background thread during pipeline execution and dispatches metrics to pluggable sinks (console, MLflow, or your own).

Installation

pip install kedro-psutil-telemetry

That's it. Kedro auto-discovers the hook via entry points — no settings.py changes needed. Metrics are logged to the console at the default 1-second interval.

With MLflow support:

pip install kedro-psutil-telemetry[mlflow]

Logging

Kedro's default logging configuration only enables INFO-level output for the kedro logger namespace. Since kedro_psutil_telemetry lives in a separate namespace, its metrics are silently dropped unless you explicitly enable them.

Add a conf/logging.yml to your project (Kedro picks this up automatically):

version: 1

disable_existing_loggers: False

handlers:
  rich:
    class: kedro.logging.RichHandler
    rich_tracebacks: True

loggers:
  kedro:
    level: INFO
  kedro_psutil_telemetry:
    level: INFO

root:
  handlers: [rich]

Without this file you will see the hook registered in kedro info but no telemetry output during kedro run. This is not a bug in the hook — it is a consequence of how Kedro scopes its default logging.

Note

If you already use a custom conf/logging.yml, just add kedro_psutil_telemetry: {level: INFO} under the loggers key.
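For example, if your existing loggers section only covers the kedro namespace, one extra entry is enough (a minimal fragment; your other handlers and loggers stay as they are):

```yaml
loggers:
  kedro:
    level: INFO
  kedro_psutil_telemetry:   # enables the hook's metric output
    level: INFO
```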

Usage

Zero-config (auto-discovery)

Install the package and run your pipeline. Kedro registers the hook automatically with default settings (console logging, 1s interval, all metrics enabled).

Custom configuration

To change the sink, interval, or any metric, register the hook explicitly in settings.py:

from kedro_psutil_telemetry import PipelinePsutilTelemetry, mlflow_sink

HOOKS = (
    PipelinePsutilTelemetry(
        sink=mlflow_sink,  # log to MLflow instead of console
        interval=2.0,      # sample every 2 seconds
    ),
)

Manual registration takes precedence — Kedro won't double-register the hook.

Metrics

All metrics are tagged with the name of the currently running node and use a configurable prefix (default: "pipeline").

Metric                  Description
{prefix}.mem.rss_mb     RSS memory of the main process + children (MB)
{prefix}.mem.swap_mb    System swap usage (MB)
{prefix}.cpu.percent    System-wide CPU utilisation (%)
{prefix}.io.read_mb     Disk bytes read since last sample (MB)
{prefix}.io.write_mb    Disk bytes written since last sample (MB)
{prefix}.net.sent_mbs   Network bytes sent per second (MB/s)
{prefix}.net.recv_mbs   Network bytes received per second (MB/s)

A peak-RSS and peak-CPU summary is logged at the end of every run.
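The io and net metrics are deltas between consecutive counter samples. Roughly, the conversion looks like this (a sketch only; delta_mb and mb_per_s are illustrative helpers, not the hook's internals, and 1 MB = 1,000,000 bytes is an assumption):

```python
def delta_mb(prev_bytes: int, curr_bytes: int) -> float:
    """Bytes accumulated since the previous sample, converted to MB."""
    return (curr_bytes - prev_bytes) / 1_000_000

def mb_per_s(prev_bytes: int, curr_bytes: int, interval: float) -> float:
    """Throughput between two counter readings, in MB/s."""
    return delta_mb(prev_bytes, curr_bytes) / interval

# Two readings of a bytes-sent counter taken 2.0 seconds apart:
rate = mb_per_s(10_000_000, 14_000_000, 2.0)  # 4 MB over 2 s -> 2.0 MB/s
```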

All options

HOOKS = (
    PipelinePsutilTelemetry(
        sink=mlflow_sink,      # or a list of sinks, or your own callable
        interval=2.0,          # sample every 2 seconds (default: 1.0)
        prefix="pipeline",     # metric name prefix (default: "pipeline")
        include_children=True, # include child processes (default: True)
        cpu=False,             # disable a metric by passing False
        net_sent=False,
        disk_read="my.reads",  # rename a metric by passing a string
    ),
)

Example project

A minimal Kedro project demonstrating the hook in action is available at github.com/saemeon/kedro-hooks-test-project. Clone it to see the full setup and expected output.

Custom Sinks

Any callable matching (name: str, value: float, step: int, tags: dict | None) -> None works as a sink:

def my_sink(name, value, step, tags=None):
    node = (tags or {}).get("node")  # tags may be None
    print(f"{name}={value:.2f} @ step {step} node={node}")

HOOKS = (PipelinePsutilTelemetry(sink=my_sink),)
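Because a sink is just a callable, stateful sinks work too. For instance, a hypothetical CSV sink (a sketch; CsvSink and its file handling are illustrative, not part of the package):

```python
import csv

class CsvSink:
    """Append each metric sample as a row in a CSV file."""

    def __init__(self, path):
        self.path = path
        # Write the header once, truncating any previous file.
        with open(path, "w", newline="") as f:
            csv.writer(f).writerow(["name", "value", "step", "node"])

    def __call__(self, name, value, step, tags=None):
        node = (tags or {}).get("node", "")  # tags may be None
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerow([name, value, step, node])
```

Register it in settings.py the same way as any other sink: HOOKS = (PipelinePsutilTelemetry(sink=CsvSink("telemetry.csv")),)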