Experiment result metrics

class ablator.modules.metrics.main.Metrics(*args: Any, batch_limit: int | None = 30, memory_limit: int | None = 100000000, evaluation_functions: dict[str, collections.abc.Callable] | None = None, moving_average_limit: int | None = 3000, static_aux_metrics: dict[str, Any] | None = None, moving_aux_metrics: Iterable[str] | None = None)

Stores and manages predictions and calculates metrics using custom evaluation functions. As a model is trained or evaluated, the class updates its stored batches and metrics incrementally: it enforces the configured memory limits, applies the evaluation functions, and provides either cached or online updates of the metrics.
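The batch-limit and memory-limit bookkeeping described above can be pictured with a small stand-alone buffer. This is a minimal sketch, not ablator's implementation: the class name `BoundedBatchBuffer` is hypothetical, and it assumes each batch is a dict of NumPy arrays whose size is measured with `ndarray.nbytes`. It follows the documented rule that `batch_limit` drops by 1 each time `memory_limit` is exceeded.

```python
from collections import deque
from typing import Optional

import numpy as np


class BoundedBatchBuffer:
    """Hypothetical buffer that keeps only the most recent batches for one tag."""

    def __init__(self, batch_limit: int = 30, memory_limit: Optional[int] = int(1e8)):
        self.batch_limit = batch_limit
        self.memory_limit = memory_limit
        # Oldest batch on the left, newest on the right.
        self.batches = deque()

    def memory_usage(self) -> int:
        # Total bytes held by every array in every stored batch.
        return sum(arr.nbytes for batch in self.batches for arr in batch.values())

    def append(self, **batch: np.ndarray) -> None:
        self.batches.append({key: np.asarray(value) for key, value in batch.items()})
        # Each time the memory budget is exceeded, lower batch_limit by one (never below 1).
        if self.memory_limit is not None and self.memory_usage() > self.memory_limit:
            self.batch_limit = max(1, self.batch_limit - 1)
        # Evict the oldest batches until at most batch_limit remain.
        while len(self.batches) > self.batch_limit:
            self.batches.popleft()


buffer = BoundedBatchBuffer(batch_limit=3, memory_limit=None)
for i in range(5):
    buffer.append(preds=np.full(10, i, dtype=np.float64))
print(len(buffer.batches))  # 3: only the three most recent batches survive
```

Reducing the limit gradually, rather than clearing the buffer, keeps as much recent history as the memory budget allows.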

All metrics can be accessed through the Metrics object's to_dict() method. Refer to the Prototyping Models tutorial for more details.

Parameters:

*args (ty.Any): Used only to prevent passing arguments positionally; all parameters must be given as keyword arguments.
batch_limit (int | None): Maximum number of batches to keep for each category of data (specified by tags); only the batch_limit most recent batches are stored per category. By default 30.
memory_limit (int | None): Maximum memory (in bytes) of the batches kept for each category of data (specified by tags). Each time this limit is exceeded, batch_limit is reduced by 1. By default 1e8.
evaluation_functions (dict[str, Callable] | None): A dictionary mapping evaluation function names to callable evaluation functions, e.g. mean or sum. The argument names of each callable must match the names of the prediction batches that the model returns. For example, if the model's output for a batch looks like {"preds": <batch of predictions>, "labels": <batch of predicted labels>}, the callable's arguments should be preds and labels, e.g. evaluation_functions={"mean": lambda preds, labels: np.mean(preds) + np.mean(labels)}. By default None.
moving_average_limit (int | None): Maximum number of values stored for each moving average metric. By default 3000.
static_aux_metrics (dict[str, ty.Any] | None): A dictionary of static metrics, i.e. metrics that start from an initial value and are updated manually, such as learning rate, best loss, or total steps. Keys are the static metric names and values are their initial values. By default None.
moving_aux_metrics (Iterable[str] | None): Names of metrics that are tracked as a moving average, such as loss. By default None.
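The requirement that evaluation function arguments match the prediction batch names amounts to keyword-argument dispatch. The sketch below illustrates that idea only; `evaluate_batches` is a hypothetical helper, and it assumes stored batches are 1-D NumPy arrays that can be concatenated along the first axis before each function is called.

```python
from typing import Callable

import numpy as np


def evaluate_batches(
    batches: list[dict[str, np.ndarray]],
    evaluation_functions: dict[str, Callable[..., float]],
) -> dict[str, float]:
    """Hypothetical helper: call every evaluation function on the concatenated batches.

    Batch keys (e.g. "preds", "labels") are passed as keyword arguments, so each
    callable's parameter names must match them exactly.
    """
    keys = batches[0].keys()
    # Concatenate all stored batches along the first axis, key by key.
    merged = {k: np.concatenate([np.asarray(b[k]) for b in batches]) for k in keys}
    return {name: fn(**merged) for name, fn in evaluation_functions.items()}


batches = [
    {"preds": np.array([1.0, 0.0, 1.0]), "labels": np.array([1.0, 1.0, 1.0])},
    {"preds": np.array([0.0, 0.0]), "labels": np.array([0.0, 1.0])},
]
evaluation_functions = {
    # Parameter names `preds` and `labels` match the batch keys.
    "accuracy": lambda preds, labels: float(np.mean(preds == labels)),
}
print(evaluate_batches(batches, evaluation_functions))  # {'accuracy': 0.6}
```

A function whose parameter names do not match the batch keys, such as `lambda x: ...`, fails here with a `TypeError`, which is why the names have to line up.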

Examples

Initialize an object of Metrics:

>>> import numpy as np
>>> from ablator.modules.metrics.main import Metrics
>>> train_metrics = Metrics(
...     batch_limit=30,
...     memory_limit=None,
...     evaluation_functions={"mean": lambda x: np.mean(x)},
...     moving_average_limit=100,
...     static_aux_metrics={"lr": 1.0},
...     moving_aux_metrics={"loss"},
... )
>>> train_metrics.to_dict()  # metrics that have not been updated yet are np.nan
{'loss': nan, 'lr': 1.0, 'mean': nan}

to_dict() → dict[str, Any]

Get all metrics, i.e. moving auxiliary metrics, moving evaluation metrics, and static auxiliary metrics. Note that moving metrics are reported as the average of the values from previous batches. A metric that has never been updated is set to np.nan.

Returns:

dict[str, ty.Any]: Key-value pairs mapping each metric's name to its value.
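The shape of the returned dictionary can be illustrated with a simplified stand-in. This is a sketch under stated assumptions, not ablator's code: `MetricsSnapshot` and `update_moving` are hypothetical names, evaluation metrics are left out, and it assumes each moving metric is averaged over a window of at most `moving_average_limit` stored values.

```python
from collections import deque
from typing import Any, Iterable

import numpy as np


class MetricsSnapshot:
    """Hypothetical stand-in showing how a to_dict()-style result could be assembled."""

    def __init__(
        self,
        static_aux_metrics: dict[str, Any],
        moving_aux_metrics: Iterable[str],
        moving_average_limit: int = 3000,
    ):
        self.static = dict(static_aux_metrics)
        # One bounded window per moving metric; the oldest values drop off automatically.
        self.moving = {name: deque(maxlen=moving_average_limit) for name in moving_aux_metrics}

    def update_moving(self, name: str, value: float) -> None:
        self.moving[name].append(float(value))

    def to_dict(self) -> dict[str, Any]:
        out: dict[str, Any] = {}
        for name, window in self.moving.items():
            # A moving metric that has never been updated is reported as np.nan.
            out[name] = float(np.mean(window)) if window else np.nan
        # Static metrics are returned as currently stored.
        out.update(self.static)
        return out


snapshot = MetricsSnapshot({"lr": 0.75}, ["loss"], moving_average_limit=2)
print(snapshot.to_dict())  # {'loss': nan, 'lr': 0.75}
for value in (4.0, 2.0, 1.0):
    snapshot.update_moving("loss", value)
print(snapshot.to_dict())  # {'loss': 1.5, 'lr': 0.75}
```

With a window of 2, only the last two losses (2.0 and 1.0) contribute to the average.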

Examples

>>> import numpy as np
>>> from ablator.modules.metrics.main import Metrics
>>> train_metrics = Metrics(
...     batch_limit=30,
...     memory_limit=None,
...     evaluation_functions={"mean": lambda preds: np.mean(preds)},  # mean of all predictions appended
...     moving_average_limit=100,
...     static_aux_metrics={"lr": 0.75},
...     moving_aux_metrics={"loss"},
... )
>>> train_metrics.append_batch(preds=np.array([[100]*10]))
>>> train_metrics.evaluate(reset=False, update=True)
>>> train_metrics.to_dict()
{'loss': nan, 'lr': 0.75, 'mean': 100.0}
>>> train_metrics.append_batch(preds=np.array([0] * 10))
>>> train_metrics.evaluate(reset=True, update=True)
>>> train_metrics.to_dict()
{'loss': nan, 'lr': 0.75, 'mean': 50.0}