Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models

by Karn N. Watcharasupat and Junyoung Lee

Controllable deep generative models have promising applications in various fields such as computer vision, natural language processing, or music. However, implementations of evaluation metrics for these generative models remain non-standardized. Evaluating disentanglement learning, in particular, might require implementing your own metrics, possibly entangling you more than when you started.

Latte (for latent tensor evaluation) is a package designed to help both you and your latent-based model to stay disentangled at the end of the day (YMMW, of course). The Python package is, by design, created to work with both PyTorch and TensorFlow. All metrics are first implemented in NumPy with minimal dependencies (like scikit-learn) and then a wrapper is created to turn these NumPy functions into TorchMetrics or Keras Metric modules. This way, each metric is always calculated in the exact same way regardless of the deep learning framework being used. In addition, the functional NumPy API is also exposed, so that post-hoc evaluation or models from other frameworks can also enjoy our metric implementations.

Currently, our package supports the classic disentanglement metrics: Mutual Information Gap (MIG), Separate Attribute Predictability (SAP), Modularity. In addition, several dependency-aware variants of MIG proposed here and here are also included. These metrics are useful for situations where your semantic attributes are inherent dependent with respect to one another, a situation where traditional metrics might penalize a latent space that has correctly learned the nature of the semantic attributes. Latte also implements interpolatability metrics which evaluate how smoothly or monotonically your decoder translates the latent vectors into generated samples.

To make life simpler, Latte also comes equipped with metric bundles which are optimized implementations of multiple metrics commonly used together. The bundles optimize away duplicate computation of identical or similar steps in the metric computation, reducing both the lines of code needed and the runtime. We are working to add more metrics and bundles into our package. The most updated list can always be found at our GitHub repository.

Latte can be easily installed via pip using pip install latte-metrics. We have also created a few Google Colab notebooks demonstrating how you can use Latte to evaluate an attribute-regularized VAE for controlling MNIST digits, using vanilla PyTorch, PyTorch Lightning, and TensorFlow. The full documentation of our package can be found here.