Invited Speakers

Professor Gitta Kutyniok

Ludwig Maximilian University of Munich, University of Tromsø

CartoonX: Using information theory to reveal the reason for (wrong) decisions by DNNs

CartoonX is a novel model-agnostic explanation method tailored to image classifiers and built on the rate-distortion explanation (RDE) framework from information theory. It is the first explanation method to target higher-level explanations by exploiting the sparsity of images in the wavelet domain. We will show that CartoonX is not only highly interpretable, owing to its piecewise-smooth nature, but also particularly well suited to explaining misclassifications. This is joint work with Stefan Kolek, Duc Anh Nguyen, Ron Levie, and Joan Bruna.
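The wavelet-sparsity intuition behind the method can be illustrated with a toy one-dimensional Haar transform. This is a minimal numpy sketch of why cartoon-like (piecewise-constant) signals are sparse in the wavelet domain, not the CartoonX algorithm itself; the function names are ours.

```python
import numpy as np

def haar_1d(signal):
    """One level of the orthonormal Haar wavelet transform."""
    s = np.asarray(signal, dtype=float)
    avg = (s[0::2] + s[1::2]) / np.sqrt(2)   # approximation coefficients
    diff = (s[0::2] - s[1::2]) / np.sqrt(2)  # detail coefficients
    return avg, diff

def inv_haar_1d(avg, diff):
    """Inverse of one Haar level."""
    out = np.empty(2 * len(avg))
    out[0::2] = (avg + diff) / np.sqrt(2)
    out[1::2] = (avg - diff) / np.sqrt(2)
    return out

# A piecewise-constant, "cartoon-like" signal is sparse in the wavelet
# domain: at this level, every adjacent pair is constant, so all detail
# coefficients vanish.
signal = np.array([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0])
avg, diff = haar_1d(signal)
print(diff)  # all zeros

# The few remaining coefficients suffice to reconstruct the signal exactly,
# which is the kind of sparsity an explanation mask can exploit.
recon = inv_haar_1d(avg, diff)
print(np.allclose(recon, signal))  # True
```

For natural images the same idea applies with 2-D wavelet transforms: smooth regions produce near-zero detail coefficients, so a sparse set of wavelet coefficients can capture the image content relevant to a classifier's decision.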

Professor Alireza Makhzani

Vector Institute for Artificial Intelligence; University of Toronto

Improving Mutual Information Estimation with Annealed and Energy-Based Bounds

Mutual information (MI) is a fundamental quantity in information theory and machine learning. However, direct estimation of mutual information is intractable, even if the true joint probability density for the variables of interest is known, as it involves estimating a potentially high-dimensional log partition function. In this work, we view mutual information estimation from the perspective of importance sampling. Since naive importance sampling with the marginal density as a proposal requires exponential sample complexity in the true mutual information, we propose Multi-Sample Annealed Importance Sampling (AIS) bounds on mutual information to bridge between the marginal and joint distributions. In settings where the full joint distribution is available, we provide lower and upper bounds that can tightly estimate large values of MI in our experiments. In settings where only a single marginal distribution is known, we improve upon existing variational methods by directly optimizing a tighter lower bound on MI, using energy-based training to estimate gradients and multi-sample AIS for evaluation. Our methods are particularly suitable for evaluating MI in deep generative models, since explicit forms for the marginal or joint densities are often available. We evaluate our bounds on estimating the MI of VAEs and GANs trained on the MNIST and CIFAR datasets, and showcase significant gains over existing bounds in these challenging settings with high ground-truth MI.
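The quantity being estimated can be illustrated with a minimal Monte Carlo sketch, assuming numpy. For a bivariate Gaussian the ground-truth MI is known in closed form, so the naive sample-average estimate (which the abstract notes breaks down for large MI) can be checked directly here, where MI is moderate.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8        # correlation between X and Y
n = 200_000      # number of Monte Carlo samples

# Sample from a bivariate Gaussian with unit variances and correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
x, y = xy[:, 0], xy[:, 1]

def log_gauss(z, var=1.0):
    """Log density of a univariate zero-mean Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + z**2 / var)

# Log joint density of the bivariate Gaussian.
det = 1.0 - rho**2
log_joint = (-np.log(2 * np.pi) - 0.5 * np.log(det)
             - 0.5 * (x**2 - 2 * rho * x * y + y**2) / det)

# Monte Carlo estimate of I(X;Y) = E[log p(x,y) - log p(x) - log p(y)].
mi_mc = np.mean(log_joint - log_gauss(x) - log_gauss(y))
mi_true = -0.5 * np.log(1 - rho**2)  # closed form for the Gaussian case
print(mi_mc, mi_true)
```

When the true MI is large (e.g. rho close to 1, or high-dimensional variables), the variance of this naive estimator explodes, which is precisely the regime the annealed importance sampling bounds are designed to handle.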

Professor Jose Dolz

ETS Montreal

The role of the Shannon entropy as a regularizer of deep neural networks

With the advent of deep learning models, a variety of additional terms have been integrated into the main learning objective, typically serving as regularizers of the model predictions. This is the case, for example, of the Shannon entropy, which has been widely used in semi-supervised learning to penalize high-entropy predictions and therefore encourage confident predictions on the unlabeled samples. Nevertheless, using this term as the sole learning objective is not sufficient, as it attains its minimum when all data points are assigned to the same class, typically yielding trivial solutions. To overcome this limitation, many recent works have coupled this term with a strong prior, which guides the entropy term and prevents the model from converging towards such trivial solutions. This talk will present several relevant works where the Shannon entropy is coupled with other learning objectives, both during training and at test time, showing that minimizing the entropy of the predictions has the potential to provide state-of-the-art performance in a variety of learning scenarios, including semi-supervised learning, few-shot learning, and unsupervised domain adaptation, among others.
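The behaviour described above is easy to verify numerically. A minimal numpy sketch (with function names of our choosing) shows that the entropy penalty rewards confident predictions, and that its global minimum is attained by the degenerate all-one-class assignment, which is why a prior term is needed.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy of probability vectors, in nats."""
    return -np.sum(p * np.log(p + eps), axis=axis)

# Confident (near one-hot) vs. maximally uncertain predictions.
confident = np.array([0.98, 0.01, 0.01])
uniform = np.full(3, 1 / 3)
print(entropy(confident))  # small: low penalty for confident predictions
print(entropy(uniform))    # log(3) ~ 1.099, the maximum for 3 classes

# The degenerate minimizer: assigning every sample to the same class
# drives the average entropy to zero, a trivial solution the additional
# prior term is meant to rule out.
batch = np.tile([1.0, 0.0, 0.0], (5, 1))
print(entropy(batch).mean())  # 0.0
```

In practice the entropy of the model's softmax outputs on unlabeled samples is added (with a weight) to the supervised loss, alongside a prior such as a class-balance constraint.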

Professor Abdellatif Zaidi

Université Paris-Est Marne la Vallée

Learning and Inference over Networks. Information-Theoretic Approaches, Architectures and Algorithms

In this keynote, we review recent advances in distributed learning architectures and algorithms, with a special focus on information-theoretic approaches. Specifically, we establish connections with network rate-distortion problems under the logarithmic loss measure and, for some network topologies, use these connections to derive optimal tradeoffs between accuracy and generalization capability. Examples of data models for which these tradeoffs can be computed analytically will be given. Furthermore, when the data distributions are not known, we provide ways of parametrizing the solution using neural networks. The resulting neural architecture, which we call in-network learning, applies to arbitrary network topologies that can be modeled as a directed graph; ways of training it will be discussed. For multiaccess-type network topologies (i.e., a number of devices, each holding a distinct feature, all communicating with a common device whose goal is to infer a correlated random variable), we will provide comprehensive comparisons with Google Brain's Federated Learning and MIT's Split Learning in terms of both accuracy and bandwidth requirements.
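The logarithmic-loss distortion mentioned above has a simple closed form: when the reconstruction is a soft (probabilistic) estimate q of the source symbol, the distortion is d(x, q) = -log q(x). A minimal numpy sketch, with names of our choosing:

```python
import numpy as np

def log_loss_distortion(x, q, eps=1e-12):
    """Logarithmic-loss distortion: minus the log probability that the
    soft reconstruction q assigns to the true symbol x (in nats)."""
    return -np.log(q[x] + eps)

# A soft reconstruction over a 3-symbol alphabet.
q = np.array([0.7, 0.2, 0.1])
print(log_loss_distortion(0, q))  # likely symbol -> low distortion
print(log_loss_distortion(2, q))  # unlikely symbol -> high distortion

# Averaged over the source distribution p, the expected log-loss is the
# cross-entropy H(p, q); it is minimized (equals the entropy H(p)) when
# the reconstruction matches the source, q = p.
p = np.array([0.7, 0.2, 0.1])
expected = sum(p[i] * log_loss_distortion(i, q) for i in range(3))
entropy_p = -np.sum(p * np.log(p))
print(expected, entropy_p)  # equal here, since q = p
```

This distortion measure is what links rate-distortion theory to learning: minimizing expected log-loss is exactly cross-entropy minimization, the standard training objective for classifiers.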

Professor Jose C. Principe

University of Florida

Review of Measures and Estimators of Statistical Dependence

The quantification of statistical dependence and divergence between probability density functions (PDFs) is becoming central to machine learning. This talk will review the history of the field and present a taxonomy for different methodologies. We will discuss measures of association, non-parametric reproducing kernel Hilbert space (RKHS) estimators for entropy, mutual information and divergence, and maximal correlation for functional estimation of statistical dependence using deep learning.
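As one concrete instance of a non-parametric RKHS dependence measure, a biased empirical estimator of the Hilbert-Schmidt Independence Criterion (HSIC) can be sketched in a few lines. This is a minimal numpy illustration with Gaussian kernels and a fixed bandwidth; the specific estimators covered in the talk may differ.

```python
import numpy as np

def rbf_gram(z, sigma=1.0):
    """Gaussian-kernel Gram matrix for a one-dimensional sample."""
    d = z[:, None] - z[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2, where H is the
    centering matrix. Near zero iff x and y are (empirically) independent."""
    n = len(x)
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y_dep = x + 0.1 * rng.normal(size=500)   # strongly dependent on x
y_ind = rng.normal(size=500)             # independent of x
print(hsic(x, y_dep))  # clearly positive
print(hsic(x, y_ind))  # close to zero
```

Unlike Pearson correlation, HSIC detects nonlinear dependence as well, which is one reason kernel-based measures of this kind have become standard tools in the field the talk surveys.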