JCSE, vol. 7, no. 2, pp. 99-111, 2013
DOI: http://dx.doi.org/10.5626/JCSE.2013.7.2.99
Direct Divergence Approximation between Probability Distributions and Its Applications in Machine Learning
Masashi Sugiyama, Song Liu, Marthinus Christoffel du Plessis, Masao Yamanaka, Makoto Yamada, Taiji Suzuki, and Takafumi Kanamori
Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan; Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Japan; NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan; Department of Mathematical Informatics, The University of Tokyo, Tokyo, Japan; Department of Computer Science and Mathematical Informatics, Nagoya University, Nagoya, Japan
Abstract: Approximating a divergence between two probability distributions from their samples is a fundamental challenge in statistics, information theory, and machine learning. A divergence approximator can be used for various purposes, such as two-sample homogeneity testing, change-point detection, and class-balance estimation. Furthermore, an approximator of the divergence between a joint distribution and the product of its marginals can be used for independence testing, which has a wide range of applications, including feature selection and extraction, clustering, object matching, independent component analysis, and causal direction estimation. In this paper, we review recent advances in divergence approximation. Our emphasis is that directly approximating the divergence without estimating the probability distributions is more sensible than the naive two-step approach of first estimating the probability distributions and then approximating the divergence. Furthermore, despite the overwhelming popularity of the Kullback-Leibler divergence as a divergence measure, we argue that alternatives such as the Pearson divergence, the relative Pearson divergence, and the L2-distance are more useful in practice because of their computationally efficient approximability, high numerical stability, and superior robustness against outliers.
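As a concrete illustration of the direct approach advocated in the abstract, the sketch below estimates the Pearson divergence between two samples by fitting the density ratio with a Gaussian-kernel, linear-in-parameters model via regularized least squares, in the spirit of direct density-ratio estimation; the function name, the fixed hyperparameters sigma, lam, and n_centers, and the choice of kernel centers are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def pearson_divergence(x, x_prime, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Direct Pearson (PE) divergence estimate between samples x ~ p and x' ~ p'.

    The density ratio r(x) = p(x)/p'(x) is modeled as g(x) = theta^T k(x) with
    Gaussian kernels and fitted by regularized least squares; the fitted ratio
    is then plugged into PE(p, p') = (1/2) E_p[r(x)] - 1/2.
    Hyperparameters (sigma, lam, n_centers) are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    x_prime = np.asarray(x_prime, dtype=float).reshape(len(x_prime), -1)

    # Kernel centers are drawn from the numerator samples.
    idx = rng.choice(len(x), size=min(n_centers, len(x)), replace=False)
    centers = x[idx]

    def design(z):
        # Gaussian kernel design matrix: K[i, l] = exp(-||z_i - c_l||^2 / (2 sigma^2)).
        sq_dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq_dist / (2.0 * sigma ** 2))

    K_p = design(x)            # kernels evaluated on samples from p
    K_q = design(x_prime)      # kernels evaluated on samples from p'

    # Regularized least-squares fit of the density ratio (no density estimation).
    H = K_q.T @ K_q / len(x_prime)          # approximates E_{p'}[k(x) k(x)^T]
    h = K_p.mean(axis=0)                    # approximates E_p[k(x)]
    theta = np.linalg.solve(H + lam * np.eye(len(h)), h)

    # Plug the fitted ratio into the PE divergence.
    return 0.5 * (K_p @ theta).mean() - 0.5
```

For example, on two one-dimensional Gaussian samples the estimate stays near zero when the means coincide and grows as they separate; in the direct estimators reviewed in the paper, the bandwidth and regularization parameter are chosen by cross-validation rather than fixed as above.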
Keywords: Machine learning; Probability distributions; Kullback-Leibler divergence; Pearson divergence; L2-distance