Algorithms and lower bounds for testing properties of structured distributions
View/ Open
Date
29/11/2016Author
Nikishkin, Vladimir
Metadata
Abstract
In this doctoral thesis we consider various property testing problems for structured
distributions. A distribution is said to be structured if it belongs to a certain class
which can be simply described in approximation terms. Such distributions often arise
in practice, e.g. log-concave distributions, easily approximated by polynomials (see
[Bir87a]), often appear in econometric research. For structured distributions, testing a
property often requires far less samples than for general unrestricted distributions.
In this thesis we prove that this is indeed the case for several distance-related properties.
Namely, we give explicit sub-linear time algorithms for L1 and L2 distance
testing between two structured distributions for the cases when either one or both of
them are available as a “black box”.
We also prove that the given algorithms have the best possible asymptotic complexity
by proving matching lower bounds in the form of explicit problem instances (albeit
constructed using randomized techniques) demanding at least a specified amount of
data to be tested successfully.
As the main numerical result, we prove that testing that total variation distance
to an explicitly given distribution is at least e requires O(√k/e²) samples, where k is an approximation parameter, dependent on the class of distribution being tested and
independent of the support size. Testing that the total variation distance between two
“black box” distributions is at least e requires O(k⁴/⁵e⁶/⁵). In some cases, when k ~ n,
this result may be worse than using an unrestricted testing algorithm (which requires
O( n²/3/e² ) samples where n is the domain size). To address this issue, we develop a third
algorithm, which requires O(k²/³e⁴/³ log⁴/³(n/k) log log(n/k)) and serves as a bridge between
the cases of small and large domain sizes.