Algorithms and lower bounds for testing properties of structured distributions
In this doctoral thesis we consider various property testing problems for structured distributions. A distribution is said to be structured if it belongs to a certain class which can be simply described in approximation terms. Such distributions often arise in practice, e.g. log-concave distributions, easily approximated by polynomials (see [Bir87a]), often appear in econometric research. For structured distributions, testing a property often requires far less samples than for general unrestricted distributions. In this thesis we prove that this is indeed the case for several distance-related properties. Namely, we give explicit sub-linear time algorithms for L1 and L2 distance testing between two structured distributions for the cases when either one or both of them are available as a “black box”. We also prove that the given algorithms have the best possible asymptotic complexity by proving matching lower bounds in the form of explicit problem instances (albeit constructed using randomized techniques) demanding at least a specified amount of data to be tested successfully. As the main numerical result, we prove that testing that total variation distance to an explicitly given distribution is at least e requires O(√k/e²) samples, where k is an approximation parameter, dependent on the class of distribution being tested and independent of the support size. Testing that the total variation distance between two “black box” distributions is at least e requires O(k⁴/⁵e⁶/⁵). In some cases, when k ~ n, this result may be worse than using an unrestricted testing algorithm (which requires O( n²/3/e² ) samples where n is the domain size). To address this issue, we develop a third algorithm, which requires O(k²/³e⁴/³ log⁴/³(n/k) log log(n/k)) and serves as a bridge between the cases of small and large domain sizes.