skip to main content
10.1145/2688073.2688095acmconferencesArticle/Chapter ViewAbstractPublication PagesitcsConference Proceedingsconference-collections
research-article

Lp Testing and Learning of Discrete Distributions

Published: 11 January 2015 Publication History

Abstract

The classic problems of testing uniformity of and learning a discrete distribution, given access to independent samples from it, are examined under general lp metrics. The intuitions and results often contrast with the classic l1 case. For p > 1, we can learn and test with a number of samples that is independent of the support size of the distribution: For 1 < p < 2, with a lp distance parameter ε, O(ü√1/εq) samples suffice for testing uniformity and O(1/εq) samples suffice for learning, where q=p/(p-1) is the conjugate of p. These bounds are tight precisely when the support size n of the distribution exceeds 1/εq, which seems to act as an upper bound on the "apparent" support size.
For some lp metrics, uniformity testing becomes easier over larger supports: a 6-sided die requires fewer trials to test for fairness than a 2-sided coin, and a card-shuffler requires fewer trials than the die. In fact, this inverse dependence on support size holds if and only if p > 4/3. The uniformity testing algorithm simply thresholds the number of "collisions" or "coincidences" and has an optimal sample complexity up to constant factors for all 1 ≤ p ≤ 2. Another algorithm gives order-optimal sample complexity for l uniformity testing. Meanwhile, the most natural learning algorithm is shown to have order-optimal sample complexity for all lp metrics.
The author thanks Cément Canonne for discussions and contributions to this work.

References

[1]
T. Batu, L. Fortnow, R. Rubinfeld, W. D. Smith, and P. White. Testing closeness of discrete distributions. Journal of the ACM (JACM), 60(1):4, 2013.
[2]
P. Berman, S. Raskhodnikova, and G. Yaroslavtsev. Testing with respect to'p distances. In Proceedings, ACM Symp. on Theory of Computing (STOC), volume 6, 2014.
[3]
C. Canonne. Private communication, 2014. In collaboration with the author.
[4]
S.-O. Chan, I. Diakonikolas, P. Valiant, and G. Valiant. Optimal algorithms for testing closeness of discrete distributions. In SODA, pages 1193{1203. SIAM, 2014.
[5]
G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan. Comparing data streams using hamming norms (how to zero in). Knowledge and Data Engineering, IEEE Transactions on, 15(3):529--540, May 2003.
[6]
C. Daskalakis, I. Diakonikolas, R. ODonnell, R. A. Servedio, and L.-Y. Tan. Learning sums of independent integer random variables. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 217--226. IEEE, 2013.
[7]
K. Do Ba, H. L. Nguyen, H. N. Nguyen, and R. Rubinfeld. Sublinear time algorithms for earth mover^a A, 48(2):428--442, 2011.
[8]
A. Dvoretzky, J. Kiefer, and J. Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, pages 642--669, 1956.
[9]
O. Goldreich and D. Ron. On testing expansion in bounded-degree graphs. In Electronic Colloquium on Computational Complexity. 2000.
[10]
P. Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307--323, May 2006.
[11]
M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien. lp-norm multiple kernel learning. J. Mach. Learn. Res., 12:953--997, July 2011.
[12]
J. R. Lee and A. Naor. Embedding the diamond graphinlp and dimension reduction inl1. Geometric & Functional Analysis GAFA, 14(4):745--747, 2004.
[13]
L. Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. Information Theory, IEEE Transactions on, 54(10):4750--4755, 2008.
[14]
R. Rubinfeld. Taming big probability distributions. XRDS, 19(1):24--28, Sept. 20

Cited By

View all
  • (2023)The Minimax Risk in Testing the Histogram of Discrete Distributions for Uniformity under Missing Ball Alternatives2023 59th Annual Allerton Conference on Communication, Control, and Computing (Allerton)10.1109/Allerton58177.2023.10313383(1-8)Online publication date: 26-Sep-2023
  • (2021)Random restrictions of high dimensional distributions and uniformity testing with subcube conditioningProceedings of the Thirty-Second Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3458064.3458085(321-336)Online publication date: 10-Jan-2021
  • (2020)Learning discrete distributions with infinite supportProceedings of the 34th International Conference on Neural Information Processing Systems10.5555/3495724.3496056(3942-3951)Online publication date: 6-Dec-2020
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ITCS '15: Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science
January 2015
404 pages
ISBN:9781450333337
DOI:10.1145/2688073
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 January 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. discrete distributions
  2. learning
  3. lp norms
  4. property testing
  5. uniformity testing

Qualifiers

  • Research-article

Conference

ITCS'15
Sponsor:
ITCS'15: Innovations in Theoretical Computer Science
January 11 - 13, 2015
Rehovot, Israel

Acceptance Rates

ITCS '15 Paper Acceptance Rate 45 of 159 submissions, 28%;
Overall Acceptance Rate 172 of 513 submissions, 34%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)16
  • Downloads (Last 6 weeks)1
Reflects downloads up to 15 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2023)The Minimax Risk in Testing the Histogram of Discrete Distributions for Uniformity under Missing Ball Alternatives2023 59th Annual Allerton Conference on Communication, Control, and Computing (Allerton)10.1109/Allerton58177.2023.10313383(1-8)Online publication date: 26-Sep-2023
  • (2021)Random restrictions of high dimensional distributions and uniformity testing with subcube conditioningProceedings of the Thirty-Second Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3458064.3458085(321-336)Online publication date: 10-Jan-2021
  • (2020)Learning discrete distributions with infinite supportProceedings of the 34th International Conference on Neural Information Processing Systems10.5555/3495724.3496056(3942-3951)Online publication date: 6-Dec-2020
  • (2019)AnacondaProceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3310435.3310478(679-693)Online publication date: 6-Jan-2019
  • (2019)Testing Ising ModelsIEEE Transactions on Information Theory10.1109/TIT.2019.293225565:11(6829-6852)Online publication date: Nov-2019
  • (2018)Which distribution distances are sublinearly testable?Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3174304.3175479(2747-2764)Online publication date: 7-Jan-2018
  • (2018)Adaptive sampling for rapidly matching histogramsProceedings of the VLDB Endowment10.14778/3231751.323175311:10(1262-1275)Online publication date: 1-Jun-2018
  • (2015)Sample Complexity for Winner Prediction in ElectionsProceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems10.5555/2772879.2773334(1421-1430)Online publication date: 4-May-2015

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media