
The maximum entropy approach and probabilistic IR models

Published: 01 July 2000

Abstract

This paper takes a fresh look at modeling approaches to information retrieval that have been the basis of much of the probabilistically motivated IR research over the last 20 years. We shall adopt a subjectivist Bayesian view of probabilities and argue that classical work on probabilistic retrieval is best understood from this perspective. The main focus of the paper will be the ranking formulas corresponding to the Binary Independence Model (BIM), presented originally by Robertson and Sparck Jones [1977], and the Combination Match Model (CMM), developed shortly thereafter by Croft and Harper [1979]. We will show how these same ranking formulas can result from a probabilistic methodology commonly known as Maximum Entropy (MAXENT).
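The MAXENT methodology the abstract refers to can be illustrated with Jaynes's classic die example (an illustration with invented numbers, not taken from the paper): among all distributions over the faces 1-6 whose mean is constrained to 4.5, the maximum entropy principle selects an exponential-family distribution, which a one-parameter search recovers:

```python
import numpy as np

# Jaynes's die: faces 1..6, with the mean constrained to 4.5.
# The maximum-entropy distribution has the form p_i ∝ exp(lam * x_i);
# we find lam by bisection, since the mean is increasing in lam.
x = np.arange(1, 7)
target_mean = 4.5

def maxent_dist(lam):
    w = np.exp(lam * x)
    return w / w.sum()

lo, hi = -10.0, 10.0
for _ in range(100):
    mid = (lo + hi) / 2.0
    if (maxent_dist(mid) * x).sum() < target_mean:
        lo = mid
    else:
        hi = mid

p = maxent_dist((lo + hi) / 2.0)
print(np.round(p, 4))   # probabilities increase toward the higher faces
print((p * x).sum())    # ≈ 4.5, the imposed constraint
```

With no constraint beyond normalization the same machinery (lam = 0) returns the uniform distribution, which is the sense in which MAXENT adds nothing beyond what the constraints assert.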

References

[1] BEEFERMAN, D., BERGER, A., AND LAFFERTY, J. 1997. Text segmentation using exponential models. In Proceedings of Empirical Methods in Natural Language Processing.
[2] BRETTHORST, G. L. 1988. Excerpts from Bayesian spectrum analysis and parameter estimation. In Maximum Entropy and Bayesian Methods in Science and Engineering, G. J. Erickson and C. R. Smith, Eds. Kluwer Academic Publishers, Norwell, MA, 75-146.
[3] CHIANG, A. C. 1967. Fundamental Methods of Mathematical Economics. McGraw-Hill, New York.
[4] COOPER, W. S. 1983. Exploiting the maximum entropy principle to increase retrieval effectiveness. Journal of the American Society for Information Science 34, 1, 31-39.
[5] COOPER, W. S. 1991. Some inconsistencies and misnomers in probabilistic information retrieval. In Proceedings of the 14th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, A. Bookstein, Y. Chiaramella, G. Salton, and V. V. Raghavan, Eds. Chicago, Illinois, USA, 57-61.
[6] COOPER, W. S. AND HUIZINGA, P. 1982. The maximum entropy principle and its application to the design of probabilistic retrieval systems. Information Technology, Research & Development 1, 99-112.
[7] CROFT, W. B. AND HARPER, D. J. 1979. Using probabilistic models of document retrieval without relevance information. Journal of Documentation 35, 4 (Dec.), 285-295.
[8] DARROCH, J. AND RATCLIFF, D. 1972. Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics 43, 1470-1480.
[9] DAWID, A. P. 1989. Probability forecasting. In Encyclopedia of Statistical Sciences, S. Kotz and N. L. Johnson, Eds. Vol. 7. Wiley, New York, 210-218.
[10] DEGROOT, M. AND FEINBERG, S. 1982. The comparison and evaluation of forecasters. The Statistician 32, 12-22.
[11] DELLA PIETRA, S., DELLA PIETRA, V., AND LAFFERTY, J. 1997. Inducing features of random fields. IEEE PAMI 19, 3.
[12] ERICKSON, G. J. AND SMITH, C. R. 1988. Maximum Entropy and Bayesian Methods in Science and Engineering. Kluwer Academic Publishers, Norwell, MA.
[13] FINE, T. L. 1973. Theories of Probability: An Examination of Foundations. Academic Press, New York.
[14] GOLAN, A., JUDGE, G. G., AND MILLER, D. 1996. Maximum Entropy Econometrics: Robust Estimation with Limited Data. John Wiley and Sons, New York.
[15] GOOD, I. J. 1950. Probability and the Weighing of Evidence. Charles Griffin, London.
[16] GOOD, I. J. 1960. Weight of evidence, corroboration, explanatory power, information and the utility of experiments. Journal of the Royal Statistical Society, Series B 22, 319-331.
[17] GULL, S. F. AND DANIELL, G. J. 1978. Image reconstruction from incomplete and noisy data. Nature 272, 686-690.
[18] HACKING, I. 1965. Logic of Statistical Inference. Cambridge University Press, Cambridge.
[19] HARPER, D. J. AND VAN RIJSBERGEN, C. J. 1978. An evaluation of feedback in document retrieval using co-occurrence data. Journal of Documentation 34, 3 (Sept.), 189-216.
[20] JAYNES, E. T. 1957a. Information theory and statistical mechanics: Part I. Physical Review 106, 620-630.
[21] JAYNES, E. T. 1957b. Information theory and statistical mechanics: Part II. Physical Review 108, 171.
[22] JAYNES, E. T. 1963. Information theory and statistical mechanics. In Statistical Physics: Brandeis Summer Institute Lectures in Theoretical Physics, G. E. Uhlenbeck, Ed. Vol. 3. W. A. Benjamin, New York, 182-218.
[23] JAYNES, E. T. 1979. Where do we stand on maximum entropy. In The Maximum Entropy Formalism, R. D. Levine and M. Tribus, Eds. MIT Press, Cambridge, Massachusetts, 15-118.
[24] JAYNES, E. T. 1994. Probability theory: The logic of science. Available via ftp://bayes.wustl.edu/pub/Jaynes/book.probability.theory/.
[25] KANTOR, P. B. 1984. Maximum entropy and the optimal design of automated information retrieval systems. Information Technology, Research and Development 3, 2 (Apr.), 88-94.
[26] KANTOR, P. B. AND LEE, J. J. 1986. The maximum entropy principle in information retrieval. In Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Rabitti, Ed. Pisa, Italy, 269-274.
[27] KANTOR, P. B. AND LEE, J. J. 1998. Testing the maximum entropy principle for information retrieval. Journal of the American Society for Information Science 49, 6, 557-566.
[28] LEE, J. J. AND KANTOR, P. B. 1991. A study of probabilistic information retrieval in the case of inconsistent expert judgments. Journal of the American Society for Information Science 42, 3, 166-172.
[29] MARSHALL, K. T. AND OLIVER, R. M. 1995. Decision Making and Forecasting: with Emphasis on Model Building and Policy Analysis. McGraw-Hill, New York.
[30] ROBERTSON, S. E. 1977. The probability ranking principle in IR. Journal of Documentation 33, 294-304.
[31] ROBERTSON, S. E. AND SPARCK JONES, K. 1977. Relevance weighting of search terms. Journal of the American Society for Information Science 27, 129-146.
[32] SALTON, G., WONG, A., AND YU, C. T. 1976. Automatic indexing using term discrimination and term precision measurements. Information Processing and Management 12, 43-51.
[33] SHANNON, C. E. 1948. A mathematical theory of communication. The Bell System Technical Journal 27, 379-423 & 623-656.
[34] SMEATON, A. F. AND VAN RIJSBERGEN, C. J. 1983. The retrieval effects of query expansion on a feedback document retrieval system. The Computer Journal 25, 3, 239-246.
[35] SPARCK JONES, K. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28, 11-21.
[36] TRIBUS, M. 1969. Rational Descriptions, Decisions, and Designs. Pergamon Press, New York.
[37] TRIBUS, M. 1979. Thirty years of information theory. In The Maximum Entropy Formalism, R. D. Levine and M. Tribus, Eds. MIT Press, Cambridge, Massachusetts, 1-14.
[38] VAN RIJSBERGEN, C. J. 1977. A theoretical basis for the use of co-occurrence data in information retrieval. Journal of Documentation 33, 106-119.
[39] VAN RIJSBERGEN, C. J. 1979. Information Retrieval, 2nd ed. Butterworths, London.



Reviews

Caroline Merriam Eastman

This paper explores existing probabilistic models of information retrieval and their relationship to alternative probabilistic models based upon the principle of maximum entropy. It shows that the formulas for document matching and retrieval used in the binary independence model and the combination match model can be derived using the principle of maximum entropy. This principle states that, in the absence of full information about a probability distribution, the appropriate distribution to assume is that which gives maximum entropy under the constraints imposed by the partial information. The authors argue that this approach provides an appropriate unifying framework for probabilistic models of information retrieval.

Probabilistic models of information retrieval require for implementation either full knowledge of the joint probability distribution of query terms in relevant and nonrelevant documents or the use of simplifying assumptions of some kind. The binary independence model assumes that relevance judgements are known and that query term occurrences are independent in both relevant and nonrelevant documents. The combination match model estimates the probability of a query term appearing in a nonrelevant document by using information about overall query term distributions; it also assumes that all query terms are equally likely to appear in relevant documents. No information about relevance judgements is used. Both models are shown to be equivalent to maximum entropy models using appropriate sets of constraints. Both coordination matching and inverse document frequency matching are also shown to be equivalent to maximum entropy models using even fewer constraints.

The paper is well organized and well written. It should be of interest to researchers in this area. It assumes a reasonable background in mathematical models of information retrieval, especially probabilistic models. However, it is well worth reading for those with such a background.
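The binary independence model described above scores a document by summing, over matching query terms, relevance weights of the Robertson-Sparck Jones form. A minimal sketch with the customary 0.5 correction (the counts below are invented for illustration):

```python
import math

def rsj_weight(N, n, R, r):
    """Robertson-Sparck Jones relevance weight with the 0.5 correction.

    N: collection size, n: docs containing the term,
    R: known relevant docs, r: relevant docs containing the term.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# A term concentrated in the relevant set gets a large positive weight:
w_good = rsj_weight(N=1000, n=100, R=10, r=5)
# A term distributed like the collection at large is uninformative:
w_flat = rsj_weight(N=1000, n=500, R=10, r=5)
print(round(w_good, 3), round(w_flat, 3))  # the second weight is exactly 0 here
```

When no relevance judgements are available (the combination match setting), the same formula with R = r = 0 degenerates to an idf-like weight driven only by n and N.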


Published In

ACM Transactions on Information Systems, Volume 18, Issue 3 (July 2000), 111 pages.
ISSN: 1046-8188; EISSN: 1558-2868; DOI: 10.1145/352595
Editor: W. Bruce Croft

Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. idf weighting
    2. binary independence model
    3. combination match
    4. linked dependence
    5. probability ranking principle


    Cited By

• (2019) Integrating learned and explicit document features for reputation monitoring in social media. Knowledge and Information Systems. DOI: 10.1007/s10115-019-01383-w. Online publication date: 19-Jul-2019.
• (2016) How Informative is a Term? Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 853-856. DOI: 10.1145/2911451.2914687. Online publication date: 7-Jul-2016.
• (2009) Uncertainty management in rule-based information extraction systems. Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 101-114. DOI: 10.1145/1559845.1559858. Online publication date: 29-Jun-2009.
• (2008) Applying maximum entropy to known-item email retrieval. Proceedings of the IR Research, 30th European Conference on Advances in Information Retrieval, 406-413. DOI: 10.5555/1793274.1793324. Online publication date: 30-Mar-2008.
• (2008) Applying Maximum Entropy to Known-Item Email Retrieval. Advances in Information Retrieval, 406-413. DOI: 10.1007/978-3-540-78646-7_37. Online publication date: 2008.
• (2007) Ranking Issues for Information Integration. Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop, 257-260. DOI: 10.1109/ICDEW.2007.4401001. Online publication date: 17-Apr-2007.
• (2007) Consistent selectivity estimation via maximum entropy. The VLDB Journal 16, 1, 55-76. DOI: 10.1007/s00778-006-0030-1. Online publication date: 25-Jan-2007.
• (2006) Consolidation of Diversifying Terms Weighting Impact on IR System Performances. Information Technology Journal 5, 1, 7-12. DOI: 10.3923/itj.2006.7.12. Online publication date: 1-Jan-2006.
• (2006) Term context models for information retrieval. Proceedings of the 15th ACM International Conference on Information and Knowledge Management, 559-566. DOI: 10.1145/1183614.1183694. Online publication date: 6-Nov-2006.
• (2005) Consistently estimating the selectivity of conjuncts of predicates. Proceedings of the 31st International Conference on Very Large Data Bases, 373-384. DOI: 10.5555/1083592.1083638. Online publication date: 30-Aug-2005.
