Data Representations in Learning

  • Chapter
Learning from Data

Part of the book series: Lecture Notes in Statistics (LNS, volume 112)

Abstract

This paper examines the effect of varying the coarseness (or fineness) of a data representation on the achievable learning or recognition accuracy. This accuracy is quantified by the least probability of error in recognition, or the Bayes error rate, for a finite-class pattern recognition problem. We examine the variation in recognition accuracy as a function of resolution by modeling the granularity variation of the representation as a refinement of the underlying probability structure of the data. Specifically, refining the data representation leads to improved bounds on the probability of error. This confirms the intuitive notion that more information can lead to improved decision-making. The analysis may be extended to multiresolution methods, where coarse-to-fine and fine-to-coarse variations in representations are possible.
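
The effect of refinement on the Bayes error rate can be illustrated with a small discrete example. The sketch below is not taken from the chapter: the two-class joint distribution, the cell groupings, and the function names `bayes_error` and `coarsen` are all assumptions chosen for illustration. It treats a coarse representation as a merging of observation cells and uses the standard expression for the Bayes error of a discrete problem, one minus the sum over cells of the largest joint class probability.

```python
from fractions import Fraction as F


def bayes_error(joint):
    """Bayes error for a discrete problem.

    joint[c][x] is P(class = c, observation = x). The optimal rule picks,
    in each cell x, the class with the largest joint probability.
    """
    n_cells = len(joint[0])
    return 1 - sum(max(row[x] for row in joint) for x in range(n_cells))


def coarsen(joint, groups):
    """Merge fine observation cells into coarser ones.

    groups is a list of lists of fine-cell indices; each inner list
    becomes a single cell of the coarser representation.
    """
    return [[sum(row[x] for x in g) for g in groups] for row in joint]


# Hypothetical joint distribution: 2 classes, 4 fine cells (entries sum to 1).
fine = [
    [F(3, 10), F(1, 10), F(1, 20), F(1, 20)],  # class 0
    [F(1, 20), F(1, 20), F(1, 10), F(3, 10)],  # class 1
]

# A chain of representations, each a refinement of the previous one.
coarsest = coarsen(fine, [[0, 1, 2, 3]])   # one cell: no information
middle = coarsen(fine, [[0, 2], [1, 3]])   # two cells

for name, joint in [("coarsest", coarsest), ("middle", middle), ("fine", fine)]:
    print(name, bayes_error(joint))
```

In this toy case the error falls from 1/2 to 3/10 to 1/5 as the representation is refined. In general, refinement can leave the Bayes error unchanged but cannot increase it, because the maximum of a sum never exceeds the sum of the maxima.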

We also discuss a general method for examining the effect of image resolution on recognizer performance. Empirical results for an 840-class Japanese optical character recognition task are presented. Performance improves considerably as resolution increases from 40 to 200 ppi, but the improvements diminish at resolutions above 200 ppi. These results are useful in the design of optical character recognizers. We suggest that our results may be relevant to human letter recognition studies, where such an objective evaluation of the task is required.
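
The abstract does not say how the lower-resolution images were produced. As a hedged illustration only, one common way to simulate a coarser scan is to block-average a higher-resolution image by an integer factor. The function name `downsample`, the 4x4 glyph, and the ppi figures in the comments below are hypothetical and are not drawn from the chapter.

```python
def downsample(image, factor):
    """Block-average a 2-D grayscale image (a list of rows) by an integer factor.

    Each factor-by-factor block of input pixels becomes one output pixel
    whose value is the mean of the block.
    """
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Hypothetical 4x4 binary glyph, imagined as scanned at 200 ppi.
glyph = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

# Factor 2 would correspond to 100 ppi under the assumption above.
low = downsample(glyph, 2)
print(low)
```

A resolution study along these lines would apply the same recognizer to each downsampled version of a test set and record the error rate at each resolution. That evaluation loop is omitted here because the chapter's recognizer is not described in this abstract.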

Copyright information

© 1996 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

Srikantan, G., Srihari, S.N. (1996). Data Representations in Learning. In: Fisher, D., Lenz, HJ. (eds) Learning from Data. Lecture Notes in Statistics, vol 112. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2404-4_29

  • DOI: https://doi.org/10.1007/978-1-4612-2404-4_29

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-94736-5

  • Online ISBN: 978-1-4612-2404-4

  • eBook Packages: Springer Book Archive
