Data Representations in Learning

Srikantan, Geetha; Srihari, Sargur N.

doi:10.1007/978-1-4612-2404-4_29

Part of the book series: Lecture Notes in Statistics (LNS, volume 112)

Abstract

This paper examines the effect of varying the coarseness (or fineness) of a data representation upon the learning or recognition accuracy achievable. This accuracy is quantified by the least probability of error in recognition, that is, the Bayes error rate, for a finite-class pattern recognition problem. We examine variation in recognition accuracy as a function of resolution by modeling the variation in granularity of the representation as a refinement of the underlying probability structure of the data. Specifically, refining the data representation leads to improved bounds on the probability of error. This confirms the intuitive notion that more information can lead to improved decision-making. The analysis may be extended to multiresolution methods, where both coarse-to-fine and fine-to-coarse variations in representation are possible.
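
The refinement argument can be illustrated with a small discrete example. The sketch below is not taken from the chapter: the joint distribution `fine`, the helper names `bayes_error` and `coarsen`, and the choice of cell groupings are all assumptions made for illustration. It relies only on the standard fact that, for a discrete representation, the Bayes error is one minus the sum over cells of the largest joint class probability, so pooling cells can never lower it.

```python
from fractions import Fraction


def bayes_error(joint):
    """Bayes error for a discrete representation.

    joint[cell][k] is P(X in cell, class = k). The optimal rule picks the
    most probable class in each cell, so the error is 1 minus the sum of
    the per-cell maxima.
    """
    return 1 - sum(max(row) for row in joint)


def coarsen(joint, groups):
    """Pool fine cells into coarser ones; each group lists the cell indices to merge."""
    n_classes = len(joint[0])
    return [
        [sum(joint[i][k] for i in group) for k in range(n_classes)]
        for group in groups
    ]


# Hypothetical two-class joint distribution over four fine cells (sums to 1).
F = Fraction
fine = [
    [F(3, 10), F(1, 20)],
    [F(1, 20), F(1, 5)],
    [F(1, 10), F(1, 20)],
    [F(1, 20), F(1, 5)],
]

medium = coarsen(fine, [[0, 1], [2, 3]])   # two cells
coarse = coarsen(fine, [[0, 1, 2, 3]])     # a single cell (priors only)

print(bayes_error(coarse))  # 1/2
print(bayes_error(medium))  # 2/5
print(bayes_error(fine))    # 1/5
```

In this toy case the error falls from 1/2 to 2/5 to 1/5 as the partition is refined. In general the inequality is not strict: a refinement that does not change the winning class in any cell leaves the error unchanged.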

We also discuss a general method to examine the effects of image resolution on recognizer performance. Empirical results for an 840-class Japanese optical character recognition task are presented. Considerable improvements in performance are observed as resolution increases from 40 to 200 ppi. However, diminished performance improvements are observed at resolutions higher than 200 ppi. These results are useful in the design of optical character recognizers. We suggest that our results may be relevant to human letter recognition studies, where such an objective evaluation of the task is required.
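
The abstract does not say how the lower-resolution images were produced. One simple way to approximate a coarser scan from a finer one is block averaging by an integer factor, sketched below. The function name `downsample`, the tiny test image, and the use of block averaging itself are assumptions for illustration, not the authors' procedure.

```python
def downsample(image, factor):
    """Block-average a grayscale image (a list of equal-length rows).

    Each output pixel is the mean of a factor x factor block of input
    pixels. Rows and columns that do not fill a whole block are dropped.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Hypothetical 4x4 image standing in for a high-resolution character scan.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]

print(downsample(image, 2))  # [[0.0, 255.0], [255.0, 0.0]]
print(downsample(image, 4))  # [[127.5]]
```

At a factor of 2 the checkerboard structure survives, while at a factor of 4 it collapses to a uniform gray. Under this scheme a 200 ppi scan reduced by a factor of 5 would stand in for a 40 ppi scan, and a recognizer could then be evaluated at each simulated resolution.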

Author information

Authors and Affiliations

CEDAR, Department of Computer Science, State University of New York at Buffalo, 202 UB Commons, 520 Lee Entrance, Amherst, NY, 14228-2567, USA
Geetha Srikantan & Sargur N. Srihari

Editor information

Editors and Affiliations

Department of Computer Science, Vanderbilt University, Box 1679, Station B, Nashville, Tennessee, 37235, USA
Doug Fisher
Department of Economics, Institute of Statistics and Econometrics, Free University of Berlin, 14185, Berlin, Garystre 21, Germany
Hans-J. Lenz

About this chapter

Cite this chapter

Srikantan, G., Srihari, S.N. (1996). Data Representations in Learning. In: Fisher, D., Lenz, HJ. (eds) Learning from Data. Lecture Notes in Statistics, vol 112. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2404-4_29

DOI: https://doi.org/10.1007/978-1-4612-2404-4_29
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-94736-5
Online ISBN: 978-1-4612-2404-4
eBook Packages: Springer Book Archive
