DOI: 10.1145/1363686.1363790
research-article

Stacked dependency networks for layout document structuring

Published: 16 March 2008

Abstract

We address the problem of structuring and annotating layout-oriented documents. We model the annotation problems as collective classification on graph-like structures whose typed instances and links capture domain-specific knowledge. We use relational dependency networks (RDNs) for collective inference on these multi-typed graphs. We then describe a variant of RDNs in which a stacked approximation replaces Gibbs sampling in order to accelerate inference. We report evaluation results for both Gibbs sampling and stacked inference on two document structuring examples.
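The abstract contrasts two ways of running collective inference over the same kind of local conditional models. The Python sketch below illustrates that contrast on a deliberately tiny problem; it is not the authors' implementation. It assumes a single node type (layout blocks in a chain, each linked to its predecessor and successor), one noisy observed feature per block, binary labels, and hand-rolled logistic regression for every conditional model, whereas the paper's RDNs handle multiple instance and link types. All function names (`make_chain`, `train_logreg`, `gibbs_inference`, `train_stacked`, `stacked_inference`) and the toy data are hypothetical. The Gibbs variant repeatedly resamples each label from a conditional model given its neighbours' current labels and averages the samples; the stacked variant, following the general stacked graphical models recipe, trains a second-level model on neighbour predictions that a first-level model made on held-out folds, so inference reduces to two deterministic passes.

```python
# Illustrative sketch only: toy data, names and model choices are assumptions,
# not the paper's implementation.
import math
import random

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(rows, labels, epochs=200, lr=0.1):
    """Plain SGD logistic regression; a bias weight is appended to every row."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            xb = x + [1.0]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            for j in range(len(w)):
                w[j] += lr * (y - p) * xb[j]
    return w

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x + [1.0])))

def neighbor_mean(values, nbrs, i):
    # relational feature: mean label (or probability) over node i's neighbours
    return sum(values[j] for j in nbrs[i]) / len(nbrs[i]) if nbrs[i] else 0.5

def make_chain(n, seed):
    """Hypothetical toy data: n consecutive layout blocks whose labels come in
    runs and whose single observed feature is a noisy copy of the label."""
    rng = random.Random(seed)
    labels, y = [], rng.randint(0, 1)
    for _ in range(n):
        if rng.random() < 0.15:  # occasional switch produces runs of labels
            y = 1 - y
        labels.append(y)
    feats = [[(2 * lab - 1) + rng.gauss(0.0, 1.0)] for lab in labels]
    nbrs = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
    return feats, labels, nbrs

def gibbs_inference(w_local, w_rel, feats, nbrs, sweeps=200, burn_in=50, seed=0):
    """RDN-style collective inference: resample each label from its conditional
    model given the current neighbour labels; average samples after burn-in."""
    rng = random.Random(seed)
    n = len(feats)
    y = [1 if predict(w_local, f) >= 0.5 else 0 for f in feats]  # local init
    counts = [0] * n
    for sweep in range(sweeps):
        for i in range(n):
            p = predict(w_rel, feats[i] + [neighbor_mean(y, nbrs, i)])
            y[i] = 1 if rng.random() < p else 0
        if sweep >= burn_in:
            for i in range(n):
                counts[i] += y[i]
    return [c / (sweeps - burn_in) for c in counts]

def train_stacked(feats, labels, nbrs, k=5):
    """Two-level stacking: the level-1 model sees neighbour predictions that the
    level-0 model made on held-out folds, mimicking its inputs at test time."""
    n = len(feats)
    held_out = [0.0] * n
    for fold in range(k):
        idx = [i for i in range(n) if i % k != fold]
        w = train_logreg([feats[i] for i in idx], [labels[i] for i in idx])
        for i in range(fold, n, k):
            held_out[i] = predict(w, feats[i])
    w0 = train_logreg(feats, labels)
    rows1 = [feats[i] + [neighbor_mean(held_out, nbrs, i)] for i in range(n)]
    return w0, train_logreg(rows1, labels)

def stacked_inference(w0, w1, feats, nbrs):
    # two deterministic passes over the nodes instead of many Gibbs sweeps
    yhat = [predict(w0, f) for f in feats]
    return [predict(w1, feats[i] + [neighbor_mean(yhat, nbrs, i)])
            for i in range(len(feats))]

if __name__ == "__main__":
    feats, labels, nbrs = make_chain(200, seed=1)
    w_local = train_logreg(feats, labels)
    # RDN conditional model, learned with the true neighbour labels as inputs
    w_rel = train_logreg([feats[i] + [neighbor_mean(labels, nbrs, i)]
                          for i in range(len(feats))], labels)
    w0, w1 = train_stacked(feats, labels, nbrs)

    test_feats, test_labels, test_nbrs = make_chain(200, seed=2)
    for name, probs in [
        ("gibbs", gibbs_inference(w_local, w_rel, test_feats, test_nbrs)),
        ("stacked", stacked_inference(w0, w1, test_feats, test_nbrs)),
    ]:
        acc = sum((p >= 0.5) == bool(t) for p, t in zip(probs, test_labels)) / len(test_labels)
        print(name, "accuracy:", round(acc, 3))
```

The speedup the abstract refers to shows up in the inference loops: with the defaults above, `gibbs_inference` evaluates the conditional model sweeps × n times (200 × n), while `stacked_inference` makes only 2n classifier calls. The printed accuracies depend on the random toy data and are not results from the paper.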

Cited By

  • (2011) Machine Learning for Document Structure Recognition. In Modeling, Learning, and Processing of Text Technological Data Structures, pp. 221-247. DOI: 10.1007/978-3-642-22613-7_12
  • (2009) Simulated Iterative Classification A New Learning Procedure for Graph Labeling. In Machine Learning and Knowledge Discovery in Databases, pp. 47-62. DOI: 10.1007/978-3-642-04174-7_4

Published In

SAC '08: Proceedings of the 2008 ACM symposium on Applied computing
March 2008
2586 pages
ISBN:9781595937537
DOI:10.1145/1363686
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dependency networks
  2. document structuring
  3. evaluation
  4. stacking

Qualifiers

  • Research-article

Conference

SAC '08
SAC '08: The 2008 ACM Symposium on Applied Computing
March 16 - 20, 2008
Fortaleza, Ceara, Brazil

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 14 Sep 2024
