| Title: | C-Structures and F-Structures for the British National Corpus (In Proceedings of the Twelfth International Lexical Functional Grammar Conference LFG07) |
| Authors: | Joachim Wagner, Djamé Seddah, Jennifer Foster and Josef van Genabith, 2007 |
| Abstract: | We describe how the British National Corpus (BNC), a one hundred million word balanced corpus of British English, was parsed into Lexical Functional Grammar (LFG) c-structures and f-structures, using a treebank-based parsing architecture. The parsing architecture uses a state-of-the-art statistical parser and reranker trained on the Penn Treebank to produce context-free phrase structure trees, and an annotation algorithm to automatically annotate these trees into LFG f-structures. We describe the pre-processing steps which were taken to accommodate the differences between the Penn Treebank and the BNC. Some of the issues encountered in applying the parsing architecture on such a large scale are discussed. The process of annotating a gold standard set of 1,000 parse trees is described. We present evaluation results obtained by evaluating the c-structures produced by the statistical parser against the c-structure gold standard. We also present the results obtained by evaluating the f-structures produced by the annotation algorithm against an automatically constructed f-structure gold standard. The c-structures achieve an f-score of 83.7% and the f-structures an f-score of 91.2%. |
| ICHEC Project: | Parsing the British National Corpus (100M Words) with Automatically Acquired Deep Probabilistic LFG Resources |
| Publication: | CSLI Publications, Stanford University, 28-30, pages 418-438 |
| URL: | http://rian.ie/en/item/view/30472.html |
| Keywords: | Machine translating; lexical functional grammar |
| Status: | Published |