| Title: | Parser-based retraining for domain adaptation of probabilistic generators |
| Authors: | Hogan et al., 2008 |
| Abstract: | While the effect of domain variation on Penn-Treebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus results in a performance drop (from a BLEU score of 0.66 on the standard WSJ test set to a BLEU score of 0.54 on our BNC test set). We develop a generator retraining method where the domain-specific training data is automatically produced using state-of-the-art parser output. The retraining method recovers a substantial portion of the performance drop, resulting in a generator which achieves a BLEU score of 0.61 on our BNC test data. |
| ICHEC Project: | Parsing the British National Corpus (100M Words) with Automatically Acquired Deep Probabilistic LFG Resources |
| URL: | http://www.aclweb.org/anthology/W/W08/ |
| Status: | Published |