How do treebank annotation schemes influence parsing results? Or how not to compare apples and oranges

In the last decade, the Penn Treebank has become the standard data set for evaluating parsers. The fact that most parsers are evaluated solely on this one data set leaves open the question of how much their results depend on the treebank's annotation scheme. In this paper, we investigate the influence that different decisions in treebank annotation schemes have on parsing. The investigation compares two similar treebanks of German, NEGRA and TüBa-D/Z, which are subsequently modified so that the effects of their differences can be compared. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality, while a flat clause structure has a positive influence.
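To make concrete what kind of annotation decision is at issue, the sketch below (not taken from the paper) shows one such transformation: splicing out unary non-terminal nodes, so that a single-word phrase carries only its part-of-speech node, as in NEGRA-style annotation. The tuple-based tree representation and the function name are assumptions made for this illustration, not the paper's actual conversion procedure.

```python
# Illustrative sketch: remove unary non-terminal nodes from a constituency tree.
# Trees are represented as (label, [children]) tuples; leaves are plain strings.

def remove_unary_nodes(tree):
    """Splice out non-terminal nodes that dominate exactly one non-terminal child."""
    if isinstance(tree, str):          # leaf: a terminal token
        return tree
    label, children = tree
    children = [remove_unary_nodes(child) for child in children]
    # A unary non-terminal over another non-terminal is replaced by its child.
    if len(children) == 1 and not isinstance(children[0], str):
        return children[0]
    return (label, children)

if __name__ == "__main__":
    # (S (NP (PN Peter)) (VP (V schläft)))  ->  (S (PN Peter) (V schläft))
    tree = ("S", [("NP", [("PN", ["Peter"])]),
                  ("VP", [("V", ["schläft"])])])
    print(remove_unary_nodes(tree))
```

In the terms of the abstract, applying a transformation of roughly this kind to one treebank brings its annotation closer to the other's, so that the effect of a single annotation decision on parsing quality can be measured in isolation.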

Metadata
Author: Sandra Kübler
URN: urn:nbn:de:hebis:30-1110588
Document Type: Article
Language: English
Date of Publication (online): 2008/10/21
Year of first Publication: 2005
Publishing Institution: Univ.-Bibliothek Frankfurt am Main
Release Date: 2008/10/21
Source: http://jones.ling.indiana.edu/~skuebler/papers/treebanks.pdf; in: Proceedings of RANLP 2005, Borovets, 2005.
HeBIS PPN: 206763557
Dewey Decimal Classification: 400 Language
Collections: Linguistics
Linguistic Classification: Computational linguistics
Licence (German): Veröffentlichungsvertrag für Publikationen (publication agreement)
