A hybrid architecture for robust parsing of german
Erhard W. Hinrichs
Frank H. Müller
- This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques. Rather what is new about the present approach is the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in an increased accuracy of annotation.
Combining dependency parsing with PP attachment
- Prepositional phrase (PP) attachment is one of the major sources for errors in traditional statistical parsers. The reason for that lies in the type of information necessary for resolving structural ambiguities. For parsing, it is assumed that distributional information of parts-of-speech and phrases is sufficient for disambiguation. For PP attachment, in contrast, lexical information is needed. The problem of PP attachment has sparked much interest ever since Hindle and Rooth (1993) formulated the problem in a way that can be easily handled by machine learning approaches: In their approach, PP attachment is reduced to the decision between noun and verb attachment; and the relevant information is reduced to the two possible attachment sites (the noun and the verb) and the preposition of the PP. Brill and Resnik (1994) extended the feature set to the now standard 4-tupel also containing the noun inside the PP. Among many publications on the problem of PP attachment, Volk (2001; 2002) describes the only system for German. He uses a combination of supervised and unsupervised methods. The supervised method is based on the back-off model by Collins and Brooks (1995), the unsupervised part consists of heuristics such as ”If there is a support verb construction present, choose verb attachment”. Volk trains his back-off model on the Negra treebank (Skut et al., 1998) and extracts frequencies for the heuristics from the ”Computerzeitung”. The latter also serves as test data set. Consequently, it is difficult to compare Volk’s results to other results for German, including the results presented here, since not only he uses a combination of supervised and unsupervised learning, but he also performs domain adaptation. Most of the researchers working on PP attachment seem to be satisfied with a PP attachment system; we have found hardly any work on integrating the results of such approaches into actual parsers. The only exceptions are Mehl et al. (1998) and Foth and Menzel (2006), both working with German data. Mehl et al. report a slight improvement of PP attachment from 475 correct PPs out of 681 PPs for the original parser to 481 PPs. Foth and Menzel report an improvement of overall accuracy from 90.7% to 92.2%. Both integrate statistical attachment preferences into a parser. First, we will investigate whether dependency parsing, which generally uses lexical information, shows the same performance on PP attachment as an independent PP attachment classifier does. Then we will investigate an approach that allows the integration of PP attachment information into the output of a parser without having to modify the parser: The results of an independent PP attachment classifier are integrated into the parse of a dependency parser for German in a postprocessing step.