<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>OPUS 4 Latest Documents RSS Feed</title>
    <description>Latest documents</description>
    <link>http://publikationen.stub.uni-frankfurt.de/index/index/</link>
    <pubDate>Tue, 05 May 2009 15:55:08 +0200</pubDate>
    <lastBuildDate>Tue, 05 May 2009 15:55:08 +0200</lastBuildDate>
    <item>
      <title>A Testsuite for Testing Parser Performance on Complex German Grammatical Constructions</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/11889</link>
      <description>Traditionally, parsers are evaluated against gold-standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and those used in the gold standard. A case in point is German, for which two treebanks (TiGer and TüBa-D/Z) with highly different annotation schemes are available for training, e.g., PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource presented in this paper, TEPACOC, takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations from both TiGer and TüBa-D/Z for 5 selected key grammatical phenomena, with 20 sentences per phenomenon. This yields a comparable testsuite of 2 × 100 sentences, which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC on key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two treebanks. In the remainder of the paper, we present the testsuite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies the two treebanks use to encode these phenomena and present our classification of potential parser errors.</description>
      <author>Sandra Kübler; Ines Rehbein; Josef van Genabith</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/11889</guid>
      <pubDate>Tue, 05 May 2009 15:55:08 +0200</pubDate>
    </item>
    <item>
      <title>Parsing coordinations</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/11883</link>
      <description>The present paper is concerned with statistical parsing of constituent structures in German. It presents four experiments that aim at improving parsing performance on coordinate structures: 1) reranking the n-best parses of a PCFG parser, 2) enriching the input to a PCFG parser with gold-standard scopes for the conjuncts, 3) reranking the parser output for all possible conjunct scopes that are permissible with regard to clause structure, and 4) reranking a combination of the parses from experiments 1 and 3. The experiments show that n-best parsing combined with reranking improves results by a large margin. Providing the parser with different scope possibilities and reranking the resulting parses increases the F-score from 69.76 for the baseline to 74.69. While this F-score is similar to that of the first experiment (n-best parsing and reranking), the first experiment results in higher recall (75.48% vs. 73.69%) and the third one in higher precision (75.43% vs. 73.26%). Combining the two methods yields the best result, with an F-score of 76.69.</description>
      <author>Sandra Kübler; Erhard Hinrichs; Wolfgang Maier; Eva Klett</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/11883</guid>
      <pubDate>Tue, 05 May 2009 15:29:52 +0200</pubDate>
    </item>
    <item>
      <title>Proceedings of the LREC workshop on partial parsing : between chunk parsing and deep parsing</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9942</link>
      <description/>
      <author>Sandra Kübler; Jakub Piskorski; Adam Przepiorkowski</author>
      <category>conferenceobject</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9942</guid>
      <pubDate>Mon, 03 Nov 2008 15:48:54 +0100</pubDate>
    </item>
    <item>
      <title>Why is German dependency parsing more reliable than constituent parsing?</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9939</link>
      <description>In recent years, research in parsing has extended in several new directions. One of these directions is concerned with parsing languages other than English. Treebanks have become available for many European languages, as well as for Arabic, Chinese, and Japanese. However, it has been shown that parsing results on these treebanks depend on the type of treebank annotation used. Another direction in parsing research is the development of dependency parsers. Dependency parsing profits from the non-hierarchical nature of dependency relations, which allows lexical information to be included in the parsing process in a much more natural way. Machine-learning-based approaches in particular have been very successful. The results achieved by these dependency parsers are very competitive, although comparisons are difficult because of the differences in annotation. For English, the Penn Treebank has been converted to dependencies. For this version, Nivre et al. report an accuracy rate of 86.3%, as compared to an F-score of 92.1 for Charniak's parser. The Penn Chinese Treebank is also available in both a constituent and a dependency representation. The best results reported for parsing experiments with this treebank are an F-score of 81.8 for the constituent version and 79.8% accuracy for the dependency version. The general trend in comparisons between constituent and dependency parsers is that dependency parsers perform slightly worse than constituent parsers. The only exception is German, where F-scores for constituent parses including grammatical functions range between 51.4 and 75.3, depending on the treebank (NEGRA or TüBa-D/Z). In contrast, the dependency parser based on a converted version of TüBa-D/Z reached an accuracy of 83.4%, i.e. 12 percentage points better than the best constituent analysis including grammatical functions.</description>
      <author>Sandra Kübler; Jelena Prokic</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9939</guid>
      <pubDate>Mon, 03 Nov 2008 15:20:56 +0100</pubDate>
    </item>
    <item>
      <title>What linguists always wanted to know about German and did not know how to estimate</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9938</link>
      <description>This paper profiles significant differences in syntactic distribution and in word class frequencies between two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z, a treebank of articles published in the German daily newspaper "die tageszeitung" (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.</description>
      <author>Erhard W. Hinrichs; Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9938</guid>
      <pubDate>Mon, 03 Nov 2008 15:13:23 +0100</pubDate>
    </item>
    <item>
      <title>Treebank profiling of spoken and written German</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9937</link>
      <description>This paper profiles significant differences in syntactic distribution and in word class frequencies between two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z, a treebank of articles published in the German daily newspaper "die tageszeitung" (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.</description>
      <author>Erhard W. Hinrichs; Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9937</guid>
      <pubDate>Mon, 03 Nov 2008 15:05:16 +0100</pubDate>
    </item>
    <item>
      <title>Towards case-based parsing : are chunks reliable indicators for syntax trees?</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9935</link>
      <description>This paper addresses the question of whether it is possible to construct a parser based on ideas from case-based reasoning. Such a parser would use a partial analysis of the input sentence to select a (nearly) complete syntax tree and then adapt this tree to the input sentence. Experiments on German data from the TüBa-D/Z treebank, using the KaRoPars partial parser, show that a wide range of levels of generality can be reached, depending on which types of information are used to determine the similarity between the input sentence and the training sentences. The results show that it is possible to construct a case-based parser. The optimal setting among those presented here still needs to be determined empirically.</description>
      <author>Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9935</guid>
      <pubDate>Mon, 03 Nov 2008 15:00:55 +0100</pubDate>
    </item>
    <item>
      <title>Towards a dependency-oriented evaluation for partial parsing</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9934</link>
      <description>Quantitative evaluation of parsers has traditionally centered on the PARSEVAL measures of crossing brackets, (labeled) precision, and (labeled) recall. However, it is well known that these measures do not give an accurate picture of the quality of a parser's output. Furthermore, we will show that they are especially unsuited for partial parsers. In recent years, research has concentrated on dependency-based evaluation measures. In this paper, we show that such a dependency-based evaluation scheme is particularly suitable for partial parsers. TüBa-D, the treebank used here for evaluation, contains all the information necessary to convert its trees into dependency structures without relying on heuristics. The resulting dependency representations are therefore not only reliable but also linguistically motivated, so they can be used for linguistic purposes.</description>
      <author>Sandra Kübler; Heike Telljohann</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9934</guid>
      <pubDate>Mon, 03 Nov 2008 14:56:15 +0100</pubDate>
    </item>
    <item>
      <title>The TüBa-D/Z treebank: annotating German with a context-free backbone</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9933</link>
      <description>The purpose of this paper is to describe the TüBa-D/Z treebank of written German and to compare it to the independently developed TIGER treebank (Brants et al., 2002). Both treebanks, TIGER and TüBa-D/Z, use an annotation framework that is based on phrase structure grammar and that is enhanced by a level of predicate-argument structure. The comparison between the annotation schemes of the two treebanks focuses on the different treatments of free word order and discontinuous constituents in German as well as on differences in phrase-internal annotation.</description>
      <author>Heike Telljohann; Erhard Hinrichs; Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9933</guid>
      <pubDate>Mon, 03 Nov 2008 14:51:42 +0100</pubDate>
    </item>
    <item>
      <title>The PaGe 2008 shared task on parsing German</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9931</link>
      <description>The ACL 2008 Workshop on Parsing German features a shared task on parsing German. The goal of the shared task was to find reasons for the radically different behavior of parsers on the different treebanks and between constituent and dependency representations. In this paper, we describe the task and the data sets. In addition, we provide an overview of the test results and a first analysis.</description>
      <author>Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9931</guid>
      <pubDate>Mon, 03 Nov 2008 14:46:14 +0100</pubDate>
    </item>
    <item>
      <title>The CoNLL 2007 shared task on dependency parsing</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9930</link>
      <description>The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.</description>
      <author>Joakim Nivre; Johan Hall; Sandra Kübler; Ryan McDonald; Jens Nilsson; Sebastian Riedel; Deniz Yuret</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9930</guid>
      <pubDate>Mon, 03 Nov 2008 14:41:49 +0100</pubDate>
    </item>
    <item>
      <title>Sometimes less is more : Romanian word sense disambiguation revisited</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9929</link>
      <description>Recent approaches to Word Sense Disambiguation (WSD) generally fall into two classes: (1) information-intensive approaches and (2) information-poor approaches. Our hypothesis is that for memory-based learning (MBL), a reduced amount of data is more beneficial than the full range of features used in the past. Our experiments show that MBL combined with a restricted set of features and a feature selection method that minimizes the feature set leads to competitive results, outperforming all systems that participated in the SENSEVAL-3 competition on the Romanian data. Thus, with this specific method, a tightly controlled feature set improves the accuracy of the classifier, reaching 74.0% in the fine-grained and 78.7% in the coarse-grained evaluation.</description>
      <author>Georgiana Dinu; Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9929</guid>
      <pubDate>Mon, 03 Nov 2008 14:34:47 +0100</pubDate>
    </item>
    <item>
      <title>Robustes chunkparsing mit variabler Analysetiefe</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9928</link>
      <description>Chunk parsing offers a particularly promising approach to robust, partial parsing aimed at broad data coverage. The goal of chunk parsing is a partial, non-recursive syntactic structure. This extremely efficient parsing approach can be implemented as a cascade of finite-state transducers. This contribution presents TüSBL, a system whose input consists of spontaneous spoken language, which is supplied to the parser as a word hypothesis graph produced by a speech recognizer. Chunk parsing is particularly well suited to such an application because it can handle fragmentary or ill-formed utterances robustly. Furthermore, a tree construction component is presented that extends the partial chunk structures into complete trees with grammatical functions. The system is evaluated on manually checked system inputs, since the usual evaluation parameters are not suitable for this purpose.</description>
      <author>Sandra Kübler; Erhard W. Hinrichs</author>
      <category>bookpart</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9928</guid>
      <pubDate>Mon, 03 Nov 2008 14:25:38 +0100</pubDate>
    </item>
    <item>
      <title>Recent developments in linguistic annotations of the TüBa-D/Z treebank</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9888</link>
      <description>The purpose of this paper is to describe recent developments in the morphological, syntactic, and semantic annotation of the TüBa-D/Z treebank of German. The TüBa-D/Z annotation scheme is derived from the Verbmobil treebank of spoken German [4, 10], but has been extended along various dimensions to accommodate the characteristics of written texts. TüBa-D/Z uses the "die tageszeitung" (taz) newspaper corpus as its data source. The Verbmobil treebank annotation scheme distinguishes four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level. The primary ordering principle of a clause is the inventory of topological fields, which characterize the word order regularities among different clause types of German and which are widely accepted among descriptive linguists of German [3, 6]. The TüBa-D/Z annotation relies on a context-free backbone (i.e. proper trees without crossing branches) of phrase structure combined with edge labels that specify the grammatical function of the phrase in question. The syntactic annotation scheme of the TüBa-D/Z is described in more detail in [11, 12]. TüBa-D/Z currently comprises approximately 15 000 sentences, and approximately 7 000 additional sentences are in the correction phase. The latter will be released along with an updated version of the existing treebank before the end of this year. The treebank is available in an XML format, in the NEGRA export format [1], and in the Penn treebank bracketing format. The XML format contains all types of information described above, the NEGRA export format contains all sentence-internal information, and the Penn treebank format includes only those layers of information that can be expressed as pure tree structures. Over the course of the last year, more fine-grained linguistic annotations have been added along the following dimensions: (1) the labels of the basic Stuttgart-Tübingen tagset (STTS) [9] have been enriched with relevant features of inflectional morphology, (2) named entity information has been encoded as part of the syntactic annotation, and (3) a set of anaphoric and coreference relations has been added to link referentially dependent noun phrases. In the following sections, we describe each of these innovations in turn and demonstrate how the additional annotations can be incorporated into one comprehensive annotation scheme.</description>
      <author>Erhard Hinrichs; Sandra Kübler; Karin Naumann; Heike Telljohann; Julia Trushkina</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9888</guid>
      <pubDate>Tue, 21 Oct 2008 16:54:38 +0200</pubDate>
    </item>
    <item>
      <title>POS tagging for German : how important is the right context?</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9887</link>
      <description>Part-of-speech tagging is generally performed with Markov models based on bigrams or trigrams. While Markov models concentrate strongly on the left context of a word, many languages require the inclusion of right context for correct disambiguation. We show for German that the best results are reached by a combination of left and right context. If only left context is available, then reversing the direction of analysis and tagging from right to left improves the results. In a version of MBT (Daelemans et al., 1996) with default parameter settings, the inclusion of the right context improved POS tagging accuracy from 94.00% to 96.08%, thus corroborating our hypothesis. The version with optimized parameters reaches 96.73%.</description>
      <author>Steliana Ivanova; Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9887</guid>
      <pubDate>Tue, 21 Oct 2008 16:47:12 +0200</pubDate>
    </item>
    <item>
      <title>Parsing without grammar - using complete trees instead</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9886</link>
      <description>The definition of similarity between sentences is formulated on the levels of words, POS tags, and chunks (Abney 91; Abney 96). The evaluation of this approach shows that while precision and recall based on the PARSEVAL measures (Black et al. 91) do not yet reach the level of state-of-the-art parsers (F1=87.19 on syntactic constituents, F1=77.78 including function-argument structure), the parser performs very reliably where function-argument structure is concerned (F1=96.52). The lower F-scores are very often due to unattached constituents.</description>
      <author>Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9886</guid>
      <pubDate>Tue, 21 Oct 2008 16:41:48 +0200</pubDate>
    </item>
    <item>
      <title>Memory-based vocalization of Arabic</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9885</link>
      <description>The problem of vocalization, or diacritization, is essential to many tasks in Arabic NLP. Arabic is generally written without short vowels, so one written form can have several pronunciations, each carrying its own meaning(s). In the experiments reported here, we define vocalization as a classification problem in which we decide for each character in the unvocalized word whether it is followed by a short vowel. We investigate the importance of different types of context. Our results show that combining memory-based learning with a purely word-internal context leads to a word error rate of 6.64%. If a lexical context is added, the results deteriorate slowly.</description>
      <author>Sandra Kübler; Emad Mohamed</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9885</guid>
      <pubDate>Tue, 21 Oct 2008 16:33:19 +0200</pubDate>
    </item>
    <item>
      <title>Maschineller Erwerb von Wortklassifikationsregeln</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9884</link>
      <description>This work first gives a brief overview of the fields of word classification and machine learning (Chapter 1). It then presents Brill's (1992, 1993, 1994) approach of transformation-based error-driven tagging and adapts it for use with German corpora (Chapter 2). This is a rule-based system in which, unlike previously available systems, the rules are not developed manually and supplied to the system; instead, the system acquires the rules itself from a small, already tagged training corpus, using a few rule templates. Chapter 3 presents the results of applying the system to parts of a German corpus. Finally, Chapter 4 presents other tagging systems and compares them with Brill's (1993) system on the basis of eight criteria.</description>
      <author>Sandra Kübler</author>
      <category>book</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9884</guid>
      <pubDate>Tue, 21 Oct 2008 16:25:55 +0200</pubDate>
    </item>
    <item>
      <title>Learning a lexicalized grammar for German</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9883</link>
      <description>In syntax, the trend nowadays is towards lexicalized grammar formalisms. It is now widely accepted that dividing words into word classes may serve as a labor-saving mechanism, but at the same time, it discards all detailed information on the idiosyncratic behavior of words. And that is exactly the type of information that may be necessary in order to parse a sentence. For learning approaches, however, lexicalized grammars represent a challenge precisely because they include so much detailed and specific information, which is difficult to learn. This paper presents an algorithm for learning a link grammar of German. The problem of data sparseness is tackled by using all the available information from partial parses as well as from an existing grammar fragment and a tagger. This is a report on work in progress, so no representative results are available yet.</description>
      <author>Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9883</guid>
      <pubDate>Tue, 21 Oct 2008 16:20:00 +0200</pubDate>
    </item>
    <item>
      <title>Machine learning approaches in computational linguistics : introduction</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9882</link>
      <description/>
      <author>Erhard W. Hinrichs; Sandra Kübler</author>
      <category>other</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9882</guid>
      <pubDate>Tue, 21 Oct 2008 16:14:59 +0200</pubDate>
    </item>
    <item>
      <title>Is it really that difficult to parse German?</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9881</link>
      <description>This paper presents a comparative study of probabilistic treebank parsing of German, using the Negra and TüBa-D/Z treebanks. Experiments with the Stanford parser, which uses a factored PCFG and dependency model, show that, contrary to previous claims for other parsers, lexicalization of PCFG models boosts parsing performance for both treebanks. The experiments also show a large difference in parsing performance depending on whether the parser is trained on Negra or on TüBa-D/Z. Parser performance for the models trained on TüBa-D/Z is comparable to the results the Stanford parser achieves for English when trained on the Penn treebank. This comparison at least suggests that German is not harder to parse than its West Germanic neighbor language English.</description>
      <author>Sandra Kübler; Erhard W. Hinrichs; Wolfgang Maier</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9881</guid>
      <pubDate>Tue, 21 Oct 2008 16:09:17 +0200</pubDate>
    </item>
    <item>
      <title>How to compare treebanks</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9880</link>
      <description>Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question of how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for the evaluation of parsers on complex German grammatical constructions. The testsuite provides a well-thought-out error classification, which enables us to compare the output of parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions such as PP attachment or non-constituent coordination.</description>
      <author>Sandra Kübler; Wolfgang Maier; Ines Rehbein; Yannick Versley</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9880</guid>
      <pubDate>Tue, 21 Oct 2008 16:01:44 +0200</pubDate>
    </item>
    <item>
      <title>How do treebank annotation schemes influence parsing results? : or how not to compare apples and oranges</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9630</link>
      <description>In the last decade, the Penn treebank has become the standard data set for evaluating parsers. The fact that most parsers are evaluated solely on this specific data set leaves open the question of how much these results depend on the annotation scheme of the treebank. In this paper, we investigate the influence that different decisions in the annotation schemes of treebanks have on parsing. The investigation compares two similar treebanks of German, NEGRA and TüBa-D/Z, which are subsequently modified to allow a comparison of the differences. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality, while a flat clause structure has a positive influence.</description>
      <author>Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9630</guid>
      <pubDate>Tue, 21 Oct 2008 15:55:24 +0200</pubDate>
    </item>
    <item>
      <title>From phrase structure to dependencies, and back</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9878</link>
      <description>Transforming constituent-based annotation into dependency-based annotation has been shown to work for different treebanks and annotation schemes (e.g. Lin (1995) has transformed the Penn treebank, and Kübler and Telljohann (2002) the Tübinger Baumbank des Deutschen (TüBa-D/Z)). These ventures are usually triggered by the conflict between theory-neutral annotation, which serves most needs of a wider audience, and theory-specific annotation, which provides more fine-grained information for a smaller audience. As a compromise, it has been pointed out that treebanks can be designed to support more than one theory from the start (Nivre, 2003). We argue that information can also be added to an existing annotation scheme so that it supports additional theory-specific annotations. We also argue that such a transformation is useful for improving and extending the original annotation scheme with respect to both ambiguous annotation and annotation errors. We show this by analysing problems that arise when generating dependency information from the constituent-based TüBa-D/Z.</description>
      <author>Tylman Ule; Sandra Kübler</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9878</guid>
      <pubDate>Tue, 21 Oct 2008 13:25:11 +0200</pubDate>
    </item>
    <item>
      <title>From chunks to function-argument structure : a similarity-based approach</title>
      <link>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9877</link>
      <description>Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are desirable not only for a deeper syntactic analysis; they also constitute a necessary prerequisite for assigning function-argument structure. The present paper offers a similarity-based algorithm for assigning functional labels such as subject, object, head, complement, etc. to complete syntactic structures on the basis of pre-chunked input. The evaluation of the algorithm has concentrated on measuring the quality of functional labels. It was performed on a German and an English treebank using two different annotation schemes at the level of function-argument structure. The results of 89.73% correct functional labels for German and 90.40% for English validate the general approach.</description>
      <author>Sandra Kübler; Erhard W. Hinrichs</author>
      <category>article</category>
      <guid>http://publikationen.stub.uni-frankfurt.de/frontdoor/index/index/docId/9877</guid>
      <pubDate>Tue, 21 Oct 2008 13:19:04 +0200</pubDate>
    </item>
  </channel>
</rss>
