TüBa-D/DP is a machine-annotated dependency treebank of German. Its goal is to offer high-quality syntactic annotations for a large amount of contemporary German text. The annotations follow the TüBa-D/Z annotation guidelines (Telljohann et al., 2006) as closely as possible. TüBa-D/DP currently consists of the following subcorpora:
Subcorpus | Genre | Sentences | Tokens | Download |
---|---|---|---|---|
Europarl | Parliamentary proceedings | 2.2M | 55M | Download |
Political speeches | Speeches given by officials | 619,152 | 12.8M | Download |
taz (1986-2009) | Newspaper | 29.9M | 393.7M | Contact us |
Wikipedia (2020) | Encyclopedia | 45.5M | 917.5M | Download |
Each subcorpus has the following annotation layers:
A description of the annotation format can be found in the stylebook.
Feel free to ask any questions by creating an issue on GitHub.