TüBa-D/DP release 4

Introduction

TüBa-D/DP is a machine-annotated dependency treebank of German. Its goal is to offer high-quality syntactic annotations for a large amount of contemporary German text. To keep the annotations familiar, it follows the TüBa-D/Z annotation guidelines (Telljohann et al., 2006) as closely as possible. TüBa-D/DP currently consists of the following subcorpora:

Subcorpus            Genre                      Sentences  Tokens  Download
Europarl             Parliamentary proceedings  2.2M       55M     Download
taz (1986-2009)      Newspaper                  29.9M      393.7M  Contact us
Wikipedia (2019)     Encyclopedia               42.2M      849.5M  Download
Common Crawl (2019)  Webpages                   1.4B       27.3B   Contact us

Each subcorpus has the following annotation layers:

A description of the annotation format can be found in the stylebook.

Licensing & availability

Questions

Please send any questions to daniel.de-kok@uni-tuebingen.de or create an issue on GitHub.