tueba-ddp

TüBa-D/DP release 5

Introduction

TüBa-D/DP is a machine-annotated dependency treebank of German. Its goal is to offer high-quality syntactic annotations for a large volume of contemporary German text. The annotations follow the TüBa-D/Z annotation guidelines (Telljohann et al., 2006) as closely as possible. TüBa-D/DP currently consists of the following subcorpora:

| Subcorpus | Genre | Sentences | Tokens | Download |
|---|---|---|---|---|
| Europarl | Parliamentary proceedings | 2.2M | 55M | Download |
| Political speeches | Speeches held by officials | 619,152 | 12.8M | Download |
| taz (1986-2009) | Newspaper | 29.9M | 393.7M | Contact us |
| Wikipedia (2020) | Encyclopedia | 45.5M | 917.5M | Download |

Each subcorpus has the following annotation layers:

A description of the annotation format can be found in the stylebook.
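As a rough illustration of how a dependency treebank like this might be read, here is a minimal sketch. It assumes a CoNLL-X-style layout: one token per line, ten tab-separated columns, and a blank line between sentences. That layout is an assumption, not something stated here, so check the stylebook for the actual format. The two-token sample, its tags and labels, and the names `Token` and `read_sentences` are all hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position of the token in its sentence
    form: str     # surface form
    head: int     # index of the governing token, 0 for the root
    deprel: str   # dependency relation label


# Hypothetical sample in an ASSUMED CoNLL-X-style layout (10 tab-separated
# columns). It is not taken from the corpus; tags and labels are illustrative.
SAMPLE = (
    "1\tPeter\tPeter\tN\tNE\t_\t2\tSUBJ\t_\t_\n"
    "2\tschläft\tschlafen\tV\tVVFIN\t_\t0\tROOT\t_\t_\n"
    "\n"
)


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence; blank lines separate sentences."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        # Assumed column positions: 0 = ID, 1 = FORM, 6 = HEAD, 7 = DEPREL.
        sentence.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    # Emit a final sentence that is not followed by a blank line.
    if sentence:
        yield sentence


sentences = list(read_sentences(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token.idx, token.form, token.head, token.deprel)
```

Because `read_sentences` accepts any iterable of lines, it can also be given an open file handle, which streams the data and avoids loading a multi-million-sentence subcorpus into memory.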

Licensing & availability

Questions

Feel free to ask any questions by creating an issue on GitHub.