Political Delegitimization Discourse - Dataset

This repository contains the dataset for the paper "The Enemy from Within: A Study of Political Delegitimization Discourse in Israeli Political Speech", accepted at EMNLP 2025.

Dataset Description

The dataset contains 10,410 manually annotated Hebrew sentences drawn from three distinct sources, capturing a wide range of political speech. The data is split into three files: `train.csv`, `val.csv`, and `test.csv`.
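
As a minimal sketch of reading the three splits, the snippet below uses only Python's standard `csv` module. The function names `load_splits` and `split_sizes`, the `data_dir` argument, and the extra `split` key are illustrative assumptions and not part of the dataset.

```python
import csv
from collections import Counter
from pathlib import Path

SPLITS = ("train", "val", "test")  # file stems listed in this README


def load_splits(data_dir):
    """Read train.csv, val.csv and test.csv into one list of row dicts."""
    rows = []
    for split in SPLITS:
        path = Path(data_dir) / f"{split}.csv"
        # newline="" lets the csv module handle quoted line breaks itself.
        with path.open(encoding="utf-8", newline="") as handle:
            for row in csv.DictReader(handle):
                row["split"] = split  # hypothetical helper key, not a file column
                rows.append(row)
    return rows


def split_sizes(rows):
    """Count how many rows came from each split."""
    return dict(Counter(row["split"] for row in rows))
```

Calling `load_splits(".")` from the repository root should return one dict per sentence. If the files match the description above, the combined length would be 10,410.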

Data Sources

The corpus is compiled from the following sources:

- Facebook: 6,690 sentences from public posts by Israeli politicians (Members of Parliament, candidates, and party accounts) between December 2018 and April 2021.
- Knesset: 2,504 sentences from official transcripts of speeches in the Israeli Parliament (the Knesset) between 1993 and 2023.
- News Media: 1,216 sentences from leading Hebrew-language news outlets between 2018 and 2021.

Key Statistics

- Total Sentences: 10,410
- Sentences with Political Delegitimization Discourse (PDD): 1,812 (17.4%)
- Sentences with Rich Annotations: A subset of 642 PDD-positive sentences was further annotated for intensity, rhetorical strategies, and target types.
- Inter-Coder Reliability: The primary delegitimization label was annotated by three coders, achieving substantial agreement with an average Cohen's Kappa of 0.82.

Feature	Count / Mean	Percent / SD	Notes
Total Sentences	10,410	100%
Delegitimization (True)	1,812	17.4%

PDD Subset (N=642)
Incivility	157	25.0%	Contains mockery, slander, or profanity
Common Good	147	23.4%	Accuses target of harming society
Outgroup	147	23.4%	Frames target as an external enemy
Target: Group	174	27.7%	Target is a political or social group
Target: Person	271	43.2%	Target is an individual
Target: Institute	163	26.0%	Target is an organization (e.g., Supreme Court)
Intensity (0-2 scale)	1.225 (mean)	0.638 (SD)	The average reported in the paper was calculated on a different subset.
Target Spans	471	54.9%	Percent of sentences with an annotated target span
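
The headline rate in the table (1,812 of 10,410 sentences, or 17.4%) can be recomputed from the files, and the same arithmetic can be broken down by source. The sketch below assumes rows are dicts keyed by the column names `source` and `delegitimization`. The function name `pdd_rate_by_source` is hypothetical, and accepting `"1.0"` as a positive label is a defensive assumption about how the CSV was exported.

```python
from collections import Counter


def is_positive(value):
    """True when a delegitimization cell holds the positive label."""
    # Assumption: the label may be serialized as "1" or "1.0".
    return str(value).strip() in {"1", "1.0"}


def pdd_rate_by_source(rows):
    """Return {source: (pdd_count, total, percent)} for the given rows."""
    totals, positives = Counter(), Counter()
    for row in rows:
        totals[row["source"]] += 1
        if is_positive(row["delegitimization"]):
            positives[row["source"]] += 1
    return {
        source: (positives[source], total, round(100 * positives[source] / total, 1))
        for source, total in totals.items()
    }


# The overall figure from the table, as plain arithmetic.
overall_percent = round(100 * 1812 / 10410, 1)  # 17.4
```

Per-source rates are not listed in this README, so the output of `pdd_rate_by_source` is something to inspect rather than to compare against a published figure.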

Data Fields

The data is provided in UTF-8 encoded CSV files. Each file contains the following columns:

Column	Type	Description
`source`	String	The source of the text. One of `Facebook`, `Knesset`, or `News`.
`text`	String	The original sentence in Hebrew.
`anno_text`	String	For a subset of the data, this contains the `text` with PDD targets surrounded by `%%%`. `NULL` if not annotated.
`text_en_machine_translated`	String	Machine-translated English version of `text` for accessibility.
`Page Name`	String	Metadata from the source (e.g., Facebook page name).
`User Name`	String	Metadata from the source (e.g., Facebook user name).
`Post Created Date`	String	Metadata from the source (e.g., post timestamp).
`URL`	String	URL to the original post/document, where available.
`delegitimization`	Integer	The primary label. `1` if the sentence contains PDD, `0` otherwise.
`intensity`	Integer	The strength of delegitimization on a 3-point scale (`0`=weak, `1`=moderate, `2`=strong). `NULL` if `delegitimization` is `0` or not in the richly annotated subset.
`incivility`	Integer	`1` if the PDD includes mockery, swearing, or insults. `NULL` otherwise.
`group`	Integer	`1` if the PDD target is a social or political group. `NULL` otherwise.
`person`	Integer	`1` if the PDD target is an individual. `NULL` otherwise.
`outgroup`	Integer	`1` if the PDD casts the target as an external "enemy". `NULL` otherwise.
`common_good`	Integer	`1` if the PDD invokes a threat to society or the state. `NULL` otherwise.
`institute`	Integer	`1` if the PDD target is an institution or organization. `NULL` otherwise.
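
Because the label columns mix integers with `NULL`, and `anno_text` embeds target spans between `%%%` markers, a small amount of parsing is needed after reading the CSV. The following is a sketch under stated assumptions: missing values are taken to appear either as the literal string `NULL` or as an empty cell, each target is taken to be wrapped as `%%%target%%%`, and the helper names (`parse_label`, `parse_row`, `extract_targets`) are illustrative.

```python
import re

MISSING = {"", "NULL"}  # assumption: how missing cells are serialized
LABEL_COLUMNS = (
    "delegitimization", "intensity", "incivility", "group",
    "person", "outgroup", "common_good", "institute",
)
# Non-greedy match so that several targets in one sentence stay separate.
TARGET_SPAN = re.compile(r"%%%(.+?)%%%", re.DOTALL)


def parse_label(value):
    """Convert a label cell to int, or None when it is missing."""
    if value is None or value.strip() in MISSING:
        return None
    return int(float(value))  # tolerates "1" as well as "1.0"


def parse_row(row):
    """Return a copy of a csv.DictReader row with typed label columns."""
    parsed = dict(row)
    for column in LABEL_COLUMNS:
        parsed[column] = parse_label(row.get(column))
    return parsed


def extract_targets(anno_text):
    """List the target spans marked with %%% in an anno_text cell."""
    if anno_text is None or anno_text.strip() in MISSING:
        return []
    return [span.strip() for span in TARGET_SPAN.findall(anno_text)]
```

For example, `extract_targets("%%%the coalition%%% is a danger")` returns `["the coalition"]`, and a `NULL` cell yields an empty list.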

Annotation Scheme

Political Delegitimization Discourse (PDD) is defined as discourse aimed at undermining the legitimacy of political entities (actors, groups, institutions) by attacking their symbolic aspects, rather than criticizing specific policies. PDD seeks to frame opponents as unworthy of normative inclusion in the political arena.

The annotation scheme consists of two main stages:

1. PDD Identification: A binary classification (`delegitimization`) that determines whether a sentence contains PDD. A sentence is marked as PDD if it targets a political entity with hostile characterizations such as:
   - Expressions of disgust, ridicule, or hatred.
   - Claims that the target poses a threat to the state or society.
   - Denial of the target's right to political participation.
   - Association with stigmatized groups (e.g., Nazis, terrorists).

   Critiques of policy are explicitly excluded.
2. PDD Characterization: For sentences identified as PDD, a set of finer-grained labels was annotated to describe the attributes of the PDD (`intensity`, `incivility`, etc.) and the type of target.
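
The two stages map onto the columns in a simple way: `delegitimization` holds the stage-one decision, and a non-missing `intensity` indicates that the sentence also went through characterization. A minimal sketch of selecting that subset follows, assuming rows are dicts keyed by column name and that missing values appear as `NULL` or an empty cell. The function names are hypothetical.

```python
from collections import Counter

MISSING = {"", "NULL"}  # assumption: how missing cells are serialized


def is_missing(value):
    """True when a cell carries no annotation."""
    return value is None or str(value).strip() in MISSING


def richly_annotated(rows):
    """Keep the rows that carry stage-two (characterization) labels."""
    # intensity is NULL for non-PDD rows and for PDD rows outside the
    # richly annotated subset, so a present value identifies the subset.
    return [row for row in rows if not is_missing(row.get("intensity"))]


def intensity_counts(rows):
    """Count richly annotated rows per intensity level (0, 1, 2)."""
    levels = (int(float(row["intensity"])) for row in richly_annotated(rows))
    return dict(Counter(levels))
```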

License

This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
