This repository contains the dataset for the paper "The Enemy from Within: A Study of Political Delegitimization Discourse in Israeli Political Speech", accepted at EMNLP 2025.
The dataset contains 10,410 manually annotated Hebrew sentences drawn from three distinct sources, capturing a wide range of political speech. The data is split into `train.csv`, `val.csv`, and `test.csv`.
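A minimal sketch for loading the three splits with the Python standard library. It assumes the CSV files sit together in one directory (`data_dir` is a placeholder) and reads every cell as a string. `load_split` and `load_all` are illustrative helper names and are not part of the dataset.

```python
import csv
from pathlib import Path

# Split names as listed in this README; data_dir is wherever you saved the files.
SPLITS = ("train", "val", "test")


def load_split(data_dir, split):
    """Read one split (e.g. train.csv) into a list of dicts keyed by column name."""
    path = Path(data_dir) / f"{split}.csv"
    # The files are UTF-8; "utf-8-sig" also tolerates a leading byte-order mark.
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.DictReader(f))


def load_all(data_dir):
    """Return a dict mapping each split name to its list of rows."""
    return {split: load_split(data_dir, split) for split in SPLITS}
```

For example, `splits = load_all("path/to/data")` followed by `len(splits["train"])` gives the number of training sentences. Using `csv` keeps the sketch dependency-free. With pandas, `pd.read_csv` serves the same purpose but converts NULL-like cells to `NaN`.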
The corpus is compiled from the following sources:
- Facebook: 6,690 sentences from public posts by Israeli politicians (Members of Parliament, candidates, and party accounts) between December 2018 and April 2021.
- Knesset: 2,504 sentences from official transcripts of speeches in the Israeli Parliament (the Knesset) between 1993 and 2023.
- News Media: 1,216 sentences from leading Hebrew-language news outlets between 2018 and 2021.
- Total Sentences: 10,410
- Sentences with PDD: 1,812 (17.4%)
- Sentences with Rich Annotations: 642 PDD-positive sentences (a subset) were further annotated for intensity, rhetorical strategies, and target type.
- Inter-Coder Reliability: The primary `delegitimization` label was annotated by three coders, achieving substantial agreement with an average Cohen's kappa of 0.82.
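To check the headline counts against the released files, one option is to concatenate the three splits and tally the labels per source. The sketch below assumes each row is a dict as produced by `csv.DictReader`, with `source` and `delegitimization` columns and the label stored as `"0"` or `"1"`. `pdd_summary` is a hypothetical helper name. If the splits together contain all 10,410 sentences, the overall rate should come out near 1,812 / 10,410 ≈ 0.174.

```python
from collections import Counter


def pdd_summary(rows):
    """Count sentences and PDD-positive sentences, overall and per source.

    Assumes each row has a "source" field and a "delegitimization" field
    holding "1" or "0" (strings, as read by csv.DictReader).
    """
    total = Counter()
    positive = Counter()
    for row in rows:
        src = row["source"]
        total[src] += 1
        positive[src] += int(int(row["delegitimization"]) == 1)
    n, k = sum(total.values()), sum(positive.values())
    return {
        "total": n,
        "pdd": k,
        "pdd_rate": k / n if n else 0.0,
        # (sentences, PDD-positive sentences) per source
        "per_source": {s: (total[s], positive[s]) for s in total},
    }
```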
Feature | Count / Mean | Percent / SD | Notes |
---|---|---|---|
Total Sentences | 10,410 | 100% | |
Delegitimization (True) | 1,812 | 17.4% | |
PDD Subset (N=642) | | | |
Incivility | 157 | 25.0% | Contains mockery, slander, or profanity |
Common Good | 147 | 23.4% | Accuses target of harming society |
Outgroup | 147 | 23.4% | Frames target as an external enemy |
Target: Group | 174 | 27.7% | Target is a political or social group |
Target: Person | 271 | 43.2% | Target is an individual |
Target: Institute | 163 | 26.0% | Target is an organization (e.g., Supreme Court) |
Intensity (0-2 scale) | 1.225 (mean) | 0.638 (SD) | In the paper, the average is computed over a different subset. |
Target Spans | 471 | 54.9% | Percent of sentences with an annotated target span |
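The intensity row summarizes a nullable 0-2 column. The following is a minimal sketch for recomputing a mean and SD from the files. It assumes missing values are stored as an empty string or a literal such as `NULL` (the README does not say how NULL is encoded). It also uses the sample standard deviation (`statistics.stdev`); switch to `statistics.pstdev` if a population SD is intended. `intensity_stats` is a hypothetical helper name. As the table notes, results may not match the paper's figure exactly, because the paper averages over a different subset.

```python
import statistics

# Assumption: missing labels appear as an empty string or a NULL-like literal.
MISSING = {"", "NULL", "null", "nan", "NaN"}


def intensity_stats(rows):
    """Mean and sample standard deviation of the non-missing intensity labels."""
    values = [
        int(float(row["intensity"]))  # float() also accepts "2.0"-style cells
        for row in rows
        if row.get("intensity") is not None
        and row["intensity"].strip() not in MISSING
    ]
    if len(values) < 2:
        raise ValueError("need at least two annotated intensity values")
    return statistics.mean(values), statistics.stdev(values)
```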
The data is provided in UTF-8 encoded CSV files. Each file contains the following columns:
Column | Type | Description |
---|---|---|
`source` | String | The source of the text. One of `Facebook`, `Knesset`, or `News`. |
`text` | String | The original sentence in Hebrew. |
`anno_text` | String | For a subset of the data, this contains the text with PDD targets surrounded by `%%%`. NULL if not annotated. |
`text_en_machine_translated` | String | Machine-translated English version of `text` for accessibility. |
`Page Name` | String | Metadata from the source (e.g., Facebook page name). |
`User Name` | String | Metadata from the source (e.g., Facebook user name). |
`Post Created Date` | String | Metadata from the source (e.g., post timestamp). |
`URL` | String | URL to the original post/document, where available. |
`delegitimization` | Integer | The primary label. 1 if the sentence contains PDD, 0 otherwise. |
`intensity` | Integer | The strength of delegitimization on a 3-point scale (0 = weak, 1 = moderate, 2 = strong). NULL if `delegitimization` is 0 or the sentence is not in the richly annotated subset. |
`incivility` | Integer | 1 if the PDD includes mockery, swearing, or insults. NULL otherwise. |
`group` | Integer | 1 if the PDD target is a social or political group. NULL otherwise. |
`person` | Integer | 1 if the PDD target is an individual. NULL otherwise. |
`outgroup` | Integer | 1 if the PDD casts the target as an external "enemy". NULL otherwise. |
`common_good` | Integer | 1 if the PDD invokes a threat to society or the state. NULL otherwise. |
`institute` | Integer | 1 if the PDD target is an institution or organization. NULL otherwise. |
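Two columns need light parsing before use: the nullable integer labels and the `%%%` target markers in `anno_text`. A sketch of both follows. It makes two assumptions that the README does not confirm. First, NULL cells are stored as an empty string or a NULL-like literal. Second, each target is wrapped as `%%%target%%%`, with one opening and one closing marker. `parse_label` and `extract_targets` are hypothetical helper names.

```python
import re

# Assumption: NULL cells are stored as empty strings or a NULL-like literal;
# adjust MISSING if your copy of the files encodes them differently.
MISSING = {"", "NULL", "null", "nan", "NaN"}

# Assumption: each target is wrapped as %%%target%%%. The non-greedy group
# stops at the nearest closing marker, so several targets per sentence work.
TARGET_RE = re.compile(r"%%%(.+?)%%%", re.DOTALL)


def parse_label(value):
    """Convert a nullable integer cell ("1", "0", "2.0", "", "NULL") to int or None."""
    if value is None or value.strip() in MISSING:
        return None
    return int(float(value))


def extract_targets(anno_text):
    """Return the PDD target spans marked with %%% in anno_text (empty if NULL)."""
    if anno_text is None or anno_text.strip() in MISSING:
        return []
    return [span.strip() for span in TARGET_RE.findall(anno_text)]
```

Returning `None` rather than 0 for missing labels keeps "not annotated" distinct from "annotated as absent". That distinction matters because the fine-grained columns are only filled for the richly annotated subset.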
Political Delegitimization Discourse (PDD) is defined as discourse aimed at undermining the legitimacy of political entities (actors, groups, institutions) by attacking their symbolic aspects, rather than criticizing specific policies. PDD seeks to frame opponents as unworthy of normative inclusion in the political arena.
The annotation scheme consists of two main stages:
- PDD Identification: A binary label (`delegitimization`) indicating whether a sentence contains PDD. A sentence is marked as PDD if it targets a political entity with hostile characterizations such as the following (critiques of policy are explicitly excluded):
  - Expressions of disgust, ridicule, or hatred.
  - Claims that the target poses a threat to the state or society.
  - Denial of the target's right to political participation.
  - Association with stigmatized groups (e.g., Nazis, terrorists).
- PDD Characterization: For sentences identified as PDD, finer-grained labels were annotated to describe their attributes (`intensity`, `incivility`, etc.) and the type of target (`group`, `person`, `institute`).
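The second stage applies only to PDD-positive sentences, so one simple consistency check is to flag any non-PDD row that nonetheless carries a fine-grained label. The sketch below assumes that NULL cells are empty strings or a NULL-like literal. It also assumes that absent labels on non-PDD rows are stored as NULL rather than 0. If your copy uses 0, the check will flag those rows and should be adapted. `inconsistent_rows` is a hypothetical helper name.

```python
# Characterization columns from the column table; per the annotation scheme
# they are only annotated for sentences with delegitimization == 1.
FINE_GRAINED = ("intensity", "incivility", "group", "person",
                "outgroup", "common_good", "institute")

# Assumption: how NULL is encoded in the CSV files.
MISSING = {"", "NULL", "null", "nan", "NaN"}


def is_missing(value):
    """True for absent columns (None) and NULL-like cells."""
    return value is None or value.strip() in MISSING


def inconsistent_rows(rows):
    """Yield (row_index, column) pairs where a non-PDD sentence carries a
    fine-grained label, which the two-stage scheme should not produce."""
    for i, row in enumerate(rows):
        if int(float(row["delegitimization"])) == 1:
            continue  # PDD-positive rows may legitimately have any labels
        for col in FINE_GRAINED:
            if not is_missing(row.get(col)):
                yield i, col
```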
This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).