guymorlan/pdd


Political Delegitimization Discourse - Dataset

This repository contains the dataset for the paper "The Enemy from Within: A Study of Political Delegitimization Discourse in Israeli Political Speech", accepted at EMNLP 2025.

Dataset Description

The dataset contains 10,410 manually annotated Hebrew sentences drawn from three distinct sources, capturing a wide range of political speech. The data is split into train.csv, val.csv, and test.csv.
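
A minimal loading sketch, assuming the three files sit in one directory and each has a header row matching the column names documented in this README. It uses only the Python standard library; `read_rows` and `load_splits` are illustrative helper names, not part of the dataset.

```python
import csv
from pathlib import Path

SPLITS = ("train", "val", "test")  # file stems named in this README


def read_rows(path):
    """Read one UTF-8 CSV split into a list of dicts keyed by column name."""
    # newline="" lets the csv module handle line breaks inside quoted text.
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def load_splits(data_dir="."):
    """Return {split_name: rows} for every split file found in data_dir."""
    data_dir = Path(data_dir)
    return {
        name: read_rows(data_dir / f"{name}.csv")
        for name in SPLITS
        if (data_dir / f"{name}.csv").exists()
    }
```

All values come back as strings, so numeric labels need converting before use.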

Data Sources

The corpus is compiled from the following sources:

  • Facebook: 6,690 sentences from public posts by Israeli politicians (Members of Parliament, candidates, and party accounts) between December 2018 and April 2021.
  • Knesset: 2,504 sentences from official transcripts of speeches in the Israeli Parliament (the Knesset) between 1993 and 2023.
  • News Media: 1,216 sentences from leading Hebrew-language news outlets between 2018 and 2021.

Key Statistics

  • Total Sentences: 10,410
  • Sentences with PDD: 1,812 (17.4%)
  • Sentences with Rich Annotations: A subset of 642 PDD-positive sentences was further annotated for intensity, rhetorical strategies, and target types.
  • Inter-Coder Reliability: The primary delegitimization label was annotated by three coders, who reached substantial agreement (average Cohen's kappa of 0.82).
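
For readers unfamiliar with the statistic: Cohen's kappa compares the observed agreement p_o between two coders with the agreement p_e expected by chance, kappa = (p_o - p_e) / (1 - p_e). The sketch below averages kappa over all coder pairs, which is one common way to summarize agreement among three coders. Whether the paper averaged in exactly this way is an assumption, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of each coder's marginal label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is formally
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(coders):
    """Average Cohen's kappa over every pair of coders."""
    pairs = list(combinations(coders, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```

With toy labels `[1, 1, 0, 0]` and `[1, 0, 0, 0]`, observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5.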
| Feature | Count / Mean | Percent / SD | Notes |
|---|---|---|---|
| Total Sentences | 10,410 | 100% | |
| Delegitimization (True) | 1,812 | 17.4% | |
| PDD Subset (N=642) | | | |
| Incivility | 157 | 25.0% | Contains mockery, slander, or profanity |
| Common Good | 147 | 23.4% | Accuses target of harming society |
| Outgroup | 147 | 23.4% | Frames target as an external enemy |
| Target: Group | 174 | 27.7% | Target is a political or social group |
| Target: Person | 271 | 43.2% | Target is an individual |
| Target: Institute | 163 | 26.0% | Target is an organization (e.g., Supreme Court) |
| Intensity (0-2 scale) | 1.225 (avg) | 0.638 (std) | Note: Average calculated on a different subset in paper. |
| Target Spans | 471 | 54.9% | Percent of sentences with an annotated target span |
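
The headline counts can be re-derived from the rows themselves. A minimal sketch, assuming each row is a dict with `source` and `delegitimization` keys as produced by `csv.DictReader`, so that label values arrive as strings such as "1" or "1.0". `pdd_rate_by_source` is a hypothetical helper name.

```python
from collections import defaultdict


def pdd_rate_by_source(rows):
    """Return {source: (n_sentences, n_pdd, percent_pdd)} plus an "ALL" entry."""
    totals = defaultdict(lambda: [0, 0])  # source -> [sentences, PDD-positive]
    for row in rows:
        is_pdd = float(row["delegitimization"]) == 1
        for key in (row["source"], "ALL"):
            totals[key][0] += 1
            totals[key][1] += is_pdd  # True counts as 1, False as 0
    return {
        key: (n, pos, round(100 * pos / n, 1))
        for key, (n, pos) in totals.items()
    }
```

Applied to the three splits combined, the "ALL" entry should match the totals in the table above (10,410 sentences, 1,812 with PDD) if the files agree with this README.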

Data Fields

The data is provided in UTF-8 encoded CSV files. Each file contains the following columns:

| Column | Type | Description |
|---|---|---|
| source | String | The source of the text. One of Facebook, Knesset, or News. |
| text | String | The original sentence in Hebrew. |
| anno_text | String | For a subset of the data, this contains the text with PDD targets surrounded by %%%. NULL if not annotated. |
| text_en_machine_translated | String | Machine-translated English version of text for accessibility. |
| Page Name | String | Metadata from the source (e.g., Facebook page name). |
| User Name | String | Metadata from the source (e.g., Facebook user name). |
| Post Created Date | String | Metadata from the source (e.g., post timestamp). |
| URL | String | URL to the original post/document, where available. |
| delegitimization | Integer | The primary label. 1 if the sentence contains PDD, 0 otherwise. |
| intensity | Integer | The strength of delegitimization on a 3-point scale (0=weak, 1=moderate, 2=strong). NULL if delegitimization is 0 or not in the richly annotated subset. |
| incivility | Integer | 1 if the PDD includes mockery, swearing, or insults. NULL otherwise. |
| group | Integer | 1 if the PDD target is a social or political group. NULL otherwise. |
| person | Integer | 1 if the PDD target is an individual. NULL otherwise. |
| outgroup | Integer | 1 if the PDD casts the target as an external "enemy". NULL otherwise. |
| common_good | Integer | 1 if the PDD invokes a threat to society or the state. NULL otherwise. |
| institute | Integer | 1 if the PDD target is an institution or organization. NULL otherwise. |
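
A short sketch of how the `%%%` markers in `anno_text` might be turned into target spans, and how the optional label columns might be parsed. It assumes markers come in opening and closing pairs, and that missing values appear as an empty string or a literal NULL, NA, or nan token. Neither detail is confirmed in this README, so check both against the files. `extract_targets`, `strip_markers`, and `parse_optional_int` are hypothetical helper names.

```python
import re

# Assumption: each target is wrapped as %%%target%%%.
TARGET_RE = re.compile(r"%%%(.+?)%%%", re.DOTALL)
# Assumption: possible spellings of a missing value in the CSV files.
MISSING = {"", "null", "na", "nan", "none"}


def extract_targets(anno_text):
    """Return the target strings wrapped in %%%...%%% (empty list if none)."""
    if anno_text is None or anno_text.strip().lower() in MISSING:
        return []
    return [match.strip() for match in TARGET_RE.findall(anno_text)]


def strip_markers(anno_text):
    """Remove the %%% markers, leaving the plain sentence."""
    return anno_text.replace("%%%", "")


def parse_optional_int(value):
    """Convert a label cell to int, or None when the cell is missing."""
    if value is None or value.strip().lower() in MISSING:
        return None
    return int(float(value))  # tolerates "1" as well as "1.0"
```

For example, `extract_targets("The %%%Supreme Court%%% betrayed us")` yields a one-element list containing "Supreme Court". The English sentence is a toy input, not a dataset row.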

Annotation Scheme

Political Delegitimization Discourse (PDD) is defined as discourse aimed at undermining the legitimacy of political entities (actors, groups, institutions) by attacking their symbolic aspects, rather than criticizing specific policies. PDD seeks to frame opponents as unworthy of normative inclusion in the political arena.

The annotation scheme consists of two main stages:

  1. PDD Identification: A binary classification (delegitimization) to determine if a sentence contains PDD. A sentence is marked as PDD if it targets a political entity with hostile characterizations such as:

    • Expressions of disgust, ridicule, or hatred.
    • Claims that the target poses a threat to the state or society.
    • Denial of the target's right to political participation.
    • Association with stigmatized groups (e.g., Nazis, terrorists).
    Critiques of policy are explicitly excluded and are not labeled as PDD.
  2. PDD Characterization: For sentences identified as PDD, a set of finer-grained labels was annotated to describe their attributes (intensity, incivility, etc.) and the type of target.
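
The two stages map onto the columns directly: stage 1 is the `delegitimization` label, and stage 2 labels exist only for PDD-positive rows in the richly annotated subset. A minimal sketch for selecting those rows, assuming rows are dicts of strings and that a non-missing `intensity` marks membership in the rich subset. This is inferred from the column descriptions and is not a documented rule; the helper names and the set of missing-value spellings are likewise assumptions.

```python
# Assumption: possible spellings of a missing value in the CSV files.
MISSING = {"", "null", "na", "nan", "none"}
STAGE2_FLAGS = ("incivility", "group", "person", "outgroup", "common_good", "institute")


def is_missing(value):
    """True when a cell holds no label."""
    return value is None or value.strip().lower() in MISSING


def rich_subset(rows):
    """Keep PDD-positive rows that also carry a stage-2 intensity label."""
    return [
        row for row in rows
        if not is_missing(row.get("delegitimization"))
        and float(row["delegitimization"]) == 1
        and not is_missing(row.get("intensity"))
    ]


def active_flags(row):
    """Names of the stage-2 binary labels set to 1 for this row."""
    return [
        name for name in STAGE2_FLAGS
        if not is_missing(row.get(name)) and float(row[name]) == 1
    ]
```

If the files match the statistics reported in this README, `rich_subset` over all three splits should return 642 rows.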

License

This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
