Telematika
Vol 18, No 2: August (2025)

Automatic Analysis of Natural Disaster Messages on Social Media Using IndoBERT and Multilingual BERT

Safitri, Yasmin Dwi (Lambung Mangkurat University)
Faisal, Mohammad Reza (Lambung Mangkurat University)
Kartini, Dwi (Lambung Mangkurat University)
Saragih, Triando Hamonangan (Lambung Mangkurat University)
Abadi, Friska (Lambung Mangkurat University)
Bachtiar, Adam Mukharil (Japan Advanced Institute of Science and Technology)



Article Info

Publish Date
31 Aug 2025

Abstract

Information about natural disasters shared on social media can be an important data source for mitigation and early warning systems. Platforms such as X (formerly known as Twitter) have become primary channels for real-time information, especially during disaster emergencies. Because large volumes of unstructured disaster-related text must be processed, the main challenge is to accurately filter and classify messages into three categories: eyewitness, non-eyewitness, and don’t know. This research compares the performance of four BERT-based natural language processing models, namely IndoBERT, IndoBERT with Masked Language Modeling (MLM), Multilingual BERT, and Multilingual BERT with MLM, in classifying Indonesian-language disaster messages. The dataset was obtained from previous research and publicly available data on GitHub and consists of annotated messages related to floods, earthquakes, and forest fires. The models were evaluated with the hold-out technique: the data were split 80:20 into training and testing sets, and the training set was split again 80:20 into training and validation subsets, with stratification to maintain class proportions in each subset. In addition, several batch sizes were tested to evaluate their effect on the stability of model performance. The IndoBERT model achieved the highest accuracy on the flood and earthquake datasets, 80.67% and 81.50%, respectively, while IndoBERT with MLM pre-training achieved the highest accuracy on the forest fire dataset, 88.33%. Overall, IndoBERT showed the most consistent and strongest performance across datasets compared with the other models.
These findings indicate that IndoBERT has strong capabilities in understanding Indonesian disaster-related text, and the results can serve as a foundation for developing automatic classification systems that support real-time disaster monitoring and early warning applications.
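
The two-stage stratified hold-out split described in the abstract can be sketched as follows. This is only an illustration using the Python standard library: the abstract does not say which tooling the authors used, and the function name `stratified_split`, the seed, and the toy label list (100 messages per class) are assumptions made for the example, not details of the actual dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_ratio=0.2, seed=42):
    """Split indices so each class contributes about holdout_ratio to the hold-out part.

    Hypothetical helper for illustration; returns (kept_indices, holdout_indices).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept, holdout = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_holdout = round(len(members) * holdout_ratio)
        holdout.extend(members[:n_holdout])
        kept.extend(members[n_holdout:])
    return sorted(kept), sorted(holdout)


# Toy labels (assumed): 100 messages for each of the three categories.
labels = ["eyewitness"] * 100 + ["non-eyewitness"] * 100 + ["don't know"] * 100

# Stage 1: 80:20 split into (training + validation) and test.
train_val_idx, test_idx = stratified_split(labels, 0.2)

# Stage 2: the same 80:20 ratio applied to the training portion only.
train_val_labels = [labels[i] for i in train_val_idx]
inner_train, inner_val = stratified_split(train_val_labels, 0.2)
train_idx = [train_val_idx[i] for i in inner_train]
val_idx = [train_val_idx[i] for i in inner_val]

print(len(train_idx), len(val_idx), len(test_idx))  # 192 48 60
```

With these toy labels, every class keeps the same share (one third) in the training, validation, and test subsets, which is the property stratification is meant to preserve.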

Copyrights © 2025






Journal Info

Abbrev

TELEMATIKA

Publisher

Subject

Education

Description

Jl. Letjend Pol. Soemarto No.126, Watumas, Purwanegara, Kec. Purwokerto Utara, Kabupaten Banyumas, Jawa Tengah ...