LSTM-Based NLP Approach for Spelling Error Detection and Correction in Scientific Writing Indonesian Language

Yeru Dwi Pratama Halim; Ida Nurhaida

doi:10.33122/ejeset.v5i1.309


Article (Open Access)

Yeru Dwi Pratama Halim (1)
Universitas Pembangunan Jaya, South Tangerang, 15413, Indonesia

Ida Nurhaida (2)*
Universitas Pembangunan Jaya, South Tangerang, 15413, Indonesia

*Corresponding author

Abstract

Scientific writing requires precision and clarity to maintain credibility and communicate effectively. Spelling mistakes and typos can compromise the quality and reliability of scientific texts. This study proposes a Long Short-Term Memory (LSTM)-based approach to detecting and correcting spelling errors in order to improve text accuracy and readability. The dataset comprises 45,698 standard words, supplemented with typo variations to improve model performance. The words are sourced from the Indonesian dictionary (Kamus Besar Bahasa Indonesia, KBBI) and undergo normalization and preprocessing so that the data captures diverse error patterns. The model is evaluated using a confusion matrix and reaches 93% accuracy with high precision, recall, and F1-score. These results indicate that the proposed NLP-based LSTM model is an effective and reliable way to identify and correct spelling errors. The approach can improve the quality of scientific writing and support clearer, more credible communication.
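The two data-side steps named in the abstract, adding typo variations to dictionary words and scoring predictions with a confusion matrix, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `typo_variants` and `confusion_metrics`, the choice of edit operations (deletion, duplication, adjacent swap), and the binary labeling (1 = misspelled, 0 = correct) are all assumptions, since this page does not describe how the typos were generated or how labels were assigned.

```python
from typing import Dict, List, Set


def typo_variants(word: str) -> Set[str]:
    """Return single-edit typo variants of a word (hypothetical helper).

    Assumed edit operations: character deletion, character duplication,
    and swapping two adjacent characters.
    """
    variants: Set[str] = set()
    for i in range(len(word)):
        # Deletion: drop the character at position i.
        variants.add(word[:i] + word[i + 1:])
        # Duplication: repeat the character at position i.
        variants.add(word[:i] + word[i] + word[i:])
        # Transposition: swap the characters at positions i and i + 1.
        if i + 1 < len(word):
            variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    # Swapping two identical letters reproduces the original word.
    variants.discard(word)
    # Deleting the only letter of a one-letter word leaves nothing.
    variants.discard("")
    return variants


def confusion_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute binary confusion-matrix metrics (assumed: 1 = misspelled)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # "kata" (Indonesian for "word") is used only as an illustrative input.
    print(sorted(typo_variants("kata")))
    print(confusion_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a setup like this, each dictionary word would be paired with its generated variants to form training examples, and the metrics function would be applied to held-out predictions. The 93% accuracy reported in the abstract comes from the authors' own evaluation, not from this sketch.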

Keywords

Scientific Writing; Natural Language Processing; Text Correction System; LSTM

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Electronic Journal of Education, Social Economics and Technology (eJESET)
ISSN 2723-6250 (Online)
Published by SAINTIS Publishing