Article Open Access

LSTM-Based NLP Approach for Spelling Error Detection and Correction in Scientific Writing Indonesian Language

(1) Yeru Dwi Pratama Halim (Universitas Pembangunan Jaya, South Tangerang, 15413, Indonesia)
(2) * Ida Nurhaida (Universitas Pembangunan Jaya, South Tangerang, 15413, Indonesia)
*Corresponding author

Abstract


Scientific writing requires precision and clarity to uphold credibility and communicate effectively. Spelling mistakes and typos can compromise the quality and reliability of scientific texts. This study proposes a Long Short-Term Memory (LSTM)-based approach to detecting and correcting spelling errors, with the aim of improving text accuracy and readability. The dataset comprises 45,698 standard words sourced from the Indonesian Dictionary (KBBI), supplemented with typo variations to improve model performance. The data undergoes normalization and preprocessing so that diverse error patterns are captured. Model performance is evaluated using a confusion matrix; the model achieves 93% accuracy together with high precision, recall, and F1-score. These results indicate that the proposed LSTM-based NLP model is an effective and reliable solution for identifying and correcting spelling errors, which in turn supports clearer and more credible scientific writing.
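The abstract names two steps that a short sketch can make concrete: supplementing dictionary words with typo variations, and deriving accuracy, precision, recall, and F1-score from a confusion matrix. The Python below is only an illustration under stated assumptions. The function names make_typos and confusion_metrics, the four single-character edit operations, and the counts in the usage example are hypothetical and are not taken from the paper, whose actual augmentation procedure and confusion-matrix values are not given here.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def make_typos(word, rng, n=3):
    """Return up to n distinct single-edit misspellings of `word`.

    Assumption: typos are modelled as one random deletion, insertion,
    substitution, or adjacent transposition. The paper may use a
    different scheme.
    """
    if not word:
        return []
    typos = set()
    attempts = 0
    while len(typos) < n and attempts < 100:
        attempts += 1
        i = rng.randrange(len(word))
        op = rng.choice(["delete", "insert", "substitute", "transpose"])
        if op == "delete" and len(word) > 1:
            cand = word[:i] + word[i + 1:]
        elif op == "insert":
            cand = word[:i] + rng.choice(ALPHABET) + word[i:]
        elif op == "substitute":
            cand = word[:i] + rng.choice(ALPHABET) + word[i + 1:]
        elif op == "transpose" and i < len(word) - 1:
            cand = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        else:
            continue
        if cand != word:  # skip edits that reproduce the original word
            typos.add(cand)
    return sorted(typos)


def confusion_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from binary counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(make_typos("penelitian", rng))
    # Illustrative counts only, not results from the study.
    print(confusion_metrics(tp=45, fp=5, fn=5, tn=45))
```

Each (typo, standard word) pair produced this way could serve as a training example for a sequence model such as an LSTM, and the same four metrics can be computed per class when the confusion matrix has more than two classes.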

Keywords


Scientific Writing; Natural Language Processing; Text Correction System; LSTM

   

DOI

https://doi.org/10.33122/ejeset.v5i1.309
      

Copyright (c) 2024 Yeru Dwi Pratama Halim, Ida Nurhaida

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.