TY - JOUR
T1 - Revolutionizing Historical Document Digitization
T2 - LSTM-Enhanced OCR for Arabic Handwritten Manuscripts
AU - Faizullah, Safiullah
AU - Ayub, Muhammad Sohaib
AU - Alghamdi, Turki
AU - Ali, Toqeer Syed
AU - Khan, Muhammad Asad
AU - Nabil, Emad
N1 - Publisher Copyright:
© (2024), (Science and Information Organization). All rights reserved.
PY - 2024
Y1 - 2024
N2 - Optical Character Recognition (OCR) holds immense practical value in the realm of handwritten document analysis, given its widespread use in various human transactions. This scientific process enables the conversion of diverse documents or images into analyzable, editable, and searchable data. In this paper, we present a novel approach that combines transfer learning and Arabic OCR technology to digitize ancient handwritten scripts. Our method aims to preserve and enhance accessibility to extensive collections of historically significant materials, including fragile manuscripts and rare books. Through a comprehensive examination of the challenges encountered in digitizing Arabic handwritten texts, we propose a transfer learning-based framework that leverages pre-trained models to overcome the scarcity of labeled data for training OCR systems. The experimental results demonstrate a remarkable improvement in the recognition accuracy of Arabic handwritten texts, thereby offering a highly promising solution for the digitization of historical documents. Our work enables the digitization of large collections of ancient historical materials, including manuscripts and rare books characterized by delicate physical conditions. The proposed approach represents a significant step toward preserving our cultural heritage and facilitating advanced research in historical document analysis.
AB - Optical Character Recognition (OCR) holds immense practical value in the realm of handwritten document analysis, given its widespread use in various human transactions. This scientific process enables the conversion of diverse documents or images into analyzable, editable, and searchable data. In this paper, we present a novel approach that combines transfer learning and Arabic OCR technology to digitize ancient handwritten scripts. Our method aims to preserve and enhance accessibility to extensive collections of historically significant materials, including fragile manuscripts and rare books. Through a comprehensive examination of the challenges encountered in digitizing Arabic handwritten texts, we propose a transfer learning-based framework that leverages pre-trained models to overcome the scarcity of labeled data for training OCR systems. The experimental results demonstrate a remarkable improvement in the recognition accuracy of Arabic handwritten texts, thereby offering a highly promising solution for the digitization of historical documents. Our work enables the digitization of large collections of ancient historical materials, including manuscripts and rare books characterized by delicate physical conditions. The proposed approach represents a significant step toward preserving our cultural heritage and facilitating advanced research in historical document analysis.
KW - Arabic OCR
KW - classification
KW - convolutional neural network
KW - image processing
KW - Optical character recognition
KW - transfer learning
UR - https://www.scopus.com/pages/publications/86000671516
U2 - 10.14569/IJACSA.2024.01510120
DO - 10.14569/IJACSA.2024.01510120
M3 - Article
AN - SCOPUS:86000671516
SN - 2158-107X
VL - 15
SP - 1185
EP - 1194
JO - International Journal of Advanced Computer Science and Applications
JF - International Journal of Advanced Computer Science and Applications
IS - 10
ER -