Min-Cost Flow Network to Detect Text Line on Certificate


Author

Dr. Ednawati Rainarli, S.Si., M.Si.

Abstract

This study aims to use a Min-Cost Flow network to obtain text lines from the characters detected on a certificate. Determining text lines is part of the text detection process that precedes word recognition, and detecting the text that appears on a certificate is, in turn, part of extracting its information automatically. The diversity of colours, font types and sizes, and the complexity of certificate backgrounds make text detection on scanned certificates more complex than other optical character detection tasks. This study used the Tesseract tool to obtain character candidates and added a step that merges the characters into text lines using the Min-Cost Flow method. To improve image quality, we applied smoothing based on integral images. After thresholding and segmentation, the result becomes the input to Tesseract, which detects the character candidates. The tests vary the confidence score and the thresholds on the vertical length of two components, their horizontal width, and the font-size difference. The best results are a confidence score of at least 50, a horizontal width of less than 2, a vertical length of less than 0.2, and a difference in letter size of less than 2, which give an F-score of 62%. Even though the F-score is below 70%, we found that the Min-Cost Flow method can single out objects other than text in the Tesseract output, such as signatures and logos. Using Min-Cost Flow, we can combine character candidates into one text line for later processing in text recognition.
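The sketch below illustrates the two stages the abstract describes: Tesseract proposes box candidates with confidence scores, and a min-cost flow over a successor graph merges candidates that lie on the same line. It is a minimal illustration, not the authors' formulation: the integral-image smoothing, thresholding, and segmentation steps are omitted, the pairing cost and the normalisation of the gap, offset, and size thresholds are assumptions, and the file name certificate.png is hypothetical. It assumes the pytesseract and networkx packages.

# A minimal sketch of the pipeline in the abstract, not the authors' exact method:
# pytesseract supplies candidate boxes with confidence scores, and a min-cost flow
# over a successor graph links candidates into text lines. The cost function and
# the normalisation of the thresholds are assumptions.
import networkx as nx
import pytesseract
from PIL import Image
from pytesseract import Output

MIN_CONF = 50.0       # keep candidates with Tesseract confidence >= 50
MAX_HGAP = 2.0        # horizontal gap / mean height (assumed normalisation)
MAX_VOFF = 0.2        # vertical centre offset / mean height (assumed normalisation)
MAX_SIZE_RATIO = 2.0  # allowed height (font-size) ratio between neighbours


def candidates(image_path):
    """Run Tesseract and keep boxes whose confidence passes MIN_CONF."""
    data = pytesseract.image_to_data(Image.open(image_path), output_type=Output.DICT)
    boxes = []
    for i, conf in enumerate(data["conf"]):
        if float(conf) >= MIN_CONF and data["text"][i].strip():
            boxes.append({"x": data["left"][i], "y": data["top"][i],
                          "w": data["width"][i], "h": data["height"][i],
                          "text": data["text"][i]})
    return boxes


def link_into_lines(boxes):
    """Pair each candidate with at most one right-hand successor via min-cost flow,
    then read off the resulting chains as text lines."""
    if not boxes:
        return []
    G = nx.DiGraph()
    for i, a in enumerate(boxes):
        G.add_edge("s", ("out", i), capacity=1, weight=0)
        G.add_edge(("in", i), "t", capacity=1, weight=0)
        for j, b in enumerate(boxes):
            if i == j:
                continue
            h = (a["h"] + b["h"]) / 2.0
            hgap = (b["x"] - (a["x"] + a["w"])) / h        # normalised horizontal gap
            voff = abs((a["y"] + a["h"] / 2.0) -
                       (b["y"] + b["h"] / 2.0)) / h        # normalised vertical offset
            ratio = max(a["h"], b["h"]) / max(1.0, min(a["h"], b["h"]))
            if 0 <= hgap < MAX_HGAP and voff < MAX_VOFF and ratio < MAX_SIZE_RATIO:
                cost = int(100 * (hgap + voff + ratio - 1))  # cheaper = better pairing
                G.add_edge(("out", i), ("in", j), capacity=1, weight=cost)
    flow = nx.max_flow_min_cost(G, "s", "t")
    # Each saturated ("out", i) -> ("in", j) edge means "j follows i on the same line".
    succ = {i: node[1]
            for i in range(len(boxes))
            for node, f in flow.get(("out", i), {}).items() if f == 1}
    starts = set(range(len(boxes))) - set(succ.values())
    lines = []
    for i in sorted(starts, key=lambda k: (boxes[k]["y"], boxes[k]["x"])):
        chain = [i]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        lines.append(" ".join(boxes[k]["text"] for k in chain))
    return lines


if __name__ == "__main__":
    for line in link_into_lines(candidates("certificate.png")):
        print(line)

Because every "out"/"in" node has unit capacity, the max-flow-min-cost solution assigns each candidate at most one predecessor and one successor, so the selected links decompose into left-to-right chains that can be read off as text lines.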

Journal Publication Details

Parent Research: -
Publication Type: Reputable International Journal
Journal: Journal of Engineering Science and Technology
Volume: 16
Issue: 5
Year: 2021
Pages: 3726-3736
P-ISSN: -
E-ISSN: 1823-4690
Publisher: School of Engineering, Taylor's University
Publication Date: -
URL: https://jestec.taylors.edu.my/Vol%2016%20Issue%205%20October%202021/16_5_08.pdf
DOI: -