Fan Jing Meng, Ying Huang, et al.
ICEBE 2007
The Bible was introduced as a dataset for evaluating multilingual optical character recognition (OCR) techniques for language technology research. Noise-free and degraded document images were generated for complete Bibles in seven languages, and 15 OCR systems were evaluated. The degraded set includes, for example, a synthetically degraded image of a page from a Spanish Bible at 300 dpi resolution. Arabic OCR systems were observed to perform more poorly in general than the English and Spanish systems; Arabic text has a connected script, and the shape of each symbol changes depending on the preceding and following symbols.
Bowen Zhou, Bing Xiang, et al.
SSST 2008
Rajeev Gupta, Shourya Roy, et al.
ICAC 2006
Lixi Zhou, Jiaqing Chen, et al.
VLDB