Optical Character Recognition on Scanned Maps for Information Extraction and Automated Georeference
Item status: Restricted Access
In recent years, many libraries around the world have initiated projects to digitize their map collections completely, so as to enable easy access to information. For paper documents, optical character recognition (OCR) techniques are widely used for text digitization. Although maps contain a wealth of alphanumeric information, the recent literature offers very few examples of OCR applied to maps. These examples focus mainly on the extraction of textual information, whereas the extraction of numerical information has not been examined separately. The purpose of this research is to evaluate how an OCR algorithm can be used to extract numerical information from maps, in the form of bathymetric points and house numbers, which can ultimately lead to the automatic generation of geographic information layers. Additionally, the possibility of an automated georeferencing method is examined, based on the recognition of coordinates on the map borders and their assignment to the corresponding grid intersections. In this paper, the difficulties of developing an OCR algorithm specialized for maps are identified and the methodology is explained. The resulting algorithm achieves a very good recognition rate; however, its accuracy is highly dependent on the characteristics of each map, and customization is necessary before it can be applied to a different map series. The same applies to the georeferencing process: based on an implementation of the proposed method, the points at which user intervention is necessary are identified. Further improvements that could enhance the performance of the OCR algorithm and of the proposed georeferencing method are also discussed.
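To illustrate the georeferencing idea described above, the following is a minimal sketch of how recognized border coordinates, once assigned to grid intersections, could yield a pixel-to-map transformation. It is not the method implemented in the paper: it assumes that an affine transformation fitted by least squares is adequate for the scan, and the function names (`fit_affine`, `pixel_to_geo`) and all control-point values are hypothetical.

```python
# Sketch only: affine georeferencing from grid intersections.
# Assumption: each control point pairs a pixel position (col, row) of a grid
# intersection with the (x, y) map coordinates read from the map border.

def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m * x = v with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("need at least 3 non-collinear control points")
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]  # replace column k with the right-hand side
        solution.append(_det3(mk) / d)
    return solution


def fit_affine(pixel_pts, geo_pts):
    """Least-squares fit of x = a*col + b*row + c and y = d*col + e*row + f."""
    rows = [(float(c), float(r), 1.0) for c, r in pixel_pts]
    # Normal equations: (A^T A) p = A^T g, solved once per output axis.
    ata = [[sum(a[i] * a[j] for a in rows) for j in range(3)] for i in range(3)]
    params = []
    for axis in range(2):
        atg = [sum(a[i] * g[axis] for a, g in zip(rows, geo_pts)) for i in range(3)]
        params.append(_solve3(ata, atg))
    return params  # [[a, b, c], [d, e, f]]


def pixel_to_geo(params, col, row):
    """Apply the fitted transformation to one pixel position."""
    (a, b, c), (d, e, f) = params
    return (a * col + b * row + c, d * col + e * row + f)


# Hypothetical control points: four grid intersections of a north-up scan.
pixels = [(100, 100), (1100, 100), (100, 900), (1100, 900)]
coords = [(23.0, 38.0), (23.5, 38.0), (23.0, 37.6), (23.5, 37.6)]
params = fit_affine(pixels, coords)
x, y = pixel_to_geo(params, 600, 500)
```

With more than three control points the fit is overdetermined, so the residuals at the control points could serve as one indication of where a misrecognized border coordinate requires user intervention.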