Towards the Development of a Web-based Alignment Platform
Item status: Restricted Access
In this work, a platform is developed that makes existing sentence and word alignment tools available as web services. The tools integrated are Hunalign and GIZA++; after wrappers and format converters have been created for them, they are embedded into a pipeline that produces a collection of word-aligned sentences from a parallel corpus provided by the user. The individual components of the platform are independent of each other and can therefore be used in any order and by any web service client. The components are implemented in a generalisable way, so that additional modules from other sub-fields of natural language processing, or from entirely different fields, can be developed using the methods shown in this work. After an overview of state-of-the-art alignment techniques, the development of the platform is demonstrated and evaluated, and examples of how to use it with different clients are presented. The alignment platform is implemented as a Java servlet and is therefore, by design, usable on any operating system. It is generated using the Soaplab software suite, whose tool acd2xml automatically creates web service descriptions from a definition written in the ACD format, which was designed by the EMBOSS project (European Molecular Biology Open Software Suite).
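To illustrate the kind of format converter that sits between the sentence aligner and the word aligner, here is a minimal Python sketch. It is not the converter developed in this work. It assumes that the sentence aligner emits tab-separated lines of the form source, target, score, with merged sentences joined by " ~~~ ", and that the word aligner expects two parallel lists with one sentence per entry. The function name `split_aligned_lines` and the `min_score` parameter are hypothetical.

```python
def split_aligned_lines(lines, min_score=None):
    """Split tab-separated aligned lines into parallel source/target lists.

    Assumed input format (an assumption, not taken from the source):
        source<TAB>target<TAB>score
    where merged sentences inside one field are joined by " ~~~ ".
    """
    sources, targets = [], []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # malformed line: skip it
        source, target = parts[0].strip(), parts[1].strip()
        if not source or not target:
            continue  # one side is empty (1-0 or 0-1 alignment): no pair to emit
        if min_score is not None and len(parts) > 2:
            try:
                if float(parts[2]) < min_score:
                    continue  # alignment confidence below the threshold
            except ValueError:
                pass  # unparsable score: keep the pair
        # Join merged sentences into a single line per side.
        sources.append(source.replace(" ~~~ ", " "))
        targets.append(target.replace(" ~~~ ", " "))
    return sources, targets


sample = [
    "Hello world .\tHallo Welt .\t0.9\n",
    "\tNur Ziel .\t0.1\n",
    "A . ~~~ B .\tA und B .\t0.5\n",
]
src, tgt = split_aligned_lines(sample)
```

The two returned lists can be written to two plain-text files with one sentence per line, which is the usual starting point for preparing word-aligner input. Keeping the converter as a pure function over lines, with no file or network access, matches the idea of independent components that any client can call in any order.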