Project documentation

Published reports

M1.2 — Corpora for the Machine Translation Engines

Cristina España-Bonet & Juliane Stiller & Sophie Henning

This document describes the corpora used to train and evaluate the baseline MT engines developed within the CLUBS project.

M1.3.1 — Evaluation plan for CLUBS project

Juliane Stiller & Vivien Petras

This document describes the evaluation studies that will be carried out over the course of the project. The studies assess how well different MT approaches support cross-lingual retrieval in the bibliographic search engine PubPsych.

M1.3.2 — CLUBS - Testing retrieval performance

Juliane Stiller & Vivien Petras

This document details the experiment that will be conducted to determine which of the five approaches yields the best retrieval performance.

This document describes the architecture options and final choices for implementing the machine translation (MT) system designed to translate article titles and abstracts. The alternatives are presented, and the two most promising architectures, Statistical MT and Neural MT, are compared.

This document describes the in-domain vocabularies available to the project and how their multilingual counterparts are built from them. It also proposes several public resources that can be used to complement and extend this data with general-domain vocabulary. Finally, it outlines how the multilingual lexicons can be applied to controlled terms and to query translation.

This document describes the data, resources, methodology and software developed to translate the controlled terms and related text available as metadata in the PubPsych database.

Software

MeSHMerger version 1.0

Roland Ramthun

A tool that creates a multilingual term list from several MeSH translations. More information is available in the MeSHMerger README file on GitHub.