Final Project Workshop

The final project workshop "Overcoming language barriers - Cross-lingual retrieval of bibliographic metadata with neural machine translation" took place on Friday, June 7th, 2019, from 11 a.m. to 5 p.m. at DFKI Saarbrücken, Saarland Informatics Campus D 3_2.

Invited keynote speakers:
Prof. Dr. Christa Womser-Hacker (Universität Hildesheim): Which language does the internet speak? (canceled)
Prof. Gareth Jones (Dublin City University): Reconsidering Domain-Specific Cross-Language Information Access in the Age of Distributional Semantics
Talks by:
Prof. Daniela Petrelli (Sheffield Hallam University): A designerly approach to interactive cross-language information retrieval
Assoc. Prof. Pavel Pecina (Charles University, Prague): Breaking the language barrier in health-related web search
Antoine Isaac, PhD (Europeana Foundation): Multilingual challenges and ongoing work to tackle them at Europeana
and the CLuBS project members.

Workshop Description

Does overcoming the language barrier contribute to the global advancement of science, or has English become the lingua franca of science?

Research has shown that results published in languages other than English are less accessible and less often referenced than results published in English. As a consequence, information published in other languages may be lost or, in practice, non-existent for individual researchers or even for the scientific community as a whole. The aim of the project Cross-Lingual Bibliographic Search (CLuBS) is to investigate strategies to address these problems in Psychology. The use case is PubPsych, an open access multilingual search engine for psychological literature, tests, treatment programs, and research data with metadata in four languages: English, French, German, and Spanish.

To make bibliographic metadata more accessible to non-native speakers, the retrieval performance of query translation and of complete record translation was empirically tested and evaluated. Two strategies proved successful despite the small number of query terms and the resulting scarcity of context information:
  1. lexicon (thesaurus) based query translation together with some simple translation rules;
  2. neural machine translation for content translation, to cover those language pairs with little in-domain parallel data.
The resulting system is a combination of human expert translation, multilingual thesaurus mappings and neural machine translation.
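As a rough illustration of the lexicon-based query translation strategy, the following Python sketch translates a short query by looking up terms in a multilingual thesaurus. It is not the CLuBS implementation: the thesaurus entries, the function name translate_query, and the two rules (prefer the longest multiword match, pass unknown tokens through unchanged) are assumptions chosen only to show the general idea.

```python
# Toy multilingual thesaurus. The entries are hypothetical and only
# serve as an illustration; a real thesaurus would be far larger.
THESAURUS = {
    ("de", "en"): {
        "angst": "anxiety",
        "kognitive verhaltenstherapie": "cognitive behavioral therapy",
        "gedächtnis": "memory",
    },
}


def translate_query(query, src, tgt, max_len=3):
    """Translate a short query via thesaurus lookup (illustrative sketch)."""
    lexicon = THESAURUS.get((src, tgt), {})
    tokens = query.lower().split()
    translated = []
    i = 0
    while i < len(tokens):
        # Assumed rule 1: try multiword terms first, longest match wins.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                translated.append(lexicon[phrase])
                i += n
                break
        else:
            # Assumed rule 2: keep unknown tokens unchanged
            # (e.g. proper names or acronyms).
            translated.append(tokens[i])
            i += 1
    return " ".join(translated)


print(translate_query("Kognitive Verhaltenstherapie Angst", "de", "en"))
# prints "cognitive behavioral therapy anxiety"
```

Because queries usually contain only a few terms, a lookup of this kind does not depend on sentence context, which is one plausible reason a lexicon-based approach suits query translation.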
The objective of this workshop is to present and discuss the CLuBS project results, with a main focus on information retrieval and machine translation, and to demonstrate the improved PubPsych search engine. Participants are invited from the fields of
  • Library and Information Science,
  • Computer Science,
  • Psychology,
  • Linguistics,
  • and other disciplines interested in multilingual systems.
Sharing experience and knowledge in this way is a contribution to overcoming the language barrier that would benefit all researchers.