Workshop Abstracts

Which language does the internet speak?

Prof. Dr. Christa Womser-Hacker (Universität Hildesheim)

The internet is commonly seen as an “international” phenomenon, since it is accessible from (almost) everywhere and connects users all around the world. In my talk, I want to investigate what “international” means in the context of the internet. I assume that language accounts for a significant share of the internet's true internationality, i.e. that internationality cannot be attributed to technology alone. The focus of my research is to analyze how multilingualism and interculturality are realized on the internet. From an information science point of view, user behavior across languages and cultural boundaries (for example in e-commerce, in information seeking, or in the use of mobile systems) is of particular interest. Cross-Language Information Retrieval (CLIR) is a research field that has produced various solutions for handling multilingualism. Another research area to examine is the international design of graphical user interfaces (GUIs), which is much more than a “pure” translation. The most important strategies here are the localization and globalization of interfaces. Requirements from user experience on the one hand and from intercultural communication on the other come into play. In my presentation, I will try to combine these two directions and illustrate them with examples from my research.

Reconsidering Domain-Specific Cross-Language Information Access in the Age of Distributional Semantics

Prof. Gareth Jones (Dublin City University)

Early research in cross-language information access around the turn of the century saw considerable interest in the development of domain-specific resources and translation methods. This research focused on ensuring coverage and reliable translation of search terms associated with specific domains, at a time when standard machine translation struggled to achieve both of these objectives. Dramatic improvements in machine translation technologies in recent years have greatly reduced these concerns, and it has become widely assumed that the best current machine translation systems address these problems sufficiently. However, to the best of my knowledge, this assumption has not been tested. Recent advances in machine translation owe much to the introduction of neural methods using distributional semantics, in particular word embeddings. In this presentation, I will consider the specific challenges of cross-language information access and how applying cross-lingual transfer of word embeddings to this task may offer benefits beyond their use in machine translation systems.

A designerly approach to interactive cross-language information retrieval

Prof. Daniela Petrelli (Sheffield Hallam University)

When a new technology is invented, it is unknown how a potential user could and would interact with it. Designing for such a new scenario means simultaneously understanding and defining what the scenario itself is: How does the user approach such a new task? What do they do? How does the system perform? What works, and what must instead be changed or improved? When designing the user interface and the interaction for a cross-language retrieval system, I first had to understand what the task really was and how potential users would approach it, and then progressively try different solutions in order to find the optimal one. Through this iterative process I was also able to reveal some limitations of the translation-retrieval mechanisms that, once addressed, radically improved the user interaction. In turn, the improved performance of the system changed the way users responded and prompted further changes to the user interface and interaction. I will use two case studies in cross-language text and image retrieval to illustrate the key stages of the design process and show how the research progressed iteratively through phases of user evaluation and redesign.

Breaking the language barrier in health-related web search

Assoc. Prof. Pavel Pecina (Charles University, Prague)

The World Wide Web has become an important source of health-related information, and the number of people searching for medical subjects online is growing. At the same time, most of the health-related content on the Web is published only in English and is therefore not accessible to non-English-speaking users, whose number is increasing as the Internet becomes more accessible in non-English-speaking countries. In our research we aim to help overcome this growing language barrier by developing methods for Cross-lingual Information Retrieval, in which the information sought is contained in documents written in a language different from that of the search query. The talk will present our recent experiments and results in Cross-lingual Information Retrieval, including methods for query translation reranking and query translation expansion, as well as experiments comparing query-translation and document-translation approaches.

Multilingual challenges and ongoing work to tackle them at Europeana

Antoine Isaac, PhD (Europeana Foundation)

Europeana, Europe's online digital library, provides access to nearly 60 million objects from over 3,700 libraries, museums and archives spread over 44 countries. We operate on top of metadata provided to us by these institutions and other partner initiatives, which mostly comes in the languages of these countries. Making the service work across such a diversity of languages, for everyone in Europe, has always been a core challenge for Europeana. In this presentation we will lay out the concrete obstacles we face, which functions they affect most, and how we have progressed in tackling them over the past ten years. In particular, we will present how implementing the vision of exploiting "semantic", contextual data gathered either from our partners or from third parties has proven to be the most realistic way to make incremental progress.