MULTISEM

Advanced Models for Multilingual Semantic Processing

Project from 01 Mar 2017

Funder: French National Research Agency (ANR)

Project code: ANR-16-CE33-0013

Funder Contribution: 255,611 EUR

Description

The MultiSem project will propose novel advanced models for multilingual semantic processing. Existing data-driven models employ robust machine learning techniques for handling vast amounts of textual data but overlook the intricacies of the mechanisms involved in language processing, which should be reflected in automatic methods. At the same time, findings in the computational semantics field fail to make their way to large-scale NLP systems, mainly due to the focus on small lexical samples, which restricts the potential of the models to scale up and be used on unrestricted text. Interaction between the disciplines has thus been limited so far, and the potential mutual benefits of their respective research remain unclear. At this moment of burgeoning interest in multilingual processing and semantics-related research, the MultiSem project proposes to bridge the gap between disciplines by combining the efficiency and robustness of state-of-the-art approaches to semantic analysis with linguistically motivated semantic representations. The main novelty of the semantic processing models proposed in MultiSem is that they will be able to adapt processing to different lexical items and text types, inspired by findings regarding the organisation of semantic information in the mental lexicon and the role of context in meaning activation. It has been shown that instead of considering all possible interpretations for words in context, human bilinguals and translators restrict their choice to specific senses. This focus is largely influenced by the parameters of the communicative context and by the domain and topic of the processed texts, while a finer-grained filtering occurs only when needed for improving text understanding. Based on these findings, the models developed in MultiSem will differentiate semantic processing according to the disambiguation needs of specific words, contexts and textual genres.
To achieve this ambitious goal, we intend to combine continuous space representations and topic models with traditional vector-space models for ambiguity resolution. The selection of the optimal representation for specific lexical items and text types will be guided by the output of an ambiguity type detection mechanism, combined with genre and domain identification techniques. These parameters have so far been left unexploited in favor of models that adopt a uniform approach (either topic-based or fine-grained) for handling different words and types of text. This is largely due to the difficulty of identifying the disambiguation needs of specific lexical items and texts, a challenge that MultiSem intends to address. The models that will be developed will be mainly data-driven and enriched with knowledge from large-scale semantic resources, which have been shown to improve the performance of machine learning methods for semantic processing. The combination of high-level ambiguity resolution techniques (topic models and neural networks) with fine-grained (vector-based) models, together with the exploitation of the knowledge available in these resources, will enhance the descriptive and processing capacities of the models. The research that will be conducted in MultiSem will renew the scientific perspectives in multilingual NLP, and also in linguistics and semantics, thanks to the knowledge that will be extracted from large volumes of data. The proposed multi-layer ambiguity resolution models will also be exploited for improving lexical selection in translation applications. Lexical errors are found to be the predominant type of error in automatically produced translations and could be avoided if Machine Translation (MT) systems were able to identify the meaning of words and larger textual units.
By improving the quality of the generated translations, MultiSem will enhance the experience of numerous users of MT systems and will have an important social impact given the current pressing demand for quality processing of large volumes of digital content.

Partners

Laboratoire d'informatique pour la mécanique et les sciences de l'ingénieur, LIMSI
