Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
37 Projects, page 1 of 8
Project (from 2013)
Partners: Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Funder: French National Research Agency (ANR)
Project Code: ANR-13-JS02-0009
Funder Contribution: 225,853 EUR

Much clinical and biomedical knowledge is contained in the text of published articles, Electronic Health Records (EHRs) or online patient forums, and is not directly accessible for automatic computation. Natural Language Processing (NLP) techniques have been successfully developed to extract information from text and convert it to machine-readable representations. The most advanced applications have focused on identifying clinically relevant entities and concepts in English text. However, many biomedical informatics tasks require going beyond the identification of isolated instances in single documents: the context of concept occurrences and the nature of the relationships between co-occurring concepts are often crucial for an accurate understanding of the analyzed text. Furthermore, while most of the literature is available in English, EHRs in French hospitals are written in French. It is therefore important to develop advanced methods for French that provide structured representations of clinical text compatible with existing representations for English. This research project focuses on the following aims:
1. Providing material for text analysis in a specialized domain (the biomedical domain) in French
2. Adapting NLP tools developed for the general language to a specialized domain
3. Applying these methods to the automatic detection of links between the clinical characteristics and medical history of patients described in EHRs, predictive biomarkers identified by immunologic or genetic studies, and evidence of such associations reported in the literature
The proposed research is innovative and will provide an in-depth study of multiple biomedical texts in French (EHRs) and in English (literature). It will be guided by linguistic principles and by the application to personalized medicine. A global approach should ensure that the methods used can be generalized to other biomedical applications.
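The concept identification step described above can be illustrated with a minimal, hypothetical dictionary-based tagger for French clinical text. The terms, labels and example sentence below are invented for illustration; a real system in this project would rely on full terminologies and trained models rather than exact dictionary matching.

```python
import re

# Hypothetical mini-terminology mapping French surface forms to concept
# labels; a real system would use a full terminology and a trained model.
TERMINOLOGY = {
    "diabète de type 2": "DISORDER",
    "metformine": "DRUG",
    "hypertension": "DISORDER",
}

def tag_concepts(text):
    """Return (surface form, label, start offset) for each dictionary match."""
    matches = []
    lowered = text.lower()
    for term, label in TERMINOLOGY.items():
        for m in re.finditer(re.escape(term), lowered):
            matches.append((text[m.start():m.end()], label, m.start()))
    # Sort by position so downstream relation extraction sees reading order.
    return sorted(matches, key=lambda t: t[2])

example = "Patient suivi pour un diabète de type 2, traité par metformine."
print(tag_concepts(example))
```

Going from such isolated matches to the relations the project targets (e.g. linking a disorder to a treatment mentioned in the same record) is precisely where context-aware methods are needed.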
Project (from 2020)
Partners: Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Funder: French National Research Agency (ANR)
Project Code: ANR-19-CHIA-0019
Funder Contribution: 595,084 EUR

Abstract: The new uses of affective social robots, conversational agents, and so-called "intelligent" systems in fields as diverse as health, education and transport reflect a phase of significant change in human-machine relations that deserves close attention. How will humans co-learn, co-create and co-adapt with the machine? In particular, how will vulnerable people be protected against potential threats from the machine? The first results from an original pre-experiment, conducted by the proposed Chair's team in June 2019 in partnership with an elementary school, show that an AI machine (a Pepper robot or Google Home) is more efficient at nudging than adults. HUMAAINE's aim is to study these interactions and relationships in order to audit and measure the potential influence of affective systems on humans, and ultimately to move towards "ethical systems by design" and to propose evaluation measures. The planned scientific work focuses on the detection of social emotions in the human voice, and on the study of audio and spoken-language manipulations (nudges) intended to induce changes in the behavior of the human interlocutor. This work also draws on behavioral economics, as highlighted by the 2017 Nobel laureate Richard Thaler, applied here to human-machine interactions. The road map for the Chair's work includes experimental studies to evaluate ethical aspects and confidence in the human-machine couple, as well as demystification of these technologies among the general public, which naturally tends towards anthropomorphism.
This project combines scientific research in artificial intelligence with the implementation of an innovative methodology to evaluate and improve the ethics of human-machine affective interaction, despite the current opacity of AI systems. It builds on a strong existing interdisciplinary collaboration between affective computing, behavioral economics, linguistics, and natural language processing. The researchers will disseminate the Chair's results through their Master's courses and through workshops with the Collège des Bernardins. HUMAAINE is supported by CNRS-LIMSI, the Cognition Institute (Institut Carnot), the future of society (Harvard Initiative for Learning and Teaching) and ENSC (Bordeaux), and presents great synergy with the DATAIA Institute. The Domicile foundation, IRCEM and the Collège des Bernardins are the first to support HUMAAINE, followed by industrial partners (CareCever, Renault, etc.) and other foundations (MAIF, Anne de Gaulle). Cooperation is also being built with international research teams in Japan (Osaka Univ.), Germany (DFKI) and Canada (Observatory/MILA).
Project (from 2015)
Partners: Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Funder: French National Research Agency (ANR)
Project Code: ANR-14-CE24-0024
Funder Contribution: 50,939 EUR

From the original idea to the actual distribution on TV, VOD and DVD, the production process and subsequent "life" of a TV program are divided into numerous stages involving numerous actors (e.g. distribution, production, direction, screenwriting, casting, post-production or dubbing), leading to the generation of a huge amount of heterogeneous metadata. However, only a small fraction of this metadata eventually survives the tortuous production and distribution processes, making its integration into novel TV-centric products difficult. Even for TV productions where one actor manages the whole production pipeline (e.g. Canal+ in France or the BBC in the UK), most of the metadata get lost at one point or another. The MetaDaTV network proposal aims to initiate a European research community around metadata associated with TV productions (such as dramas, documentaries or TV films) and to gather interested partners from all over Europe toward a joint European project submission (Horizon 2020).
Project (from 2016)
Partners: EURECOM; Institut de recherche Idiap; Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Funder: French National Research Agency (ANR)
Project Code: ANR-15-CE39-0010
Funder Contribution: 308,406 EUR

Speaker diarization is an unsupervised process that aims at identifying each speaker within an audio stream and determining when each speaker is active. It assumes that the number of speakers, their identities and their speech turns are all unknown. Speaker diarization has become a key technology in many domains such as content-based information retrieval, voice biometrics, forensics and social-behavioral analysis. Applications of speaker diarization include speech and speaker indexing, speaker recognition (in the presence of multiple speakers), speaker role detection, speech-to-text transcription, speech-to-speech translation and document content structuring. Although speaker diarization has been studied for almost two decades, current state-of-the-art systems suffer from many limitations. Such systems are extremely domain-dependent: for instance, a speaker diarization system trained on radio/TV broadcast news experiences drastically degraded performance when tested on a different type of recording such as radio/TV debates, meetings, lectures, conversational telephone speech or conversational voice-over-IP speech. Overlapping speech, spontaneous speaking style, background noise, music and other non-speech sources (laughter, applause, etc.) are all nuisance factors that degrade the quality of speaker diarization.
Furthermore, most existing work addresses offline speaker diarization: the system has full access to the entire audio recording beforehand and no real-time processing is required. Multi-pass processing over the same data is therefore feasible and a range of elegant machine learning tools can be used. These compromises are not admissible in real-time applications, particularly for public security and the fight against terrorism and cyber-criminality. Moreover, after an initial step of segmentation into speech turns, most approaches treat speaker diarization as a bag-of-speech-turns clustering problem and do not take into account the inherent temporal structure of interactions between speakers. One goal of the project is to integrate this information and rely on structured prediction techniques to improve over standard hierarchical clustering methods. Since the main application is related to the fight against cyber-criminality and public security, designing an online speaker diarization system is necessary. The focus on industrial research will therefore be supplemented by more fundamental research on structured prediction and methods such as conditional random fields. Speaker diarization is inherently related to speaker recognition. In recent years, state-of-the-art speaker recognition systems have improved markedly, thanks to the emergence of new recognition paradigms such as i-vectors and deep learning, new session compensation techniques such as probabilistic linear discriminant analysis, and new score normalization techniques such as adaptive symmetric score normalization. However, existing speaker diarization systems have not taken full advantage of these techniques. One goal of the project is therefore to adapt them for speaker diarization, filling this gap in the current literature.
To evaluate the proposed algorithms and ensure their generality, several existing databases will be considered, such as NIST SRE 2008 summed-channel telephone data, NIST RT 2003-2004 conversational telephone data, REPERE TV broadcast data and the AMI meeting corpus. Furthermore, the project aims to collect a medium-size database suited to its main application, the fight against cyber-criminality.
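The standard bag-of-speech-turns baseline that the project seeks to improve upon can be sketched as a greedy agglomerative clustering of per-turn speaker embeddings. The 2-D toy embeddings and the stopping threshold below are invented for illustration; real systems cluster i-vectors or neural embeddings, and (as criticized above) this baseline deliberately ignores the temporal structure of speaker interactions.

```python
import math

def cosine_dist(a, b):
    """Cosine distance between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def diarize(turn_embeddings, threshold=0.3):
    """Greedy average-linkage agglomerative clustering of speech turns.

    Each turn starts in its own cluster; the two closest clusters are
    merged until no pair lies below the distance threshold. Returns one
    cluster label per turn. Temporal order of turns plays no role here,
    which is exactly the 'bag of speech turns' simplification.
    """
    clusters = [[i] for i in range(len(turn_embeddings))]

    def linkage(c1, c2):
        dists = [cosine_dist(turn_embeddings[i], turn_embeddings[j])
                 for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        best = min(((linkage(c1, c2), a, b)
                    for a, c1 in enumerate(clusters)
                    for b, c2 in enumerate(clusters) if a < b),
                   key=lambda t: t[0])
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(turn_embeddings)
    for lbl, cluster in enumerate(clusters):
        for i in cluster:
            labels[i] = lbl
    return labels

# Toy embeddings: two speakers alternating turns.
turns = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9)]
print(diarize(turns))
```

Structured prediction approaches, by contrast, would score entire label sequences so that the alternation pattern between speakers itself informs the clustering.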
Project (from 2021)
Partners: German Research Center for Artificial Intelligence (DFKI), Speech and Language Technology Lab; Nara Institute of Science and Technology (NAIST), Graduate School of Science and Technology; Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Funder: French National Research Agency (ANR)
Project Code: ANR-20-IADJ-0005
Funder Contribution: 248,477 EUR

Nowadays, scientific knowledge can be published digitally in many different forms and sources, such as encyclopedias, scientific papers and regulatory documents, but also structured knowledge sources like ontologies and knowledge bases. Besides these, news articles, blog posts, forums and social media can also contain relevant information or be used for research. All of this is published every day in a large number of different languages. In some domains, however, the volume and speed of production of digital content have become too great for humans to keep up with and maintain an up-to-date view of current scientific evidence. In MEDLINE, for instance, close to one million new articles are added every year. The present project aims to design Artificial Intelligence (AI) methods that automatically digest these different types of text sources and jointly extract knowledge and observations in order to populate existing knowledge bases. The project showcases these methods in the domain of pharmacovigilance, which endeavors to maintain up-to-date knowledge on adverse drug reactions (ADRs) for the benefit of public health.
In this domain, authoritative sources include scientific journals and drug labels, while elementary observations are reported in patient records and social media. Current mainstream information extraction methods use self-supervised extraction of word representations from large text corpora and tend to neglect existing knowledge about the target domain. In contrast, the present project aims to integrate existing knowledge into the word representation acquisition and information extraction processes to improve the extraction of new information and knowledge. This is all the more needed to address less formal, and hence more challenging, sources such as social media. Additionally, the project will take advantage of similar information published in multiple languages to pool knowledge across countries. Literature mining can boost the collection of both current knowledge and additional elementary observations, resulting in automatically maintained digital encyclopedias in the form of knowledge graphs usable for both machine inference and human display. We believe this may further apply to other scientific fields, such as global warming, that need to collect and integrate elementary observations into current knowledge. Language barriers hamper the free flow of knowledge and ideas across languages: relevant findings need to be articulated across these barriers, which requires time and effort to collect and translate into the respective languages. In the not-too-distant future, tools will assist researchers and other citizens in finding and linking information distributed across sources and languages. This project will help improve such technologies and will demonstrate them for adverse drug reactions. This cross-language dimension benefits clearly from the proposed trilateral collaboration.
To strengthen the collaboration and mutual knowledge, internships are planned for early-career researchers at each of the other two partner teams under joint supervision, as well as plenary, jointly taught training actions, to provide them with shared international exposure and training and to build the ambassadors of tomorrow's partnerships. The consortium is composed of three internationally recognized teams specialized in natural language processing. NAIST (JP) has created the de facto standard natural language processing tools for Japanese and produced a number of document and text analysis tools for extracting knowledge from scholarly documents. DFKI (DE) has a strong background in corpus generation, general information extraction and biomedical text processing. LIMSI (FR) has long and strong experience in corpus annotation, hybrid information extraction and biomedical language processing, including pharmacovigilance from patient forums.
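The cross-lingual knowledge-base population step described above can be sketched in miniature: mentions of drugs and adverse reactions extracted from sources in different languages are normalized to shared concept identifiers before being merged into a single graph, so that the same drug-ADR relation reported in English and French pools into one edge. All identifiers, mappings and source names below are invented for illustration; real systems would normalize against standard terminologies.

```python
# Hypothetical mapping from (language, surface form) to shared concept
# identifiers; a real system would use standard medical terminologies.
CONCEPTS = {
    ("fr", "nausée"): "ADR:nausea",
    ("en", "nausea"): "ADR:nausea",
    ("fr", "aspirine"): "DRUG:aspirin",
    ("en", "aspirin"): "DRUG:aspirin",
}

def populate(kb, mentions):
    """Merge (lang, drug, adr, source) mentions into a knowledge base.

    kb maps (drug_id, adr_id) pairs to the set of sources reporting the
    relation, so the same relation found in different languages and
    source types is pooled under one edge.
    """
    for lang, drug, adr, source in mentions:
        drug_id = CONCEPTS.get((lang, drug))
        adr_id = CONCEPTS.get((lang, adr))
        if drug_id is None or adr_id is None:
            continue  # unmapped mention: left aside for manual curation
        kb.setdefault((drug_id, adr_id), set()).add(source)
    return kb

# Illustrative mentions: one from the literature, one from a patient forum.
mentions = [
    ("en", "aspirin", "nausea", "literature:article-1"),
    ("fr", "aspirine", "nausée", "forum:patient-42"),
]
kb = populate({}, mentions)
print(kb)
```

Counting distinct sources per edge is one simple way such a graph can separate well-attested knowledge from isolated elementary observations.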