Explore projects
LBM IFS for elastic solids. Continuation of SM's work, with the aim of reducing the memory load. Starting point: the code previously developed by EL. The move to 3D will be easier.
Python framework to identify and rank crisis-related tweets based on their informativeness.
This competition proposes to improve / denoise OCR-ed texts, on a testbed of more than 20 million characters from English, French, German, Finnish, Spanish, Dutch, Czech, Bulgarian, Slovak and Polish.
Annotation tool to check whether the annotations of a Corpus (document_tracking_resources) are correct and truthful.
Archived
Processes documents to extract tokens, lemmas and named entities from texts. This software relies on spaCy (https://spacy.io/) to extract text features and recognise these elements.
Archived
This software computes dense vectorisations (sentence embeddings) of sequences of sentences of natural text. It can handle multilingual documents as long as the model used is a multilingual one. It relies on the S-BERT architecture, software and models (https://www.sbert.net/). It computes dense vector representations for tokens, lemmas, entities, etc. of your datasets.
Archived
This software computes TF-IDF weightings for texts stored in the document_tracking_resources format. Vectors and weightings are computed from a resource file that contains a representation of the language as used in the same context as the texts to weight (for example, news features to weight texts published in the news).
Archived
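As a rough illustration of the TF-IDF entry above, the following Python sketch weights a text using IDF values taken from a separate background corpus rather than from the text itself. The smoothing formula, the function names (`build_idf`, `tfidf_vector`) and the toy data are assumptions made for illustration; they are not the project's actual code, and they do not reproduce the document_tracking_resources format.

```python
import math
from collections import Counter


def build_idf(background_docs):
    """Compute smoothed IDF values from a tokenised background corpus.

    Hypothetical helper: stands in for the "resource file" that describes
    the language used in the same context as the texts to weight.
    """
    n_docs = len(background_docs)
    doc_freq = Counter()
    for tokens in background_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    # Assumed smoothing: log((1 + N) / (1 + df)) + 1 keeps every weight positive.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }
    # Terms never seen in the background corpus get the largest weight (df = 0).
    default_idf = math.log(1 + n_docs) + 1.0
    return idf, default_idf


def tfidf_vector(tokens, idf, default_idf):
    """Return a sparse {term: weight} mapping for one tokenised text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in counts.items()
    }


# Toy background corpus standing in for news-derived statistics.
background = [
    ["the", "minister", "announced", "the", "budget"],
    ["the", "storm", "hit", "the", "coast"],
    ["the", "budget", "vote", "was", "delayed"],
]
idf, default_idf = build_idf(background)
weights = tfidf_vector(["storm", "delayed", "the", "vote"], idf, default_idf)
```

In this toy run, "the" appears in every background document and therefore receives a lower weight than "storm", which appears in only one.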
This software damages texts written in any natural language by applying OCR degradation (phantom characters, character degradation, etc.) and by over-segmenting them (that is, splitting the texts at regular intervals into equal parts).
This is useful to reproduce common errors found in historical documents when historical data is missing.
Archived
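The two kinds of damage described in the entry above (character-level OCR noise and over-segmentation) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the confusion table, the three noise actions, the function names (`degrade`, `over_segment`) and the fixed-size character split are all hypothetical choices, not the behaviour of the actual tool.

```python
import random


def degrade(text, rate, rng):
    """Randomly corrupt characters to mimic OCR noise.

    Each character is altered with probability `rate`. The confusion
    table and the set of actions are illustrative assumptions.
    """
    confusions = {"e": "c", "l": "1", "o": "0", "m": "rn", "i": "l"}
    out = []
    for ch in text:
        if rng.random() >= rate:
            out.append(ch)  # character left untouched
            continue
        action = rng.choice(["substitute", "delete", "phantom"])
        if action == "substitute":
            # Replace with a look-alike when one is known, else keep as is.
            out.append(confusions.get(ch, ch))
        elif action == "phantom":
            # Keep the character and add a spurious mark after it.
            out.append(ch + rng.choice(".,'`"))
        # "delete": append nothing
    return "".join(out)


def over_segment(text, size):
    """Split text into consecutive chunks of `size` characters.

    The last chunk may be shorter when the length is not a multiple of `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


rng = random.Random(0)  # seeded so that runs are repeatable
noisy = degrade("historical newspapers are hard to read", 0.2, rng)
chunks = over_segment("historical newspapers", 8)
```

Passing an explicit `random.Random` instance keeps the damage reproducible, which helps when the same degraded corpus must be regenerated for several experiments.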
Implementation of algorithms to detect and track events reported in the news. It provides two approaches to tracking events in texts, one supervised and one unsupervised.
Archived