Explore GitLab
Discover projects, groups and snippets. Share your projects with others
L3i internship 2018 - Estimote Beacons. Final iOS application for reading beacons. Developed by Mr. Dylan BREUER.
Text search engine infrastructure based on Elasticsearch (https://www.elastic.co/fr/elasticsearch/) and Lucene (https://lucene.apache.org/). Includes the import scripts that load datasets into the index.
This software degrades texts written in any natural language by simulating OCR errors (phantom characters, character degradation, etc.) and by over-segmenting the texts (that is, splitting them at regular intervals into equal parts).
This is useful for reproducing the errors commonly found in historical documents when real historical data is not available.
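The two kinds of damage described above can be illustrated with a minimal sketch. This is not the project's code: the function names, the `rate` and `alphabet` parameters, and the choice of inserted characters are all assumptions made for illustration.

```python
import random


def over_segment(text: str, part_length: int) -> list[str]:
    """Split text into consecutive parts of equal length.

    The last part may be shorter when len(text) is not a multiple
    of part_length. (Hypothetical helper, not the project's API.)
    """
    if part_length <= 0:
        raise ValueError("part_length must be positive")
    return [text[i:i + part_length] for i in range(0, len(text), part_length)]


def add_phantom_characters(
    text: str,
    rate: float,
    seed: int = 0,
    alphabet: str = ".,'`",
) -> str:
    """Insert a spurious character after each original character
    with probability `rate`, imitating OCR phantom characters.

    The alphabet of phantom characters is an assumption.
    """
    rng = random.Random(seed)  # seeded so that the damage is reproducible
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < rate:
            out.append(rng.choice(alphabet))
    return "".join(out)


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog"
    print(over_segment(sample, 10))
    print(add_phantom_characters(sample, rate=0.1))
```

Seeding the random generator keeps the degradation reproducible, which matters when the damaged texts are used as a fixed evaluation set.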
This software computes dense vector representations (sentence embeddings) of sequences of sentences of natural text. It can handle multilingual documents as long as the model used is a multilingual one. It relies on the S-BERT architecture, software and models (https://www.sbert.net/). It computes dense vector representations for the tokens, lemmas, entities, etc. of your datasets.
This competition aims to improve (denoise) OCR-ed texts, using a testbed of more than 20 million characters from English, French, German, Finnish, Spanish, Dutch, Czech, Bulgarian, Slovak and Polish.
This software computes TF-IDF weights for texts stored in the document_tracking_resources format. Vectors and weights are computed using a resource file that contains a representation of the language as used in the same context as the texts to weight (for example, news features to weight texts published in the news).
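As a rough illustration of weighting a text against a separate language resource, the sketch below derives IDF values from a small background corpus and applies them to a new text. It does not use the document_tracking_resources format or the project's API: the function names, the smoothed IDF formula, and the `default_idf` parameter for unseen terms are assumptions.

```python
import math
from collections import Counter


def idf_from_corpus(corpus: list[list[str]]) -> dict[str, float]:
    """Compute smoothed IDF values from a tokenised background corpus.

    Uses ln((1 + n) / (1 + df)) + 1, one common smoothed variant;
    the project may use a different formula.
    """
    n = len(corpus)
    document_frequency = Counter()
    for doc in corpus:
        document_frequency.update(set(doc))  # count each term once per document
    return {
        term: math.log((1 + n) / (1 + df)) + 1.0
        for term, df in document_frequency.items()
    }


def tf_idf(
    tokens: list[str],
    idf: dict[str, float],
    default_idf: float,
) -> dict[str, float]:
    """Weight the terms of one text using precomputed IDF values.

    Terms missing from the resource fall back to `default_idf`
    (a hypothetical parameter).
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in counts.items()
    }


if __name__ == "__main__":
    background = [["the", "cat", "sat"], ["the", "dog", "ran"]]
    idf = idf_from_corpus(background)
    print(tf_idf(["the", "cat", "cat"], idf, default_idf=max(idf.values())))
```

Keeping the IDF computation separate from the weighting step mirrors the description above: the resource is built once from text of the same domain and then reused for every text to weight.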
Hosts all the code developed by Corentin during his Master 2 thesis at Observatoire Pelagis.