Explore projects
This software computes dense vectorisations (sentence embeddings) of sequences of natural-language sentences. It can handle multilingual documents, provided the model used is itself multilingual. It relies on the S-BERT architecture, software and models (https://www.sbert.net/). It computes dense vector representations for the tokens, lemmas, entities, etc. of your datasets.
Archived
Open source project for interactive visualisation of the data processing register of the La Rochelle agglomeration.
Requests to collect documents relating to real-world events (themselves described using wikivents) that are stored in a global index (provided by database_infrastructure_text_mining).
Archived
Methods to account for digit preference (heaping) in wildlife count data.
This competition aims to improve (denoise) OCR-ed texts, using a testbed of more than 20 million characters from English, French, German, Finnish, Spanish, Dutch, Czech, Bulgarian, Slovak and Polish.
This software computes TF-IDF weightings for texts stored in the document_tracking_resources format. Vectors and weightings are computed using a resource file that contains a representation of the language used in the same context as the texts to weight (for example, features derived from news to weight texts published in the news).
Archived
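As an illustration of the TF-IDF description above, here is a minimal sketch of weighting a text against document frequencies taken from a separate background corpus. It is not the project's actual code: the function names `build_resource` and `tfidf`, the dictionary layout of the resource, the whitespace tokenisation and the smoothed IDF formula are all assumptions made for this example.

```python
import math
from collections import Counter


def build_resource(background_docs):
    """Hypothetical resource: for each term, the number of background documents containing it."""
    doc_freq = Counter()
    for doc in background_docs:
        # Use a set so that each document counts a term at most once.
        doc_freq.update(set(doc.lower().split()))
    return {"n_docs": len(background_docs), "doc_freq": dict(doc_freq)}


def tfidf(text, resource):
    """Weight the terms of `text` using document frequencies from `resource`."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n_docs = resource["n_docs"]
    weights = {}
    for term, count in counts.items():
        tf = count / len(tokens)  # relative term frequency in the text
        df = resource["doc_freq"].get(term, 0)  # 0 if unseen in the background corpus
        # Smoothed IDF (an assumed variant) so that unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = tf * idf
    return weights


# Background corpus from the same context (here, toy news sentences).
resource = build_resource([
    "the election was held",
    "the storm hit the coast",
    "the vote was counted",
])
weights = tfidf("the storm hit the city", resource)
```

In this toy run, "city" never appears in the background corpus, so it receives a higher weight than "storm", which appears in one background document.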