Explore projects
ClimaBat experimental mockup - scaled-down street canyons in real-life climatic conditions
This software computes TF-IDF weightings for texts based on the document_tracking_resources format. Vectors and weightings are computed from a resource file that contains a representation of the language used in the same context as the texts to weight (for instance, news features to weight texts published in the news).
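The weighting scheme can be sketched as standard smoothed TF-IDF. This is a minimal, self-contained illustration, not the package's actual API; the background corpus and whitespace tokenisation stand in for the resource file:

```python
import math
from collections import Counter

# A toy background corpus standing in for the resource file; the real
# package derives this representation from domain texts (e.g. news).
background = [
    "stocks fell sharply after the announcement".split(),
    "the government announced new climate measures".split(),
    "the match ended in a draw after extra time".split(),
]

# Document frequency of each term across the background corpus.
df = Counter(term for doc in background for term in set(doc))
n_docs = len(background)

def tf_idf(text):
    """Return a TF-IDF weight for each term of `text`, using smoothed
    inverse document frequencies learned from the background corpus."""
    tokens = text.split()
    counts = Counter(tokens)
    return {
        term: (count / len(tokens))
        * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in counts.items()
    }

weights = tf_idf("the government announced new measures")
# Domain-specific terms receive higher weights than ubiquitous ones like "the".
```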
This software computes dense vectorisations (sentence embeddings) of sequences of sentences of natural text. It can handle multilingual documents as long as the model used is a multilingual one. It relies on the S-BERT architecture, software and models (https://www.sbert.net/), and computes dense vector representations for the tokens, lemmas, entities, etc. of your datasets.
Requests to collect documents relating to real-world events (themselves described using wikivents) stored in a global index (provided by database_infrastructure_text_mining).
A Python package to process and represent events from ontologies and semi-structured databases such as Wikidata and Wikipedia.
Resources and a Python API to manipulate datasets of news documents. Data are handled in the .pickle format with the help of pandas and numpy, and common operations can be performed on the datasets.
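The round trip through .pickle files with pandas can be sketched as below. The column names are purely illustrative and do not reflect the package's actual schema:

```python
import pandas as pd

# Hypothetical dataset of news documents; the field names are
# assumptions for illustration only.
dataset = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-02"],
        "text": ["Body of the first article.", "Body of the second article."],
    }
)

# Datasets are stored as .pickle files and read back with pandas.
dataset.to_pickle("news_dataset.pickle")
restored = pd.read_pickle("news_dataset.pickle")
```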
Command-line tools for the document_tracking architecture. They allow training and running the Miranda algorithm as well as its alternative, the K-Means implementation, and also provide a tool to evaluate the results.
Textual Search Engine Infrastructure based on ElasticSearch (https://www.elastic.co/fr/elasticsearch/) and Lucene (https://lucene.apache.org/). Includes the import scripts to load datasets into the index.
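An import script for such an index typically serialises documents into the NDJSON body expected by the Elasticsearch _bulk API (one action line followed by one source line per document). A minimal sketch, with illustrative index and field names:

```python
import json

# Hypothetical documents to load; the field names are assumptions.
docs = [
    {"id": "doc-1", "title": "First article", "text": "Body of the first article."},
    {"id": "doc-2", "title": "Second article", "text": "Body of the second article."},
]

def to_bulk_payload(docs, index="news"):
    """Serialise documents into the NDJSON body of the Elasticsearch
    _bulk API: an action line, then a source line, for each document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps({k: v for k, v in doc.items() if k != "id"}))
    # The _bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

payload = to_bulk_payload(docs)
```

The payload would then be sent to the cluster's `/_bulk` endpoint (for example via the official Python client's `bulk` helper).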