Explore projects

View la-rochelle-agent project

ekipe / la-rochelle-agent

Archived 1

Updated Jun 05, 2025

Archived 1 0 4 19

View annotate_events_with_wikidata_identifiers project

Thèse Guillaume Bernard / Développement / from events to documents / annotate_events_with_wikidata_identifiers
GNU General Public License v3.0 or later

Annotation tool to check whether the annotations of a Corpus (document_tracking_resources) are correct and truthful.

Archived 0

Updated Aug 30, 2022

Archived 0 0 0 0

View docker-symfony4-2019 project

ntrugeon / docker-symfony4-2019

Archived 0

Updated Jan 09, 2020

Archived 0 0 0 1

View documents_tracking_resources project

Thèse Guillaume Bernard / Développement / from documents to events / documents_tracking_resources
GNU General Public License v3.0 or later

Resources and Python API to manipulate datasets of news documents. It manipulates data in the .pickle format with the help of pandas and numpy. It can perform operations on the datasets.

Archived 0

Updated Sep 21, 2022

Archived 0 0 0 0

View docker-symfony-wp-2020 project

ntrugeon / docker-symfony-wp-2020

Archived 0

Updated Jan 29, 2021

Archived 0 0 0 0

View wikivents project

Thèse Guillaume Bernard / Développement / from events to documents / wikivents-projects / wikivents
GNU General Public License v3.0 or later

A Python package to process and represent events from ontologies and semi-structured databases such as Wikidata and Wikipedia.

Archived 0

Updated Oct 30, 2023

Archived 0 0 0

View docker-symfony-wp-2021 project

ntrugeon / docker-symfony-wp-2021

Archived 0

Updated Jan 12, 2022

Archived 0 1 0 0

View request_documents_based_on_events_they_report project

Thèse Guillaume Bernard / Développement / from events to documents / request_documents_based_on_events_they_report
GNU General Public License v3.0 or later

Requests to collect documents, stored in a global index (provided by database_infrastructure_text_mining), that relate to real-world events (themselves described using wikivents).

Archived 0

Updated Oct 30, 2023

Archived 0 0 0 0

View Projet Linx project

Bilal Belmokhtar / Projet Linx

Project moved to gitlab-dsi (https://gitlab-dsi.univ-lr.fr/dsi-soft/audiovisuel/linx)

Archived 0

Updated Jul 19, 2022

Archived 0 0 0 0

View lauraM1 project

Matthieu Authier / lauraM1

Master 1 internship of Laura Clain
Supervisors: Benoît Simon Bouhet and Matthieu Authier

Archived 0

Updated Nov 27, 2023

Archived 0 0 0 0

View DEVOPS4 - API project

ECIA1_1 / DEVOPS4 - API

Archived 0

Updated Apr 10, 2019

Archived 0 0 0 0

View news_tracking project

Thèse Guillaume Bernard / Développement / from documents to events / news_tracking
GNU General Public License v3.0 or later

Command-line tools to manipulate the document_tracking architecture. They make it possible to train the Miranda algorithm and to run either it or the alternative K-Means implementation. A tool to evaluate the results is also provided.

Archived 0

Updated Sep 21, 2022

Archived 0 0

View document_processing project

Thèse Guillaume Bernard / Développement / from documents to events / document_processing
GNU General Public License v3.0 or later

Process documents in order to extract tokens, lemmas and named entities from texts. This software depends on spaCy (https://spacy.io/) in order to extract text features and recognise the inner elements.

Archived 0

Updated Aug 31, 2022

Archived 0 0 0 0

View synthesise_ocr_and_segmentation_errors_in_texts project

Thèse Guillaume Bernard / Jeux de données / dataset_manipulation_tools / synthesise_ocr_and_segmentation_errors_in_texts
GNU General Public License v3.0 or later

This software makes it possible to damage texts written in any natural language by applying OCR degradation (phantom characters, character degradation, etc.) and by over-segmenting texts (that is, splitting the texts at regular intervals into equal parts).

This is useful to reproduce common errors found in historical documents when historical data is missing.
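
As a rough illustration of the two kinds of damage described above, here is a minimal Python sketch. It is not the project's code: the function names `over_segment` and `degrade_characters`, their parameters, and the noise alphabet are all hypothetical, chosen only to show the idea of equal-size splitting, character substitution, and phantom-character insertion.

```python
import random


def over_segment(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of chunk_size characters.

    Hypothetical helper: the last chunk may be shorter than the others.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def degrade_characters(
    text: str,
    substitution_rate: float,
    phantom_rate: float,
    rng: random.Random,
    alphabet: str = "abcdefghijklmnopqrstuvwxyz",
) -> str:
    """Simulate OCR noise (assumed model, for illustration only).

    Each alphabetic character is replaced by a random letter with
    probability substitution_rate, and a random "phantom" letter is
    inserted after each character with probability phantom_rate.
    """
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < substitution_rate:
            out.append(rng.choice(alphabet))  # character degradation
        else:
            out.append(ch)
        if rng.random() < phantom_rate:
            out.append(rng.choice(alphabet))  # phantom character
    return "".join(out)


rng = random.Random(0)  # fixed seed so runs are repeatable
noisy = degrade_characters("historical newspaper text", 0.1, 0.05, rng)
segments = over_segment(noisy, 8)
```

Joining the segments back together always gives the noisy text again, so the two steps can be applied and evaluated independently.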

Archived 0

Updated Sep 21, 2022

Archived 0 0 0

View document_tracking project

Thèse Guillaume Bernard / Développement / from documents to events / document_tracking
GNU General Public License v3.0 or later

Implementation of algorithms to detect and track events reported in the news. It provides two alternatives to track events in the texts: one supervised, the other unsupervised.

document ana... document tra... document sim...

Archived 0

Updated Sep 21, 2022

Archived 0 0 0 0

View QSR2023_M6 project

Matthieu Authier / QSR2023_M6

Archived 0

Updated Jan 12, 2023

Archived 0 0 0 0

View database_infrastructure_text_mining project

Thèse Guillaume Bernard / Développement / from events to documents / database_infrastructure_text_mining
GNU General Public License v3.0 or later

Textual Search Engine Infrastructure based on ElasticSearch (https://www.elastic.co/fr/elasticsearch/) and Lucene (https://lucene.apache.org/). Includes the import scripts to load datasets into the index.

Archived 0

Updated Oct 30, 2023

Archived 0 0 0 0

View compute_tf_idf_weights project

Thèse Guillaume Bernard / Jeux de données / dataset_manipulation_tools / compute_tf_idf_weights
GNU General Public License v3.0 or later

This software computes TF-IDF weights for texts that follow the document_tracking_resources format. Vectors and weights are computed using a resource file that contains a representation of the language used in the same context as the texts to weight (for example, news features to weight texts published in the news).
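
The idea of weighting a text against a separate language resource can be sketched as follows. This is an assumption-laden illustration, not the project's implementation: the function names are hypothetical, the "resource" is reduced to a small list of tokenised reference documents, and the smoothed IDF formula is one common variant among several.

```python
import math
from collections import Counter


def idf_from_reference(reference_docs: list[list[str]]) -> dict[str, float]:
    """Compute smoothed IDF values from a reference corpus (hypothetical helper)."""
    n_docs = len(reference_docs)
    doc_freq = Counter()
    for doc in reference_docs:
        doc_freq.update(set(doc))  # count each term once per document
    return {
        term: math.log((1 + n_docs) / (1 + count)) + 1.0
        for term, count in doc_freq.items()
    }


def tf_idf(tokens: list[str], idf: dict[str, float], n_docs: int) -> dict[str, float]:
    """Weight the tokens of one text using IDF values from the reference corpus.

    Terms absent from the reference get the IDF of a term seen in no document.
    """
    counts = Counter(tokens)
    total = len(tokens)
    unseen_idf = math.log(1 + n_docs) + 1.0
    return {
        term: (count / total) * idf.get(term, unseen_idf)
        for term, count in counts.items()
    }


# Toy reference corpus standing in for the news-language resource file.
reference = [
    ["election", "vote"],
    ["election", "storm"],
    ["storm", "flood"],
]
idf = idf_from_reference(reference)
weights = tf_idf(["election", "vote", "vote"], idf, len(reference))
```

Keeping the IDF table separate from the text being weighted mirrors the description above: the reference corpus fixes what counts as a rare term, and any new text can then be weighted against it.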

Archived 0

Updated Aug 31, 2022

Archived 0 0 0 0

View docker-symfony4 project

ntrugeon / docker-symfony4
GNU Lesser General Public License v2.1 only

Archived 0

Updated May 27, 2019

Archived 0 0 0 0

View compute_dense_vectors project

Thèse Guillaume Bernard / Jeux de données / dataset_manipulation_tools / compute_dense_vectors
GNU General Public License v3.0 or later

This software computes dense vectorisations (sentence embeddings) of sequences of sentences of natural-language text. It can handle multilingual documents as long as the model used is multilingual. It relies on the S-BERT architecture, software and models (https://www.sbert.net/). It computes dense vector representations for the tokens, lemmas, entities, etc. of your datasets.

Archived 0

Updated Aug 31, 2022

Archived 0 0 0 0
