Explore projects
Thèse Guillaume Bernard / Développement / from documents to events / news_tracking
GNU General Public License v3.0 or later
Command-line tools for working with the document_tracking architecture. They can train the Miranda algorithm and run it, as well as the alternative K-Means implementation. A tool to evaluate the results is also provided.
Archived
Thèse Guillaume Bernard / Développement / from documents to events / document_tracking
GNU General Public License v3.0 or later
Implementation of algorithms to detect and track events reported in the news. It provides two alternatives, one supervised and one unsupervised, to track events in texts.
Archived
galactic / public / src / io / data / text
BSD 3-Clause "New" or "Revised" License
A text data reader for GALACTIC.
Thèse Guillaume Bernard / Développement / from documents to events / document_processing
GNU General Public License v3.0 or later
Processes documents to extract tokens, lemmas and named entities from texts. This software depends on spaCy (https://spacy.io/) to extract text features and recognise the inner elements.
Archived
galactic / public / src / io / data / toml
BSD 3-Clause "New" or "Revised" License
A TOML data reader for GALACTIC.
galactic / public / src / io / data / slf
BSD 3-Clause "New" or "Revised" License
An SLF data reader from the Galicia project for GALACTIC.
galactic / public / src / algebras / kernel
BSD 3-Clause "New" or "Revised" License
GALACTIC core kernel library.
galactic / public / src / io / data / json
BSD 3-Clause "New" or "Revised" License
A JSON data reader for GALACTIC.
Thèse Guillaume Bernard / Développement / from events to documents / wikivents-projects / wikivents
GNU General Public License v3.0 or later
A Python package to process and represent events from ontologies and semi-structured databases such as Wikidata and Wikipedia.
Archived
galactic / public / src / io / data / core
BSD 3-Clause "New" or "Revised" License
Thèse Guillaume Bernard / Jeux de données / dataset_manipulation_tools / synthesise_ocr_and_segmentation_errors_in_texts
GNU General Public License v3.0 or later
This software damages texts written in any natural language by applying OCR degradation (phantom characters, character degradation, etc.) and by over-segmenting them (that is, splitting the texts at regular intervals into equal parts).
This is useful to reproduce common errors found in historical documents when historical data is missing.
Archived