galactic / public / src / apps / cli / framework / core
BSD 3-Clause "New" or "Revised" License
GALACTIC core framework
galactic / public / src / apps / cli / framework / io / data
BSD 3-Clause "New" or "Revised" License
GALACTIC framework io data plugin
Thèse Guillaume Bernard / Jeux de données / dataset_manipulation_tools / synthesise_ocr_and_segmentation_errors_in_texts
GNU General Public License v3.0 or later
This software degrades texts written in any natural language by simulating OCR errors (phantom characters, character degradation, etc.) and by over-segmenting them, that is, splitting each text at regular intervals into equal parts.
This is useful for reproducing the errors commonly found in historical documents when historical data is unavailable.
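The two degradations described above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function names `add_ocr_noise` and `over_segment`, the confusion table, the phantom characters, and the `rate` and `parts` parameters are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical confusion table: plausible OCR misreadings, not taken from the tool.
CONFUSIONS = {"e": "c", "l": "1", "o": "0", "m": "rn", "s": "5"}
# Hypothetical phantom characters that a noisy scan might introduce.
PHANTOMS = ".,'`"


def add_ocr_noise(text, rate=0.1, seed=0):
    """Degrade `text`: each character is altered with probability `rate`."""
    rng = random.Random(seed)  # seeded so that results are reproducible
    out = []
    for ch in text:
        if rng.random() < rate:
            if ch in CONFUSIONS and rng.random() < 0.5:
                out.append(CONFUSIONS[ch])        # character degradation
            else:
                out.append(ch)
                out.append(rng.choice(PHANTOMS))  # phantom character
        else:
            out.append(ch)
    return "".join(out)


def over_segment(text, parts):
    """Split `text` into `parts` chunks holding near-equal numbers of words."""
    if parts <= 0:
        raise ValueError("parts must be a positive integer")
    words = text.split()
    base, extra = divmod(len(words), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks receive one additional word.
        end = start + base + (1 if i < extra else 0)
        chunks.append(" ".join(words[start:end]))
        start = end
    return chunks
```

Calling `over_segment("a b c d e f", 3)` yields three two-word chunks. If `parts` exceeds the number of words, the trailing chunks are empty strings in this sketch.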
Thèse Guillaume Bernard / Jeux de données / dataset_manipulation_tools / compute_tf_idf_weights
GNU General Public License v3.0 or later
This software computes TF-IDF weights for texts stored in the document_tracking_resources format. Vectors and weights are computed using a resource file that represents the language as it is used in the same context as the texts to weight (for example, news features to weight texts published in the news).
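The idea of weighting texts with statistics taken from a separate reference resource can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function names `idf_from_reference` and `tf_idf`, the smoothed IDF formula, and the fallback for unseen terms are all hypothetical, and the real resource file format is not described here.

```python
import math
from collections import Counter


def idf_from_reference(reference_docs):
    """Compute inverse document frequencies from a reference corpus.

    `reference_docs` is a list of token lists. The smoothed formula
    log((1 + N) / (1 + df)) + 1 is an assumption made for this sketch.
    """
    n_docs = len(reference_docs)
    doc_freq = Counter()
    for doc in reference_docs:
        doc_freq.update(set(doc))  # count each term once per document
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }


def tf_idf(tokens, idf):
    """Weight the tokens of one text using precomputed reference IDF values."""
    if not tokens:
        return {}
    # Assumption: a term absent from the reference is treated as maximally rare.
    default_idf = max(idf.values()) if idf else 1.0
    counts = Counter(tokens)
    total = len(tokens)
    return {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in counts.items()
    }


# Usage: IDF comes from the reference corpus, TF from the text being weighted.
reference = [["stock", "market"], ["market", "news"], ["election", "news"]]
idf = idf_from_reference(reference)
weights = tf_idf(["stock", "market", "market"], idf)
```

Keeping the IDF computation separate from the weighting step means the same reference statistics can be reused across many datasets drawn from the same context.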
Thèse Guillaume Bernard / Développement / from documents to events / documents_tracking_resources
GNU General Public License v3.0 or later
Resources and a Python API for manipulating datasets of news documents. It handles data in the .pickle format using pandas and numpy, and can perform operations on the datasets.
galactic / public / src / io / data / slf
BSD 3-Clause "New" or "Revised" License
An SLF data reader from the Galicia project for GALACTIC.
galactic / public / src / io / data / toml
BSD 3-Clause "New" or "Revised" License
A TOML data reader for GALACTIC.
Thèse Guillaume Bernard / Jeux de données / dataset_manipulation_tools / compute_dense_vectors
GNU General Public License v3.0 or later
This software computes dense vectorisations (sentence embeddings) of sequences of sentences of natural text. It can handle multilingual documents as long as the model used is multilingual. It relies on the S-BERT architecture, software and models (https://www.sbert.net/). It computes dense vector representations for the tokens, lemmas, entities, etc. of your datasets.
galactic / public / src / io / data / text
BSD 3-Clause "New" or "Revised" License
A text data reader for GALACTIC.
galactic / public / src / io / data / csv
BSD 3-Clause "New" or "Revised" License
A CSV data reader for GALACTIC.
galactic / public / src / helpers / core
BSD 3-Clause "New" or "Revised" License
GALACTIC helpers core library