Explore projects
-
Thèse Guillaume Bernard / Développement / from events to documents / request_documents_based_on_events_they_report
GNU General Public License v3.0 or later
Requests to collect documents relating to real-world events (themselves described using wikivents) stored in a global index (provided by database_infrastructure_text_mining).
-
Thèse Guillaume Bernard / Développement / from events to documents / wikivents-projects / wikivents
GNU General Public License v3.0 or later
A Python package to process and represent events from ontologies and semi-structured databases such as Wikidata and Wikipedia.
-
Thèse Guillaume Bernard / Développement / from events to documents / database_infrastructure_text_mining
GNU General Public License v3.0 or later
Text search engine infrastructure based on Elasticsearch (https://www.elastic.co/fr/elasticsearch/) and Lucene (https://lucene.apache.org/). Includes the import scripts that load datasets into the index.
-
Thèse Guillaume Bernard / Développement / from events to documents / annotate_events_with_wikidata_identifiers
GNU General Public License v3.0 or later
Annotation tool to check whether the annotations of a Corpus (document_tracking_resources) are correct and truthful.
-
Thèse Guillaume Bernard / Jeux de données / dataset_manipulation_tools / synthesise_ocr_and_segmentation_errors_in_texts
GNU General Public License v3.0 or later
This software damages texts written in any natural language by applying OCR degradation (phantom characters, character degradation, etc.) and by over-segmenting them (that is, splitting each text at regular intervals into equal parts).
This is useful for reproducing the errors commonly found in historical documents when historical data is missing.
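The two kinds of damage described above can be illustrated with a minimal Python sketch. It is not taken from the project: the function names `add_ocr_noise` and `over_segment`, the confusion table, the phantom character set, and the `rate`, `seed`, and `size` parameters are all hypothetical choices made for illustration.

```python
import random

# Hypothetical confusion table: visually similar substitutions an OCR engine might make.
CONFUSIONS = {"e": "c", "l": "1", "o": "0", "m": "rn", "s": "5"}
# Hypothetical set of phantom characters (specks read as punctuation).
PHANTOMS = ".,'`"


def add_ocr_noise(text, rate=0.05, seed=0):
    """Degrade each character with probability `rate`.

    A degraded character is either replaced by a look-alike (when one is
    listed in CONFUSIONS) or followed by a phantom character.
    """
    rng = random.Random(seed)  # seeded so the damage is reproducible
    out = []
    for ch in text:
        if rng.random() < rate:
            if ch in CONFUSIONS and rng.random() < 0.5:
                out.append(CONFUSIONS[ch])  # character degradation
            else:
                out.append(ch)
                out.append(rng.choice(PHANTOMS))  # phantom character
        else:
            out.append(ch)
    return "".join(out)


def over_segment(text, size):
    """Split `text` into consecutive chunks of `size` characters.

    Every chunk has the same length except possibly the last one.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [text[i:i + size] for i in range(0, len(text), size)]


if __name__ == "__main__":
    sample = "The committee met on Monday to discuss the harvest."
    noisy = add_ocr_noise(sample, rate=0.2, seed=42)
    print(noisy)
    print(over_segment(noisy, 12))
```

Seeding the random generator keeps a damaged corpus reproducible across runs, which matters when the degraded texts serve as evaluation data.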