The building lifecycle
Generating a Remède database can take a while… Let’s see what happens, step by step, during generation.
What is a Remède database generation?
When you generate a fresh Remède database, you build a SQLite database that contains every word in the dictionary along with its metadata, stored as JSON in the Remède document format.
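To make the storage idea concrete, here is a minimal sketch of keeping one JSON document per word in SQLite. The table name `dictionary`, its columns, and the document fields are assumptions made for illustration; they are not the actual Remède schema or document format.

```python
import json
import sqlite3


def open_demo_database():
    """Create an in-memory database with a hypothetical one-table layout."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE dictionary (word TEXT PRIMARY KEY, document TEXT NOT NULL)"
    )
    return db


def store_document(db, word, document):
    """Serialize the document to JSON and store it next to its word."""
    db.execute(
        "INSERT OR REPLACE INTO dictionary (word, document) VALUES (?, ?)",
        (word, json.dumps(document, ensure_ascii=False)),
    )


def load_document(db, word):
    """Return the decoded document for a word, or None if it is missing."""
    row = db.execute(
        "SELECT document FROM dictionary WHERE word = ?", (word,)
    ).fetchone()
    return json.loads(row[0]) if row else None


db = open_demo_database()
# The document content below is a placeholder, not a real Remède document.
store_document(db, "remède", {"definitions": ["placeholder definition"]})
print(load_document(db, "remède"))
```

Storing the document as a JSON string keeps the table layout simple while letting the document structure evolve independently of the SQL schema.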
Generate the database step by step
Learn how to generate a Remède database yourself.
Generation used to require running many Python scripts. Now only two steps are required to generate a database.
- pre_generate_ressources.py: generates several useful resources (mots.txt, ipa.json, and fromIPA.txt); see Dataset.
- generate.py: generates the SQLite database, which contains all the Remède documents for each letter of the alphabet (see generate.py).
All the scripts are stored in the scripts folder and must be executed from the project root.
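As an illustration, the two steps could be chained from a small helper like the one below. This is only a sketch: it assumes both scripts run without command-line arguments, which the text above does not confirm, and the helper names `build_commands` and `run_generation` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script names come from the two steps described above; order matters.
STEPS = ["pre_generate_ressources.py", "generate.py"]


def build_commands(python=sys.executable):
    """Return one argv list per step, with paths relative to the project root."""
    return [[python, str(Path("scripts") / name)] for name in STEPS]


def run_generation(project_root, runner=subprocess.run):
    """Run each step with the project root as working directory.

    check=True makes subprocess.run raise on a non-zero exit code, so the
    second step is never started if the first one fails.
    """
    for argv in build_commands():
        runner(argv, cwd=project_root, check=True)
```

Calling `run_generation(".")` from the project root would then execute both scripts in order. The `runner` parameter exists only so the helper can be exercised without launching real processes.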
generate.py
A script that iterates over words and builds their Remède documents.
How does it work?
- It iterates over 1 000 000 words (from data/mots.txt).
- For each word, it retrieves its definition using api-definition and more information from external services…
- It generates its Remède document.
- It inserts into the SQLite database: the word, its sanitized form, its phoneme, and its JSON format (plus more metadata required for powerful and advanced features).
- Metadata such as whether the last phoneme is feminine, the number of syllables, or whether the word can have an elide is taken from the Open Lexicon database (Drime project) or calculated with less precision by us…

flowchart TB
    words[(Word\ndatabase)] --> Loop
    Loop(Parser loop) --> def[Definition API]
    Loop --> syn[synonymo.fr]
    Loop --> ant[antonymes.org]
    Loop .-> conj[conjuguons.fr]
    def --> doc[[Remède document]]
    syn --> doc
    conj .-> doc
    ant --> doc
    openlexicon[(Open Lexicon\nStats)] -- Syllables, elides and feminines\n stats merged and added to --> db
    doc -- Is inserted into --> db[(Remède\nSQLite Database)]
A lifecycle schema of parse.py
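The steps listed above can be sketched as a short loop. This is not the project's actual code: the table layout, the `sources` callables standing in for the external services, the `phonemes` mapping, and the vowel-group syllable estimate are all assumptions chosen to keep the example self-contained.

```python
import json
import re
import sqlite3
import unicodedata


def sanitize(word):
    """Lowercase the word and strip diacritics (one possible 'sanitized form')."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def estimate_syllables(word):
    """Rough fallback: count groups of consecutive vowels."""
    return len(re.findall(r"[aeiouy]+", sanitize(word)))


def generate(words, sources, lexicon_stats, phonemes, db):
    """Build one document per word and insert it with its metadata."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS dictionary ("
        "word TEXT PRIMARY KEY, sanitized TEXT, phoneme TEXT, "
        "syllables INTEGER, document TEXT)"
    )
    for word in words:
        # Each source is a callable standing in for one external service.
        document = {name: fetch(word) for name, fetch in sources.items()}
        # Prefer the precomputed statistic, fall back to the rough estimate.
        stats = lexicon_stats.get(word, {})
        syllables = stats.get("syllables", estimate_syllables(word))
        db.execute(
            "INSERT OR REPLACE INTO dictionary VALUES (?, ?, ?, ?, ?)",
            (word, sanitize(word), phonemes.get(word), syllables,
             json.dumps(document, ensure_ascii=False)),
        )
    db.commit()


db = sqlite3.connect(":memory:")
generate(
    words=["remède", "chat"],
    sources={"definitions": lambda w: [f"placeholder definition of {w}"]},
    lexicon_stats={"remède": {"syllables": 2}},
    phonemes={"chat": "ʃa"},
    db=db,
)
print(db.execute("SELECT word, sanitized, syllables FROM dictionary").fetchall())
```

The fallback illustrates why computed metadata is less precise than the precomputed statistics: counting vowel groups ignores silent letters, so the estimate can differ from the real syllable count.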