The building lifecycle
Generating a Remède database can take a while. Let's see what happens, step by step, during the generation.
When you generate a fresh Remède database, you build an SQLite database that contains all the dictionary's words and their metadata, stored as JSON in the Remède document format.
Learn how to generate a Remède database yourself.
Generation used to require running many Python scripts. Now only two steps are needed to generate a database:
pre_generate_ressources.py: generates several useful resources (mots.txt and ipa.json, from IPA.txt)
generate.py: generates the SQLite database, which contains all the words for each letter of the alphabet
All the scripts are stored in the scripts folder and must be executed from the project root.
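The two steps can be chained from a small helper. The following is a minimal sketch, not part of the project: it assumes both scripts live directly under scripts/ and take no command-line arguments, and the names SCRIPTS, build_commands and run_all are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Assumed locations, relative to the project root (not confirmed by the project).
SCRIPTS = [
    "scripts/pre_generate_ressources.py",  # step 1: build mots.txt and ipa.json
    "scripts/generate.py",                 # step 2: build the SQLite database
]


def build_commands(python=sys.executable):
    """Return one command per generation step, in execution order."""
    return [[python, script] for script in SCRIPTS]


def run_all(project_root):
    """Run every step with the project root as working directory.

    check=True stops the chain as soon as one script fails.
    """
    for command in build_commands():
        subprocess.run(command, cwd=Path(project_root), check=True)


# Usage, from anywhere on disk:
# run_all("/path/to/the/project")
```

Passing cwd keeps the "executed from the project root" requirement explicit instead of relying on the caller's current directory.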
A script that iterates over words and builds their Remède documents.
How does it work?
It iterates over 1,000,000 words (from data/mots.txt).
For each word, it retrieves its definition and additional information from external services.
It generates its Remède document.
It inserts into the SQLite database: the word, its sanitized form, its phoneme, and its document in JSON format (plus further metadata required for more advanced features).
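The steps above can be sketched as follows. This is only an illustration under stated assumptions: the table name and columns, the sanitize rule, the shape of the phoneme mapping, and the fetch_document placeholder are all hypothetical and do not describe the real scripts or the real Remède document format.

```python
import json
import sqlite3
import unicodedata


def sanitize(word):
    """Hypothetical sanitizer: lowercase the word and strip its diacritics."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fetch_document(word):
    """Placeholder for the real lookup against external services."""
    return {"word": word, "definitions": []}


def build_database(words, phonemes, conn):
    """Insert one row per word: word, sanitized form, phoneme, JSON document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dictionary ("
        "word TEXT PRIMARY KEY, sanitized_word TEXT, phoneme TEXT, document TEXT)"
    )
    for word in words:
        document = fetch_document(word)
        conn.execute(
            "INSERT OR REPLACE INTO dictionary VALUES (?, ?, ?, ?)",
            (
                word,
                sanitize(word),
                phonemes.get(word, ""),  # empty string when no phoneme is known
                json.dumps(document, ensure_ascii=False),
            ),
        )
    conn.commit()


# Usage with an in-memory database:
# conn = sqlite3.connect(":memory:")
# build_database(["remède"], {"remède": "/ʁəmɛd/"}, conn)
```

Storing the sanitized form in its own column lets a search match "remede" against "remède" without normalizing every row at query time.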
Metadata such as whether the last phoneme is feminine, the number of syllables, or whether the word can be elided are taken from an external project's database or calculated by us with less precision.
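As an illustration of the "reference data first, rough estimate otherwise" idea, here is a minimal sketch for the syllable count. The heuristic (counting vowel groups and ignoring a final mute "e") is an assumption made for this example, not the project's actual algorithm, and the reference mapping stands in for the external data.

```python
import re

# Runs of consecutive French vowels, accented forms included.
VOWEL_GROUP = re.compile(r"[aeiouyàâäéèêëîïôöùûüœ]+")


def estimate_syllables(word):
    """Rough estimate: one syllable per vowel group, minus a final mute 'e'."""
    groups = VOWEL_GROUP.findall(word.lower())
    count = len(groups)
    if count > 1 and groups[-1] == "e" and word.lower().endswith("e"):
        count -= 1
    return max(count, 1)


def syllable_count(word, reference):
    """Prefer the reference value; fall back to the estimate when it is missing."""
    if word in reference:
        return reference[word], "reference"
    return estimate_syllables(word), "estimated"
```

Returning the origin of the value alongside the count makes it possible to tell precise entries apart from estimated ones later on.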
A lifecycle diagram of parse.py