The building lifecycle
Generating a Remède database can take a while. Let’s see what happens, step by step, during generation.
What is a Remède database generation
When you generate a fresh Remède database, you build a JSON file for each letter (data/REMEDE_a.json, for example) as well as the SQLite database (data/remede.db).
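To check what a generation produced, the two kinds of output can be inspected with the Python standard library alone. The following is a minimal sketch: the helper names (letter_file, count_entries, list_tables) are hypothetical, and it assumes nothing about the layout of the JSON documents beyond their being a JSON object or array.

```python
import json
import sqlite3
from pathlib import Path
from typing import List


def letter_file(data_dir: Path, letter: str) -> Path:
    """Path of a per-letter JSON file, following the data/REMEDE_a.json pattern."""
    return data_dir / f"REMEDE_{letter}.json"


def count_entries(json_path: Path) -> int:
    """Number of top-level entries in a per-letter JSON file."""
    with json_path.open(encoding="utf-8") as fh:
        return len(json.load(fh))


def list_tables(db_path: Path) -> List[str]:
    """Names of the tables present in the SQLite database, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For instance, list_tables(Path("data/remede.db")) would list the tables created by the generation, without relying on any particular schema.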
Generate the database step by step
Learn how to generate the Remède database yourself.
Generation requires running multiple Python scripts:
- pre_generate_ressources.py: generates multiple useful resources (mots.txt, ipa.json and fromIPA.txt); see Dataset
- parse.py: generates a JSON file containing all the Remède documents for each letter of the alphabet (see parse.py)
- generate_sqlite.py: generates the SQLite database from the previously generated JSON files
- generate_index.py: generates the wordlist table (an index table) and pushes it to the SQLite database
- build_rimes.py: adds the rhymes to the dictionary (see Rimes)
All the scripts are stored in the scripts folder and must be executed from the project root.
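The scripts listed above can be chained from a small driver. This is only a sketch under two assumptions that the text does not confirm: that the scripts run in the order they are listed, and that they take no command-line arguments. The names build_commands and run_pipeline are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script names from the list above; running them in this order is an assumption.
SCRIPTS = [
    "pre_generate_ressources.py",
    "parse.py",
    "generate_sqlite.py",
    "generate_index.py",
    "build_rimes.py",
]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """One command per script, with script paths relative to the project root."""
    return [[python, str(Path("scripts") / name)] for name in SCRIPTS]


def run_pipeline(project_root: Path) -> None:
    """Run each script with the project root as working directory.

    check=True makes the loop stop at the first script that fails.
    """
    for command in build_commands():
        print("Running", " ".join(command))
        subprocess.run(command, cwd=project_root, check=True)


# Example (from the project root): run_pipeline(Path("."))
```

Passing cwd=project_root keeps the "executed from the project root" requirement explicit, so relative paths such as data/mots.txt resolve as the scripts expect.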
parse.py
A script to iterate words and build their Remède document.
How does it work?
- It iterates over 250,000 words (from data/mots.txt)
- For each word, it retrieves its definition using api-definition, plus further information from external services
- It generates the word's Remède document
- It saves it in JSON format
flowchart TB
words[(Word\ndatabase)] --> Loop
Loop(Parser loop) --> def[Definition API]
Loop --> syn[synonymo.fr]
Loop --> ant[antonymes.org]
Loop .-> conj[conjuguons.fr]
def --> doc[[Remède document]]
syn --> doc
conj .-> doc
ant --> doc
doc --> json[[Remède\nJSON]]
json --> db[(Remède\nDatabase)]
drime[(Drime database)] -- Reorganised and added to --> db
A lifecycle schema of parse.py
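The parser loop described above can be sketched as follows. This is not the actual parse.py: fetch_definition_stub is a hypothetical stand-in for the definition API and the external services, the document fields it returns are invented for illustration, and grouping accented initials under their base letter is an assumption.

```python
import json
import unicodedata
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def fetch_definition_stub(word: str) -> dict:
    """Hypothetical placeholder: the real script queries external services."""
    return {"definitions": [], "synonyms": [], "antonyms": []}


def first_letter(word: str) -> str:
    """Lowercase initial with its accent stripped (assumption: 'é' files under 'e')."""
    return unicodedata.normalize("NFD", word[0])[0].lower()


def build_documents(
    words: Iterable[str],
    fetch: Callable[[str], dict] = fetch_definition_stub,
) -> Dict[str, Dict[str, dict]]:
    """Build one document per word, grouped by the word's first letter."""
    by_letter: Dict[str, Dict[str, dict]] = defaultdict(dict)
    for raw in words:
        word = raw.strip()
        if not word:  # skip blank lines
            continue
        by_letter[first_letter(word)][word] = fetch(word)
    return dict(by_letter)


def save_documents(by_letter: Dict[str, Dict[str, dict]], data_dir: Path) -> List[Path]:
    """Write one REMEDE_<letter>.json file per letter and return the paths."""
    data_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for letter, documents in sorted(by_letter.items()):
        path = data_dir / f"REMEDE_{letter}.json"
        path.write_text(
            json.dumps(documents, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        paths.append(path)
    return paths
```

Because the fetch step is passed in as a function, the loop can be exercised offline with the stub before any real service is called.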