The building lifecycle

Generating a Remède database can take a while. Let's see what happens, step by step, during the generation.

What is a Remède database generation

When you generate a fresh Remède database, you build an SQLite database that contains every word of the dictionary together with its metadata, stored as JSON in the Remède document format.
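To make the idea of "JSON documents stored in SQLite" concrete, here is a minimal sketch. The table name `dictionary`, its columns, and the fields of the sample document are hypothetical placeholders chosen for illustration; they are not the actual Remède schema or document format.

```python
import json
import sqlite3


def build_demo_database(path=":memory:"):
    """Create a tiny SQLite database holding one JSON document per word.

    Table and column names are illustrative assumptions, not the real schema.
    """
    connection = sqlite3.connect(path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS dictionary ("
        "word TEXT PRIMARY KEY, document TEXT NOT NULL)"
    )
    # Placeholder document: the real fields are defined by the Remède document format.
    document = {"word": "remède", "definitions": ["placeholder definition"]}
    connection.execute(
        "INSERT OR REPLACE INTO dictionary (word, document) VALUES (?, ?)",
        (document["word"], json.dumps(document, ensure_ascii=False)),
    )
    connection.commit()
    return connection


def read_document(connection, word):
    """Return the JSON document stored for `word` as a dict, or None if absent."""
    row = connection.execute(
        "SELECT document FROM dictionary WHERE word = ?", (word,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Storing the document as a JSON string keeps the table layout simple: the word is the lookup key, and everything else travels inside the document.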

Generate the database step by step

Learn how to generate a Remède database yourself.

Generation used to require running many Python scripts. Now only two steps are needed to generate a database.

  1. pre_generate_ressources.py generates several useful resources (mots.txt and ipa.json, from IPA.txt); see Dataset

  2. generate.py generates the SQLite database, which contains the Remède documents for each letter of the alphabet (see generate.py)
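The two steps can be chained from a small driver script. This is only a sketch: it assumes both scripts live in the same directory and take no command-line arguments, which the text above does not confirm. The helper names `build_commands` and `run_generation` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Order matters: generate.py relies on the resources produced by the first script.
GENERATION_STEPS = ["pre_generate_ressources.py", "generate.py"]


def build_commands(python=sys.executable, scripts_dir="."):
    """Return the command line for each generation step, in execution order."""
    return [[python, str(Path(scripts_dir) / script)] for script in GENERATION_STEPS]


def run_generation(scripts_dir="."):
    """Run both steps, stopping at the first one that exits with an error."""
    for command in build_commands(scripts_dir=scripts_dir):
        subprocess.run(command, check=True)
```

Calling `run_generation()` from the directory that contains the scripts runs them one after the other; `check=True` raises `subprocess.CalledProcessError` if a step fails, so generate.py never starts on missing resources.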

generate.py

A script to iterate words and build their Remède document.

How does it work?

  1. It iterates over the 1 000 000 words listed in data/mots.txt

  2. For each word, it retrieves the definition using api-definition and additional information from external services

  3. It inserts into the SQLite database: the word, its sanitized form, its phoneme, and its Remède document as JSON (plus the metadata required for more advanced features)

    • Metadata such as whether the last phoneme is feminine, the number of syllables, or whether the word can be elided is taken from the Open Lexicon database (Drime project) or computed by us with less precision.
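The loop described above can be sketched as follows. This is not the code of generate.py: the table layout, the `sanitize` rule (lowercase and strip diacritics), and the `fetch_document` callback standing in for api-definition and the external services are all assumptions made for illustration.

```python
import json
import sqlite3
import unicodedata


def sanitize(word):
    """Lowercase the word and strip diacritics (an assumed definition of 'sanitized')."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def generate(words, fetch_document, phonemes, connection):
    """Build one row per word: word, sanitized form, phoneme, JSON document.

    `fetch_document` is a hypothetical callable returning a dict for a word;
    `phonemes` is a hypothetical word-to-phoneme mapping (for example loaded from ipa.json).
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS dictionary ("
        "word TEXT PRIMARY KEY, sanitized_word TEXT, phoneme TEXT, document TEXT)"
    )
    for word in words:
        document = fetch_document(word)
        connection.execute(
            "INSERT OR REPLACE INTO dictionary VALUES (?, ?, ?, ?)",
            (
                word,
                sanitize(word),
                phonemes.get(word, ""),
                json.dumps(document, ensure_ascii=False),
            ),
        )
    # Commit once at the end so the whole batch is written in a single transaction.
    connection.commit()
```

Passing the lookup function in as a parameter keeps the sketch runnable without network access: a stub such as `lambda word: {"word": word}` is enough to try it against an in-memory database.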

A lifecycle diagram of parse.py
