feat: Add Spanish hyphenation support (#558)
## Summary

* **What is the goal of this PR?** Add Spanish-language hyphenation support to improve text rendering for Spanish books.
* **What changes are included?**
  - Added a Spanish hyphenation trie (`hyph-es.trie.h`) generated from Typst's hypher patterns
  - Registered `spanishHyphenator` in `LanguageRegistry.cpp` for the language tag `es`
  - Added Spanish to the hyphenation evaluation test suite
  - Added a Spanish test data file with 5000 test cases

## Additional Context

* **Test Results:** Spanish hyphenation achieves a 99.02% F1 score (97.72% perfect matches out of 5000 test cases)
* **Compatibility:** Works automatically for EPUBs with `<dc:language>es</dc:language>` (or `es-ES`, `es-MX`, etc.)

<img width="115" height="189" alt="image" src="https://github.com/user-attachments/assets/9b92e7fc-b98d-48af-8d53-dfdc2e68abee" />

| Metric | Value |
|--------|-------|
| Perfect matches | 97.72% |
| Overall Precision | 99.33% |
| Overall Recall | 99.42% |
| Overall F1 Score | 99.38% |

---

### AI Usage

Did you use AI tools to help write this code? _**PARTIALLY**_

AI assisted with:
- Guidance and compilation
- Preparing the PR description
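The Compatibility note relies on region-qualified tags such as `es-ES` or `es-MX` resolving to the `es` hyphenator. Here is a minimal sketch of that lookup. It assumes the registry matches on the primary subtag of a BCP 47 tag. `primaryLanguageSubtag`, `kHyphenatorByTag`, and `hyphenatorFor` are hypothetical names, not the project's actual `LanguageRegistry` API.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: reduce a tag such as "es-MX" or "ES_es" to its
// lowercase primary subtag ("es").
std::string primaryLanguageSubtag(const std::string& tag) {
  // The primary subtag ends at the first '-' (BCP 47) or '_' (POSIX-style locale).
  const auto end = tag.find_first_of("-_");
  std::string primary = tag.substr(0, end);
  std::transform(primary.begin(), primary.end(), primary.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return primary;
}

// Hypothetical registry keyed by primary subtag; the real LanguageRegistry.cpp may differ.
const std::map<std::string, std::string> kHyphenatorByTag = {
    {"de", "german"}, {"es", "spanish"}, {"fr", "french"}, {"ru", "russian"}};

// Returns the hyphenator name for an EPUB <dc:language> value, or "" if unsupported.
std::string hyphenatorFor(const std::string& dcLanguage) {
  const auto it = kHyphenatorByTag.find(primaryLanguageSubtag(dcLanguage));
  return it == kHyphenatorByTag.end() ? std::string() : it->second;
}
```

With this approach, `hyphenatorFor("es-MX")` and `hyphenatorFor("ES")` both return `"spanish"`. An unregistered tag such as `"pt-BR"` returns an empty string. Matching on the primary subtag needs only one registry entry per language, not one per regional variant.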
```diff
@@ -42,6 +42,7 @@ const std::vector<LanguageConfig> kSupportedLanguages = {
     {"french", "test/hyphenation_eval/resources/french_hyphenation_tests.txt", "fr"},
     {"german", "test/hyphenation_eval/resources/german_hyphenation_tests.txt", "de"},
     {"russian", "test/hyphenation_eval/resources/russian_hyphenation_tests.txt", "ru"},
+    {"spanish", "test/hyphenation_eval/resources/spanish_hyphenation_tests.txt", "es"},
 };
 
 std::vector<size_t> expectedPositionsFromAnnotatedWord(const std::string& annotated) {
```
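The table reports overall precision, recall, F1, and the perfect-match rate. One plausible way to compute these numbers works per test word. The evaluator compares the expected break positions (such as those returned by `expectedPositionsFromAnnotatedWord`) with the hyphenator's output, then micro-averages the counts over all words. The sketch below assumes that aggregation. `HyphenationScore` is a hypothetical name, and the evaluator's exact definitions may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Running totals over all test words (micro-averaged). Illustrative only.
struct HyphenationScore {
  std::size_t truePositives = 0;   // predicted breaks that are also expected
  std::size_t falsePositives = 0;  // predicted breaks that are not expected
  std::size_t falseNegatives = 0;  // expected breaks that were not predicted
  std::size_t perfectWords = 0;    // words whose predicted breaks equal the expected ones
  std::size_t totalWords = 0;

  // Takes copies so they can be sorted; input order does not matter.
  void addWord(std::vector<std::size_t> expected, std::vector<std::size_t> predicted) {
    std::sort(expected.begin(), expected.end());
    std::sort(predicted.begin(), predicted.end());
    std::vector<std::size_t> common;
    std::set_intersection(expected.begin(), expected.end(), predicted.begin(),
                          predicted.end(), std::back_inserter(common));
    truePositives += common.size();
    falsePositives += predicted.size() - common.size();
    falseNegatives += expected.size() - common.size();
    perfectWords += (expected == predicted) ? 1 : 0;
    ++totalWords;
  }

  // Assumption: an empty denominator counts as a perfect score.
  double precision() const {
    const std::size_t denom = truePositives + falsePositives;
    return denom == 0 ? 1.0 : static_cast<double>(truePositives) / denom;
  }
  double recall() const {
    const std::size_t denom = truePositives + falseNegatives;
    return denom == 0 ? 1.0 : static_cast<double>(truePositives) / denom;
  }
  // F1 is the harmonic mean of precision and recall.
  double f1() const {
    const double p = precision();
    const double r = recall();
    return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
  }
  double perfectMatchRate() const {
    return totalWords == 0 ? 0.0 : static_cast<double>(perfectWords) / totalWords;
  }
};
```

As a consistency check, the table's precision (99.33%) and recall (99.42%) give 2PR/(P+R) ≈ 99.38%, which matches the reported overall F1 score.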
`test/hyphenation_eval/resources/spanish_hyphenation_tests.txt` (new file, 5012 lines): file diff suppressed because it is too large.
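The Spanish test file stores each case as an annotated word, which `expectedPositionsFromAnnotatedWord` turns into break positions. The file's format is not shown here, so this sketch rests on two assumptions. First, breaks are marked with a single ASCII marker character (`=` by default, chosen only for illustration). Second, positions count Unicode code points rather than bytes. The second point matters for Spanish, because letters such as `ñ`, `á`, and `ü` take two bytes in UTF-8. `breakPositionsFromAnnotated` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for expectedPositionsFromAnnotatedWord. The marker
// character and the position unit (UTF-8 code points) are assumptions.
std::vector<std::size_t> breakPositionsFromAnnotated(const std::string& annotated,
                                                      char marker = '=') {
  std::vector<std::size_t> positions;
  std::size_t codePoints = 0;  // code points of the plain word seen so far
  for (unsigned char c : annotated) {
    if (c == static_cast<unsigned char>(marker)) {
      // A break is allowed before the next code point.
      positions.push_back(codePoints);
    } else if ((c & 0xC0) != 0x80) {
      // Count only lead bytes; UTF-8 continuation bytes have the form 10xxxxxx.
      ++codePoints;
    }
  }
  return positions;
}
```

For example, `ca=ña=da` yields `{2, 4}`. A byte-based count would report `{2, 5}` instead, because `ñ` occupies two bytes.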