crosspoint-reader/lib/Epub/Epub/hyphenation/LanguageHyphenator.h
Arthur Tazhitdinov 8824c87490
feat: dict based Hyphenation (#305)
## Summary

* Adds optional hyphenation for English, French, German, and Russian.

## Additional Context

* The included hyphenation dictionaries add approximately 280 KB of flash
usage (German alone accounts for 200 KB)
* The trie-encoded dictionaries are adopted from the hypher project
(https://github.com/typst/hypher)
* Soft hyphens (and other explicit hyphens) take precedence over
dictionary-based hyphenation. Overall, the hyphenation rules are quite
aggressive, as I believe that suits our smaller screens better.
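As a rough illustration of the precedence rule above, the sketch below collects explicit break points (soft hyphens U+00AD and visible hyphens) and consults the dictionary only when the word contains none. `chooseBreaks`, `DictFn`, and the "break before `visible[i]`" index convention are hypothetical and not this PR's API. The actual code may treat explicit hyphens differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

constexpr uint32_t kSoftHyphen = 0x00AD;   // invisible, author-supplied break hint
constexpr uint32_t kHyphenMinus = 0x002D;  // '-' as in "well-known"
constexpr uint32_t kHyphen = 0x2010;       // Unicode HYPHEN

// Hypothetical dictionary callback: visible code points -> break indexes.
using DictFn = std::function<std::vector<size_t>(const std::vector<uint32_t>&)>;

// Hypothetical helper. Fills `visible` with the word minus soft hyphens and
// returns break indexes into it: index i means "may break before visible[i]".
std::vector<size_t> chooseBreaks(const std::vector<uint32_t>& cps,
                                 std::vector<uint32_t>& visible,
                                 const DictFn& dictBreaks) {
  visible.clear();
  std::vector<size_t> explicitBreaks;
  auto addBreak = [&](size_t i) {
    // Ignore a break at the very start and duplicate positions.
    if (i > 0 && (explicitBreaks.empty() || explicitBreaks.back() != i)) {
      explicitBreaks.push_back(i);
    }
  };
  for (uint32_t cp : cps) {
    if (cp == kSoftHyphen) {
      addBreak(visible.size());  // break before the next visible character
    } else {
      visible.push_back(cp);
      if (cp == kHyphenMinus || cp == kHyphen) {
        addBreak(visible.size());  // break right after a visible hyphen
      }
    }
  }
  // A break at the very end of the word is not a real split point.
  if (!explicitBreaks.empty() && explicitBreaks.back() == visible.size()) {
    explicitBreaks.pop_back();
  }
  if (!explicitBreaks.empty()) {
    return explicitBreaks;  // explicit hints win; the dictionary is skipped
  }
  return dictBreaks(visible);  // fall back to pattern-based hyphenation
}
```

For `hy\u00ADphen` this returns `{2}` (hy-phen) without calling the dictionary. A word with no explicit hyphens is passed through to `dictBreaks` unchanged.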

---------

Co-authored-by: Dave Allie <dave@daveallie.com>
2026-01-19 12:56:26 +00:00


#pragma once

#include <cstddef>
#include <cstdint>
#include <vector>

#include "LiangHyphenation.h"

// Generic Liang-backed hyphenator that stores pattern metadata plus language-specific helpers.
class LanguageHyphenator {
 public:
  // `patterns` is held by reference, so it must outlive this object (e.g. a static table).
  LanguageHyphenator(const SerializedHyphenationPatterns& patterns, bool (*isLetterFn)(uint32_t),
                     uint32_t (*toLowerFn)(uint32_t), size_t minPrefix = LiangWordConfig::kDefaultMinPrefix,
                     size_t minSuffix = LiangWordConfig::kDefaultMinSuffix)
      : patterns_(patterns), config_(isLetterFn, toLowerFn, minPrefix, minSuffix) {}

  // Delegates to liangBreakIndexes() with this language's patterns and word config.
  std::vector<size_t> breakIndexes(const std::vector<CodepointInfo>& cps) const {
    return liangBreakIndexes(cps, patterns_, config_);
  }

  // Minimum number of characters to keep before / after a break.
  size_t minPrefix() const { return config_.minPrefix; }
  size_t minSuffix() const { return config_.minSuffix; }

 protected:
  const SerializedHyphenationPatterns& patterns_;
  LiangWordConfig config_;
};
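`breakIndexes()` simply forwards to `liangBreakIndexes()`. To show roughly what Liang's algorithm computes, here is a self-contained sketch. It is a stand-in under stated assumptions, not the project's implementation: it stores plain TeX-style patterns in a `std::map` instead of hypher's serialized trie, works on lowercase ASCII instead of `CodepointInfo`, and `buildPatterns`/`liangBreaks` are hypothetical names. Each matching pattern raises the scores of the gaps it covers; odd final scores mark allowed breaks, and `minPrefix`/`minSuffix` suppress breaks too close to either end of the word.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Patterns keyed by their letters, e.g. "hy3ph" -> {"hyph", {0, 0, 3, 0, 0}}.
using PatternTable = std::map<std::string, std::vector<int>>;

// Parse TeX-style patterns ("hy3ph", "1na", ".ab4") into a lookup table.
PatternTable buildPatterns(const std::vector<std::string>& raw) {
  PatternTable table;
  for (const std::string& p : raw) {
    std::string letters;
    std::vector<int> values(1, 0);  // one value per gap, including both ends
    for (char c : p) {
      if (std::isdigit(static_cast<unsigned char>(c))) {
        values.back() = c - '0';  // digit applies to the gap before the next letter
      } else {
        letters.push_back(c);
        values.push_back(0);
      }
    }
    table[letters] = values;
  }
  return table;
}

// Hypothetical stand-in for liangBreakIndexes(): returns indexes i such that
// the word may be split before word[i]. Assumes a lowercase ASCII word.
std::vector<size_t> liangBreaks(const std::string& word, const PatternTable& table,
                                size_t minPrefix, size_t minSuffix) {
  const std::string padded = "." + word + ".";    // '.' marks word boundaries
  std::vector<int> scores(padded.size() + 1, 0);  // scores[g]: gap before padded[g]
  for (size_t start = 0; start < padded.size(); ++start) {
    for (size_t len = 1; start + len <= padded.size(); ++len) {
      auto it = table.find(padded.substr(start, len));
      if (it == table.end()) continue;
      const std::vector<int>& v = it->second;
      for (size_t j = 0; j < v.size(); ++j) {
        scores[start + j] = std::max(scores[start + j], v[j]);  // keep the highest level
      }
    }
  }
  std::vector<size_t> breaks;
  for (size_t i = 1; i < word.size(); ++i) {
    const bool odd = scores[i + 1] % 2 == 1;  // odd level => hyphenation allowed
    if (odd && i >= minPrefix && word.size() - i >= minSuffix) breaks.push_back(i);
  }
  return breaks;
}
```

With the TeX-style patterns `hy3ph`, `he2n`, `hena4`, `hen5at`, `1na`, `n2at`, `1tio`, `2io`, and `o2n`, `liangBreaks("hyphenation", table, 2, 3)` returns `{2, 6}` (hy-phen-ation). Raising `minPrefix` to 3 drops the `hy-` break, so smaller minimums give more aggressive hyphenation.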