## Summary * Adds (optional) Hyphenation for English, French, German, Russian languages ## Additional Context * Included hyphenation dictionaries add approximately 280kb to the flash usage (German alone takes 200kb) * Trie encoded dictionaries are adopted from hypher project (https://github.com/typst/hypher) * Soft hyphens (and other explicit hyphens) take precedence over dict-based hyphenation. Overall, the hyphenation rules are quite aggressive, as I believe it makes more sense on our smaller screen. --------- Co-authored-by: Dave Allie <dave@daveallie.com>
24 lines
895 B
C++
24 lines
895 B
C++
#pragma once
|
|
|
|
#include "LiangHyphenation.h"
|
|
|
|
// Generic Liang-backed hyphenator that stores pattern metadata plus language-specific helpers.
|
|
class LanguageHyphenator {
|
|
public:
|
|
LanguageHyphenator(const SerializedHyphenationPatterns& patterns, bool (*isLetterFn)(uint32_t),
|
|
uint32_t (*toLowerFn)(uint32_t), size_t minPrefix = LiangWordConfig::kDefaultMinPrefix,
|
|
size_t minSuffix = LiangWordConfig::kDefaultMinSuffix)
|
|
: patterns_(patterns), config_(isLetterFn, toLowerFn, minPrefix, minSuffix) {}
|
|
|
|
std::vector<size_t> breakIndexes(const std::vector<CodepointInfo>& cps) const {
|
|
return liangBreakIndexes(cps, patterns_, config_);
|
|
}
|
|
|
|
size_t minPrefix() const { return config_.minPrefix; }
|
|
size_t minSuffix() const { return config_.minSuffix; }
|
|
|
|
protected:
|
|
const SerializedHyphenationPatterns& patterns_;
|
|
LiangWordConfig config_;
|
|
};
|