# Dictionary Feature Bug Fixes (Round 3)

**Date:** 2026-02-12

**Branch:** mod/add-dictionary

## Task

Fix three issues reported after round 2 of dictionary fixes.

## Changes Made

### 1. Fix: Definitions truncated for some words (Dictionary.cpp)

**Root cause:** The `asciiCaseCmp` case-insensitive match introduced in round 2 returns the *first* case variant found in the index. In StarDict order, "Professor" (capitalized) sorts before "professor" (lowercase). If the dictionary has separate entries for each (e.g., "Professor" as a title with a short definition and "professor" as the common noun with the full multi-page definition), the shorter entry is returned.

**Fix:** The linear scan in `searchIndex` now remembers the first case-insensitive match as a fallback, but continues scanning adjacent entries (case variants are always adjacent in StarDict order). If an exact case-sensitive match is found, it's used immediately. Otherwise, the first case-insensitive match is used. This ensures `cleanWord("professor")` → `"professor"` finds the full lowercase entry, not the shorter capitalized one.

**Files:** `src/util/Dictionary.cpp`

### 2. Fix: Non-renderable foreign script characters in definitions (DictionaryDefinitionActivity)

**Root cause:** Dictionary definitions include text from other languages (Chinese, Greek, Arabic, Cyrillic, etc.) as etymological references or examples. These characters aren't in the e-ink bitmap font and render as empty boxes. This is the same class of issue as the IPA pronunciation fix from round 2, but affecting inline content within definitions.

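
As a rough illustration of the fallback scan described under change 1, the sketch below walks a sorted word list, prefers an exact case-sensitive match, and otherwise falls back to the first case-insensitive one. It is not the actual `searchIndex` code: `findBestMatch` and `asciiCaseCmpSketch` are hypothetical names, and the index is modeled as a plain `std::vector<std::string>` rather than the real StarDict index layout.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the asciiCaseCmp helper: compares two strings
// while ignoring ASCII case only (non-ASCII bytes are compared as-is).
int asciiCaseCmpSketch(const std::string& a, const std::string& b) {
    auto lower = [](unsigned char c) -> int {
        return (c >= 'A' && c <= 'Z') ? c + 32 : c;
    };
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        int d = lower(a[i]) - lower(b[i]);
        if (d != 0) return d;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}

// Returns the position of the best match in `entries`, or -1 if there is none.
// Assumption: `entries` is in StarDict order, so case variants are adjacent.
int findBestMatch(const std::vector<std::string>& entries, const std::string& word) {
    int fallback = -1;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (asciiCaseCmpSketch(entries[i], word) != 0) {
            if (fallback >= 0) break;  // walked past the run of case variants
            continue;
        }
        if (entries[i] == word) return static_cast<int>(i);  // exact match wins
        if (fallback < 0) fallback = static_cast<int>(i);    // remember first variant
    }
    return fallback;
}
```

With the entries `apple`, `Professor`, `professor`, `zebra`, a lookup of `professor` in this sketch selects the lowercase entry, while `PROFESSOR` (no exact match) falls back to the first variant, `Professor`.
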

**Fix:**

- Added `isRenderableCodepoint(uint32_t cp)` static helper that whitelists character ranges the e-ink font supports:
  - U+0000–U+024F: Basic Latin through Latin Extended-B (ASCII + accented chars)
  - U+0300–U+036F: Combining Diacritical Marks
  - U+2000–U+206F: General Punctuation (dashes, quotes, bullets, ellipsis)
  - U+20A0–U+20CF: Currency Symbols
  - U+2100–U+214F: Letterlike Symbols
  - U+2190–U+21FF: Arrows
- Replaced the byte-by-byte character append in `parseHtml()` with a UTF-8-aware decoder that reads multi-byte sequences, decodes the codepoint, and only appends renderable characters. Invalid or non-renderable characters are silently skipped.

**Files:** `src/activities/reader/DictionaryDefinitionActivity.h`, `src/activities/reader/DictionaryDefinitionActivity.cpp`

### 3. Fix: Revert to standard-height hints, keep overlap hiding (DictionaryWordSelectActivity)

**What changed:** Reverted from 22px thin custom hints back to the standard 40px theme-style buttons (rounded corners with `cornerRadius=6`, `SMALL_FONT_ID` text, matching `LyraTheme::drawButtonHints` exactly). The overlap detection is preserved.

**Key design choice:** Instead of calling `GUI.drawButtonHints()` (which always clears all 4 button areas, erasing page content even for hidden buttons), the method draws each button individually in portrait mode. Hidden buttons are skipped entirely (`continue`), so the page content and word highlight underneath remain visible. Non-hidden buttons get the full theme treatment: white fill + rounded rect border + centered text.

**Files:** `src/activities/reader/DictionaryWordSelectActivity.cpp`

## Follow-up Items

- The `isRenderableCodepoint` whitelist is conservative; if the font gains additional glyph coverage (e.g., Greek letters for math), the whitelist can be extended
- Entity-decoded characters bypass the codepoint filter since they're appended as raw bytes; this is fine for the current entity set (all produce ASCII or General Punctuation characters)
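
For reference, the whitelist and UTF-8-aware filtering from change 2 could look roughly like the sketch below. The ranges in `isRenderableCodepoint` are the ones listed in this document, but the function body is a reconstruction, not the committed code. `filterRenderable` is a hypothetical helper that stands in for the decoding loop inside `parseHtml()`, and it does not reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Whitelist built from the ranges listed under change 2.
bool isRenderableCodepoint(uint32_t cp) {
    return cp <= 0x024F ||                    // Basic Latin .. Latin Extended-B
           (cp >= 0x0300 && cp <= 0x036F) ||  // Combining Diacritical Marks
           (cp >= 0x2000 && cp <= 0x206F) ||  // General Punctuation
           (cp >= 0x20A0 && cp <= 0x20CF) ||  // Currency Symbols
           (cp >= 0x2100 && cp <= 0x214F) ||  // Letterlike Symbols
           (cp >= 0x2190 && cp <= 0x21FF);    // Arrows
}

// Hypothetical helper: copies only well-formed, renderable UTF-8 sequences
// and silently drops everything else.
std::string filterRenderable(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        uint32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else { ++i; continue; }           // stray continuation or invalid lead byte
        if (i + len > in.size()) break;   // truncated sequence at end of input
        bool valid = true;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { valid = false; break; }
            cp = (cp << 6) | (c & 0x3F);  // accumulate 6 payload bits per byte
        }
        if (!valid) { ++i; continue; }    // skip the bad lead byte and resync
        if (isRenderableCodepoint(cp)) out.append(in, i, len);
        i += len;
    }
    return out;
}
```

In this sketch, accented Latin letters and general punctuation pass through unchanged, while Greek or CJK characters and malformed byte sequences are dropped from the output.
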