# Dictionary Feature Bug Fixes (Round 3)

**Date:** 2026-02-12  
**Branch:** mod/add-dictionary

## Task

Fix three issues reported after round 2 of dictionary fixes.

## Changes Made

### 1. Fix: Definitions truncated for some words (Dictionary.cpp)

**Root cause:** The `asciiCaseCmp` case-insensitive match introduced in round 2 returns the *first* case variant found in the index. In StarDict order, "Professor" (capitalized) sorts before "professor" (lowercase). If the dictionary has separate entries for each, e.g., "Professor" as a title (short definition) and "professor" as the common noun (full multi-page definition), the shorter capitalized entry is returned even when the lookup word is lowercase.

**Fix:** The linear scan in `searchIndex` now remembers the first case-insensitive match as a fallback and keeps scanning the adjacent entries (case variants are always adjacent in StarDict order). If an exact case-sensitive match is found, it is used immediately; if the run of case variants ends without one, the fallback is used. As a result, looking up `cleanWord("professor")` → `"professor"` finds the full lowercase entry, not the shorter capitalized one.
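
The scan logic can be sketched roughly as follows. This is a minimal illustration, not the actual `searchIndex` code: the index is modeled as a plain `std::vector<std::string>`, `findBestMatch` is a hypothetical name, and the `asciiCaseCmp` shown here is a stand-in with an assumed signature.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// ASCII-only lowercase; avoids locale-dependent std::tolower.
static char asciiLower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Stand-in for the round 2 comparison (signature is an assumption).
static int asciiCaseCmp(const std::string& a, const std::string& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        const char ca = asciiLower(a[i]);
        const char cb = asciiLower(b[i]);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}

// Hypothetical helper: returns the position of the best match, or -1.
// Prefers an exact case-sensitive match, otherwise the first case variant.
static int findBestMatch(const std::vector<std::string>& index,
                         const std::string& word) {
    int fallback = -1;
    for (std::size_t i = 0; i < index.size(); ++i) {
        if (asciiCaseCmp(index[i], word) != 0) {
            if (fallback >= 0) break;  // left the run of adjacent case variants
            continue;
        }
        if (index[i] == word) return static_cast<int>(i);  // exact match wins
        if (fallback < 0) fallback = static_cast<int>(i);  // remember first variant
    }
    return fallback;
}
```

With an index of `{"apple", "Professor", "professor", "zebra"}`, a lookup of `"professor"` selects the lowercase entry, while `"PROFESSOR"` (no exact match) falls back to the first variant, `"Professor"`.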

**Files:** `src/util/Dictionary.cpp`

### 2. Fix: Non-renderable foreign script characters in definitions (DictionaryDefinitionActivity)

**Root cause:** Dictionary definitions include text from other languages (Chinese, Greek, Arabic, Cyrillic, etc.) as etymological references or examples. These characters aren't in the e-ink bitmap font and render as empty boxes. This is the same class of issue as the IPA pronunciation fix from round 2, but affecting inline content within definitions.

**Fix:**
- Added `isRenderableCodepoint(uint32_t cp)` static helper that whitelists character ranges the e-ink font supports:
  - U+0000–U+024F: Basic Latin through Latin Extended-B (ASCII + accented chars)
  - U+0300–U+036F: Combining Diacritical Marks
  - U+2000–U+206F: General Punctuation (dashes, quotes, bullets, ellipsis)
  - U+20A0–U+20CF: Currency Symbols
  - U+2100–U+214F: Letterlike Symbols
  - U+2190–U+21FF: Arrows
- Replaced the byte-by-byte character append in `parseHtml()` with a UTF-8-aware decoder that reads multi-byte sequences, decodes the codepoint, and only appends renderable characters. Invalid or non-renderable characters are silently skipped.
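
A minimal sketch of the whitelist and the UTF-8-aware filtering described above. The ranges are taken from the list; `filterRenderable` is a hypothetical standalone helper (the real logic lives inside `parseHtml()`), and the overlong-encoding check is an assumption about what counts as "invalid".

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Whitelist of codepoint ranges the e-ink font is stated to support.
static bool isRenderableCodepoint(uint32_t cp) {
    if (cp <= 0x024F) return true;                  // Basic Latin .. Latin Extended-B
    if (cp >= 0x0300 && cp <= 0x036F) return true;  // Combining Diacritical Marks
    if (cp >= 0x2000 && cp <= 0x206F) return true;  // General Punctuation
    if (cp >= 0x20A0 && cp <= 0x20CF) return true;  // Currency Symbols
    if (cp >= 0x2100 && cp <= 0x214F) return true;  // Letterlike Symbols
    if (cp >= 0x2190 && cp <= 0x21FF) return true;  // Arrows
    return false;
}

// Hypothetical helper: copies only valid, renderable UTF-8 sequences.
static std::string filterRenderable(const std::string& in) {
    // Smallest codepoint that legitimately needs a sequence of each length.
    static const uint32_t kMinCp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        uint32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else { ++i; continue; }  // stray continuation or invalid lead byte

        if (i + len > in.size()) break;  // truncated sequence at end of input

        bool valid = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { valid = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!valid || cp < kMinCp[len]) { ++i; continue; }  // skip one byte, resync

        if (isRenderableCodepoint(cp)) out.append(in, i, len);
        i += len;  // non-renderable characters are dropped silently
    }
    return out;
}
```

Under these assumptions, accented Latin such as "é" (U+00E9) and punctuation such as U+2014 pass through unchanged, while Greek "α" (U+03B1) or CJK characters are removed without leaving a placeholder.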

**Files:** `src/activities/reader/DictionaryDefinitionActivity.h`, `src/activities/reader/DictionaryDefinitionActivity.cpp`

### 3. Fix: Revert to standard-height hints, keep overlap hiding (DictionaryWordSelectActivity)

**What changed:** Reverted from 22px thin custom hints back to the standard 40px theme-style buttons (rounded corners with `cornerRadius=6`, `SMALL_FONT_ID` text, matching `LyraTheme::drawButtonHints` exactly). The overlap detection is preserved.

**Key design choice:** Instead of calling `GUI.drawButtonHints()` (which always clears all 4 button areas, erasing page content even for hidden buttons), the method draws each button individually in portrait mode. Hidden buttons are skipped entirely (`continue`), so the page content and word highlight underneath remain visible. Non-hidden buttons get the full theme treatment: white fill + rounded rect border + centered text.
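
The per-button approach might look roughly like the sketch below. Everything here other than the 40px height and `cornerRadius=6` is an assumption for illustration: `Canvas` is a hypothetical recording surface standing in for the real renderer, `drawVisibleHints` is a made-up name, and the equal-width slot layout is a guess rather than the layout used by `LyraTheme::drawButtonHints`.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical drawing surface that records calls instead of touching a display.
struct Canvas {
    std::vector<std::string> log;
    void fillRect(int x, int /*y*/, int /*w*/, int /*h*/) {
        log.push_back("fill@" + std::to_string(x));
    }
    void drawRoundedRect(int x, int /*y*/, int /*w*/, int /*h*/, int /*radius*/) {
        log.push_back("border@" + std::to_string(x));
    }
    void drawCenteredText(int /*x*/, int /*y*/, int /*w*/, int /*h*/,
                          const std::string& text) {
        log.push_back("text:" + text);
    }
};

constexpr int kHintHeight = 40;   // standard hint height from the notes
constexpr int kCornerRadius = 6;  // cornerRadius from the notes

// Draws each hint on its own; hidden hints are skipped so nothing under
// them is cleared.
static void drawVisibleHints(Canvas& canvas, int screenW, int screenH,
                             const std::array<std::string, 4>& labels,
                             const std::array<bool, 4>& hidden) {
    const int slotW = screenW / 4;  // assumption: four equal-width slots
    const int y = screenH - kHintHeight;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        if (hidden[i]) continue;  // leave page content and highlight intact
        const int x = static_cast<int>(i) * slotW;
        canvas.fillRect(x, y, slotW, kHintHeight);                        // white fill
        canvas.drawRoundedRect(x, y, slotW, kHintHeight, kCornerRadius);  // border
        canvas.drawCenteredText(x, y, slotW, kHintHeight, labels[i]);     // label
    }
}
```

The point of the sketch is that a hidden button produces no draw calls at all, in contrast to a single call that clears all four areas up front.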

**Files:** `src/activities/reader/DictionaryWordSelectActivity.cpp`

## Follow-up Items

- The `isRenderableCodepoint` whitelist is conservative. If the font gains additional glyph coverage (e.g., Greek letters for math), the whitelist can be extended.
- Entity-decoded characters bypass the codepoint filter since they're appended as raw bytes. This is fine for the current entity set (all produce ASCII or General Punctuation characters), but any new entity that decodes outside the whitelisted ranges would need to go through the filter.