lib/EpdFont/EpdFont.cpp

#include "EpdFont.h"

#include <Utf8.h>

#include <algorithm>

void EpdFont::getTextBounds(const char* string, const int startX, const int startY, int* minX, int* minY, int* maxX,
                            int* maxY) const {
  *minX = startX;
  *minY = startY;
  *maxX = startX;
  *maxY = startY;

  if (*string == '\0') {
    return;
  }

  int cursorX = startX;
  const int cursorY = startY;
  int lastBaseX = startX;
  int lastBaseAdvance = 0;
  int lastBaseTop = 0;
  bool hasBaseGlyph = false;
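  // Minimum vertical gap, in pixels, kept between a combining mark's bottom edge and the top of its base glyph.
  constexpr int MIN_COMBINING_GAP_PX = 1;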
  uint32_t cp;
  while ((cp = utf8NextCodepoint(reinterpret_cast<const uint8_t**>(&string)))) {
    const EpdGlyph* glyph = getGlyph(cp);
    if (!glyph) {
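      // getGlyph() already falls back to REPLACEMENT_GLYPH, so nullptr means the font lacks that glyph too.
      // TODO: Better handle this?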
      continue;
    }

    const bool isCombining = utf8IsCombiningMark(cp);
    int raiseBy = 0;
    if (isCombining && hasBaseGlyph) {
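      // Gap between the mark's bottom edge and the base glyph's top edge, both measured up from the baseline.
      const int currentGap = glyph->top - glyph->height - lastBaseTop;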
      if (currentGap < MIN_COMBINING_GAP_PX) {
        raiseBy = MIN_COMBINING_GAP_PX - currentGap;
      }
    }

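    // Combining marks are anchored at the horizontal center of the preceding base glyph's advance.
    const int glyphBaseX = (isCombining && hasBaseGlyph) ? (lastBaseX + lastBaseAdvance / 2) : cursorX;
    // Bounds here grow upward from the baseline (top - height .. top), so raising a mark must add to Y.
    const int glyphBaseY = cursorY + raiseBy;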

    *minX = std::min(*minX, glyphBaseX + glyph->left);
    *maxX = std::max(*maxX, glyphBaseX + glyph->left + glyph->width);
    *minY = std::min(*minY, glyphBaseY + glyph->top - glyph->height);
    *maxY = std::max(*maxY, glyphBaseY + glyph->top);

    if (!isCombining) {
      lastBaseX = cursorX;
      lastBaseAdvance = glyph->advanceX;
      lastBaseTop = glyph->top;
      hasBaseGlyph = true;
      cursorX += glyph->advanceX;
    }
  }
}

void EpdFont::getTextDimensions(const char* string, int* w, int* h) const {
  int minX = 0, minY = 0, maxX = 0, maxY = 0;

  getTextBounds(string, 0, 0, &minX, &minY, &maxX, &maxY);

  *w = maxX - minX;
  *h = maxY - minY;
}

const EpdGlyph* EpdFont::getGlyph(const uint32_t cp) const {
  const EpdUnicodeInterval* intervals = data->intervals;
  const int count = data->intervalCount;

  if (count == 0) return nullptr;

  // Binary search over the intervals, which are sorted by codepoint: O(log n) instead of a linear O(n) scan.
  // Matters most for fonts with many Unicode intervals, such as Korean/CJK fonts.
  int left = 0;
  int right = count - 1;

  while (left <= right) {
    const int mid = left + (right - left) / 2;
    const EpdUnicodeInterval* interval = &intervals[mid];

    if (cp < interval->first) {
      right = mid - 1;
    } else if (cp > interval->last) {
      left = mid + 1;
    } else {
      // Found: cp >= interval->first && cp <= interval->last
      return &data->glyph[interval->offset + (cp - interval->first)];
    }
  }
  // Not found: retry once with the replacement glyph (that call cannot recurse again).
  if (cp != REPLACEMENT_GLYPH) {
    return getGlyph(REPLACEMENT_GLYPH);
  }
  return nullptr;
}
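The combining-mark placement in `getTextBounds()` is easier to check in isolation. The following is a simplified, self-contained model of that logic, not the CrossPoint implementation. `Glyph`, `Bounds`, `textBounds()` and `textDimensions()` are hypothetical names. `Glyph` carries only the metrics the bounds code reads, and its `combining` flag stands in for `utf8IsCombiningMark()`. Y grows upward from the baseline, as in the bounds formulas above (`top - height` to `top`), so the model raises a mark by adding to its Y offset.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for EpdGlyph: only the metrics the bounds logic reads.
struct Glyph {
  int left;        // horizontal offset of the bitmap from the pen position
  int top;         // distance from the baseline up to the bitmap's top edge
  int width;       // bitmap width
  int height;      // bitmap height
  int advanceX;    // pen advance after this glyph
  bool combining;  // stand-in for utf8IsCombiningMark(cp)
};

struct Bounds {
  int minX, minY, maxX, maxY;
};

constexpr int kMinCombiningGapPx = 1;

// Y-up model: a glyph drawn on baseline y spans [y + top - height, y + top].
Bounds textBounds(const std::vector<Glyph>& glyphs, int startX = 0, int startY = 0) {
  Bounds b{startX, startY, startX, startY};
  int cursorX = startX;
  int lastBaseX = startX, lastBaseAdvance = 0, lastBaseTop = 0;
  bool hasBase = false;

  for (const Glyph& g : glyphs) {
    const bool attach = g.combining && hasBase;
    int raiseBy = 0;
    if (attach) {
      // Gap between the mark's bottom edge and the base glyph's top edge.
      const int gap = (g.top - g.height) - lastBaseTop;
      if (gap < kMinCombiningGapPx) raiseBy = kMinCombiningGapPx - gap;
    }
    // Attached marks are anchored at the center of the base glyph's advance.
    const int x = attach ? lastBaseX + lastBaseAdvance / 2 : cursorX;
    const int y = startY + raiseBy;  // raising means moving up in Y-up space

    b.minX = std::min(b.minX, x + g.left);
    b.maxX = std::max(b.maxX, x + g.left + g.width);
    b.minY = std::min(b.minY, y + g.top - g.height);
    b.maxY = std::max(b.maxY, y + g.top);

    // Only base glyphs move the pen and become the anchor for later marks.
    if (!g.combining) {
      lastBaseX = cursorX;
      lastBaseAdvance = g.advanceX;
      lastBaseTop = g.top;
      hasBase = true;
      cursorX += g.advanceX;
    }
  }
  return b;
}

// Mirrors getTextDimensions(): width and height are the extents of the bounds.
void textDimensions(const std::vector<Glyph>& glyphs, int* w, int* h) {
  const Bounds b = textBounds(glyphs);
  *w = b.maxX - b.minX;
  *h = b.maxY - b.minY;
}
```

Consider a base glyph with `top = 10` followed by a mark with `top = 9` and `height = 2`. The mark's bottom edge sits at 7, which gives a gap of -3. The mark is therefore raised by 4 px, and its bottom lands at 11, one pixel above the base.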
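`getGlyph()` works in two steps. First it binary-searches an array of inclusive `[first, last]` codepoint ranges, sorted by codepoint, each with an `offset` into the glyph table. On a miss it retries once with `REPLACEMENT_GLYPH`. The following is a minimal standalone sketch of that lookup. `Interval` mirrors the `EpdUnicodeInterval` fields used above. `findGlyphIndex()` and `lookupWithFallback()` are hypothetical names, and they return a glyph-table index (or -1) rather than a pointer. `kReplacement = 0xFFFD` (U+FFFD REPLACEMENT CHARACTER) is an assumption, because the real value of `REPLACEMENT_GLYPH` is not defined in this file.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the fields of EpdUnicodeInterval used by getGlyph().
struct Interval {
  uint32_t first;   // first codepoint in the range (inclusive)
  uint32_t last;    // last codepoint in the range (inclusive)
  uint32_t offset;  // glyph-table index of the glyph for `first`
};

// Assumed value; the real REPLACEMENT_GLYPH is defined elsewhere.
constexpr uint32_t kReplacement = 0xFFFD;

// Returns the glyph-table index for cp, or -1 if no interval covers it.
// Requires `intervals` to be sorted by codepoint and non-overlapping.
long findGlyphIndex(const std::vector<Interval>& intervals, uint32_t cp) {
  int left = 0;
  int right = static_cast<int>(intervals.size()) - 1;  // -1 for an empty font
  while (left <= right) {
    const int mid = left + (right - left) / 2;  // avoids overflow of left + right
    const Interval& iv = intervals[mid];
    if (cp < iv.first) {
      right = mid - 1;
    } else if (cp > iv.last) {
      left = mid + 1;
    } else {
      return static_cast<long>(iv.offset + (cp - iv.first));
    }
  }
  return -1;
}

// Same fallback shape as getGlyph(): on a miss, try the replacement glyph once.
long lookupWithFallback(const std::vector<Interval>& intervals, uint32_t cp) {
  const long idx = findGlyphIndex(intervals, cp);
  if (idx < 0 && cp != kReplacement) return findGlyphIndex(intervals, kReplacement);
  return idx;
}
```

Each lookup touches only O(log n) intervals and needs no extra memory beyond the font data itself. The cost is a precondition: the generated font tables must keep their intervals sorted and non-overlapping.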
Blame history for this file (commits shown in the blame view, oldest first):
- Public release, 2025-12-03 22:00:29 +11:00.
- Optimize glyph lookup with binary search (#125), 2025-12-26 09:46:17 +09:00.
  - Replaces the linear O(n) search over Unicode intervals in `EpdFont::getGlyph()` with an O(log n) binary search.
  - Adds an early return for an empty interval list.
  - Fonts with many intervals gain the most. Korean fonts have ~30,000+ glyphs across many intervals, and CJK, extended-Latin and emoji fonts also benefit.
  - Risk is low because the intervals were already sorted by codepoint, which the previous early-exit loop required.
- Small cleanups from https://github.com/juicecultus/crosspoint-reader-x4, 2025-12-30 23:18:51 +11:00.
- fix: Fix hyphenation and rendering of decomposed characters (#1037), 2026-02-22 03:11:07 +01:00.
  - Rendering: non-spacing combining marks are attached to the preceding base glyph in both the normal and rotated rendering paths. Each mark is centered horizontally over its base and kept at least 1 px (`MIN_COMBINING_GAP_PX`) above it, and the text bounds are updated to match.
  - Hyphenation (outside this file): common Latin base+combining sequences (e.g. O + U+0308 -> Ö) are composed into precomposed codepoints before Liang pattern matching. This is a partial fix that stops short of full Unicode normalization (NFC, possibly NFKC).
  - Hyphenation around non-breaking-space variants is fixed and extended.
  - Already-hyphenated terms such as "US-Satellitensystem" now get Liang patterns applied instead of breaking only at the existing hyphen.
  - Resolves the issues described in #998. AI tools used: partially.
- refactor: Simplify REPLACEMENT_GLYPH fallback (#1119), 2026-02-23 06:32:50 -06:00.
  - Consolidates the repeated fallback-to-REPLACEMENT_GLYPH logic. AI tools used: no.