feat: port upstream KOReader sync PRs (#1185, #1217, #1090)

Port three unmerged upstream PRs with adaptations for the fork's callback-based ActivityWithSubactivity architecture: - PR #1185: Cache KOReader document hash using mtime fingerprint + file size validation to avoid repeated MD5 computation on sync. - PR #1217: Proper KOReader XPath synchronisation via new ChapterXPathIndexer (Expat-based on-demand XHTML parsing) with XPath-first mapping and percentage fallback in ProgressMapper. - PR #1090: Push Progress & Sleep menu option with PUSH_ONLY sync mode. Adapted to fork's callback pattern with deferFinish() for thread-safe completion. Modified to sleep silently on any failure (hash, upload, no credentials) rather than returning to reader. Made-with: Cursor
2026-03-02 05:19:14 -05:00
parent 42011d5977
commit 3628d8eb37
14 changed files with 1031 additions and 44 deletions
--- a/docs/contributing/koreader-sync-xpath-mapping.md
+++ b/docs/contributing/koreader-sync-xpath-mapping.md
@@ -0,0 +1,99 @@
+# KOReader Sync XPath Mapping
+
+This note documents how CrossPoint maps reading positions to and from KOReader sync payloads.
+
+## Problem
+
+CrossPoint internally stores position as:
+
+- `spineIndex` (chapter index, 0-based)
+- `pageNumber` + `totalPages`
+
+KOReader sync payload stores:
+
+- `progress` (XPath-like location)
+- `percentage` (overall progress)
+
+A direct 1:1 mapping is not guaranteed because page layout differs between engines/devices.
+
+## DocFragment Index Convention
+
+KOReader uses **1-based** XPath predicates throughout, following standard XPath conventions.
+The first EPUB spine item is `DocFragment[1]`, the second is `DocFragment[2]`, and so on.
+
+CrossPoint stores spine items as 0-based indices internally. The conversion is:
+
+- **Generating XPath (to KOReader):** `DocFragment[spineIndex + 1]`
+- **Parsing XPath (from KOReader):** `spineIndex = DocFragment[N] - 1`
+
+Reference: [koreader/koreader#11585](https://github.com/koreader/koreader/issues/11585) confirms this
+via a KOReader contributor mapping spine items to DocFragment numbers.
+
+## Current Strategy
+
+### CrossPoint -> KOReader
+
+Implemented in `ProgressMapper::toKOReader`.
+
+1. Compute overall `percentage` from chapter/page.
+2. Attempt to compute a real element-level XPath via `ChapterXPathIndexer::findXPathForProgress`.
+3. If XPath extraction fails, fallback to synthetic chapter path:
+   - `/body/DocFragment[spineIndex + 1]/body`
+
+### KOReader -> CrossPoint
+
+Implemented in `ProgressMapper::toCrossPoint`.
+
+1. Attempt to parse `DocFragment[N]` from incoming XPath; convert N to 0-based `spineIndex = N - 1`.
+2. If valid, attempt XPath-to-offset mapping via `ChapterXPathIndexer::findProgressForXPath`.
+3. Convert resolved intra-spine progress to page estimate.
+4. If XPath path is invalid/unresolvable, fallback to percentage-based chapter/page estimation.
+
+## ChapterXPathIndexer Design
+
+The module reparses **one spine XHTML** on demand using Expat and builds temporary anchors:
+
+Source-of-truth note: XPath anchors are built from the original EPUB spine XHTML bytes (zip item contents), not from CrossPoint's distilled section render cache. This is intentional to preserve KOReader XPath compatibility.
+
+- anchor: `<xpath, textOffset>`
+- `textOffset` counts non-whitespace bytes
+- When multiple anchors exist for the same path, the one with the **smallest** textOffset is used
+  (start of element), not the latest periodic anchor.
+
+Forward lookup (CrossPoint → XPath): uses `upper_bound` to find the last anchor at or before the
+target text offset, ensuring the returned XPath corresponds to the element the user is currently
+inside rather than the next element.
+
+Matching for reverse lookup:
+
+1. exact path match — reported as `exact=yes`
+2. index-insensitive path match (`div[2]` vs `div[3]` tolerated) — reported as `exact=no`
+3. ancestor fallback — reported as `exact=no`
+
+If no match is found, caller must fallback to percentage.
+
+## Memory / Safety Constraints (ESP32-C3)
+
+The implementation intentionally avoids full DOM storage.
+
+- Parse one chapter only.
+- Keep anchors in transient vectors only for duration of call.
+- Free XML parser and chapter byte buffer on all success/failure paths.
+- No persistent cache structures are introduced by this module.
+
+## Known Limitations
+
+- Page number on reverse mapping is still an estimate (renderer differences).
+- XPath mapping intentionally uses original spine XHTML while pagination comes from distilled renderer output, so minor roundtrip page drift is expected.
+- Image-only/low-text chapters may yield coarse anchors.
+- Extremely malformed XHTML can force fallback behavior.
+
+## Operational Logging
+
+`ProgressMapper` logs mapping source in reverse direction:
+
+- `xpath` when XPath mapping path was used
+- `percentage` when fallback path was used
+
+It also logs exactness (`exact=yes/no`) for XPath matches. Note that `exact=yes` is only set for
+a full path match with correct indices; index-insensitive and ancestor matches always log `exact=no`.