crosspoint-reader-mod/docs/contributing/koreader-sync-xpath-mapping.md
cottongin dfbc931c14 mod: Phase 1 - bring forward mod-exclusive files with ActivityManager migration
Brings ~55 mod-exclusive files to the upstream-based mod/master-resync branch:

Activities (migrated to new ActivityManager pattern):
- Clock/Time: SetTimeActivity, SetTimezoneOffsetActivity, NtpSyncActivity
- Dictionary: DictionaryDefinitionActivity, DictionarySuggestionsActivity,
  DictionaryWordSelectActivity, LookedUpWordsActivity
- Bookmark: EpubReaderBookmarkSelectionActivity
- Book management: BookManageMenuActivity, EndOfBookMenuActivity
- OPDS: OpdsServerListActivity, OpdsSettingsActivity
- Utility: DirectoryPickerActivity, NumericStepperActivity

Utilities (unchanged):
- BookManager, BookSettings, BookmarkStore, BootNtpSync
- Dictionary, LookupHistory, TimeSync, OpdsServerStore

Libraries: PlaceholderCover, TableData, ChapterXPathIndexer
Scripts: inject_mod_version, generate_book_icon, preview_placeholder_cover
Docs: KOReader sync XPath mapping

Migration changes:
- ActivityWithSubactivity -> Activity base class
- Callback constructors -> finish()/setResult() pattern
- enterNewActivity() -> startActivityForResult()
- Activity::RenderLock&& -> RenderLock&&

These files won't compile yet - they reference mod settings and I18n
strings that will be added in subsequent phases.

Made-with: Cursor
2026-03-07 15:10:00 -05:00


KOReader Sync XPath Mapping

This note documents how CrossPoint maps reading positions to and from KOReader sync payloads.

Problem

CrossPoint internally stores position as:

  • spineIndex (chapter index, 0-based)
  • pageNumber + totalPages

KOReader sync payload stores:

  • progress (XPath-like location)
  • percentage (overall progress)

A direct 1:1 mapping is not guaranteed because page layout differs between engines/devices.

DocFragment Index Convention

KOReader uses 1-based XPath predicates throughout, following standard XPath conventions. The first EPUB spine item is DocFragment[1], the second is DocFragment[2], and so on.

CrossPoint stores spine items as 0-based indices internally. The conversion is:

  • Generating XPath (to KOReader): DocFragment[spineIndex + 1]
  • Parsing XPath (from KOReader): given DocFragment[N], spineIndex = N - 1

Reference: koreader/koreader#11585 confirms this via a KOReader contributor mapping spine items to DocFragment numbers.
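
The index conversion above can be sketched as two small helpers. This is a minimal illustration, not the project's code: the names docFragmentPath and spineIndexFromXPath are hypothetical, and returning -1 for "no valid DocFragment" is an assumption made for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: synthetic chapter path for a 0-based spine index.
std::string docFragmentPath(int spineIndex) {
  return "/body/DocFragment[" + std::to_string(spineIndex + 1) + "]/body";
}

// Hypothetical helper: extract the 0-based spine index from a KOReader XPath.
// Returns -1 (assumed sentinel) when no valid DocFragment[N] with N >= 1 exists.
int spineIndexFromXPath(const std::string& xpath) {
  const std::string prefix = "DocFragment[";
  const std::size_t start = xpath.find(prefix);
  if (start == std::string::npos) return -1;

  std::size_t pos = start + prefix.size();
  std::size_t digits = 0;
  int n = 0;
  while (pos < xpath.size() && xpath[pos] >= '0' && xpath[pos] <= '9') {
    if (n > 100000) return -1;  // reject absurd values before they can overflow
    n = n * 10 + (xpath[pos] - '0');
    ++pos;
    ++digits;
  }
  // Require at least one digit, a closing bracket, and a 1-based value.
  if (digits == 0 || pos >= xpath.size() || xpath[pos] != ']' || n < 1) return -1;
  return n - 1;  // 1-based DocFragment number -> 0-based spineIndex
}
```

Rejecting DocFragment[0] rather than clamping it keeps the caller on the percentage fallback path instead of silently mapping to the wrong chapter.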

Current Strategy

CrossPoint -> KOReader

Implemented in ProgressMapper::toKOReader.

  1. Compute overall percentage from chapter/page.
  2. Attempt to compute a real element-level XPath via ChapterXPathIndexer::findXPathForProgress.
  3. If XPath extraction fails, fall back to a synthetic chapter path:
    • /body/DocFragment[spineIndex + 1]/body
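
A rough sketch of this flow follows. Several things here are assumptions for illustration only: the KoProgress struct, the XPathLookup callback (standing in for ChapterXPathIndexer::findXPathForProgress, whose real signature is not shown in this note), a 0-based pageNumber, a 0..1 percentage, and equal weighting of spine items when computing overall progress.

```cpp
#include <functional>
#include <string>

// Hypothetical output type; percentage is assumed to be a 0..1 fraction.
struct KoProgress {
  std::string xpath;
  float percentage;
};

// Assumption: every spine item contributes equally to overall progress.
float overallPercentage(int spineIndex, int spineCount, int pageNumber, int totalPages) {
  if (spineCount <= 0) return 0.0f;
  const float intra =
      totalPages > 0 ? static_cast<float>(pageNumber) / static_cast<float>(totalPages) : 0.0f;
  return (static_cast<float>(spineIndex) + intra) / static_cast<float>(spineCount);
}

// Stand-in for the indexer lookup; returns an empty string on failure.
using XPathLookup = std::function<std::string(int spineIndex, float intraSpineProgress)>;

KoProgress toKoProgress(int spineIndex, int spineCount, int pageNumber, int totalPages,
                        const XPathLookup& lookup) {
  const float intra =
      totalPages > 0 ? static_cast<float>(pageNumber) / static_cast<float>(totalPages) : 0.0f;

  KoProgress out;
  out.percentage = overallPercentage(spineIndex, spineCount, pageNumber, totalPages);  // step 1
  out.xpath = lookup ? lookup(spineIndex, intra) : std::string();                      // step 2
  if (out.xpath.empty()) {                                                             // step 3
    out.xpath = "/body/DocFragment[" + std::to_string(spineIndex + 1) + "]/body";
  }
  return out;
}
```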

KOReader -> CrossPoint

Implemented in ProgressMapper::toCrossPoint.

  1. Attempt to parse DocFragment[N] from incoming XPath; convert N to 0-based spineIndex = N - 1.
  2. If valid, attempt XPath-to-offset mapping via ChapterXPathIndexer::findProgressForXPath.
  3. Convert resolved intra-spine progress to page estimate.
  4. If the XPath is invalid or unresolvable, fall back to percentage-based chapter/page estimation.
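
Steps 3 and 4 can be sketched as below. The names are hypothetical, and the sketch assumes 0-based page numbers, a 0..1 percentage, a negative xpathProgress meaning "XPath not resolved", a pagesInSpine callback supplying the page count per chapter, and equal weighting of spine items in the fallback. None of these are confirmed by this note.

```cpp
#include <functional>

// Hypothetical result type for the reverse mapping.
struct CrossPointPosition {
  int spineIndex;
  int pageNumber;
  bool fromXPath;  // true when the XPath path was used, false for the percentage fallback
};

// Convert intra-spine progress (0..1) into a 0-based page estimate.
int pageFromProgress(float intra, int totalPages) {
  if (totalPages <= 0) return 0;
  if (intra < 0.0f) intra = 0.0f;
  if (intra > 1.0f) intra = 1.0f;
  const int page = static_cast<int>(intra * static_cast<float>(totalPages));
  return page >= totalPages ? totalPages - 1 : page;
}

CrossPointPosition toCrossPointPosition(int xpathSpineIndex, float xpathProgress, float percentage,
                                        int spineCount,
                                        const std::function<int(int)>& pagesInSpine) {
  CrossPointPosition out = {0, 0, false};
  if (spineCount <= 0) return out;

  // XPath path: valid spine index and a resolved intra-spine progress.
  if (xpathSpineIndex >= 0 && xpathSpineIndex < spineCount && xpathProgress >= 0.0f) {
    out.spineIndex = xpathSpineIndex;
    out.pageNumber = pageFromProgress(xpathProgress, pagesInSpine(xpathSpineIndex));
    out.fromXPath = true;
    return out;
  }

  // Percentage fallback (assumes equally weighted spine items).
  const float clamped = percentage < 0.0f ? 0.0f : (percentage > 1.0f ? 1.0f : percentage);
  const float scaled = clamped * static_cast<float>(spineCount);
  int spine = static_cast<int>(scaled);
  if (spine >= spineCount) spine = spineCount - 1;
  out.spineIndex = spine;
  out.pageNumber = pageFromProgress(scaled - static_cast<float>(spine), pagesInSpine(spine));
  return out;
}
```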

ChapterXPathIndexer Design

The module reparses one spine XHTML document on demand using Expat and builds temporary anchors:

  • anchor: <xpath, textOffset>
  • textOffset counts non-whitespace bytes
  • When multiple anchors exist for the same path, the one with the smallest textOffset is used (start of element), not the latest periodic anchor.

Source-of-truth note: XPath anchors are built from the original EPUB spine XHTML bytes (zip item contents), not from CrossPoint's distilled section render cache. This is intentional to preserve KOReader XPath compatibility.
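
To make the textOffset definition concrete, here is a small sketch of counting non-whitespace bytes in a chunk of character data, as might be done from an XML character-data callback. The function name is hypothetical, and treating "whitespace" as whatever std::isspace reports in the default C locale is an assumption.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical helper: number of non-whitespace bytes in a text chunk.
// Multi-byte UTF-8 characters count once per byte, since the unit is bytes.
std::size_t countNonWhitespaceBytes(const char* data, std::size_t len) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < len; ++i) {
    // Cast to unsigned char: passing a negative char to isspace is undefined.
    if (!std::isspace(static_cast<unsigned char>(data[i]))) ++count;
  }
  return count;
}
```

Ignoring whitespace makes the offset insensitive to indentation and line breaks in the source XHTML, which differ between otherwise identical chapters.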

Forward lookup (CrossPoint → XPath): uses upper_bound to locate the first anchor past the target text offset; the anchor immediately before it is the last one at or before the target, so the returned XPath corresponds to the element the user is currently inside rather than the next element.
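
A minimal sketch of that lookup, assuming the anchors are held in a vector sorted by ascending textOffset. The Anchor struct and the function name are illustrative, and returning the first anchor when the target precedes all anchors is a choice made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative anchor type: <xpath, textOffset>.
struct Anchor {
  std::string xpath;
  std::size_t textOffset;
};

// Returns the XPath of the last anchor at or before targetOffset.
// Precondition: anchors is sorted by ascending textOffset.
std::string xpathForOffset(const std::vector<Anchor>& anchors, std::size_t targetOffset) {
  if (anchors.empty()) return std::string();

  // upper_bound yields the first anchor whose textOffset is greater than the target.
  std::vector<Anchor>::const_iterator it = std::upper_bound(
      anchors.begin(), anchors.end(), targetOffset,
      [](std::size_t value, const Anchor& a) { return value < a.textOffset; });

  // Target precedes every anchor: fall back to the first one (assumed behavior).
  if (it == anchors.begin()) return anchors.front().xpath;

  // Step back one entry to reach the last anchor at or before the target.
  return (it - 1)->xpath;
}
```

Using lower_bound here instead would select the first anchor at or after the target, which is the "next element" case the text warns about.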

Matching for reverse lookup:

  1. exact path match — reported as exact=yes
  2. index-insensitive path match (div[2] vs div[3] tolerated) — reported as exact=no
  3. ancestor fallback — reported as exact=no

If no match is found, the caller must fall back to percentage.
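
The three matching tiers could look roughly like this. It is one possible reading of the rules above, with hypothetical names: index-insensitive matching is modeled by stripping all [N] predicates, and the ancestor fallback by repeatedly dropping the last path segment and retrying an exact match.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Anchor {
  std::string xpath;
  std::size_t textOffset;
};

struct Match {
  bool found;
  bool exact;
  std::size_t textOffset;
};

// Remove every [..] predicate, e.g. "/body/div[2]/p[3]" -> "/body/div/p".
std::string stripIndices(const std::string& path) {
  std::string out;
  bool inside = false;
  for (char c : path) {
    if (c == '[') inside = true;
    else if (c == ']') inside = false;
    else if (!inside) out += c;
  }
  return out;
}

// Smallest textOffset among anchors matching `wanted` (start of element).
bool smallestOffset(const std::vector<Anchor>& anchors, const std::string& wanted,
                    bool ignoreIndices, std::size_t& offset) {
  bool found = false;
  for (const Anchor& a : anchors) {
    const std::string key = ignoreIndices ? stripIndices(a.xpath) : a.xpath;
    if (key != wanted) continue;
    if (!found || a.textOffset < offset) offset = a.textOffset;
    found = true;
  }
  return found;
}

Match findOffsetForXPath(const std::vector<Anchor>& anchors, std::string xpath) {
  std::size_t offset = 0;
  // 1. Exact path match.
  if (smallestOffset(anchors, xpath, false, offset)) return {true, true, offset};
  // 2. Index-insensitive match (div[2] vs div[3] tolerated).
  if (smallestOffset(anchors, stripIndices(xpath), true, offset)) return {true, false, offset};
  // 3. Ancestor fallback: drop trailing segments until something matches.
  for (std::size_t slash = xpath.rfind('/'); slash != std::string::npos && slash > 0;
       slash = xpath.rfind('/')) {
    xpath.erase(slash);
    if (smallestOffset(anchors, xpath, false, offset)) return {true, false, offset};
  }
  return {false, false, 0};  // caller falls back to percentage
}
```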

Memory / Safety Constraints (ESP32-C3)

The implementation intentionally avoids full DOM storage.

  • Parse one chapter only.
  • Keep anchors in transient vectors only for the duration of the call.
  • Free XML parser and chapter byte buffer on all success/failure paths.
  • No persistent cache structures are introduced by this module.
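
One common way to guarantee cleanup on every exit path is scope-bound ownership; whether the module does it this way is not stated here, so treat the following as an assumption. To stay compilable without Expat, the sketch uses a stand-in FakeParser type; with Expat, the deleter would call XML_ParserFree on an XML_Parser handle instead.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a parser handle (hypothetical); tracks live instances for the demo.
struct FakeParser {};
int g_liveParsers = 0;
FakeParser* createParser() { ++g_liveParsers; return new FakeParser(); }
void freeParser(FakeParser* p) { --g_liveParsers; delete p; }

struct ParserDeleter {
  void operator()(FakeParser* p) const { freeParser(p); }
};

// Hypothetical entry point: parse one chapter, use anchors, release everything.
bool indexChapter(const std::vector<unsigned char>& chapterBytes) {
  // Parser is released automatically when this function returns, on any path.
  std::unique_ptr<FakeParser, ParserDeleter> parser(createParser());
  if (!parser) return false;

  if (chapterBytes.empty()) return false;  // early failure: parser still freed

  // Transient anchor storage: lives only for the duration of this call.
  std::vector<std::size_t> anchorOffsets;
  anchorOffsets.push_back(0);

  return !anchorOffsets.empty();  // success: vector and parser freed on return
}
```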

Known Limitations

  • Page number on reverse mapping is still an estimate (renderer differences).
  • XPath mapping intentionally uses original spine XHTML while pagination comes from distilled renderer output, so minor roundtrip page drift is expected.
  • Image-only/low-text chapters may yield coarse anchors.
  • Extremely malformed XHTML can force fallback behavior.

Operational Logging

ProgressMapper logs the mapping source for the reverse direction (KOReader -> CrossPoint):

  • xpath when XPath mapping path was used
  • percentage when fallback path was used

It also logs exactness (exact=yes/no) for XPath matches. Note that exact=yes is only set for a full path match with correct indices; index-insensitive and ancestor matches always log exact=no.