Daniel Chelling 83315b6179
perf: optimize large EPUB indexing from O(n^2) to O(n) (#458)
## Summary

Optimizes EPUB metadata indexing for large books (2000+ chapters) from
~30 minutes to ~50 seconds by replacing O(n²) algorithms with O(n log n)
hash-indexed lookups.

Fixes #134

## Problem

Three phases had O(n²) complexity due to nested loops:

| Phase | Operation | Before (2768 chapters) |
|-------|-----------|------------------------|
| OPF Pass | For each spine ref, scan all manifest items | ~25 min |
| TOC Pass | For each TOC entry, scan all spine items | ~5 min |
| buildBookBin | For each spine item, scan ZIP central directory | ~8.4
min |

Total: **~30+ minutes** for first-time indexing of large EPUBs.

## Solution

Replace linear scans with sorted hash indexes + binary search:

- **OPF Pass**: Build `{hash(id), len, offset}` index from manifest,
binary search for each spine ref
- **TOC Pass**: Build `{hash(href), len, spineIndex}` index from spine,
binary search for each TOC entry
- **buildBookBin**: New `ZipFile::fillUncompressedSizes()` API - single
ZIP central directory scan with batch hash matching

All indexes use FNV-1a hashing with length as secondary key to minimize
collisions. Indexes are freed immediately after each phase.

## Results

**Shadow Slave EPUB (2768 chapters):**

| Phase | Before | After | Speedup |
|-------|--------|-------|---------|
| OPF pass | ~25 min | 10.8 sec | ~140x |
| TOC pass | ~5 min | 4.7 sec | ~60x |
| buildBookBin | 506 sec | 34.6 sec | ~15x |
| **Total** | **~30+ min** | **~50 sec** | **~36x** |

**Normal EPUB (87 chapters):** 1.7 sec - no regression.

## Memory

Peak temporary memory during indexing:
- OPF index: ~33KB (2770 items × 12 bytes)
- TOC index: ~33KB (2768 items × 12 bytes)
- ZIP batch: ~44KB (targets + sizes arrays)

All indexes cleared immediately after each phase. No OOM risk on
ESP32-C3.

## Note on Threshold

All optimizations are gated by `LARGE_SPINE_THRESHOLD = 400` to preserve
existing behavior for small books. However, the algorithms work
correctly for any book size and are faster even for small books:

| Book Size | Old O(n²) | New O(n log n) | Improvement |
|-----------|-----------|----------------|-------------|
| 10 ch | 100 ops | 50 ops | 2x |
| 100 ch | 10K ops | 800 ops | 12x |
| 400 ch | 160K ops | 4K ops | 40x |

If preferred, the threshold could be removed to use the optimized path
universally.

## Testing

- [x] Shadow Slave (2768 chapters): 50s first-time indexing, loads and
navigates correctly
- [x] Normal book (87 chapters): 1.7s indexing, no regression
- [x] Build passes
- [x] clang-format passes

## Files Changed

- `lib/Epub/Epub/parsers/ContentOpfParser.h/.cpp` - OPF manifest index
- `lib/Epub/Epub/BookMetadataCache.h/.cpp` - TOC index + batch size
lookup
- `lib/ZipFile/ZipFile.h/.cpp` - New `fillUncompressedSizes()` API
- `lib/Epub/Epub.cpp` - Timing logs

<details>
<summary><b>Algorithm Details</b> (click to expand)</summary>

### Phase 1: OPF Pass - Manifest to Spine Lookup

**Problem**: Each `<itemref idref="ch001">` in spine must find matching
`<item id="ch001" href="...">` in manifest.

```
OLD: For each of 2768 spine refs, scan all 2770 manifest items
     = 7.6M string comparisons

NEW: While parsing manifest, build index:
     { hash("ch001"), len=5, file_offset=120 }
     
     Sort index, then binary search for each spine ref:
     2768 × log₂(2770) ≈ 2768 × 11 = 30K comparisons
```

### Phase 2: TOC Pass - TOC Entry to Spine Index Lookup

**Problem**: Each TOC entry with `href="chapter0001.xhtml"` must find
its spine index.

```
OLD: For each of 2768 TOC entries, scan all 2768 spine entries
     = 7.6M string comparisons

NEW: At beginTocPass(), read spine once and build index:
     { hash("OEBPS/chapter0001.xhtml"), len=25, spineIndex=0 }
     
     Sort index, binary search for each TOC entry:
     2768 × log₂(2768) ≈ 30K comparisons
     
     Clear index at endTocPass() to free memory.
```

### Phase 3: buildBookBin - ZIP Size Lookup

**Problem**: Need uncompressed file size for each spine item (for
reading progress). Sizes are in ZIP central directory.

```
OLD: For each of 2768 spine items, scan ZIP central directory (2773 entries)
     = 7.6M filename reads + string comparisons
     Time: 506 seconds

NEW: 
  Step 1: Build targets from spine
          { hash("OEBPS/chapter0001.xhtml"), len=25, index=0 }
          Sort by (hash, len)
  
  Step 2: Single pass through ZIP central directory
          For each entry:
            - Compute hash ON THE FLY (no string allocation)
            - Binary search targets
            - If match: sizes[target.index] = uncompressedSize
  
  Step 3: Use sizes array directly (O(1) per spine item)
  
  Total: 2773 entries × log₂(2768) ≈ 33K comparisons
  Time: 35 seconds
```

### Why Hash + Length?

Using 64-bit FNV-1a hash + string length as a composite key:
- Collision probability: ~1 in 2⁶⁴ × typical_path_lengths
- No string storage needed in index (just 12-16 bytes per entry)
- Integer comparisons are faster than string comparisons
- Verification on match handles the rare collision case

</details>

---

_AI-assisted development. All changes tested on hardware._
2026-01-28 01:29:15 +11:00
2025-12-03 22:06:45 +11:00
2025-12-30 16:09:30 +11:00
2025-12-03 22:06:45 +11:00
2025-12-03 22:06:45 +11:00
2025-12-03 22:06:45 +11:00
2025-12-03 22:06:45 +11:00
2026-01-22 02:20:22 +11:00

CrossPoint Reader

Firmware for the Xteink X4 e-paper display reader (unaffiliated with Xteink). Built using PlatformIO and targeting the ESP32-C3 microcontroller.

CrossPoint Reader is a purpose-built firmware designed to be a drop-in, fully open-source replacement for the official Xteink firmware. It aims to match or improve upon the standard EPUB reading experience.

Motivation

E-paper devices are fantastic for reading, but most commercially available readers are closed systems with limited customisation. The Xteink X4 is an affordable, e-paper device, however the official firmware remains closed. CrossPoint exists partly as a fun side-project and partly to open up the ecosystem and truely unlock the device's potential.

CrossPoint Reader aims to:

  • Provide a fully open-source alternative to the official firmware.
  • Offer a document reader capable of handling EPUB content on constrained hardware.
  • Support customisable font, layout, and display options.
  • Run purely on the Xteink X4 hardware.

This project is not affiliated with Xteink; it's built as a community project.

Features & Usage

  • EPUB parsing and rendering (EPUB 2 and EPUB 3)
  • Image support within EPUB
  • Saved reading position
  • File explorer with file picker
    • Basic EPUB picker from root directory
    • Support nested folders
    • EPUB picker with cover art
  • Custom sleep screen
    • Cover sleep screen
  • Wifi book upload
  • Wifi OTA updates
  • Configurable font, layout, and display options
    • User provided fonts
    • Full UTF support
  • Screen rotation

Multi-language support: Read EPUBs in various languages, including English, Spanish, French, German, Italian, Portuguese, Russian, Ukrainian, Polish, Swedish, Norwegian, and more.

See the user guide for instructions on operating CrossPoint.

Installing

Web (latest firmware)

  1. Connect your Xteink X4 to your computer via USB-C
  2. Go to https://xteink.dve.al/ and click "Flash CrossPoint firmware"

To revert back to the official firmware, you can flash the latest official firmware from https://xteink.dve.al/, or swap back to the other partition using the "Swap boot partition" button here https://xteink.dve.al/debug.

Web (specific firmware version)

  1. Connect your Xteink X4 to your computer via USB-C
  2. Download the firmware.bin file from the release of your choice via the releases page
  3. Go to https://xteink.dve.al/ and flash the firmware file using the "OTA fast flash controls" section

To revert back to the official firmware, you can flash the latest official firmware from https://xteink.dve.al/, or swap back to the other partition using the "Swap boot partition" button here https://xteink.dve.al/debug.

Manual

See Development below.

Development

Prerequisites

  • PlatformIO Core (pio) or VS Code + PlatformIO IDE
  • Python 3.8+
  • USB-C cable for flashing the ESP32-C3
  • Xteink X4

Checking out the code

CrossPoint uses PlatformIO for building and flashing the firmware. To get started, clone the repository:

git clone --recursive https://github.com/daveallie/crosspoint-reader

# Or, if you've already cloned without --recursive:
git submodule update --init --recursive

Flashing your device

Connect your Xteink X4 to your computer via USB-C and run the following command.

pio run --target upload

Internals

CrossPoint Reader is pretty aggressive about caching data down to the SD card to minimise RAM usage. The ESP32-C3 only has ~380KB of usable RAM, so we have to be careful. A lot of the decisions made in the design of the firmware were based on this constraint.

Data caching

The first time chapters of a book are loaded, they are cached to the SD card. Subsequent loads are served from the cache. This cache directory exists at .crosspoint on the SD card. The structure is as follows:

.crosspoint/
├── epub_12471232/       # Each EPUB is cached to a subdirectory named `epub_<hash>`
│   ├── progress.bin     # Stores reading progress (chapter, page, etc.)
│   ├── cover.bmp        # Book cover image (once generated)
│   ├── book.bin         # Book metadata (title, author, spine, table of contents, etc.)
│   └── sections/        # All chapter data is stored in the sections subdirectory
│       ├── 0.bin        # Chapter data (screen count, all text layout info, etc.)
│       ├── 1.bin        #     files are named by their index in the spine
│       └── ...
│
└── epub_189013891/

Deleting the .crosspoint directory will clear the entire cache.

Due the way it's currently implemented, the cache is not automatically cleared when a book is deleted and moving a book file will use a new cache directory, resetting the reading progress.

For more details on the internal file structures, see the file formats document.

Contributing

Contributions are very welcome!

If you're looking for a way to help out, take a look at the ideas discussion board. If there's something there you'd like to work on, leave a comment so that we can avoid duplicated effort.

To submit a contribution:

  1. Fork the repo
  2. Create a branch (feature/dithering-improvement)
  3. Make changes
  4. Submit a PR

CrossPoint Reader is not affiliated with Xteink or any manufacturer of the X4 hardware.

Huge shoutout to diy-esp32-epub-reader by atomic14, which was a project I took a lot of inspiration from as I was making CrossPoint.

Description
Firmware for the Xteink X4 e-paper display reader
Readme MIT 133 MiB
Languages
C 95.1%
C++ 4.3%
Python 0.3%
HTML 0.2%