crosspoint-reader-mod/lib/Epub/Epub/ParsedText.h

51 lines
2.5 KiB
C

#pragma once
#include <EpdFontFamily.h>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <vector>
feat: Add CSS parsing and CSS support in EPUBs (#411) ## Summary * **What is the goal of this PR?** - Adds basic CSS parsing to EPUBs and determine the CSS rules when rendering to the screen so that text is styled correctly. Currently supports bold, underline, italics, margin, padding, and text alignment ## Additional Context - My main reason for wanting this is that the book I'm currently reading, Carl's Doomsday Scenario (2nd in the Dungeon Crawler Carl series), relies _a lot_ on styled text for telling parts of the story. When text is bolded, it's supposed to be a message that's rendered "on-screen" in the story. When characters are "chatting" with each other, the text is bolded and their names are underlined. Plus, normal emphasis is provided with italicizing words here and there. So, this greatly improves my experience reading this book on the Xteink, and I figured it was useful enough for others too. - For transparency: I'm a software engineer, but I'm mostly frontend and TypeScript/JavaScript. It's been _years_ since I did any C/C++, so I would not be surprised if I'm doing something dumb along the way in this code. Please don't hesitate to ask for changes if something looks off. I heavily relied on Claude Code for help, and I had a lot of inspiration from how [microreader](https://github.com/CidVonHighwind/microreader) achieves their CSS parsing and styling. I did give this as good of a code review as I could and went through everything, and _it works on my machine_ 😄 ### Before ![IMG_6271](https://github.com/user-attachments/assets/dba7554d-efb6-4d13-88bc-8b83cd1fc615) ![IMG_6272](https://github.com/user-attachments/assets/61ba2de0-87c9-4f39-956f-013da4fe20a4) ### After ![IMG_6268](https://github.com/user-attachments/assets/ebe11796-cca9-4a46-b9c7-0709c7932818) ![IMG_6269](https://github.com/user-attachments/assets/e89c33dc-ff47-4bb7-855e-863fe44b3202) --- ### AI Usage Did you use AI tools to help write this code? **YES**, Claude Code
2026-02-05 05:28:10 -05:00
#include "blocks/BlockStyle.h"
#include "blocks/TextBlock.h"
class GfxRenderer;
class ParsedText {
perf: Replace std::list with std::vector in text layout (#1038) ## Summary _Revision to @blindbat's #802. Description comes from the original PR._ - Replace `std::list` with `std::vector` for word storage in `TextBlock` and `ParsedText` - Use index-based access (`words[i]`) instead of iterator advancement (`std::advance(it, n)`) - Remove the separate `continuesVec` copy that was built from `wordContinues` for O(1) access — now unnecessary since `std::vector<bool>` already provides O(1) indexing ## Why `std::list` allocates each node individually on the heap with 16 bytes of prev/next pointer overhead per node. For text layout with many small words, this means: - Scattered heap allocations instead of contiguous memory - Poor cache locality during iteration (each node can be anywhere in memory) - Per-node malloc/free overhead during construction and destruction `std::vector` stores elements contiguously, giving better cache performance during the tight rendering and layout loops. The `extractLine` function also benefits: list splice was O(1) but required maintaining three parallel iterators, while vector range construction with move iterators is simpler and still efficient for the small line-sized chunks involved. ## Files changed - `lib/Epub/Epub/blocks/TextBlock.h` / `.cpp` - `lib/Epub/Epub/ParsedText.h` / `.cpp` ## AI Usage YES ## Test plan - [ ] Open an EPUB with mixed formatting (bold, italic, underline) — verify text renders correctly - [ ] Open a book with justified text — verify word spacing is correct - [ ] Open a book with hyphenation enabled — verify words break correctly at hyphens - [ ] Navigate through pages rapidly — verify no rendering glitches or crashes - [ ] Open a book with long paragraphs — verify text layout matches pre-change behavior --------- Co-authored-by: Kuanysh Bekkulov <kbekkulov@gmail.com>
2026-02-21 22:28:56 -06:00
  std::vector<std::string> words;
  std::vector<EpdFontFamily::Style> wordStyles;
  std::vector<bool> wordContinues;  // true = word attaches to previous (no space before it)
  BlockStyle blockStyle;
  bool extraParagraphSpacing;
  bool hyphenationEnabled;
  void applyParagraphIndent();
  std::vector<size_t> computeLineBreaks(const GfxRenderer& renderer, int fontId, int pageWidth,
                                        std::vector<uint16_t>& wordWidths, std::vector<bool>& continuesVec);
  std::vector<size_t> computeHyphenatedLineBreaks(const GfxRenderer& renderer, int fontId, int pageWidth,
                                                  std::vector<uint16_t>& wordWidths,
                                                  std::vector<bool>& continuesVec);
  bool hyphenateWordAtIndex(size_t wordIndex, int availableWidth, const GfxRenderer& renderer, int fontId,
                            std::vector<uint16_t>& wordWidths, bool allowFallbackBreaks);
  void extractLine(size_t breakIndex, int pageWidth, const std::vector<uint16_t>& wordWidths,
                   const std::vector<bool>& continuesVec, const std::vector<size_t>& lineBreakIndices,
feat: Support for kerning and ligatures (#873) ## Summary **What is the goal of this PR?** Improved typesetting, including [kerning](https://en.wikipedia.org/wiki/Kerning) and [ligatures](https://en.wikipedia.org/wiki/Ligature_(writing)#Latin_alphabet). **What changes are included?** - The script to convert built-in fonts now adds kerning and ligature information to the generated font headers. - Epub page layout calculates proper kerning spaces and makes ligature substitutions according to the selected font. ![3U1B1808](https://github.com/user-attachments/assets/1accb16f-2f1a-41e5-adca-89f1f1348494) ![3U1B1810](https://github.com/user-attachments/assets/2f6bd007-490e-420f-b774-3380b4add7ea) ![3U1B1815](https://github.com/user-attachments/assets/1986bb77-2db0-46e2-a5d6-8315dae9eb19) ## Additional Context - I am not a typography expert. - The implementation has been reworked from the earlier version, so it is no longer necessary to omit Open Dyslexic, and kerning data now covers all fonts, styles, and codepoints for which we include bitmap data. - Claude Opus 4.6 helped with a lot of this. - There's an included test epub document with lots of kerning and ligature examples, shown in the photos. **_After some time to mature, I think this change is in decent shape to merge and get people testing._** After opening this PR I came across #660, which overlaps in adding ligature support. --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? _**YES, Claude Opus 4.6**_ --------- Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-24 02:31:43 -06:00
                   const std::function<void(std::shared_ptr<TextBlock>)>& processLine, const GfxRenderer& renderer,
                   int fontId);
  std::vector<uint16_t> calculateWordWidths(const GfxRenderer& renderer, int fontId);
public:
  explicit ParsedText(const bool extraParagraphSpacing, const bool hyphenationEnabled = false,
                      const BlockStyle& blockStyle = BlockStyle())
      : blockStyle(blockStyle), extraParagraphSpacing(extraParagraphSpacing), hyphenationEnabled(hyphenationEnabled) {}
  ~ParsedText() = default;
  void addWord(std::string word, EpdFontFamily::Style fontStyle, bool underline = false, bool attachToPrevious = false);
  void setBlockStyle(const BlockStyle& blockStyle) { this->blockStyle = blockStyle; }
  BlockStyle& getBlockStyle() { return blockStyle; }
  size_t size() const { return words.size(); }
  bool isEmpty() const { return words.empty(); }
  void layoutAndExtractLines(const GfxRenderer& renderer, int fontId, uint16_t viewportWidth,
                             const std::function<void(std::shared_ptr<TextBlock>)>& processLine,
                             bool includeLastLine = true);
};
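
The header only declares `computeLineBreaks`; its implementation lives in the `.cpp` and is not shown here. As a rough illustration of how parallel `wordWidths` and `continuesVec` vectors could feed a line breaker, the following is a minimal greedy sketch. The function name `greedyLineBreaks` and the `spaceWidth` parameter are hypothetical and not part of the project's API, and the real code may use a different algorithm entirely.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, NOT the project's computeLineBreaks.
// Returns, for each line, the index one past that line's last word.
// Simplification: a word flagged as "continues" may still be pushed to a new
// line here; a real layout would likely keep attached words together.
std::vector<size_t> greedyLineBreaks(const std::vector<uint16_t>& wordWidths,
                                     const std::vector<bool>& continuesVec,
                                     int pageWidth, int spaceWidth) {
  std::vector<size_t> breaks;
  int lineWidth = 0;
  size_t lineStart = 0;
  for (size_t i = 0; i < wordWidths.size(); ++i) {
    const bool firstOnLine = (i == lineStart);
    // A word that attaches to the previous one gets no space before it.
    const int gap = (firstOnLine || continuesVec[i]) ? 0 : spaceWidth;
    if (!firstOnLine && lineWidth + gap + wordWidths[i] > pageWidth) {
      breaks.push_back(i);  // current line ends just before word i
      lineStart = i;
      lineWidth = wordWidths[i];
    } else {
      lineWidth += gap + wordWidths[i];
    }
  }
  if (!wordWidths.empty()) breaks.push_back(wordWidths.size());
  return breaks;
}
```

Under these assumptions, four 30-unit words on a 70-unit page with a 5-unit space break after the second word, while a trailing word marked as continuing is measured without the preceding space.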