perf: Optimize HTML entities lookup to O(log(n)) (#1194)
## Summary

**What is the goal of this PR?**

Replace the linear scan of `lookupHtmlEntity` with a simple binary search to improve lookup performance.

**What changes are included?**

`lib/Epub/Epub/Entities/htmlEntities.cpp`:
- Sorted the `ENTITY_LOOKUP` array.
- Added a compile-time assertion to guarantee the array remains sorted.
- Rewrote `lookupHtmlEntity` to use a binary search.

## Additional Context

Benchmarked on my x64 laptop (results will probably differ on RISC-V):

```
=== Benchmark (53 entities x 10000 iterations) ===
Version          Total time          Avg per lookup
----------------------------------------------
linear           236.97 ms total     447.11 ns/lookup
binary search    22.09 ms total      41.68 ns/lookup

=== Summary ===
Binary search is 10.73x faster than linear scan.
```

This is a simplified alternative to #1180, focused on keeping the implementation clean and maintainable.

### AI Usage

Did you use AI tools to help write this code? _**< NO >**_

---------

Co-authored-by: Zach Nelson <zach@zdnelson.com>
@@ -1,4 +1,4 @@
-// from
+// based on
 // https://github.com/atomic14/diy-esp32-epub-reader/blob/2c2f57fdd7e2a788d14a0bcb26b9e845a47aac42/lib/Epub/RubbishHtmlParser/htmlEntities.cpp
 
 #pragma once
@@ -6,4 +6,4 @@
 
 // Lookup a single HTML entity (including & and ;) and return its UTF-8 value
 // Returns nullptr if entity is not found
-const char* lookupHtmlEntity(const char* entity, int len);
+const char* lookupHtmlEntity(const char* entity, size_t len);
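
The `htmlEntities.cpp` side of the change is not shown in this diff, so the following is only a minimal sketch of the approach the summary describes: a table sorted by key, a compile-time check that it stays sorted, and a binary search behind the `lookupHtmlEntity(const char*, size_t)` signature. The `EntityPair` struct, the `compareKeys` and `isSorted` helpers, and the six-entry table are assumptions made for illustration; the real `ENTITY_LOOKUP` array has 53 entries and may be laid out differently. The sketch assumes C++14 or later for the `constexpr` loops.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical entry layout; the actual struct in htmlEntities.cpp may differ.
struct EntityPair {
  const char* key;    // entity text, including '&' and ';'
  const char* value;  // UTF-8 replacement
};

// Illustrative subset only, sorted by key in byte order.
static constexpr EntityPair ENTITY_LOOKUP[] = {
    {"&amp;", "&"},
    {"&apos;", "'"},
    {"&gt;", ">"},
    {"&lt;", "<"},
    {"&nbsp;", "\xC2\xA0"},
    {"&quot;", "\""},
};

constexpr std::size_t ENTITY_COUNT = sizeof(ENTITY_LOOKUP) / sizeof(ENTITY_LOOKUP[0]);

// constexpr stand-in for strcmp, which cannot be used in a constant expression.
constexpr int compareKeys(const char* a, const char* b) {
  while (*a != '\0' && *a == *b) {
    ++a;
    ++b;
  }
  return static_cast<unsigned char>(*a) - static_cast<unsigned char>(*b);
}

// True when every key is strictly greater than the one before it.
constexpr bool isSorted() {
  for (std::size_t i = 1; i < ENTITY_COUNT; ++i) {
    if (compareKeys(ENTITY_LOOKUP[i - 1].key, ENTITY_LOOKUP[i].key) >= 0) {
      return false;
    }
  }
  return true;
}

// Fails the build if someone adds an entry out of order.
static_assert(isSorted(), "ENTITY_LOOKUP must be sorted by key");

// Binary search over the half-open range [lo, hi).
// `entity` does not need to be null-terminated; only `len` bytes are read.
const char* lookupHtmlEntity(const char* entity, std::size_t len) {
  std::size_t lo = 0;
  std::size_t hi = ENTITY_COUNT;
  while (lo < hi) {
    const std::size_t mid = lo + (hi - lo) / 2;
    const char* key = ENTITY_LOOKUP[mid].key;
    const std::size_t keyLen = std::strlen(key);
    const std::size_t n = len < keyLen ? len : keyLen;
    int cmp = std::memcmp(entity, key, n);
    if (cmp == 0 && len != keyLen) {
      cmp = len < keyLen ? -1 : 1;  // on a shared prefix, the shorter string sorts first
    }
    if (cmp == 0) {
      return ENTITY_LOOKUP[mid].value;
    }
    if (cmp < 0) {
      hi = mid;
    } else {
      lo = mid + 1;
    }
  }
  return nullptr;
}
```

In this sketch, `lookupHtmlEntity("&amp;", 5)` returns a pointer to `"&"`, and an unknown entity such as `"&bogus;"` returns `nullptr`. The length-aware comparison orders keys the same way as `compareKeys`, so the search stays consistent with the order the `static_assert` enforces.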