## Summary
Replace per-pixel getRenderMode() + rotateCoordinates() + bounds checks
with a DirectPixelWriter struct that pre-computes orientation and render
mode state once per row. Use bitwise ops instead of division/modulo for
cache pixel packing. Skip PNG cache allocation when buffer exceeds 48KB
(framebuffer size) since PNG decode is fast enough that caching provides
minimal benefit, and the large buffer competes with the 44KB PNG decoder
for heap.
## Additional Context
Measured improvements on ESP32-C3 @ 160MHz:
- JPEG decode: 5-7% faster (1:1 scale)
- PNG decode: 15-20% faster (1:1 scale)
- Cache renders: 3-6% faster across both formats
- Eliminates "Failed to allocate cache buffer" errors for large PNGs
---
### AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? _**< PARTIALLY >**_