feat: full integration — app.py wiring, scheduler startup, route registration, README

- Wire blueprints and scheduler into create_app()
- Add start_scheduler param to skip scheduler in tests
- Fix Setting.get/set to use modern db.session.get()
- Remove unused imports from conftest and models
- Add README with quick start and usage guide

Made-with: Cursor
2773
docs/superpowers/plans/2026-04-06-pi-weekly-newspaper.md
Normal file
File diff suppressed because it is too large
251
docs/superpowers/specs/2026-04-06-pi-weekly-newspaper-design.md
Normal file
@@ -0,0 +1,251 @@
# Plymouth Independent Weekly Newspaper — Design Spec

## Goal

Publish a weekly ePub "newspaper" containing articles from the Plymouth Independent RSS feed, optimized for reading on an Xteink X4 e-reader.

## Requirements Summary

- **Output:** ePub with articles as chapters, chronological order (Monday–Sunday ISO weeks)
- **Offline:** All images downloaded and embedded
- **E-reader formatting:** Images fit within 800x480 (landscape) or 480x800 (portrait) bounding box, aspect ratio preserved, baseline JPEG
- **Interface:** Self-hosted Python web app, accessible via browser from MacBook and Android phone on local network
- **Pipeline:** Periodic RSS fetch/cache, then manual or scheduled compile-and-publish
- **Cover:** AI-generated via Pollinations.ai (primary), programmatic text fallback, selectable at publish time
- **Article selection:** All articles included by default; user can exclude specific ones via UI before publishing

---

## Architecture

### Stack

| Component | Choice | Rationale |
|---|---|---|
| Web framework | Flask + Jinja2 | Lightweight, single-process |
| ORM / DB | Flask-SQLAlchemy + SQLite | Zero-config, single-file DB |
| Scheduler | APScheduler (BackgroundScheduler) | In-process, no external dependencies |
| RSS parsing | feedparser | Standard Python RSS library |
| ePub generation | ebooklib | Mature ePub 3 library |
| Image processing | Pillow | Resize, format conversion, text rendering |
| HTML parsing | beautifulsoup4 | Extract images from article HTML |
| HTTP | requests | Feed + image downloads |
| AI cover | Pollinations.ai | Free, no API key, URL-based |
| Frontend | Plain HTML + Pico CSS + vanilla JS | No build step, mobile-friendly |

### Project Structure

```
pi-weekly-newspaper/
├── app.py                  # Entry point: Flask app + APScheduler setup
├── config.py               # Config (feed URL, check interval, image dims, etc.)
├── requirements.txt
├── src/
│   ├── __init__.py
│   ├── fetcher.py          # RSS fetch, parse, cache articles to DB
│   ├── images.py           # Download images, resize, baseline JPEG conversion
│   ├── epub_builder.py     # Assemble ePub from cached articles + images
│   ├── cover.py            # Cover generation (Pollinations.ai + text fallback)
│   ├── models.py           # SQLAlchemy models (Article, Image, Issue, Settings)
│   └── scheduler.py        # APScheduler config, job management
├── static/                 # CSS, JS for web UI
├── templates/              # Jinja2 templates for web UI
├── data/
│   ├── newspaper.db        # SQLite database (created at runtime)
│   ├── images/             # Downloaded/processed images (runtime)
│   └── issues/             # Generated ePub files (runtime)
└── README.md
```

### Data Flow

1. **Fetch job** (periodic, default every 1 hour): RSS feed → parse → store new articles + metadata in SQLite → download & process images to `data/images/`
2. **Publish action** (manual via UI, or auto-scheduled): query articles for target week → user reviews/excludes via UI → generate cover → assemble ePub → save to `data/issues/` → download link available

---

## Data Model

### `articles`

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `guid` | TEXT UNIQUE | RSS `<guid>`, deduplication key |
| `title` | TEXT | Article title |
| `author` | TEXT | `dc:creator` value |
| `pub_date` | DATETIME | Publication timestamp |
| `categories` | TEXT | JSON array of category strings |
| `link` | TEXT | Original article URL |
| `content_html` | TEXT | Full `content:encoded` HTML with local image refs |
| `fetched_at` | DATETIME | When we cached it |

### `images`

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `article_id` | INTEGER FK | References `articles.id` |
| `original_url` | TEXT | Source URL from the article HTML |
| `local_path` | TEXT | Path to processed file in `data/images/` |
| `width` | INTEGER | Final width after resize |
| `height` | INTEGER | Final height after resize |

### `issues`

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `week_start` | DATE | Monday of the ISO week |
| `week_end` | DATE | Sunday of the ISO week |
| `cover_method` | TEXT | `"ai"` or `"text"` |
| `cover_path` | TEXT | Path to cover image |
| `epub_path` | TEXT | Path to generated `.epub` |
| `article_ids` | TEXT | JSON array of included article IDs |
| `excluded_article_ids` | TEXT | JSON array of excluded article IDs |
| `created_at` | DATETIME | When the issue was generated |
| `status` | TEXT | `"draft"` / `"published"` |

### `settings`

| Column | Type | Notes |
|---|---|---|
| `key` | TEXT PK | Setting name |
| `value` | TEXT | JSON-encoded value |

Used for: feed URL, fetch interval, auto-publish config, image constraints. Read on startup to restore scheduler state.
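
A minimal sketch of the key/value pattern described above, using the standard-library `sqlite3` module rather than the project's Flask-SQLAlchemy models so that it runs on its own. The helper names (`init_settings`, `set_setting`, `get_setting`) and the example key are hypothetical, not taken from the codebase.

```python
import json
import sqlite3


def init_settings(conn: sqlite3.Connection) -> None:
    """Create the settings table with the two columns from the spec."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)"
    )


def set_setting(conn: sqlite3.Connection, key: str, value) -> None:
    """Store any JSON-serialisable value under `key`, overwriting the old one."""
    conn.execute(
        "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
        (key, json.dumps(value)),
    )
    conn.commit()


def get_setting(conn: sqlite3.Connection, key: str, default=None):
    """Return the decoded value for `key`, or `default` if it is not set."""
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return default if row is None else json.loads(row[0])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_settings(conn)
    set_setting(conn, "fetch_interval_hours", 1)  # hypothetical key name
    print(get_setting(conn, "fetch_interval_hours"))
```

JSON-encoding every value keeps the table schema fixed while still allowing numbers, booleans, and nested config such as the auto-publish schedule.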

---

## Module Details

### `fetcher.py` — RSS Fetch & Article Caching

1. Fetch RSS feed via `feedparser` + `requests`
2. Deduplicate by `guid` — skip articles already in DB
3. Parse each new `<item>`: title, author, pub_date, categories, link, content_html
4. **Save article record to SQLite first** (to obtain `article_id`)
5. Extract image URLs from `content:encoded` HTML using `BeautifulSoup` with `html.parser`
6. Download & process each image via `images.py` — store to `data/images/{url_hash}.jpg` (deduped by URL hash across all articles)
7. Create `images` DB records linking `article_id` to each processed image
8. Rewrite `<img src>` attributes in stored `content_html` to point to local paths
9. Update the article record with the rewritten `content_html`

**Edge cases:**
- Feed unavailable: log warning, retry next cycle, no crash
- Duplicate images across articles (same URL): download once, reference by URL hash
- Images that 404: log warning, skip image, article still included
- Malformed HTML: `BeautifulSoup` with `html.parser` is tolerant
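
The two deduplication rules above (by `guid` for articles, by URL hash for images) can be sketched in plain Python. This is illustrative only: the spec does not say which hash function or digest length is used, so SHA-256 truncated to 16 hex characters is an assumption, and `local_image_path` and `new_entries` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

IMAGE_DIR = Path("data/images")


def local_image_path(url: str) -> Path:
    """Map an image URL to a stable file name, so the same URL is stored once."""
    # Assumption: SHA-256, truncated; the spec only says "URL hash".
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return IMAGE_DIR / f"{digest}.jpg"


def new_entries(entries, known_guids):
    """Return only feed entries whose guid is not already cached.

    `entries` is a list of dicts with a "guid" key; `known_guids` is the set
    of guids already in the articles table. Repeats within one feed are
    dropped as well.
    """
    seen = set(known_guids)
    fresh = []
    for entry in entries:
        guid = entry["guid"]
        if guid in seen:
            continue
        seen.add(guid)
        fresh.append(entry)
    return fresh


if __name__ == "__main__":
    feed = [{"guid": "a"}, {"guid": "b"}, {"guid": "a"}]
    print(new_entries(feed, known_guids={"b"}))
    print(local_image_path("https://example.com/photo.png"))
```

Because the file name depends only on the URL, two articles that embed the same image resolve to the same path in `data/images/`.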

### `images.py` — Image Processing

1. Check if `data/images/{url_hash}.jpg` already exists; if so, return the cached path without downloading (dedup)
2. Download image from URL via `requests`
3. Open with Pillow
4. Determine orientation: if width >= height → landscape bounding box (800x480), else portrait (480x800)
5. Resize to fit within bounding box, preserving aspect ratio:
   - If image is **larger** than the box: use `Image.thumbnail()` to scale down
   - If image is **smaller** than the box: use `Image.resize()` with `LANCZOS` to scale up, so it renders at a reasonable size on the e-reader
6. Save as baseline JPEG (`progressive=False`)
7. Return local path and final dimensions
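
The orientation and fit rules in steps 4 and 5 reduce to one scale factor. The sketch below computes only the target dimensions and leaves the Pillow calls out so it has no dependencies; `target_size` is a hypothetical helper name, and rounding to the nearest pixel is an assumption.

```python
def target_size(width, height, landscape_box=(800, 480), portrait_box=(480, 800)):
    """Return (new_width, new_height) that fits the e-reader bounding box.

    Images with width >= height use the landscape box, others the portrait
    box. A single scale factor preserves the aspect ratio; it is below 1 for
    oversized images and above 1 for undersized ones.
    """
    box_w, box_h = landscape_box if width >= height else portrait_box
    scale = min(box_w / width, box_h / height)
    # Rounding to whole pixels is an assumption; never return a zero dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(target_size(1600, 1200))  # oversized landscape image
    print(target_size(300, 600))    # undersized portrait image
```

Pillow's `Image.thumbnail()` only ever shrinks an image, which is why the spec switches to `Image.resize()` for the scale-up case; computing the size explicitly like this would let one `resize()` call cover both directions.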

### `epub_builder.py` — ePub Assembly

1. Query articles for target ISO week (Monday–Sunday), minus excluded ones
2. Sort chronologically by `pub_date`
3. Build ePub structure with `ebooklib`:
   - **Metadata:** title ("Plymouth Independent — Week of Apr 6–12, 2026"), language (en)
   - **Cover:** generated JPEG as ePub cover image
   - **Table of Contents:** article titles linked to chapters
   - **Chapters:** one per article, chronological
4. Each chapter:
   - `<h1>` article title
   - Author/date byline, category tags
   - Article HTML with image `src` rewritten to ePub-internal references
   - All referenced images embedded as ePub items
5. Stylesheet: minimal CSS for e-ink — no colors, high contrast, images `max-width: 100%; display: block`
6. Output: `data/issues/plymouth-independent-2026-W15.epub`
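
A small sketch of the week arithmetic behind steps 1 and 6, using only `datetime`. The function names are hypothetical; the file-name pattern follows the example output path above.

```python
from datetime import date, timedelta


def iso_week_bounds(day: date):
    """Return (monday, sunday) of the ISO week containing `day`."""
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
    return monday, monday + timedelta(days=6)


def issue_filename(day: date) -> str:
    """Build the issue file name from the ISO year and week number."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"plymouth-independent-{iso_year}-W{iso_week:02d}.epub"


if __name__ == "__main__":
    start, end = iso_week_bounds(date(2026, 4, 8))
    print(start, end)                          # 2026-04-06 2026-04-12
    print(issue_filename(date(2026, 4, 8)))    # plymouth-independent-2026-W15.epub
```

Using `isocalendar()` for the year as well as the week matters around New Year, where the ISO year can differ from the calendar year.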

### `cover.py` — Cover Generation

**AI mode (Pollinations.ai):**
1. Build a prompt from the week's top headlines: "Newspaper front page illustration for Plymouth Massachusetts local news, featuring: [top 3 titles], classic newspaper style"
2. Fetch from `https://image.pollinations.ai/prompt/{encoded_prompt}?width=800&height=480`
3. Resize/fit to 800x480 bounding box, baseline JPEG
4. Overlay masthead text ("Plymouth Independent") and date range using Pillow `ImageDraw`
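
Steps 1 and 2 amount to string building plus URL encoding. The sketch below constructs the request URL only and performs no network call; `build_prompt` and `cover_url` are hypothetical names, and joining the top three titles with commas is an assumption about how "[top 3 titles]" is filled in.

```python
from urllib.parse import quote

BASE_URL = "https://image.pollinations.ai/prompt/"


def build_prompt(titles):
    """Fill the prompt template with up to three headlines."""
    top = ", ".join(titles[:3])  # assumption: comma-separated titles
    return (
        "Newspaper front page illustration for Plymouth Massachusetts "
        f"local news, featuring: {top}, classic newspaper style"
    )


def cover_url(titles, width=800, height=480):
    """Return the image URL with the prompt percent-encoded into the path."""
    # safe="" also encodes "/" so a headline cannot add path segments.
    encoded = quote(build_prompt(titles), safe="")
    return f"{BASE_URL}{encoded}?width={width}&height={height}"


if __name__ == "__main__":
    print(cover_url(["Town meeting vote", "Harbor dredging", "School budget"]))
```

The caller would fetch this URL with `requests` and fall back to the text cover if the request fails.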

**Text fallback mode:**
1. Create 800x480 Pillow image with white background
2. Draw bold "Plymouth Independent" masthead
3. Date range subtitle
4. List top article headlines
5. Save as baseline JPEG

Both modes produce a single baseline JPEG within e-reader constraints.

### `scheduler.py` — Background Scheduling

- APScheduler `BackgroundScheduler`, started on app launch
- Two jobs:
  1. **RSS fetch:** `IntervalTrigger`, default every 1 hour
  2. **Auto-publish** (optional): `CronTrigger`, configurable day/time
- Schedule config persisted to `settings` table in SQLite
- On startup: read settings from DB, restore scheduler jobs
- Web UI can pause/resume/reconfigure jobs live
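
One way to restore the two jobs from persisted settings is to translate the stored values into keyword arguments for APScheduler's `add_job()`. The sketch below only builds those dictionaries, so it runs without APScheduler installed. The setting keys (`fetch_interval_hours`, `auto_publish`), job ids, and defaults are hypothetical; the `trigger`, `hours`, `day_of_week`, `hour`, `minute`, `id`, and `replace_existing` arguments are standard APScheduler 3.x `add_job()` parameters.

```python
def fetch_job_kwargs(settings: dict) -> dict:
    """Keyword arguments for the periodic RSS fetch job (interval trigger)."""
    return {
        "trigger": "interval",
        "hours": settings.get("fetch_interval_hours", 1),  # spec default: 1 hour
        "id": "rss_fetch",
        "replace_existing": True,
    }


def publish_job_kwargs(settings: dict):
    """Keyword arguments for the optional auto-publish job, or None if disabled."""
    cfg = settings.get("auto_publish") or {}
    if not cfg.get("enabled"):
        return None
    return {
        "trigger": "cron",
        "day_of_week": cfg.get("day_of_week", "sun"),  # assumed default
        "hour": cfg.get("hour", 18),                   # assumed default
        "minute": cfg.get("minute", 0),
        "id": "auto_publish",
        "replace_existing": True,
    }


if __name__ == "__main__":
    stored = {"fetch_interval_hours": 2, "auto_publish": {"enabled": True, "hour": 7}}
    print(fetch_job_kwargs(stored))
    print(publish_job_kwargs(stored))
    # With APScheduler available, usage might look like:
    #   scheduler.add_job(fetch_feed, **fetch_job_kwargs(stored))
    # where `fetch_feed` is a placeholder for the real fetch function.
```

Stable job ids combined with `replace_existing=True` mean that reconfiguring from the web UI can call the same code path as startup.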

---

## Web UI

Five views, all server-rendered with Jinja2. Responsive layout via Pico CSS.

### Dashboard (`/`)
- Scheduler status (running/paused, next fetch, interval)
- Quick stats: articles this week, total cached, latest issue
- Buttons: "Fetch Now", "New Issue"

### Articles (`/articles`)
- Table of cached articles, filterable by week and category
- Columns: title, author, date, categories, thumbnail
- When preparing an issue: checkboxes for include/exclude

### Publish (`/publish`)
- Select target week (defaults to current ISO week)
- Article list with include/exclude toggles (all on by default)
- Cover method picker: "AI Cover" / "Text Cover"
- "Generate Issue" button
- Progress: synchronous POST request with a CSS spinner overlay; generation typically takes 5–15 seconds (dominated by Pollinations.ai round-trip if using AI cover)
- On completion: page reloads with download link and cover preview

### Settings (`/settings`)
- RSS feed URL
- Fetch interval (hours)
- Auto-publish: toggle + day/time + default cover method
- Image resize constraints

### Issues Archive (`/issues`)
- List of past issues: date range, article count, cover thumbnail
- Download link per issue
- "Regenerate" button

---

## Error Handling

| Scenario | Behavior |
|---|---|
| RSS feed down | Log warning, skip cycle, retry next interval |
| Image download fails | Log warning, skip image, include article without it |
| Pollinations.ai fails | Log error, fall back to text cover automatically |
| ePub generation fails | Show error in UI with details, don't save partial issue |
| DB locked (concurrent access) | SQLite WAL mode for better concurrency; scheduler and web requests share the same process |
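
For the last row, WAL mode is switched on with a single pragma. The sketch below uses the standard-library `sqlite3` module directly for illustration; how the project applies it through Flask-SQLAlchemy is not shown in the spec, and `connect_wal` and the 30-second timeout are assumptions.

```python
import sqlite3


def connect_wal(path: str, timeout: float = 30.0):
    """Open a SQLite database and request write-ahead logging.

    Returns the connection and the journal mode SQLite reports back.
    `timeout` is how long a connection waits on a locked database before
    raising; the value here is an assumption.
    """
    conn = sqlite3.connect(path, timeout=timeout)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        conn, mode = connect_wal(os.path.join(tmp, "newspaper.db"))
        print(mode)
        conn.close()
```

WAL is a property of the database file and persists across connections, so it only needs to be set once; it does not apply to in-memory databases.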

---

## Future Enhancements (Out of Scope for V1)

- Full web scraping of article pages for richer content
- Email delivery of issues
- Multiple RSS feed support
- Reading progress tracking
- Dark mode cover variants