Plymouth Independent Weekly Newspaper — Design Spec
Goal
Publish a weekly ePub "newspaper" containing articles from the Plymouth Independent RSS feed, optimized for reading on an Xteink X4 e-reader.
Requirements Summary
- Output: ePub with articles as chapters, chronological order (Monday–Sunday ISO weeks)
- Offline: All images downloaded and embedded
- E-reader formatting: Images fit within 800x480 (landscape) or 480x800 (portrait) bounding box, aspect ratio preserved, baseline JPEG
- Interface: Self-hosted Python web app, accessible via browser from MacBook and Android phone on local network
- Pipeline: Periodic RSS fetch/cache, then manual or scheduled compile-and-publish
- Cover: AI-generated via Pollinations.ai (primary), programmatic text fallback, selectable at publish time
- Article selection: All articles included by default; user can exclude specific ones via UI before publishing
Architecture
Stack
| Component | Choice | Rationale |
|---|---|---|
| Web framework | Flask + Jinja2 | Lightweight, single-process |
| ORM / DB | Flask-SQLAlchemy + SQLite | Zero-config, single-file DB |
| Scheduler | APScheduler (BackgroundScheduler) | In-process, no external dependencies |
| RSS parsing | feedparser | Standard Python RSS library |
| ePub generation | ebooklib | Mature ePub 3 library |
| Image processing | Pillow | Resize, format conversion, text rendering |
| HTML parsing | beautifulsoup4 | Extract images from article HTML |
| HTTP | requests | Feed + image downloads |
| AI cover | Pollinations.ai | Free, no API key, URL-based |
| Frontend | Plain HTML + Pico CSS + vanilla JS | No build step, mobile-friendly |
Project Structure
pi-weekly-newspaper/
├── app.py # Entry point: Flask app + APScheduler setup
├── config.py # Config (feed URL, check interval, image dims, etc.)
├── requirements.txt
├── src/
│ ├── __init__.py
│ ├── fetcher.py # RSS fetch, parse, cache articles to DB
│ ├── images.py # Download images, resize, baseline JPEG conversion
│ ├── epub_builder.py # Assemble ePub from cached articles + images
│ ├── cover.py # Cover generation (Pollinations.ai + text fallback)
│ ├── models.py # SQLAlchemy models (Article, Image, Issue, Settings)
│ └── scheduler.py # APScheduler config, job management
├── static/ # CSS, JS for web UI
├── templates/ # Jinja2 templates for web UI
├── data/
│ ├── newspaper.db # SQLite database (created at runtime)
│ ├── images/ # Downloaded/processed images (runtime)
│ └── issues/ # Generated ePub files (runtime)
└── README.md
Data Flow
- Fetch job (periodic, default every 1 hour): RSS feed → parse → store new articles + metadata in SQLite → download & process images to `data/images/`
- Publish action (manual via UI, or auto-scheduled): query articles for target week → user reviews/excludes via UI → generate cover → assemble ePub → save to `data/issues/` → download link available
Data Model
articles
| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `guid` | TEXT UNIQUE | RSS `<guid>`, deduplication key |
| `title` | TEXT | Article title |
| `author` | TEXT | dc:creator value |
| `pub_date` | DATETIME | Publication timestamp |
| `categories` | TEXT | JSON array of category strings |
| `link` | TEXT | Original article URL |
| `content_html` | TEXT | Full content:encoded HTML with local image refs |
| `fetched_at` | DATETIME | When we cached it |
images
| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `article_id` | INTEGER FK | References articles.id |
| `original_url` | TEXT | Source URL from the article HTML |
| `local_path` | TEXT | Path to processed file in data/images/ |
| `width` | INTEGER | Final width after resize |
| `height` | INTEGER | Final height after resize |
issues
| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `week_start` | DATE | Monday of the ISO week |
| `week_end` | DATE | Sunday of the ISO week |
| `cover_method` | TEXT | "ai" or "text" |
| `cover_path` | TEXT | Path to cover image |
| `epub_path` | TEXT | Path to generated .epub |
| `article_ids` | TEXT | JSON array of included article IDs |
| `excluded_article_ids` | TEXT | JSON array of excluded article IDs |
| `created_at` | DATETIME | When the issue was generated |
| `status` | TEXT | "draft" / "published" |
settings
| Column | Type | Notes |
|---|---|---|
| `key` | TEXT PK | Setting name |
| `value` | TEXT | JSON-encoded value |
Used for: feed URL, fetch interval, auto-publish config, image constraints. Read on startup to restore scheduler state.
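The key/value layout above can be illustrated with a minimal sketch. It uses the standard-library `sqlite3` module rather than Flask-SQLAlchemy so that it stays self-contained; the helper names `init_settings`, `set_setting`, `get_setting` and the key `fetch_interval_hours` are hypothetical and not taken from the project.

```python
import json
import sqlite3


def init_settings(conn):
    # Mirrors the settings table described above: a text key and a JSON-encoded value.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)"
    )


def set_setting(conn, key, value):
    # Encode any JSON-serialisable value; replace the row if the key already exists.
    conn.execute(
        "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
        (key, json.dumps(value)),
    )
    conn.commit()


def get_setting(conn, key, default=None):
    # Decode the stored JSON, or return the default when the key is missing.
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return json.loads(row[0]) if row else default


conn = sqlite3.connect(":memory:")
init_settings(conn)
set_setting(conn, "fetch_interval_hours", 1)  # hypothetical setting name
```

Storing values as JSON keeps a single TEXT column usable for numbers, booleans, and nested auto-publish config alike.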
Module Details
fetcher.py — RSS Fetch & Article Caching
- Fetch RSS feed via `feedparser` + `requests`
- Deduplicate by `guid`: skip articles already in DB
- Parse each new `<item>`: title, author, pub_date, categories, link, content_html
- Save article record to SQLite first (to obtain `article_id`)
- Extract image URLs from `content:encoded` HTML using `BeautifulSoup` with `html.parser`
- Download & process each image via `images.py`; store to `data/images/{url_hash}.jpg` (deduped by URL hash across all articles)
- Create `images` DB records linking `article_id` to each processed image
- Rewrite `<img src>` attributes in stored `content_html` to point to local paths
- Update the article record with the rewritten `content_html`
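The deduplication step can be sketched as follows. This is an illustration under assumptions, not the project's code: it uses plain `sqlite3` instead of the SQLAlchemy models, a trimmed-down `articles` table, and plain dicts standing in for parsed feed entries. The function name `store_new_articles` is hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Reduced version of the articles table; only the columns this sketch needs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    guid TEXT UNIQUE,
    title TEXT,
    link TEXT,
    content_html TEXT,
    fetched_at TEXT
)
"""


def store_new_articles(conn, entries):
    """Insert entries whose guid is not cached yet; return the new article ids."""
    new_ids = []
    now = datetime.now(timezone.utc).isoformat()
    for entry in entries:
        # The UNIQUE constraint on guid makes INSERT OR IGNORE skip duplicates.
        cur = conn.execute(
            "INSERT OR IGNORE INTO articles "
            "(guid, title, link, content_html, fetched_at) VALUES (?, ?, ?, ?, ?)",
            (entry["guid"], entry["title"], entry["link"], entry["content_html"], now),
        )
        if cur.rowcount == 1:  # a row was actually inserted
            new_ids.append(cur.lastrowid)
    conn.commit()
    return new_ids
```

Returning the new ids lets the caller run image extraction only for articles that were just inserted, which matches the "save first, then process images" order in the list above.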
Edge cases:
- Feed unavailable: log warning, retry next cycle, no crash
- Duplicate images across articles (same URL): download once, reference by URL hash
- Images that 404: log warning, skip image, article still included
- Malformed HTML: `BeautifulSoup` with `html.parser` is tolerant
images.py — Image Processing
- Download image from URL via `requests`
- Check if `data/images/{url_hash}.jpg` already exists; if so, return cached path (dedup)
- Open with Pillow
- Determine orientation: if width >= height → landscape bounding box (800x480), else portrait (480x800)
- Resize to fit within bounding box, preserving aspect ratio:
  - If image is larger than the box: use `Image.thumbnail()` to scale down
  - If image is smaller than the box: use `Image.resize()` with `LANCZOS` to scale up, so it renders at a reasonable size on the e-reader
- Save as baseline JPEG (`progressive=False`)
- Return local path and final dimensions
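The sizing rule and the cache path can be sketched without Pillow, since both are plain arithmetic and hashing. The helper names `cached_image_path` and `fit_size` are hypothetical, and the choice of a truncated SHA-256 digest for `{url_hash}` is an assumption; the spec does not say which hash is used.

```python
import hashlib
from pathlib import Path

LANDSCAPE_BOX = (800, 480)
PORTRAIT_BOX = (480, 800)


def cached_image_path(url, image_dir="data/images"):
    # Assumption: the URL hash is a truncated SHA-256 hex digest.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(image_dir) / f"{digest}.jpg"


def fit_size(width, height):
    """Return the largest (w, h) that fits the box while keeping the aspect ratio.

    Scales down images larger than the box and scales up smaller ones.
    """
    box_w, box_h = LANDSCAPE_BOX if width >= height else PORTRAIT_BOX
    scale = min(box_w / width, box_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_size(1600, 1200))  # (640, 480)
print(fit_size(200, 400))    # (400, 800)
```

With Pillow, the computed size could then be passed to `Image.resize()` with the `LANCZOS` filter, and the result saved with `progressive=False` as described above.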
epub_builder.py — ePub Assembly
- Query articles for target ISO week (Monday–Sunday), minus excluded ones
- Sort chronologically by `pub_date`
- Build ePub structure with `ebooklib`:
  - Metadata: title ("Plymouth Independent — Week of Apr 6–12, 2026"), language (en)
  - Cover: generated JPEG as ePub cover image
  - Table of Contents: article titles linked to chapters
  - Chapters: one per article, chronological
- Each chapter:
  - `<h1>` article title
  - Author/date byline, category tags
  - Article HTML with image `src` rewritten to ePub-internal references
  - All referenced images embedded as ePub items
- Stylesheet: minimal CSS for e-ink: no colors, high contrast, images `max-width: 100%; display: block`
- Output: `data/issues/plymouth-independent-2026-W15.epub`
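The Monday–Sunday week window and the output filename can both be derived from any date inside the target week. The sketch below uses only the standard library; the function names `iso_week_bounds` and `issue_filename` are hypothetical.

```python
from datetime import date, timedelta


def iso_week_bounds(day):
    """Return (monday, sunday) of the ISO week containing `day`."""
    monday = day - timedelta(days=day.isoweekday() - 1)
    return monday, monday + timedelta(days=6)


def issue_filename(day, prefix="plymouth-independent"):
    # isocalendar() yields the ISO year, which can differ from day.year
    # in the first and last days of a calendar year.
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}-{iso_year}-W{iso_week:02d}.epub"


print(iso_week_bounds(date(2026, 4, 8)))
# (datetime.date(2026, 4, 6), datetime.date(2026, 4, 12))
print(issue_filename(date(2026, 4, 8)))
# plymouth-independent-2026-W15.epub
```

Using the ISO year rather than the calendar year avoids mislabelled issues around New Year, when a week can belong to the neighbouring year.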
cover.py — Cover Generation
AI mode (Pollinations.ai):
- Build a prompt from the week's top headlines: "Newspaper front page illustration for Plymouth Massachusetts local news, featuring: [top 3 titles], classic newspaper style"
- Fetch from `https://image.pollinations.ai/prompt/{encoded_prompt}?width=800&height=480`
- Resize/fit to 800x480 bounding box, baseline JPEG
- Overlay masthead text ("Plymouth Independent") and date range using Pillow `ImageDraw`
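Building the request URL from headlines might look like the sketch below. The URL pattern and prompt wording come from the steps above; the function names, the `"; "` separator between titles, and the use of `urllib.parse.quote` for encoding are assumptions made for illustration.

```python
from urllib.parse import quote

BASE_URL = "https://image.pollinations.ai/prompt/"


def build_cover_prompt(titles, max_titles=3):
    # Assumption: headlines are joined with "; " inside the prompt.
    top = "; ".join(titles[:max_titles])
    return (
        "Newspaper front page illustration for Plymouth Massachusetts local news, "
        f"featuring: {top}, classic newspaper style"
    )


def build_cover_url(titles, width=800, height=480):
    # safe="" percent-encodes every reserved character, including "/" and "?",
    # so headline punctuation cannot alter the URL structure.
    encoded = quote(build_cover_prompt(titles), safe="")
    return f"{BASE_URL}{encoded}?width={width}&height={height}"
```

The resulting URL can be fetched with `requests`; a non-200 response or a timeout is the natural point at which to fall back to the text cover.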
Text fallback mode:
- Create 800x480 Pillow image with white background
- Draw bold "Plymouth Independent" masthead
- Date range subtitle
- List top article headlines
- Save as baseline JPEG
Both modes produce a single baseline JPEG within e-reader constraints.
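The relationship between the two modes, including the automatic fallback when the AI request fails, can be sketched as a small dispatcher. The function `generate_cover` and its callable parameters are hypothetical; the real module may structure this differently.

```python
import logging

logger = logging.getLogger(__name__)


def generate_cover(method, make_ai_cover, make_text_cover):
    """Return (method_used, cover_path).

    `make_ai_cover` and `make_text_cover` are zero-argument callables that
    each return the path of a finished cover image.
    """
    if method == "ai":
        try:
            return "ai", make_ai_cover()
        except Exception:
            # Broad catch is deliberate in this sketch: any AI failure
            # (network error, bad image data) should not block publishing.
            logger.exception("AI cover failed; falling back to text cover")
    return "text", make_text_cover()
```

Returning the method actually used lets the caller record the correct `cover_method` on the issue even when a fallback occurred.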
scheduler.py — Background Scheduling
- APScheduler `BackgroundScheduler`, started on app launch
- Two jobs:
  - RSS fetch: `IntervalTrigger`, default every 1 hour
  - Auto-publish (optional): `CronTrigger`, configurable day/time
- Schedule config persisted to `settings` table in SQLite
- On startup: read settings from DB, restore scheduler jobs
- Web UI can pause/resume/reconfigure jobs live
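One way to restore jobs on startup is to translate the persisted settings into trigger arguments first, then hand them to the scheduler. The sketch below covers only that translation and runs without APScheduler installed. The setting keys (`fetch_interval_hours`, `auto_publish`), the job ids, and the default day and time are all hypothetical.

```python
def build_job_specs(settings):
    """Turn a settings dict into a list of job specs (id, trigger, trigger_args)."""
    specs = [
        {
            "id": "rss_fetch",
            "trigger": "interval",
            "trigger_args": {"hours": settings.get("fetch_interval_hours", 1)},
        }
    ]
    auto = settings.get("auto_publish") or {}
    if auto.get("enabled"):
        specs.append(
            {
                "id": "auto_publish",
                "trigger": "cron",
                "trigger_args": {
                    # Defaults here are placeholders, not values from the spec.
                    "day_of_week": auto.get("day_of_week", "sun"),
                    "hour": auto.get("hour", 18),
                    "minute": auto.get("minute", 0),
                },
            }
        )
    return specs

# With APScheduler 3.x, each spec could then be registered roughly as:
#   scheduler.add_job(func, spec["trigger"], id=spec["id"],
#                     replace_existing=True, **spec["trigger_args"])
```

Keeping the translation separate from the scheduler makes it easy to unit-test and to re-run when the Settings page changes a value, with `replace_existing=True` swapping the old job for the new one.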
Web UI
Five views, all server-rendered with Jinja2. Responsive layout via Pico CSS.
Dashboard (/)
- Scheduler status (running/paused, next fetch, interval)
- Quick stats: articles this week, total cached, latest issue
- Buttons: "Fetch Now", "New Issue"
Articles (/articles)
- Table of cached articles, filterable by week and category
- Columns: title, author, date, categories, thumbnail
- When preparing an issue: checkboxes for include/exclude
Publish (/publish)
- Select target week (defaults to current ISO week)
- Article list with include/exclude toggles (all on by default)
- Cover method picker: "AI Cover" / "Text Cover"
- "Generate Issue" button
- Progress: synchronous POST request with a CSS spinner overlay; generation typically takes 5–15 seconds (dominated by Pollinations.ai round-trip if using AI cover)
- On completion: page reloads with download link and cover preview
Settings (/settings)
- RSS feed URL
- Fetch interval (hours)
- Auto-publish: toggle + day/time + default cover method
- Image resize constraints
Issues Archive (/issues)
- List of past issues: date range, article count, cover thumbnail
- Download link per issue
- "Regenerate" button
Error Handling
| Scenario | Behavior |
|---|---|
| RSS feed down | Log warning, skip cycle, retry next interval |
| Image download fails | Log warning, skip image, include article without it |
| Pollinations.ai fails | Log error, fall back to text cover automatically |
| ePub generation fails | Show error in UI with details, don't save partial issue |
| DB locked (concurrent access) | SQLite WAL mode for better concurrency; scheduler and web requests share the same process |
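Enabling WAL mode is a single pragma on a file-backed database. The sketch below uses the standard-library `sqlite3` module directly; how the pragma is applied through Flask-SQLAlchemy in the actual app is not shown here, and the helper name `connect_wal` and the 5-second timeout are assumptions.

```python
import sqlite3


def connect_wal(db_path, busy_timeout_s=5.0):
    """Open a SQLite connection and switch the database to WAL journaling.

    Returns (connection, journal_mode) so the caller can confirm the switch.
    """
    # timeout: how long a connection waits on a locked database before raising.
    conn = sqlite3.connect(db_path, timeout=busy_timeout_s)
    # The pragma returns the journal mode now in effect ("wal" on success).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode
```

WAL mode is stored in the database file, so it persists across connections once set. In-memory databases do not support it and report `memory` instead.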
Future Enhancements (Out of Scope for V1)
- Full web scraping of article pages for richer content
- Email delivery of issues
- Multiple RSS feed support
- Reading progress tracking
- Dark mode cover variants