Plymouth Independent Weekly Newspaper — Design Spec

Goal

Publish a weekly ePub "newspaper" containing articles from the Plymouth Independent RSS feed, optimized for reading on an Xteink X4 e-reader.

Requirements Summary

  • Output: ePub with articles as chapters, chronological order (Monday to Sunday ISO weeks)
  • Offline: All images downloaded and embedded
  • E-reader formatting: Images fit within 800x480 (landscape) or 480x800 (portrait) bounding box, aspect ratio preserved, baseline JPEG
  • Interface: Self-hosted Python web app, accessible via browser from MacBook and Android phone on local network
  • Pipeline: Periodic RSS fetch/cache, then manual or scheduled compile-and-publish
  • Cover: AI-generated via Pollinations.ai (primary), programmatic text fallback, selectable at publish time
  • Article selection: All articles included by default; user can exclude specific ones via UI before publishing

Architecture

Stack

| Component | Choice | Rationale |
|---|---|---|
| Web framework | Flask + Jinja2 | Lightweight, single-process |
| ORM / DB | Flask-SQLAlchemy + SQLite | Zero-config, single-file DB |
| Scheduler | APScheduler (BackgroundScheduler) | In-process, no external dependencies |
| RSS parsing | feedparser | Standard Python RSS library |
| ePub generation | ebooklib | Mature ePub 3 library |
| Image processing | Pillow | Resize, format conversion, text rendering |
| HTML parsing | beautifulsoup4 | Extract images from article HTML |
| HTTP | requests | Feed + image downloads |
| AI cover | Pollinations.ai | Free, no API key, URL-based |
| Frontend | Plain HTML + Pico CSS + vanilla JS | No build step, mobile-friendly |

Project Structure

pi-weekly-newspaper/
├── app.py                  # Entry point: Flask app + APScheduler setup
├── config.py               # Config (feed URL, check interval, image dims, etc.)
├── requirements.txt
├── src/
│   ├── __init__.py
│   ├── fetcher.py          # RSS fetch, parse, cache articles to DB
│   ├── images.py           # Download images, resize, baseline JPEG conversion
│   ├── epub_builder.py     # Assemble ePub from cached articles + images
│   ├── cover.py            # Cover generation (Pollinations.ai + text fallback)
│   ├── models.py           # SQLAlchemy models (Article, Image, Issue, Settings)
│   └── scheduler.py        # APScheduler config, job management
├── static/                 # CSS, JS for web UI
├── templates/              # Jinja2 templates for web UI
├── data/
│   ├── newspaper.db        # SQLite database (created at runtime)
│   ├── images/             # Downloaded/processed images (runtime)
│   └── issues/             # Generated ePub files (runtime)
└── README.md

Data Flow

  1. Fetch job (periodic, default every 1 hour): RSS feed → parse → store new articles + metadata in SQLite → download & process images to data/images/
  2. Publish action (manual via UI, or auto-scheduled): query articles for target week → user reviews/excludes via UI → generate cover → assemble ePub → save to data/issues/ → download link available

Data Model

articles

| Column | Type | Notes |
|---|---|---|
| id | INTEGER PK | Auto-increment |
| guid | TEXT UNIQUE | RSS <guid>, deduplication key |
| title | TEXT | Article title |
| author | TEXT | dc:creator value |
| pub_date | DATETIME | Publication timestamp |
| categories | TEXT | JSON array of category strings |
| link | TEXT | Original article URL |
| content_html | TEXT | Full content:encoded HTML with local image refs |
| fetched_at | DATETIME | When the article was cached |

images

| Column | Type | Notes |
|---|---|---|
| id | INTEGER PK | Auto-increment |
| article_id | INTEGER FK | References articles.id |
| original_url | TEXT | Source URL from the article HTML |
| local_path | TEXT | Path to processed file in data/images/ |
| width | INTEGER | Final width after resize |
| height | INTEGER | Final height after resize |
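
The two tables above could be expressed roughly as follows. This is a minimal sketch using the standard library's sqlite3 module rather than the Flask-SQLAlchemy models the project actually uses; the insert_article helper is hypothetical and only illustrates how the UNIQUE constraint on guid supports deduplication.

```python
import sqlite3

# Schema sketch mirroring the articles and images tables described above.
SCHEMA = """
CREATE TABLE articles (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    guid         TEXT UNIQUE NOT NULL,
    title        TEXT,
    author       TEXT,
    pub_date     TEXT,   -- ISO 8601 timestamp
    categories   TEXT,   -- JSON array of category strings
    link         TEXT,
    content_html TEXT,
    fetched_at   TEXT
);
CREATE TABLE images (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    article_id   INTEGER NOT NULL REFERENCES articles(id),
    original_url TEXT,
    local_path   TEXT,
    width        INTEGER,
    height       INTEGER
);
"""


def insert_article(conn, guid, title):
    """Hypothetical helper: insert unless the guid is already cached.

    Returns True when a new row was written, False for a duplicate.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO articles (guid, title) VALUES (?, ?)",
        (guid, title),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = insert_article(conn, "https://example.com/?p=1", "Town meeting recap")
second = insert_article(conn, "https://example.com/?p=1", "Town meeting recap")
```

Here the second call is ignored by the database because the guid already exists, so the fetch job can re-read the same feed every cycle without creating duplicates.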

issues

| Column | Type | Notes |
|---|---|---|
| id | INTEGER PK | Auto-increment |
| week_start | DATE | Monday of the ISO week |
| week_end | DATE | Sunday of the ISO week |
| cover_method | TEXT | "ai" or "text" |
| cover_path | TEXT | Path to cover image |
| epub_path | TEXT | Path to generated .epub |
| article_ids | TEXT | JSON array of included article IDs |
| excluded_article_ids | TEXT | JSON array of excluded article IDs |
| created_at | DATETIME | When the issue was generated |
| status | TEXT | "draft" / "published" |
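
As an illustration of how the JSON columns in this table might be filled, the sketch below builds a draft issue record from a list of candidate article IDs and the user's exclusions. The function name build_issue_record and the dict layout are assumptions made for this example, not part of the spec.

```python
import json
from datetime import date, timedelta


def build_issue_record(week_start, candidate_ids, excluded_ids, cover_method="ai"):
    """Hypothetical helper: assemble the column values for a draft issue."""
    if week_start.isoweekday() != 1:
        raise ValueError("week_start must be a Monday")
    if cover_method not in ("ai", "text"):
        raise ValueError("cover_method must be 'ai' or 'text'")

    excluded = set(excluded_ids) & set(candidate_ids)
    included = [a for a in candidate_ids if a not in excluded]
    return {
        "week_start": week_start.isoformat(),
        "week_end": (week_start + timedelta(days=6)).isoformat(),
        "cover_method": cover_method,
        "article_ids": json.dumps(included),              # JSON array of included IDs
        "excluded_article_ids": json.dumps(sorted(excluded)),
        "status": "draft",
    }


record = build_issue_record(date(2026, 4, 6), [1, 2, 3, 4], [2, 4])
```

Storing both lists keeps the user's exclusions available if the issue is regenerated later.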

settings

| Column | Type | Notes |
|---|---|---|
| key | TEXT PK | Setting name |
| value | TEXT | JSON-encoded value |

Used for: feed URL, fetch interval, auto-publish config, image constraints. Read on startup to restore scheduler state.
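
A key/value table with JSON-encoded values can be read and written with two small helpers. The sketch below uses the standard library's sqlite3 module instead of the project's SQLAlchemy model, and the setting names shown (fetch_interval_hours, auto_publish) are illustrative assumptions.

```python
import json
import sqlite3


def set_setting(conn, key, value):
    """Store any JSON-serializable value under the given key."""
    conn.execute(
        "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
        (key, json.dumps(value)),
    )
    conn.commit()


def get_setting(conn, key, default=None):
    """Return the decoded value, or the default when the key is absent."""
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return default if row is None else json.loads(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
set_setting(conn, "fetch_interval_hours", 1)
set_setting(conn, "auto_publish", {"enabled": True, "day_of_week": "sun", "hour": 18})
```

JSON encoding lets a single TEXT column hold numbers, strings, and nested structures such as the auto-publish configuration.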


Module Details

fetcher.py — RSS Fetch & Article Caching

  1. Fetch RSS feed via feedparser + requests
  2. Deduplicate by guid — skip articles already in DB
  3. Parse each new <item>: title, author, pub_date, categories, link, content_html
  4. Save article record to SQLite first (to obtain article_id)
  5. Extract image URLs from content:encoded HTML using BeautifulSoup with html.parser
  6. Download & process each image via images.py — store to data/images/{url_hash}.jpg (deduped by URL hash across all articles)
  7. Create images DB records linking article_id to each processed image
  8. Rewrite <img src> attributes in stored content_html to point to local paths
  9. Update the article record with the rewritten content_html
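
Steps 5, 6, and 8 can be sketched as below. The spec calls for BeautifulSoup; this example swaps in the standard library's html.parser so it runs without dependencies. The choice of SHA-256 truncated to 16 hex characters for url_hash is an assumption, and the string-based rewrite assumes double-quoted src attributes with no HTML entities in the URL.

```python
import hashlib
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect unique <img src> values in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src and src not in self.urls:
                self.urls.append(src)


def local_image_path(url, image_dir="data/images"):
    """Derive a stable local path from the URL (hash scheme is an assumption)."""
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return f"{image_dir}/{url_hash}.jpg"


def rewrite_image_refs(html):
    """Return (rewritten_html, {original_url: local_path})."""
    collector = ImgSrcCollector()
    collector.feed(html)
    mapping = {url: local_image_path(url) for url in collector.urls}
    for url, path in mapping.items():
        # Simplification: only double-quoted, entity-free src values are rewritten.
        html = html.replace(f'src="{url}"', f'src="{path}"')
    return html, mapping


sample = (
    '<p>Hi</p><img src="https://example.com/a.png">'
    '<img src="https://example.com/a.png" alt="dup">'
)
new_html, mapping = rewrite_image_refs(sample)
```

Because the path depends only on the URL, an image that appears in several articles maps to the same file and is downloaded once.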

Edge cases:

  • Feed unavailable: log warning, retry next cycle, no crash
  • Duplicate images across articles (same URL): download once, reference by URL hash
  • Images that 404: log warning, skip image, article still included
  • Malformed HTML: BeautifulSoup with html.parser is tolerant

images.py — Image Processing

  1. Check if data/images/{url_hash}.jpg already exists; if so, return the cached path (dedup) without downloading
  2. Otherwise, download the image from its URL via requests
  3. Open with Pillow
  4. Determine orientation: if width >= height → landscape bounding box (800x480), else portrait (480x800)
  5. Resize to fit within bounding box, preserving aspect ratio:
    • If image is larger than the box: use Image.thumbnail() to scale down
    • If image is smaller than the box: use Image.resize() with LANCZOS to scale up, so it renders at a reasonable size on the e-reader
  6. Save as baseline JPEG (progressive=False)
  7. Return local path and final dimensions
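
The orientation and fit rules in steps 4 and 5 reduce to a small size calculation. The sketch below is one way to compute the target dimensions; the function name target_size is hypothetical, and integer cross-multiplication is used to decide which side of the box is limiting so the choice does not depend on floating-point rounding.

```python
LANDSCAPE_BOX = (800, 480)
PORTRAIT_BOX = (480, 800)


def target_size(width, height):
    """Return (new_width, new_height) that fits the bounding box.

    Images with width >= height use the landscape box, others the portrait
    box. The result touches the box on at least one side, so larger images
    shrink and smaller images grow, with aspect ratio preserved.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")

    box_w, box_h = LANDSCAPE_BOX if width >= height else PORTRAIT_BOX

    if width * box_h >= height * box_w:
        # Image is relatively wider than the box: width is the limiting side.
        return box_w, max(1, round(height * box_w / width))
    # Otherwise height is the limiting side.
    return max(1, round(width * box_h / height)), box_h


print(target_size(1600, 1200))  # large landscape photo
print(target_size(300, 400))    # small portrait image, scaled up
```

The resulting tuple could then be passed to Pillow's resize with the LANCZOS filter, which handles both the scale-down and scale-up cases in a single call.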

epub_builder.py — ePub Assembly

  1. Query articles for target ISO week (Monday to Sunday), minus excluded ones
  2. Sort chronologically by pub_date
  3. Build ePub structure with ebooklib:
    • Metadata: title ("Plymouth Independent — Week of Apr 6-12, 2026"), language (en)
    • Cover: generated JPEG as ePub cover image
    • Table of Contents: article titles linked to chapters
    • Chapters: one per article, chronological
  4. Each chapter:
    • <h1> article title
    • Author/date byline, category tags
    • Article HTML with image src rewritten to ePub-internal references
    • All referenced images embedded as ePub items
  5. Stylesheet: minimal CSS for e-ink — no colors, high contrast, images max-width: 100%; display: block
  6. Output: data/issues/plymouth-independent-2026-W15.epub
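
Steps 1, 2, and 6 depend on mapping an ISO week to calendar dates. A minimal sketch, assuming articles are available as dicts with id and pub_date keys (the real code would query the database instead); the helper names are hypothetical.

```python
from datetime import date, datetime, timedelta


def week_bounds(iso_year, iso_week):
    """Return (monday, sunday) for the given ISO week."""
    monday = date.fromisocalendar(iso_year, iso_week, 1)
    return monday, monday + timedelta(days=6)


def issue_filename(iso_year, iso_week):
    """Build the output filename in the pattern used by step 6."""
    return f"plymouth-independent-{iso_year}-W{iso_week:02d}.epub"


def articles_for_week(articles, iso_year, iso_week, excluded_ids=()):
    """Filter to the target week, drop exclusions, sort chronologically."""
    monday, sunday = week_bounds(iso_year, iso_week)
    excluded = set(excluded_ids)
    chosen = [
        a for a in articles
        if monday <= a["pub_date"].date() <= sunday and a["id"] not in excluded
    ]
    return sorted(chosen, key=lambda a: a["pub_date"])


articles = [
    {"id": 1, "pub_date": datetime(2026, 4, 8, 9, 0)},
    {"id": 2, "pub_date": datetime(2026, 4, 6, 7, 30)},
    {"id": 3, "pub_date": datetime(2026, 4, 13, 8, 0)},   # following week
    {"id": 4, "pub_date": datetime(2026, 4, 12, 22, 0)},  # excluded below
]
selected = articles_for_week(articles, 2026, 15, excluded_ids=[4])
```

date.fromisocalendar requires Python 3.8 or later.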

cover.py — Cover Generation

AI mode (Pollinations.ai):

  1. Build a prompt from the week's top headlines: "Newspaper front page illustration for Plymouth Massachusetts local news, featuring: [top 3 titles], classic newspaper style"
  2. Fetch from https://image.pollinations.ai/prompt/{encoded_prompt}?width=800&height=480
  3. Resize/fit to 800x480 bounding box, baseline JPEG
  4. Overlay masthead text ("Plymouth Independent") and date range using Pillow ImageDraw
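
Steps 1 and 2 amount to building a prompt string and percent-encoding it into the URL path. The sketch below assumes "top 3" means the first three titles in the list it is given and joins them with semicolons; both choices are illustrative.

```python
from urllib.parse import quote

BASE_URL = "https://image.pollinations.ai/prompt/"


def build_cover_prompt(titles, max_titles=3):
    """Compose the prompt text from the leading headlines."""
    top = "; ".join(titles[:max_titles])
    return (
        "Newspaper front page illustration for Plymouth Massachusetts "
        f"local news, featuring: {top}, classic newspaper style"
    )


def build_cover_url(titles, width=800, height=480):
    """Percent-encode the prompt and append the size parameters."""
    prompt = build_cover_prompt(titles)
    # safe="" also encodes "/" so a headline cannot alter the URL path.
    return f"{BASE_URL}{quote(prompt, safe='')}?width={width}&height={height}"


url = build_cover_url(
    ["Harbor dredging begins", "School budget vote", "Library reopens", "Extra story"]
)
```

The resulting URL can be fetched with requests like any other image download.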

Text fallback mode:

  1. Create 800x480 Pillow image with white background
  2. Draw bold "Plymouth Independent" masthead
  3. Date range subtitle
  4. List top article headlines
  5. Save as baseline JPEG

Both modes produce a single baseline JPEG within e-reader constraints.
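
The Error Handling section states that a Pollinations.ai failure falls back to the text cover automatically. One way to structure that is shown below; the function name and the idea of passing the two generators in as callables are assumptions made to keep the example self-contained.

```python
import logging

logger = logging.getLogger("cover")


def generate_cover(ai_generator, text_generator, method="ai"):
    """Return (method_used, cover_path).

    ai_generator and text_generator are zero-argument callables that each
    return the path of a saved JPEG. Any exception from the AI path is
    logged and the text cover is produced instead.
    """
    if method == "ai":
        try:
            return "ai", ai_generator()
        except Exception:
            logger.exception("AI cover failed; falling back to text cover")
    return "text", text_generator()


def failing_ai():
    raise TimeoutError("simulated Pollinations.ai timeout")


def text_cover():
    return "data/issues/cover-text.jpg"


used, path = generate_cover(failing_ai, text_cover)
```

Returning the method actually used lets the caller record the correct value in the cover_method column of the issues table.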

scheduler.py — Background Scheduling

  • APScheduler BackgroundScheduler, started on app launch
  • Two jobs:
    1. RSS fetch: IntervalTrigger, default every 1 hour
    2. Auto-publish (optional): CronTrigger, configurable day/time
  • Schedule config persisted to settings table in SQLite
  • On startup: read settings from DB, restore scheduler jobs
  • Web UI can pause/resume/reconfigure jobs live
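
Restoring jobs on startup can be reduced to translating stored settings into job definitions. The sketch below does only that translation and does not import APScheduler; the setting keys (fetch_interval_hours, auto_publish) and job IDs are hypothetical. Each dict is shaped so that it could be passed as keyword arguments to the scheduler's add_job() along with the job function.

```python
def job_specs(settings):
    """Translate a settings dict into scheduler job definitions."""
    specs = [
        {
            "id": "rss_fetch",
            "trigger": "interval",
            "hours": settings.get("fetch_interval_hours", 1),
        }
    ]

    auto = settings.get("auto_publish") or {}
    if auto.get("enabled"):
        specs.append(
            {
                "id": "auto_publish",
                "trigger": "cron",
                "day_of_week": auto.get("day_of_week", "sun"),
                "hour": auto.get("hour", 18),
                "minute": auto.get("minute", 0),
            }
        )
    return specs


defaults = job_specs({})
with_publish = job_specs(
    {"fetch_interval_hours": 2, "auto_publish": {"enabled": True, "day_of_week": "sat"}}
)
```

Keeping the translation separate from the scheduler makes it testable without starting background threads, which fits the start_scheduler flag used to skip the scheduler in tests.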

Web UI

Five views, all server-rendered with Jinja2. Responsive layout via Pico CSS.

Dashboard (/)

  • Scheduler status (running/paused, next fetch, interval)
  • Quick stats: articles this week, total cached, latest issue
  • Buttons: "Fetch Now", "New Issue"

Articles (/articles)

  • Table of cached articles, filterable by week and category
  • Columns: title, author, date, categories, thumbnail
  • When preparing an issue: checkboxes for include/exclude

Publish (/publish)

  • Select target week (defaults to current ISO week)
  • Article list with include/exclude toggles (all on by default)
  • Cover method picker: "AI Cover" / "Text Cover"
  • "Generate Issue" button
  • Progress: synchronous POST request with a CSS spinner overlay; generation typically takes 5-15 seconds (dominated by Pollinations.ai round-trip if using AI cover)
  • On completion: page reloads with download link and cover preview

Settings (/settings)

  • RSS feed URL
  • Fetch interval (hours)
  • Auto-publish: toggle + day/time + default cover method
  • Image resize constraints

Issues Archive (/issues)

  • List of past issues: date range, article count, cover thumbnail
  • Download link per issue
  • "Regenerate" button

Error Handling

| Scenario | Behavior |
|---|---|
| RSS feed down | Log warning, skip cycle, retry next interval |
| Image download fails | Log warning, skip image, include article without it |
| Pollinations.ai fails | Log error, fall back to text cover automatically |
| ePub generation fails | Show error in UI with details, don't save partial issue |
| DB locked (concurrent access) | SQLite WAL mode for better concurrency; scheduler and web requests share the same process |
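
WAL mode is enabled per database file with a PRAGMA. The sketch below shows this with the standard library's sqlite3 module on a temporary file; the helper name and the 5-second busy timeout are assumptions, and in the real app the same PRAGMA would need to be issued through the SQLAlchemy connection.

```python
import os
import sqlite3
import tempfile


def open_wal_connection(db_path, busy_timeout_s=5.0):
    """Open the database and switch it to write-ahead logging.

    Returns (connection, journal_mode). The timeout makes a writer wait
    for a lock instead of failing immediately with "database is locked".
    """
    conn = sqlite3.connect(db_path, timeout=busy_timeout_s)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


with tempfile.TemporaryDirectory() as tmp:
    conn, mode = open_wal_connection(os.path.join(tmp, "newspaper.db"))
    conn.close()
```

The journal mode is persistent once set on a file-backed database, so it only needs to be applied when the database is first created, though re-issuing it on startup is harmless.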

Future Enhancements (Out of Scope for V1)

  • Full web scraping of article pages for richer content
  • Email delivery of issues
  • Multiple RSS feed support
  • Reading progress tracking
  • Dark mode cover variants