NtR SoundCloud Fetcher — Design Document

Date: 2026-03-12
Status: Approved

Purpose

A service that periodically fetches SoundCloud likes from NicktheRat's profile, builds weekly playlists aligned to his Wednesday 22:00 ET show schedule, and exposes them via a JSON API for an IRC bot to query track info by position number (!1, !2, etc.).

Architecture

Single Python process with three internal responsibilities:

  1. API server — FastAPI on a configurable port, serves playlist data as JSON.
  2. Poller — async background task that fetches Nick's SoundCloud likes every hour.
  3. Supervisor — monitors the poller task, restarts it on failure without affecting the API.

The poller and API run as independent asyncio tasks. If the poller crashes, the supervisor catches the exception, logs it, waits a backoff period, and restarts the poller. The API continues serving from the last-known-good SQLite data.
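A minimal sketch of the supervisor loop described above, assuming the poller is exposed as a coroutine function. The names `supervise` and `run_poller` are illustrative, not taken from the codebase; the 30-second default mirrors the backoff listed under error scenarios.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

log = logging.getLogger("ntr_fetcher.supervisor")


async def supervise(
    run_poller: Callable[[], Awaitable[None]],  # hypothetical poller entry point
    backoff_seconds: float = 30.0,
) -> None:
    """Run the poller forever, restarting it after any unexpected exception."""
    while True:
        try:
            await run_poller()
            log.warning("poller returned unexpectedly; restarting")
        except asyncio.CancelledError:
            # Shutdown was requested: propagate instead of restarting.
            raise
        except Exception:
            log.exception("poller crashed; restarting in %ss", backoff_seconds)
        # Wait before restarting so a persistent failure does not spin.
        await asyncio.sleep(backoff_seconds)
```

Because the supervisor is its own task, an exception in the poller never reaches the task that runs the API server.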

External process management (systemd with Restart=on-failure) handles whole-process crashes. The service does not try to be its own process manager.

Startup sequence

  1. Open/create SQLite database, run migrations.
  2. Check whether the current week's playlist exists. If it is missing or stale, do an immediate fetch.
  3. Start the API server.
  4. Start the poller on its hourly schedule.

Data Model (SQLite)

tracks

Canonical store of every SoundCloud track Nick has liked.

| Column | Type | Notes |
|---|---|---|
| id | INTEGER PK | SoundCloud track ID (not auto-increment) |
| title | TEXT | Track title |
| artist | TEXT | track.user.username from the API |
| permalink_url | TEXT | Full SoundCloud URL |
| artwork_url | TEXT | Nullable |
| duration_ms | INTEGER | Duration in milliseconds |
| license | TEXT | e.g. cc-by-sa |
| liked_at | TEXT | ISO 8601; when Nick liked it |
| raw_json | TEXT | Full track JSON blob |

shows

One row per weekly show.

| Column | Type | Notes |
|---|---|---|
| id | INTEGER PK | Auto-increment |
| week_start | TEXT | ISO 8601 UTC of the Wednesday 22:00 ET boundary that opens this week |
| week_end | TEXT | ISO 8601 UTC of the next Wednesday 22:00 ET boundary |
| created_at | TEXT | When this row was created |

show_tracks

Join table linking tracks to shows with position.

| Column | Type | Notes |
|---|---|---|
| show_id | INTEGER FK | References shows.id |
| track_id | INTEGER FK | References tracks.id |
| position | INTEGER | 1-indexed; maps to !1, !2, etc. |

Constraint: UNIQUE (show_id, track_id)
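The three tables above could be created along these lines. This is a sketch only: the column names and types follow the tables, while `init_db`, the `SCHEMA` constant, and the decision to enable foreign-key enforcement are assumptions, and no NOT NULL constraints are shown because the document does not specify them.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    id            INTEGER PRIMARY KEY,  -- SoundCloud track ID, supplied by the caller
    title         TEXT,
    artist        TEXT,
    permalink_url TEXT,
    artwork_url   TEXT,                 -- nullable
    duration_ms   INTEGER,
    license       TEXT,
    liked_at      TEXT,                 -- ISO 8601
    raw_json      TEXT
);
CREATE TABLE IF NOT EXISTS shows (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    week_start TEXT,                    -- ISO 8601 UTC
    week_end   TEXT,                    -- ISO 8601 UTC
    created_at TEXT
);
CREATE TABLE IF NOT EXISTS show_tracks (
    show_id  INTEGER REFERENCES shows(id),
    track_id INTEGER REFERENCES tracks(id),
    position INTEGER,                   -- 1-indexed
    UNIQUE (show_id, track_id)
);
"""


def init_db(conn: sqlite3.Connection) -> None:
    """Create the tables if they do not exist yet (hypothetical helper)."""
    # SQLite only enforces REFERENCES clauses when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```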

Position assignment: likes are sorted by liked_at ascending (oldest first) and assigned positions 1, 2, 3... A new like mid-week gets the next free position, so adding a like never shifts existing positions. Positions change only when a track is removed or moved.

If Nick unlikes a track, the poller removes it from the show and re-compacts positions (e.g. if track at position 2 is removed, position 3 becomes 2). The tracks table retains the record for historical reference, but the show_tracks link is deleted.
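The append and re-compact rules can be sketched as two small functions over show_tracks. The function names are hypothetical; the SQL assumes the column layout described above and nothing else.

```python
import sqlite3


def append_track(conn: sqlite3.Connection, show_id: int, track_id: int) -> int:
    """Give a newly liked track the next free position; existing rows are untouched."""
    with conn:  # commit on success, roll back on error
        (max_pos,) = conn.execute(
            "SELECT COALESCE(MAX(position), 0) FROM show_tracks WHERE show_id = ?",
            (show_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO show_tracks (show_id, track_id, position) VALUES (?, ?, ?)",
            (show_id, track_id, max_pos + 1),
        )
    return max_pos + 1


def remove_and_recompact(conn: sqlite3.Connection, show_id: int, track_id: int) -> None:
    """Delete the show_tracks link and close the gap it leaves behind."""
    with conn:
        row = conn.execute(
            "SELECT position FROM show_tracks WHERE show_id = ? AND track_id = ?",
            (show_id, track_id),
        ).fetchone()
        if row is None:
            return  # track is not in this show; nothing to do
        conn.execute(
            "DELETE FROM show_tracks WHERE show_id = ? AND track_id = ?",
            (show_id, track_id),
        )
        # Every track after the removed one moves up by one position.
        conn.execute(
            "UPDATE show_tracks SET position = position - 1 "
            "WHERE show_id = ? AND position > ?",
            (show_id, row[0]),
        )
```

The tracks row is deliberately left alone, matching the rule that the record is kept for historical reference.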

API Endpoints

All endpoints return JSON. Base URL: http://localhost:{port}.

Current Week

| Method | Path | Description |
|---|---|---|
| GET | /playlist | Current week's full playlist |
| GET | /playlist/{position} | Single track by position |

GET /playlist response shape:

{
  "show_id": 12,
  "week_start": "2026-03-12T02:00:00Z",
  "week_end": "2026-03-19T02:00:00Z",
  "tracks": [
    {
      "position": 1,
      "title": "Running Through My Mind",
      "artist": "Purrple Panther",
      "permalink_url": "https://soundcloud.com/...",
      "artwork_url": "https://...",
      "duration_ms": 202909,
      "liked_at": "2026-03-09T02:24:30Z"
    }
  ]
}

GET /playlist/{position} returns a single track object, or 404 if the position doesn't exist in the current show.

History

| Method | Path | Description |
|---|---|---|
| GET | /shows | List all shows, newest first. Supports ?limit= and ?offset=. |
| GET | /shows/{show_id} | Full playlist for a specific show |

Admin (bearer token required)

| Method | Path | Description |
|---|---|---|
| POST | /admin/refresh | Trigger immediate SoundCloud fetch. {"full": true} re-fetches the entire week; default is incremental. |
| POST | /admin/tracks | Add a track to the current show. Body: {"soundcloud_url": "..."} or {"track_id": 12345, "position": 5} |
| DELETE | /admin/tracks/{track_id} | Remove a track from the current show. Remaining positions re-compact. |
| PUT | /admin/tracks/{track_id}/position | Move a track to a different position. Body: {"position": 3} |

Health

| Method | Path | Description |
|---|---|---|
| GET | /health | Poller status, last fetch time, current week track count |

Admin endpoints are protected by a bearer token (NTR_ADMIN_TOKEN). Read endpoints are unauthenticated and rely on the service binding to localhost (NTR_HOST defaults to 127.0.0.1).

Poller & Fetcher Logic

Polling cycle

  1. Compute current week's boundary window (Wednesday 22:00 ET -> next Wednesday 22:00 ET, converted to UTC accounting for DST via zoneinfo).
  2. Ensure a shows row exists for this window.
  3. Fetch likes from SoundCloud.
  4. Sync the show playlist: upsert new tracks, remove unliked tracks, re-compact positions.
  5. Sleep for the configured interval.
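Step 1 can be sketched with zoneinfo as follows. `week_window` is a hypothetical name; the defaults mirror NTR_SHOW_DAY and NTR_SHOW_HOUR, and America/New_York is assumed as the zone behind "ET". The boundary is built from a calendar date plus a local wall-clock time, so it stays at 22:00 local across DST changes.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")  # assumed zone for "ET"


def _boundary_utc(day: date, show_hour: int) -> datetime:
    """The show boundary on a given local calendar day, expressed in UTC."""
    local = datetime.combine(day, time(show_hour), tzinfo=EASTERN)
    return local.astimezone(timezone.utc)


def week_window(
    now_utc: datetime, show_day: int = 2, show_hour: int = 22
) -> tuple[datetime, datetime]:
    """Return (week_start, week_end) in UTC for the window containing now_utc.

    now_utc must be timezone-aware. show_day uses 0=Mon, as datetime.weekday().
    """
    local_now = now_utc.astimezone(EASTERN)
    days_back = (local_now.weekday() - show_day) % 7
    start_day = local_now.date() - timedelta(days=days_back)
    if _boundary_utc(start_day, show_hour) > now_utc:
        # It is show day but before show time: the window opened a week earlier.
        start_day -= timedelta(days=7)
    return (
        _boundary_utc(start_day, show_hour),
        _boundary_utc(start_day + timedelta(days=7), show_hour),
    )
```

A window that spans a DST change is 167 or 169 hours long rather than 168, which is why week_end is computed from the calendar rather than by adding a fixed number of hours to week_start.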

Incremental fetching

The poller does not re-fetch all likes every hour. It uses cursor-seeking:

  • First fetch for a new week: craft a synthetic cursor at the week's end boundary, paginate backward until hitting the week's start boundary.
  • Subsequent fetches: craft a cursor at "now", paginate backward until hitting a track already in the database. Most hourly polls fetch a single page or zero pages.
  • Full refresh (POST /admin/refresh with {"full": true}): re-fetches the entire week from scratch, same as the first-fetch path.
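The stop conditions above can be sketched independently of the HTTP layer. Everything here is illustrative: `fetch_page` stands in for whatever function requests one page of likes (newest first) and returns the next cursor, cursors are treated as opaque strings, and timestamps are assumed to be ISO 8601 UTC strings in a single fixed format so that string comparison matches chronological order.

```python
from collections.abc import Callable
from typing import Optional

# One like: (track_id, liked_at as an ISO 8601 UTC string).
Like = tuple[int, str]
# fetch_page(cursor) -> (likes newest-first, next cursor or None when exhausted).
FetchPage = Callable[[Optional[str]], tuple[list[Like], Optional[str]]]


def collect_new_likes(
    fetch_page: FetchPage,
    start_cursor: Optional[str],
    known_ids: set[int],
    week_start_iso: str,
) -> list[Like]:
    """Paginate backward, stopping at a known track or at the week's start."""
    new: list[Like] = []
    cursor = start_cursor
    while True:
        likes, cursor = fetch_page(cursor)
        for track_id, liked_at in likes:
            if track_id in known_ids or liked_at < week_start_iso:
                return new  # everything older is already stored or out of window
            new.append((track_id, liked_at))
        if cursor is None:
            return new  # no more pages
```

For the first fetch of a week or a full refresh, known_ids would be empty, so only the week_start boundary stops the walk.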

client_id management

  • Extract from soundcloud.com HTML (__sc_hydration -> apiClient -> id) on startup.
  • Cache in memory (not persisted — rotates too frequently).
  • On any 401 response, re-extract and retry.
  • If re-extraction fails, log the error and let the next tick retry.

Retry & backoff

Each SoundCloud HTTP call gets 3 attempts with exponential backoff (2s, 4s, 8s) and a 15-second request timeout. A 401 triggers a client_id refresh before the retry and does not count against the attempts.
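The retry policy might look like the sketch below. It reads "3 attempts" as three failures before giving up, with 2s and 4s waits between them; the 8s step would only be reached with a higher attempt limit. The exception classes, `call_with_retry`, and the cap on client_id refreshes are assumptions added for illustration, not part of the design.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


class Unauthorized(Exception):
    """Hypothetical marker for a 401 response."""


class TransientError(Exception):
    """Hypothetical marker for a 5xx response or a network timeout."""


async def call_with_retry(
    request: Callable[[], Awaitable[Any]],
    refresh_client_id: Callable[[], Awaitable[None]],
    attempts: int = 3,
    base_delay: float = 2.0,
    max_refreshes: int = 1,  # assumed cap so a bad client_id cannot loop forever
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Any:
    failures = 0
    refreshes = 0
    while True:
        try:
            return await request()
        except Unauthorized:
            # A 401 does not consume an attempt; refresh and go again.
            refreshes += 1
            if refreshes > max_refreshes:
                raise
            await refresh_client_id()
        except TransientError:
            failures += 1
            if failures >= attempts:
                raise  # caller skips this tick
            await sleep(base_delay * 2 ** (failures - 1))  # 2s, 4s, ...
```

Passing `sleep` as a parameter keeps the function testable without real delays.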

Error scenarios

| Scenario | Behavior |
|---|---|
| SoundCloud 401 | Refresh client_id, retry |
| SoundCloud 429 | Back off, retry next tick |
| SoundCloud 5xx | Retry with backoff, skip tick after 3 failures |
| Network timeout | Same as 5xx |
| client_id extraction failure | Log error, skip tick, retry next hour |
| Poller task crash | Supervisor restarts after 30s backoff |
| Nick unlikes a track | Track removed from show, positions re-compacted |

Project Structure

NtR-soundcloud-fetcher/
├── docs/
│   ├── soundcloud-likes-api.md
│   └── plans/
│       └── 2026-03-12-ntr-soundcloud-fetcher-design.md
├── src/
│   └── ntr_fetcher/
│       ├── __init__.py
│       ├── main.py          # entry point — starts API + poller
│       ├── config.py         # settings from env vars / .env
│       ├── db.py             # SQLite connection, migrations, queries
│       ├── models.py         # dataclasses for Track, Show, ShowTrack
│       ├── soundcloud.py     # client_id extraction, likes fetching
│       ├── poller.py         # polling loop + supervisor
│       ├── api.py            # FastAPI routes
│       └── week.py           # week boundary computation (ET → UTC)
├── tests/
├── pyproject.toml
└── README.md

Dependencies

| Package | Purpose |
|---|---|
| fastapi | API framework |
| uvicorn | ASGI server |
| httpx | Async HTTP client for SoundCloud |
| pydantic | Config + response models (bundled with FastAPI) |

Standard library: sqlite3, zoneinfo, asyncio, dataclasses, json.

No ORM. Raw SQL via sqlite3, wrapped in asyncio.to_thread for async compatibility.
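A minimal sketch of the asyncio.to_thread wrapping, assuming one short-lived connection per call. The helper names are hypothetical. The connection is opened inside the worker thread because a sqlite3 connection may, by default, only be used from the thread that created it.

```python
import asyncio
import sqlite3


def _query_sync(db_path: str, sql: str, params: tuple = ()) -> list[tuple]:
    """Blocking helper: runs entirely inside one worker thread."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


async def query(db_path: str, sql: str, params: tuple = ()) -> list[tuple]:
    """Run a statement without blocking the event loop."""
    return await asyncio.to_thread(_query_sync, db_path, sql, params)
```

A long-lived shared connection is also possible, but it would need check_same_thread=False plus a lock to serialize access, so the per-call connection is the simpler starting point.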

Python 3.11+.

Configuration

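Settings come from environment variables, loaded by pydantic's BaseSettings; a .env file is also supported. Note that in Pydantic v2, BaseSettings is provided by the separate pydantic-settings package rather than by pydantic itself.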

| Variable | Default | Description |
|---|---|---|
| NTR_PORT | 8000 | API listen port |
| NTR_HOST | 127.0.0.1 | API bind address |
| NTR_DB_PATH | ./ntr_fetcher.db | SQLite database file path |
| NTR_POLL_INTERVAL_SECONDS | 3600 | Polling interval |
| NTR_ADMIN_TOKEN | (required) | Bearer token for admin endpoints |
| NTR_SOUNDCLOUD_USER | nicktherat | SoundCloud username to track |
| NTR_SHOW_DAY | 2 | Day of week (0=Mon, 2=Wed) |
| NTR_SHOW_HOUR | 22 | Hour in Eastern Time |