NtR SoundCloud Fetcher — Design Document
Date: 2026-03-12 Status: Approved
Purpose
A service that periodically fetches SoundCloud likes from NicktheRat's profile, builds weekly playlists aligned to his Wednesday 22:00 ET show schedule, and exposes them via a JSON API for an IRC bot to query track info by position number (!1, !2, etc.).
Architecture
Single Python process with three internal responsibilities:
- API server — FastAPI on a configurable port, serves playlist data as JSON.
- Poller — async background task that fetches Nick's SoundCloud likes every hour.
- Supervisor — monitors the poller task, restarts it on failure without affecting the API.
The poller and API run as independent asyncio tasks. If the poller crashes, the supervisor catches the exception, logs it, waits a backoff period, and restarts the poller. The API continues serving from the last-known-good SQLite data.
External process management (systemd with Restart=on-failure) handles whole-process crashes. The service does not try to be its own process manager.
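The supervisor behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `supervise`, `task_factory`, and the injectable `sleep` parameter are hypothetical names, and the 30-second default mirrors the backoff listed under Error scenarios.

```python
import asyncio
import logging

log = logging.getLogger("ntr_fetcher.supervisor")  # hypothetical logger name


async def supervise(task_factory, backoff_seconds=30.0, sleep=asyncio.sleep):
    """Run task_factory() forever, restarting it after any unhandled exception.

    task_factory is a zero-argument callable returning a fresh coroutine,
    e.g. the poller loop. `sleep` is injectable so tests can skip the wait.
    """
    while True:
        try:
            await task_factory()
            log.warning("poller returned unexpectedly; restarting")
        except asyncio.CancelledError:
            # Shutdown was requested: propagate instead of restarting.
            raise
        except Exception:
            log.exception("poller crashed; restarting in %ss", backoff_seconds)
        await sleep(backoff_seconds)
```

Because the supervisor only wraps the poller coroutine, an exception there never reaches the API task, which keeps serving from SQLite.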
Startup sequence
- Open/create SQLite database, run migrations.
- Check whether the current week's playlist exists. If it is missing or stale, do an immediate fetch.
- Start the API server.
- Start the poller on its hourly schedule.
Data Model (SQLite)
tracks
Canonical store of every SoundCloud track Nick has liked.
| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | SoundCloud track ID (not auto-increment) |
| `title` | TEXT | Track title |
| `artist` | TEXT | `track.user.username` from the API |
| `permalink_url` | TEXT | Full SoundCloud URL |
| `artwork_url` | TEXT | Nullable |
| `duration_ms` | INTEGER | Duration in milliseconds |
| `license` | TEXT | e.g. `cc-by-sa` |
| `liked_at` | TEXT | ISO 8601 timestamp of when Nick liked it |
| `raw_json` | TEXT | Full track JSON blob |
shows
One row per weekly show.
| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `week_start` | TEXT | ISO 8601 UTC of the Wednesday 22:00 ET boundary that opens this week |
| `week_end` | TEXT | ISO 8601 UTC of the next Wednesday 22:00 ET boundary |
| `created_at` | TEXT | When this row was created |
show_tracks
Join table linking tracks to shows with position.
| Column | Type | Notes |
|---|---|---|
| `show_id` | INTEGER FK | References `shows.id` |
| `track_id` | INTEGER FK | References `tracks.id` |
| `position` | INTEGER | 1-indexed; maps to !1, !2, etc. |
| | UNIQUE | (show_id, track_id) |
Position assignment: likes sorted by liked_at ascending (oldest first), positions assigned 1, 2, 3... New likes mid-week get the next position; existing positions never shift.
Once a track is assigned a position in a show, it stays even if Nick unlikes it. Admin endpoints exist for manual corrections.
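The append-only position rule can be illustrated with a short sketch. It assumes a `show_tracks` table shaped like the one above (foreign-key clauses omitted for brevity); `assign_positions` and `SHOW_TRACKS_SCHEMA` are hypothetical names, and `liked_at` values are assumed to be UTC ISO 8601 strings in one consistent format so that string order matches chronological order.

```python
import sqlite3

# Simplified schema for illustration only (no foreign-key clauses).
SHOW_TRACKS_SCHEMA = """
CREATE TABLE IF NOT EXISTS show_tracks (
    show_id  INTEGER NOT NULL,
    track_id INTEGER NOT NULL,
    position INTEGER NOT NULL,
    UNIQUE (show_id, track_id)
);
"""


def assign_positions(conn, show_id, liked):
    """Append unseen tracks to a show; `liked` is an iterable of (track_id, liked_at)."""
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT track_id FROM show_tracks WHERE show_id = ?", (show_id,)
        )
    }
    next_pos = conn.execute(
        "SELECT COALESCE(MAX(position), 0) + 1 FROM show_tracks WHERE show_id = ?",
        (show_id,),
    ).fetchone()[0]
    # Oldest like first, so earlier likes receive lower positions.
    for track_id, _liked_at in sorted(liked, key=lambda pair: pair[1]):
        if track_id in existing:
            continue  # already placed: its position never changes
        conn.execute(
            "INSERT INTO show_tracks (show_id, track_id, position) VALUES (?, ?, ?)",
            (show_id, track_id, next_pos),
        )
        existing.add(track_id)
        next_pos += 1
    conn.commit()
```

Calling the function again with an overlapping batch leaves existing rows untouched and only appends the new likes.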
API Endpoints
All endpoints return JSON. Base URL: http://localhost:{port}.
Current Week
| Method | Path | Description |
|---|---|---|
| `GET` | `/playlist` | Current week's full playlist |
| `GET` | `/playlist/{position}` | Single track by position |
GET /playlist response shape:
{
  "show_id": 12,
  "week_start": "2026-03-12T02:00:00Z",
  "week_end": "2026-03-19T02:00:00Z",
  "tracks": [
    {
      "position": 1,
      "title": "Running Through My Mind",
      "artist": "Purrple Panther",
      "permalink_url": "https://soundcloud.com/...",
      "artwork_url": "https://...",
      "duration_ms": 202909,
      "liked_at": "2026-03-09T02:24:30Z"
    }
  ]
}
GET /playlist/{position} returns a single track object. 404 if the position doesn't exist.
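As an illustration of how the IRC bot might consume a single track object, here is a small sketch. The function name `format_track` and the reply layout are assumptions made for this example; only the field names come from the response shape above.

```python
def format_track(track):
    """Render one track object from the API as a single IRC reply line."""
    total_seconds = track["duration_ms"] // 1000  # drop the millisecond remainder
    minutes, seconds = divmod(total_seconds, 60)
    return (
        f'!{track["position"]} {track["artist"]} - {track["title"]} '
        f'[{minutes}:{seconds:02d}] {track["permalink_url"]}'
    )
```

The bot would call this only on a 200 response; a 404 means nothing has been assigned to that position yet.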
History
| Method | Path | Description |
|---|---|---|
| `GET` | `/shows` | List all shows, newest first. Supports `?limit=` and `?offset=`. |
| `GET` | `/shows/{show_id}` | Full playlist for a specific show |
Admin (bearer token required)
| Method | Path | Description |
|---|---|---|
| `POST` | `/admin/refresh` | Trigger immediate SoundCloud fetch. `{"full": true}` re-fetches the entire week; default is incremental. |
| `POST` | `/admin/tracks` | Add a track to the current show. Body: `{"soundcloud_url": "..."}` or `{"track_id": 12345, "position": 5}` |
| `DELETE` | `/admin/tracks/{track_id}` | Remove a track from the current show. Remaining positions re-compact. |
| `PUT` | `/admin/tracks/{track_id}/position` | Move a track to a different position. Body: `{"position": 3}` |
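
The re-compaction performed by the DELETE endpoint could look roughly like this. It is a sketch under the assumption that `show_tracks` has the columns described in the data model; `remove_track` is a hypothetical helper name.

```python
import sqlite3


def remove_track(conn, show_id, track_id):
    """Delete one track from a show and close the gap it leaves.

    Returns False when the track is not part of the show.
    """
    row = conn.execute(
        "SELECT position FROM show_tracks WHERE show_id = ? AND track_id = ?",
        (show_id, track_id),
    ).fetchone()
    if row is None:
        return False
    removed_position = row[0]
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            "DELETE FROM show_tracks WHERE show_id = ? AND track_id = ?",
            (show_id, track_id),
        )
        # Shift every later track up by one so positions stay contiguous.
        conn.execute(
            "UPDATE show_tracks SET position = position - 1 "
            "WHERE show_id = ? AND position > ?",
            (show_id, removed_position),
        )
    return True
```

Since the only UNIQUE constraint is on (show_id, track_id), the bulk position update cannot collide with itself.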
Health
| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Poller status, last fetch time, current week track count |
Admin endpoints are protected by a bearer token (NTR_ADMIN_TOKEN). Read endpoints require no authentication because the service is expected to be reachable from localhost only (NTR_HOST defaults to 127.0.0.1).
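A framework-independent sketch of the bearer-token check follows. `is_authorized` is a hypothetical helper; in the real service this logic would presumably sit behind a FastAPI dependency. The constant-time comparison via `hmac.compare_digest` is a suggestion of this example, not something the design mandates.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True only for an 'Authorization: Bearer <token>' header that matches."""
    if not authorization_header or not expected_token:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```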
Poller & Fetcher Logic
Polling cycle
- Compute current week's boundary window (Wednesday 22:00 ET -> next Wednesday 22:00 ET, converted to UTC accounting for DST via `zoneinfo`).
- Ensure a `shows` row exists for this window.
- Fetch likes from SoundCloud.
- Upsert new tracks, assign positions, update `show_tracks`.
- Sleep for the configured interval.
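The boundary computation in the first step can be sketched with `zoneinfo` as below. `week_window` is a hypothetical name; the defaults mirror NTR_SHOW_DAY and NTR_SHOW_HOUR, and the `America/New_York` zone is assumed to be what "ET" means here.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")  # assumed zone for "ET"


def week_window(now_utc, show_day=2, show_hour=22):
    """Return (week_start, week_end) in UTC for the show week containing now_utc.

    now_utc must be timezone-aware. show_day uses Python's weekday numbering
    (0=Mon, 2=Wed).
    """
    local_now = now_utc.astimezone(EASTERN)
    days_back = (local_now.weekday() - show_day) % 7
    start_date = local_now.date() - timedelta(days=days_back)
    start = datetime.combine(start_date, time(show_hour), tzinfo=EASTERN)
    if start.astimezone(timezone.utc) > now_utc:
        # It is show day but before the show hour: the open week began 7 days ago.
        start_date -= timedelta(days=7)
        start = datetime.combine(start_date, time(show_hour), tzinfo=EASTERN)
    # Build the end from the local calendar date so DST shifts are absorbed.
    end = datetime.combine(start_date + timedelta(days=7), time(show_hour), tzinfo=EASTERN)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)
```

Working in local wall-clock time and converting at the end means a week that spans a DST change is 167 or 169 hours long in UTC rather than a fixed 168.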
Incremental fetching
The poller does not re-fetch all likes every hour. It uses cursor-seeking:
- First fetch for a new week: craft a synthetic cursor at the week's end boundary, paginate backward until hitting the week's start boundary.
- Subsequent fetches: craft a cursor at "now", paginate backward until hitting a track already in the database. Most hourly polls fetch a single page or zero pages.
- Full refresh (`POST /admin/refresh` with `{"full": true}`): re-fetches the entire week from scratch, same as the first-fetch path.
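The backward pagination with its two stop conditions can be sketched independently of the SoundCloud API. Everything here is hypothetical: `fetch_page(cursor)` stands in for the real HTTP call and is assumed to return `(items, next_cursor)` with items ordered newest first, each carrying `id` and `liked_at` (UTC ISO 8601 strings in one consistent format).

```python
def collect_new_likes(fetch_page, start_cursor, known_ids, week_start):
    """Page backward from start_cursor and return likes not yet stored.

    Stops at the first track that is already known or that was liked
    before week_start, or when the pages run out (next_cursor is None).
    """
    new_items = []
    cursor = start_cursor
    while cursor is not None:
        items, cursor = fetch_page(cursor)
        for item in items:
            if item["id"] in known_ids or item["liked_at"] < week_start:
                return new_items
            new_items.append(item)
    return new_items
```

For the first fetch of a week `known_ids` is empty, so only the week-start boundary ends the walk; on later polls the known-track check usually ends it within the first page.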
client_id management
- Extract from `soundcloud.com` HTML (`__sc_hydration` -> `apiClient` -> `id`) on startup.
- Cache in memory (not persisted, since it rotates too frequently).
- On any 401 response, re-extract and retry.
- If re-extraction fails, log the error and let the next tick retry.
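A rough sketch of the extraction step is shown below. It rests on an assumption about the page format that this document does not spell out: that the HTML contains a script assigning a JSON array to `window.__sc_hydration`, with one entry whose `hydratable` field is `apiClient` and whose `data.id` holds the client_id. The function name and regular expression are illustrative and should be checked against `docs/soundcloud-likes-api.md`.

```python
import json
import re

# Assumed page layout: <script>window.__sc_hydration = [ ... ];</script>
_HYDRATION_RE = re.compile(
    r"window\.__sc_hydration\s*=\s*(\[.*?\]);\s*</script>", re.DOTALL
)


def extract_client_id(html):
    """Return the client_id found in the hydration data, or None if absent."""
    match = _HYDRATION_RE.search(html)
    if match is None:
        return None
    try:
        entries = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    for entry in entries:
        if isinstance(entry, dict) and entry.get("hydratable") == "apiClient":
            data = entry.get("data")
            if isinstance(data, dict):
                return data.get("id")
    return None
```

Returning None rather than raising lets the caller apply the "log the error and let the next tick retry" rule in one place.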
Retry & backoff
Each SoundCloud HTTP call: 3 attempts, exponential backoff (2s, 4s, 8s). 401s trigger client_id refresh before retry (doesn't count against attempts). Request timeout: 15 seconds.
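The retry policy can be sketched as a small async wrapper. The exception classes, `call_with_retry`, and its parameters are hypothetical. Two interpretations are assumed and worth confirming: "3 attempts" is read as three calls in total, so only the 2s and 4s waits actually occur (the 8s delay would apply if a fourth attempt were allowed), and the client_id is refreshed at most once per call to avoid looping on repeated 401s.

```python
import asyncio


class Unauthorized(Exception):
    """Hypothetical: raised by the request callable on HTTP 401."""


class TransientError(Exception):
    """Hypothetical: raised on 5xx responses and network timeouts."""


async def call_with_retry(request, refresh_client_id, attempts=3,
                          base_delay=2.0, sleep=asyncio.sleep):
    """Await request() with exponential backoff on transient failures.

    A 401 triggers one client_id refresh and an immediate retry that does
    not count against `attempts`.
    """
    failures = 0
    refreshed = False
    while True:
        try:
            return await request()
        except Unauthorized:
            if refreshed:
                raise  # a fresh client_id was rejected too; give up
            await refresh_client_id()
            refreshed = True
        except TransientError:
            failures += 1
            if failures >= attempts:
                raise  # caller skips this tick
            await sleep(base_delay * 2 ** (failures - 1))  # 2s, then 4s
```

The 15-second request timeout would be configured on the HTTP client itself and surface here as a `TransientError`.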
Error scenarios
| Scenario | Behavior |
|---|---|
| SoundCloud 401 | Refresh client_id, retry |
| SoundCloud 429 | Back off, retry next tick |
| SoundCloud 5xx | Retry with backoff, skip tick after 3 failures |
| Network timeout | Same as 5xx |
| `client_id` extraction failure | Log error, skip tick, retry next hour |
| Poller task crash | Supervisor restarts after 30s backoff |
| Nick unlikes a track | Track stays in show — positions are stable |
Project Structure
NtR-soundcloud-fetcher/
├── docs/
│   ├── soundcloud-likes-api.md
│   └── plans/
│       └── 2026-03-12-ntr-soundcloud-fetcher-design.md
├── src/
│   └── ntr_fetcher/
│       ├── __init__.py
│       ├── main.py          # entry point: starts API + poller
│       ├── config.py        # settings from env vars / .env
│       ├── db.py            # SQLite connection, migrations, queries
│       ├── models.py        # dataclasses for Track, Show, ShowTrack
│       ├── soundcloud.py    # client_id extraction, likes fetching
│       ├── poller.py        # polling loop + supervisor
│       ├── api.py           # FastAPI routes
│       └── week.py          # week boundary computation (ET → UTC)
├── tests/
├── pyproject.toml
└── README.md
Dependencies
| Package | Purpose |
|---|---|
| `fastapi` | API framework |
| `uvicorn` | ASGI server |
| `httpx` | Async HTTP client for SoundCloud |
| `pydantic` | Config + response models (bundled with FastAPI) |
Standard library: sqlite3, zoneinfo, asyncio, dataclasses, json.
No ORM. Raw SQL via sqlite3, wrapped in asyncio.to_thread for async compatibility.
Python 3.11+.
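One way to combine raw `sqlite3` with `asyncio.to_thread` is sketched below. The helper names are hypothetical. The sketch opens a short-lived connection inside the worker thread for every call, because a `sqlite3` connection by default may only be used from the thread that created it and `to_thread` does not guarantee which thread runs the work; sharing one connection opened with `check_same_thread=False` behind a lock would be an alternative.

```python
import asyncio
import sqlite3


def _run(db_path, sql, params):
    # Blocking helper: runs entirely inside one worker thread.
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


async def query(db_path, sql, params=()):
    """Execute one statement off the event loop and return all rows."""
    return await asyncio.to_thread(_run, db_path, sql, params)
```

The API and poller tasks can both await `query(...)` without blocking each other's event-loop time.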
Configuration
Environment variables, loaded by pydantic BaseSettings (in Pydantic v2 this class lives in the separate pydantic-settings package). A .env file is also supported.
| Variable | Default | Description |
|---|---|---|
| `NTR_PORT` | `8000` | API listen port |
| `NTR_HOST` | `127.0.0.1` | API bind address |
| `NTR_DB_PATH` | `./ntr_fetcher.db` | SQLite database file path |
| `NTR_POLL_INTERVAL_SECONDS` | `3600` | Polling interval |
| `NTR_ADMIN_TOKEN` | (required) | Bearer token for admin endpoints |
| `NTR_SOUNDCLOUD_USER` | `nicktherat` | SoundCloud username to track |
| `NTR_SHOW_DAY` | `2` | Day of week (0=Mon, 2=Wed) |
| `NTR_SHOW_HOUR` | `22` | Hour in Eastern Time |