From 48801afa76483b0f240d2e9882a4775642280403 Mon Sep 17 00:00:00 2001
From: cottongin
Date: Thu, 12 Mar 2026 01:01:14 -0400
Subject: [PATCH] Add design doc and SoundCloud API reference

Design for a SoundCloud likes fetcher service that builds weekly
playlists for Nick the Rat Radio and serves them via JSON API.

Made-with: Cursor
---
 .gitignore                                    |   5 +
 ...026-03-12-ntr-soundcloud-fetcher-design.md | 224 ++++++++++++++++++
 docs/soundcloud-likes-api.md                  |   1 +
 3 files changed, 230 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 docs/plans/2026-03-12-ntr-soundcloud-fetcher-design.md
 create mode 120000 docs/soundcloud-likes-api.md

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..c651e27
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,5 @@
+.DS_Store
+*.db
+.env
+__pycache__/
+*.pyc
diff --git a/docs/plans/2026-03-12-ntr-soundcloud-fetcher-design.md b/docs/plans/2026-03-12-ntr-soundcloud-fetcher-design.md
new file mode 100644
index 0000000..fb0cfb9
--- /dev/null
+++ b/docs/plans/2026-03-12-ntr-soundcloud-fetcher-design.md
@@ -0,0 +1,224 @@
+# NtR SoundCloud Fetcher — Design Document
+
+> **Date**: 2026-03-12
+> **Status**: Approved
+
+## Purpose
+
+A service that periodically fetches SoundCloud likes from NicktheRat's profile, builds weekly playlists aligned to his Wednesday 22:00 ET show schedule, and exposes them via a JSON API for an IRC bot to query track info by position number (`!1`, `!2`, etc.).
+
+## Architecture
+
+Single Python process with three internal responsibilities:
+
+1. **API server** — FastAPI on a configurable port, serves playlist data as JSON.
+2. **Poller** — async background task that fetches Nick's SoundCloud likes every hour.
+3. **Supervisor** — monitors the poller task, restarts it on failure without affecting the API.
+
+The poller and API run as independent `asyncio` tasks. If the poller crashes, the supervisor catches the exception, logs it, waits a backoff period, and restarts the poller. The API continues serving from the last-known-good SQLite data.
+
+External process management (systemd with `Restart=on-failure`) handles whole-process crashes. The service does not try to be its own process manager.
+
+### Startup sequence
+
+1. Open/create SQLite database, run migrations.
+2. Check if the current week's playlist exists. If not (or stale), do an immediate fetch.
+3. Start the API server.
+4. Start the poller on its hourly schedule.
+
+## Data Model (SQLite)
+
+### `tracks`
+
+Canonical store of every SoundCloud track Nick has liked.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `id` | INTEGER PK | SoundCloud track ID (not auto-increment) |
+| `title` | TEXT | Track title |
+| `artist` | TEXT | `track.user.username` from the API |
+| `permalink_url` | TEXT | Full SoundCloud URL |
+| `artwork_url` | TEXT | Nullable |
+| `duration_ms` | INTEGER | Duration in milliseconds |
+| `license` | TEXT | e.g. `cc-by-sa` |
+| `liked_at` | TEXT | ISO 8601 — when Nick liked it |
+| `raw_json` | TEXT | Full track JSON blob |
+
+### `shows`
+
+One row per weekly show.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `id` | INTEGER PK | Auto-increment |
+| `week_start` | TEXT | ISO 8601 UTC of the Wednesday 22:00 ET boundary that opens this week |
+| `week_end` | TEXT | ISO 8601 UTC of the next Wednesday 22:00 ET boundary |
+| `created_at` | TEXT | When this row was created |
+
+### `show_tracks`
+
+Join table linking tracks to shows with position.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `show_id` | INTEGER FK | References `shows.id` |
+| `track_id` | INTEGER FK | References `tracks.id` |
+| `position` | INTEGER | 1-indexed — maps to `!1`, `!2`, etc. |
+| UNIQUE | | `(show_id, track_id)` |
+
+Position assignment: likes sorted by `liked_at` ascending (oldest first), positions assigned 1, 2, 3... New likes mid-week get the next position; existing positions never shift.
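Review note on the position-assignment rule above: a minimal sketch of how it could work, not the author's implementation. The `Like` dataclass and `assign_positions` function are hypothetical names, and the sketch assumes every `liked_at` value is a UTC ISO 8601 string in one consistent format, so that plain string comparison orders them chronologically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Like:
    """Hypothetical container for one liked track."""
    track_id: int
    liked_at: str  # assumed UTC ISO 8601 in a single consistent format


def assign_positions(existing: dict[int, int], likes: list[Like]) -> dict[int, int]:
    """Return a track_id -> position map that never moves an existing position."""
    positions = dict(existing)
    # New tracks continue after the highest position already handed out.
    next_position = max(positions.values(), default=0) + 1
    # Oldest like first, as the design doc specifies.
    for like in sorted(likes, key=lambda item: item.liked_at):
        if like.track_id in positions:
            continue  # already placed: its position stays stable
        positions[like.track_id] = next_position
        next_position += 1
    return positions
```

One consequence worth noting: a like that is discovered late (for example after a failed poll) is appended at the end even if its `liked_at` is older than tracks already placed, because existing positions are never shifted.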
+
+Once a track is assigned a position in a show, it stays even if Nick unlikes it. Admin endpoints exist for manual corrections.
+
+## API Endpoints
+
+All endpoints return JSON. Base URL: `http://localhost:{port}`.
+
+### Current Week
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `GET` | `/playlist` | Current week's full playlist |
+| `GET` | `/playlist/{position}` | Single track by position |
+
+`GET /playlist` response shape:
+
+```json
+{
+  "show_id": 12,
+  "week_start": "2026-03-12T02:00:00Z",
+  "week_end": "2026-03-19T02:00:00Z",
+  "tracks": [
+    {
+      "position": 1,
+      "title": "Running Through My Mind",
+      "artist": "Purrple Panther",
+      "permalink_url": "https://soundcloud.com/...",
+      "artwork_url": "https://...",
+      "duration_ms": 202909,
+      "liked_at": "2026-03-09T02:24:30Z"
+    }
+  ]
+}
+```
+
+`GET /playlist/{position}` returns a single track object. 404 if the position doesn't exist.
+
+### History
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `GET` | `/shows` | List all shows, newest first. Supports `?limit=` and `?offset=`. |
+| `GET` | `/shows/{show_id}` | Full playlist for a specific show |
+
+### Admin (bearer token required)
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `POST` | `/admin/refresh` | Trigger immediate SoundCloud fetch. `{"full": true}` re-fetches the entire week; default is incremental. |
+| `POST` | `/admin/tracks` | Add a track to the current show. Body: `{"soundcloud_url": "..."}` or `{"track_id": 12345, "position": 5}` |
+| `DELETE` | `/admin/tracks/{track_id}` | Remove a track from the current show. Remaining positions re-compact. |
+| `PUT` | `/admin/tracks/{track_id}/position` | Move a track to a different position. Body: `{"position": 3}` |
+
+### Health
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `GET` | `/health` | Poller status, last fetch time, current week track count |
+
+Admin endpoints are protected by a bearer token (`NTR_ADMIN_TOKEN`). Read endpoints have no auth (localhost only).
+
+## Poller & Fetcher Logic
+
+### Polling cycle
+
+1. Compute current week's boundary window (Wednesday 22:00 ET -> next Wednesday 22:00 ET, converted to UTC accounting for DST via `zoneinfo`).
+2. Ensure a `shows` row exists for this window.
+3. Fetch likes from SoundCloud.
+4. Upsert new tracks, assign positions, update `show_tracks`.
+5. Sleep for the configured interval.
+
+### Incremental fetching
+
+The poller does not re-fetch all likes every hour. It uses cursor-seeking:
+
+- **First fetch for a new week**: craft a synthetic cursor at the week's end boundary, paginate backward until hitting the week's start boundary.
+- **Subsequent fetches**: craft a cursor at "now", paginate backward until hitting a track already in the database. Most hourly polls fetch a single page or zero pages.
+- **Full refresh** (`POST /admin/refresh` with `{"full": true}`): re-fetches the entire week from scratch, same as the first-fetch path.
+
+### `client_id` management
+
+- Extract from `soundcloud.com` HTML (`__sc_hydration` -> `apiClient` -> `id`) on startup.
+- Cache in memory (not persisted — rotates too frequently).
+- On any 401 response, re-extract and retry.
+- If re-extraction fails, log the error and let the next tick retry.
+
+### Retry & backoff
+
+Each SoundCloud HTTP call: 3 attempts, exponential backoff (2s, 4s, 8s). 401s trigger `client_id` refresh before retry (doesn't count against attempts). Request timeout: 15 seconds.
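Review note on the retry policy above: a hedged sketch of one way to read it, using only the standard library. `Unauthorized`, `TransientError`, `call_with_retry`, and the injectable `sleep` parameter are all hypothetical names introduced here. The sketch makes two assumptions: a 401 triggers at most one `client_id` refresh per call (a guard against looping forever), and the call gives up after the third transient failure, so only the 2s and 4s waits occur between the three attempts.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class Unauthorized(Exception):
    """Hypothetical: raised by the request function on an HTTP 401."""


class TransientError(Exception):
    """Hypothetical: raised on 5xx responses and network timeouts."""


async def call_with_retry(
    request: Callable[[], Awaitable[T]],
    refresh_client_id: Callable[[], Awaitable[None]],
    *,
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run `request`, retrying transient failures with exponential backoff."""
    failures = 0
    refreshed = False
    while True:
        try:
            return await request()
        except Unauthorized:
            # A 401 does not count against the attempt budget, but this sketch
            # refreshes only once per call (an assumption, not from the doc).
            if refreshed:
                raise
            await refresh_client_id()
            refreshed = True
        except TransientError:
            failures += 1
            if failures >= attempts:
                raise  # the caller skips this tick
            # Waits of 2s, then 4s, doubling after each failure.
            await sleep(base_delay * 2 ** (failures - 1))
```

Passing `sleep` as a parameter keeps the helper testable without real delays; the 15-second request timeout would live inside whatever `request` wraps.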
+
+### Error scenarios
+
+| Scenario | Behavior |
+|----------|----------|
+| SoundCloud 401 | Refresh `client_id`, retry |
+| SoundCloud 429 | Back off, retry next tick |
+| SoundCloud 5xx | Retry with backoff, skip tick after 3 failures |
+| Network timeout | Same as 5xx |
+| `client_id` extraction failure | Log error, skip tick, retry next hour |
+| Poller task crash | Supervisor restarts after 30s backoff |
+| Nick unlikes a track | Track stays in show — positions are stable |
+
+## Project Structure
+
+```
+NtR-soundcloud-fetcher/
+├── docs/
+│   ├── soundcloud-likes-api.md
+│   └── plans/
+│       └── 2026-03-12-ntr-soundcloud-fetcher-design.md
+├── src/
+│   └── ntr_fetcher/
+│       ├── __init__.py
+│       ├── main.py          # entry point — starts API + poller
+│       ├── config.py        # settings from env vars / .env
+│       ├── db.py            # SQLite connection, migrations, queries
+│       ├── models.py        # dataclasses for Track, Show, ShowTrack
+│       ├── soundcloud.py    # client_id extraction, likes fetching
+│       ├── poller.py        # polling loop + supervisor
+│       ├── api.py           # FastAPI routes
+│       └── week.py          # week boundary computation (ET → UTC)
+├── tests/
+├── pyproject.toml
+└── README.md
+```
+
+## Dependencies
+
+| Package | Purpose |
+|---------|---------|
+| `fastapi` | API framework |
+| `uvicorn` | ASGI server |
+| `httpx` | Async HTTP client for SoundCloud |
+| `pydantic` | Config + response models (bundled with FastAPI) |
+
+Standard library: `sqlite3`, `zoneinfo`, `asyncio`, `dataclasses`, `json`.
+
+No ORM. Raw SQL via `sqlite3`, wrapped in `asyncio.to_thread` for async compatibility.
+
+Python 3.11+.
+
+## Configuration
+
+Environment variables, loaded by pydantic `BaseSettings` (in Pydantic v2 this class ships in the separate `pydantic-settings` package). Supports `.env` file.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `NTR_PORT` | `8000` | API listen port |
+| `NTR_HOST` | `127.0.0.1` | API bind address |
+| `NTR_DB_PATH` | `./ntr_fetcher.db` | SQLite database file path |
+| `NTR_POLL_INTERVAL_SECONDS` | `3600` | Polling interval |
+| `NTR_ADMIN_TOKEN` | (required) | Bearer token for admin endpoints |
+| `NTR_SOUNDCLOUD_USER` | `nicktherat` | SoundCloud username to track |
+| `NTR_SHOW_DAY` | `2` | Day of week (0=Mon, 2=Wed) |
+| `NTR_SHOW_HOUR` | `22` | Hour in Eastern Time |
diff --git a/docs/soundcloud-likes-api.md b/docs/soundcloud-likes-api.md
new file mode 120000
index 0000000..8dcb130
--- /dev/null
+++ b/docs/soundcloud-likes-api.md
@@ -0,0 +1 @@
+/Users/erikfredericks/dev-ai/one-offs/soundcloud-reverse-engineering/docs/soundcloud-likes-api.md
\ No newline at end of file
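
Review note on the week boundary: the design doc's `NTR_SHOW_DAY` and `NTR_SHOW_HOUR` settings feed the "Wednesday 22:00 ET" window that `week.py` is meant to compute. The following is a minimal sketch of that computation under stated assumptions, not the planned implementation: `week_window` is a hypothetical name, "ET" is taken to mean the `America/New_York` zone, and the host is assumed to have IANA time zone data available to `zoneinfo`.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: "ET" in the design doc means the America/New_York zone.
EASTERN = ZoneInfo("America/New_York")


def _boundary(day: date, show_hour: int) -> datetime:
    """Show-hour wall-clock time on `day` in Eastern Time, converted to UTC."""
    local = datetime(day.year, day.month, day.day, show_hour, tzinfo=EASTERN)
    return local.astimezone(timezone.utc)


def week_window(
    now_utc: datetime, show_day: int = 2, show_hour: int = 22
) -> tuple[datetime, datetime]:
    """Return (week_start, week_end) in UTC for the week containing `now_utc`."""
    local_now = now_utc.astimezone(EASTERN)
    # Step back to the most recent show weekday (0=Mon, 2=Wed).
    days_back = (local_now.weekday() - show_day) % 7
    start_day = local_now.date() - timedelta(days=days_back)
    if _boundary(start_day, show_hour) > now_utc:
        # It is show day but before the show hour: use the previous week.
        start_day -= timedelta(days=7)
    # Build both ends from wall-clock dates so a DST change inside the week
    # still yields 22:00 local time at each boundary.
    return _boundary(start_day, show_hour), _boundary(start_day + timedelta(days=7), show_hour)
```

Because each boundary is built from a local wall-clock date, a week that spans a DST change is 167 or 169 hours long rather than exactly 168, which is the behavior the doc's "accounting for DST" wording implies.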