Add design doc and SoundCloud API reference

Design for a SoundCloud likes fetcher service that builds weekly
playlists for Nick the Rat Radio and serves them via JSON API.

Made-with: Cursor
cottongin
2026-03-12 01:01:14 -04:00
commit 48801afa76
3 changed files with 230 additions and 0 deletions

.gitignore (new file, 5 lines)

@@ -0,0 +1,5 @@
.DS_Store
*.db
.env
__pycache__/
*.pyc


@@ -0,0 +1,224 @@
# NtR SoundCloud Fetcher — Design Document
> **Date**: 2026-03-12
> **Status**: Approved
## Purpose
A service that periodically fetches SoundCloud likes from NicktheRat's profile, builds weekly playlists aligned to his Wednesday 22:00 ET show schedule, and exposes them via a JSON API for an IRC bot to query track info by position number (`!1`, `!2`, etc.).
## Architecture
Single Python process with three internal responsibilities:
1. **API server** — FastAPI on a configurable port, serves playlist data as JSON.
2. **Poller** — async background task that fetches Nick's SoundCloud likes every hour.
3. **Supervisor** — monitors the poller task, restarts it on failure without affecting the API.
The poller and API run as independent `asyncio` tasks. If the poller crashes, the supervisor catches the exception, logs it, waits a backoff period, and restarts the poller. The API continues serving from the last-known-good SQLite data.
External process management (systemd with `Restart=on-failure`) handles whole-process crashes. The service does not try to be its own process manager.
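The supervisor behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `supervise`, `task_factory`, and `max_restarts` are hypothetical names, and the 30-second default is taken from the error-scenario table later in this document.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

log = logging.getLogger("ntr_fetcher.supervisor")  # hypothetical logger name


async def supervise(
    task_factory: Callable[[], Awaitable[None]],
    backoff_seconds: float = 30.0,
    max_restarts: Optional[int] = None,  # None means "restart forever"
) -> int:
    """Run task_factory() and restart it after any crash.

    Returns the number of restarts performed. It only returns when the
    task exits cleanly or max_restarts is reached.
    """
    restarts = 0
    while True:
        try:
            await task_factory()
            return restarts  # clean exit: nothing left to supervise
        except asyncio.CancelledError:
            raise  # let shutdown propagate instead of restarting
        except Exception:
            log.exception("poller crashed; restarting in %.0fs", backoff_seconds)
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        await asyncio.sleep(backoff_seconds)
        restarts += 1
```

Because the supervisor only wraps the poller coroutine, an API task started alongside it with `asyncio.create_task` would keep running while the poller is being restarted.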
### Startup sequence
1. Open/create SQLite database, run migrations.
2. Check whether the current week's playlist exists. If it is missing or stale, do an immediate fetch.
3. Start the API server.
4. Start the poller on its hourly schedule.
## Data Model (SQLite)
### `tracks`
Canonical store of every SoundCloud track Nick has liked.
| Column | Type | Notes |
|--------|------|-------|
| `id` | INTEGER PK | SoundCloud track ID (not auto-increment) |
| `title` | TEXT | Track title |
| `artist` | TEXT | `track.user.username` from the API |
| `permalink_url` | TEXT | Full SoundCloud URL |
| `artwork_url` | TEXT | Nullable |
| `duration_ms` | INTEGER | Duration in milliseconds |
| `license` | TEXT | e.g. `cc-by-sa` |
| `liked_at` | TEXT | ISO 8601 — when Nick liked it |
| `raw_json` | TEXT | Full track JSON blob |
### `shows`
One row per weekly show.
| Column | Type | Notes |
|--------|------|-------|
| `id` | INTEGER PK | Auto-increment |
| `week_start` | TEXT | ISO 8601 UTC of the Wednesday 22:00 ET boundary that opens this week |
| `week_end` | TEXT | ISO 8601 UTC of the next Wednesday 22:00 ET boundary |
| `created_at` | TEXT | When this row was created |
### `show_tracks`
Join table linking tracks to shows with position.
| Column | Type | Notes |
|--------|------|-------|
| `show_id` | INTEGER FK | References `shows.id` |
| `track_id` | INTEGER FK | References `tracks.id` |
| `position` | INTEGER | 1-indexed — maps to `!1`, `!2`, etc. |
| UNIQUE | | `(show_id, track_id)` |
Position assignment: likes are sorted by `liked_at` ascending (oldest first) and assigned positions 1, 2, 3, and so on. New likes mid-week get the next free position; the poller never shifts existing positions.
Once a track is assigned a position in a show, it stays even if Nick unlikes it. Only the admin endpoints, which exist for manual corrections, can move or remove a track.
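The three tables and the append-only position rule could look something like the sketch below. The `NOT NULL` constraints and the `assign_positions` helper are assumptions made for illustration; only the column names, types, and the uniqueness rule come from the tables above.

```python
import sqlite3
from typing import Iterable, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    id            INTEGER PRIMARY KEY,  -- SoundCloud track ID, not auto-increment
    title         TEXT NOT NULL,
    artist        TEXT NOT NULL,
    permalink_url TEXT NOT NULL,
    artwork_url   TEXT,                 -- nullable
    duration_ms   INTEGER NOT NULL,
    license       TEXT,
    liked_at      TEXT NOT NULL,
    raw_json      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS shows (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    week_start TEXT NOT NULL,
    week_end   TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS show_tracks (
    show_id  INTEGER NOT NULL REFERENCES shows(id),
    track_id INTEGER NOT NULL REFERENCES tracks(id),
    position INTEGER NOT NULL,
    UNIQUE (show_id, track_id)
);
"""


def assign_positions(
    conn: sqlite3.Connection, show_id: int, likes: Iterable[Tuple[int, str]]
) -> None:
    """Append (track_id, liked_at) pairs to a show, oldest first.

    Tracks already in the show are skipped, so existing positions never move.
    """
    row = conn.execute(
        "SELECT COALESCE(MAX(position), 0) FROM show_tracks WHERE show_id = ?",
        (show_id,),
    ).fetchone()
    next_position = row[0] + 1
    # ISO 8601 UTC strings in one consistent format sort chronologically.
    for track_id, _liked_at in sorted(likes, key=lambda like: like[1]):
        cursor = conn.execute(
            "INSERT OR IGNORE INTO show_tracks (show_id, track_id, position) "
            "VALUES (?, ?, ?)",
            (show_id, track_id, next_position),
        )
        if cursor.rowcount == 1:  # 0 means the track was already in the show
            next_position += 1
    conn.commit()
```

Re-running `assign_positions` with an overlapping batch is harmless: the `UNIQUE (show_id, track_id)` constraint together with `INSERT OR IGNORE` makes the operation idempotent.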
## API Endpoints
All endpoints return JSON. Base URL: `http://localhost:{port}`.
### Current Week
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/playlist` | Current week's full playlist |
| `GET` | `/playlist/{position}` | Single track by position |
`GET /playlist` response shape:
```json
{
"show_id": 12,
"week_start": "2026-03-12T02:00:00Z",
"week_end": "2026-03-19T02:00:00Z",
"tracks": [
{
"position": 1,
"title": "Running Through My Mind",
"artist": "Purrple Panther",
"permalink_url": "https://soundcloud.com/...",
"artwork_url": "https://...",
"duration_ms": 202909,
"liked_at": "2026-03-09T02:24:30Z"
}
]
}
```
`GET /playlist/{position}` returns a single track object. 404 if the position doesn't exist.
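On the consumer side, an IRC bot might map a `!N` command to the single-track endpoint along these lines. Both helper names are hypothetical, and the sketch assumes the bot only accepts a bare `!` followed by digits.

```python
import re
from typing import Optional

_COMMAND = re.compile(r"^!([0-9]+)$")


def position_from_command(message: str) -> Optional[int]:
    """Return the 1-indexed position for messages like '!3', else None."""
    match = _COMMAND.match(message.strip())
    if match is None:
        return None
    position = int(match.group(1))
    return position if position >= 1 else None


def playlist_url(base_url: str, position: int) -> str:
    """Build the GET /playlist/{position} URL for a given base URL."""
    return f"{base_url.rstrip('/')}/playlist/{position}"
```

A 404 from the resulting URL would then mean that the current show has no track at that position.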
### History
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/shows` | List all shows, newest first. Supports `?limit=` and `?offset=`. |
| `GET` | `/shows/{show_id}` | Full playlist for a specific show |
### Admin (bearer token required)
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/admin/refresh` | Trigger immediate SoundCloud fetch. `{"full": true}` re-fetches the entire week; default is incremental. |
| `POST` | `/admin/tracks` | Add a track to the current show. Body: `{"soundcloud_url": "..."}` or `{"track_id": 12345, "position": 5}` |
| `DELETE` | `/admin/tracks/{track_id}` | Remove a track from the current show. Remaining positions re-compact. |
| `PUT` | `/admin/tracks/{track_id}/position` | Move a track to a different position. Body: `{"position": 3}` |
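The re-compaction performed by `DELETE /admin/tracks/{track_id}` could be implemented roughly as below. This is a sketch under assumptions: `remove_track` is a hypothetical helper, and it relies on `show_tracks` having no uniqueness constraint on `position`, as in the data model above.

```python
import sqlite3


def remove_track(conn: sqlite3.Connection, show_id: int, track_id: int) -> bool:
    """Delete a track from a show and close the gap it leaves.

    Returns False when the track is not part of the show.
    """
    row = conn.execute(
        "SELECT position FROM show_tracks WHERE show_id = ? AND track_id = ?",
        (show_id, track_id),
    ).fetchone()
    if row is None:
        return False
    removed_position = row[0]
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            "DELETE FROM show_tracks WHERE show_id = ? AND track_id = ?",
            (show_id, track_id),
        )
        conn.execute(
            "UPDATE show_tracks SET position = position - 1 "
            "WHERE show_id = ? AND position > ?",
            (show_id, removed_position),
        )
    return True
```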
### Health
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Poller status, last fetch time, current week track count |
Admin endpoints are protected by a bearer token (`NTR_ADMIN_TOKEN`). Read endpoints have no auth; they rely on the service binding to localhost (`NTR_HOST` defaults to `127.0.0.1`).
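A framework-independent sketch of the bearer-token check is shown below. `is_authorized` is a hypothetical helper. The constant-time comparison via `hmac.compare_digest` and the refusal of an empty configured token are design assumptions, not requirements stated in this document.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], admin_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header against the admin token."""
    if not authorization_header or not admin_token:
        return False  # missing header, or no token configured
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(token.strip().encode(), admin_token.encode())
```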
## Poller & Fetcher Logic
### Polling cycle
1. Compute current week's boundary window (Wednesday 22:00 ET -> next Wednesday 22:00 ET, converted to UTC accounting for DST via `zoneinfo`).
2. Ensure a `shows` row exists for this window.
3. Fetch likes from SoundCloud.
4. Upsert new tracks, assign positions, update `show_tracks`.
5. Sleep for the configured interval.
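Step 1 of the cycle above, the week-boundary computation, might look like this. The function names are hypothetical, and the sketch assumes "ET" means the `America/New_York` zone and that `now_utc` is a timezone-aware datetime.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Tuple
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")  # assumption: "ET" means this zone


def _boundary(day: date, hour: int) -> datetime:
    """Local show time on the given day, converted to UTC (DST-aware)."""
    local = datetime(day.year, day.month, day.day, hour, tzinfo=ET)
    return local.astimezone(timezone.utc)


def week_window(
    now_utc: datetime, show_day: int = 2, show_hour: int = 22
) -> Tuple[datetime, datetime]:
    """Return (week_start, week_end) in UTC for the window containing now_utc."""
    now_et = now_utc.astimezone(ET)
    days_back = (now_et.weekday() - show_day) % 7  # 0=Mon, 2=Wed
    start_day = now_et.date() - timedelta(days=days_back)
    if _boundary(start_day, show_hour) > now_utc:
        # It is show day but before the show hour: the window opened a week ago.
        start_day -= timedelta(days=7)
    return (
        _boundary(start_day, show_hour),
        _boundary(start_day + timedelta(days=7), show_hour),
    )
```

Building each boundary from a local wall-clock time, rather than adding 168 hours to the start, keeps both ends at 22:00 ET even in weeks that contain a DST change.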
### Incremental fetching
The poller does not re-fetch all likes every hour. It uses cursor-seeking:
- **First fetch for a new week**: craft a synthetic cursor at the week's end boundary, paginate backward until hitting the week's start boundary.
- **Subsequent fetches**: craft a cursor at "now", paginate backward until hitting a track already in the database. Most hourly polls fetch a single page or zero pages.
- **Full refresh** (`POST /admin/refresh` with `{"full": true}`): re-fetches the entire week from scratch, same as the first-fetch path.
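The stop conditions in the list above can be sketched with a page-fetching callable standing in for the SoundCloud client. Everything here is illustrative: `collect_new_likes`, the opaque string cursor, and the `max_pages` safety cap are assumptions, and the real cursor format is not described in this document.

```python
from typing import Callable, List, Optional, Set, Tuple

Like = Tuple[int, str]  # (track_id, liked_at as ISO 8601 UTC)
Page = Tuple[List[Like], Optional[str]]  # (likes newest first, next cursor or None)


def collect_new_likes(
    fetch_page: Callable[[str], Page],
    start_cursor: str,
    week_start: str,
    known_ids: Set[int],
    max_pages: int = 50,  # safety cap, an assumption
) -> List[Like]:
    """Page backward from start_cursor until the week boundary or a known track."""
    new_likes: List[Like] = []
    cursor = start_cursor
    for _ in range(max_pages):
        likes, next_cursor = fetch_page(cursor)
        for track_id, liked_at in likes:
            # Timestamps in one consistent ISO 8601 UTC format compare as strings.
            if liked_at < week_start or track_id in known_ids:
                return new_likes
            new_likes.append((track_id, liked_at))
        if next_cursor is None:
            break
        cursor = next_cursor
    return new_likes
```

For a first fetch or a full refresh, `known_ids` would be empty and `start_cursor` would point at the week's end boundary; for an hourly poll it would point at "now".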
### `client_id` management
- Extract from `soundcloud.com` HTML (`__sc_hydration` -> `apiClient` -> `id`) on startup.
- Cache in memory (not persisted — rotates too frequently).
- On any 401 response, re-extract and retry.
- If re-extraction fails, log the error and let the next tick retry.
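The extraction step might be sketched as below. The JSON layout is an assumption: the code expects `__sc_hydration` to be a JSON array of objects in which one entry has `"hydratable": "apiClient"` and carries the ID under `data.id`. If the page is structured differently, the function returns `None`, which corresponds to the "extraction failure" case.

```python
import json
from typing import Optional


def extract_client_id(html: str) -> Optional[str]:
    """Pull the client_id out of the page's __sc_hydration payload, if present."""
    marker = html.find("__sc_hydration")
    if marker == -1:
        return None
    start = html.find("[", marker)
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it
        entries, _end = json.JSONDecoder().raw_decode(html, start)
    except ValueError:
        return None
    if not isinstance(entries, list):
        return None
    for entry in entries:
        if isinstance(entry, dict) and entry.get("hydratable") == "apiClient":
            data = entry.get("data")
            client_id = data.get("id") if isinstance(data, dict) else None
            return client_id if isinstance(client_id, str) else None
    return None
```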
### Retry & backoff
Each SoundCloud HTTP call gets 3 attempts with exponential backoff (2s, 4s, 8s) and a 15-second request timeout. A 401 triggers a `client_id` refresh before the retry and does not count against the attempts.
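One reading of this policy is sketched below. It assumes the delay doubles after each failed attempt (2s, then 4s) and that the third failure is re-raised, so the 8s step would only apply if a fourth attempt were configured. The exception classes, the injectable `sleep`, and the `max_refreshes` cap that prevents an endless 401 loop are all hypothetical additions.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class Unauthorized(Exception):
    """Stand-in for a 401 response."""


class TransientError(Exception):
    """Stand-in for a 5xx response or a network timeout."""


async def call_with_retry(
    call: Callable[[], Awaitable[T]],
    refresh_client_id: Callable[[], Awaitable[None]],
    max_attempts: int = 3,
    base_delay: float = 2.0,
    max_refreshes: int = 1,  # assumption: cap on 401-triggered refreshes
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    failures = 0
    refreshes = 0
    while True:
        try:
            return await call()
        except Unauthorized:
            if refreshes >= max_refreshes:
                raise
            refreshes += 1
            await refresh_client_id()  # retry at once; no attempt is consumed
        except TransientError:
            failures += 1
            if failures >= max_attempts:
                raise  # caller skips this tick
            await sleep(base_delay * 2 ** (failures - 1))  # 2s, 4s, ...
```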
### Error scenarios
| Scenario | Behavior |
|----------|----------|
| SoundCloud 401 | Refresh `client_id`, retry |
| SoundCloud 429 | Back off, retry next tick |
| SoundCloud 5xx | Retry with backoff, skip tick after 3 failures |
| Network timeout | Same as 5xx |
| `client_id` extraction failure | Log error, skip tick, retry next hour |
| Poller task crash | Supervisor restarts after 30s backoff |
| Nick unlikes a track | Track stays in show — positions are stable |
## Project Structure
```
NtR-soundcloud-fetcher/
├── docs/
│ ├── soundcloud-likes-api.md
│ └── plans/
│ └── 2026-03-12-ntr-soundcloud-fetcher-design.md
├── src/
│ └── ntr_fetcher/
│ ├── __init__.py
│ ├── main.py # entry point — starts API + poller
│ ├── config.py # settings from env vars / .env
│ ├── db.py # SQLite connection, migrations, queries
│ ├── models.py # dataclasses for Track, Show, ShowTrack
│ ├── soundcloud.py # client_id extraction, likes fetching
│ ├── poller.py # polling loop + supervisor
│ ├── api.py # FastAPI routes
│ └── week.py # week boundary computation (ET → UTC)
├── tests/
├── pyproject.toml
└── README.md
```
## Dependencies
| Package | Purpose |
|---------|---------|
| `fastapi` | API framework |
| `uvicorn` | ASGI server |
| `httpx` | Async HTTP client for SoundCloud |
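| `pydantic` | Config + response models (installed as a FastAPI dependency) |
| `pydantic-settings` | Provides `BaseSettings`, which is a separate package under Pydantic v2 |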
Standard library: `sqlite3`, `zoneinfo`, `asyncio`, `dataclasses`, `json`.
No ORM. Raw SQL via `sqlite3`, wrapped in `asyncio.to_thread` for async compatibility.
Python 3.11+.
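The raw-SQL approach described above could be wrapped as in this sketch. The helper names are hypothetical. Opening a short-lived connection per call is a choice made here so that no `sqlite3` connection is shared across worker threads; it is not stated in this document.

```python
import asyncio
import sqlite3
from typing import Any, List, Sequence, Tuple


def _run_query(db_path: str, sql: str, params: Sequence[Any]) -> List[Tuple]:
    """Blocking helper: open a connection, run one statement, return all rows."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(sql, params).fetchall()
        conn.commit()
        return rows
    finally:
        conn.close()


async def query(db_path: str, sql: str, params: Sequence[Any] = ()) -> List[Tuple]:
    """Run the blocking query in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(_run_query, db_path, sql, params)
```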
## Configuration
Environment variables, loaded by pydantic `BaseSettings` (provided by the separate `pydantic-settings` package in Pydantic v2). Supports `.env` file.
| Variable | Default | Description |
|----------|---------|-------------|
| `NTR_PORT` | `8000` | API listen port |
| `NTR_HOST` | `127.0.0.1` | API bind address |
| `NTR_DB_PATH` | `./ntr_fetcher.db` | SQLite database file path |
| `NTR_POLL_INTERVAL_SECONDS` | `3600` | Polling interval |
| `NTR_ADMIN_TOKEN` | (required) | Bearer token for admin endpoints |
| `NTR_SOUNDCLOUD_USER` | `nicktherat` | SoundCloud username to track |
| `NTR_SHOW_DAY` | `2` | Day of week (0=Mon, 2=Wed) |
| `NTR_SHOW_HOUR` | `22` | Hour in Eastern Time |
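
The table above maps onto a settings object roughly as follows. This sketch deliberately uses a plain standard-library dataclass as a stand-in for pydantic `BaseSettings`, so it does not read `.env` files. `Settings` and `load_settings` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    admin_token: str  # required, no default
    port: int = 8000
    host: str = "127.0.0.1"
    db_path: str = "./ntr_fetcher.db"
    poll_interval_seconds: int = 3600
    soundcloud_user: str = "nicktherat"
    show_day: int = 2  # 0=Mon, 2=Wed
    show_hour: int = 22  # hour in Eastern Time


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment-like mapping such as os.environ."""
    token = env.get("NTR_ADMIN_TOKEN")
    if not token:
        raise ValueError("NTR_ADMIN_TOKEN is required")
    return Settings(
        admin_token=token,
        port=int(env.get("NTR_PORT", "8000")),
        host=env.get("NTR_HOST", "127.0.0.1"),
        db_path=env.get("NTR_DB_PATH", "./ntr_fetcher.db"),
        poll_interval_seconds=int(env.get("NTR_POLL_INTERVAL_SECONDS", "3600")),
        soundcloud_user=env.get("NTR_SOUNDCLOUD_USER", "nicktherat"),
        show_day=int(env.get("NTR_SHOW_DAY", "2")),
        show_hour=int(env.get("NTR_SHOW_HOUR", "22")),
    )
```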

View File

@@ -0,0 +1 @@
/Users/erikfredericks/dev-ai/one-offs/soundcloud-reverse-engineering/docs/soundcloud-likes-api.md