Replaces room-monitor.js (REST polling) and player-count-checker.js (Puppeteer/CDP audience join) with a single EcastShardClient that connects as a shard via direct WebSocket. Defines new event contract, integration points, error handling, and reconnection strategy.
Ecast Shard Monitor — Design Document
Date: 2026-03-20
Status: Approved
Replaces: room-monitor.js (REST polling for lock) + player-count-checker.js (Puppeteer audience join)
Problem
The current player count approach launches a headless Chrome instance via Puppeteer, navigates to jackbox.tv, joins as an audience member through the UI, and sniffs WebSocket frames via CDP. This is fragile, resource-heavy, and occupies an audience slot. The room monitor is a separate module that polls the REST API until the room locks, then hands off to the Puppeteer checker. The result is two modules, two connection strategies, and a circular-dependency workaround.
Solution
Replace both modules with a single EcastShardClient that connects to the Jackbox ecast server as a shard via a direct Node.js WebSocket. The shard role:
- Gets the full `here` map (authoritative player list with names and roles)
- Receives real-time entity updates (room state, player joins, game end)
- Can query entities via `object/get`
- Does NOT count toward `maxPlayers` or trigger `full: true`
- Does NOT require a browser
One REST call upfront validates the room and retrieves the host field needed for the WebSocket URL. After that, the shard connection handles everything.
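As a minimal sketch of that hand-off, the function below assembles the shard WebSocket URL from the REST response's host field, following the URL shape shown in the Lifecycle diagram. The name `buildShardUrl`, its parameter names, and the example host are illustrative assumptions, not part of the existing codebase.

```javascript
// Sketch only: `buildShardUrl` and its parameter names are hypothetical.
// The path and query parameters mirror the URL in the Lifecycle diagram.
function buildShardUrl({ host, roomCode, sessionId }) {
  const url = new URL(`wss://${host}/api/v2/rooms/${encodeURIComponent(roomCode)}/play`);
  url.searchParams.set('role', 'shard');
  url.searchParams.set('name', 'GamePicker');
  url.searchParams.set('userId', `gamepicker-${sessionId}`);
  url.searchParams.set('format', 'json');
  return url.toString();
}

// 'ecast.example.com' is a placeholder; the real value comes from the REST `host` field.
console.log(buildShardUrl({ host: 'ecast.example.com', roomCode: 'LSBN', sessionId: 1 }));
```

Building the query string through `URLSearchParams` rather than string concatenation keeps the values correctly escaped if the display name or user ID ever contains reserved characters.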
Architecture
Lifecycle
Room code registered
│
▼
REST: GET /rooms/{code} ──── 404 ──→ Mark failed, stop
│
│ (get host, maxPlayers, locked, appTag)
▼
WSS: Connect as shard
wss://{host}/api/v2/rooms/{code}/play?role=shard&name=GamePicker&userId=gamepicker-{sessionId}&format=json
│
▼
client/welcome received
├── Parse `here` → initial player count (filter for `player` roles)
├── Parse `entities.room` → lobby state, gameCanStart, etc.
├── Store `secret` + `id` for reconnection
└── Broadcast initial state to our clients
│
▼
┌─── Event loop (listening for server messages) ───┐
│ │
│ `object` (key: textDescriptions) │
│ → Parse latestDescriptions for player joins │
│ → Broadcast `lobby.player-joined` to clients │
│ │
│ `object` (key: room) │
│ → Detect state transitions: │
│ lobbyState changes → broadcast lobby updates │
│ state: "Gameplay" → broadcast `game.started` │
│ gameFinished: true → broadcast `game.ended` │
│ gameResults → extract final player count │
│ │
│ `client/connected` (if delivered to shards) │
│ → Update here map, recount players │
│ │
│ WebSocket close/error │
│ → REST check: room exists? │
│ Yes → reconnect with secret/id │
│ No → game ended, finalize │
└────────────────────────────────────────────────────┘
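The room-entity branch of the event loop can be sketched as a pure function that compares the previous and the new room entity and returns the events to broadcast. This assumes the entity payload exposes `lobbyState`, `state`, and `gameFinished` as plain fields, as the diagram suggests. The function name `detectRoomEvents` is hypothetical, and extracting the final count from `gameResults` is left out.

```javascript
// Sketch only: `detectRoomEvents` is a hypothetical helper. It assumes the room
// entity carries `lobbyState`, `state`, and `gameFinished` as top-level fields.
function detectRoomEvents(prev, next) {
  const events = [];
  if (prev.lobbyState !== next.lobbyState) {
    events.push('lobby.updated');
  }
  // Fire once, on the transition into Gameplay, not on every later update.
  if (prev.state !== 'Gameplay' && next.state === 'Gameplay') {
    events.push('game.started');
  }
  // Fire once, when gameFinished first becomes true.
  if (!prev.gameFinished && next.gameFinished === true) {
    events.push('game.ended');
  }
  return events;
}

const before = { state: 'Lobby', lobbyState: 'CanStart', gameFinished: false };
const after = { state: 'Lobby', lobbyState: 'Countdown', gameFinished: false };
console.log(detectRoomEvents(before, after)); // [ 'lobby.updated' ]
```

Comparing against the previous entity, rather than reacting to the current value alone, avoids re-broadcasting `game.started` each time the room entity is updated during gameplay.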
Internal State
| Field | Type | Source |
|---|---|---|
| `playerCount` | number | `here` map filtered for player roles |
| `playerNames` | string[] | `here` map player role `name` fields |
| `lobbyState` | string | room entity `lobbyState` |
| `gameState` | string | room entity `state` ("Lobby", "Gameplay") |
| `gameStarted` | boolean | Derived from `state === "Gameplay"` |
| `gameFinished` | boolean | room entity `gameFinished` |
| `maxPlayers` | number | REST response + room entity |
| `secret` / `id` | string / number | `client/welcome`, for reconnection |
Player Counting
The here map from client/welcome is the authoritative source. It lists all registered connections with their roles. Count entries where roles contains player. The shard itself is excluded (it has roles: {shard: {}}). The host (ID 1, roles: {host: {}}) is also excluded. Since Jackbox holds slots for disconnected players, here always reflects the true occupied slot count.
For subsequent joins after connect, textDescriptions entity updates provide join notifications. Since shards have here visibility, client/connected messages may also be delivered — both paths are handled, with here as source of truth.
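The counting rule can be sketched as follows. It assumes `here` is an object keyed by connection ID whose entries have a `roles` object, with the player's display name at `roles.player.name` (inferred from the Internal State table). The helper name `summarizePlayers` is hypothetical.

```javascript
// Sketch only: `summarizePlayers` is a hypothetical helper. Assumes each `here`
// entry looks like { roles: { player: { name } } }, { roles: { host: {} } },
// or { roles: { shard: {} } }.
function summarizePlayers(here) {
  const players = Object.values(here || {}).filter(
    (entry) =>
      entry && entry.roles && Object.prototype.hasOwnProperty.call(entry.roles, 'player')
  );
  return {
    playerCount: players.length,
    playerNames: players.map((entry) => entry.roles.player.name),
  };
}

const here = {
  1: { roles: { host: {} } },                 // host: excluded
  2: { roles: { player: { name: 'Alice' } } },
  3: { roles: { player: { name: 'Bob' } } },
  4: { roles: { shard: {} } },                // the shard itself: excluded
};
console.log(summarizePlayers(here)); // playerCount is 2, playerNames are Alice and Bob
```

Filtering on the presence of the `player` role, instead of subtracting known non-player IDs, keeps the count correct if other non-player roles appear in the map.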
WebSocket Events (Game Picker → Connected Clients)
room.connected
Shard successfully connected to the Jackbox room. Sent once on initial connect. Replaces the old audience.joined event.
{
"type": "room.connected",
"timestamp": "...",
"data": {
"sessionId": 1,
"gameId": 5,
"roomCode": "LSBN",
"appTag": "drawful2international",
"maxPlayers": 8,
"playerCount": 2,
"players": ["Alice", "Bob"],
"lobbyState": "CanStart",
"gameState": "Lobby"
}
}
lobby.player-joined
A new player joined the lobby.
{
"type": "lobby.player-joined",
"timestamp": "...",
"data": {
"sessionId": 1,
"gameId": 5,
"roomCode": "LSBN",
"playerName": "Charlie",
"playerCount": 3,
"players": ["Alice", "Bob", "Charlie"],
"maxPlayers": 8
}
}
lobby.updated
Lobby state changed (enough players to start, countdown started, etc.).
{
"type": "lobby.updated",
"timestamp": "...",
"data": {
"sessionId": 1,
"gameId": 5,
"roomCode": "LSBN",
"lobbyState": "Countdown",
"gameCanStart": true,
"gameIsStarting": true,
"playerCount": 4
}
}
game.started
The game transitioned from Lobby to Gameplay.
{
"type": "game.started",
"timestamp": "...",
"data": {
"sessionId": 1,
"gameId": 5,
"roomCode": "LSBN",
"playerCount": 4,
"players": ["Alice", "Bob", "Charlie", "Diana"],
"maxPlayers": 8
}
}
game.ended
The game finished.
{
"type": "game.ended",
"timestamp": "...",
"data": {
"sessionId": 1,
"gameId": 5,
"roomCode": "LSBN",
"playerCount": 4,
"players": ["Alice", "Bob", "Charlie", "Diana"]
}
}
room.disconnected
Shard lost connection to the Jackbox room.
{
"type": "room.disconnected",
"timestamp": "...",
"data": {
"sessionId": 1,
"gameId": 5,
"roomCode": "LSBN",
"reason": "room_closed",
"finalPlayerCount": 4
}
}
Possible reason values: room_closed, room_not_found, connection_failed, role_rejected, manually_stopped.
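All events above share the same envelope (`type`, `timestamp`, `data`), so a small builder can keep them consistent and reject unknown reason values early. The sketch below is illustrative: the helper names are hypothetical, and the ISO 8601 timestamp format is an assumption, since the examples elide the timestamp value.

```javascript
// Sketch only: helper names are hypothetical. The ISO 8601 timestamp is an
// assumption; the document's examples do not show the actual format.
const DISCONNECT_REASONS = new Set([
  'room_closed',
  'room_not_found',
  'connection_failed',
  'role_rejected',
  'manually_stopped',
]);

function buildEvent(type, data, now = new Date()) {
  return { type, timestamp: now.toISOString(), data };
}

function buildRoomDisconnected({ sessionId, gameId, roomCode, reason, finalPlayerCount }, now) {
  if (!DISCONNECT_REASONS.has(reason)) {
    throw new RangeError(`Unknown disconnect reason: ${reason}`);
  }
  return buildEvent(
    'room.disconnected',
    { sessionId, gameId, roomCode, reason, finalPlayerCount },
    now
  );
}

console.log(
  buildRoomDisconnected({
    sessionId: 1,
    gameId: 5,
    roomCode: 'LSBN',
    reason: 'room_closed',
    finalPlayerCount: 4,
  })
);
```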
Dropped Events
| Old event | Replacement |
|---|---|
| `audience.joined` | `room.connected` (richer payload) |
| `player-count.updated` (automated) | `lobby.player-joined`, `game.started`, `game.ended` carry `playerCount` |
The manual PATCH .../player-count endpoint keeps broadcasting player-count.updated for its specific use case.
DB Persistence
The session_games table columns player_count and player_count_check_status continue to be updated:
- `player_count`: updated on each join and at game end
- `player_count_check_status`: 'monitoring' (shard connected), 'completed' (game ended with count), 'failed' (couldn't connect), 'stopped' (manual stop)
The old 'checking' status becomes 'monitoring'.
Integration Points
Files Deleted
- `backend/utils/player-count-checker.js`: Puppeteer audience approach
- `backend/utils/room-monitor.js`: REST polling for lock state
Files Created
- `backend/utils/ecast-shard-client.js`: `EcastShardClient` class plus module exports `startMonitor`, `stopMonitor`, `cleanupAllShards`
Files Modified
backend/utils/jackbox-api.js — Add getRoomInfo(roomCode) returning the full room response including host, appTag, audienceEnabled.
backend/routes/sessions.js — Replace imports:
// Old
const { stopPlayerCountCheck } = require('../utils/player-count-checker');
const { startRoomMonitor, stopRoomMonitor } = require('../utils/room-monitor');
// New
const { startMonitor, stopMonitor } = require('../utils/ecast-shard-client');
Every call site now makes a single call; the stop paths that previously needed two calls collapse to one:
| Route | Old | New |
|---|---|---|
| `POST /:id/games` (with `room_code`) | `startRoomMonitor(...)` | `startMonitor(...)` |
| `PATCH .../status` (away from playing) | `stopRoomMonitor(...)` + `stopPlayerCountCheck(...)` | `stopMonitor(...)` |
| `DELETE .../games/:gameId` | `stopRoomMonitor(...)` + `stopPlayerCountCheck(...)` | `stopMonitor(...)` |
| `POST .../start-player-check` | `startRoomMonitor(...)` | `startMonitor(...)` |
| `POST .../stop-player-check` | `stopRoomMonitor(...)` + `stopPlayerCountCheck(...)` | `stopMonitor(...)` |
Endpoint paths stay the same for backwards compatibility.
backend/server.js — Wire cleanupAllShards() into SIGTERM/SIGINT handlers.
Error Handling and Reconnection
Connection Failures
- REST validation fails (room not found, network error): Set status 'failed', broadcast `room.disconnected` with `reason: 'room_not_found'` or `'connection_failed'`. No automatic retry.
- Shard WebSocket fails to connect: Retry up to 3 times with exponential backoff (2s, 4s, 8s). On exhaustion, set status 'failed', broadcast `room.disconnected` with `reason: 'connection_failed'`.
- Ecast rejects the shard role (error opcode received): Set status 'failed', broadcast `room.disconnected` with `reason: 'role_rejected'`. No retry.
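The retry rule for the shard connection can be sketched as below. It reads "retry up to 3 times" as one initial attempt followed by up to three retries, waiting 2s, 4s, and 8s before each. If the three attempts are meant to include the first, the delay list would be one entry shorter. The names `backoffDelays` and `connectWithRetry` and the injectable `sleep` are assumptions for illustration.

```javascript
// Sketch only: `backoffDelays` and `connectWithRetry` are hypothetical helpers.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Doubling delays starting at baseMs: 2000, 4000, 8000 for the defaults.
function backoffDelays(maxRetries = 3, baseMs = 2000) {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

// `connect` is any async function that resolves with a connection or throws.
// `sleep` is injectable so tests do not have to wait for real timers.
async function connectWithRetry(connect, { delays = backoffDelays(), sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= delays.length; attempt += 1) {
    try {
      return await connect(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < delays.length) {
        await sleep(delays[attempt]);
      }
    }
  }
  // Retries exhausted: the caller sets status 'failed' and broadcasts
  // room.disconnected with reason 'connection_failed'.
  throw lastError;
}

console.log(backoffDelays()); // [ 2000, 4000, 8000 ]
```

A role rejection should bypass this helper entirely, since that case is specified as "No retry".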
Mid-Session Disconnections
- WebSocket closes unexpectedly: REST check `GET /rooms/{code}`:
  - Room exists → reconnect with stored `secret`/`id` (up to 3 attempts, exponential backoff). Transparent to clients on success.
  - Room gone → finalize with last known count, status 'completed', broadcast `game.ended` + `room.disconnected`.
- Ecast error 2027 "room already closed": Same as room-gone path.
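The decision made after an unexpected close can be sketched as a pure function, which keeps the branching testable without a live socket. The function name `decideAfterClose`, its input shape, and the `give_up` action label are hypothetical. Mapping exhausted reconnects to status 'failed' follows the State Machine section.

```javascript
// Sketch only: `decideAfterClose` and its input/output shapes are hypothetical.
const ROOM_ALREADY_CLOSED = 2027; // Ecast error code named in this section

function decideAfterClose({ roomExists, errorCode, reconnectAttempts, maxAttempts = 3 }) {
  // Error 2027 and a missing room take the same "room gone" path.
  if (errorCode === ROOM_ALREADY_CLOSED || !roomExists) {
    return {
      action: 'finalize',
      status: 'completed',
      events: ['game.ended', 'room.disconnected'],
    };
  }
  if (reconnectAttempts < maxAttempts) {
    // Reconnect with the stored secret/id; clients see nothing if it succeeds.
    return { action: 'reconnect', status: 'monitoring', events: [] };
  }
  return { action: 'give_up', status: 'failed', events: ['room.disconnected'] };
}

console.log(decideAfterClose({ roomExists: true, reconnectAttempts: 0 }).action); // reconnect
```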
Manual Stop
- `stop-player-check` called or game status changes: Close WebSocket gracefully, set status 'stopped' (unless already 'completed'), broadcast `room.disconnected` with `reason: 'manually_stopped'`.
Server Shutdown
- SIGTERM/SIGINT: `cleanupAllShards()` closes all WebSocket connections. No DB updates on shutdown.
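One way to wire this into `backend/server.js` is sketched below. The helper `registerShutdownHandlers` is hypothetical, and exiting with code 0 after cleanup is an assumption. The process object and the exit function are injected so the wiring can be exercised with a fake emitter instead of real signals.

```javascript
const { EventEmitter } = require('node:events');

// Sketch only: `registerShutdownHandlers` is a hypothetical helper. In server.js
// it would be called as registerShutdownHandlers(process, cleanupAllShards).
function registerShutdownHandlers(proc, cleanupAllShards, exit = (code) => proc.exit(code)) {
  let shuttingDown = false;
  const onSignal = () => {
    if (shuttingDown) return; // ignore a second signal while closing
    shuttingDown = true;
    cleanupAllShards();       // closes sockets only; no DB updates on shutdown
    exit(0);                  // assumption: exit once cleanup has been triggered
  };
  proc.on('SIGTERM', onSignal);
  proc.on('SIGINT', onSignal);
}

// Demonstration with a fake process object and a no-op exit.
const fakeProcess = new EventEmitter();
let cleanupCalls = 0;
registerShutdownHandlers(fakeProcess, () => { cleanupCalls += 1; }, () => {});
fakeProcess.emit('SIGTERM');
fakeProcess.emit('SIGINT');
console.log(cleanupCalls); // 1
```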
State Machine
startMonitor()
│
▼
┌───────────┐
┌────────│ not_started│
│ └───────────┘
│ │
REST fails REST succeeds
│ │
▼ ▼
┌────────┐ ┌────────────┐
│ failed │ │ monitoring │◄──── reconnect success
└────────┘ └─────┬──────┘
▲ │
│ ┌────┴─────┬──────────────┐
reconnect │ │ │
exhausted game ends WS drops manual stop
│ │ │ │
│ ▼ ▼ ▼
│ ┌──────────┐ REST check ┌─────────┐
│ │ completed │ │ │ stopped │
│ └──────────┘ │ └─────────┘
│ │
└──── room gone? ────┘
│
room exists?
│
reconnect...
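The same state machine can be written as a transition table, which makes illegal transitions explicit. This is a sketch under two assumptions: the event names are hypothetical labels for the arrows in the diagram, and the "room gone" branch leads to `completed`, as stated under Mid-Session Disconnections.

```javascript
// Sketch only: event names are hypothetical labels for the diagram's arrows.
const TRANSITIONS = {
  not_started: { rest_failed: 'failed', rest_succeeded: 'monitoring' },
  monitoring: {
    game_ended: 'completed',
    room_gone: 'completed',            // WS dropped and the REST check found no room
    reconnect_succeeded: 'monitoring',
    reconnect_exhausted: 'failed',
    manual_stop: 'stopped',
  },
  completed: { manual_stop: 'completed' }, // a manual stop never downgrades 'completed'
  failed: {},
  stopped: {},
};

function nextStatus(current, event) {
  const next = TRANSITIONS[current] && TRANSITIONS[current][event];
  if (!next) {
    throw new Error(`Invalid transition from "${current}" on "${event}"`);
  }
  return next;
}

console.log(nextStatus('not_started', 'rest_succeeded')); // monitoring
console.log(nextStatus('monitoring', 'reconnect_exhausted')); // failed
```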
Timeouts
| Concern | Value | Rationale |
|---|---|---|
| WebSocket connect timeout | 10s | Ecast servers respond fast |
| Reconnect backoff | 2s, 4s, 8s | Three attempts, ~14s total |
| Max reconnect attempts | 3 | Fail fast, user can retry manually |
| WebSocket inactivity timeout | None | Shard connections receive periodic shard/sync CRDT messages |
Dependencies
Used: ws (Node.js WebSocket library), already a dependency (used by websocket-manager.js), so no new package is installed.
Removed: puppeteer — no longer needed for room monitoring.
Non-Goals
- Renaming REST endpoint paths (`start-player-check` / `stop-player-check`): kept for backwards compatibility
- Auto-starting monitoring when room code is set via `PATCH .../room-code`: kept as manual trigger only
- Frontend `Picker.jsx` changes: tracked separately (existing bugs: `message.event` vs `message.type`, subscribe without auth, 'waiting' status that's never set)