# WebSocket 403 Error Analysis

**Date**: October 31, 2025
**Issue**: Direct WebSocket connection to `wss://engine.kosmi.io/gql-ws` returns 403 Forbidden

## Tests Performed

### Test 1: No Authentication

```bash
./test-websocket -mode 2
```

**Result**: 403 Forbidden ❌

### Test 2: Origin Header Only

```bash
./test-websocket -mode 3
```

**Result**: 403 Forbidden ❌

### Test 3: With JWT Token

```bash
./test-websocket-direct -token
```

**Result**: 403 Forbidden ❌

### Test 4: With Session Cookies + Token

```bash
./test-session -room -token
```

**Result**: 403 Forbidden ❌
**Note**: No cookies were set by visiting the room page

## Analysis

### Why 403?

The 403 error occurs during the **WebSocket handshake**, BEFORE we can send the `connection_init` message with the JWT token.

This means:

1. ❌ It's NOT about the JWT token (that's sent after the connection is established)
2. ❌ It's NOT about cookies (no cookies are set)
3. ❌ It's NOT about the Origin header (we're sending the correct origin)
4. ✅ It's likely a security measure at the WebSocket server or proxy level

### Possible Causes

1. **Cloudflare/CDN Protection**
   - Response headers show `Server: Cowboy` with `Via: 1.1 Caddy`, so a proxy sits in front of the application server
   - May have bot protection that detects non-browser clients
   - May require a JavaScript challenge or proof-of-work

2. **TLS Fingerprinting**
   - Server may be checking the TLS ClientHello fingerprint
   - Go's TLS implementation has a different fingerprint than browsers
   - This is commonly used to block bots

3. **WebSocket Sub-protocol Validation**
   - May require a specific WebSocket sub-protocol or extension headers
   - The browser sends additional headers that we're not replicating

4. **IP-based Rate Limiting**
   - Previous requests from the same IP may have triggered protection
   - Would explain why the browser works but our client doesn't

### Evidence from ChromeDP

ChromeDP **DOES work** because:

- It's literally a real Chrome browser
- Has the correct TLS fingerprint
- Passes all JavaScript challenges
- Has complete browser context

## Recommended Solution

### Hybrid Approach: ChromeDP for Token, Native for WebSocket

Since:

1. JWT tokens are valid for **1 year**
2. ChromeDP successfully obtains tokens
3. Native WebSocket cannot bypass the 403

**Solution**: Use ChromeDP to get the token once, then cache it:

```go
import (
	"sync"
	"time"
)

type TokenCache struct {
	mu         sync.RWMutex
	token      string
	expiration time.Time
}

func (c *TokenCache) Get() (string, error) {
	c.mu.RLock()
	token := c.token
	valid := token != "" && time.Now().Before(c.expiration)
	// Release the read lock before refreshing: sync.RWMutex cannot upgrade
	// a read lock to a write lock, so holding it here would deadlock.
	c.mu.RUnlock()

	if valid {
		return token, nil // Use cached token
	}

	// Token expired or missing, get new one via ChromeDP
	return c.refreshToken()
}

func (c *TokenCache) refreshToken() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Another goroutine may have refreshed while we waited for the lock
	if c.token != "" && time.Now().Before(c.expiration) {
		return c.token, nil
	}

	// Launch ChromeDP, visit room, extract token (helper defined elsewhere)
	token := extractTokenViaChromeDPOnce()

	// Cache for 11 months (give 1 month buffer)
	c.token = token
	c.expiration = time.Now().Add(11 * 30 * 24 * time.Hour)
	return token, nil
}
```

**Benefits**:

- ✅ Only need ChromeDP once per year
- ✅ Native WebSocket for all subsequent connections
- ✅ Lightweight after initial token acquisition
- ✅ Automatic token refresh when expired

## Alternative: Keep ChromeDP

If we can't bypass the 403, we should optimize the ChromeDP approach instead:

1. **Reduce Memory Usage**
   - Use headless-shell instead of full Chrome (~50MB vs ~200MB)
   - Disable unnecessary Chrome features
   - Clean up resources aggressively

2. **Reduce Startup Time**
   - Keep the Chrome instance alive between restarts
   - Use Chrome's remote debugging instead of launching a new instance

3. **Accept the Trade-off**
   - 200MB RAM is acceptable for a relay service
   - 3-5 second startup is a one-time cost
   - It's the most reliable solution

## Next Steps

### Option A: Continue Investigation

- [ ] Try different TLS libraries (crypto/tls alternatives)
- [ ] Analyze the browser's exact WebSocket handshake with Wireshark
- [ ] Try mimicking the browser's TLS fingerprint
- [ ] Test from different IP addresses

### Option B: Implement Hybrid Solution

- [ ] Extract token from ChromeDP session
- [ ] Implement token caching with expiration
- [ ] Try native WebSocket with cached token
- [ ] Verify if 403 still occurs

### Option C: Optimize ChromeDP

- [ ] Switch to chromedp/headless-shell
- [ ] Implement Chrome instance pooling
- [ ] Optimize memory usage
- [ ] Document performance characteristics

## Recommendation

**Go with Option C**: Optimize ChromeDP

**Reasoning**:

1. ChromeDP is the only approach proven to work
2. Token caching won't help if the WebSocket handshake still returns 403
3. The 403 is likely permanent without a real browser context
4. Optimization can make ChromeDP acceptable for production
5. ~100MB RAM for a bridge service is reasonable

**Implementation**:

```go
// Use the chromedp/headless-shell Docker image (Dockerfile line):
//   FROM chromedp/headless-shell:latest

// Optimize Chrome flags
var chromeFlags = []chromedp.ExecAllocatorOption{
	chromedp.Flag("disable-gpu", true),
	chromedp.Flag("disable-dev-shm-usage", true),
	chromedp.Flag("single-process", true), // Reduce memory
	chromedp.Flag("no-zygote", true),      // Reduce memory
}

// Keep instance alive
func (b *Bkosmi) KeepAlive() {
	// Don't close Chrome between messages
	// Only restart if crashed
}
```

## Conclusion

The 403 Forbidden error is likely a security measure that cannot be easily bypassed without a real browser context. The most pragmatic solution is to **optimize and embrace the ChromeDP approach** rather than trying to reverse engineer the security mechanism.

**Status**: ChromeDP remains the recommended implementation ✅
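
## Appendix: Handshake Probe Sketch

The "Why 403?" analysis rests on one observation: the rejection arrives on the HTTP upgrade request, before any `connection_init` message can carry the token. The sketch below illustrates that ordering using only the standard library. It is not the project's test tool. The names `newUpgradeRequest` and `classifyHandshake` are hypothetical, the `graphql-transport-ws` sub-protocol value is an assumption, and `main` probes a local stand-in server that always answers 403 rather than the real endpoint.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

const (
	handshakeAccepted = "handshake accepted: connection_init (with the token) can now be sent"
	handshakeRejected = "rejected during handshake: the token was never sent"
	handshakeOther    = "unexpected handshake response"
)

// newUpgradeRequest builds the HTTP GET that opens a WebSocket handshake.
// Note that no JWT appears here: the token only travels later, inside
// connection_init, once the server has answered 101.
func newUpgradeRequest(wsURL, origin string) (*http.Request, error) {
	// net/http speaks http/https, so map the WebSocket schemes onto them.
	switch {
	case strings.HasPrefix(wsURL, "wss://"):
		wsURL = "https://" + strings.TrimPrefix(wsURL, "wss://")
	case strings.HasPrefix(wsURL, "ws://"):
		wsURL = "http://" + strings.TrimPrefix(wsURL, "ws://")
	}

	// The handshake key is 16 random bytes, base64-encoded.
	key := make([]byte, 16)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodGet, wsURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Connection", "Upgrade")
	req.Header.Set("Upgrade", "websocket")
	req.Header.Set("Sec-WebSocket-Version", "13")
	req.Header.Set("Sec-WebSocket-Key", base64.StdEncoding.EncodeToString(key))
	// Assumed sub-protocol; the real endpoint may expect a different value.
	req.Header.Set("Sec-WebSocket-Protocol", "graphql-transport-ws")
	req.Header.Set("Origin", origin)
	return req, nil
}

// classifyHandshake maps the upgrade response status to what it implies
// about where authentication could have failed.
func classifyHandshake(status int) string {
	switch status {
	case http.StatusSwitchingProtocols:
		return handshakeAccepted
	case http.StatusUnauthorized, http.StatusForbidden:
		return handshakeRejected
	default:
		return handshakeOther
	}
}

func main() {
	// Local stand-in that rejects every upgrade attempt with 403.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "forbidden", http.StatusForbidden)
	}))
	defer srv.Close()

	wsURL := "ws" + strings.TrimPrefix(srv.URL, "http")
	req, err := newUpgradeRequest(wsURL, srv.URL)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("handshake:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.StatusCode, classifyHandshake(resp.StatusCode))
}
```

Pointing `newUpgradeRequest` at the real endpoint from Go would still use Go's TLS stack, so a 403 there would not by itself separate the TLS-fingerprint hypothesis from the header-based ones. It only confirms that the rejection precedes token exchange.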