IRC-kosmi-relay/WEBSOCKET_403_ANALYSIS.md
2025-10-31 16:17:04 -04:00

WebSocket 403 Error Analysis

Date: October 31, 2025
Issue: Direct WebSocket connection to wss://engine.kosmi.io/gql-ws returns 403 Forbidden

Tests Performed

Test 1: No Authentication

./test-websocket -mode 2

Result: 403 Forbidden

Test 2: Origin Header Only

./test-websocket -mode 3

Result: 403 Forbidden

Test 3: With JWT Token

./test-websocket-direct -token <CAPTURED_TOKEN>

Result: 403 Forbidden

Test 4: With Session Cookies + Token

./test-session -room <URL> -token <TOKEN>

Result: 403 Forbidden
Note: No cookies were set by visiting the room page

Analysis

Why 403?

The 403 error occurs during the WebSocket handshake, BEFORE we can send the connection_init message with the JWT token. This means:

  1. It's NOT about the JWT token (that's sent after the connection is established)
  2. It's NOT about cookies (visiting the room page sets none)
  3. The Origin header alone is NOT sufficient (Test 2 sent the correct origin and still got 403)
  4. It's likely a security measure at the WebSocket server or proxy level

Possible Causes

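  1. CDN/Reverse-Proxy Protection

    • Response headers: Server: "Cowboy" with "Via: 1.1 Caddy", i.e. a Cowboy backend behind a Caddy reverse proxy; nothing in these headers identifies Cloudflare specifically
    • May have bot protection that detects non-browser clients
    • May require a JavaScript challenge or proof-of-work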
  2. TLS Fingerprinting

    • Server may be checking the TLS client hello fingerprint
    • Go's TLS implementation has a different fingerprint than browsers
    • This is commonly used to block bots
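  3. WebSocket Sub-protocol/Header Validation

    • May require a specific Sec-WebSocket-Protocol value (for GraphQL endpoints typically graphql-transport-ws or the legacy graphql-ws) or a Sec-WebSocket-Extensions header
    • Browser sends additional handshake headers (e.g. User-Agent, Sec-WebSocket-Extensions) that we're not replicating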
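  4. IP-based Rate Limiting

    • Previous requests from the same IP may have triggered protection
    • Would explain why the browser works but our client doesn't only if the two connect from different IP addresses; rate limiting also typically answers 429 rather than 403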

Evidence from ChromeDP

ChromeDP DOES work because:

  • It's literally a real Chrome browser
  • Has the correct TLS fingerprint
  • Passes all JavaScript challenges
  • Has complete browser context

Hybrid Approach: ChromeDP for Token, Native for WebSocket

Since:

  1. JWT tokens are valid for 1 year
  2. ChromeDP successfully obtains tokens
  3. Native WebSocket cannot bypass 403

Solution: Use ChromeDP to get the token once, then cache it:

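import (
    "sync"
    "time"
)

// TokenCache holds a JWT and its expiry; safe for concurrent use.
type TokenCache struct {
    token      string
    expiration time.Time
    mu         sync.RWMutex
}

// Get returns the cached token, refreshing it via ChromeDP when missing or expired.
func (c *TokenCache) Get() (string, error) {
    c.mu.RLock()
    token := c.token
    valid := token != "" && time.Now().Before(c.expiration)
    // Release the read lock before refreshToken takes the write lock:
    // sync.RWMutex is not reentrant, so holding RLock here would deadlock.
    c.mu.RUnlock()

    if valid {
        return token, nil // Use cached token
    }

    // Token expired or missing, get new one via ChromeDP
    return c.refreshToken()
}

func (c *TokenCache) refreshToken() (string, error) {
    c.mu.Lock()
    defer c.mu.Unlock()

    // Another goroutine may have refreshed while we waited for the lock
    if c.token != "" && time.Now().Before(c.expiration) {
        return c.token, nil
    }

    // Launch ChromeDP, visit room, extract token
    // (extractTokenViaChromeDPOnce is defined elsewhere; assumed here to return (string, error))
    token, err := extractTokenViaChromeDPOnce()
    if err != nil {
        return "", err
    }

    // Cache for 11 months (give 1 month buffer)
    c.token = token
    c.expiration = time.Now().Add(11 * 30 * 24 * time.Hour)

    return token, nil
}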

Benefits:

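  • Only need ChromeDP roughly once every 11 months, provided the cached token is persisted across restarts (an in-memory cache is lost when the process exits)
  • Native WebSocket for all subsequent connections, assuming the handshake accepts a non-browser client (Tests 3 and 4 returned 403 even with a token)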
  • Lightweight after initial token acquisition
  • Automatic token refresh when expired

Alternative: Keep ChromeDP

If we can't bypass the 403, we should optimize the ChromeDP approach instead:

  1. Reduce Memory Usage

    • Use headless-shell instead of full Chrome (~50MB vs ~200MB)
    • Disable unnecessary Chrome features
    • Clean up resources aggressively
  2. Reduce Startup Time

    • Keep the Chrome instance alive across relay restarts
    • Attach to an already-running Chrome over its remote debugging port instead of launching a new instance
  3. Accept the Trade-off

    • 200MB RAM is acceptable for a relay service
    • 3-5 second startup is one-time cost
    • It's the most reliable solution

Next Steps

Option A: Continue Investigation

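  • Try different TLS libraries (crypto/tls alternatives)
  • Analyze the browser's exact WebSocket handshake with Wireshark (wss traffic is encrypted, so this needs TLS key logging, e.g. SSLKEYLOGFILE; the browser's DevTools also shows the handshake headers)
  • Try mimicking the browser's TLS fingerprint
  • Test from different IP addresses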

Option B: Implement Hybrid Solution

  • Extract token from ChromeDP session
  • Implement token caching with expiration
  • Try native WebSocket with cached token
  • Verify if 403 still occurs

Option C: Optimize ChromeDP

  • Switch to chromedp/headless-shell
  • Implement Chrome instance pooling
  • Optimize memory usage
  • Document performance characteristics

Recommendation

Go with Option C: Optimize ChromeDP

Reasoning:

  1. ChromeDP is the only approach confirmed to work so far
  2. Token caching won't help if WebSocket still returns 403
  3. The 403 is likely permanent without a real browser context
  4. Optimization can make ChromeDP acceptable for production
  5. ~50-200MB RAM (headless-shell vs. full Chrome, per the estimates above) for a bridge service is reasonable

Implementation:

# Dockerfile: use the chromedp/headless-shell Docker image
FROM chromedp/headless-shell:latest

// Go: optimize Chrome flags via the exec allocator options
opts := append(chromedp.DefaultExecAllocatorOptions[:],
    chromedp.Flag("disable-gpu", true),
    chromedp.Flag("disable-dev-shm-usage", true),
    chromedp.Flag("single-process", true), // Reduce memory (may reduce stability)
    chromedp.Flag("no-zygote", true),      // Reduce memory
)

// Keep instance alive
func (b *Bkosmi) KeepAlive() {
    // Don't close Chrome between messages
    // Only restart if crashed
}
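
The "only restart if crashed" policy can be sketched independently of chromedp. In the example below, `browser`, `keeper`, and `fakeBrowser` are hypothetical names, and how liveness is actually detected for a real Chrome instance is left open. The sketch only shows the bookkeeping: hand out one shared instance, and relaunch it only when a liveness check fails.

```go
package main

import (
	"fmt"
	"sync"
)

// browser is a hypothetical abstraction over a long-lived Chrome session.
type browser interface {
	Alive() bool
	Close()
}

// keeper hands out one shared browser and relaunches it only after a crash.
type keeper struct {
	mu      sync.Mutex
	current browser
	launch  func() (browser, error) // starts a new instance
	starts  int                     // number of launches, for diagnostics
}

// Get returns the running browser, launching a new one if none is alive.
func (k *keeper) Get() (browser, error) {
	k.mu.Lock()
	defer k.mu.Unlock()

	if k.current != nil && k.current.Alive() {
		return k.current, nil // reuse: don't close Chrome between messages
	}
	if k.current != nil {
		k.current.Close() // release resources held by the dead instance
		k.current = nil
	}
	b, err := k.launch()
	if err != nil {
		return nil, fmt.Errorf("launch browser: %w", err)
	}
	k.current = b
	k.starts++
	return b, nil
}

// fakeBrowser stands in for Chrome so the sketch runs without a browser.
type fakeBrowser struct{ alive bool }

func (f *fakeBrowser) Alive() bool { return f.alive }
func (f *fakeBrowser) Close()      { f.alive = false }

func main() {
	k := &keeper{launch: func() (browser, error) {
		return &fakeBrowser{alive: true}, nil
	}}

	b1, _ := k.Get()
	b2, _ := k.Get()
	fmt.Println("reused:", b1 == b2, "launches:", k.starts)

	b1.(*fakeBrowser).alive = false // simulate a crash
	b3, _ := k.Get()
	fmt.Println("reused:", b3 == b1, "launches:", k.starts)
}
```

Keeping the mutex held across the launch serializes callers, so two goroutines that notice a crash at the same time do not start two Chrome instances.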

Conclusion

The 403 Forbidden error is likely a security measure that cannot be easily bypassed without a real browser context. The most pragmatic solution is to optimize and embrace the ChromeDP approach rather than trying to reverse engineer the security mechanism.

Status: ChromeDP remains the recommended implementation