# WebSocket 403 Error Analysis

**Date**: October 31, 2025
**Issue**: Direct WebSocket connection to `wss://engine.kosmi.io/gql-ws` returns 403 Forbidden

## Tests Performed

### Test 1: No Authentication
```bash
./test-websocket -mode 2
```
**Result**: 403 Forbidden ❌

### Test 2: Origin Header Only
```bash
./test-websocket -mode 3
```
**Result**: 403 Forbidden ❌

### Test 3: With JWT Token
```bash
./test-websocket-direct -token <CAPTURED_TOKEN>
```
**Result**: 403 Forbidden ❌

### Test 4: With Session Cookies + Token
```bash
./test-session -room <URL> -token <TOKEN>
```
**Result**: 403 Forbidden ❌
**Note**: No cookies were set by visiting the room page

## Analysis

### Why 403?

The 403 error occurs during the **WebSocket handshake**, BEFORE we can send the `connection_init` message with the JWT token. This means:

1. ❌ It's NOT about the JWT token (that's sent after connection)
2. ❌ It's NOT about cookies (no cookies are set)
3. ❌ It's NOT about the Origin header (we're sending the correct origin)
4. ✅ It's likely a security measure at the WebSocket server or proxy level

### Possible Causes

1. **Proxy/CDN Bot Protection (e.g. Cloudflare)**
   - Response headers show `Server: Cowboy` and `Via: 1.1 Caddy`, i.e. a Cowboy server behind a Caddy reverse proxy
   - Either layer may have bot protection that detects non-browser clients
   - May require a JavaScript challenge or proof-of-work

2. **TLS Fingerprinting**
   - Server may be checking the TLS ClientHello fingerprint
   - Go's TLS implementation has a different fingerprint than browsers
   - This is commonly used to block bots

3. **WebSocket Sub-protocol or Extension Validation**
   - May require a specific `Sec-WebSocket-Protocol` value or WebSocket extension headers
   - Browser sends additional headers that we're not replicating

4. **IP-based Rate Limiting**
   - Previous requests from the same IP may have triggered protection
   - Would explain why the browser works but our client doesn't only if the two connect from different IPs

### Evidence from ChromeDP

ChromeDP **DOES work** because:
- It's literally a real Chrome browser
- Has the correct TLS fingerprint
- Passes all JavaScript challenges
- Has complete browser context

## Recommended Solution

### Hybrid Approach: ChromeDP for Token, Native for WebSocket

Since:
1. JWT tokens are valid for **1 year**
2. ChromeDP successfully obtains tokens
3. Native WebSocket cannot bypass 403

**Solution**: Use ChromeDP to get the token once, then cache it (this only pays off if a native WebSocket handshake can be made to succeed; Test 3 shows that a token alone is not enough):

```go
import (
	"sync"
	"time"
)

type TokenCache struct {
	mu         sync.RWMutex
	token      string
	expiration time.Time
}

// Get returns the cached token, refreshing it via ChromeDP if missing or expired.
func (c *TokenCache) Get() (string, error) {
	c.mu.RLock()
	token := c.token
	valid := token != "" && time.Now().Before(c.expiration)
	// Release the read lock before refreshing: sync.RWMutex cannot be
	// upgraded, so calling refreshToken while holding RLock would deadlock.
	c.mu.RUnlock()

	if valid {
		return token, nil // Use cached token
	}

	// Token expired or missing, get new one via ChromeDP
	return c.refreshToken()
}

func (c *TokenCache) refreshToken() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Another goroutine may have refreshed while we waited for the lock
	if c.token != "" && time.Now().Before(c.expiration) {
		return c.token, nil
	}

	// Launch ChromeDP, visit room, extract token
	token := extractTokenViaChromeDPOnce()

	// Cache for 11 months (give 1 month buffer)
	c.token = token
	c.expiration = time.Now().Add(11 * 30 * 24 * time.Hour)

	return token, nil
}
```

**Benefits**:
- ✅ ChromeDP is only needed about once per token lifetime (roughly every 11 months), provided the cache survives restarts
- ✅ Native WebSocket for all subsequent connections
- ✅ Lightweight after initial token acquisition
- ✅ Automatic token refresh when expired

## Alternative: Keep ChromeDP

If we can't bypass the 403, we should optimize the ChromeDP approach instead:

1. **Reduce Memory Usage**
   - Use headless-shell instead of full Chrome (~50MB vs ~200MB)
   - Disable unnecessary Chrome features
   - Clean up resources aggressively

2. **Reduce Startup Time**
   - Keep Chrome instance alive between restarts
   - Use Chrome's remote debugging instead of launching new instance

3. **Accept the Trade-off**
   - 200MB RAM is acceptable for a relay service
   - 3-5 second startup is one-time cost
   - It's the most reliable solution

## Next Steps

### Option A: Continue Investigation
- [ ] Try different TLS libraries (crypto/tls alternatives)
- [ ] Analyze browser's exact WebSocket handshake with Wireshark
- [ ] Try mimicking browser's TLS fingerprint
- [ ] Test from different IP addresses

### Option B: Implement Hybrid Solution
- [ ] Extract token from ChromeDP session
- [ ] Implement token caching with expiration
- [ ] Try native WebSocket with cached token
- [ ] Verify if 403 still occurs

### Option C: Optimize ChromeDP
- [ ] Switch to chromedp/headless-shell
- [ ] Implement Chrome instance pooling
- [ ] Optimize memory usage
- [ ] Document performance characteristics

## Recommendation

**Go with Option C**: Optimize ChromeDP

**Reasoning**:
1. ChromeDP is proven to work 100%
2. Token caching won't help if WebSocket still returns 403
3. The 403 is likely permanent without a real browser context
4. Optimization can make ChromeDP acceptable for production
5. ~100MB RAM for a bridge service is reasonable

**Implementation**:
```go
// Deployment: use the chromedp/headless-shell Docker image, e.g. in the Dockerfile:
//   FROM chromedp/headless-shell:latest

// Optimize Chrome flags (fragment: belongs in the allocator setup)
opts := append(chromedp.DefaultExecAllocatorOptions[:],
	chromedp.Flag("disable-gpu", true),
	chromedp.Flag("disable-dev-shm-usage", true),
	chromedp.Flag("single-process", true), // Reduce memory
	chromedp.Flag("no-zygote", true),      // Reduce memory
)

// Keep instance alive
func (b *Bkosmi) KeepAlive() {
	// Don't close Chrome between messages
	// Only restart if crashed
}
```

## Conclusion

The 403 Forbidden error is likely a security measure that cannot be easily bypassed without a real browser context. The most pragmatic solution is to **optimize and embrace the ChromeDP approach** rather than trying to reverse engineer the security mechanism.

**Status**: ChromeDP remains the recommended implementation ✅