Lessons Learned: WebSocket Interception in Headless Chrome
The Problem
When implementing the Kosmi bridge, we initially tried several approaches:
- Native Go WebSocket Client: Failed with 403 Forbidden due to missing session cookies
- HTTP POST with Polling: Worked for queries but not ideal for real-time subscriptions
- ChromeDP with Post-Load Injection: Connected but didn't capture messages
The Solution
The key insight came from examining the working Chrome extension's inject.js file. The solution required two critical components:
1. Hook the Raw WebSocket Constructor
Instead of trying to hook into Apollo Client or other high-level abstractions, we needed to hook the raw window.WebSocket constructor:
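const OriginalWebSocket = window.WebSocket;
// Create the queue up front so the first push cannot fail
window.__KOSMI_MESSAGE_QUEUE__ = window.__KOSMI_MESSAGE_QUEUE__ || [];

// Record one message; non-JSON payloads are stored raw instead of throwing
function captureMessage(event, source) {
  let data = event.data;
  try { data = JSON.parse(event.data); } catch (e) { /* keep raw payload */ }
  window.__KOSMI_MESSAGE_QUEUE__.push({
    timestamp: Date.now(),
    data: data,
    source: source
  });
}

window.WebSocket = function(url, protocols) {
  const socket = new OriginalWebSocket(url, protocols);
  const target = String(url); // url may be a URL object
  if (target.includes('engine.kosmi.io') || target.includes('gql-ws')) {
    // Wrap addEventListener for 'message' events
    const originalAddEventListener = socket.addEventListener.bind(socket);
    socket.addEventListener = function(type, listener, options) {
      if (type === 'message' && typeof listener === 'function') {
        const wrappedListener = function(event) {
          captureMessage(event, 'addEventListener');
          return listener.call(this, event);
        };
        return originalAddEventListener(type, wrappedListener, options);
      }
      return originalAddEventListener(type, listener, options);
    };
    // Also wrap the onmessage property. The own property shadows the native
    // accessor, so one real listener is registered to forward to the handler.
    let realOnMessage = null;
    originalAddEventListener('message', function(event) {
      if (typeof realOnMessage === 'function') {
        captureMessage(event, 'onmessage');
        realOnMessage.call(socket, event);
      }
    });
    Object.defineProperty(socket, 'onmessage', {
      get: function() { return realOnMessage; },
      set: function(handler) { realOnMessage = handler; },
      configurable: true
    });
  }
  return socket;
};
// Keep instanceof checks and constants such as WebSocket.OPEN working
window.WebSocket.prototype = OriginalWebSocket.prototype;
['CONNECTING', 'OPEN', 'CLOSING', 'CLOSED'].forEach(function(name) {
  window.WebSocket[name] = OriginalWebSocket[name];
});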
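The hook only fills window.__KOSMI_MESSAGE_QUEUE__; something still has to read it. A minimal sketch of a drain helper that the Go side could evaluate periodically (for example through chromedp.Evaluate), assuming the queue is a plain array. The name drainMessageQueue and the `root` parameter are hypothetical and not taken from the original code.

```javascript
// Hypothetical helper: returns all captured messages and empties the queue.
// `root` stands in for `window` so the function can also run outside a browser.
function drainMessageQueue(root) {
  const queue = root.__KOSMI_MESSAGE_QUEUE__;
  if (!Array.isArray(queue) || queue.length === 0) {
    return [];
  }
  // splice(0) removes every element in place and returns them,
  // so the hook keeps pushing into the same array object.
  return queue.splice(0);
}

// Example with a stand-in for `window`:
const fakeWindow = { __KOSMI_MESSAGE_QUEUE__: [] };
fakeWindow.__KOSMI_MESSAGE_QUEUE__.push({
  timestamp: 1,
  data: { type: 'next' },
  source: 'onmessage'
});
const drained = drainMessageQueue(fakeWindow);
```

Emptying the array in place, rather than assigning a new one, avoids losing messages that the hook pushes between two polls.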
2. Inject Before Page Load
The most critical lesson: The WebSocket hook MUST be injected before any page JavaScript executes.
❌ Wrong Approach (Post-Load Injection)
// This doesn't work - WebSocket is already created!
chromedp.Run(ctx,
chromedp.Navigate(roomURL),
chromedp.WaitReady("body"),
chromedp.Evaluate(hookScript, nil), // Too late!
)
Why it fails: By the time the page loads and we inject the script, Kosmi has already created its WebSocket connection. Our hook never gets a chance to intercept it.
✅ Correct Approach (Pre-Load Injection)
// Inject BEFORE navigation using Page.addScriptToEvaluateOnNewDocument
chromedp.Run(ctx, chromedp.ActionFunc(func(ctx context.Context) error {
_, err := page.AddScriptToEvaluateOnNewDocument(hookScript).Do(ctx)
return err
}))
// Now navigate - the hook is already active!
chromedp.Run(ctx,
chromedp.Navigate(roomURL),
chromedp.WaitReady("body"),
)
Why it works: Page.addScriptToEvaluateOnNewDocument is a Chrome DevTools Protocol method that evaluates the script in every frame as it is created, before any of that frame's own scripts run. When Kosmi's JavaScript creates the WebSocket, our hook is already in place to intercept it.
Implementation in chromedp_client.go
The final implementation:
func (c *ChromeDPClient) injectWebSocketHookBeforeLoad() error {
script := c.getWebSocketHookScript()
return chromedp.Run(c.ctx, chromedp.ActionFunc(func(ctx context.Context) error {
// Use Page.addScriptToEvaluateOnNewDocument to inject before page load
_, err := page.AddScriptToEvaluateOnNewDocument(script).Do(ctx)
return err
}))
}
func (c *ChromeDPClient) Connect() error {
// ... context setup ...
// Inject hook BEFORE navigation
if err := c.injectWebSocketHookBeforeLoad(); err != nil {
return fmt.Errorf("failed to inject WebSocket hook: %w", err)
}
// Now navigate with hook already active (ctx must drive the same tab as c.ctx, or the hook will not apply)
if err := chromedp.Run(ctx,
chromedp.Navigate(c.roomURL),
chromedp.WaitReady("body"),
); err != nil {
return fmt.Errorf("failed to navigate to room: %w", err)
}
// ... rest of connection logic ...
}
Verification
To verify the hook is working correctly, check for these log messages:
INFO Injecting WebSocket interceptor (runs before page load)...
INFO Navigating to Kosmi room: https://app.kosmi.io/room/@hyperspaceout
INFO Page loaded, checking if hook is active...
INFO ✓ WebSocket hook confirmed installed
INFO Status: WebSocket connection intercepted
If you see "No WebSocket connection detected yet", the hook was likely injected too late.
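One way to perform the "checking if hook is active" step is to ask the page whether window.WebSocket is still the built-in constructor. The sketch below is an assumption about how such a probe could look, not the check used in chromedp_client.go; hookStatus and its return fields are hypothetical names.

```javascript
// Hypothetical status probe; `root` stands in for `window`.
function hookStatus(root) {
  const ctor = root.WebSocket;
  // Built-in functions stringify with "[native code]"; a replacement
  // constructor written in page JavaScript does not.
  const isNative = typeof ctor === 'function' &&
    Function.prototype.toString.call(ctor).includes('[native code]');
  const queue = root.__KOSMI_MESSAGE_QUEUE__;
  return {
    hooked: typeof ctor === 'function' && !isNative,
    queueReady: Array.isArray(queue),
    captured: Array.isArray(queue) ? queue.length : 0
  };
}

// Math.max serves as a stand-in for a native constructor here.
const unhooked = hookStatus({ WebSocket: Math.max });
const hooked = hookStatus({
  WebSocket: function (url, protocols) {},
  __KOSMI_MESSAGE_QUEUE__: [{}, {}]
});
```

A result with hooked true but captured staying at 0 suggests the hook was installed after the page had already opened its connection.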
Key Takeaways
- Timing is Everything: WebSocket interception must happen before the WebSocket is created
- Use the Right CDP Method: Page.addScriptToEvaluateOnNewDocument is specifically designed for this use case
- Hook at the Lowest Level: Hook the window.WebSocket constructor, not higher-level abstractions
- Wrap Both Event Mechanisms: Intercept both addEventListener and the onmessage property
- Test with Real Messages: The connection might succeed but messages won't appear if the hook isn't working
References
- Chrome DevTools Protocol: https://chromedevtools.github.io/devtools-protocol/
- Page.addScriptToEvaluateOnNewDocument: https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-addScriptToEvaluateOnNewDocument
- chromedp documentation: https://pkg.go.dev/github.com/chromedp/chromedp
- Original Chrome extension: .examples/chrome-extension/inject.js
Applying This Lesson to Other Projects
This pattern applies to any scenario where you need to intercept browser APIs in headless automation:
- Identify the API you need to intercept (WebSocket, fetch, XMLHttpRequest, etc.)
- Write a hook that wraps the constructor or method
- Inject using Page.addScriptToEvaluateOnNewDocument before navigation
- Verify the hook is active before the page creates the objects you want to intercept
This approach is more reliable than browser extensions for server-side automation because:
- ✅ No browser extension installation required
- ✅ Works in headless mode
- ✅ Full control over the browser context
- ✅ Can run on servers without a display
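As an illustration of the same pattern applied to another API, here is a minimal sketch of a fetch hook. The names installFetchHook and __FETCH_LOG__ are hypothetical; in a real page the script would end with installFetchHook(window) and be passed to Page.addScriptToEvaluateOnNewDocument before navigation.

```javascript
// Hypothetical installer: wraps root.fetch and records each request.
// `root` stands in for `window`; the log array name is an assumption.
function installFetchHook(root) {
  const originalFetch = root.fetch;
  root.__FETCH_LOG__ = root.__FETCH_LOG__ || [];
  root.fetch = function (input, init) {
    // input may be a string, a URL object, or a Request object
    const url = typeof input === 'string'
      ? input
      : (input && input.url) || String(input);
    const method = (init && init.method) || (input && input.method) || 'GET';
    root.__FETCH_LOG__.push({ timestamp: Date.now(), url: url, method: method });
    // Delegate to the original so page behavior is unchanged
    return originalFetch.call(root, input, init);
  };
}

// Example with a stubbed fetch in place of a real browser:
const fakeRoot = {
  fetch: function (input, init) { return Promise.resolve('stub response'); }
};
installFetchHook(fakeRoot);
const pending = fakeRoot.fetch('https://example.com/api', { method: 'POST' });
```

The wrapper records the call and then hands it to the original function untouched, which mirrors how the WebSocket hook captures messages without altering what the page receives.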