IRC-kosmi-relay/LESSONS_LEARNED.md
2025-10-31 16:17:04 -04:00
# Lessons Learned: WebSocket Interception in Headless Chrome
## The Problem
When implementing the Kosmi bridge, we initially tried several approaches:
1. **Native Go WebSocket Client**: Failed with 403 Forbidden due to missing session cookies
2. **HTTP POST with Polling**: Worked for queries but not ideal for real-time subscriptions
3. **ChromeDP with Post-Load Injection**: Connected but didn't capture messages
## The Solution
The key insight came from examining the working Chrome extension's `inject.js` file. The solution required two critical components:
### 1. Hook the Raw WebSocket Constructor
Instead of trying to hook into Apollo Client or other high-level abstractions, we needed to hook the **raw `window.WebSocket` constructor**:
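```javascript
const OriginalWebSocket = window.WebSocket;
window.__KOSMI_MESSAGE_QUEUE__ = window.__KOSMI_MESSAGE_QUEUE__ || [];

// Record one incoming frame; payloads that are not JSON are stored unparsed.
function captureMessage(event, source) {
  let data = event.data;
  try { data = JSON.parse(event.data); } catch (e) { /* keep raw payload */ }
  window.__KOSMI_MESSAGE_QUEUE__.push({ timestamp: Date.now(), data: data, source: source });
}

window.WebSocket = function(url, protocols) {
  const socket = new OriginalWebSocket(url, protocols);
  const target = String(url); // url may be a URL object rather than a string

  if (target.includes('engine.kosmi.io') || target.includes('gql-ws')) {
    // Wrap addEventListener for 'message' events
    const originalAddEventListener = socket.addEventListener.bind(socket);
    socket.addEventListener = function(type, listener, options) {
      if (type === 'message' && typeof listener === 'function') {
        const wrappedListener = function(event) {
          captureMessage(event, 'addEventListener');
          return listener.call(this, event);
        };
        return originalAddEventListener(type, wrappedListener, options);
      }
      return originalAddEventListener(type, listener, options);
    };

    // Also wrap the onmessage property. The wrapper is forwarded to the native
    // accessor on the prototype; otherwise the handler would never fire.
    const nativeOnMessage = Object.getOwnPropertyDescriptor(OriginalWebSocket.prototype, 'onmessage');
    let realOnMessage = null;
    Object.defineProperty(socket, 'onmessage', {
      get: function() { return realOnMessage; },
      set: function(handler) {
        realOnMessage = handler;
        nativeOnMessage.set.call(socket, function(event) {
          captureMessage(event, 'onmessage');
          if (typeof handler === 'function') { handler.call(socket, event); }
        });
      },
      configurable: true
    });
  }
  return socket;
};

// Keep instanceof checks and the readyState constants working for page code
window.WebSocket.prototype = OriginalWebSocket.prototype;
['CONNECTING', 'OPEN', 'CLOSING', 'CLOSED'].forEach(function(name) {
  window.WebSocket[name] = OriginalWebSocket[name];
});
```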
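The hook only appends to `window.__KOSMI_MESSAGE_QUEUE__`, so something has to read and empty it. The sketch below shows one way that could look, assuming the Go side periodically evaluates a small helper in the page and receives a JSON string back. The helper name `drainKosmiQueue` and the `win` parameter are hypothetical and are not taken from the bridge's code. In the browser you would pass `window`.

```javascript
// Hypothetical helper: remove and return everything captured so far.
function drainKosmiQueue(win) {
  const queue = win.__KOSMI_MESSAGE_QUEUE__ || [];
  const batch = queue.splice(0, queue.length); // empties the queue in place
  return JSON.stringify(batch);
}

// Demo with a stand-in object so the sketch runs outside a browser.
const demoWin = {
  __KOSMI_MESSAGE_QUEUE__: [
    { timestamp: 1, data: { type: 'next' }, source: 'onmessage' },
  ],
};
const firstBatch = drainKosmiQueue(demoWin);  // JSON array with one entry
const secondBatch = drainKosmiQueue(demoWin); // "[]", nothing new since the last drain
console.log(firstBatch, secondBatch);
```

Draining with `splice` rather than reassigning the array keeps the same array object in place, so the hook's `push` calls continue to land in the queue that is being read.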
### 2. Inject Before Page Load
The most critical lesson: **The WebSocket hook MUST be injected before any page JavaScript executes.**
#### ❌ Wrong Approach (Post-Load Injection)
```go
// This doesn't work - WebSocket is already created!
chromedp.Run(ctx,
	chromedp.Navigate(roomURL),
	chromedp.WaitReady("body"),
	chromedp.Evaluate(hookScript, nil), // Too late!
)
```
**Why it fails**: By the time the page loads and we inject the script, Kosmi has already created its WebSocket connection. Our hook never gets a chance to intercept it.
#### ✅ Correct Approach (Pre-Load Injection)
```go
// Inject BEFORE navigation using Page.addScriptToEvaluateOnNewDocument.
// "page" here is the github.com/chromedp/cdproto/page package.
err := chromedp.Run(ctx, chromedp.ActionFunc(func(ctx context.Context) error {
	_, err := page.AddScriptToEvaluateOnNewDocument(hookScript).Do(ctx)
	return err
}))
if err != nil {
	return err
}

// Now navigate - the hook is already active!
err = chromedp.Run(ctx,
	chromedp.Navigate(roomURL),
	chromedp.WaitReady("body"),
)
```
**Why it works**: `Page.addScriptToEvaluateOnNewDocument` is a Chrome DevTools Protocol method that ensures the script runs **before any page scripts**. When Kosmi's JavaScript creates the WebSocket, our hook is already in place to intercept it.
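The timing difference can be reproduced without a browser. The following is a minimal sketch, not the bridge's hook: `FakeWebSocket`, `installHook`, `__SEEN_SOCKETS__`, and the `example.invalid` URL are all hypothetical stand-ins, and the "page script" is just a line that constructs a socket.

```javascript
// Stand-in for the browser's WebSocket so the sketch runs without a browser.
class FakeWebSocket {
  constructor(url) { this.url = url; }
}

// Simplified hook: records the URL of every socket created while it is active.
function installHook(win) {
  const Original = win.WebSocket;
  win.__SEEN_SOCKETS__ = [];
  win.WebSocket = function (url, protocols) {
    win.__SEEN_SOCKETS__.push(String(url));
    return new Original(url, protocols);
  };
  win.WebSocket.prototype = Original.prototype;
}

// Post-load injection: the page opens its socket first, then the hook arrives.
const lateWin = { WebSocket: FakeWebSocket };
new lateWin.WebSocket('wss://example.invalid/gql-ws'); // "page script" runs first
installHook(lateWin);

// Pre-load injection: the hook is installed before the page script runs.
const earlyWin = { WebSocket: FakeWebSocket };
installHook(earlyWin);
new earlyWin.WebSocket('wss://example.invalid/gql-ws'); // "page script" runs second

console.log(lateWin.__SEEN_SOCKETS__.length);  // 0: the socket was missed
console.log(earlyWin.__SEEN_SOCKETS__.length); // 1: the socket was intercepted
```

Replacing the constructor affects only sockets created afterwards. An already-open socket keeps its original listeners, which is why the post-load variant connects but captures nothing.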
## Implementation in chromedp_client.go
The final implementation:
```go
func (c *ChromeDPClient) injectWebSocketHookBeforeLoad() error {
	script := c.getWebSocketHookScript()
	return chromedp.Run(c.ctx, chromedp.ActionFunc(func(ctx context.Context) error {
		// Use Page.addScriptToEvaluateOnNewDocument to inject before page load
		_, err := page.AddScriptToEvaluateOnNewDocument(script).Do(ctx)
		return err
	}))
}

func (c *ChromeDPClient) Connect() error {
	// ... context setup ...

	// Inject hook BEFORE navigation
	if err := c.injectWebSocketHookBeforeLoad(); err != nil {
		return fmt.Errorf("failed to inject WebSocket hook: %w", err)
	}

	// Now navigate with hook already active
	if err := chromedp.Run(ctx,
		chromedp.Navigate(c.roomURL),
		chromedp.WaitReady("body"),
	); err != nil {
		return fmt.Errorf("failed to navigate to room: %w", err)
	}

	// ... rest of connection logic ...
}
```
## Verification
To verify the hook is working correctly, check for these log messages:
```
INFO Injecting WebSocket interceptor (runs before page load)...
INFO Navigating to Kosmi room: https://app.kosmi.io/room/@hyperspaceout
INFO Page loaded, checking if hook is active...
INFO ✓ WebSocket hook confirmed installed
INFO Status: WebSocket connection intercepted
```
If you see "No WebSocket connection detected yet", the hook was likely injected too late.
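A status check evaluated in the page can help tell "hook missing" apart from "hook installed but no messages yet". The sketch below is an assumption about how such a check might look, not the check the bridge performs. `hookStatus` and its field names are hypothetical, and the `[native code]` test is a heuristic: a built-in constructor stringifies with that marker, while a replacement written in JavaScript does not.

```javascript
// Hypothetical helper: summarise whether the hook looks active on `win`.
function hookStatus(win) {
  const queue = win.__KOSMI_MESSAGE_QUEUE__;
  const source = Function.prototype.toString.call(win.WebSocket);
  return {
    queueReady: Array.isArray(queue),
    constructorReplaced: !source.includes('[native code]'),
    capturedMessages: Array.isArray(queue) ? queue.length : 0,
  };
}

// Demo with stand-in objects; in the browser you would pass `window`.
const unhookedWin = { WebSocket: Map }; // any built-in constructor stringifies as native code
const hookedWin = {
  WebSocket: function (url, protocols) {},
  __KOSMI_MESSAGE_QUEUE__: [{ timestamp: 0, data: {}, source: 'onmessage' }],
};
console.log(hookStatus(unhookedWin)); // queueReady false, constructorReplaced false, 0 messages
console.log(hookStatus(hookedWin));   // queueReady true, constructorReplaced true, 1 message
```

If `constructorReplaced` is true but `capturedMessages` stays at zero, the hook ran but the page either has not opened a matching socket yet or opened it before the hook was installed.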
## Key Takeaways
1. **Timing is Everything**: WebSocket interception must happen before the WebSocket is created
2. **Use the Right CDP Method**: `Page.addScriptToEvaluateOnNewDocument` is specifically designed for this use case
3. **Hook at the Lowest Level**: Hook `window.WebSocket` constructor, not higher-level abstractions
4. **Wrap Both Event Mechanisms**: Intercept both `addEventListener` and `onmessage` property
5. **Test with Real Messages**: The connection might succeed but messages won't appear if the hook isn't working
## References
- Chrome DevTools Protocol: https://chromedevtools.github.io/devtools-protocol/
- `Page.addScriptToEvaluateOnNewDocument`: https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-addScriptToEvaluateOnNewDocument
- chromedp documentation: https://pkg.go.dev/github.com/chromedp/chromedp
- Original Chrome extension: `.examples/chrome-extension/inject.js`
## Applying This Lesson to Other Projects
This pattern applies to any scenario where you need to intercept browser APIs in headless automation:
1. Identify the API you need to intercept (WebSocket, fetch, XMLHttpRequest, etc.)
2. Write a hook that wraps the constructor or method
3. Inject using `Page.addScriptToEvaluateOnNewDocument` **before navigation**
4. Verify the hook is active before the page creates the objects you want to intercept
This approach is more reliable than browser extensions for server-side automation because:
- ✅ No browser extension installation required
- ✅ Works in headless mode
- ✅ Full control over the browser context
- ✅ Can run on servers without a display
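
As an illustration of the four steps above applied to a different API, here is a minimal sketch that wraps `fetch` instead of `WebSocket`. The names `installFetchHook` and `__FETCH_LOG__` and the `example.invalid` URL are hypothetical. The function takes a `win` parameter so it can run against a stub here; when injected with `Page.addScriptToEvaluateOnNewDocument` it would be called with `window`.

```javascript
// Wrap `fetch` on a global-like object and record each requested URL.
function installFetchHook(win) {
  const originalFetch = win.fetch;
  win.__FETCH_LOG__ = win.__FETCH_LOG__ || [];
  win.fetch = function (input, init) {
    // `input` may be a string, a URL object, or a Request-like object with a `url` field.
    const url = (input && typeof input === 'object' && 'url' in input)
      ? String(input.url)
      : String(input);
    win.__FETCH_LOG__.push({ timestamp: Date.now(), url: url });
    return originalFetch.call(win, input, init);
  };
}

// Demo with a stub so the sketch runs outside a browser.
const fakeWin = {
  fetch: function (input) { return Promise.resolve('stub response for ' + input); },
};
installFetchHook(fakeWin);
const pending = fakeWin.fetch('https://example.invalid/api');
console.log(fakeWin.__FETCH_LOG__.length); // 1
```

The wrapper logs the request and then delegates to the original function unchanged, so page code sees the same return value it would have seen without the hook.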