OpenAI Computer Use
The OpenAI Computer Use example show how to use OpenAI’s computer-use model with GenSX to control a web browser with natural language.
Workflow
The OpenAI Computer Use workflow consists of the following steps:
- Launch a browser session with Playwright (
<BrowserProvider>
) - Send an initial user prompt to the OpenAI computer-use model (
<OpenAIResponses>
) - Process any computer actions requested by the model
- Execute browser actions like clicking, scrolling, typing, etc. (
<UseBrowser>
) - Take a screenshot after each action and send it back to the model
- Execute browser actions like clicking, scrolling, typing, etc. (
- Optionally collect human feedback and continue the conversation (
<HumanFeedback>
) - Process subsequent model responses and browser actions until completion
Here’s an example trace of the workflow showing the actions taken at each step:
Running the example
From the root of the GensX Github Repo , run the following commands:
# Install dependencies
pnpm install
# Install playwright
npx playwright install
# Run the example
OPENAI_API_KEY=<your_api_key> pnpm start:example openai-computer-use
The default prompt is how long does it take to drive from seattle to portland? use google maps
but you can change this by editing the index.tsx
file. You can also control whether or not the example is multi-turn by toggling the allowHumanFeedback
prop. This is set to false
by default but you might what to change this to true
so you can continue the conversation with the model in the terminal.
When you run the example, you’ll see an output like the following:
🚀 Starting the computer use example
🎯 PROMPT: how long does it take to drive from seattle to portland? use google maps
💻 Action: screenshot
💻 Action: click at (188, 180) with button 'left'
💻 Action: type text 'Google Maps'
💻 Action: keypress 'ENTER'
💻 Action: wait
💻 Action: click at (233, 230) with button 'left'
💻 New tab opened
💻 Action: wait
💻 Action: click at (389, 38) with button 'left'
💻 Action: type text 'Seattle to Portland'
💻 Action: keypress 'ENTER'
💻 Action: wait
✅ Computer use complete
✨ Final response: The estimated driving time from Seattle to Portland on Google Maps is approximately 2 hours and 58 minutes via I-5 S, covering a distance of 174 miles. Would you like any more assistance with your route?
Key patterns
Browser automation
The example uses Playwright to control a web browser, creating a context that’s shared throughout the workflow. The BrowserProvider
component initializes a browser session and makes it available to child components:
const BrowserProvider = gensx.Component<BrowserProviderProps, never>(
"BrowserProvider",
async ({ initialUrl }) => {
const browser = await chromium.launch({
headless: false,
chromiumSandbox: true,
env: {},
args: ["--disable-extensions", "--disable-file-system"],
});
const page = await browser.newPage();
await page.setViewportSize({ width: 1024, height: 768 });
await page.goto(initialUrl);
return <BrowserContext.Provider value={{ page }} />;
},
);
Processing model actions
The ProcessComputerCalls
component handles the computer actions returned by the model. For each action, it:
- Extracts the action from the model response
- Executes the action on the browser using the
UseBrowser
component - Takes a screenshot of the result
- Sends the screenshot back to the model
- Processes the next model response
const ProcessComputerCalls = gensx.Component<
ProcessComputerCallsProps,
ProcessComputerCallsResult
>("ProcessComputerCalls", async ({ response }) => {
let currentResponse = response;
let computerCalls = currentResponse.output.filter(
(item) => item.type === "computer_call",
);
while (computerCalls.length > 0) {
// Execute browser action and take screenshot
// ...
// Send screenshot back to model
// ...
// Get updated response
// ...
}
return { updatedResponse: currentResponse };
});
Interactive feedback loop
The example supports an interactive conversation with the model, allowing you to provide feedback or additional instructions once the model finishes an initial turn:
// Start conversation loop with human feedback
let currentResponse = updatedResponse;
let continueConversation = true;
while (continueConversation) {
// Get human feedback
const { userMessage, shouldExit } = await HumanFeedback.run({
assistantMessage: currentResponse.output_text,
});
// Exit if requested
if (shouldExit) {
continueConversation = false;
continue;
}
// Send user message to model
// ...
// Process any computer calls in the response
// ...
}
Additional resources
Check out the other examples in the GenSX Github Repo .