title: "Unlock the power of Generative UI" description: "" added: "Apr 16 2024" tags: [AI, react]

updatedDate: "Oct 30 2024"

The Vercel AI SDK is an open-source library designed to help developers build conversational streaming user interfaces. With the release of the AI SDK 3.0, developers can move beyond plaintext and markdown chatbots to give LLMs rich, component-based interfaces.

  1. With the introduction of Function Calling, you can build applications that fetch real-time data.
  2. By using React Server Components, you can now stream UI components directly from LLMs without the need for heavy client-side JavaScript.

The new APIs in AI SDK 3.0 rely on React Server Components and Server Actions, which are currently implemented in Next.js. The AI SDK integrates interface rendering through the ai/rsc package.

Server actions

import { createAI, getMutableAIState, render } from "ai/rsc";
import OpenAI from "openai";
import { z } from "zod";

const openai = new OpenAI();

function Spinner() {
  return <div>Loading...</div>;
}

// An example of a flight card component.
function FlightCard({ flightInfo }) {
  return (
    <div>
      <h2>Flight Information</h2>
      <p>Flight Number: {flightInfo.flightNumber}</p>
      <p>Departure: {flightInfo.departure}</p>
      <p>Arrival: {flightInfo.arrival}</p>
    </div>
  );
}

// An example of a function that fetches flight information from an external API.
async function getFlightInfo(flightNumber: string) {
  return {
    flightNumber,
    departure: 'New York',
    arrival: 'San Francisco',
  };
}

async function submitUserMessage(userInput: string) {
  'use server';

  const aiState = getMutableAIState<typeof AI>();

  // Update the AI state with the new user message.
  aiState.update([
    ...aiState.get(),
    {
      role: 'user',
      content: userInput,
    },
  ]);

  // `render()` returns a streamable UI generated by the LLM.
  const ui = render({
    model: 'gpt-4-0125-preview',
    provider: openai,
    messages: [
      { role: 'system', content: 'You are a flight assistant' },
      ...aiState.get()
    ],
    // `text` is called when an AI returns a text response (as opposed to a tool call).
    // Its content is streamed from the LLM, so this function will be called
    // multiple times with `content` being incremental.
    text: ({ content, done }) => {
      // When it's the final content, mark the state as done and ready for the client to access.
      if (done) {
        aiState.done([
          ...aiState.get(),
          {
            role: "assistant",
            content
          }
        ]);
      }

      return <p>{content}</p>
    },
    tools: {
      get_flight_info: {
        description: 'Get the information for a flight',
        parameters: z.object({
          flightNumber: z.string().describe('the number of the flight')
        }).required(),
        render: async function* ({ flightNumber }) {
          // Show a spinner on the client while we wait for the response.
          yield <Spinner/>

          const flightInfo = await getFlightInfo(flightNumber)

          // Update the final AI state.
          aiState.done([
            ...aiState.get(),
            {
              role: "function",
              name: "get_flight_info",
              // Content can be any string to provide context to the LLM in the rest of the conversation.
              content: JSON.stringify(flightInfo),
            }
          ]);

          // Return the flight card to the client.
          return <FlightCard flightInfo={flightInfo} />
        }
      }
    }
  })

  return {
    id: Date.now(),
    role: "assistant",
    display: ui
  };
}

// Define the initial state of the AI. It can be any JSON object.
const initialAIState: {
  role: 'user' | 'assistant' | 'system' | 'function';
  content: string;
  id?: string;
  name?: string;
}[] = [];

// The initial UI state that the client will keep track of, which contains the message IDs and their UI nodes.
const initialUIState: {
  id: string | number;
  role?: string;
  display: React.ReactNode;
}[] = [];

// AI is a provider you wrap your application with so you can access AI and UI state in your components.
export const AI = createAI({
  actions: {
    submitUserMessage
  },
  initialUIState,
  initialAIState
});
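
The AI provider returned by createAI is meant to wrap your application, typically in the Next.js root layout, so that client components can read the UI state and call the actions. A minimal sketch, assuming the server action file above lives at app/action.tsx (the path is an assumption):

// app/layout.tsx
import { AI } from "./action"; // hypothetical path to the file above

export default function RootLayout({
  children,
}: {
  children: React.ReactNode;
}) {
  return (
    <html lang="en">
      <body>
        {/* Wrapping the tree in <AI> exposes the AI/UI state and actions to the client hooks. */}
        <AI>{children}</AI>
      </body>
    </html>
  );
}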

Update: render has been deprecated in favor of streamUI. Similar to AI SDK Core APIs (like streamText and streamObject), streamUI provides a single function to call a model and allow it to respond with React Server Components.
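
A rough equivalent of the flight tool above, written with streamUI, might look like the following. This is a sketch against the ai/rsc streamUI API: the openai model helper comes from the @ai-sdk/openai package, the user message is made up, and Spinner, getFlightInfo and FlightCard are the components defined earlier; tools use generate instead of render.

import { streamUI } from "ai/rsc";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const result = await streamUI({
  model: openai("gpt-4o"),
  messages: [{ role: "user", content: "What is the status of flight BA142?" }],
  // Plain text responses are still rendered as React nodes.
  text: ({ content }) => <p>{content}</p>,
  tools: {
    get_flight_info: {
      description: "Get the information for a flight",
      parameters: z.object({
        flightNumber: z.string().describe("the number of the flight"),
      }),
      // `generate` replaces `render`: yield intermediate UI, return the final component.
      generate: async function* ({ flightNumber }) {
        yield <Spinner />;
        const flightInfo = await getFlightInfo(flightNumber);
        return <FlightCard flightInfo={flightInfo} />;
      },
    },
  },
});

// result.value is the React node that can be returned to the client.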

Let's explain the server action code above in more detail.

The render function is a helper for creating streamable UIs from an LLM response.

The AI SDK introduces two new concepts: AIState and UIState. AIState is a serializable (JSON) representation of the conversation that is shared between the server and the client and passed to the model as context; UIState is client-side state that holds the rendered UI nodes and is never sent to the model.

Client components

"use client";

import { useState } from "react";
import { useActions, useUIState } from "ai/rsc";
import { nanoid } from "nanoid";

export default function Home() {
  const [input, setInput] = useState("");
  const [conversation, setConversation] = useUIState();
  const { submitUserMessage } = useActions();

  return (
    <div>
      <div>
        {conversation.map((message) => (
          <div key={message.id}>
            {message.role}: {message.display}
          </div>
        ))}
      </div>

      <form
        onSubmit={async (e) => {
          e.preventDefault();
          setInput("");
          setConversation((currentConversation) => [
            ...currentConversation,
            { id: nanoid(), role: "user", display: input },
          ]);

          const message = await submitUserMessage(input);

          setConversation((currentConversation) => [
            ...currentConversation,
            message,
          ]);
        }}
      >
        <input
          type="text"
          value={input}
          onChange={(event) => {
            setInput(event.target.value);
          }}
        />
        <button>Send Message</button>
      </form>
    </div>
  );
}
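
Besides useUIState and useActions, the ai/rsc package also exposes a useAIState hook for reading (and updating) the shared AI state from client components. A minimal sketch, assuming the AI state shape defined above (an array of messages):

"use client";

import { useAIState } from "ai/rsc";

// Displays how many messages are currently in the shared AI state.
export function MessageCount() {
  const [aiState] = useAIState();
  return <div>{aiState.length} messages</div>;
}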

There is a 13-minute video on how to build LLM applications with the new Vercel AI SDK.

More tool examples

// https://github.com/browserbase/BrowseGPT
// Requires roughly: import { tool, generateText } from 'ai';
// import { anthropic } from '@ai-sdk/anthropic'; import { z } from 'zod';
// import { chromium } from 'playwright'; import { JSDOM } from 'jsdom';
// import { Readability } from '@mozilla/readability';
tools: {
  googleSearch: tool({
    description: 'Search Google for a query',
    parameters: z.object({
      toolName: z.string().describe('What the tool is doing'),
      query: z.string().describe('The exact and complete search query as provided by the user. Do not modify this in any way.'),
      sessionId: z.string().describe('The session ID to use for the search. If there is no session ID, create a new session with createSession Tool.'),
      debuggerFullscreenUrl: z.string().describe('The fullscreen debug URL to use for the search. If there is no debug URL, create a new session with createSession Tool.')
    }),
    execute: async ({ query, sessionId }) => {
      // import { chromium } from 'playwright'
      const browser = await chromium.connectOverCDP(
        `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}&sessionId=${sessionId}`
      );
      const defaultContext = browser.contexts()[0];
      const page = defaultContext.pages()[0];

      await page.goto(`https://www.google.com/search?q=${encodeURIComponent(query)}`);
      await page.waitForTimeout(500);
      await page.keyboard.press('Enter');
      await page.waitForLoadState('load', { timeout: 10000 });

      await page.waitForSelector('.g');

      const results = await page.evaluate(() => {
        const items = document.querySelectorAll('.g');
        return Array.from(items).map(item => {
          const title = item.querySelector('h3')?.textContent || '';
          const description = item.querySelector('.VwiC3b')?.textContent || '';
          return { title, description };
        });
      });

      const text = results.map(item => `${item.title}\n${item.description}`).join('\n\n');

      const response = await generateText({
        model: anthropic('claude-3-5-sonnet-20240620'),
        prompt: `Evaluate the following web page content: ${text}`,
      });

      return {
        toolName: 'Searching Google',
        content: response.text,
        dataCollected: true,
      };
    },
  }),
  getPageContent: tool({
    description: 'Get the content of a page using Playwright',
    parameters: z.object({
      toolName: z.string().describe('What the tool is doing'),
      url: z.string().describe('The url to get the content of'),
      sessionId: z.string().describe('The session ID to use for the search. If there is no session ID, create a new session with createSession Tool.'),
      debuggerFullscreenUrl: z.string().describe('The fullscreen debug URL to use for the search. If there is no debug URL, create a new session with createSession Tool.')
    }),
    execute: async ({ url, sessionId }) => {
      const browser = await chromium.connectOverCDP(
        `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}&sessionId=${sessionId}`
      );
      const defaultContext = browser.contexts()[0];
      const page = defaultContext.pages()[0];

      await page.goto(url);

      const content = await page.content();
      const dom = new JSDOM(content);
      const reader = new Readability(dom.window.document);
      const article = reader.parse();

      const text = `${article?.title || ''}\n${article?.textContent || ''}`;

      const response = await generateText({
        model: anthropic('claude-3-5-sonnet-20240620'),
        prompt: `Evaluate the following web page content: ${text}`,
      });

      return {
        toolName: 'Getting page content',
        content: response.text,
      };
    },
  }),
},
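
These tool definitions are passed to an AI SDK Core call such as generateText or streamText. A minimal sketch of a route handler wiring them up, assuming the two tool() definitions above are extracted into googleSearch and getPageContent constants; the system prompt and maxSteps value are assumptions, and the exact response helper depends on the SDK version:

import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: anthropic("claude-3-5-sonnet-20240620"),
    system: "You are a web browsing assistant.",
    messages,
    maxSteps: 10, // let the model chain several tool calls (search, then read pages)
    tools: { googleSearch, getPageContent },
  });

  // Stream the result back to the client.
  return result.toDataStreamResponse();
}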

Readability.js from Mozilla is a standalone version of the readability library used for Firefox Reader View. To parse a document, you create a new Readability object from a DOM document object and then call parse(). The returned article object contains title, content, textContent, length, excerpt, and more.
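
As a standalone illustration of that API (the sample HTML and URL are made up):

import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";

const html = `<html><head><title>Example article</title></head>
<body><article><h1>Hello</h1><p>Some readable body text for the parser.</p></article></body></html>`;

const dom = new JSDOM(html, { url: "https://example.com/post" });
const article = new Readability(dom.window.document).parse();

// parse() returns null if no article could be extracted.
console.log(article?.title);       // the article title
console.log(article?.textContent); // plain text of the article body
console.log(article?.length);      // length of the extracted text, in characters
console.log(article?.excerpt);     // short excerpt / description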

More to explore