Streaming
Some chat models provide a streaming response. This means that instead of waiting for the entire response to be returned, you can start processing it as soon as it's available. This is useful if you want to display the response to the user as it's being generated, or process it incrementally in some other way while it streams in.
Using .stream()
The easiest way to stream is to use the .stream() method. This returns a readable stream that you can also iterate over:
- npm: npm install @langchain/openai
- Yarn: yarn add @langchain/openai
- pnpm: pnpm add @langchain/openai
We're unifying model params across all packages. We now suggest using model instead of modelName, and apiKey for API keys.
import { ChatOpenAI } from "@langchain/openai";
const chat = new ChatOpenAI({
  maxTokens: 25,
});
// Pass in a human message. Also accepts a raw string, which is automatically
// inferred to be a human message.
const stream = await chat.stream([["human", "Tell me a joke about bears."]]);
for await (const chunk of stream) {
  console.log(chunk);
}
/*
AIMessageChunk {
content: '',
additional_kwargs: {}
}
AIMessageChunk {
content: 'Why',
additional_kwargs: {}
}
AIMessageChunk {
content: ' did',
additional_kwargs: {}
}
AIMessageChunk {
content: ' the',
additional_kwargs: {}
}
AIMessageChunk {
content: ' bear',
additional_kwargs: {}
}
AIMessageChunk {
content: ' bring',
additional_kwargs: {}
}
AIMessageChunk {
content: ' a',
additional_kwargs: {}
}
...
*/
API Reference:
- ChatOpenAI from @langchain/openai
For models that do not support streaming, the entire response will be returned as a single chunk.
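If you need the complete message once the stream has finished, you can aggregate the chunks as they arrive. Below is a minimal sketch (reusing the same ChatOpenAI setup as above) that relies on the message chunk's concat() method to build up a single combined message:

import { ChatOpenAI } from "@langchain/openai";
import type { AIMessageChunk } from "@langchain/core/messages";

const chat = new ChatOpenAI({ maxTokens: 25 });
const stream = await chat.stream([["human", "Tell me a joke about bears."]]);

// Accumulate chunks into a single message as they arrive.
// Message chunks support .concat() for exactly this purpose.
let aggregate: AIMessageChunk | undefined;
for await (const chunk of stream) {
  aggregate = aggregate === undefined ? chunk : aggregate.concat(chunk);
}

console.log(aggregate?.content);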
For convenience, you can also pipe a chat model into a StringOutputParser to extract just the raw string values from each chunk:
import { ChatOpenAI } from "@langchain/openai";
import { StringOutputParser } from "@langchain/core/output_parsers";
const parser = new StringOutputParser();
const model = new ChatOpenAI({ temperature: 0 });
const stream = await model.pipe(parser).stream("Hello there!");
for await (const chunk of stream) {
  console.log(chunk);
}
/*
Hello
!
How
can
I
assist
you
today
?
*/
API Reference:
- ChatOpenAI from @langchain/openai
- StringOutputParser from @langchain/core/output_parsers
You can also do something similar to stream bytes directly (e.g. for returning a stream in an HTTP response) using the HttpResponseOutputParser:
import { ChatOpenAI } from "@langchain/openai";
import { HttpResponseOutputParser } from "langchain/output_parsers";
const handler = async () => {
  const parser = new HttpResponseOutputParser();
  const model = new ChatOpenAI({ temperature: 0 });
  const stream = await model.pipe(parser).stream("Hello there!");
  const httpResponse = new Response(stream, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
    },
  });
  return httpResponse;
};
await handler();
API Reference:
- ChatOpenAI from @langchain/openai
- HttpResponseOutputParser from langchain/output_parsers
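On the consuming side, the streamed body can be read like any other streaming HTTP response. Here's a rough sketch (assuming a runtime with WHATWG streams support, such as Node 18+ or a modern browser, and reusing the handler defined above):

const response = await handler();
const reader = response.body!.getReader();
const decoder = new TextDecoder();

// Read raw bytes off the stream and decode them to text as they arrive.
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value, { stream: true }));
}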
Using a callback handler
You can also use a CallbackHandler like so:
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";
const chat = new ChatOpenAI({
  maxTokens: 25,
  streaming: true,
});

const response = await chat.invoke([new HumanMessage("Tell me a joke.")], {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});
console.log(response);
// { token: '' }
// { token: '\n\n' }
// { token: 'Why' }
// { token: ' don' }
// { token: "'t" }
// { token: ' scientists' }
// { token: ' trust' }
// { token: ' atoms' }
// { token: '?\n\n' }
// { token: 'Because' }
// { token: ' they' }
// { token: ' make' }
// { token: ' up' }
// { token: ' everything' }
// { token: '.' }
// { token: '' }
// AIMessage {
// text: "\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything."
// }
API Reference:
- ChatOpenAI from @langchain/openai
- HumanMessage from @langchain/core/messages
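If you want the same token-by-token logging on every call, the handler can also be bound once when the model is constructed instead of being passed per invocation. A minimal sketch of that variant (same inline handler as above, just moved into the constructor options):

import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

// Bind the callback handler in the constructor so it fires for every call.
const chat = new ChatOpenAI({
  maxTokens: 25,
  streaming: true,
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});

await chat.invoke([new HumanMessage("Tell me a joke.")]);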