How to create and query vector stores
Head to Integrations for documentation on built-in integrations with vector store providers.
This guide assumes familiarity with embeddings and document loaders.
One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
This walkthrough uses a basic, unoptimized implementation called MemoryVectorStore that stores embeddings in-memory and does an exact, linear search for the most similar embeddings.
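Conceptually, an exact, linear search scores every stored vector against the query embedding and keeps the top matches. The sketch below illustrates the idea only; it is not MemoryVectorStore's actual source, and the StoredVector shape and cosineSimilarity helper are invented for the example.

// Illustrative only: what an exact, linear vector search does under the hood
type StoredVector = { content: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function linearSearch(query: number[], store: StoredVector[], k: number): StoredVector[] {
  // Score every entry, sort by descending similarity, keep the top k
  return store
    .map((entry) => ({ entry, score: cosineSimilarity(query, entry.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ entry }) => entry);
}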
LangChain contains many built-in integrations - see this section for more, or the full list of integrations.
Creating a new index
Most of the time, you'll need to load and prepare the data you want to search over. Here's an example that loads a recent speech from a file:
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { TextLoader } from "langchain/document_loaders/fs/text";
// Create docs with a loader
const loader = new TextLoader("src/document_loaders/example_data/example.txt");
const docs = await loader.load();
// Load the docs into the vector store
const vectorStore = await MemoryVectorStore.fromDocuments(
docs,
new OpenAIEmbeddings()
);
// Search for the most similar document
const resultOne = await vectorStore.similaritySearch("hello world", 1);
console.log(resultOne);
/*
[
Document {
pageContent: "Hello world",
metadata: { id: 2 }
}
]
*/
API Reference:
- MemoryVectorStore from langchain/vectorstores/memory
- OpenAIEmbeddings from @langchain/openai
- TextLoader from langchain/document_loaders/fs/text
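If you also want to see how similar each result is, vector stores expose a scored variant of the search method. Here's a quick sketch using the store from above (for MemoryVectorStore the score is a cosine similarity; other stores may use different score semantics):

// Retrieve [document, score] pairs instead of bare documents
const resultsWithScore = await vectorStore.similaritySearchWithScore(
  "hello world",
  1
);
for (const [doc, score] of resultsWithScore) {
  console.log(doc.pageContent, score);
}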
Most of the time, you'll need to split the loaded text as a preparation step. See this section to learn more about text splitters.
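For example, you could chunk the loaded documents with a recursive character splitter before indexing them. A minimal sketch, assuming the same entrypoint style used elsewhere on this page (adjust the import path to your installed version if needed):

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Split the loaded documents into overlapping chunks before embedding
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const splitDocs = await splitter.splitDocuments(docs);

// Index the chunks rather than the whole documents
const chunkedStore = await MemoryVectorStore.fromDocuments(
  splitDocs,
  new OpenAIEmbeddings()
);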
Creating a new index from texts
If you have already prepared the data you want to search over, you can initialize a vector store directly from text chunks:
First, install the integration package that provides the embeddings:
- npm: npm install @langchain/openai
- Yarn: yarn add @langchain/openai
- pnpm: pnpm add @langchain/openai
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
const vectorStore = await MemoryVectorStore.fromTexts(
["Hello world", "Bye bye", "hello nice world"],
[{ id: 2 }, { id: 1 }, { id: 3 }],
new OpenAIEmbeddings()
);
const resultOne = await vectorStore.similaritySearch("hello world", 1);
console.log(resultOne);
/*
[
Document {
pageContent: "Hello world",
metadata: { id: 2 }
}
]
*/
API Reference:
- MemoryVectorStore from langchain/vectorstores/memory
- OpenAIEmbeddings from @langchain/openai
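Metadata can also be used to narrow a search. MemoryVectorStore accepts an optional filter callback as the third argument to similaritySearch; note that the filter shape is store-specific, so check the documentation for other vector stores. A sketch using the store from above:

// Only consider documents whose metadata passes the predicate
const filteredResult = await vectorStore.similaritySearch(
  "hello world",
  1,
  (doc) => doc.metadata.id !== 2
);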
Which one to pick?
Here's a quick guide to help you pick the right vector store for your use case:
- If you're after something that can just run inside your Node.js application, in-memory, without any other servers to stand up, then go for HNSWLib, Faiss, LanceDB or CloseVector
- If you're looking for something that can run in-memory in browser-like environments, then go for MemoryVectorStore or CloseVector
- If you come from Python and you were looking for something similar to FAISS, try HNSWLib or Faiss
- If you're looking for an open-source full-featured vector database that you can run locally in a Docker container, then go for Chroma
- If you're looking for an open-source vector database that offers low-latency, local embedding of documents and supports apps on the edge, then go for Zep
- If you're looking for an open-source production-ready vector database that you can run locally (in a Docker container) or hosted in the cloud, then go for Weaviate.
- If you're using Supabase already then look at the Supabase vector store to use the same Postgres database for your embeddings too
- If you're looking for a production-ready vector store you don't have to worry about hosting yourself, then go for Pinecone
- If you're already using SingleStore, or if you need a distributed, high-performance database, you might want to consider the SingleStore vector store.
- If you are looking for an online MPP (Massively Parallel Processing) data warehousing service, you might want to consider the AnalyticDB vector store.
- If you're in search of a cost-effective vector database that lets you run vector search with SQL, look no further than MyScale.
- If you're in search of a vector database that you can load from both the browser and server side, check out CloseVector. It's a vector database that aims to be cross-platform.
- If you're looking for a scalable, open-source columnar database with excellent performance for analytical queries, then consider ClickHouse.
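Whichever store you pick, the calling code stays largely the same, since they all implement the shared VectorStore interface. As a sketch, swapping the in-memory store for HNSWLib changes only the construction (the import path is assumed to follow this page's entrypoint style, and HNSWLib additionally requires the hnswlib-node peer dependency):

import { HNSWLib } from "langchain/vectorstores/hnswlib";
import { OpenAIEmbeddings } from "@langchain/openai";

// Same fromTexts/similaritySearch calls as before, different backing store
const store = await HNSWLib.fromTexts(
  ["Hello world", "Bye bye", "hello nice world"],
  [{ id: 2 }, { id: 1 }, { id: 3 }],
  new OpenAIEmbeddings()
);
const result = await store.similaritySearch("hello world", 1);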
Next steps
You've now learned how to load data into a vector store.
Next, check out the full tutorial on retrieval-augmented generation.
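In those retrieval chains, the vector store is usually wrapped in a retriever, which vector stores expose through an asRetriever() helper. A brief sketch:

// Expose the vector store through the retriever interface used by RAG chains
const retriever = vectorStore.asRetriever(1);
// On recent versions retrievers are runnables; older versions use
// retriever.getRelevantDocuments("hello world") instead
const retrievedDocs = await retriever.invoke("hello world");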