ElasticsearchEmbeddingsCache

The ElasticsearchEmbeddingsCache is a ByteStore implementation that uses your Elasticsearch instance for efficient storage and retrieval of embeddings.

First install the LangChain integration with Elasticsearch.

%pip install -U langchain-elasticsearch

Instantiate the store, then pass it to the CacheBackedEmbeddings.from_bytes_store method.

from langchain.embeddings import CacheBackedEmbeddings
from langchain_elasticsearch import ElasticsearchEmbeddingsCache
from langchain_openai import OpenAIEmbeddings

underlying_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

store = ElasticsearchEmbeddingsCache(
    es_url="http://localhost:9200",
    index_name="llm-chat-cache",
    metadata={"project": "my_chatgpt_project"},
    namespace="my_chatgpt_project",
)

embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=underlying_embeddings,
    document_embedding_cache=store,
    query_embedding_cache=store,
)

API Reference: CacheBackedEmbeddings | ElasticsearchEmbeddingsCache | OpenAIEmbeddings
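
To make the caching behavior concrete, the following is a minimal sketch of the read-through pattern that a cache-backed embedder follows: look up each text in the store, embed only the texts that are missing, then write the new vectors back. ToyCachedEmbedder, fake_embed, and the plain-string key format are hypothetical names invented for this illustration. This is not LangChain's implementation, and the real classes differ in details such as key hashing and serialization.

```python
from typing import Callable, Dict, List

Vector = List[float]


class ToyCachedEmbedder:
    """Hypothetical stand-in that illustrates read-through caching."""

    def __init__(
        self,
        embed_fn: Callable[[List[str]], List[Vector]],
        store: Dict[str, Vector],
        namespace: str,
    ) -> None:
        self.embed_fn = embed_fn
        self.store = store
        self.namespace = namespace

    def _key(self, text: str) -> str:
        # Assumption: plain-string keys, kept readable for the example.
        return f"{self.namespace}:{text}"

    def embed_documents(self, texts: List[str]) -> List[Vector]:
        # Collect texts that are not cached yet, without duplicates.
        missing = list(
            dict.fromkeys(t for t in texts if self._key(t) not in self.store)
        )
        if missing:
            # Call the underlying model only for the cache misses.
            for text, vector in zip(missing, self.embed_fn(missing)):
                self.store[self._key(text)] = vector
        return [self.store[self._key(t)] for t in texts]


calls: List[List[str]] = []


def fake_embed(texts: List[str]) -> List[Vector]:
    """Stand-in for a real embedding model; records each batch it receives."""
    calls.append(list(texts))
    return [[float(len(t))] for t in texts]


embedder = ToyCachedEmbedder(fake_embed, store={}, namespace="my_chatgpt_project")
first = embedder.embed_documents(["hello", "hi"])
second = embedder.embed_documents(["hello", "hey"])
print(calls)
```

On the second call only "hey" reaches the underlying model, because "hello" is already in the store. With ElasticsearchEmbeddingsCache the store is an Elasticsearch index instead of an in-memory dictionary.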

The index_name parameter can also accept aliases. This lets you use index lifecycle management (ILM), which we suggest considering for managing retention and controlling cache growth.
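
As a rough illustration of the alias plus ILM setup, the sketch below builds the three request bodies such a setup typically involves: an ILM policy, an index template that attaches the policy, and a bootstrap index that owns the write alias. The policy name, the retention periods, and the helper function names are assumptions chosen for the example, not recommendations from this integration. The bodies are only constructed here, not sent to a cluster.

```python
import json
from typing import Any, Dict


def build_ilm_policy(rollover_age: str = "30d", delete_after: str = "90d") -> Dict[str, Any]:
    """Body for PUT _ilm/policy/<name>: roll over in the hot phase, then delete."""
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {"max_age": rollover_age}}},
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def build_index_template(alias: str, policy_name: str) -> Dict[str, Any]:
    """Body for PUT _index_template/<name>: attach the policy to matching indices."""
    return {
        "index_patterns": [f"{alias}-*"],
        "template": {
            "settings": {
                "index.lifecycle.name": policy_name,
                "index.lifecycle.rollover_alias": alias,
            }
        },
    }


def build_bootstrap_index(alias: str) -> Dict[str, Any]:
    """Body for PUT <alias>-000001: the first index, marked as the write index."""
    return {"aliases": {alias: {"is_write_index": True}}}


alias = "llm-chat-cache"  # the value passed as index_name to the cache
policy = build_ilm_policy()
template = build_index_template(alias, policy_name="embeddings-cache-policy")
bootstrap = build_bootstrap_index(alias)
print(json.dumps(policy, indent=2))
```

Indices created by rollover take their settings and mappings from the matching template, so a real template would also need to carry the cache's mapping. That part is omitted here.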

See the class docstring for the full list of parameters.

Index the generated vectors

The cached vectors are not searchable by default. You can customize how the Elasticsearch document is built in order to add an indexed vector field.

This can be done by subclassing and overriding methods.

from typing import Any, Dict, List

from langchain_elasticsearch import ElasticsearchEmbeddingsCache


class SearchableElasticsearchStore(ElasticsearchEmbeddingsCache):
    @property
    def mapping(self) -> Dict[str, Any]:
        mapping = super().mapping
        mapping["mappings"]["properties"]["vector"] = {
            "type": "dense_vector",
            "dims": 1536,
            "index": True,
            "similarity": "dot_product",
        }
        return mapping

    def build_document(self, llm_input: str, vector: List[float]) -> Dict[str, Any]:
        body = super().build_document(llm_input, vector)
        body["vector"] = vector
        return body

API Reference: ElasticsearchEmbeddingsCache

When overriding the mapping and the document building, please only make additive modifications, keeping the base mapping intact.
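
Once the vector field is indexed, it can be queried with an approximate kNN search. The sketch below builds such a search body for the vector field defined above. The helper names and the k and num_candidates values are assumptions for illustration. Because the mapping uses dot_product similarity, Elasticsearch expects unit-length vectors, so the query vector is normalized first. The two-dimensional vector is only a toy input; a real query vector must have the same number of dimensions as the mapping (1536 here).

```python
import math
from typing import Any, Dict, List


def normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length, as dot_product similarity requires."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def build_knn_query(
    query_vector: List[float], k: int = 5, num_candidates: int = 50
) -> Dict[str, Any]:
    """Search body for an approximate kNN query against the `vector` field."""
    return {
        "knn": {
            "field": "vector",
            "query_vector": normalize(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        }
    }


body = build_knn_query([3.0, 4.0])
print(body)
```

The resulting dictionary is meant to be sent to the _search endpoint of the cache index on an Elasticsearch 8.x cluster.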
