
Cassandra

Cassandra is a NoSQL, row-oriented, highly scalable and highly available database.

CassandraByteStore needs the cassio package to be installed:

%pip install --upgrade --quiet cassio

The Store takes the following parameters:

  • table: The name of the table in which to store the data.
  • session: (Optional) The cassandra driver session. If not provided, the cassio resolved session will be used.
  • keyspace: (Optional) The keyspace of the table. If not provided, the cassio resolved keyspace will be used.
  • setup_mode: (Optional) The mode used to create the Cassandra table (SYNC, ASYNC or OFF). Defaults to SYNC.

CassandraByteStore

The CassandraByteStore is a ByteStore implementation that stores data in your Cassandra instance. Store keys must be strings and are mapped to the row_id column of the Cassandra table. Store values, which are bytes, are mapped to the body_blob column.

from langchain_community.storage import CassandraByteStore
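To make the key/value mapping concrete, here is a minimal in-memory sketch of the same row layout. The class name RowModelByteStore is hypothetical, and this is not how CassandraByteStore is implemented. It only models one row per key, with the key playing the role of row_id and the bytes value playing the role of body_blob. The method names assume the usual LangChain store methods (mset, mget, mdelete, yield_keys).

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


class RowModelByteStore:
    """Hypothetical in-memory stand-in that mimics the row layout.

    NOT the real CassandraByteStore: a plain dict models the table,
    where each entry is one row of the form {row_id: body_blob}.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def mset(self, key_value_pairs: Sequence[Tuple[str, bytes]]) -> None:
        for key, value in key_value_pairs:
            if not isinstance(key, str):
                raise TypeError("keys must be strings (they model row_id)")
            if not isinstance(value, bytes):
                raise TypeError("values must be bytes (they model body_blob)")
            self._rows[key] = value

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        # Missing keys yield None; results keep the order of the request.
        return [self._rows.get(key) for key in keys]

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self._rows.pop(key, None)

    def yield_keys(self, prefix: Optional[str] = None) -> Iterator[str]:
        for key in self._rows:
            if prefix is None or key.startswith(prefix):
                yield key


demo = RowModelByteStore()
demo.mset([("k1", b"v1"), ("k2", b"v2")])
demo.mdelete(["k2"])
result = demo.mget(["k1", "k2"])
remaining_keys = list(demo.yield_keys())
```

In this sketch, looking up a deleted or unknown key returns None in the corresponding position rather than raising an error.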

Init from a cassandra driver Session

You need to create a cassandra.cluster.Session object, as described in the Cassandra driver documentation. The details vary (e.g. with network settings and authentication), but a minimal connection to a local node might look like this:

from cassandra.cluster import Cluster

cluster = Cluster()
session = cluster.connect()

You also need to provide the name of an existing keyspace in the Cassandra instance:

CASSANDRA_KEYSPACE = input("CASSANDRA_KEYSPACE = ")

Creating the store:

store = CassandraByteStore(
    table="my_store",
    session=session,
    keyspace=CASSANDRA_KEYSPACE,
)

store.mset([("k1", b"v1"), ("k2", b"v2")])
print(store.mget(["k1", "k2"]))
[b'v1', b'v2']
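Because values must be bytes, structured data has to be serialized before it is written. The following is a minimal sketch of one way to do that with JSON. The helpers put_json and get_json and the DictStore stand-in are hypothetical names introduced for illustration. They are not part of LangChain, and they only assume that the store exposes mset and mget as used above.

```python
import json
from typing import Any, Dict, List, Optional, Sequence, Tuple


def put_json(store: Any, key: str, obj: Any) -> None:
    """Serialize obj to UTF-8 encoded JSON bytes and write it with mset."""
    store.mset([(key, json.dumps(obj).encode("utf-8"))])


def get_json(store: Any, key: str) -> Optional[Any]:
    """Read one key with mget and decode it, or return None if it is absent."""
    (raw,) = store.mget([key])
    return None if raw is None else json.loads(raw.decode("utf-8"))


class DictStore:
    """Hypothetical dict-backed stand-in so the helpers run without Cassandra."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        self._data.update(pairs)

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        return [self._data.get(key) for key in keys]


fake_store = DictStore()
put_json(fake_store, "user:1", {"name": "Ada", "tags": ["a", "b"]})
loaded = get_json(fake_store, "user:1")
missing = get_json(fake_store, "user:2")
```

The same helpers should work with any object that offers compatible mset and mget methods, so the DictStore instance could be swapped for the store created above.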

Init from cassio

It's also possible to use cassio to configure the session and keyspace.

import cassio

cassio.init(contact_points="127.0.0.1", keyspace=CASSANDRA_KEYSPACE)

store = CassandraByteStore(
    table="my_store",
)

store.mset([("k1", b"v1"), ("k2", b"v2")])
print(store.mget(["k1", "k2"]))

Usage with CacheBackedEmbeddings

You can use the CassandraByteStore together with CacheBackedEmbeddings to cache the results of embedding computations, so that a text that has already been embedded is not sent to the embedding model again.

from langchain.embeddings import CacheBackedEmbeddings
from langchain_openai import OpenAIEmbeddings

cassio.init(contact_points="127.0.0.1", keyspace=CASSANDRA_KEYSPACE)

store = CassandraByteStore(
    table="my_store",
)

embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=OpenAIEmbeddings(),
    document_embedding_cache=store,
)
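The idea behind a byte-store-backed embedding cache can be sketched without Cassandra or an embedding provider. The code below is a conceptual illustration only: the function cached_embed, the SHA-256 key scheme, the JSON serialization, and the DictStore stand-in are all assumptions made for this example and do not describe the internals of CacheBackedEmbeddings.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class DictStore:
    """Hypothetical dict-backed stand-in for a byte store (mset/mget only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        self._data.update(pairs)

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        return [self._data.get(key) for key in keys]


def cached_embed(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    store: DictStore,
    namespace: str = "",
) -> List[List[float]]:
    """Return embeddings for texts, calling embed_fn only for cache misses."""
    # Assumed key scheme: namespace plus a hash of the text.
    keys = [namespace + hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    cached = store.mget(keys)
    missing = [i for i, raw in enumerate(cached) if raw is None]
    if missing:
        fresh = embed_fn([texts[i] for i in missing])
        encoded = [json.dumps(vec).encode("utf-8") for vec in fresh]
        store.mset([(keys[i], blob) for i, blob in zip(missing, encoded)])
        for i, blob in zip(missing, encoded):
            cached[i] = blob
    return [json.loads(raw.decode("utf-8")) for raw in cached]


calls: List[List[str]] = []


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Deterministic toy embedder that records each batch it receives."""
    calls.append(list(batch))
    return [[float(len(t)), float(sum(map(ord, t)) % 7)] for t in batch]


demo_store = DictStore()
first = cached_embed(["hello", "world"], fake_embed, demo_store)
second = cached_embed(["hello", "cassandra"], fake_embed, demo_store)
```

In this sketch the second call only sends "cassandra" to the embedder, because "hello" is already in the store. With a persistent store such as the one configured above, cached entries would also survive process restarts.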
