Cassandra
Cassandra is a NoSQL, row-oriented, highly scalable and highly available database.
CassandraByteStore needs the cassio package to be installed:
%pip install --upgrade --quiet cassio
The Store takes the following parameters:
- table: The name of the table in which to store the data.
- session: (Optional) The Cassandra driver session. If not provided, the session resolved by cassio is used.
- keyspace: (Optional) The keyspace of the table. If not provided, the keyspace resolved by cassio is used.
- setup_mode: (Optional) The mode used to create the Cassandra table (SYNC, ASYNC or OFF). Defaults to SYNC.
CassandraByteStore
The CassandraByteStore is an implementation of ByteStore that stores the data in your Cassandra instance.
The store keys must be strings and are mapped to the row_id column of the Cassandra table.
The store values must be bytes and are mapped to the body_blob column of the Cassandra table.
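To make the key/value contract described above concrete, here is a minimal in-memory sketch. It does not talk to Cassandra: the class name `DictByteStore` is hypothetical, and its internal dict is only a stand-in for the table, with the dict key playing the role of row_id and the dict value playing the role of body_blob. The method names mirror the ByteStore interface (mget, mset, mdelete, yield_keys); the explicit type checks are this sketch's own way of making the contract visible, not a claim about how CassandraByteStore validates input.

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


class DictByteStore:
    """Hypothetical in-memory stand-in; NOT the real CassandraByteStore."""

    def __init__(self) -> None:
        # key (str) plays the role of row_id, value (bytes) of body_blob
        self._rows: Dict[str, bytes] = {}

    def mset(self, key_value_pairs: Sequence[Tuple[str, bytes]]) -> None:
        for key, value in key_value_pairs:
            if not isinstance(key, str):
                raise TypeError("keys must be strings")
            if not isinstance(value, bytes):
                raise TypeError("values must be bytes")
            self._rows[key] = value

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        # Missing keys come back as None, in the same order as the request
        return [self._rows.get(key) for key in keys]

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self._rows.pop(key, None)

    def yield_keys(self, prefix: Optional[str] = None) -> Iterator[str]:
        for key in self._rows:
            if prefix is None or key.startswith(prefix):
                yield key


demo = DictByteStore()
demo.mset([("k1", b"v1"), ("k2", b"v2")])
print(demo.mget(["k1", "k2", "missing"]))  # [b'v1', b'v2', None]
```

A stand-in like this can also be handy in unit tests that should not depend on a running Cassandra instance.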
from langchain_community.storage import CassandraByteStore
Init from a cassandra driver Session
You need to create a cassandra.cluster.Session object, as described in the Cassandra driver documentation. The details vary (e.g. with network settings and authentication), but this might be something like:
from cassandra.cluster import Cluster
cluster = Cluster()
session = cluster.connect()
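For a cluster that is not on localhost or that requires a username and password, the connection might instead look like the sketch below. The helper name `connect_with_auth` and its arguments are hypothetical; `Cluster` and `PlainTextAuthProvider` come from the Cassandra Python driver, and 9042 is assumed as the native-protocol port. Adapt or replace this to match your deployment (TLS, cloud bundles, and so on).

```python
from typing import List


def connect_with_auth(hosts: List[str], username: str, password: str, port: int = 9042):
    """Return a driver Session for a password-protected cluster (sketch)."""
    # Imported lazily so that defining this helper does not require the driver
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth_provider = PlainTextAuthProvider(username=username, password=password)
    cluster = Cluster(contact_points=hosts, port=port, auth_provider=auth_provider)
    return cluster.connect()


# Example call (placeholder host and credentials, needs a reachable cluster):
# session = connect_with_auth(["10.0.0.5"], "my_user", "my_password")
```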
You need to provide the name of an existing keyspace of the Cassandra instance:
CASSANDRA_KEYSPACE = input("CASSANDRA_KEYSPACE = ")
Creating the store:
store = CassandraByteStore(
    table="my_store",
    session=session,
    keyspace=CASSANDRA_KEYSPACE,
)
store.mset([("k1", b"v1"), ("k2", b"v2")])
print(store.mget(["k1", "k2"]))
[b'v1', b'v2']
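Because the store only accepts bytes values, other Python objects have to be serialized before they are written and deserialized after they are read. A minimal sketch using JSON follows; the helper names `to_bytes` and `from_bytes` and the sample record are hypothetical, and JSON is just one possible encoding.

```python
import json
from typing import Any


def to_bytes(obj: Any) -> bytes:
    """Serialize a JSON-compatible object to UTF-8 bytes."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def from_bytes(blob: bytes) -> Any:
    """Inverse of to_bytes."""
    return json.loads(blob.decode("utf-8"))


record = {"title": "doc-1", "tags": ["a", "b"]}
blob = to_bytes(record)
restored = from_bytes(blob)

# With a configured store, the round trip would look like:
# store.mset([("doc-1", blob)])
# restored = from_bytes(store.mget(["doc-1"])[0])
```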
Init from cassio
It's also possible to use cassio to configure the session and keyspace.
import cassio
cassio.init(contact_points="127.0.0.1", keyspace=CASSANDRA_KEYSPACE)
store = CassandraByteStore(
    table="my_store",
)
store.mset([("k1", b"v1"), ("k2", b"v2")])
print(store.mget(["k1", "k2"]))
Usage with CacheBackedEmbeddings
You may use the CassandraByteStore in conjunction with a CacheBackedEmbeddings to cache the results of embedding computations.
from langchain.embeddings import CacheBackedEmbeddings
from langchain_openai import OpenAIEmbeddings
cassio.init(contact_points="127.0.0.1", keyspace=CASSANDRA_KEYSPACE)
store = CassandraByteStore(
    table="my_store",
)
embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=OpenAIEmbeddings(), document_embedding_cache=store
)
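Conceptually, a cache-backed embedder looks up each text under a key derived from its content and calls the underlying model only for the misses. The following is a simplified sketch of that idea, not LangChain's actual implementation: `cached_embed`, `fake_embed`, `MiniStore`, the SHA-256 key scheme and the JSON encoding are all assumptions made for illustration, and a dict-backed class stands in for the byte store.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class MiniStore:
    """Dict-backed stand-in exposing the mget/mset calls used below."""

    def __init__(self) -> None:
        self.rows: Dict[str, bytes] = {}

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        return [self.rows.get(k) for k in keys]

    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        self.rows.update(pairs)


def cached_embed(
    texts: Sequence[str],
    embed: Callable[[List[str]], List[List[float]]],
    store: MiniStore,
    namespace: str = "",
) -> List[List[float]]:
    # Derive one store key per text from a hash of its content
    keys = [namespace + hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    results: List[Optional[List[float]]] = [
        None if blob is None else json.loads(blob) for blob in store.mget(keys)
    ]

    # Embed only the texts that were not found in the store, then cache them
    missing = [i for i, vec in enumerate(results) if vec is None]
    if missing:
        fresh = embed([texts[i] for i in missing])
        store.mset(
            [(keys[i], json.dumps(vec).encode("utf-8")) for i, vec in zip(missing, fresh)]
        )
        for i, vec in zip(missing, fresh):
            results[i] = vec
    return results


calls: List[List[str]] = []


def fake_embed(batch: List[str]) -> List[List[float]]:
    # Toy "model": records each batch it is asked to embed
    calls.append(list(batch))
    return [[float(len(t)), 1.0] for t in batch]


demo_store = MiniStore()
first = cached_embed(["hello", "hi"], fake_embed, demo_store)
second = cached_embed(["hello", "hey"], fake_embed, demo_store)
print(calls)  # [['hello', 'hi'], ['hey']]
```

On the second call only "hey" reaches the toy model, because "hello" is already in the store. With the real classes, the same effect means repeated documents do not trigger repeated calls to the embedding provider.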