
langchain-cockroachdb

LangChain integration for CockroachDB with native vector support

Supported Python versions: 3.10+


Overview

This package provides LangChain abstractions backed by CockroachDB. It uses CockroachDB's native VECTOR type and C-SPANN (CockroachDB SPANN) indexes to run vector search at scale in a distributed, horizontally scalable database.

Key Features

🎯 Vector Store

  • Native Vector Support: Uses CockroachDB's native VECTOR type (not pgvector)
  • C-SPANN Indexes: CockroachDB's vector index algorithm optimized for distributed systems
  • Advanced Filtering: Rich metadata filtering with $and/$or/$gt/$in operators
  • Hybrid Search: Combine full-text search (TSVECTOR) with vector similarity
  • Query-Time Tuning: Adjust beam size for accuracy/speed tradeoff
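
To make the filter operators above concrete, here is a minimal in-memory sketch of how a nested $and/$or/$gt/$in filter could be evaluated against a document's metadata. It is only an illustration of the semantics, assuming the operators follow the common MongoDB-style convention; it is not the package's implementation, which presumably translates filters into SQL. The function name `matches`, the sample field names, and the treatment of a bare value as an equality test are all assumptions made for this example.

```python
from typing import Any, Mapping

# Comparison operators named in the feature list.
_COMPARATORS = {
    "$gt": lambda value, target: value is not None and value > target,
    "$in": lambda value, target: value in target,
}


def matches(metadata: Mapping[str, Any], flt: Mapping[str, Any]) -> bool:
    """Return True if `metadata` satisfies the filter `flt` (illustrative only)."""
    for key, condition in flt.items():
        if key == "$and":
            # Every sub-filter must match.
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            # At least one sub-filter must match.
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, Mapping):
            # Field with operator clauses, e.g. {"year": {"$gt": 2023}}.
            value = metadata.get(key)
            for op, target in condition.items():
                if not _COMPARATORS[op](value, target):
                    return False
        else:
            # Assumption: a bare value means equality, e.g. {"pinned": True}.
            if metadata.get(key) != condition:
                return False
    return True


# Hypothetical filter: source is docs or blog, AND (year > 2023 OR pinned).
example_filter = {
    "$and": [
        {"source": {"$in": ["docs", "blog"]}},
        {"$or": [{"year": {"$gt": 2023}}, {"pinned": True}]},
    ]
}

print(matches({"source": "docs", "year": 2024}, example_filter))   # True
print(matches({"source": "forum", "year": 2024}, example_filter))  # False
```

The nesting is the point: $and and $or take lists of sub-filters, while $gt and $in apply to a single metadata field.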

🏗️ Reliability Features

  • Automatic Retry Logic: Retries transactions that fail with SQLSTATE 40001 (serialization failure), using exponential backoff
  • Isolation Level Support: Works with both SERIALIZABLE (default, recommended) and READ COMMITTED
  • Multi-Tenancy: Namespace-based isolation with C-SPANN prefix columns
  • Connection Pooling: Configurable connection pools with health checks
  • Horizontal Scalability: Designed for distributed deployments
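
The retry behavior listed above can be sketched in a few lines. This is a generic illustration of retrying on SQLSTATE 40001 with exponential backoff and jitter, not the package's own code; `SerializationFailure`, `run_with_retry`, and the parameters `max_retries` and `base_delay` are hypothetical names chosen for the example. It assumes the driver exposes the error code through a `sqlstate` attribute on the exception.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""

    sqlstate = "40001"


def run_with_retry(
    txn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `txn`, retrying on SQLSTATE 40001 with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return txn()
        except Exception as exc:
            retryable = getattr(exc, "sqlstate", None) == "40001"
            # Re-raise anything that is not a serialization failure,
            # and give up once the retry budget is spent.
            if not retryable or attempt == max_retries:
                raise
            # Delay doubles each attempt: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
            # Random jitter keeps concurrent clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The whole transaction is passed in as a callable because a serialization failure invalidates the transaction, so every statement in it has to be re-run, not just the one that failed.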

💬 Chat History

  • Persistent Storage: Store conversation history in CockroachDB
  • Session Management: Organize by session/thread ID
  • LangChain Integration: Drop-in replacement for other chat history implementations

🧠 LangGraph Checkpointer

  • Short-Term Memory: Persist agent state across conversation turns
  • Human-in-the-Loop: Interrupt and resume workflows with durable state
  • Fault Tolerance: Recover from process restarts without losing progress
  • Sync & Async: Both CockroachDBSaver and AsyncCockroachDBSaver

🔄 Async & Sync APIs

  • Async-First: High-performance async operations for I/O concurrency
  • Sync Wrapper: Simple synchronous API for scripts and batch jobs
  • Connection Pooling: Efficient connection reuse across async operations
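
One common way to offer a sync API on top of an async-first core is to run an event loop in a background thread and block on each call. The sketch below shows that general pattern only; whether this package uses it internally is not stated here, and `BackgroundLoop`, `AsyncStore`, and `SyncStore` are hypothetical classes invented for the illustration.

```python
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


class BackgroundLoop:
    """Runs an asyncio event loop in a daemon thread for use from sync code."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro: Coroutine[Any, Any, T]) -> T:
        # Schedule the coroutine on the background loop and block until done.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result()

    def close(self) -> None:
        # Stop the loop from its own thread, then clean up.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


class AsyncStore:
    """Hypothetical async API standing in for an async vector store."""

    async def asearch(self, query: str) -> list[str]:
        await asyncio.sleep(0)  # placeholder for real database I/O
        return [f"result for {query!r}"]


class SyncStore:
    """Sync facade: each method delegates to the async implementation."""

    def __init__(self) -> None:
        self._async_store = AsyncStore()
        self._runner = BackgroundLoop()

    def search(self, query: str) -> list[str]:
        return self._runner.run(self._async_store.asearch(query))

    def close(self) -> None:
        self._runner.close()
```

Compared with calling `asyncio.run` per operation, a long-lived background loop lets pooled connections stay bound to one loop across calls, which is why the pattern suits scripts and batch jobs that make many sequential requests.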

Quick Example

import asyncio
from langchain_cockroachdb import AsyncCockroachDBVectorStore, CockroachDBEngine
from langchain_openai import OpenAIEmbeddings

async def main():
    # Initialize engine
    engine = CockroachDBEngine.from_connection_string(
        "cockroachdb://user:pass@host:26257/db?sslmode=verify-full"
    )

    # Create table
    await engine.ainit_vectorstore_table(
        table_name="documents",
        vector_dimension=1536,
    )

    # Initialize vector store
    vectorstore = AsyncCockroachDBVectorStore(
        engine=engine,
        embeddings=OpenAIEmbeddings(),
        collection_name="documents",
    )

    # Add documents
    await vectorstore.aadd_texts([
        "CockroachDB is a distributed SQL database",
        "LangChain makes it easy to build LLM applications",
    ])

    # Search
    results = await vectorstore.asimilarity_search(
        "Tell me about databases",
        k=2
    )

    print(results)
    await engine.aclose()

asyncio.run(main())

Why CockroachDB?

  • Distributed by Design: Scale horizontally across regions
  • Native Vector Support: First-class VECTOR type and C-SPANN indexes
  • Strong Consistency: SERIALIZABLE by default, READ COMMITTED also supported
  • Cloud Native: Deploy anywhere (AWS, GCP, Azure, on-prem)
  • PostgreSQL Compatible: Familiar SQL with distributed superpowers

Getting Started

Choose your path:

  • Quick Start: Get up and running in 5 minutes
  • Guides: Learn key concepts and patterns
  • API Reference: Detailed API documentation
  • Examples: Working code examples

LangChain Official Integration Docs

Community & Support

Contributing

Contributions welcome! This is an open-source project built for the community.

Contributing Guide

License

Apache License 2.0 - see LICENSE for details.


Built with ❤️ for the CockroachDB and LangChain communities