Description
Checked other resources
- This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
- I added a clear and descriptive title that summarizes this issue.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
- I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
- I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
Example Code
Install deps:
pip install -U aiosqlite greenlet langchain langchain_community langchain_core langchain_text_splitters langchain_qdrant qdrant_client
Run code:
# Note: uses top-level await, so run it in a notebook or an async REPL
# (for example `python -m asyncio`).
from langchain import indexes
from langchain_community.embeddings import FakeEmbeddings
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

client = QdrantClient(":memory:")
client.create_collection(
    collection_name="some_collection",
    vectors_config=VectorParams(size=256, distance=Distance.COSINE),
)
store = QdrantVectorStore(
    client=client,
    collection_name="some_collection",
    embedding=FakeEmbeddings(size=256),
)

manager = indexes.SQLRecordManager("index", db_url="sqlite+aiosqlite:///db.sqlite", async_mode=True)
await manager.acreate_schema()

splitter = RecursiveCharacterTextSplitter(chunk_size=10, chunk_overlap=0)
document = Document(
    page_content="\n".join(map(str, range(100))),
    metadata={
        "source": "some_url",
        "title": "some_title",
    },
)
chunks = await splitter.atransform_documents([document])

# Index the same, unchanged chunks several times in a row.
for _ in range(5):
    stats = await indexes.aindex(
        chunks,
        manager,
        store,
        batch_size=8,  # <--- !
        cleanup="incremental",
        source_id_key="source",
    )
    print(stats)
Error Message and Stack Trace (if applicable)
The previous example will print:
{'num_added': 24, 'num_updated': 0, 'num_skipped': 8, 'num_deleted': 24}
{'num_added': 24, 'num_updated': 0, 'num_skipped': 8, 'num_deleted': 24}
{'num_added': 24, 'num_updated': 0, 'num_skipped': 8, 'num_deleted': 24}
{'num_added': 24, 'num_updated': 0, 'num_skipped': 8, 'num_deleted': 24}
{'num_added': 24, 'num_updated': 0, 'num_skipped': 8, 'num_deleted': 24}
{'num_added': 24, 'num_updated': 0, 'num_skipped': 8, 'num_deleted': 24}
Description
Look at these numbers. The first 8 chunks (exactly batch_size) are always skipped, as they should be.
But every chunk after the first batch is treated as "not found", then deleted, re-embedded (CPU/GPU heavy), and inserted again on every run, even though nothing in the source document has changed.
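To make the suspected mechanism concrete, here is a minimal pure-Python sketch that reproduces the same counts without LangChain. It is only a toy model and an assumption about the cause: it supposes that incremental cleanup runs after every batch and deletes all records of the batch's sources whose timestamp predates the start of the current run. The function `toy_incremental_index`, the `records` dict, and the integer `clock` are hypothetical names invented for this illustration; they are not LangChain APIs.

```python
from itertools import count


def toy_incremental_index(chunks, records, clock, batch_size):
    """Toy model of batched incremental indexing (not LangChain code).

    `records` maps chunk key -> (source, last_seen_tick).
    `clock` is an iterator of increasing integers standing in for timestamps.
    """
    start = next(clock)  # tick marking the start of this indexing run
    stats = {"num_added": 0, "num_updated": 0, "num_skipped": 0, "num_deleted": 0}
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i : i + batch_size]
        now = next(clock)
        for key, source in batch:
            if key in records:
                stats["num_skipped"] += 1  # already indexed, only refresh the timestamp
            else:
                stats["num_added"] += 1  # would be embedded and inserted
            records[key] = (source, now)
        # Assumed per-batch cleanup: drop every record from this batch's sources
        # that has not been refreshed since the run started.
        sources = {source for _, source in batch}
        stale = [k for k, (s, t) in records.items() if s in sources and t < start]
        for k in stale:
            del records[k]
        stats["num_deleted"] += len(stale)
    return stats


# 32 chunks sharing one source, matching the 24 + 8 totals reported above.
chunks = [(f"chunk-{n}", "some_url") for n in range(32)]

records, clock = {}, count()
first_run = toy_incremental_index(chunks, records, clock, batch_size=8)
second_run = toy_incremental_index(chunks, records, clock, batch_size=8)
print(first_run)
print(second_run)

# Same data, but one batch that covers every chunk of the source.
records_big, clock_big = {}, count()
toy_incremental_index(chunks, records_big, clock_big, batch_size=32)
second_run_big = toy_incremental_index(chunks, records_big, clock_big, batch_size=32)
print(second_run_big)
```

In this model the second run with batch_size=8 skips the first batch, deletes the remaining 24 records during that batch's cleanup, and then re-adds them, which matches the stats in this report. With a batch large enough to hold every chunk of the source, the second run skips all 32 chunks and deletes nothing. Whether the real aindex behaves this way for the same reason is for a maintainer to confirm.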
System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 24.5.0: Tue Apr 22 19:53:27 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6041
Python Version: 3.12.10 (main, Apr 9 2025, 03:49:38) [Clang 20.1.0 ]
Package Information
langchain_core: 0.3.74
langchain: 0.3.27
langchain_community: 0.3.27
langsmith: 0.4.14
langchain_qdrant: 0.2.0
langchain_text_splitters: 0.3.9
Optional packages not installed
langserve
Other Dependencies
aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
fastembed: Installed. No version info available.
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
httpx<1,>=0.23.0: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.66: Installed. No version info available.
langchain-core<1.0.0,>=0.3.72: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-perplexity;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.9: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.26: Installed. No version info available.
langsmith-pyo3>=0.1.0rc2;: Installed. No version info available.
langsmith>=0.1.125: Installed. No version info available.
langsmith>=0.1.17: Installed. No version info available.
langsmith>=0.3.45: Installed. No version info available.
numpy>=1.26.2;: Installed. No version info available.
numpy>=2.1.0;: Installed. No version info available.
openai-agents>=0.0.3;: Installed. No version info available.
opentelemetry-api>=1.30.0;: Installed. No version info available.
opentelemetry-exporter-otlp-proto-http>=1.30.0;: Installed. No version info available.
opentelemetry-sdk>=1.30.0;: Installed. No version info available.
orjson>=3.9.14;: Installed. No version info available.
packaging>=23.2: Installed. No version info available.
pydantic: 2.11.7
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3,>=1: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic>=2.7.4: Installed. No version info available.
pytest>=7.0.0;: Installed. No version info available.
PyYAML>=5.3: Installed. No version info available.
qdrant-client: 1.15.1
requests-toolbelt>=1.0.0: Installed. No version info available.
requests<3,>=2: Installed. No version info available.
requests>=2.0.0: Installed. No version info available.
rich>=13.9.4;: Installed. No version info available.
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
typing-extensions>=4.7: Installed. No version info available.
vcrpy>=7.0.0;: Installed. No version info available.
zstandard>=0.23.0: Installed. No version info available.