LangChain.js createHistoryAwareRetriever with Ollama embeddings throws invalid input type error

Summary

A type mismatch inside LangChain.js caused createHistoryAwareRetriever to fail when paired with OllamaEmbeddings. The retriever expected a string input, but the history‑aware wrapper produced a message array, triggering an “invalid input type” error.

Root Cause

The failure stems from incompatible input/output expectations between:

  • ChatOllama, which emits AIMessage / HumanMessage objects
  • createHistoryAwareRetriever, which expects the LLM to return a plain string representing a rewritten query
  • OllamaEmbeddings, which only accepts raw text, not message objects
  • MemoryVectorStore.asRetriever(), which expects a string query, not a structured message array

Because the history-aware retriever forwards the entire message structure to the embeddings model, the embeddings layer throws an “invalid input type” error.
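The shape of the failure can be reproduced with plain objects, independent of LangChain. The following is a hypothetical mock: embedQuery is a stand-in for the embeddings layer, not the real OllamaEmbeddings API, and the message object mimics what a chat model returns.

```javascript
// Stand-in for an embeddings call: accepts only raw text.
function embedQuery(text) {
  if (typeof text !== "string") {
    throw new TypeError("invalid input type: expected string");
  }
  return text.length; // placeholder for a real embedding vector
}

// Shape of what a chat model actually returns: a message object.
const aiMessage = { role: "assistant", content: "rewritten query" };

// Forwarding the message object straight through fails...
let failed = false;
try {
  embedQuery(aiMessage);
} catch {
  failed = true;
}

// ...while extracting .content first succeeds.
const ok = embedQuery(aiMessage.content);
```

This is the whole bug in miniature: the embeddings layer only ever sees the outermost value, and a message object is not a string.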

Why This Happens in Real Systems

This class of bug is extremely common in LLM pipelines because:

  • LLM frameworks evolve quickly, and interfaces drift out of sync
  • Chat models return structured messages, while embedding models expect raw text
  • History-aware chains wrap retrievers, often changing the shape of the input
  • Vector stores are strict about input types and reject anything that isn’t a string

In short: chat models speak in messages, embedding models speak in strings, and the glue code didn’t translate between them.
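The missing translation layer can be a few lines of glue. This is a hypothetical helper (toQueryString is not a LangChain export) that normalizes either a raw string or a message-shaped object to plain text, including the case where content is an array of parts:

```javascript
// Hypothetical normalizer: always hand the embeddings layer a string.
function toQueryString(output) {
  if (typeof output === "string") return output;
  if (output && typeof output.content === "string") return output.content;
  // Some models return content as an array of typed parts.
  if (output && Array.isArray(output.content)) {
    return output.content
      .map((part) => (typeof part === "string" ? part : part.text ?? ""))
      .join("");
  }
  throw new TypeError("cannot convert LLM output to a query string");
}
```

Inserting a normalizer like this at the LLM-to-retriever boundary is exactly the translation step the broken pipeline skipped.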

Real-World Impact

This type of mismatch causes:

  • Silent pipeline failures where retrieval never happens
  • Chatbots that suddenly “forget” context
  • RAG systems that return irrelevant or empty answers
  • Hard-to-debug errors because the failure occurs deep inside the chain stack

Example

A minimal fix is to pipe the chat model through a string output parser, so the rewrite step emits plain text instead of a message object. createHistoryAwareRetriever accepts any runnable as its llm, so the piped model slots straight in:

import { StringOutputParser } from "@langchain/core/output_parsers";

const historyAwareRetriever = await createHistoryAwareRetriever({
  llm: model.pipe(new StringOutputParser()),
  retriever,
  rephrasePrompt: retrieverPrompt,
});

Or explicitly convert the LLM output. Note that spreading a class instance ({ ...model }) copies only its own enumerable properties and drops prototype methods, so wrap the model in a runnable instead:

import { RunnableLambda } from "@langchain/core/runnables";

const safeModel = RunnableLambda.from(async (input) => {
  const msg = await model.invoke(input);
  // msg.content may be a string or an array of content parts
  return typeof msg.content === "string" ? msg.content : String(msg.content);
});

How Senior Engineers Fix It

Experienced engineers solve this by:

  • Normalizing all LLM outputs to plain strings before passing them to embeddings
  • Wrapping chat models with adapters that guarantee consistent return types
  • Validating chain boundaries (LLM → retriever → embeddings → vector store)
  • Adding type guards to catch message‑vs‑string mismatches early
  • Testing each chain component independently before composing them

They treat LLM pipelines like distributed systems: every boundary must be explicit and validated.
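A boundary guard in that spirit can be sketched in a few lines. This is a hypothetical utility (guardString is not a LangChain API): it wraps any async pipeline step and asserts the output type before the next component ever sees it, so the error surfaces at the real boundary instead of deep inside the embeddings layer.

```javascript
// Hypothetical boundary validator for string-producing steps.
function guardString(stepName, fn) {
  return async (input) => {
    const out = await fn(input);
    if (typeof out !== "string") {
      throw new TypeError(`${stepName} returned ${typeof out}, expected string`);
    }
    return out;
  };
}

// Usage sketch: wrap the query-rewrite step before it feeds the retriever.
const rewriteQuery = guardString("rewriteQuery", async (q) => `rewritten: ${q}`);
```

The payoff is a stack trace that names the offending step, which is exactly what the original error message failed to provide.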

Why Juniors Miss It

Newer engineers often overlook this because:

  • LangChain’s abstractions hide the data types, making mismatches non-obvious
  • Chat models “look like” they return strings, but actually return message objects
  • The error appears deep inside the embeddings layer, far from the real cause
  • They assume “if it works with a normal retriever, it should work with a history-aware retriever”

The subtle lesson: RAG pipelines are type-sensitive, even when written in JavaScript.
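The "looks like a string" trap is easy to demonstrate with a hypothetical message-shaped object: a custom toString makes it print like a string in logs and template literals, even though typeof still reports an object.

```javascript
// Hypothetical message object that masquerades as a string in logs.
const msg = {
  content: "What is RAG?",
  toString() {
    return this.content;
  },
};

const printed = `${msg}`;             // "What is RAG?" — looks like a string
const isString = typeof msg === "string"; // false — still an object
```

This is why console.log output alone is not enough to verify what a chain step actually returns.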
