Unexpected string validation error in Langchain Pydantic output parser

Question

I do not understand why the below use of the PydanticOutputParser is erroring.

The docs do not seem correct: if I follow them exactly (i.e. use with_structured_output on its own, without an output parser), the output is a dict, not a Pydantic class. So I added an output parser, which I thought was consistent with some SO answers, e.g. this

from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser

from uuid import uuid4
from pydantic import BaseModel, Field

class TestSummary(BaseModel):
    """Represents a summary of the concept"""

    id: str = Field(default_factory=lambda: str(uuid4()), description="Unique identifier")
    summary: str = Field(description="Succinct summary")
 
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0).with_structured_output(TestSummary)
parser = PydanticOutputParser(pydantic_object=TestSummary)
prompt = PromptTemplate(
    template="You are an AI summarizing long texts. TEXT: {stmt}",
    input_variables=["stmt"]
)
runnable = prompt | llm | parser 
result = runnable.invoke({"stmt": "This is a really long piece of literature I'm too lazy to read"})

The error is

ValidationError: 1 validation error for Generation
text
  str type expected (type=type_error.str)

As mentioned above, if I omit the output parser, I get a dict:

runnable = prompt | llm #| parser 
result = runnable.invoke({"stmt": "This is a really long piece of literature I'm too lazy to read"})
type(result)
dict

eventHandler · Accepted Answer · 2024-06-07 15:27:02Z

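Output parsers in LangChain receive a string, not structured data. They do the job that with_structured_output is already doing for you: parse a raw string into structured data, or possibly change its format. Because with_structured_output has already turned the response into a dict, the parser is handed a dict where it expects a string, which is what the validation error reports.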

From the documentation:

Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:

"Get format instructions": A method which returns a string containing instructions for how the output of a language model should be formatted.

"Parse": A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.
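To make the two methods concrete, here is a minimal sketch of a toy parser written with the standard library only. The names Summary and ToySummaryParser are hypothetical and this is not LangChain's implementation; it only illustrates the contract quoted above, where parse takes a string and anything already structured is rejected.

```python
import json
from dataclasses import dataclass


@dataclass
class Summary:
    """Hypothetical target structure, standing in for TestSummary."""
    summary: str


class ToySummaryParser:
    """Toy stand-in for an output parser (an assumption, not LangChain code)."""

    def get_format_instructions(self) -> str:
        # Text that would be appended to the prompt so the model replies in JSON.
        return 'Reply with a JSON object of the form {"summary": "<text>"}.'

    def parse(self, text: str) -> Summary:
        # The input is expected to be the raw model response, i.e. a string.
        if not isinstance(text, str):
            raise TypeError(f"expected str, got {type(text).__name__}")
        data = json.loads(text)
        return Summary(summary=data["summary"])


parser = ToySummaryParser()

# Raw string in, structured object out.
parsed = parser.parse('{"summary": "A short book."}')

# Already-structured input is rejected, much like the dict in the question.
try:
    parser.parse({"summary": "A short book."})
except TypeError as exc:
    error_message = str(exc)
```

The second call fails for the same kind of reason as the chain in the question: the step before the parser has already produced structured data, so there is no string left to parse.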

Now that you have the structured data, you just need to populate the model with it, as described in https://stackoverflow.com/a/64505888/3443596:

runnable = prompt | llm
result_dict = runnable.invoke({"stmt": "This is a really long piece of literature I'm too lazy to read"})
result = TestSummary.parse_obj(result_dict)
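The conversion step can be tried without calling the LLM. The sketch below assumes the chain returned a dict shaped like result_dict (a hypothetical stand-in with made-up values) and builds the model with plain keyword arguments, which should behave the same on Pydantic v1 and v2.

```python
from uuid import uuid4

from pydantic import BaseModel, Field


class TestSummary(BaseModel):
    """Represents a summary of the concept"""

    id: str = Field(default_factory=lambda: str(uuid4()), description="Unique identifier")
    summary: str = Field(description="Succinct summary")


# Hypothetical stand-in for the dict returned by `prompt | llm`.
result_dict = {"id": "abc-123", "summary": "A short summary."}

# Keyword construction validates the fields and returns a TestSummary instance.
result = TestSummary(**result_dict)

# If the dict has no "id", the default_factory fills one in.
generated = TestSummary(summary="Another summary.")
```

On Pydantic v2, parse_obj still works but is deprecated; TestSummary.model_validate(result_dict) is the v2 spelling of the same operation.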

2 Comments

Peter Over a year ago

Thanks! Follow-up question: why does my example above using with_structured_output return a dict, while the example in the docs returns the model (without needing .parse_obj)? python.langchain.com/v0.2/docs/how_to/structured_output/…

Peter Over a year ago

Oh I see: the docs use from langchain_core.pydantic_v1 import BaseModel, Field, whereas I import directly with from pydantic import BaseModel, Field (which is Pydantic v2).
