
I don't understand why the use of PydanticOutputParser below raises an error.

The docs don't seem correct: if I follow them exactly (i.e. use with_structured_output on its own, without an output parser), the output is a dict, not an instance of the Pydantic class. So I modified the code to be consistent with some SO answers, e.g. this one:

from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser

from uuid import uuid4
from pydantic import BaseModel, Field

class TestSummary(BaseModel):
    """Represents a summary of the concept"""

    id: str = Field(default_factory=lambda: str(uuid4()), description="Unique identifier")
    summary: str = Field(description="Succinct summary")
 
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0).with_structured_output(TestSummary)
parser = PydanticOutputParser(pydantic_object=TestSummary)
prompt = PromptTemplate(
    template="You are an AI summarizing long texts. TEXT: {stmt}",
    input_variables=["stmt"]
)
runnable = prompt | llm | parser 
result = runnable.invoke({"stmt": "This is a really long piece of literature I'm too lazy to read"})

The error is

ValidationError: 1 validation error for Generation
text
  str type expected (type=type_error.str)

As discussed, if I omit the output parser, I get a dict:

runnable = prompt | llm #| parser 
result = runnable.invoke({"stmt": "This is a really long piece of literature I'm too lazy to read"})
type(result)
dict

Answer

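Output parsers in LangChain receive a string, not structured data. They do what you are already doing with with_structured_output: parse an input string into structured data, or possibly change its format. When a parser is invoked with something that is not a message, it wraps the input in a Generation object whose text field must be a string. The dict produced by with_structured_output fails that check, which is the "1 validation error for Generation" you are seeing.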

From the documentation:

Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:

  • "Get format instructions": A method which returns a string containing instructions for how the output of a language model should be formatted.
  • "Parse": A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.
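To make the "takes in a string" contract concrete, here is a minimal sketch that uses plain Pydantic v2 (no LangChain, no API call). The helper parse_llm_text is hypothetical, not a LangChain API; it only mimics the "Parse" step of a Pydantic-based parser by validating raw JSON text into TestSummary, and it rejects data that is already structured.

```python
from uuid import uuid4

from pydantic import BaseModel, Field


class TestSummary(BaseModel):
    """Represents a summary of the concept"""

    id: str = Field(default_factory=lambda: str(uuid4()), description="Unique identifier")
    summary: str = Field(description="Succinct summary")


def parse_llm_text(text: str) -> TestSummary:
    """Hypothetical stand-in for an output parser's "Parse" step.

    It expects the raw LLM response as a string (assumed here to be JSON)
    and turns it into structured data.
    """
    if not isinstance(text, str):
        raise TypeError(f"expected a string, got {type(text).__name__}")
    return TestSummary.model_validate_json(text)


# A raw LLM response, assumed to be JSON text.
raw_response = '{"summary": "A long text, summarized."}'
parsed = parse_llm_text(raw_response)
print(type(parsed).__name__, "|", parsed.summary)

# Already-structured data (what with_structured_output returned here) is rejected.
try:
    parse_llm_text({"summary": "already structured"})
except TypeError as exc:
    print("rejected:", exc)
```

The real PydanticOutputParser fails at an earlier point (building the Generation object), but the principle is the same: a parser is the string-to-structure step, so it has nothing to do once with_structured_output has already produced the structure.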

Since you already have the structured data, you just need to populate the model with it (see https://stackoverflow.com/a/64505888/3443596):

runnable = prompt | llm
result_dict = runnable.invoke({"stmt": "This is a really long piece of literature I'm too lazy to read"})
# Pydantic v2 spelling; on Pydantic v1 use TestSummary.parse_obj(result_dict)
result = TestSummary.model_validate(result_dict)
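The dict-to-model step can be tried without an API key. This sketch assumes a hard-coded dict in place of the output of (prompt | llm).invoke(...) and shows the Pydantic v2 method model_validate next to the equivalent constructor call; parse_obj is the Pydantic v1 name, which v2 still accepts but marks as deprecated.

```python
from uuid import uuid4

from pydantic import BaseModel, Field


class TestSummary(BaseModel):
    """Represents a summary of the concept"""

    id: str = Field(default_factory=lambda: str(uuid4()), description="Unique identifier")
    summary: str = Field(description="Succinct summary")


# Assumption: this dict stands in for what (prompt | llm).invoke(...) returned.
result_dict = {"summary": "Someone is too lazy to read a long text."}

# Pydantic v2: validate a dict into the model.
result = TestSummary.model_validate(result_dict)

# Equivalent for a flat dict: unpack it into the constructor.
same = TestSummary(**result_dict)

# "id" is missing from the dict, so each instance gets its own default UUID.
print(type(result).__name__, result.summary)
```

Inside a chain, the same step could likely be appended directly, e.g. prompt | llm | TestSummary.model_validate, since LCEL wraps a plain callable in a RunnableLambda; treat that as a suggestion to check against your LangChain version.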

Comments

Thanks! Follow-up question: why does my example above using with_structured_output return a dict, while the example in the docs returns the model (without needing .parse_obj)? python.langchain.com/v0.2/docs/how_to/structured_output/…
Oh I see: the docs use from langchain_core.pydantic_v1 import BaseModel, Field, whereas I import directly with from pydantic import BaseModel, Field (which is Pydantic v2).
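The diagnosis in this comment can be checked without LangChain. A sketch, assuming Pydantic v2 is installed: v2 ships the old API as pydantic.v1, which is what langchain_core.pydantic_v1 re-exported in LangChain 0.2. A class built on pydantic.BaseModel is not a subclass of the v1 BaseModel, so a code path that only recognizes v1 models (apparently what with_structured_output did in the versions discussed here) would not treat TestSummary as a Pydantic class and would return a dict instead. The helper looks_like_v1_model is hypothetical, not LangChain's actual check.

```python
from pydantic import BaseModel as V2BaseModel  # what the question imports (Pydantic v2)
from pydantic.v1 import BaseModel as V1BaseModel  # the v1 API bundled with Pydantic v2


class SummaryV2(V2BaseModel):
    summary: str


class SummaryV1(V1BaseModel):
    summary: str


def looks_like_v1_model(obj) -> bool:
    # Hypothetical helper: a "is this a Pydantic v1 class?" test.
    return isinstance(obj, type) and issubclass(obj, V1BaseModel)


print(looks_like_v1_model(SummaryV1))  # True
print(looks_like_v1_model(SummaryV2))  # False
```

Newer LangChain releases (0.3 onward) use Pydantic v2 internally, so the direct pydantic import is expected to work there; check the behavior against your installed version.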
