QA with a large csv using langchain

Question

I am working on question answering over a large CSV dataset (140K rows, 18 columns). I am using a local LLM (Llama 2) together with create_csv_agent.

First, the agent displays only 5 rows when I ask for 10. Second, when I asked it to "count the total number of rows in the dataset", it gave a wrong answer (it returned 5).

How can I fix this?

Following is my code snippet.

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_csv_agent

# local_llm is the locally loaded Llama 2 model (its setup is not shown here)
agent = create_csv_agent(
    local_llm,
    "MLdata.csv",
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors=True,
)
print(agent.run("Please provide me the 10 records with VAX_TYPE COVID19."))

I also tried a different agent (create_pandas_dataframe_agent), and I tried embedding the data into a vector database and running QA over that (please check the attached image and output), but neither approach gave fully correct results.

Adding number_of_head_rows=10 fixes the issue of 5 rows being displayed instead of 10. But the LLM still gives a wrong answer when asked about the total number of rows.
– Rahul Paul, Commented Nov 6, 2023 at 22:48
For the issue of the agent displaying only 5 rows instead of 10 and returning an incorrect total row count, check the documentation for the create_csv_agent function in the langchain library to see whether there are parameters that control the number of rows returned or how the agent calculates counts. Also verify your CSV file's integrity to make sure it is properly formatted and has the expected number of rows.
– user14816009, Commented Nov 7, 2023 at 2:44
@ImanMohammadi With "number_of_head_rows" you can control the number of rows you want to display. But the problem remains when asking "count the total number of rows" or "count the total number of columns". There is also an issue with searches such as "find the patients with covid19 vaccine with breathlessness symptom": the agent gives output, but it is incomplete.
– Rahul Paul, Commented Nov 7, 2023 at 13:28
@RahulPaul I'm facing the same issue. Have you been able to resolve it?
– AnonymousMe, Commented Apr 16, 2024 at 18:00

Mukesh Lohumi · Accepted Answer · 2024-06-07 11:04:20Z

The example below uses Azure OpenAI; you can simply use the regular OpenAI API instead.

import os

import pandas as pd
from langchain.chat_models import AzureChatOpenAI
# Import the create_pandas_dataframe_agent function from the langchain_experimental package
from langchain_experimental.agents import create_pandas_dataframe_agent

# Fill in your own credentials and endpoint details
os.environ['OPENAI_API_KEY'] = ''
os.environ['OPENAI_API_TYPE'] = ''
os.environ['OPENAI_API_VERSION'] = ''
os.environ['OPENAI_API_BASE'] = ''

llm = AzureChatOpenAI(
    deployment_name="gpt-4-32k",
    model_name="gpt-4-32k"
)

# Load the CSV into a DataFrame (replace the placeholder with your file path)
df = pd.read_csv("your_csv_file.csv")

# Create a Pandas Dataframe agent using the llm and df objects
agent = create_pandas_dataframe_agent(llm, df, verbose=True)

agent.run("how many unique brand names are there, give the count?")

agent.run("what is total sales_quantity?")
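Since the agent's prompt only includes a preview of the DataFrame (the first few rows, 5 by default), a weaker model may answer from that preview instead of actually running code, which would explain a row count of 5. It can help to compute the expected answers with plain pandas and compare them against what the agent says. The following is only a sketch: the tiny inline CSV and the VAERS_ID and SYMPTOM columns are made-up stand-ins for illustration (only VAX_TYPE comes from the question), so adjust the names to your real file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real file.
# In practice you would use: df = pd.read_csv("MLdata.csv")
csv_text = """VAERS_ID,VAX_TYPE,SYMPTOM
1,COVID19,Dyspnoea
2,FLU3,Headache
3,COVID19,Fatigue
4,COVID19,Dyspnoea
5,HPV9,Nausea
6,COVID19,Headache
7,FLU3,Dyspnoea
"""
df = pd.read_csv(io.StringIO(csv_text))

# Ground truth for "count the total number of rows / columns"
n_rows, n_cols = df.shape

# Ground truth for "records with VAX_TYPE COVID19"
covid = df[df["VAX_TYPE"] == "COVID19"]
first_covid = covid.head(10)  # at most 10 records, fewer if not enough match

# Ground truth for a combined filter (SYMPTOM is an assumed column name)
covid_dyspnoea = covid[
    covid["SYMPTOM"].str.contains("dyspnoea", case=False, na=False)
]

print("rows:", n_rows, "columns:", n_cols)
print("COVID19 records:", len(covid))
print("COVID19 records with dyspnoea:", len(covid_dyspnoea))
```

If the agent's answers differ from these numbers, the model is probably not executing the pandas code it was asked to write, and a stronger model or a more explicit question (for example "use len(df) to count the rows") may be needed.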
