I am doing question answering (QA) over a large CSV dataset (140K rows, 18 columns), using a local LLM (Llama 2) together with create_csv_agent.

First, the agent displays only 5 rows instead of 10. Second, when I asked it to "count the total number of rows in the dataset", it also gave a wrong answer (the generated output was 5).

How can I fix this?

Here is my code snippet:

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_csv_agent

# local_llm is the locally loaded Llama 2 model (its setup is not shown here)
agent = create_csv_agent(
    local_llm,
    "MLdata.csv",
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors=True,
)
print(agent.run("Please provide me the 10 records with VAX_TYPE COVID19."))

I also tried a different agent (create_pandas_dataframe_agent), and I tried embedding the data into a vector database and doing QA over it (please check the attached image and output), but the results were still not fully correct.

  • number_of_head_rows=10 can be added to fix the issue of 5 rows being displayed instead of 10. But the LLM still gives a wrong answer when asked about the total number of rows. Commented Nov 6, 2023 at 22:48
  • For the issue of the agent displaying only 5 rows instead of 10 and reporting an incorrect total row count, check the documentation for the create_csv_agent function in the langchain library to see whether there are parameters that control the number of rows returned or how the agent calculates counts. Also verify your CSV file's integrity to make sure it is properly formatted and has the expected number of rows. Commented Nov 7, 2023 at 2:44
  • @ImanMohammadi With "number_of_head_rows" you can control the number of rows you want to display. But the problem remains when asking "count the total number of rows" or "count the total number of columns". There is also an issue when searching for, e.g., "find the patients with covid19 vaccine with breathlessness symptom": the agent gives an output, but it is not complete. Commented Nov 7, 2023 at 13:28
  • @RahulPaul I'm facing the same issue. Have you been able to resolve it? Commented Apr 16, 2024 at 18:00
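
On the row-count problem discussed in the comments above: one plausible explanation (an assumption, not something confirmed in this thread) is that the model answers from the dataframe preview placed in the prompt instead of running code against the full dataframe. pandas' head() returns 5 rows by default, which would match the wrong answer of 5. A minimal sketch on synthetic data, where ROW_ID is a hypothetical column used only for illustration:

```python
import pandas as pd

# Synthetic stand-in for the real CSV: 140 rows, 2 columns (values are made up).
df = pd.DataFrame({
    "VAX_TYPE": ["COVID19", "FLU"] * 70,
    "ROW_ID": range(140),  # hypothetical column, for illustration only
})

preview = df.head()                # pandas returns the first 5 rows by default
preview_rows = len(preview)        # what "counting" the preview would give
total_rows, total_cols = df.shape  # the actual size of the dataset

print("rows in preview:", preview_rows)
print("rows in dataset:", total_rows)
print("columns in dataset:", total_cols)
```

If the agent reports the preview size rather than the value of df.shape, that suggests it did not actually execute code against the full dataframe.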

1 Answer

The example below uses Azure OpenAI with create_pandas_dataframe_agent; instead of Azure OpenAI you can simply use OpenAI.

import os

os.environ['OPENAI_API_KEY'] = ''
os.environ['OPENAI_API_TYPE'] =''
os.environ['OPENAI_API_VERSION'] =''
os.environ['OPENAI_API_BASE'] =''

from langchain.chat_models import AzureChatOpenAI
from langchain_experimental.agents.agent_toolkits import create_csv_agent

llm = AzureChatOpenAI(
    deployment_name="gpt-4-32k",
    model_name="gpt-4-32k"
)

import pandas as pd
df = pd.read_csv("your_csv-file")

# Import the create_pandas_dataframe_agent function from the langchain_experimental package
from langchain_experimental.agents import create_pandas_dataframe_agent

# Create a Pandas Dataframe agent using the llm and df objects
agent = create_pandas_dataframe_agent(llm, df, verbose=True)

agent.run("how many unique brand names are there, give the count?")

agent.run("what is total sales_quantity?")
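
Whichever model you use, it may help to cross-check the agent's output against plain pandas, since the agent's answers in the question were incomplete. This is only a sketch under assumptions: the in-memory CSV below stands in for MLdata.csv, its contents are made up, and ROW_ID is a hypothetical column. Only VAX_TYPE comes from the question.

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for MLdata.csv (contents are made up).
csv_text = "ROW_ID,VAX_TYPE\n" + "\n".join(
    f"{i},{'COVID19' if i % 3 else 'FLU'}" for i in range(30)
)
df = pd.read_csv(io.StringIO(csv_text))

# Same request as in the question: 10 records with VAX_TYPE COVID19.
covid = df[df["VAX_TYPE"] == "COVID19"]
first_ten = covid.head(10)

# Number of matching rows in the whole dataset, not just the preview.
covid_total = len(covid)

print(first_ten)
print("matching rows:", covid_total)
```

If the agent's answer disagrees with these numbers on your real file, the problem is in the agent's reasoning or tool use rather than in the data.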