I am working on question answering (QA) over a large CSV dataset (140K rows, 18 columns). I am using a local LLM (llama2) along with create_csv_agent.
First, the agent displays only 5 rows instead of the 10 I asked for. Second, when I asked it to "count the total number of rows in the dataset", it also produced a wrong answer (the generated output was 5).
How can I fix this issue?
The following is my code snippet.
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_csv_agent

# local_llm is the locally loaded llama2 model (its setup is not shown here)
agent = create_csv_agent(
    local_llm,
    "MLdata.csv",
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors=True,
)
print(agent.run("Please provide me the 10 records with VAX_TYPE COVID19."))
I also tried a different agent (create_pandas_dataframe_agent), and I tried embedding the data into a vector database and doing QA over that (please check the attached image and output), but the results were not fully correct either.
Check the create_csv_agent function from the langchain library to find whether there are parameters that control the number of rows returned or how the agent calculates counts. Also verify your CSV file's integrity to ensure it is properly formatted and has the correct number of rows.
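The integrity check suggested above can be done without the agent, which gives a ground truth to compare the agent's answers against. The following is a minimal sketch using only Python's standard csv module. The function name summarize_csv and the keys of the dictionary it returns are hypothetical and chosen for illustration; the file name MLdata.csv, the column VAX_TYPE, and the value COVID19 are taken from the question. It assumes the first row of the file holds the column names and that the file is UTF-8 encoded.

```python
import csv


def summarize_csv(path, filter_column=None, filter_value=None):
    """Count data rows, flag malformed rows, and optionally count matches.

    Hypothetical helper: assumes the first row contains the column names.
    """
    malformed = []      # source line numbers whose field count differs from the header
    total_rows = 0      # every row after the header, including malformed ones
    matching_rows = 0   # well-formed rows where filter_column equals filter_value

    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # Raises ValueError if the requested column is not in the header.
        column_index = header.index(filter_column) if filter_column else None

        for row in reader:
            total_rows += 1
            if len(row) != len(header):
                # Blank lines also land here, since they parse as empty rows.
                malformed.append(reader.line_num)
                continue
            if column_index is not None and row[column_index] == filter_value:
                matching_rows += 1

    return {
        "columns": len(header),
        "rows": total_rows,
        "malformed_lines": malformed,
        "matching_rows": matching_rows,
    }
```

Calling summarize_csv("MLdata.csv", "VAX_TYPE", "COVID19") should report roughly 140K rows and 18 columns if the file is intact. If it does, the file is probably not the cause. One possible explanation, not confirmed by the post, is that the pandas-based agents include the output of df.head() (5 rows by default) in the prompt, and a smaller local model may answer from that preview instead of running code against the full DataFrame.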