Newest 'llama3' Questions

4 votes

2 answers

507 views

No module named 'llama_models.cli.model' error while llama 3.1 8B downloading

I'm trying to install the LLaMA 3.1 8B model by following the instructions in the llamamodel GitHub README. When I run the command: llama-model download --source meta --model-id CHOSEN_MODEL_ID (...

alwayssaewoo

41

asked Nov 6 at 20:59

0 votes

0 answers

49 views

Running Ollama on local computer and prompting from jupyter notebook - does the model recall prior prompts like if it was the same chat?

I am doing some tests using Ollama on local computer, with Llama 3.2, which consists in prompting a task against a document. I read that after having reached maximum context, I should restart the ...

user305883

1,739

asked Sep 23 at 23:35

0 votes

0 answers

46 views

Custom NER to extract header, request and response from API document

I'm trying to extract API integration parameters like Authorization headers, query params, and request body fields from API documentation. This is essentially a custom NER task. I’ve experimented with ...

Rukhma

1

asked Jul 12 at 14:42

0 votes

1 answer

139 views

LLM-Agent: Tool calling problem after conversion from HuggingFace to Ollama for llama stack

I am using llama stack (https://llama-stack.readthedocs.io/en/latest/) and as provider of models to interact with Ollama. At first I used tool calling from models directly downloaded from Ollama. ...

andrealorenzetti

13

asked Jul 4 at 13:15

0 votes

0 answers

99 views

How to implement context-aware tool routing with local models like Ollama?

I'm using a locally hosted model(llama3.2) with Ollama and trying to replicate functionality similar to bind_tools(to create and run the tools with LLM ) for tool calling. This is my model service ...

Ahmad Ali

98

asked Jun 25 at 8:10

1 vote

0 answers

235 views

Multi MCP Tool Servers Issue with llama-3-3-70b-instruct

I'm following codes from links: https://github.com/jalr4ever/Tiny-OAI-MCP-Agent/blob/main/mcp_client.py https://github.com/philschmid/mcp-openai-gemini-llama-example/blob/master/...

Akshay Kulkarni

21

asked Jun 14 at 20:20

0 votes

1 answer

122 views

WASM LlamaEdge won't use GPU; problem fix or change tools?

So I'm trying to toss together a little demo that is essentially: 1) generate some text live and save to a file (I've got this working), 2) have a local instance of an LLM running (Llama3 in this case)...

PoGaMi

133

asked May 25 at 20:46

0 votes

0 answers

565 views

passing correct context to the model via the Ollama api

I am teaching myself LLM programming by developing a RAG application. I am running Llama 3.2 on my laptop using Ollama, and using a mix of SQLite and langchain. I can pass a context to the llm along ...

punkish

15.6k

asked Apr 18 at 17:07

0 votes

0 answers

28 views

Encountering problem while fine tuning Llama3.1 using custom dataset with Lora

I am learning to fine tune Llama3.1 on a custom dataset.I have converted my dataset to a hugging face dataset.By evaluating directly using the model gives accuracy of 80%.Now when i am trying to fine ...

Jagatha Pugazhendhi

7

asked Mar 26 at 4:01

0 votes

0 answers

330 views

Repetition Issues in Llama Models (3:8B, 3:70B, 3.1, 3.2)

I'm extracting Inputs, Outputs, and Summaries from large legacy codebases (COBOL, RPG), but facing repetition issues, especially when generating bullet points. Summaries work fine, but sections like ...

Saurav Srivastava

1

asked Mar 5 at 9:10

0 votes

1 answer

129 views

llama31 - Results from tool ignored

I am communicating with ollama (llama3.1b) and have it respond with a tool call that I can resolve. However - I am struggling with the final call to ollama that would resolve the orginal question. I ...

Michaela.Merz

173

asked Feb 27 at 18:00

1 vote

1 answer

436 views

Unable to get llama3 to serve json reponse on a local ollama installaiton using jupyter notebook

On a windows 11 machine, I am trying to get a json reponse from the llama3 model on my local ollama installation on jupyter notebook but it does not work Steps I tried: This below snippet works ...

Pri

11

asked Feb 27 at 5:41

0 votes

1 answer

225 views

llama3 responding only function call?

I am trying to make Llama3 Instruct able to use function call from tools , it does work but now it is answering only function call! if I ask something like who are you ? or what is apple device ? it ...

Kodr.F

14.5k

asked Feb 17 at 10:35

1 vote

0 answers

3k views

How can I accurately count tokens for Llama3/DeepSeek r1 prompts when Groq API reports “Request too large”?

I'm integrating the Groq API in my Flask application to classify social media posts using a model based on DeepSeek r1 (e.g., deepseek-r1-distill-llama-70b). I build a prompt by combining multiple ...

Towsif Ahamed Labib

824

asked Feb 2 at 16:19

0 votes

0 answers

135 views

How does batch option work in pipeline transformers library

I have a collection of news articles and I want to produce some new (unbiased) news articles using meta-llama/Meta-Llama-3-8B-Instruct. The articles are in a huggingface Dataset and to feed the ...

Xhulio Xhelilai

45

asked Jan 3 at 16:45

0 votes

1 answer

131 views

Llama-index chatbot in FastAPI endpoint throws "Event loop is already running"

The end-point implementation is like so: @app.post("/api/chat/{question}", dependencies=[Depends(sessionValidator)]) async def chat(question: str = Path(...), my_chatbot=Depends(...

Sun Bee

1,840

asked Dec 19, 2024 at 16:38

0 votes

2 answers

2k views

Why does running Llama 3.1 70B model underutilises the GPU?

I have deployed Llama 3.1 70B and Llama 3.1 8B on my system and it works perfectly for the 8B model. When I tested it for 70B, it underutilized the GPU and took a lot of time to respond. Here are the ...

JAMSHAID

1,375

asked Dec 5, 2024 at 9:13

1 vote

0 answers

334 views

Inconsistent Tool Calling Behavior with LLaMA 3.1 70B Model on AWS Bedrock

I am using the LLaMA 3.1 70B Instruct model via AWS Bedrock with LangChain for agent-based function calling. While testing, I observed the following issues: Inconsistent Tool Calling: The model often ...

Nilesh Malode

11

asked Dec 3, 2024 at 12:42

0 votes

1 answer

364 views

LangChain OutputParserException with SQL Agent using Bedrock Model in Node.js

I'm integrating a SQL agent with LangChain in a Node.js application using the AWS Bedrock model (us.meta.llama3-2-1b-instruct-v1:0) for natural language to SQL conversion. Database: PostgreSQL ...

Nibin

3,960

asked Nov 9, 2024 at 16:40

0 votes

1 answer

313 views

How to Speed Up Document Retrieval with llama_index Using a Local Model in Jupyter Notebook?

I'm working on a project that uses llama_index to retrieve document information in Jupyter Notebook, but I'm experiencing very slow query response times (around 15 minutes per query). I'm using the ...

Kavinila

13

asked Nov 4, 2024 at 13:51

0 votes

2 answers

1k views

Ollama with Python - Chat is stuck on the first prompt

I'm testing a local GPT with Ollama running on a Flask server. I've developed an interface to chat using Llama3.2 model. I've managed to create the chat history and the chatbot answers according to ...

P. Frau

71

asked Nov 1, 2024 at 18:49

8 votes

2 answers

2k views

Llama3.2 fails to respond to simple text inputs when bounded with tool calling on LangGraph

I am following along a LangChain tutorial for LangGraph. They are using OpenAI models in the tutorial. However, I want to use my local Ollama models. I am using Llama 3.2 as that supports tool ...

Neha

179

asked Oct 21, 2024 at 12:39

0 votes

2 answers

957 views

llama3.2 Installation Error: exiting with status 0xc0000135

When I run ollama run llama3.2 after it is installed this error shows up, llama runner process has terminated: exit status 0xc0000135. Can anyone tell me the issue. Using ollama I installed llama3.2 ...

Arhan

3

asked Oct 10, 2024 at 0:15

0 votes

1 answer

2k views

HFValidation Error for calling the repo-id incorrectly, what am I doing wrong?

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'meta-llama/llama3.1/8b-instruct-fp16'. Use repo_type argument if needed. tokenizer = AutoTokenizer....

James Brittain

1

asked Sep 30, 2024 at 15:27

0 votes

0 answers

217 views

Loading Llama model to a Google Cloud Run Ollama Container through a Dockerfile

I have been trying to Dockerize Ollama and consequently load the Llama3.1 model into the Google Cloud Run deployment. While Ollama is running as expected in Cloud Run, the model is not loaded as ...

wayne

1

asked Sep 30, 2024 at 14:02

0 votes

1 answer

376 views

AWS Bedrock Chatbot with Llama 3 Repeating Entire Conversation History Instead of Just Answering

I'm working on a chatbot application using Amazon Bedrock with the Llama 3 model. I'm using Streamlit for the frontend and LangChain for managing the conversation. However, I'm encountering an issue ...

rahul raj

21

asked Aug 12, 2024 at 13:00

0 votes

1 answer

83 views

Celery workers rebuild the model every startup

I am facing an issue with my Django project which runs inside a Docker container along with Redis and Celery. I am using Redis and Celery to manage queues and other processes. The problem is that my ...

Mehmet Yıldırım

1

asked Aug 11, 2024 at 11:57

2 votes

1 answer

5k views

Characters limit on request for LLama3.1:8b running on Ollama

I'm currently running the LLama 3.1:8B model using the Ollama Docker container. My context window has the following structure: Bot Personality Bot Directives Conversation (an array of messages) I ...

Claus

5,762

asked Aug 5, 2024 at 23:29

2 votes

0 answers

93 views

Deploy llama3 on cloud and use it in a Java app

I have a custom llama3:8b model which I have created using a model file with specific instructions. I need steps/resources to do the following which I could not find: Deploy this llama model on cloud ...

Toji

250

asked Aug 4, 2024 at 13:36

2 votes

1 answer

3k views

Finding config.json for Llama 3.1 8B

I installed the Llama 3.1 8B model through Meta's Github page, but I can't get their example code to work. I'm running the following code in the same directory as the Meta-Llama-3.1-8B folder: import ...

MatthewScarpino

5,966

asked Aug 3, 2024 at 12:54

1 vote

0 answers

631 views

Size Mismatch Error When Loading State Dict for (Fine-tuned model)

I'm encountering a RuntimeError while trying to load a state_dict for LlamaForCausalLM. The error message indicates a size mismatch: RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM: ...

DigiSpocDeera

61

asked Aug 2, 2024 at 5:51

1 vote

0 answers

50 views

Improve the Llama3 inference's latency and/or throughput?

You can also use Llama3 model on SageMaker JumpStart as below: from sagemaker.jumpstart.model import JumpStartModel model = JumpStartModel(model_id = "meta-textgeneration-llama-3-70b-instruct&...

celsofranssa

196

asked Aug 2, 2024 at 0:01

0 votes

1 answer

360 views

How to clear GPU memory in Google Collab after training a model

I am currently trying to do cross-validation with Llama-3 LLM in Google Collab, and I am facing with the issue that the GPU memory runs out way before I am able to finish my experiments. My code is ...

Silvia A

1

asked Jul 24, 2024 at 20:04

0 votes

0 answers

908 views

ConnectError: All connection attempts failed when connecting indexing to neo4j database using PropertyGraphIndex from llama3

I am working on knowledge graph and all connection to neo4j browser is a success(using neo4j desktop windows not docker deployed). however with llama3 i am running the same notebooks as in property ...

Kcndze

29

asked Jul 19, 2024 at 13:46

0 votes

1 answer

604 views

Does langchain with llama-cpp-python fail to work with very long prompts?

I'm trying to create a service using the llama3-70b model by combining langchain and llama-cpp-python on a server workstation. While the model works well with short prompts(question1, question2), it ...

bibiibibin

1

asked Jul 18, 2024 at 15:39

2 votes

1 answer

3k views

'LlamaForCausalLM' object has no attribute 'max_seq_length'

I'm fine-tuning llama3 using unsloth , I trained my model and saved it successfully but when I tried loading using AutoPeftModelForCausalLM.from_pretrained ,then I used TextStreamer from transformer ...

Sarra Ben Messaoud

21

asked Jul 18, 2024 at 10:47

0 votes

1 answer

611 views

How to merge multiple (at least two) existing LlamaIndex VectorStoreIndex instances?

I'm working with LlamaIndex and have created two separate VectorStoreIndex instances, each from different documents. Now, I want to merge these two indexes into a single index. Here's my current setup:...

林抿均

53

asked Jul 16, 2024 at 12:07

1 vote

0 answers

391 views

In Pytorch and Huggingface transformers, why does loading Llama3 to CPU and then using .to use so much more memory than loading with device_map

I've tried loading Huggingface transformers models to MPS in two different ways: llm = AutoModelForCausalLM.from_pretrained( "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch....

Owen D

85

asked Jul 4, 2024 at 2:16

1 vote

1 answer

5k views

How to set eos_token_id in llama3 in HuggingFaceLLM?

I wanna set my eos_token_id, and pad_token_id. I googled alot, and most are suggesting to use e.g. tokenizer.pad_token_id (like from here https://huggingface.co/meta-llama/Meta-Llama-3-8B/discussions/...

yts61

1,669

asked Jun 30, 2024 at 11:11

0 votes

0 answers

194 views

Loading int8 version of llama3 from llama.cpp

I'm trying to load an 8 bit quantized version of llama3 on my local laptop (linux) from llama.cpp, but the process is getting killed due to memory exceeding. Is there any way around this? I've already ...

Anagha

1

asked Jun 27, 2024 at 9:03

0 votes

1 answer

523 views

Long response time with llama-server (40–60sec)

I managed to run the Llama server with the following command: ./llama-server -m models/7B/ggml-model.gguf -c 2048 My request looks like this: time curl --request POST --url http://localhost:8080/...

didinko

572

asked Jun 17, 2024 at 13:56

0 votes

1 answer

297 views

Prompt Template for Sequence matching using LlamaIndex and Llama3-70B-Instruct

I'm trying to get llama3-70b to find all sequences that match a given list. The list contains multiple terms (which range from one word to twelve words). I want the model to match all terms in a given ...

joshpopelka20

69

asked Jun 12, 2024 at 15:40

0 votes

0 answers

846 views

Using BAAI/bge-small-en-v1.5 with ChromaDb and LlamaIndex

I am new to LLMs. I created a local RAG using Llamaindex with llama3 to load our documents and I am using ChromaDb to persist the embeddings. I am not clear on how do I specify a specific embedding ...

tigger tigger

109

asked Jun 12, 2024 at 4:30

2 votes

0 answers

3k views

ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate

I encountered an error when downloading a model from huggingface. It was working on Google Colab, but not working on my windows machine. I am using Python 3.10.0. The error code is shown below: E:\...

Aswin Jimmy

21

asked Jun 8, 2024 at 8:37

1 vote

0 answers

50 views

Expected scalar type Long but found Int error while using tune on llama 3

When im trying to use Llama3-8B tune guide from : https://pytorch.org/torchtune/0.1/tutorials/llama3.html it gave me this error : W0608 08:41:38.766000 10904 torch\distributed\elastic\multiprocessing\...

graph User

45

asked Jun 8, 2024 at 6:45

0 votes

1 answer

5k views

"You have a version of `bitsandbytes` that is not compatible with 4bit inference and training"

I am now trying to finetune a llama3 model. I am using unsloth, from unsloth import FastLanguageModel Then I load Llama3 model. model, tokenizer = FastLanguageModel.from_pretrained( model_name = &...

yts61

1,669

asked Jun 7, 2024 at 17:08

0 votes

1 answer

200 views

Not able to access llama3 using python

I am testing llama3 here using this simple code below import ollama message = "What is football" # connect to Llama3 model try: response_stream = ollama.chat( model="llama3&...

Nived Puthumana Meleppattu

3

asked Jun 6, 2024 at 6:42

0 votes

1 answer

522 views

llama-index,uncharted and llama2:7b run locally to generate Index

I wanted to use llama-index locally with ollama and llama3:8b to index utf-8 json file. I dont have a gpu. I use uncharted to convert docs into json. Now If it is not possible to use llama-index ...

Asif Rahman

465

asked Jun 5, 2024 at 12:38

1 vote

1 answer

429 views

FineTune llama3 model with torch tune gives error

Im trying to fine tune the llama3 model with torch tune. these are the steps that ive already done : 1.pip install torch 2.pip install torchtune 3.tune download meta-llama/Meta-Llama-3-8B --output-dir ...

Ahad Porkar

1,718

asked Jun 3, 2024 at 15:25

Collectives™ on Stack Overflow