49 questions
4
votes
2
answers
507
views
No module named 'llama_models.cli.model' error while downloading Llama 3.1 8B
I'm trying to install the LLaMA 3.1 8B model by following the instructions in the llamamodel GitHub README. When I run the command:
llama-model download --source meta --model-id CHOSEN_MODEL_ID
(...
0
votes
0
answers
49
views
Running Ollama on local computer and prompting from Jupyter notebook - does the model recall prior prompts as if it were the same chat?
I am running some tests using Ollama on my local computer with Llama 3.2; the tests consist of prompting a task against a document.
I read that after reaching the maximum context, I should restart the ...
0
votes
0
answers
46
views
Custom NER to extract header, request and response from API document
I'm trying to extract API integration parameters like Authorization headers, query params, and request body fields from API documentation. This is essentially a custom NER task.
I’ve experimented with ...
0
votes
1
answer
139
views
LLM-Agent: Tool calling problem after conversion from HuggingFace to Ollama for llama stack
I am using llama stack (https://llama-stack.readthedocs.io/en/latest/) with Ollama as the model provider.
At first I used tool calling from models directly downloaded from Ollama. ...
0
votes
0
answers
99
views
How to implement context-aware tool routing with local models like Ollama?
I'm using a locally hosted model (llama3.2) with Ollama and trying to replicate functionality similar to bind_tools (to create and run the tools with the LLM) for tool calling.
This is my model service
...
1
vote
0
answers
235
views
Multi MCP Tool Servers Issue with llama-3-3-70b-instruct
I'm following the code from these links:
https://github.com/jalr4ever/Tiny-OAI-MCP-Agent/blob/main/mcp_client.py
https://github.com/philschmid/mcp-openai-gemini-llama-example/blob/master/...
0
votes
1
answer
122
views
WASM LlamaEdge won't use GPU; problem fix or change tools?
So I'm trying to toss together a little demo that is essentially: 1) generate some text live and save to a file (I've got this working), 2) have a local instance of an LLM running (Llama3 in this case)...
0
votes
0
answers
565
views
passing correct context to the model via the Ollama api
I am teaching myself LLM programming by developing a RAG application. I am running Llama 3.2 on my laptop using Ollama, and using a mix of SQLite and langchain.
I can pass a context to the llm along ...
0
votes
0
answers
28
views
Encountering a problem while fine-tuning Llama3.1 on a custom dataset with LoRA
I am learning to fine-tune Llama3.1 on a custom dataset. I have converted my dataset to a Hugging Face dataset. Evaluating directly with the model gives an accuracy of 80%. Now when I am trying to fine ...
0
votes
0
answers
330
views
Repetition Issues in Llama Models (3:8B, 3:70B, 3.1, 3.2)
I'm extracting Inputs, Outputs, and Summaries from large legacy codebases (COBOL, RPG), but facing repetition issues, especially when generating bullet points. Summaries work fine, but sections like ...
0
votes
1
answer
129
views
llama31 - Results from tool ignored
I am communicating with ollama (llama3.1b) and have it respond with a tool call that I can resolve. However, I am struggling with the final call to ollama that would resolve the original question. I ...
1
vote
1
answer
436
views
Unable to get llama3 to serve a JSON response on a local ollama installation using Jupyter notebook
On a Windows 11 machine, I am trying to get a JSON response from the llama3 model on my local ollama installation in a Jupyter notebook, but it does not work.
Steps I tried:
The snippet below works
...
0
votes
1
answer
225
views
Llama3 responding only with function calls?
I am trying to make Llama3 Instruct use function calling with tools. It works, but now it answers only with function calls! If I ask something like "who are you?" or "what is an Apple device?" it ...
1
vote
0
answers
3k
views
How can I accurately count tokens for Llama3/DeepSeek r1 prompts when Groq API reports “Request too large”?
I'm integrating the Groq API in my Flask application to classify social media posts using a model based on DeepSeek r1 (e.g., deepseek-r1-distill-llama-70b). I build a prompt by combining multiple ...
0
votes
0
answers
135
views
How does the batch option work in the transformers library pipeline?
I have a collection of news articles and I want to produce some new (unbiased) news articles using meta-llama/Meta-Llama-3-8B-Instruct. The articles are in a huggingface Dataset and to feed the ...
0
votes
1
answer
131
views
Llama-index chatbot in FastAPI endpoint throws "Event loop is already running"
The end-point implementation is like so:
@app.post("/api/chat/{question}", dependencies=[Depends(sessionValidator)])
async def chat(question: str = Path(...), my_chatbot=Depends(...
0
votes
2
answers
2k
views
Why does running the Llama 3.1 70B model underutilise the GPU?
I have deployed Llama 3.1 70B and Llama 3.1 8B on my system and it works perfectly for the 8B model. When I tested it for 70B, it underutilized the GPU and took a lot of time to respond. Here are the ...
1
vote
0
answers
334
views
Inconsistent Tool Calling Behavior with LLaMA 3.1 70B Model on AWS Bedrock
I am using the LLaMA 3.1 70B Instruct model via AWS Bedrock with LangChain for agent-based function calling. While testing, I observed the following issues:
Inconsistent Tool Calling: The model often ...
0
votes
1
answer
364
views
LangChain OutputParserException with SQL Agent using Bedrock Model in Node.js
I'm integrating a SQL agent with LangChain in a Node.js application using the AWS Bedrock model (us.meta.llama3-2-1b-instruct-v1:0) for natural language to SQL conversion.
Database: PostgreSQL
...
0
votes
1
answer
313
views
How to Speed Up Document Retrieval with llama_index Using a Local Model in Jupyter Notebook?
I'm working on a project that uses llama_index to retrieve document information in Jupyter Notebook, but I'm experiencing very slow query response times (around 15 minutes per query). I'm using the ...
0
votes
2
answers
1k
views
Ollama with Python - Chat is stuck on the first prompt
I'm testing a local GPT with Ollama running on a Flask server. I've developed an interface to chat using the Llama3.2 model.
I've managed to create the chat history and the chatbot answers according to ...
8
votes
2
answers
2k
views
Llama3.2 fails to respond to simple text inputs when bound with tool calling on LangGraph
I am following along a LangChain tutorial for LangGraph. They are using OpenAI models in the tutorial. However, I want to use my local Ollama models. I am using Llama 3.2 as that supports tool ...
0
votes
2
answers
957
views
llama3.2 Installation Error: exiting with status 0xc0000135
When I run ollama run llama3.2 after it is installed, this error shows up:
llama runner process has terminated: exit status 0xc0000135.
Can anyone tell me what the issue is?
Using ollama I installed llama3.2 ...
0
votes
1
answer
2k
views
HFValidation Error for calling the repo-id incorrectly, what am I doing wrong?
HFValidationError: Repo id must be in the form 'repo_name' or
'namespace/repo_name': 'meta-llama/llama3.1/8b-instruct-fp16'. Use
repo_type argument if needed.
tokenizer = AutoTokenizer....
0
votes
0
answers
217
views
Loading Llama model to a Google Cloud Run Ollama Container through a Dockerfile
I have been trying to Dockerize Ollama and consequently load the Llama3.1 model into the Google Cloud Run deployment. While Ollama is running as expected in Cloud Run, the model is not loaded as ...
0
votes
1
answer
376
views
AWS Bedrock Chatbot with Llama 3 Repeating Entire Conversation History Instead of Just Answering
I'm working on a chatbot application using Amazon Bedrock with the Llama 3 model. I'm using Streamlit for the frontend and LangChain for managing the conversation. However, I'm encountering an issue ...
0
votes
1
answer
83
views
Celery workers rebuild the model every startup
I am facing an issue with my Django project which runs inside a Docker container along with Redis and Celery. I am using Redis and Celery to manage queues and other processes.
The problem is that my ...
2
votes
1
answer
5k
views
Character limit on requests for Llama3.1:8b running on Ollama
I'm currently running the Llama 3.1:8B model using the Ollama Docker container. My context window has the following structure:
Bot Personality
Bot Directives
Conversation (an array of messages)
I ...
2
votes
0
answers
93
views
Deploy llama3 on cloud and use it in a Java app
I have a custom llama3:8b model which I have created using a model file with specific instructions. I need steps/resources to do the following which I could not find:
Deploy this llama model on cloud ...
2
votes
1
answer
3k
views
Finding config.json for Llama 3.1 8B
I installed the Llama 3.1 8B model through Meta's GitHub page, but I can't get their example code to work. I'm running the following code in the same directory as the Meta-Llama-3.1-8B folder:
import ...
1
vote
0
answers
631
views
Size Mismatch Error When Loading State Dict for (Fine-tuned model)
I'm encountering a RuntimeError while trying to load a state_dict for LlamaForCausalLM. The error message indicates a size mismatch:
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
...
1
vote
0
answers
50
views
Improve the Llama3 inference's latency and/or throughput?
You can also use Llama3 model on SageMaker JumpStart as below:
from sagemaker.jumpstart.model import JumpStartModel
model = JumpStartModel(model_id = "meta-textgeneration-llama-3-70b-instruct...
0
votes
1
answer
360
views
How to clear GPU memory in Google Colab after training a model
I am currently trying to do cross-validation with the Llama-3 LLM in Google Colab, and I am facing the issue that the GPU memory runs out well before I am able to finish my experiments. My code is ...
0
votes
0
answers
908
views
ConnectError: All connection attempts failed when connecting indexing to neo4j database using PropertyGraphIndex from llama3
I am working on a knowledge graph, and every connection to the neo4j browser succeeds (using neo4j Desktop on Windows, not a Docker deployment). However, with llama3 I am running the same notebooks as in property ...
0
votes
1
answer
604
views
Does langchain with llama-cpp-python fail to work with very long prompts?
I'm trying to create a service using the llama3-70b model by combining langchain and llama-cpp-python on a server workstation. While the model works well with short prompts (question1, question2), it ...
2
votes
1
answer
3k
views
'LlamaForCausalLM' object has no attribute 'max_seq_length'
I'm fine-tuning llama3 using unsloth. I trained my model and saved it successfully, but when I tried loading it using AutoPeftModelForCausalLM.from_pretrained and then used TextStreamer from transformer ...
0
votes
1
answer
611
views
How to merge multiple (at least two) existing LlamaIndex VectorStoreIndex instances?
I'm working with LlamaIndex and have created two separate VectorStoreIndex instances, each from different documents. Now, I want to merge these two indexes into a single index. Here's my current setup:...
1
vote
0
answers
391
views
In Pytorch and Huggingface transformers, why does loading Llama3 to CPU and then using .to use so much more memory than loading with device_map
I've tried loading Huggingface transformers models to MPS in two different ways:
llm = AutoModelForCausalLM.from_pretrained(
"meta-llama/Meta-Llama-3-8B-Instruct",
torch_dtype=torch....
1
vote
1
answer
5k
views
How to set eos_token_id in llama3 in HuggingFaceLLM?
I want to set my eos_token_id and pad_token_id. I googled a lot, and most results suggest using e.g. tokenizer.pad_token_id (like from here https://huggingface.co/meta-llama/Meta-Llama-3-8B/discussions/...
0
votes
0
answers
194
views
Loading int8 version of llama3 from llama.cpp
I'm trying to load an 8-bit quantized version of llama3 on my local laptop (Linux) from llama.cpp, but the process is getting killed for exceeding the available memory.
Is there any way around this?
I've already ...
0
votes
1
answer
523
views
Long response time with llama-server (40–60sec)
I managed to run the Llama server with the following command:
./llama-server -m models/7B/ggml-model.gguf -c 2048
My request looks like this:
time curl --request POST --url http://localhost:8080/...
0
votes
1
answer
297
views
Prompt Template for Sequence matching using LlamaIndex and Llama3-70B-Instruct
I'm trying to get llama3-70b to find all sequences that match a given list. The list contains multiple terms (which range from one word to twelve words). I want the model to match all terms in a given ...
0
votes
0
answers
846
views
Using BAAI/bge-small-en-v1.5 with ChromaDb and LlamaIndex
I am new to LLMs. I created a local RAG using LlamaIndex with llama3 to load our documents and I am using ChromaDb to persist the embeddings. I am not clear on how to specify a specific embedding ...
2
votes
0
answers
3k
views
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate
I encountered an error when downloading a model from Hugging Face. It was working on Google Colab, but not on my Windows machine. I am using Python 3.10.0.
The error code is shown below:
E:\...
1
vote
0
answers
50
views
Expected scalar type Long but found Int error while using tune on llama 3
When I try to use the Llama3-8B tune guide from:
https://pytorch.org/torchtune/0.1/tutorials/llama3.html
it gives me this error:
W0608 08:41:38.766000 10904 torch\distributed\elastic\multiprocessing\...
0
votes
1
answer
5k
views
"You have a version of `bitsandbytes` that is not compatible with 4bit inference and training"
I am now trying to finetune a llama3 model.
I am using unsloth,
from unsloth import FastLanguageModel
Then I load Llama3 model.
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = ...
0
votes
1
answer
200
views
Not able to access llama3 using python
I am testing llama3 here using this simple code below
import ollama
message = "What is football"
# connect to Llama3 model
try:
response_stream = ollama.chat(
model="llama3...
0
votes
1
answer
522
views
llama-index, uncharted and llama2:7b run locally to generate Index
I wanted to use llama-index locally with ollama and llama3:8b to index a UTF-8 JSON file. I don't have a GPU. I use uncharted to convert docs into JSON. Now if it is not possible to use llama-index ...
1
vote
1
answer
429
views
Fine-tuning the llama3 model with torchtune gives an error
I'm trying to fine-tune the llama3 model with torchtune.
These are the steps that I've already done:
1. pip install torch
2. pip install torchtune
3. tune download meta-llama/Meta-Llama-3-8B --output-dir ...