385 questions
4 votes · 2 answers · 507 views
No module named 'llama_models.cli.model' error while downloading Llama 3.1 8B
I'm trying to install the Llama 3.1 8B model by following the instructions in the llama-models GitHub README. When I run the command:
llama-model download --source meta --model-id CHOSEN_MODEL_ID
(...
0 votes · 0 answers · 105 views
pippy examples: torch._dynamo.exc.UserError: It looks like one of the outputs with type <class transformers.cache_utils.DynamicCache> is not supported
When the program starts to initialize the pipeline object, an unexpected error is thrown:
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/anaconda3/envs/polar/lib/python3.12/site-...
0 votes · 0 answers · 49 views
Running Ollama on a local computer and prompting from a Jupyter notebook - does the model recall prior prompts as if it were the same chat?
I am doing some tests using Ollama on a local computer, with Llama 3.2, which consist of prompting a task against a document.
I read that after reaching the maximum context, I should restart the ...
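A minimal sketch (assuming the ollama Python package): the server keeps no chat state between calls, so the model "recalls" prior prompts only if the client resends them in the messages list on every call.
import ollama

history = []  # client-side transcript; the model itself is stateless

def ask(prompt):
    history.append({"role": "user", "content": prompt})
    resp = ollama.chat(model="llama3.2", messages=history)
    answer = resp["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    return answer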
0 votes · 0 answers · 49 views
The data type of the llava model uncontrollably changes to float32
I am using the llama-8b-llava model. I have made some modifications to the model, which are non-structural and do not introduce any parameters. During the model loading process, I used the torch....
1 vote · 1 answer · 150 views
Import "llama_index.llms.ollama" could not be resolved
I have the following imports for a Python file that's meant to become a multi-LLM agent. I wanted to use llama_index and found a nice video from Tech with Tim which explains everything very well. I ...
1 vote · 0 answers · 115 views
Fine-tuned LLaMA 2–7B with QLoRA, but reloading fails: missing 4bit metadata. Likely saved after LoRA+resize. Need proper 4bit save method
I’ve been working on fine-tuning LLaMA 2–7B using QLoRA with bitsandbytes 4-bit quantization and ran into a weird issue. I did adaptive pretraining on Arabic data with a custom tokenizer (vocab size ~...
1 vote · 0 answers · 194 views
llama-cpp-python installing for x86_64 instead of arm64
I am trying to set up local, high-speed NLP but am failing to install the arm64 version of llama-cpp-python.
Even when I run
CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \
...
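One quick check worth running first (a sketch of a diagnostic, not a fix): pip builds llama-cpp-python for the architecture of the running interpreter, so an x86_64 Python under Rosetta yields x86_64 wheels regardless of CMAKE_ARGS.
import platform
print(platform.machine())  # 'arm64' for a native build, 'x86_64' under Rosetta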
2 votes · 1 answer · 181 views
Llama_cookbook: why are labels not shifted for CausalLM?
I'm studying the llama_cookbook repo, in particular their finetuning example.
This example uses LlamaForCausalLM model and samsum_dataset (input: dialog, output: summary). Now, looking at how they ...
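For context, the shift the dataset omits happens inside the model's forward pass; a minimal sketch of the causal-LM loss as transformers computes it (so labels can simply equal input_ids, with prompt tokens masked to -100):
import torch.nn.functional as F

def causal_lm_loss(logits, labels):
    # logits: (batch, seq, vocab); labels: (batch, seq), unshifted
    shift_logits = logits[..., :-1, :].contiguous()  # predictions at positions 0..N-1
    shift_labels = labels[..., 1:].contiguous()      # targets are the *next* tokens
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,  # masked (e.g. prompt) tokens contribute no loss
    )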
0 votes · 0 answers · 58 views
Using llama-index with a deployed LLM
I wanted to make a web app that uses llama-index to answer queries using RAG from specific documents. I have set up the Llama3.2-1B-instruct LLM locally and am using it to create indexes of the ...
0 votes · 0 answers · 112 views
Why is `mul_mat` in ggml slower than in llama.cpp?
I use the following command to compile an executable file for Android:
cmake \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-...
1 vote · 0 answers · 161 views
How to implement timeout and retry for long-running Hugging Face model inference in Python?
I'm working with a locally hosted Hugging Face transformers model (mistral-7b, llama2-13b, etc.), using the pipeline interface on a GPU server (A100).
Sometimes inference takes much longer than ...
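A minimal sketch of one approach, assuming a hypothetical generate() wrapper around the pipeline: a worker thread plus future.result(timeout=...) gives a soft timeout; the stuck forward pass cannot be killed, only abandoned.
import concurrent.futures
import time

def generate(pipe, prompt):
    # pipe is assumed to be a transformers text-generation pipeline
    return pipe(prompt, max_new_tokens=256)[0]["generated_text"]

def generate_with_timeout(pipe, prompt, timeout_s=60, retries=2):
    for attempt in range(retries + 1):
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = executor.submit(generate, pipe, prompt)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # The abandoned call still occupies the GPU until it finishes on its own.
            if attempt == retries:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff before retrying
        finally:
            executor.shutdown(wait=False)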
2 votes · 1 answer · 87 views
How to re-use attention in huggingface
I have a long chunk of text that I need to process using a transformer. I would then like users to ask different questions about it (all questions are independent; they don't relate to each other)...
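A minimal sketch of prefix caching with recent transformers versions (model name is a placeholder): run the long text through the model once, keep its past_key_values, and reuse a copy of that cache for every independent question.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

long_text = "..."  # the shared chunk of text
ctx_ids = tok(long_text, return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(ctx_ids, use_cache=True).past_key_values  # computed once

def answer(question):
    q_ids = tok(question, return_tensors="pt", add_special_tokens=False).input_ids
    full = torch.cat([ctx_ids, q_ids], dim=-1)
    # deepcopy so one question's generation does not pollute the next
    out = model.generate(full, past_key_values=copy.deepcopy(prefix_cache),
                         max_new_tokens=128)
    return tok.decode(out[0, full.shape[-1]:], skip_special_tokens=True)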
1 vote · 1 answer · 230 views
No stopping token generated by Llama-3.2-1B-Instruct
I am experimenting with Llama-3.2-1B-Instruct for learning purposes. When I try to implement a simple re-write task with Hugging Face transformers, I get a weird result where the model does not ...
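A minimal sketch of the usual fix, assuming the Hugging Face chat interface: apply the chat template and list <|eot_id|> among the terminators, since Llama 3.x instruct models end a turn with <|eot_id|> rather than the plain-text <|end_of_text|>.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.2-1B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

input_ids = tok.apply_chat_template(
    [{"role": "user", "content": "Rewrite this sentence: ..."}],
    add_generation_prompt=True, return_tensors="pt",
)
# stop on either the default EOS or the end-of-turn token
terminators = [tok.eos_token_id, tok.convert_tokens_to_ids("<|eot_id|>")]
out = model.generate(input_ids, max_new_tokens=128, eos_token_id=terminators)
print(tok.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True))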
1 vote · 1 answer · 79 views
How to incorporate additional data when fine-tuning an LLM
My goal is to create a chatbot specialized in answering questions related to diabetes.
I am new to fine-tuning and have a couple of questions before I begin. My question is about the dataset format and ...
1 vote · 0 answers · 89 views
Microsoft.Extensions.AI responses output JSON when custom functions are used
I'm using Microsoft.Extensions.AI to run queries against numerous Ollama models that I have installed. I have added a custom function (AIFunction type) by creating a ChatOptions instance and passing ...
0 votes · 1 answer · 213 views
Meta Llama 3.2 3B model local download
I installed the Llama 3.2 model from Meta directly and got it in this format:
-a---- 3/10/2025 3:22 PM 209 checklist.chk
-a---- 3/10/2025 9:47 AM 6425585114 ...
0 votes · 0 answers · 39 views
Fine-tuning Llama 3 8B stuck at save_step
I ran into an issue while training Llama-3-8B locally (on an RTX 3080 GPU).
It gets stuck at any save_step. If I set save_step to 200, it gets stuck at 200/300. If I set save_step to 1, it gets stuck at 1/...
1 vote · 1 answer · 331 views
Why does my Llama 3.1 model act differently between AutoModelForCausalLM and LlamaForCausalLM?
I have one set of weights, one tokenizer, the same prompt, and identical generation parameters. Yet somehow, when I load the model using AutoModelForCausalLM, I get one output, and when I construct it ...
0 votes · 0 answers · 133 views
How do I fix a TypeError from a Llama API call
I am running the code below, from the Llama quick start:
import json
from llamaapi import LlamaAPI
# Initialize the SDK
llama = LlamaAPI("<your_api_token>")
# Build the API request
...
0 votes · 1 answer · 230 views
Does streaming work for Llama models on OpenAI's python API?
I set up streaming responses on a client that uses OpenAI's API in Python. It works fine for ChatGPT models, but when I attempt to use a Llama model (llama3.1-8b) I am getting a valid streaming ...
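For reference, a minimal streaming sketch with the OpenAI Python client pointed at an OpenAI-compatible Llama endpoint (base_url and model name are placeholders); delta.content can be None on role-only and final chunks, a common source of "broken" streams:
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="...")
stream = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue  # some providers send keep-alive chunks with no choices
    delta = chunk.choices[0].delta.content
    if delta is not None:  # skip role headers and the finish chunk
        print(delta, end="", flush=True)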
0 votes · 0 answers · 212 views
How to fix "ERROR: Could not find a version that satisfies the requirement llama-hub==0.0.79.post1"?
I am trying to install a set of requirements to continue development of an Agent capable of interacting with LLMs.
I used the command pip3 install -r requirements.txt.
...
0 votes · 1 answer · 77 views
Program runs only once even after using a loop
def data_input():
# Gets the data the LLM needs and that the user wants to ask questions about
def chat_with_data(data):
messages = [
{
"role": "system&...
0 votes · 1 answer · 225 views
Llama 3 responding with only function calls?
I am trying to make Llama 3 Instruct use function calls from tools. It does work, but now it answers only with function calls! If I ask something like "who are you?" or "what is an Apple device?" it ...
-1 votes · 2 answers · 604 views
Getting an error on a Windows PC while running pip install llama-cpp-python
Creating directory "llava_shared.dir\Release".
Structured output is enabled. The formatting of compiler diagnostics will reflect the error hierarchy. See https://aka.ms/cpp/structured-output ...
0 votes · 0 answers · 143 views
Llama-API stops working after second request
I am trying to generate data based on previous texts via LLMs, and am therefore making consecutive calls for all the texts I have available. I have done this so far with OpenAI's and Ollama's Python ...
0 votes · 0 answers · 955 views
groq.GroqError: The api_key client option must be set either by passing api_key to the client or by setting the GROQ_API_KEY environment variable
I have been trying to use a Llama API using Groq Cloud but am encountering this error. I have tried set GROQ_API_KEY=api_key
And
$env:GROQ_API_KEY = "api_key"
But haven't found any solution ...
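A minimal sketch (model id is a placeholder): set/$env: only affect the shell they run in, so either export the variable in the same shell that launches Python, set it in-process, or pass the key explicitly.
import os
from groq import Groq

os.environ["GROQ_API_KEY"] = "gsk_..."  # or: client = Groq(api_key="gsk_...")
client = Groq()  # the client reads GROQ_API_KEY from the environment
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # placeholder model id
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)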
-2 votes · 2 answers · 5k views
What can cause this error in LM Studio: '''Failed to send message vk::Queue::submit: ErrorDeviceLost'''
I get that error in the following scenarios:
once I ask a second question to a model without reloading it
once I create a new chat with any downloaded model without reloading the model
once I try ...
0 votes · 0 answers · 70 views
RuntimeError with PyTorch when Fine-tuning LLM: "element 0 of tensors does not require grad"
I'm trying to fine-tune a LLaMA model using LoRA, but I'm getting the following error during training:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Code
Here's ...
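A minimal sketch of the usual cause and fix, assuming `model` is an already-loaded quantized (k-bit) base model: the base weights are frozen, so the needed grads must be re-enabled before wrapping with PEFT.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # re-enables the grads training needs
model.enable_input_require_grads()              # required with gradient checkpointing
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()              # sanity check: should be non-zero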
1 vote · 2 answers · 2k views
LangGraph tool calling - LLM doesn't call provided tools
Currently I'm trying to learn how to develop an agentic AI based on the LangGraph Academy video. While the LangGraph Academy video uses OpenAI GPT, I decided to use llama3.2 3B as it is free.
Below ...
0 votes · 0 answers · 100 views
Does the input dimension of llama have to be the same as the output dimension (Expected input batch_size to match target batch_size)?
When I fine-tune Llama3.2 (11B/8B) to generate the target text
with self.maybe_autocast():
outputs = self.llama_model(
input_ids=None,
inputs_embeds=concat_inputs_embeds,
...
0 votes · 0 answers · 65 views
Trying to deploy Llama 3.2 model using Vertex AI Model Garden but I am not able to locate the URI containing Llama 3.2 pretrained and finetuned models
I am trying to follow this notebook to deploy the Llama 3.2 Vision 11B model. In the 'Before you begin' step, it's mentioned that in the 'Access Llama 3.2 models on Vertex AI for serving' section, it's ...
0 votes · 0 answers · 72 views
Llama 3: Getting a CUDA unknown error while fine-tuning Llama 3 on wikitext
I am a beginner in Large Language Models and the Hugging Face API. I was trying to fine-tune the Llama 3.1 8B model on the wikitext dataset as practice.
When I try to run the following script, I get ...
0 votes · 0 answers · 99 views
Converting .gguf model to .pte results in an error
I am trying to convert https://huggingface.co/PrunaAI/Meta-Llama-3-8B-Instruct-GGUF-smashed/blob/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf to .pte format. The tool to do that is here - https://github....
0 votes · 0 answers · 22 views
Issue with LlamaParse ...Page 1 [error] - FONT_ERROR : Fail to identify 120 glyphs on page 1 from font : Type3
When I load this PDF either through the Python library or the llamacloud.ai frontend, it outputs a markdown doc of random characters.
When I check the History Log, it shows the same related error for every page:
...
0 votes · 0 answers · 91 views
Removing non-English languages from Llama
I'm working with the meta-llama/Llama-3.2-1B model from Hugging Face Transformers and I only need it to support English. I was wondering if it's possible to remove all the other languages from this ...
0 votes · 1 answer · 557 views
Error calling the LLM using a model API, using LlamaIndex
I am trying to build an agent that searches a database, but when I run the query, LlamaIndex is apparently calling OpenAI's API when it should call the Groq API.
Code:
model = 'llama-3.3-...
0 votes · 1 answer · 269 views
How do I set up Python code to access the Llama 3.3 model
I have installed Llama 3.3 on a remote GPU using these commands:
curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3.3
I need to run the following code which uses the llama3.3 installed on ...
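A minimal sketch over Ollama's HTTP API (the host is a placeholder; 11434 is Ollama's default port, and the server binds to localhost unless OLLAMA_HOST is set on the remote machine):
import requests

resp = requests.post(
    "http://REMOTE_HOST:11434/api/chat",
    json={
        "model": "llama3.3",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,  # return one complete JSON response
    },
    timeout=300,
)
print(resp.json()["message"]["content"])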
0 votes · 0 answers · 295 views
python3 ../llama.cpp/convert_hf_to_gguf.py : KeyError: 'architectures'
Objective: convert this pytorch model https://huggingface.co/mtspeech/MooER-MTL-80K to gguf model
I downloaded the model :
(.venv) raphy@raohy:~/whisper.cpp/models$ cat download-MooER-MTL-80K.py
from ...
2 votes · 0 answers · 697 views
Cannot download Llama 3.2 3B model using Unsloth and Hugging Face
I want to locally fine-tune using my own dataset and then save the Llama 3.2-3B model locally too. I have an Anaconda setup and I'm on the base environment, where I can see clearly that unsloth and ...
0 votes · 0 answers · 74 views
How can a HuggingFaceEndpoint instance not need a quantization config or tokenizer?
My original goal was to make a base chain class so I could further instantiate a chain with an LLM of my choice (e.g. gpt-4o-mini or meta-llama/Meta-Llama-3-8B, etc.).
I've noticed that ...
-1 votes · 1 answer · 1k views
While executing 'llama model list' in my Python environment, I was getting a "ModuleNotFoundError: No module named 'termios'" error
I was trying to use the Llama 3.2 multimodal model, and the Llama AI website told me to run the 'llama models list' command in my environment. Then I got an error:
(meteor_ai_1.0)
$ llama ...
2 votes · 0 answers · 410 views
Generating very long sequences with ollama
I'm trying to generate very long texts using Ollama and Python in a single run for research purposes, but the generation stops with a stop status despite everything.
I'm trying to increase the context ...
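A minimal sketch of the knobs involved (values are illustrative): num_predict caps how many tokens may be generated (-1 removes the cap, -2 fills the context) and num_ctx sets the context window.
import ollama

resp = ollama.generate(
    model="llama3.1",  # placeholder model
    prompt="Write a very long story ...",
    options={"num_ctx": 8192, "num_predict": -1},
)
print(resp["response"])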
0 votes · 1 answer · 117 views
Data extraction from diagrams using Vision Language Model
Looking for some ideas to accurately extract data flows from a system context diagram. I've tried a number of models and prompt-engineering techniques, but I'm still getting missing flows, and ...
0 votes · 0 answers · 292 views
Working with Llama 3.2 Vision Multimodal for Object detection using roboflow
So I am working on a project where I am using a dataset downloaded from Roboflow in YOLO format, and I am trying to pass the training folder to Llama 3.2 for supervised learning on that dataset ...
1 vote · 0 answers · 107 views
Fine-tuning an 8.1B Llama with LoRA but responses are just repetitions of the inputs at inference
I'm fine-tuning a Llama 8.1B using LoRA with about 1,000 samples for 3 epochs, but after training (which takes about 3 hours) the model at inference just keeps repeating the input.
I'm using the
`{
...
0 votes · 0 answers · 113 views
Efficiently Handling Large Datasets with Locally Hosted LLM (Ollama) and PostgreSQL
I am working with a locally hosted LLM (Ollama with Llama 3.1) to process queries based on a large dataset stored in a PostgreSQL database (~1 million rows). I am fetching data in chunks from the ...
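A minimal sketch, assuming psycopg2: a named (server-side) cursor streams the table in batches, so the ~1M rows never have to sit in client memory while each chunk is fed to the local model.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder DSN
cur = conn.cursor(name="chunked")               # named cursor => server-side
cur.itersize = 5000                             # rows pulled per round trip
cur.execute("SELECT id, body FROM documents")   # placeholder table/columns
for row in cur:                                 # batches are fetched lazily
    pass  # build the prompt from `row` and send it to the Ollama model here
conn.close()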
0 votes · 0 answers · 98 views
Can LlamaParse return as a pydantic object?
I have been unable to make LlamaParse return the parsed content in a strictly structured format, like Pydantic objects. Otherwise I would have to make a separate API call just to make the parsed ...
0 votes · 0 answers · 25 views
TypeScript Error: 'quantized' property not recognized in Pipeline configuration when initializing LLaMA model with Transformers.js
I'm encountering a TypeScript error while trying to initialize a LLaMA model with quantization enabled using Transformers.js. The compiler is throwing an error indicating that the 'quantized' property ...
0 votes · 0 answers · 86 views
Create an adapter for training a llama model
I want to create a small adapter for asking a llama model about my abstract game, but when I query it afterward, it doesn't seem to know anything about the game. What might be the problem?
This is text of ...
0 votes · 0 answers · 218 views
Finetuning LLaMA with LoRA - bf16 errors on A100 GPU on Colab
I am attempting to fine-tune the Llama3.2-1b model from Hugging Face on Colab using an A100 GPU, following this guide.
My understanding is that the A100 GPU supports bf16, but when attempting to finetune, it ...