385 questions
4 votes · 2 answers · 507 views
No module named 'llama_models.cli.model' error while downloading Llama 3.1 8B
I'm trying to install the Llama 3.1 8B model by following the instructions in the llama-models GitHub README. When I run the command:
llama-model download --source meta --model-id CHOSEN_MODEL_ID
(...
0 votes · 0 answers · 105 views
pippy examples: torch._dynamo.exc.UserError: It looks like one of the outputs with type <class transformers.cache_utils.DynamicCache> is not supported
When the program starts to initialize the pipeline object, an unexpected error is thrown:
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/anaconda3/envs/polar/lib/python3.12/site-...
0 votes · 0 answers · 49 views
Running Ollama on a local computer and prompting from a Jupyter notebook - does the model recall prior prompts as if it were the same chat?
I am doing some tests using Ollama on a local computer, with Llama 3.2, which consist of prompting a task against a document.
I read that after reaching the maximum context, I should restart the ...
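A minimal sketch (assuming the ollama Python package): the server keeps no chat state between calls, so the model "recalls" prior prompts only if the client resends them in the messages list on every call.
import ollama

history = []  # client-side transcript; the model itself is stateless

def ask(prompt):
    history.append({"role": "user", "content": prompt})
    resp = ollama.chat(model="llama3.2", messages=history)
    answer = resp["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    return answer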
0 votes · 0 answers · 49 views
The data type of the llava model uncontrollably changes to float32
I am using the llama-8b-llava model. I have made some modifications to the model, which are non-structural and do not introduce any parameters. During the model loading process, I used the torch....
1 vote · 1 answer · 150 views
Import "llama_index.llms.ollama" could not be resolved
I have the following imports for a Python file that's meant to become a multi-LLM agent. I wanted to use llama_index and found a nice video from Tech with Tim which explains everything very well. I ...
1 vote · 0 answers · 115 views
Fine-tuned LLaMA 2–7B with QLoRA, but reloading fails: missing 4bit metadata. Likely saved after LoRA+resize. Need proper 4bit save method
I’ve been working on fine-tuning LLaMA 2–7B using QLoRA with bitsandbytes 4-bit quantization and ran into a weird issue. I did adaptive pretraining on Arabic data with a custom tokenizer (vocab size ~...
1 vote · 0 answers · 194 views
llama-cpp-python installing for x86_64 instead of arm64
I am trying to set up local, high-speed NLP but am failing to install the arm64 version of llama-cpp-python.
Even when I run
CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \
...
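One quick check worth running first (a sketch of a diagnostic, not a fix): pip builds llama-cpp-python for the architecture of the running interpreter, so an x86_64 Python under Rosetta yields x86_64 wheels regardless of CMAKE_ARGS.
import platform
print(platform.machine())  # 'arm64' for a native build, 'x86_64' under Rosetta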
2 votes · 1 answer · 181 views
Llama_cookbook: why are labels not shifted for CausalLM?
I'm studying the llama_cookbook repo, in particular their finetuning example.
This example uses LlamaForCausalLM model and samsum_dataset (input: dialog, output: summary). Now, looking at how they ...
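For context, the shift the dataset omits happens inside the model's forward pass; a minimal sketch of the causal-LM loss as transformers computes it (so labels can simply equal input_ids, with prompt tokens masked to -100):
import torch.nn.functional as F

def causal_lm_loss(logits, labels):
    # logits: (batch, seq, vocab); labels: (batch, seq), unshifted
    shift_logits = logits[..., :-1, :].contiguous()  # predictions at positions 0..N-1
    shift_labels = labels[..., 1:].contiguous()      # targets are the *next* tokens
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,  # masked (e.g. prompt) tokens contribute no loss
    )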
0 votes · 0 answers · 58 views
Using llama-index with a deployed LLM
I wanted to make a web app that uses llama-index to answer queries using RAG from specific documents. I have set up the Llama3.2-1B-instruct LLM locally and am using it to create indexes of the ...
0 votes · 0 answers · 112 views
Why is `mul_mat` in ggml slower than in llama.cpp?
I use the following command to compile an executable file for Android:
cmake \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-...
1 vote · 0 answers · 161 views
How to implement timeout and retry for long-running Hugging Face model inference in Python?
I'm working with a locally hosted Hugging Face transformers model (mistral-7b, llama2-13b, etc.), using the pipeline interface on a GPU server (A100).
Sometimes inference takes much longer than ...
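A minimal sketch of one approach, assuming a hypothetical generate() wrapper around the pipeline: a worker thread plus future.result(timeout=...) gives a soft timeout; the stuck forward pass cannot be killed, only abandoned.
import concurrent.futures
import time

def generate(pipe, prompt):
    # pipe is assumed to be a transformers text-generation pipeline
    return pipe(prompt, max_new_tokens=256)[0]["generated_text"]

def generate_with_timeout(pipe, prompt, timeout_s=60, retries=2):
    for attempt in range(retries + 1):
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = executor.submit(generate, pipe, prompt)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # The abandoned call still occupies the GPU until it finishes on its own.
            if attempt == retries:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff before retrying
        finally:
            executor.shutdown(wait=False)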
2 votes · 1 answer · 87 views
How to re-use attention in huggingface
I have a long chunk of text that I need to process using a transformer. I would then like users to ask different questions about it (all questions are independent; they don't relate to each other)...
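A minimal sketch of prefix caching with recent transformers versions (model name is a placeholder): run the long text through the model once, keep its past_key_values, and reuse a copy of that cache for every independent question.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

long_text = "..."  # the shared chunk of text
ctx_ids = tok(long_text, return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(ctx_ids, use_cache=True).past_key_values  # computed once

def answer(question):
    q_ids = tok(question, return_tensors="pt", add_special_tokens=False).input_ids
    full = torch.cat([ctx_ids, q_ids], dim=-1)
    # deepcopy so one question's generation does not pollute the next
    out = model.generate(full, past_key_values=copy.deepcopy(prefix_cache),
                         max_new_tokens=128)
    return tok.decode(out[0, full.shape[-1]:], skip_special_tokens=True)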
1 vote · 1 answer · 230 views
No stopping token generated by Llama-3.2-1B-Instruct
I am experimenting with Llama-3.2-1B-Instruct for learning purposes. When I try to implement a simple re-write task with Hugging Face transformers, I get a weird result where the model does not ...
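A minimal sketch of the usual fix, assuming the Hugging Face chat interface: apply the chat template and list <|eot_id|> among the terminators, since Llama 3.x instruct models end a turn with <|eot_id|> rather than the plain-text <|end_of_text|>.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.2-1B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

input_ids = tok.apply_chat_template(
    [{"role": "user", "content": "Rewrite this sentence: ..."}],
    add_generation_prompt=True, return_tensors="pt",
)
# stop on either the default EOS or the end-of-turn token
terminators = [tok.eos_token_id, tok.convert_tokens_to_ids("<|eot_id|>")]
out = model.generate(input_ids, max_new_tokens=128, eos_token_id=terminators)
print(tok.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True))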
1 vote · 1 answer · 79 views
How to incorporate additional data when fine-tuning an LLM
My goal is to create a chatbot specialized in answering questions related to diabetes.
I am new to fine-tuning and have a couple of questions before I begin. My question is about the dataset format and ...
1 vote · 0 answers · 89 views
Microsoft.Extensions.AI responses output JSON when custom functions are used
I'm using Microsoft.Extensions.AI to run queries against numerous Ollama models that I have installed. I have added a custom function (AIFunction type) by creating a ChatOptions instance and passing ...
0 votes · 1 answer · 213 views
Meta Llama 3.2 3B model local download
I installed the Llama 3.2 model from Meta directly and got it in this format:
-a---- 3/10/2025 3:22 PM 209 checklist.chk
-a---- 3/10/2025 9:47 AM 6425585114 ...
0 votes · 0 answers · 39 views
Fine-tuning Llama 3 8B stuck at save_step
I ran into an issue while training Llama-3-8B locally (on an RTX 3080 GPU).
It gets stuck at any save_step. If I set save_step to 200, it gets stuck at 200/300. If I set save_step to 1, it gets stuck at 1/...
1 vote · 1 answer · 331 views
Why does my Llama 3.1 model act differently between AutoModelForCausalLM and LlamaForCausalLM?
I have one set of weights, one tokenizer, the same prompt, and identical generation parameters. Yet somehow, when I load the model using AutoModelForCausalLM, I get one output, and when I construct it ...
0 votes · 0 answers · 133 views
How do I fix a TypeError from a Llama API call
I am running the code below, from the Llama quick start:
import json
from llamaapi import LlamaAPI
# Initialize the SDK
llama = LlamaAPI("<your_api_token>")
# Build the API request
...
0 votes · 1 answer · 230 views
Does streaming work for Llama models on OpenAI's python API?
I set up streaming responses on a client that uses OpenAI's API in Python. It works fine for ChatGPT models, but when I attempt to use a Llama model (llama3.1-8b) I am getting a valid streaming ...
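For reference, a minimal streaming sketch with the OpenAI Python client pointed at an OpenAI-compatible Llama endpoint (base_url and model name are placeholders); delta.content can be None on role-only and final chunks, a common source of "broken" streams:
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="...")
stream = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue  # some providers send keep-alive chunks with no choices
    delta = chunk.choices[0].delta.content
    if delta is not None:  # skip role headers and the finish chunk
        print(delta, end="", flush=True)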
0 votes · 0 answers · 212 views
How to fix "ERROR: Could not find a version that satisfies the requirement llama-hub==0.0.79.post1"?
I am trying to install a set of requirements to continue development of an Agent capable of interacting with LLMs.
I used the command pip3 install -r requirements.txt.
...
0 votes · 1 answer · 77 views
Program runs only once even after using a loop
def data_input():
# Gets the data the LLM needs and that the user wants to ask questions about
def chat_with_data(data):
messages = [
{
"role": "system&...
0 votes · 1 answer · 225 views
Llama 3 responding with only function calls?
I am trying to make Llama 3 Instruct use function calls from tools. It does work, but now it answers only with function calls! If I ask something like "who are you?" or "what is an Apple device?" it ...
-1 votes · 2 answers · 604 views
Getting an error on a Windows PC while running pip install llama-cpp-python
Creating directory "llava_shared.dir\Release".
Structured output is enabled. The formatting of compiler diagnostics will reflect the error hierarchy. See https://aka.ms/cpp/structured-output ...
0 votes · 0 answers · 143 views
Llama-API stops working after second request
I am trying to generate data based on previous texts via LLMs, and am therefore making consecutive calls for all the texts I have available. I have done this so far with OpenAI's and Ollama's Python ...
0 votes · 0 answers · 955 views
groq.GroqError: The api_key client option must be set either by passing api_key to the client or by setting the GROQ_API_KEY environment variable
I have been trying to use a Llama API using Groq Cloud but am encountering this error. I have tried set GROQ_API_KEY=api_key
And
$env:GROQ_API_KEY = "api_key"
But haven't found any solution ...
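A minimal sketch (model id is a placeholder): set/$env: only affect the shell they run in, so either export the variable in the same shell that launches Python, set it in-process, or pass the key explicitly.
import os
from groq import Groq

os.environ["GROQ_API_KEY"] = "gsk_..."  # or: client = Groq(api_key="gsk_...")
client = Groq()  # the client reads GROQ_API_KEY from the environment
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # placeholder model id
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)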
-2 votes · 2 answers · 5k views
What can cause this error in LM Studio: '''Failed to send message vk::Queue::submit: ErrorDeviceLost'''
I get that error in the following scenarios:
once I ask a second question to a model without reloading it
once I create a new chat with any downloaded model without reloading the model
once I try ...
0 votes · 0 answers · 70 views
RuntimeError with PyTorch when Fine-tuning LLM: "element 0 of tensors does not require grad"
I'm trying to fine-tune a LLaMA model using LoRA, but I'm getting the following error during training:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Code
Here's ...
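A minimal sketch of the usual cause and fix, assuming `model` is an already-loaded quantized (k-bit) base model: the base weights are frozen, so the needed grads must be re-enabled before wrapping with PEFT.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # re-enables the grads training needs
model.enable_input_require_grads()              # required with gradient checkpointing
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()              # sanity check: should be non-zero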
1 vote · 2 answers · 2k views
LangGraph tool calling - LLM doesn't call provided tools
Currently I'm trying to learn how to develop an agentic AI based on the LangGraph Academy video. While the LangGraph Academy video uses OpenAI GPT, I decided to use llama3.2 3B as it is free.
Below ...
0 votes · 0 answers · 100 views
Does the input dimension of llama have to be the same as the output dimension (Expected input batch_size to match target batch_size)?
When I fine-tune Llama3.2 (11B/8B) to generate the target text
with self.maybe_autocast():
outputs = self.llama_model(
input_ids=None,
inputs_embeds=concat_inputs_embeds,
...
0 votes · 0 answers · 65 views
Trying to deploy Llama 3.2 model using Vertex AI Model Garden but I am not able to locate the URI containing Llama 3.2 pretrained and finetuned models
I am trying to follow this notebook to deploy the Llama 3.2 Vision 11B model. In the 'Before you begin' step, it's mentioned that in the 'Access Llama 3.2 models on Vertex AI for serving' section, it's ...
0 votes · 0 answers · 72 views
Llama 3: Getting a CUDA unknown error while fine-tuning Llama 3 on wikitext
I am a beginner in Large Language Models and the Hugging Face API. I was trying to fine-tune the Llama 3.1 8B model on the wikitext dataset as practice.
When I try to run the following script, I get ...
0 votes · 0 answers · 99 views
Converting .gguf model to .pte results in an error
I am trying to convert https://huggingface.co/PrunaAI/Meta-Llama-3-8B-Instruct-GGUF-smashed/blob/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf to .pte format. The tool to do that is here - https://github....
0 votes · 0 answers · 22 views
Issue with LlamaParse ...Page 1 [error] - FONT_ERROR : Fail to identify 120 glyphs on page 1 from font : Type3
When I load this PDF either through the Python library or the llamacloud.ai frontend, it outputs a markdown doc of random characters.
When I check the History Log, it shows the same related error for every page:
...
0 votes · 0 answers · 91 views
Removing non-English languages from Llama
I'm working with the meta-llama/Llama-3.2-1B model from Hugging Face Transformers and I only need it to support English. I was wondering if it's possible to remove all the other languages from this ...
0 votes · 1 answer · 557 views
Error calling the LLM using a model API, using LlamaIndex
I am trying to build an agent that searches a database, but when I run the query, LlamaIndex is apparently calling OpenAI's API when it should call the Groq API.
Code:
model = 'llama-3.3-...
0 votes · 1 answer · 269 views
How do I set up Python code to access the Llama 3.3 model
I have installed Llama 3.3 on a remote GPU using these commands:
curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3.3
I need to run the following code which uses the llama3.3 installed on ...
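A minimal sketch over Ollama's HTTP API (the host is a placeholder; 11434 is Ollama's default port, and the server binds to localhost unless OLLAMA_HOST is set on the remote machine):
import requests

resp = requests.post(
    "http://REMOTE_HOST:11434/api/chat",
    json={
        "model": "llama3.3",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,  # return one complete JSON response
    },
    timeout=300,
)
print(resp.json()["message"]["content"])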
0 votes · 0 answers · 295 views
python3 ../llama.cpp/convert_hf_to_gguf.py : KeyError: 'architectures'
Objective: convert this pytorch model https://huggingface.co/mtspeech/MooER-MTL-80K to gguf model
I downloaded the model :
(.venv) raphy@raohy:~/whisper.cpp/models$ cat download-MooER-MTL-80K.py
from ...
2 votes · 0 answers · 697 views
Cannot download Llama 3.2 3B model using Unsloth and Hugging Face
I want to locally fine-tune using my own dataset and then save the Llama 3.2-3B model locally too. I have an Anaconda setup and I'm on the base environment, where I can see clearly that unsloth and ...
0 votes · 0 answers · 74 views
How can a HuggingFaceEndpoint instance not need a quantization config or tokenizer?
My original goal was to make a base chain class so I could further instantiate a chain with an LLM of my choice (e.g. gpt-4o-mini or meta-llama/Meta-Llama-3-8B, etc.).
I've noticed that ...
-1 votes · 1 answer · 1k views
While executing 'llama model list' in my Python environment, I was getting a "ModuleNotFoundError: No module named 'termios'" error
I was trying to use the Llama 3.2 multimodal model, and the Llama AI website told me to run the 'llama models list' command in my environment. Then I got an error:
(meteor_ai_1.0)
$ llama ...
2 votes · 0 answers · 410 views
Generating very long sequences with ollama
I'm trying to generate very long texts using Ollama and Python in a single run for research purposes, but the generation stops with a stop status despite everything.
I'm trying to increase the context ...
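A minimal sketch of the knobs involved (values are illustrative): num_predict caps how many tokens may be generated (-1 removes the cap, -2 fills the context) and num_ctx sets the context window.
import ollama

resp = ollama.generate(
    model="llama3.1",  # placeholder model
    prompt="Write a very long story ...",
    options={"num_ctx": 8192, "num_predict": -1},
)
print(resp["response"])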
0 votes · 1 answer · 117 views
Data extraction from diagrams using Vision Language Model
Looking for some ideas to accurately extract data flows from a system context diagram. I've tried a number of models and prompt-engineering techniques, but I'm still getting missing flows, and ...
0 votes · 0 answers · 292 views
Working with Llama 3.2 Vision Multimodal for Object detection using roboflow
So I am working on a project where I am using a dataset downloaded from Roboflow in YOLO format, and I am trying to pass the training folder to Llama 3.2 for supervised learning on that dataset ...
1 vote · 0 answers · 107 views
Fine-tuning an 8.1B Llama with LoRA but responses are just repetitions of the inputs at inference
I'm fine-tuning a Llama 8.1B using LoRA with about 1,000 samples for 3 epochs, but after training (which takes about 3 hours) the model at inference just keeps repeating the input.
I'm using the
`{
...
0 votes · 0 answers · 113 views
Efficiently Handling Large Datasets with Locally Hosted LLM (Ollama) and PostgreSQL
I am working with a locally hosted LLM (Ollama with Llama 3.1) to process queries based on a large dataset stored in a PostgreSQL database (~1 million rows). I am fetching data in chunks from the ...
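A minimal sketch, assuming psycopg2: a named (server-side) cursor streams the table in batches, so the ~1M rows never have to sit in client memory while each chunk is fed to the local model.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder DSN
cur = conn.cursor(name="chunked")               # named cursor => server-side
cur.itersize = 5000                             # rows pulled per round trip
cur.execute("SELECT id, body FROM documents")   # placeholder table/columns
for row in cur:                                 # batches are fetched lazily
    pass  # build the prompt from `row` and send it to the Ollama model here
conn.close()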
0 votes · 0 answers · 98 views
Can LlamaParse return as a pydantic object?
I have been unable to make LlamaParse return the parsed content in a strictly structured format, like Pydantic objects. Otherwise I would have to make a separate API call just to make the parsed ...
0 votes · 0 answers · 25 views
TypeScript Error: 'quantized' property not recognized in Pipeline configuration when initializing LLaMA model with Transformers.js
I'm encountering a TypeScript error while trying to initialize a LLaMA model with quantization enabled using Transformers.js. The compiler is throwing an error indicating that the 'quantized' property ...
0 votes · 0 answers · 86 views
Create an adapter for training a llama model
I want to create a small adapter for asking a llama model about my abstract game, but when I query it afterward, it doesn't seem to know anything about the game. What might be the problem?
This is text of ...
0 votes · 0 answers · 218 views
Finetuning LLaMA with LoRA - bf16 errors on A100 GPU on Colab
I am attempting to fine-tune the Llama3.2-1b model from Hugging Face on Colab using an A100 GPU, following this guide.
My understanding is that the A100 GPU supports bf16, but when attempting to finetune, it ...