I am using the Mistral 7B-Instruct model with llama-index, loading it through llama-cpp-python (LlamaCPP). When I try to run multiple inputs or prompts at the same time (open 2 websites and send 2 prompts), it gives me this error:
**GGML_ASSERT: D:\a\llama-cpp-python\llama-cpp-python\vendor\llama.cpp\ggml-backend.c:314: ggml_are_same_layout(src, dst) && "cannot copy tensors with different layouts"**
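For reference, the failure shows up when two requests reach the same `llm` instance at roughly the same time. A minimal sketch of that pattern (the thread setup and prompts here are illustrative, not my exact serving code; `llm` is the object defined further down):

```python
import threading

def ask(prompt):
    # Both threads share the single LlamaCPP instance defined below
    response = llm.complete(prompt)
    print(response.text)

t1 = threading.Thread(target=ask, args=("First prompt",))
t2 = threading.Thread(target=ask, args=("Second prompt",))
t1.start(); t2.start()
t1.join(); t2.join()
```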
I tried checking with the following code, and it reports that the layouts are the same:
```python
import numpy as np

def same_layout(tensor1, tensor2):
    # Compare the contiguity flags of the two arrays
    return (tensor1.flags.f_contiguous == tensor2.flags.f_contiguous
            and tensor1.flags.c_contiguous == tensor2.flags.c_contiguous)

tensor_a = np.random.rand(3, 4)  # Create a tensor
tensor_b = np.random.rand(3, 4)  # Create another tensor
print(same_layout(tensor_a, tensor_b))  # True
```
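(Note that this only compares numpy contiguity flags on unrelated arrays. The ggml assertion compares the tensors' element type, shape, and byte strides, so a closer analogue, still only a sketch on numpy arrays rather than real ggml tensors, would be:)

```python
import numpy as np

def same_layout_strict(t1, t2):
    # Closer to what ggml_are_same_layout checks: element type,
    # shape, and byte strides must all match, not just contiguity.
    return (t1.dtype == t2.dtype
            and t1.shape == t2.shape
            and t1.strides == t2.strides)

a = np.random.rand(3, 4)
b = np.asfortranarray(np.random.rand(3, 4))  # same shape, different strides
print(same_layout_strict(a, b))  # False: the strides differ
```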
and this is how I load my model:
```python
llm = LlamaCPP(
    # model_url='https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf',
    model_path="C:/Users/ASUS608/AppData/Local/llama_index/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    temperature=0.3,
    max_new_tokens=512,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 25},
    messages_to_prompt=messages_to_prompt,
    # completion_to_prompt=completion_to_prompt,
    verbose=True,
)
```
What is happening here?
**Update:** the next error after that is:
**GGML_ASSERT: D:\a\llama-cpp-python\llama-cpp-python\vendor\llama.cpp\ggml-cuda.cu:352: ptr == (void *) (pool_addr + pool_used)**