I am using the Mistral 7B-Instruct model with llama-index, loading it through llama-cpp-python (LlamaCPP). When I try to run multiple inputs or prompts at the same time (open 2 websites and send 2 prompts), it gives me this error:
**GGML_ASSERT: D:\a\llama-cpp-python\llama-cpp-python\vendor\llama.cpp\ggml-backend.c:314: ggml_are_same_layout(src, dst) && "cannot copy tensors with different layouts"**
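For reference, the failure shows up when two requests reach the same `llm` instance at roughly the same time. A minimal sketch of that pattern (the thread setup and prompts here are illustrative, not my exact serving code; `llm` is the object defined further down):

```python
import threading

def ask(prompt):
    # Both threads share the single LlamaCPP instance defined below
    response = llm.complete(prompt)
    print(response.text)

t1 = threading.Thread(target=ask, args=("First prompt",))
t2 = threading.Thread(target=ask, args=("Second prompt",))
t1.start(); t2.start()
t1.join(); t2.join()
```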
I tried checking with the following code, and it reports that the layouts are the same:
```python
import numpy as np

def same_layout(tensor1, tensor2):
    # Compare the contiguity flags of the two arrays
    return (tensor1.flags.f_contiguous == tensor2.flags.f_contiguous
            and tensor1.flags.c_contiguous == tensor2.flags.c_contiguous)

tensor_a = np.random.rand(3, 4)  # Create a tensor
tensor_b = np.random.rand(3, 4)  # Create another tensor
print(same_layout(tensor_a, tensor_b))  # True
```
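(Note that this only compares numpy contiguity flags on unrelated arrays. The ggml assertion compares the tensors' element type, shape, and byte strides, so a closer analogue, still only a sketch on numpy arrays rather than real ggml tensors, would be:)

```python
import numpy as np

def same_layout_strict(t1, t2):
    # Closer to what ggml_are_same_layout checks: element type,
    # shape, and byte strides must all match, not just contiguity.
    return (t1.dtype == t2.dtype
            and t1.shape == t2.shape
            and t1.strides == t2.strides)

a = np.random.rand(3, 4)
b = np.asfortranarray(np.random.rand(3, 4))  # same shape, different strides
print(same_layout_strict(a, b))  # False: the strides differ
```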
and this is how I load my model:
```python
llm = LlamaCPP(
    # model_url='https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf',
    model_path="C:/Users/ASUS608/AppData/Local/llama_index/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    temperature=0.3,
    max_new_tokens=512,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 25},
    messages_to_prompt=messages_to_prompt,
    # completion_to_prompt=completion_to_prompt,
    verbose=True,
)
```
What is happening here?
**Update:** the next error after that is:
**GGML_ASSERT: D:\a\llama-cpp-python\llama-cpp-python\vendor\llama.cpp\ggml-cuda.cu:352: ptr == (void *) (pool_addr + pool_used)**