
I have been using LlamaCPP to load my LLM models. The llama-index library provides a way to offload some layers onto the GPU, but why does it not provide any method to fully load the model on the GPU? If there is such a method, please help.

In the LlamaCPP settings there is an option to offload some layers to the GPU, but I want to fully load the model on the GPU.

1 Answer


Try setting n_gpu_layers to -1.

Check the official documentation.
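A minimal sketch of how this might look, assuming llama-index's `LlamaCPP` wrapper around llama-cpp-python (the model path is hypothetical): `n_gpu_layers=-1` tells llama.cpp to offload every layer to the GPU, while `0` keeps the model entirely on the CPU.

```python
# Helper that builds the model_kwargs dict which llama-index's LlamaCPP
# forwards to llama-cpp-python. n_gpu_layers=-1 means "offload all layers".
def gpu_model_kwargs(fully_offload: bool = True) -> dict:
    return {"n_gpu_layers": -1 if fully_offload else 0}

# Hypothetical usage (requires llama-index installed and a local GGUF model):
# from llama_index.llms.llama_cpp import LlamaCPP
# llm = LlamaCPP(
#     model_path="./model.gguf",          # hypothetical local model file
#     model_kwargs=gpu_model_kwargs(),    # {"n_gpu_layers": -1}
# )

print(gpu_model_kwargs())   # prints {'n_gpu_layers': -1}
```

Note that GPU offloading only works if llama-cpp-python was built with GPU support (e.g. CUDA or Metal); otherwise the layers stay on the CPU regardless of this setting.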

