I have been using LlamaCPP to load my LLM models. The llama-index library provides a way to offload some layers onto the GPU, but why does it not provide any method to fully load the model on the GPU? If there is such a method, please help.
Right now I only see the option to offload a fixed number of layers to the GPU (see the sketch below), but I want the entire model loaded on the GPU.
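This is roughly what my current setup looks like. The model path and layer count are placeholders, and I'm assuming the `LlamaCPP` wrapper that forwards `model_kwargs` to `llama_cpp.Llama` (in older llama-index versions the import is `from llama_index.llms import LlamaCPP`):

```python
# Assumes llama-index and llama-cpp-python built with GPU support.
# Path and layer count below are placeholders, not my real values.
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path="./models/llama-2-13b-chat.Q4_0.gguf",  # placeholder path
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    # model_kwargs are passed through to llama_cpp.Llama;
    # n_gpu_layers controls how many layers get offloaded to the GPU.
    model_kwargs={"n_gpu_layers": 20},  # partial offload only
    verbose=True,
)
```

Is there a parameter in this wrapper that loads every layer (the full model) onto the GPU, or is adjusting `n_gpu_layers` the only option?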