I have been using LlamaCPP to load my LLM models. The llama-index library provides a way to offload some layers onto the GPU, but why does it not provide any method to fully load the model on the GPU? If there is such a method, please help.
Right now I only see the option to offload a fixed number of layers to the GPU (see the sketch below), but I want the entire model loaded on the GPU.
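This is roughly what my current setup looks like. The model path and layer count are placeholders, and I'm assuming the `LlamaCPP` wrapper that forwards `model_kwargs` to `llama_cpp.Llama` (in older llama-index versions the import is `from llama_index.llms import LlamaCPP`):

```python
# Assumes llama-index and llama-cpp-python built with GPU support.
# Path and layer count below are placeholders, not my real values.
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path="./models/llama-2-13b-chat.Q4_0.gguf",  # placeholder path
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    # model_kwargs are passed through to llama_cpp.Llama;
    # n_gpu_layers controls how many layers get offloaded to the GPU.
    model_kwargs={"n_gpu_layers": 20},  # partial offload only
    verbose=True,
)
```

Is there a parameter in this wrapper that loads every layer (the full model) onto the GPU, or is adjusting `n_gpu_layers` the only option?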