
I'm trying to run LlamaIndex with llama.cpp by following the installation docs, but inside a Docker container.

I'm following this repo for the installation of llama_cpp_python==0.2.6.
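For context, server.py is not shown in the question; a minimal sketch of what it presumably does is below (the model path, layer count, and prompt are placeholders, not the original code):

# server.py -- hypothetical sketch, not the original file
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=35,                   # layers to offload; large enough = all
    verbose=True,                      # startup log prints the BLAS flag
)

print(llm("Q: What is 2+2? A:", max_tokens=16)["choices"][0]["text"])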

DOCKERFILE

# Use the official Python image for Python 3.11
FROM python:3.11

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# ARG FORCE_CMAKE=1

# ARG CMAKE_ARGS="-DLLAMA_CUBLAS=on"


# Install project dependencies

RUN FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" python -m pip install -r requirements.txt

# Command to run the server
CMD ["python", "./server.py"]
Run commands:
docker build -t llm_server ./llm
docker run -it -p 2023:2023 --gpus all llm_server
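(A sanity check, not from the original post: before debugging the build itself, confirm the GPU is reachable inside the container at all.)

docker run --rm --gpus all llm_server nvidia-smi

If this command fails, the problem is the NVIDIA Container Toolkit setup on the host rather than the llama-cpp-python build.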

Problem: for some reason, the env variables from the llama-cpp-python docs do not take effect as expected inside a Docker container.

Current behaviour: BLAS = 0 during llm initialization (the llm runs on the CPU).

Expected behaviour: BLAS = 1 (the llm runs on the GPU).
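(The BLAS flag is part of llama.cpp's system_info string, which llama-cpp-python prints during model load when verbose=True. If memory serves, the bindings also expose it directly, so you can check the build without loading a model:)

import llama_cpp

# compile-time feature string from llama.cpp; a cuBLAS build reports "BLAS = 1"
print(llama_cpp.llama_print_system_info().decode())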

nvidia-smi output inside container:

# nvidia-smi
Thu Nov 23 05:48:30 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.01              Driver Version: 546.01       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660 Ti     On  | 00000000:01:00.0  On |                  N/A |
| N/A   48C    P8               4W /  80W |   1257MiB /  6144MiB |      7%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                 N/A      |
|    0   N/A  N/A        20      G   /Xwayland                                 N/A      |
|    0   N/A  N/A       392      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+
#
  • Tried setting the env variables in the following ways inside the Dockerfile:
# ARG FORCE_CMAKE=1

# ARG CMAKE_ARGS="-DLLAMA_CUBLAS=on"
# ENV FORCE_CMAKE=1

# ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on"

# Install project dependencies

RUN FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" python -m pip install -r requirements.txt
  • Tried re-installing llama-cpp-python inside a running container using the env variables (see also the forced re-install sketch after this list):
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
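A detail worth ruling out with that last attempt (my note, not from the original post): if pip finds llama-cpp-python already installed, or reuses a cached wheel that was compiled without cuBLAS, the CMake flags never get a chance to run. Forcing a clean source rebuild looks like this:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall --no-cache-dir llama-cpp-python

--force-reinstall rebuilds even if the package is present, and --no-cache-dir skips any previously cached wheel.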

Update: the Dockerfile below works, thanks to the person who answered.

FROM nvidia/cuda:11.7.1-devel-ubuntu22.04

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install Python and pip
RUN apt-get update && apt-get install -y python3 python3-pip

# Set environment variable
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=ON"

# Install Python dependencies
RUN pip install --no-cache-dir --upgrade pip && \
    pip install -r requirements.txt --no-cache-dir

# Command to run the server
CMD ["python3", "./server.py"]

1 Answer


On Windows I use this image:

FROM nvidia/cuda:11.7.1-devel-ubuntu22.04

And this is how I set the necessary vars before install.

ENV CMAKE_ARGS="-DLLAMA_CUBLAS=ON"
RUN pip install llama-cpp-python

Works for me. Again, on Windows with Docker Desktop! The difference is the base image: the nvidia/cuda -devel images ship the CUDA toolkit (nvcc and the headers) that the cuBLAS build needs at compile time, which the plain python:3.11 image doesn't have.


1 Comment

Oh and by the way, I don't even build it. I just install the version built with GPU support!
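Presumably that means installing a prebuilt CUDA wheel rather than compiling locally. The index URL below is a placeholder (community-maintained cuBLAS wheel indexes existed at the time; pick one matching your CUDA version):

pip install llama-cpp-python --prefer-binary --extra-index-url <cuda-wheel-index-url>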
