
I am using LangChain with Llama-2-13B. I have set up Llama 2 on an AWS machine with 240 GB of RAM and 4x 16 GB Tesla V100 GPUs. Inference takes around 20 s, and I want to bring that down to around 8-10 s so it feels real-time. The output quality is also poor: if I ask "Hi, how are you?", it generates a 500-word paragraph. How can I improve the results? I am currently using this configuration:

llm = LlamaCpp(
    model_path=path,
    temperature=0.7,
    max_tokens=800,
    top_p=0.1,
    top_k=40,
    n_threads=4,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
    n_ctx=2000,
    n_gpu_layers=80,
    n_batch=2048,
)

1 Answer

I would start by using **llama-2-13B-chat** instead of llama-2-13B.

Chat models are optimized for dialogue use cases, while the models without the chat suffix are trained only to predict the next token. So by generating a 500-word paragraph, your model is doing exactly what it was trained to do.
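As a sketch of the switch, here is your configuration pointed at the chat-tuned weights, with a lower `max_tokens` cap and `stop` sequences so generation ends at the chat format's turn markers. The model path is a placeholder; everything else keeps your values:

```python
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="models/llama-2-13b-chat.gguf",  # placeholder: your chat-model file
    temperature=0.7,
    max_tokens=256,           # a lower cap keeps small-talk replies short
    stop=["</s>", "[INST]"],  # stop at end-of-turn markers instead of running on
    top_p=0.1,
    top_k=40,
    n_threads=4,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
    n_ctx=2000,
    n_gpu_layers=80,
    n_batch=2048,
)
```

Capping `max_tokens` also helps latency directly, since generation time scales roughly linearly with the number of tokens produced.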

Also, prompting is essential for Llama models. The Llama 2 chat models expect a specific template: `[INST] ... [/INST]` instruction tags wrapping each user turn, with an optional `<<SYS>> ... <</SYS>>` system block inside the first one. It looks something like this:

template = """
    [INST] <<SYS>>
    You are a helpful, respectful and honest assistant. 
    Always answer as helpfully as possible, while being safe.  
    Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. 
    Please ensure that your responses are socially unbiased and positive in nature.
    If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. 
    If you don't know the answer to a question, please don't share false information.
    <</SYS>>
    {INSERT_PROMPT_HERE} [/INST]
    """

prompt = 'Your actual question to the model'
prompt = template.replace('{INSERT_PROMPT_HERE}', prompt)
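If you prefer a reusable helper over string replacement, the same template can be wrapped in a small function. This is pure string work, so it runs without the model; the `build_prompt` name and the shortened system message are my own, not part of LangChain:

```python
SYSTEM = "You are a helpful, respectful and honest assistant."

def build_prompt(question: str, system: str = SYSTEM) -> str:
    """Wrap a single user turn in the Llama 2 chat template."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n{question} [/INST]"

prompt = build_prompt("Hi, how are you?")
# prompt now starts with "[INST] <<SYS>>" and ends with "[/INST]"
```

The resulting string is what you pass to `llm(...)`; the chat model will then answer the turn and emit its end-of-turn token instead of free-running.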