I am teaching myself LLM programming by developing a RAG application. I am running Llama 3.2 on my laptop using Ollama, and using a mix of SQLite and langchain.
I can pass a context to the LLM along with my question, so that the model uses that context when generating an answer. The code looks like this:
import { ChatOllama } from '@langchain/ollama';

const model = 'llama3.2:latest';
const temperature = 0;
const llm = new ChatOllama({ model, temperature });
// promptTemplate is the ChatPromptTemplate shown further down
const messages = await promptTemplate.invoke({ question, context });
const answer = await llm.invoke(messages);
But nowhere in Ollama's API docs do I see the ability to pass a context to the model. The only "context" the docs refer to is "context (deprecated): the context parameter returned from a previous request to /generate, this can be used to keep a short conversational memory". This seems very different from the way I understand "context" as I noted above.
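As far as I can tell, that deprecated field only round-trips the model's own token state between /generate calls, something like the sketch below (the default localhost:11434 URL and the non-streaming response shape are my assumptions from the docs), which is short-term conversational memory rather than retrieved documents:

// My understanding of the deprecated "context" field on /api/generate:
// the first call returns a token array, and feeding it back gives the model
// a short memory of that exchange.
const first = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2:latest',
    prompt: 'My name is Alice.',
    stream: false,
  }),
}).then((res) => res.json());

const second = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2:latest',
    prompt: 'What is my name?',
    stream: false,
    context: first.context, // token state from the previous call, not RAG context
  }),
}).then((res) => res.json());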
If I were to query the model via Ollama's curl API, how would I structure my request to get a contextual response from the model?
For example, I could build the whole prompt myself as one string,

const prompt = "You are a helpful assistant.\n\nHere is some background information:\n" + context + "\n\nQuestion:\n" + question + "\n\nAnswer:\n";

and then express the same thing as a prompt template in langchain:

const promptTemplate = ChatPromptTemplate.fromMessages([
  ['system', 'You are a helpful assistant.'],
  ['human', 'Here is some background information:\n' + context + '\n\nQuestion:\n' + question],
]);

But nowhere in the API docs do I see examples of how to pass a context. I can do that via langchain as I showed above in my code, but I want to understand how langchain communicates with the model's chat endpoint (not generate) and how it includes previous responses as the context.
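My current guess is that it all boils down to a plain request to /api/chat in which the retrieved documents are just more text inside an ordinary message, something like the sketch below (the endpoint and field names are from the Ollama REST docs as I read them; the exact message layout is my assumption about what langchain produces). Is that really all there is to it?

// My guess at the raw request langchain ends up sending to Ollama's chat endpoint.
// Assumes Ollama is listening on the default http://localhost:11434.
const response = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2:latest',
    stream: false,
    options: { temperature: 0 },
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      // the RAG "context" is simply part of the user message text
      { role: 'user', content: 'Here is some background information:\n' + context + '\n\nQuestion:\n' + question },
      // earlier turns, if any, would just be appended as more user/assistant messages
    ],
  }),
}).then((res) => res.json());
// the reply text would be in response.message.content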