My goal is to create a chatbot specialized in answering questions related to diabetes.
I am new to fine-tuning and have a couple of questions before I begin. They concern the dataset format and the underlying model I should use.
I want to fine-tune the LLM on the diabetes_instruct_v7 dataset (https://huggingface.co/datasets/passionMan/diabetes_instruct_v7). I am thinking of using the Alpaca format (https://huggingface.co/datasets/yahma/alpaca-cleaned), that is, building a prompt with ##[Instruction], ##[Input], and ##[Output] sections. However, I also want to use the rationale and explanation columns from the dataset. How should I incorporate them? If the rationale column cannot be used, I would at least like to incorporate the explanation column during fine-tuning.
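To make the question concrete, here is a minimal sketch of one way the prompt could be built. It is only an illustration, not a recommended recipe. The column names `instruction`, `input`, and `output` are assumptions borrowed from the Alpaca layout, and `rationale` and `explanation` are assumed to be the exact names of the two columns mentioned above. The function name `build_prompt` and the sample row are hypothetical. The sketch prefers the rationale and falls back to the explanation when the rationale is empty.

```python
def build_prompt(row: dict) -> str:
    """Format one dataset row as an Alpaca-style training prompt.

    Assumed column names (not verified against the dataset):
    instruction, input, output, rationale, explanation.
    """
    # Prefer the rationale; fall back to the explanation if it is missing or empty.
    reasoning = row.get("rationale") or row.get("explanation") or ""

    # Section markers copy the notation used in this post.
    parts = ["##[Instruction]\n" + row["instruction"]]

    # Skip the input section when the row carries no extra context.
    if row.get("input"):
        parts.append("##[Input]\n" + row["input"])

    # One possible choice: append the reasoning after the answer so the
    # model learns to produce both.
    output = row["output"]
    if reasoning:
        output = output + "\n\nExplanation: " + reasoning
    parts.append("##[Output]\n" + output)

    return "\n\n".join(parts)


# Made-up row, for illustration only (not taken from the dataset).
sample_row = {
    "instruction": "What does the HbA1c test measure?",
    "input": "",
    "output": "Average blood glucose over roughly the past three months.",
    "rationale": "",
    "explanation": "Glucose binds to hemoglobin over the lifespan of red blood cells.",
}

print(build_prompt(sample_row))
```

Whether the reasoning should come after the answer, as here, or before it is part of what I am asking: placing it first would train the model to write out its reasoning before giving the answer.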
I plan to use the base model rather than the instruct model. Is that the right choice?
Any guidance would be appreciated. Thanks!