
When I try to use the Llama3-8B fine-tuning guide from:

https://pytorch.org/torchtune/0.1/tutorials/llama3.html

it gives me this error:

W0608 08:41:38.766000 10904 torch\distributed\elastic\multiprocessing\redirects.py:27] NOTE: Redirects are currently not supported in Windows or MacOs.
INFO:torchtune.utils.logging:Running LoRAFinetuneRecipeSingleDevice with resolved config:

batch_size: 2
checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: D:\Hugging_Tune_Model\llama\original
  checkpoint_files:
  - consolidated.00.pth
  model_type: LLAMA3
  output_dir: D:\Hugging_Tune_Model\llama\original
  recipe_checkpoint: null
compile: false
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  train_on_input: true
device: cpu
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 64
log_every_n_steps: null
loss:
  _component_: torch.nn.CrossEntropyLoss
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.utils.metric_logging.DiskLogger
  log_dir: /tmp/lora_finetune_output
model:
  _component_: torchtune.models.llama3.lora_llama3_8b
  apply_lora_to_mlp: false
  apply_lora_to_output: false
  lora_alpha: 16
  lora_attn_modules:
  - q_proj
  - v_proj
  lora_rank: 8
optimizer:
  _component_: torch.optim.AdamW
  lr: 0.0003
  weight_decay: 0.01
output_dir: /tmp/lora_finetune_output
profiler:
  _component_: torchtune.utils.profiler
  enabled: false
resume_from_checkpoint: false
seed: null
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: D:\Hugging_Tune_Model\llama\original/tokenizer.model

DEBUG:torchtune.utils.logging:Setting manual seed to local seed 3148683848. Local seed is seed + rank = 3148683848 + 0
Writing logs to \tmp\lora_finetune_output\log_1717823498.txt
INFO:torchtune.utils.logging:Model is initialized with precision torch.bfloat16.
INFO:torchtune.utils.logging:Tokenizer is initialized from file.
INFO:torchtune.utils.logging:Optimizer and loss are initialized.
INFO:torchtune.utils.logging:Loss is initialized.
Downloading readme: 100%|█████████████████████████████████████████████████████████████████| 11.6k/11.6k [00:00<?, ?B/s]
Downloading data: 100%|███████████████████████████████████████████████████████████| 44.3M/44.3M [00:08<00:00, 5.16MB/s]
Generating train split: 100%|██████████████████████████████████████████| 51760/51760 [00:00<00:00, 62663.60 examples/s]
INFO:torchtune.utils.logging:Dataset and Sampler are initialized.
INFO:torchtune.utils.logging:Learning rate scheduler is initialized.
  0%|                                                                                        | 0/25880 [19:50<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Scripts\tune.exe\__main__.py", line 7, in <module>
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\torchtune\_cli\tune.py", line 49, in main
    parser.run(args)
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\torchtune\_cli\tune.py", line 43, in run
    args.func(args)
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\torchtune\_cli\run.py", line 179, in _run_cmd
    self._run_single_device(args)
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\torchtune\_cli\run.py", line 93, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "<frozen runpy>", line 286, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\recipes\lora_finetune_single_device.py", line 510, in <module>
    sys.exit(recipe_main())
             ^^^^^^^^^^^^^
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\torchtune\config\_parse.py", line 50, in wrapper
    sys.exit(recipe_main(conf))
             ^^^^^^^^^^^^^^^^^
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\recipes\lora_finetune_single_device.py", line 505, in recipe_main
    recipe.train()
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\recipes\lora_finetune_single_device.py", line 453, in train
    loss = self._loss_fn(logits, labels)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\modules\loss.py", line 1185, in forward
    return F.cross_entropy(input, target, weight=self.weight,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\por\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\nn\functional.py", line 3086, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: expected scalar type Long but found Int
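
For reference, torch.nn.functional.cross_entropy requires class-index targets to be int64 (Long); int32 (Int) targets raise exactly this error. A minimal sketch that reproduces it outside of torchtune, with the cast that makes it go away:

import torch
import torch.nn.functional as F

logits = torch.randn(2, 5)                             # (batch, num_classes)
labels = torch.randint(0, 5, (2,), dtype=torch.int32)  # Int targets, as in my run

# F.cross_entropy(logits, labels)  # RuntimeError: expected scalar type Long but found Int
loss = F.cross_entropy(logits, labels.long())          # casting targets to int64 (Long) works
print(loss)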

These are the steps I followed from the tutorial:

1. pip install torch
2. pip install torchtune
3. tune download meta-llama/Meta-Llama-3-8B --output-dir D:\Hugging_Tune_Model\llama --hf-token XXXXXX
4. tune run lora_finetune_single_device --config llama3/8B_lora_single_device checkpointer.checkpoint_dir=D:\Hugging_Tune_Model\llama\original tokenizer.path=D:\Hugging_Tune_Model\llama\original/tokenizer.model checkpointer.output_dir=D:\Hugging_Tune_Model\llama\original device="cpu"

I've tried the tutorial from the PyTorch site. There is also a YouTube guide (https://youtu.be/7euBTCT0S2Q); I followed exactly the same steps but couldn't get past the tune run step.
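
My guess (unverified) is that this is Windows-specific: with NumPy older than 2.0, the default NumPy integer type on Windows is 32-bit, so a labels tensor built from a plain NumPy int array comes out as torch.int32 (Int) instead of the torch.int64 (Long) that cross_entropy expects. A quick check of that assumption:

import numpy as np
import torch

# With NumPy < 2.0 the default integer is C long: 32 bits on Windows,
# 64 bits on Linux/macOS, and a tensor built from such an array inherits that width.
labels = torch.from_numpy(np.array([1, 2, 3]))
print(labels.dtype)  # torch.int32 on Windows, torch.int64 elsewhere

If that is the cause, casting the labels to long right before the loss call in the recipe would presumably work around it, but I'd rather understand and fix the root cause.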
