0 votes
0 answers
98 views

In the PyTorch example provided here https://github.com/pytorch/examples/tree/main/word_language_model, the transformer only uses torch.TransformerEncoder and torch.TransformerDecoder is overwritten with a ...
cuneyttyler • 1,395
1 vote
1 answer
2k views

I'm following the Hands-On Large Language Models book to learn more about LLMs. I'm trying to generate text using the "microsoft/Phi-3-mini-4k-instruct" model which is used in the book. ...
Quinten • 42.8k
0 votes
0 answers
159 views

The complete code and data are available at: Google Disk. I'm working on a high-dimensional regression problem and have built a Transformer-based model in PyTorch. While the model trains, I'm observing ...
氢氰酸
0 votes
0 answers
239 views

Description: I am trying to install the Hugging Face Transformers version that supports the Qwen2.5-Omni model. According to the official docs, the correct tag to install is v4.51.3-Qwen2.5-Omni-...
Promit Dey Sarker Arjan
1 vote
1 answer
122 views

In the paper “Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks”, they multiply a similarity matrix with the attention scores inside the attention layer. I want to ...
Blockchain Kid
1 vote
1 answer
198 views

My code: from transformers import AutoTokenizer, AutoModel model_name = "NVIDIA/nv-embed-v2" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModel.from_pretrained(...
6zL • 21
0 votes
1 answer
44 views

I read that a function f is equivariant if f(P(x)) = P(f(x)), where P is a permutation. To check what equivariance and permutation invariance mean, I wrote the following code: import torch import torch....
fenaux • 47
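For the question above, a minimal self-contained check might look like the sketch below. The choices of f (element-wise, hence permutation-equivariant) and g (a sum pool, hence permutation-invariant) are illustrative, not from the original post.

```python
import torch

# Illustrative toy functions for the two properties.
f = torch.nn.ReLU()              # applied element-wise -> permutation equivariant
g = lambda t: t.sum(dim=0)       # pools over the set dimension -> permutation invariant

x = torch.randn(5, 3)
perm = torch.randperm(5)
P = lambda t: t[perm]            # P permutes the rows

# Equivariance: f(P(x)) == P(f(x))
print(torch.allclose(f(P(x)), P(f(x))))   # True

# Invariance: g(P(x)) == g(x)  (the permutation disappears entirely)
print(torch.allclose(g(P(x)), g(x)))      # True
```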
0 votes
0 answers
41 views

I'm using the Temporal Fusion Transformer (TFT) to train on time series data, aiming to make real-time forecasts for a specific process unit at any point in time during operation. However, for ...
YoungJoo Park
0 votes
0 answers
67 views

In transformer models, I've noticed that token embeddings and positional embeddings are added together before being passed into the attention layers: import torch import torch.nn as nn class ...
Yilmaz • 51k
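As context for the question above, a minimal sketch of the usual pattern (all sizes and names below are illustrative) is to look up both embeddings and sum them element-wise before the first attention block:

```python
import torch
import torch.nn as nn

class EmbeddingLayer(nn.Module):
    def __init__(self, vocab_size=1000, max_len=128, d_model=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # one vector per token id
        self.pos_emb = nn.Embedding(max_len, d_model)      # one vector per position

    def forward(self, ids):                                # ids: (batch, seq_len)
        positions = torch.arange(ids.size(1), device=ids.device)
        # Element-wise sum; both terms broadcast to (batch, seq_len, d_model).
        return self.tok_emb(ids) + self.pos_emb(positions)

x = EmbeddingLayer()(torch.randint(0, 1000, (2, 16)))
print(x.shape)   # torch.Size([2, 16, 64])
```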
0 votes
0 answers
83 views

I have this code in transformer model: keys = x @ W_key queries = x @ W_query values = x @ W_value attention_scores = queries @ keys.T # keys.shape[-1]**0.5: used to scale the attention scores before ...
Yilmaz • 51k
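The snippet in the question above stops right before the scaling and softmax. A minimal single-head completion, with illustrative shapes rather than the asker's actual ones, could look like this:

```python
import torch

torch.manual_seed(0)
d_in, d_out, seq_len = 8, 4, 6
x = torch.randn(seq_len, d_in)
W_key   = torch.randn(d_in, d_out)
W_query = torch.randn(d_in, d_out)
W_value = torch.randn(d_in, d_out)

keys    = x @ W_key
queries = x @ W_query
values  = x @ W_value

attention_scores = queries @ keys.T
# Divide by sqrt(d_k) so the softmax inputs don't grow with the key dimension,
# which would otherwise push the softmax into regions with tiny gradients.
attention_weights = torch.softmax(attention_scores / keys.shape[-1] ** 0.5, dim=-1)
context = attention_weights @ values      # (seq_len, d_out)
print(context.shape)
```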
0 votes
0 answers
98 views

I am trying to fine-tune a transformer/encoder based pose estimation model available here at: https://huggingface.co/docs/transformers/en/model_doc/vitpose When passing "labels" attribute to ...
Soham Bhaumik
2 votes
0 answers
60 views

I am trying to understand the code for temporal embedding inside autoformer implementation using pytorch. https://github.com/thuml/Autoformer/blob/main/layers/Embed.py class TemporalEmbedding(nn....
prem • 449
2 votes
1 answer
86 views

The problem The similarity scores are almost the same for texts that describe both a photo of a cat and a dog (the photo is of a cat). Cat similarity: tensor([[-3.5724]], grad_fn=<MulBackward0>) ...
Yousef • 51
2 votes
1 answer
193 views

I'm training a transformer model using RLlib's PPO algorithm, but I encounter a device mismatch error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, ...
Thanasis Mpoulionis
0 votes
0 answers
88 views

I’m new to AWS and struggling with an architecture involving AWS Lambda and a SageMaker real-time endpoint. I’m trying to process large batches of data rows efficiently, but I’m running into timeout ...
Kabir Juneja
0 votes
2 answers
224 views

I'm trying to fine-tune a model using SFTTrainer from trl. This is what my SFTConfig arguments look like: from trl import SFTConfig training_arguments = SFTConfig( output_dir=output_dir, ...
sabira kabeer
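For reference alongside the question above, a minimal SFTConfig/SFTTrainer skeleton might look like the sketch below. The dataset and model names are placeholders taken from the trl documentation examples, not from the original post, and field names can differ slightly across trl versions.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset and model -- substitute your own.
dataset = load_dataset("trl-lib/Capybara", split="train")

training_arguments = SFTConfig(
    output_dir="./sft-output",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # recent trl also accepts an already-loaded model
    args=training_arguments,
    train_dataset=dataset,
)
trainer.train()
```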
0 votes
0 answers
44 views

I am working on a project to pre-train a custom transformer model I developed and then fine-tune it for a downstream task. I am pre-training the model on an H100 cluster and this is working great. ...
Martin Weiss
1 vote
1 answer
71 views

When I try to accelerate model training with DeepSpeed, a problem occurs when I evaluate the model on the validation dataset. Here is the problematic code snippet: def evaluate(self, ...
external
0 votes
0 answers
173 views

I am trying to use the bitsandbytes library for 4-bit quantization in my model loading function, but I keep encountering an ImportError. The error message says, "Using bitsandbytes 4-bit ...
from • 1
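For the question above: the quoted error usually points at an outdated bitsandbytes install (`pip install -U bitsandbytes` plus matching accelerate/transformers versions). A minimal 4-bit loading sketch, with a placeholder model id rather than the asker's model, might look like this:

```python
# Typically resolved by upgrading the stack together, e.g.:
#   pip install -U bitsandbytes accelerate transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_name = "facebook/opt-350m"   # placeholder model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",             # requires accelerate
)
```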
0 votes
1 answer
36 views

I am currently trying to implement the attention layer from the transformer architecture but it is not working as I expect. I have been unable to figure out what the problem is for several days now. ...
RB2k • 1
0 votes
0 answers
35 views

When fine-tuning the Hubert model to detect phonemes, I chose a fine-tuned ASR Hubert model, removed the last two layers, and added a linear layer sized to the phoneme vocab_size in the config. What is ...
Ngoc Anh
0 votes
1 answer
58 views

I'm currently investigating the effect of masking attention scores in MultiHeadAttention layers in a Transformer model for classification of time series data. I have built a model that accepts a time ...
Henning • 31
0 votes
1 answer
194 views

Appendix B of the PaLM paper (https://arxiv.org/pdf/2204.02311) describes a metric called "Model FLOPs Utilization (MFU)" and a formula for estimating it. Its computation makes ...
cangozpi • 159
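As a back-of-the-envelope companion to the question above: my reading of PaLM Appendix B is that MFU is observed throughput (tokens/s) times analytical FLOPs per token, divided by the system's peak FLOPs, with FLOPs per token estimated as 6N + 12·L·H·Q·T. The sketch below uses illustrative placeholder numbers, not figures from the paper or the question.

```python
# Hedged MFU sketch following PaLM Appendix B; all numbers are illustrative.

def flops_per_token(n_params, n_layers, n_heads, head_dim, seq_len):
    # 6N covers forward+backward matmuls against the dense weights;
    # 12*L*H*Q*T adds the attention score/value matmuls, which do not
    # scale with the parameter count.
    return 6 * n_params + 12 * n_layers * n_heads * head_dim * seq_len

observed_tokens_per_sec = 20_000          # measured training throughput (placeholder)
peak_flops_per_sec = 8 * 312e12           # e.g. 8 x A100 at 312 TFLOP/s bf16 (placeholder)

mfu = observed_tokens_per_sec * flops_per_token(
    n_params=6.7e9, n_layers=32, n_heads=32, head_dim=128, seq_len=2048
) / peak_flops_per_sec
print(f"MFU = {mfu:.1%}")
```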
0 votes
0 answers
48 views

In this neural network structure, I want the model to train and validate without using the historical target values and to make predictions directly from the covariates, so I set these ...
zzhuqshun
2 votes
1 answer
499 views

General question (hopefully useful for people coming from google): What to do when the gradient explodes? When working with transformers and deep NNs (with PyTorch), do you have a mental checklist of ...
Nicholas Kryger-Nelson
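For the checklist question above, two common first-line mitigations are to monitor the global gradient norm and to clip it. A minimal self-contained sketch (the tiny model, loss, and loader below are stand-ins, not anything from the original post):

```python
import torch
import torch.nn as nn

# Tiny illustrative setup; swap in your own model / data loader.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
loader = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(5)]

max_norm = 1.0   # common starting point; tune per model
for step, (inputs, targets) in enumerate(loader):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    # clip_grad_norm_ returns the norm *before* clipping, so it doubles as a monitor:
    # a sudden spike here usually points at the step where things blow up.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    print(f"step {step}: grad norm before clipping = {grad_norm:.3f}")
    optimizer.step()
```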
0 votes
0 answers
63 views

I am trying a semantic segmentation task using the SegFormer model with the pretrained preset 'mit_b3_cityscapes_1024'. encoder = keras_hub.models.MiTBackbone.from_preset("mit_b3_cityscapes_1024"...
masume keshavarzi
2 votes
0 answers
319 views

Problem (distil-large-v3#sequential-long-form): I'm using distil-whisper through the 🤗 Transformers pipeline for speech recognition. When setting return_timestamps=True, the timestamps reset to 0 every ...
Martin Zhu
0 votes
0 answers
30 views

I'm building an app using Qwik, TypeScript, and Konva, and I'm trying to implement a transformer that allows users to click on a shape and resize or transform it. My goal is to create and access the ...
olu • 123
0 votes
0 answers
12 views

I am working on an end-to-end (E2E) project for websites that involves: Capturing Tight Screenshots of Data Tables: The project automatically detects and takes precise screenshots of all the data ...
Michael Dzwinel
1 vote
1 answer
149 views

I've been learning the workings of a Vision Transformer. I couldn't get it to run at first (building the ViT from scratch), but somehow I managed to scramble together code that shows very low accuracy (3%). ...
kiNo • 11
1 vote
1 answer
469 views

I have been trying to run TFBertModel from Transformers, but it kept on throwing me this error ValueError Traceback (most recent call last) Cell In[9], line 1 ----> 1 ...
Faiz khan
0 votes
1 answer
39 views

I just discovered that a listener implementing the IAnnotationTransformer interface is executed as soon as the tests are launched, even before the very first test runs. Background: I ...
S P • 69
0 votes
1 answer
711 views

I’m working on an audio recognition task using a Transformer-based model in PyTorch. My input features are generated by a CNN-based embedding layer and have the shape [batch_size, d_model, n_token], ...
MuxAte • 43
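For the shape question above: PyTorch's encoder layers with batch_first=True expect [batch, seq_len, d_model], so a CNN feature map shaped [batch, d_model, n_token] is usually permuted first. A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

batch_size, d_model, n_token = 4, 256, 100
features = torch.randn(batch_size, d_model, n_token)      # CNN output: channels before tokens

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2,
)

# batch_first=True expects (batch, seq_len, d_model), so swap the last two dims.
out = encoder(features.permute(0, 2, 1))
print(out.shape)   # torch.Size([4, 100, 256])
```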
1 vote
0 answers
79 views

I have the profiling results of the inference of Llama 3.1 8b model by Meta. I deployed the model on the AI Accelerator. I managed to create a memory trace of the whole model from the Host to the ...
Sudais Alam
0 votes
1 answer
58 views

I'm reading a file using a Sequential File stage in DataStage and doing some transformation on the data using a Transformer stage. I want to compare the current row with the previous row, to check a value of ...
Chaimaa Emily
0 votes
1 answer
164 views

I found that scaled_dot_product_attention costs much more memory when the head number is large (>=16). This is my code to reproduce the issue: import torch length = 10000 dim = 64 head_num1 = 8 head_num2 = ...
Kerry Zhu
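One hedged way to narrow down the question above is to measure peak GPU memory per head count directly; large jumps typically mean a backend that materializes the length × length attention matrix was selected. The shapes below mirror the numbers quoted in the excerpt; everything else is illustrative.

```python
import torch
import torch.nn.functional as F

def peak_mem_mb(head_num, length=10000, dim=64, device="cuda"):
    q = torch.randn(1, head_num, length, dim, device=device, dtype=torch.float16)
    k, v = torch.randn_like(q), torch.randn_like(q)
    torch.cuda.reset_peak_memory_stats(device)
    F.scaled_dot_product_attention(q, k, v)
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) / 2**20

if torch.cuda.is_available():
    for h in (8, 16, 32):
        print(f"{h:>2} heads: {peak_mem_mb(h):8.1f} MiB peak")
```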
1 vote
0 answers
72 views

I was distilling my student model (base model t5-small) based on a fine-tuned T5-xxl. Here is the config student_model = AutoModelForSeq2SeqLM.from_pretrained( args.student_model_name_or_path, ...
user28369747
0 votes
1 answer
36 views

I am trying to budget for setting up an LLM-based RAG application which will serve a dynamic number of users (anything from 100 to 2000). I am able to figure out the GPU requirement to host a certain llm[...
Bing • 631
0 votes
0 answers
61 views

I am wondering whether there is a way to extract a Swin-ViT backbone, similar to a ResNet? I am attempting to train a few self-supervised learning algorithms, where I need to get just the backbone (...
imantha • 3,880
1 vote
0 answers
35 views

I am trying to write a simple quantized tensor linear multiplication. Assume the weight matrix w3 has shape (14336, 4096) and the input tensor x has shape (2, 512, 4096), where the first dim is ...
hafezmg48
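A hedged sketch for the question above, using symmetric per-output-channel int8 quantization of the weight followed by a dequantize-and-matmul (the quantization scheme is an assumption; only the shapes come from the excerpt):

```python
import torch

x  = torch.randn(2, 512, 4096)          # (batch, seq, in_features)
w3 = torch.randn(14336, 4096)           # (out_features, in_features), as in the question

# Symmetric per-output-channel int8 quantization of the weight.
scale = w3.abs().amax(dim=1, keepdim=True) / 127.0        # (14336, 1)
w3_q  = torch.clamp(torch.round(w3 / scale), -128, 127).to(torch.int8)

# Dequantize on the fly and apply the linear layer: y = x @ W^T.
y = x @ (w3_q.float() * scale).t()                        # (2, 512, 14336)

# Compare against the full-precision reference.
ref = x @ w3.t()
print((y - ref).abs().max())
```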
0 votes
1 answer
195 views

I downloaded packages from https://github.com/tencent-ailab/IP-Adapter run the commands to train an IP-Adapter plus model (input: text + image, output: image): accelerate launch --num_processes 2 --...
weiming • 29
0 votes
1 answer
105 views

I am trying to do some structured text extraction using some KV-caching tricks. For this example I will use the following model and data: model_name = "Qwen/Qwen2.5-0.5B-Instruct" model = ...
sachinruk • 10k
0 votes
0 answers
232 views

I'm trying to implement a sinusoidal positional encoding. I found two solutions that give different encodings. I am wondering if one of them is wrong or both are correct. I showcase visual figures of ...
Janikas • 487
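For the question above: the two encodings people usually arrive at differ mainly in whether the sin/cos channels are interleaved or concatenated, and both can be "correct". A reference interleaved version in the style of the original paper, as a minimal sketch:

```python
import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    position = torch.arange(max_len).unsqueeze(1)                       # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)    # even dims: sin(pos / 10000^(2i/d))
    pe[:, 1::2] = torch.cos(position * div_term)    # odd dims:  cos(pos / 10000^(2i/d))
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=16)
print(pe.shape)   # torch.Size([50, 16])
```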
0 votes
1 answer
99 views

I am trying to develop a transformer sequence to vector model but encounter performance issues. I am working with a Tesla V100-PCIE-16GB. Whenever the model encounters an unseen sequence length, the (...
D. E. • 1
0 votes
2 answers
248 views

I have just noticed that token/sentence embeddings trained from a Transformer-based model have a strong anisotropy problem, which means most of the embeddings are close to each other in the vector ...
yuu Mu • 1
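A quick way to quantify the anisotropy mentioned above is the mean pairwise cosine similarity over a batch of embeddings. The random tensor below is only a stand-in for real encoder outputs:

```python
import torch
import torch.nn.functional as F

# Illustrative stand-in for a batch of sentence embeddings from an encoder.
emb = torch.randn(256, 768)

emb = F.normalize(emb, dim=-1)
sim = emb @ emb.t()                                   # cosine similarity matrix
off_diag = sim[~torch.eye(len(sim), dtype=torch.bool)]
print(f"mean pairwise cosine similarity: {off_diag.mean():.3f}")
# Values close to 1.0 for unrelated sentences indicate a strongly anisotropic space;
# mitigations reported in the literature include whitening, BERT-flow, or
# contrastive fine-tuning (e.g. SimCSE).
```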
0 votes
1 answer
269 views

This multihead self attention code causes the training loss and validation loss to become NaN, but when I remove this part, everything goes back to normal. I know that when the training loss and ...
Fuji • 117
3 votes
1 answer
519 views

I've been trying to look at the attention scores of a pretrained transformer when I pass specific data in. It's specifically a Pytorch Transformer. I've tried using forward hooks, but I'm only able to ...
Thomas • 31
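For the question above: plain forward hooks often return None weights because nn.TransformerEncoderLayer calls its nn.MultiheadAttention with need_weights=False. One workaround (a sketch, not the only approach) is to wrap each attention module's forward so the weights are computed and stashed:

```python
import torch
import torch.nn as nn

captured = {}

def patch_attention(name, attn):
    # Wrap nn.MultiheadAttention.forward so the weights are computed and stored,
    # even though the encoder layer calls it with need_weights=False.
    orig_forward = attn.forward
    def wrapped(*args, **kwargs):
        kwargs["need_weights"] = True
        kwargs["average_attn_weights"] = False
        out, weights = orig_forward(*args, **kwargs)
        captured[name] = weights.detach()
        return out, weights
    attn.forward = wrapped

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=2)

for name, module in model.named_modules():
    if isinstance(module, nn.MultiheadAttention):
        patch_attention(name, module)

# Note: on some PyTorch versions the fused "fast path" in eval mode bypasses the
# Python attention module entirely; running in train() mode (or passing a mask)
# keeps the patched code on the execution path.
model.train()
_ = model(torch.randn(1, 10, 64))
print({k: v.shape for k, v in captured.items()})   # per layer: (batch, heads, tgt, src)
```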
1 vote
1 answer
8k views

I've been using LLAMA 2 for research for a few months now and I import as follows: from transformers import AutoModelForCausalLM, AutoTokenizer device = torch.device("cuda") tokenizer = ...
lucasa.lisboa
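A hedged completion of the import pattern in the excerpt above; the model id is an assumption (Llama 2 checkpoints are gated, so an accepted license and a Hugging Face token are required):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"   # assumed id; substitute the variant you use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",          # places the weights on the available GPU(s)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```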
0 votes
1 answer
120 views

I am trying to run an inference workflow of a Llama model in compile mode using transformers.pipeline(). I am using the following lines of code to run the inference workflow in compile mode: model = ...
Arunima Ghosh
0 votes
1 answer
159 views

I am trying to understand SegFormer model, and would like to use encoder and decoder separately with different models. I have tried looking into official implementation which is based on mmseg and ...
Deep • 646
