Newest 'pytorch' Questions

-2 votes

0 answers

26 views

How to efficiently train a CNN based CV model? [closed]

I would say I'm moderately experienced in deep learning and computer vision. However, I have a task to mask certain textured objects for instance segmentation. I'm not sure how to train a better model....

Talha Aydın

1

asked 23 hours ago

Advice

0 votes

0 replies

16 views

HRNet trained from scratch on MPII high PCKh but poor visual predictions

I'm training an HRNet model from scratch on the MPII dataset for human pose estimation. The model succeeded in reaching a high accuracy of around 0.93 PCKh; however, the visual results are not really promising....

msakni22

11

asked 2 days ago

1 vote

0 answers

17 views

Instance segmentation on custom COCO dataset using PyTorch Mask R-CNN + FPN for 83 categories (+background)

I am training instance segmentation on a custom COCO dataset using PyTorch Mask R-CNN + FPN for 83 categories (+background). What is the problem with the following setup, and why is the RPN head not ...

SavEng

11

asked 2 days ago

0 votes

1 answer

32 views

Installing Python with PyTorch - PowerShell not recognizing Torch package

I have a problem where I am trying to use PyCharm for PyTorch. I have installed Python separately (quite a task as it tried to install it in the Microsoft/AppData folder?). In PyCharm, I have to first ...

alexanderjansma

55

asked Dec 7 at 15:43

0 votes

0 answers

14 views

How do I implement multi-task fixed Gaussian noise in GPyTorch?

I'm attempting to perform multi-task Gaussian process regression using GPyTorch. I have, for each of N training examples, its corresponding (T x T) cross-task noise covariance matrix. I aim to ...

SirAndy3000

1

asked Dec 2 at 19:03

0 votes

0 answers

61 views

Performance problem with automatic parallelization in `pytorch`

I'm having problems with python code that uses pytorch. The details are a bit complicated (the code is part of a quantum mechanical calculation) but the code structure is very straightforward and ...

kacper

77

asked Dec 2 at 10:24

0 votes

1 answer

232 views

PyTorch not recognizing RTX 5090 (sm_120) on Windows 11 – CUDA error: no kernel image available

I'm trying to use PyTorch with an NVIDIA GeForce RTX 5090 (Blackwell architecture, CUDA Compute Capability sm_120) on Windows 11, and I keep running into compatibility issues. PyTorch detects CUDA, ...

sajjadesmaili

41

asked Nov 30 at 14:43

0 votes

0 answers

61 views

How do I visualize the latent representation produced by the Stable Diffusion VAE?

I am trying to visualize the latent representation produced by the VAE inside a Stable Diffusion pipeline: from diffusers import StableDiffusionPipeline import torch # A CUDA ordinal is simply the ...

Yilmaz

51k

asked Nov 30 at 2:36

0 votes

0 answers

33 views

AWS SageMaker PyTorch Model Deployment - is entry_point needed?

I'm trying to deploy a pre-trained PyTorch model to SageMaker using the Python SDK. I have a model.tar.gz file that is uploaded to S3, with the following structure: code/ code/requirements.txt code/...

RefresherM

1

asked Nov 28 at 15:14

Tooling

0 votes

0 replies

56 views

Good packages for bounded Linear Quantile Regression?

I'm looking for a good package to train a linear quantile regression model, i.e. $\hat y = \sum_{i=1}^n w_i \cdot x_i$, where $x_i$ are the input features and $w_i$ are the bounded trainable weights. ...

student13

13

asked Nov 28 at 14:50
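
For the bounded quantile regression question above, here is a minimal sketch of one way such a model could be fitted by hand: projected subgradient descent on the pinball (quantile) loss, with the weights clipped to a box after every step. It is written in plain Python so it runs without dependencies. The function names, the box bounds `lower`/`upper`, and the learning-rate defaults are illustrative assumptions, not part of any package the question refers to.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for a quantile level tau in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def predict(weights, features):
    """Linear model for one sample: y_hat = sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, features))


def clip_weights(weights, lower, upper):
    """Project the weights back into the assumed box [lower, upper]."""
    return [min(max(w, lower), upper) for w in weights]


def fit(X, y, tau, lower, upper, lr=0.05, steps=2000):
    """Projected subgradient descent on the pinball loss (hypothetical helper)."""
    weights = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for features, target in zip(X, y):
            diff = target - predict(weights, features)
            # Subgradient of the pinball loss with respect to the prediction.
            g = -tau if diff >= 0 else (1.0 - tau)
            for i, x in enumerate(features):
                grad[i] += g * x / len(X)
        weights = [w - lr * g for w, g in zip(weights, grad)]
        weights = clip_weights(weights, lower, upper)
    return weights
```

Clipping after each step is the simplest way to enforce box bounds; a dedicated solver or a linear-programming formulation would be the usual choice when exact optimality matters.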

0 votes

0 answers

33 views

Attribution Error when using Huggingface transformers Trainer with FSDP

I am now trying to use FSDP in Huggingface transformers Trainer. The training script is something like train_dataset = Mydataset(...) args = TrainingArguments(...) model = LlamaForCausalLM....

xuehao-049

11

asked Nov 28 at 4:11

2 votes

1 answer

90 views

Having trouble with R's torch and tensor dimensions

I am trying to follow along with this webpage: https://jtr13.github.io/cc21fall2/tutorial-on-r-torch-package.html I am trying to understand R's implementation of PyTorch. I am having some trouble with ...

Huy Pham

173

asked Nov 27 at 9:28

0 votes

0 answers

57 views

How to force NCCL build to embed PTX for all kernels (prevent linker from stripping ncclDevKernel PTX)?

I am compiling NCCL 2.27.5-1 (I tried also 2.28.9-1) from source for a V100 GPU (sm_70). My goal is to have libnccl.so contain compute_70 PTX for every kernel. Despite passing explicit -gencode=arch=...

CiZ

9

asked Nov 26 at 17:05

1 vote

0 answers

113 views

PyTorch installed via uv project shows CPU-only version on Windows with CUDA specification in pyproject.toml

I'm trying to set up a Python project using uv and pyproject.toml on Windows. I want to install the CUDA-enabled PyTorch, but after installing, when I check the version, it shows CPU-only. Here’s my ...

wonone11

11

asked Nov 25 at 9:01

Advice

0 votes

0 replies

30 views

When using TensorDictPrioritizedReplayBuffer, should I apply the priority weight manually or not?

With Prioritized Experience Replay (PER), we use the beta parameter to compute the importance-sampling weight that offsets the bias introduced by PER. Now, with PyTorch's TensorDictPrioritizedReplayBuffer, I ...

Bejo

13

asked Nov 25 at 6:43
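
As background for the PER question above, this is a small sketch of how the importance-sampling weight is usually defined in prioritized replay: sampling probability P(i) proportional to priority^alpha, and weight w_i = (N * P(i))^(-beta) normalized by the largest weight. It is plain Python and says nothing about whether TensorDictPrioritizedReplayBuffer applies the weight for you, which is the open part of the question. The function names are hypothetical.

```python
def importance_weights(priorities, alpha, beta):
    """Importance-sampling weights for prioritized replay.

    P(i) = p_i**alpha / sum_k p_k**alpha
    w_i  = (N * P(i)) ** (-beta), divided by max_j w_j.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    n = len(priorities)
    weights = [(n * p) ** (-beta) for p in probs]
    max_w = max(weights)
    return [w / max_w for w in weights]


def weighted_td_loss(td_errors, weights):
    """Mean squared TD error with each sample scaled by its weight."""
    return sum(w * e * e for w, e in zip(weights, td_errors)) / len(td_errors)
```

If a buffer already returns such a weight alongside each sample, multiplying the per-sample loss by it once (as in `weighted_td_loss`) is the step to check for; applying it twice would over-correct.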

1 vote

2 answers

130 views

PyTorch module B = A, A.to('cpu'), but the tensor in B is still on the GPU, why?

After converting module A to CPU, does the original parameter tensor still stay on the GPU? When is it released? Is it wrong if I reuse the parameter? My code: import torch.nn as nn class A(nn.Module): ...

jiwei zhang

11

asked Nov 21 at 10:11

2 votes

1 answer

28 views

PyTorch .view() operation to manipulate tensor dimensions vis a vis using torch.unbind followed by torch.cat

In Torch, .view() reshapes the tensor. However, there are multiple ways to reshape a multi-dimensional tensor to a target shape. How does it decide between those different ways? For example, in Torch, ...

Sanchit

21

asked Nov 20 at 21:47
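
On the .view() question above: for a contiguous tensor, .view() does not pick between several reshapings. It keeps the elements in their row-major storage order and only regroups them into the new shape. The sketch below mimics that regrouping on a flat list in plain Python; `view_2d` is a hypothetical helper, not a PyTorch function.

```python
def view_2d(flat, rows, cols):
    """Regroup a flat, row-major list into rows x cols without reordering."""
    if rows * cols != len(flat):
        raise ValueError("shape is incompatible with the number of elements")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


flat = [0, 1, 2, 3, 4, 5]      # storage order of a contiguous tensor
as_2x3 = view_2d(flat, 2, 3)   # [[0, 1, 2], [3, 4, 5]]
as_3x2 = view_2d(flat, 3, 2)   # [[0, 1], [2, 3], [4, 5]]
```

Reading either result row by row gives back the same flat order, which is why a view and an unbind-then-cat sequence can disagree: the latter may change the element order, the former never does.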

2 votes

1 answer

1k views

PyTorch fails on Windows Server 2019: “Error loading c10.dll” (works fine on Windows 10)

I'm trying to deploy a Python project on Windows Server 2019, but PyTorch fails to import with a DLL loading error. On my local machine (Windows 10, same Python version), everything works perfectly. ...

Rael Clariana

21

asked Nov 20 at 17:59

1 vote

1 answer

61 views

.so file built on same CPU but different EC2 instances lead to missing symbols

I am building a wheel of PyTorch from source, based on their https://github.com/pytorch/pytorch/blob/v2.6.0/.ci/manywheel/build_common.sh CI build script. I tested on a "local" instance of a ...

Corneau

193

asked Nov 18 at 21:40

Advice

0 votes

2 replies

48 views

Fixing a UNET in pytorch that doesn't work in eval mode due to BatchNorm2d layers

I have a UNET model trained in pytorch (by someone else) that produces quite different results in eval mode to train mode (train mode results look good, eval mode they are rubbish). A bit of googling ...

user18504955

11

asked Nov 17 at 11:26

0 votes

0 answers

55 views

Given groups=1, weight of size [64, 1024, 1, 1], expected input[1, 256, 1, 1] to have 1024 channels, but got 256 channels instead

I have encountered this issue and searched the forums, but I couldn't solve it. How can I solve this problem? I tried to add a CBAM module to YOLOv12 for my custom dataset to improve accuracy. I ...

partizal

33

asked Nov 17 at 11:22

0 votes

0 answers

103 views

My SimSiam is collapsing- SimSiam on CUB-200-2011 with ViT

I'm trying to implement SimSiam using a ViT backbone on the CUB-200-2011 dataset. However, during training, the embeddings collapse to a single direction despite using stop-gradient. Here’s what I ...

p10

33

asked Nov 15 at 13:48

-1 votes

0 answers

25 views

How to use the models from huggingface from local machine server

I am trying to use the following model, Emotion Llama, and I am trying to understand how to download the models from Hugging Face and place them in the right directory. It actually suggests downloading three models in ...

Jose Ramon

5,374

asked Nov 11 at 20:05

1 vote

1 answer

76 views

Is passing ray resources as options when calling the function equivalent to setting them in the function's decorator?

Is @ray.remote def run_experiment(...): (...) if __name__ == '__main__': ray.init() exp_config = sys.argv[1] params_tuples, num_cpus, num_gpus = load_exp_config(exp_config) ray.get(...

Blupon

1,091

asked Nov 10 at 14:51

0 votes

0 answers

48 views

Unclear formulation in Temporal Fusion Transformer paper

I am currently trying to implement the Temporal Fusion Transformer using PyTorch. This paper (https://arxiv.org/pdf/1912.09363) is my reference. Currently I am stuck with the variable selection ...

Haifischbecken

181

asked Nov 10 at 13:33

0 votes

0 answers

31 views

Where is EXECUTORCH_LIBRARY defined in ExecuTorch v1.0?

I’m trying to register a custom operator for ExecuTorch (v1.0, built from the PyTorch 2.5 source tree). My goal is to create a shared library that defines a few quantum operators and runs them from a ....

Melvin

1

asked Nov 10 at 4:51

0 votes

0 answers

49 views

Torch 2.4.1 doesn't utilize my system memory after CUDA memory runs out

I wrote a lot of scripts to test the compatibility of my system with PyTorch 2.4.1, and they all indicate I can run it. I don't have enough memory on my GPU, so I tried enabling expandable_segments so ...

N3.2's Channel

11

asked Nov 10 at 4:17

1 vote

1 answer

150 views

How to configure uv via pyproject.toml to lock PyTorch (+cu118) to a custom index and prevent uv run from using the CPU-only version?

I am managing a project with uv (v0.9.4) that requires a specific PyTorch CUDA build. The generic installation works, but using uv run causes a package conflict, despite the environment being correct. ...

ATILADE OKE

11

asked Nov 9 at 11:22

0 votes

0 answers

82 views

IndexError: index -1 is out of bounds for dimension 0 with size 0

I am currently experimenting with modifying the KV cache of the LLaVA model in order to perform controlled interventions during generation (similar to cache-steering methods in recent research). The ...

Pulkit Mittal

25

asked Nov 7 at 7:41

0 votes

1 answer

35 views

How can I get torch.set_grad_enabled(True) to work in ComfyUI?

I just spent hours figuring out that the following code fails when included in a ComfyUI custom node, but works perfectly fine outside (using the same Python venv). I finally found out that someone ...

user2845840

396

asked Nov 5 at 22:38

0 votes

1 answer

80 views

Unable to step into torch.nn.functional.linear using VS Code debugging

I want to step into the linear function using VS Code's step-in, but it skips automatically when I click "step into". Could anyone help me with this? I used DEBUG=1 when compiling PyTorch. ...

Shui_

33

asked Nov 5 at 13:20

1 vote

0 answers

68 views

Should I use torch.inference_mode() in a prediction method even when using model.eval()? [duplicate]

I'm following the book "Deep Learning with PyTorch Step By Step" and I have a question about the predict method in the StepByStep class (from this repository: GitHub). The current ...

Matteo

93

asked Nov 4 at 12:43

1 vote

0 answers

186 views

Transformers 'could not import module pipeline' to jupyter notebook

I need to run a series of pre-trained, fine-tuned models from Hugging Face in a Jupyter notebook. I have updated to the latest version of both PyTorch and Transformers, but when I run the code from ...

Alex Colville

11

asked Nov 4 at 9:16

Advice

2 votes

0 replies

89 views

How should I balance DSA, ML fundamentals, PyTorch implementation, and Kaggle practice for ML Engineer interviews?

I’m a Computer Science graduate preparing for ML/AI Engineer roles. I’m facing a dilemma about what to focus on, how much to allocate time to each area, and what exact roadmap to follow to prepare ...

syntaxprnv

11

asked Oct 31 at 19:35

2 votes

0 answers

117 views

I get the error "ImportError: libcudnn.so.9: cannot open shared object file: No such file or directory" when I try to use torch in a virtual env

I have installed CUDA 13 on Fedora 42. When I use PyTorch locally, torch works fine, but when I create a virtualenv, PyTorch can't find the libcudnn files. I get the error ImportError: libcudnn.so.9: ...

TR SIXtree

29

asked Oct 31 at 9:21

2 votes

2 answers

94 views

Decoder only model AI making repetitive responses

I am making a decoder-only transformer using PyTorch, and my dataset of choice is the full English dataset from Kaggle, Plaintext Wikipedia (full English). The problem is that my model output is ...

Kirito

13

asked Oct 29 at 14:32

0 votes

1 answer

77 views

Generating response with KV Cached System Prompt throws error when Input Tokens are less than Prompt Tokens

I am trying to run Mistral-7B-Instruct-v0.2. Each run is PROMPT + details[i]. PROMPT has instructions on how to generate JSON based on details. As the prefix part of each input is the same, kind of like a ...

acdhemtos

1

asked Oct 28 at 22:54

2 votes

1 answer

39 views

AttributeError: 'NoneType' object has no attribute 'blocks' when running Cache-DiT example with Wan2.2 model

I’m trying to use Cache-DiT to accelerate inference for the Wan2.2 model. However, when I run the example script, python run_wan_2.2_i2v.py --steps 28 --cache I get the following error. Namespace(...

傅靖茹

51

asked Oct 27 at 9:21

0 votes

0 answers

39 views

How do I interpret Gaussian process parameters?

I'm performing Gaussian process regression using GPyTorch. I'm modeling two correlated tasks as follows: class MyModel(gpytorch.models.ExactGP): def __init__(self, X, Y, likelihood): super(...

SirAndy3000

1

asked Oct 26 at 15:18

2 votes

0 answers

59 views

Having problems computing PDE Residuals

I'm computing PDE residuals for The_Well datasets (e.g. turbulent_radiative_layer_2D and shear_flow) using finite differences, but the residuals are much larger than I expect. The data are generated ...

Kain

21

asked Oct 26 at 10:22

0 votes

1 answer

29 views

Can I avoid setting-up and tearing down processes when using PyTorch DataLoader?

In my scenario I use multiple DataLoaders with multiple Datasets to evaluate models against each other (I want to test models with multiple resolutions, which means each dataset has a distinct ...

Yuval

3,598

asked Oct 24 at 17:10

1 vote

1 answer

125 views

Can uv integrate with e.g. pytorch prebuilt docker env?

So, pytorch requires a rather large bundle of packages. The prebuilt docker pytorch gpu images (https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/running.html) are quite helpful in ...

helt

5,347

asked Oct 23 at 18:18

1 vote

0 answers

186 views

Why does “Command Buffer Full” appear in PyTorch CUDA kernel launches?

I’m using the PyTorch profiler to analyze sglang, and I noticed that in the CUDA timeline, some kernels show “Command Buffer Full”. This causes the cudaLaunchKernel time to become very long, as shown ...

plznobug

143

asked Oct 23 at 12:36

0 votes

0 answers

98 views

ModuleNotFoundError: No module named 'losses.loss'; 'losses' is not a package error when training MAT model (PyTorch, NVIDIA repo)

I'm trying to fine-tune the MAT (Masked Attention Transformer) model from the official repository: https://github.com/fenglinglwb/MAT However, I keep getting the following error during training: ...

kitten3032

1

asked Oct 23 at 10:16

-1 votes

0 answers

38 views

open3d.ml build for torch==2.10 (for sm_120 GPU architecture)

I have an NVIDIA GeForce RTX 5060 with the "Blackwell" architecture and compute capability 12.0, which is why I have to use the nightly build pytorch=2.10.0.dev20251017+cu128, which has support for ...

msaLina

21

asked Oct 21 at 16:45

0 votes

0 answers

98 views

Torch example transformer with TransformerDecoder

In the torch example provided here https://github.com/pytorch/examples/tree/main/word_language_model, the transformer only uses torch.TransformerEncoder and torch.TransformerDecoder is overwritten with a ...

cuneyttyler

1,395

asked Oct 21 at 8:48

0 votes

0 answers

40 views

T5-small generates only padding tokens during validation/test in PyTorch Lightning

I'm fine-tuning T5-small using PyTorch Lightning and encountering a strange issue during validation and test steps. The Problem: During validation_step and test_step, model.generate() consistently ...

GeraniumCat

21

asked Oct 20 at 20:11

-1 votes

0 answers

75 views

Torchvision save segmentation masks to png

I am trying to follow a tutorial, https://docs.pytorch.org/tutorials/intermediate/torchvision_tutorial.html, that works with .png files as segmentation masks. The png files can be found here: https://...

Paul Borowy

57

asked Oct 20 at 14:27

2 votes

1 answer

123 views

Fast vectorized maximal independent set greedy algorithm [closed]

I need a really fast vectorized maximal independent set algorithm implemented in pytorch, so I can use it for tasks with thousands of nodes in reasonable time. I cannot use networkx, it is way too ...

Kemsikov

640

asked Oct 20 at 13:58

1 vote

0 answers

68 views

How to pass P_map: dict[str, torch.Tensor] to PEFT (LoRA)?

My proxy goal is to change LoRA from h = (W + BA)x to h = (W + BAP)x. Preliminary code is attached for your reference. My actual goal is to train a model with the following loss: Θ̃ = arg min_Δ̂ ‖f_(...

Jason Rich Darmawan

2,193

asked Oct 15 at 5:25
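
To make the two forms in the LoRA question above concrete, here is a tiny plain-Python sketch of h = (W + BA)x next to h = (W + BAP)x, using lists of rows as matrices. It assumes P is a square matrix whose size matches the input dimension of W; the question does not state the shape of P, and `lora_forward` is a hypothetical helper, not part of PEFT.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a, b):
    """Add two matrices of the same shape element-wise."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_forward(W, B, A, x, P=None):
    """Compute h = (W + B A P) x; P=None gives standard LoRA, h = (W + B A) x."""
    delta = matmul(B, A)           # low-rank update, shape of W
    if P is not None:
        delta = matmul(delta, P)   # assumed: P is d_in x d_in
    return matmul(matadd(W, delta), x)
```

With P set to the identity the two forms agree, which is a convenient sanity check before wiring a per-module P into a real adapter.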
