CUDA Error - Kernel execution failed with invalid device function

Question

I am trying to run the CIFAR-10 example after successfully compiling cuda-convnet2, but I get this error at runtime:

src/nvmatrix.cu(394) : getLastCudaError() CUDA error : kSetupCurand: Kernel execution failed : (8) invalid device function .

I am running Linux with a Zotac NVIDIA GeForce GTX 750 Ti GPU. Here is the log output:

$ python convnet.py --data-provider cifar --test-range 6 --train-range 1-5 --data-path cifar/cifar-10-py-colmajor --inner-size 24 --save-path cifar/save/ --gpu 0 --layer-def layers/layers-cifar10-11pct.cfg --layer-params layers/layer-params-cifar10-11pct.cfg
python: can't open file 'convnet.py': [Errno 2] No such file or directory
pbu@pbu-OptiPlex-740-Enhanced:~/Desktop$ cd cuda-convnet2
pbu@pbu-OptiPlex-740-Enhanced:~/Desktop/cuda-convnet2$ python convnet.py --data-provider cifar --test-range 6 --train-range 1-5 --data-path cifar/cifar-10-py-colmajor --inner-size 24 --save-path cifar/save/ --gpu 0 --layer-def layers/layers-cifar10-11pct.cfg --layer-params layers/layer-params-cifar10-11pct.cfg
Initialized data layer 'data', producing 1728 outputs
Initialized data layer 'labels', producing 1 outputs
Initialized convolutional layer 'conv1' on GPUs 0, producing 24x24 64-channel output
Initialized max-pooling layer 'pool1' on GPUs 0, producing 12x12 64-channel output
Initialized cross-map response-normalization layer 'rnorm1' on GPUs 0, producing 12x12 64-channel output
Initialized convolutional layer 'conv2' on GPUs 0, producing 12x12 64-channel output
Initialized cross-map response-normalization layer 'rnorm2' on GPUs 0, producing 12x12 64-channel output
Initialized max-pooling layer 'pool2' on GPUs 0, producing 6x6 64-channel output
Initialized locally-connected layer 'local3' on GPUs 0, producing 6x6 64-channel output
Initialized locally-connected layer 'local4' on GPUs 0, producing 6x6 32-channel output
Initialized fully-connected layer 'fc10' on GPUs 0, producing 10 outputs
Initialized softmax layer 'probs' on GPUs 0, producing 10 outputs
Initialized logistic regression cost 'logprob' on GPUs 0
Initialized neuron layer 'conv2_neuron' on GPUs 0, producing 9216 outputs
Initialized neuron layer 'conv1_neuron' on GPUs 0, producing 36864 outputs
Initialized neuron layer 'local4_neuron' on GPUs 0, producing 1152 outputs
Initialized neuron layer 'local3_neuron' on GPUs 0, producing 2304 outputs
Layer local4_neuron using acts from layer local4
Layer conv2_neuron using acts from layer conv2
Layer local3_neuron using acts from layer local3
Layer conv1_neuron using acts from layer conv1
=========================
Importing cudaconvnet._ConvNet C++ module
Fwd terminal: logprob
found bwd terminal conv1[0] in passIdx=0
=========================
Training ConvNet
Add PCA noise to color channels with given scale                        : 0     [DEFAULT]
Check gradients and quit?                                               : 0     [DEFAULT]
Conserve GPU memory (slower)?                                           : 0     [DEFAULT]
Convert given conv layers to unshared local                             :       
Cropped DP: crop size (0 = don't crop)                                  : 24    
Cropped DP: test on multiple patches?                                   : 0     [DEFAULT]
Data batch range: testing                                               : 6-6   
Data batch range: training                                              : 1-5   
Data path                                                               : cifar/cifar-10-py-colmajor 
Data provider                                                           : cifar 
Force save before quitting                                              : 0     [DEFAULT]
GPU override                                                            : 0     
Layer definition file                                                   : layers/layers-cifar10-11pct.cfg 
Layer file path prefix                                                  :       [DEFAULT]
Layer parameter file                                                    : layers/layer-params-cifar10-11pct.cfg 
Load file                                                               :       [DEFAULT]
Logreg cost layer name (for --test-out)                                 :       [DEFAULT]
Minibatch size                                                          : 128   [DEFAULT]
Number of epochs                                                        : 50000 [DEFAULT]
Output test case predictions to given path                              :       [DEFAULT]
Save file override                                                      :       
Save path                                                               : cifar/save/ 
Subtract this scalar from image (-1 = don't)                            : -1    [DEFAULT]
Test and quit?                                                          : 0     [DEFAULT]
Test on one batch at a time?                                            : 1     [DEFAULT]
Testing frequency                                                       : 57    [DEFAULT]
Unshare weight matrices in given layers                                 :       
Write test data features from given layer                               :       [DEFAULT]
Write test data features to this path (to be used with --write-features):       [DEFAULT]
=========================
Running on CUDA device(s) 0
Current time: Thu Jan 15 20:15:50 2015
Saving checkpoints to cifar/save/ConvNet__2015-01-15_20.15.47
=========================
src/nvmatrix.cu(394) : getLastCudaError() CUDA error : kSetupCurand: Kernel execution failed : (8) invalid device function .

This probably means the code you are using hasn't been compiled for the architecture you are trying to run it on.
– talonmies, Commented Jan 15, 2015 at 19:38
Thanks, but I compiled it successfully on my machine; it only crashes when I run the executable. :(
– pbu, Commented Jan 16, 2015 at 15:42
By architecture, I mean the GPU architecture. You must make sure that the code has been compiled for your GPU (or for an architecture that can be JIT-recompiled for your GPU); otherwise it will fail at runtime.
– talonmies, Commented Jan 16, 2015 at 15:44
Thanks talonmies :) My GPU is a GeForce GTX 750 Ti. Do I need to tweak the Makefile?
– pbu, Commented Jan 16, 2015 at 15:49
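
One way to check the point made in the comments is to list which GPU architectures a compiled module actually contains. The sketch below assumes the CUDA toolkit's `cuobjdump` is on the PATH and that its `--list-elf` output names each embedded cubin with an `sm_XX` suffix; the helper names `embedded_archs` and `has_arch_for` and the `.so` path are made up for illustration.

```shell
# Print the distinct sm_XX architectures found in `cuobjdump --list-elf`
# output, which is read from stdin.
embedded_archs() {
    grep -o 'sm_[0-9][0-9]*' | sort -u
}

# Succeed if stdin mentions a cubin for compute capability $1 (e.g. "5.0").
has_arch_for() {
    want="sm_$(printf '%s' "$1" | tr -d '.')"
    embedded_archs | grep -qx "$want"
}

# Example usage (placeholder path, requires the CUDA toolkit):
#   cuobjdump --list-elf path/to/module.so | embedded_archs
#   cuobjdump --list-elf path/to/module.so | has_arch_for 5.0 \
#       || echo "no sm_50 machine code embedded"
```

A missing `sm_50` entry is only fatal if no compatible PTX is embedded either, since PTX for an older virtual architecture can be JIT-compiled; `cuobjdump --list-ptx` shows whether any PTX is present.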

oneg · Accepted Answer · 2015-02-08 03:44:20Z

You may need to modify the following Makefiles:

cudaconv3/Makefile
cudaconvnet/Makefile
nvmatrix/Makefile

and change

GENCODE_SM35    := -gencode arch=compute_35,code=sm_35
GENCODE_FLAGS   := $(GENCODE_SM35)

to

GENCODE_SM35    := -gencode arch=compute_35,code=sm_35
GENCODE_SM50    := -gencode arch=compute_50,code=sm_50
GENCODE_FLAGS   := $(GENCODE_SM50)

since the GTX 750 Ti has compute capability 5.0.
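
The same change can be scripted rather than made by hand. This is only a sketch: `retarget_makefile` is a hypothetical helper, and it assumes each Makefile has a single `GENCODE_FLAGS := ...` line like the one quoted above. It writes the `-gencode` option straight into `GENCODE_FLAGS` instead of adding a `GENCODE_SM50` variable, which has the same effect for nvcc.

```shell
# Rewrite the GENCODE_FLAGS line of Makefile $1 to target sm_$2 (e.g. 50).
# The original file is kept next to it with a .bak suffix.
retarget_makefile() {
    sed -i.bak \
        "s|^GENCODE_FLAGS[[:space:]]*:=.*|GENCODE_FLAGS   := -gencode arch=compute_$2,code=sm_$2|" \
        "$1"
}

# Apply it to the three Makefiles named above, run from the repository root.
for mk in cudaconv3/Makefile cudaconvnet/Makefile nvmatrix/Makefile; do
    if [ -f "$mk" ]; then
        retarget_makefile "$mk" 50
    fi
done

# Afterwards, do a clean rebuild so that objects already compiled for
# sm_35 are not reused.
```

If the build should also run on other GPUs, several `-gencode` options can be listed on the same line, one per target architecture.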
