
I'm currently writing an OpenCL kernel (but I suppose it will be the same in CUDA), and I'm trying to optimize it for NVIDIA GPUs.

My kernel currently uses 63 registers. It is very big, so it uses all the registers available to it. I'm looking for a way to:

1) See which variables are kept in registers and which are moved to global memory (it seems that when there are not enough registers, the compiler stores variables in global memory).

2) Specify which variables are more important (i.e. which ones should stay in registers). Some of my variables are present for a long time but rarely used. Is there a way to give priority?

Are there other optimization strategies when all the registers are already in use?

BTW: I have also tried reading the PTX code and searching for all the ".reg" declarations, but the PTX is unreadable: I don't know which register is used for which variable in my code, and I haven't found any way to get that correspondence!

Thanks

3 Answers

3

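(1) This is called register spilling. I don't think there is a way to find out which variables get spilled except by examining the SASS assembly. OpenCL is first compiled to PTX, which targets a virtual machine with an unlimited number of registers, so there is no spilling at the PTX level; register allocation and spilling only happen when the PTX is compiled to SASS. That is also why the ".reg" declarations in the PTX don't tell you what ends up in hardware registers. See the NVIDIA presentation "Local Memory and Register Spilling" for more information.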

(2) You can try using the volatile keyword when declaring the variables that you don't want kept in registers. volatile forces the compiler to push the variable out to memory instead of carrying it in a register between operations.


2 Comments

volatile also prevents a lot of optimizations, however, so it's not necessarily a win.
volatile is a bad idea, because every use of the variable will trigger a memory read, and from global memory that will be slow.
2

See which variables are in registers and which are then in global memory

I don't know how to check this. However, as for:

Is there a way to specify which variable is more important

One trick I use when I see spilled registers (because there aren't enough of them, or because I need dynamic indexing into private arrays, which is bad) is to explicitly store the variables I consider less critical in local memory (called "shared" memory in CUDA).

For example, before:

uint16 somedata;

after:

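__local uint16 somedata[WG_SIZE]; // or __local uint somedata[16 * WG_SIZE];
// One slot per work-item: each work-item uses somedata[get_local_id(0)].
// WG_SIZE must be a compile-time constant (e.g. passed with -D WG_SIZE=256).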

But beware: if your local memory usage increases greatly, you risk a penalty because fewer wavefronts (warps, in NVIDIA terms) can be in flight at once, i.e. you may get lower occupancy.
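To get a feel for that trade-off, the per-SM limits can be turned into a bit of arithmetic. The sketch below is plain C under a deliberately simplified model: it ignores register and local-memory allocation granularity, and the limits in `FERMI_LIKE` are assumed Fermi-class (compute capability 2.x) figures that you should check against your own GPU. `sm_limits`, `resident_groups` and `FERMI_LIKE` are hypothetical names used only for illustration.

```c
/* Simplified per-SM resource model (ignores allocation granularity). */
typedef struct {
    unsigned regs_per_sm;         /* 32-bit registers per SM */
    unsigned local_mem_per_sm;    /* bytes of __local ("shared") memory per SM */
    unsigned max_threads_per_sm;  /* resident work-items per SM */
    unsigned max_groups_per_sm;   /* resident work-groups per SM */
} sm_limits;

/* Assumed Fermi-class limits (compute capability 2.x, 48 KB shared config). */
const sm_limits FERMI_LIKE = { 32768u, 48u * 1024u, 1536u, 8u };

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Number of work-groups that can be resident on one SM at the same time.
   wg_size and regs_per_item must be nonzero. */
unsigned resident_groups(sm_limits sm, unsigned wg_size,
                         unsigned regs_per_item,
                         unsigned local_bytes_per_group)
{
    /* How many whole groups fit in the register file. */
    unsigned by_regs = sm.regs_per_sm / (regs_per_item * wg_size);
    /* How many whole groups fit under the resident work-item limit. */
    unsigned by_threads = sm.max_threads_per_sm / wg_size;
    /* How many whole groups fit in local memory (no limit if none is used). */
    unsigned by_local = local_bytes_per_group == 0
                            ? sm.max_groups_per_sm
                            : sm.local_mem_per_sm / local_bytes_per_group;
    /* The tightest limit wins. */
    return min_u(min_u(by_regs, by_threads),
                 min_u(by_local, sm.max_groups_per_sm));
}
```

For example, with 256-item work-groups, `resident_groups(FERMI_LIKE, 256, 63, 0)` gives 2 groups (512 work-items per SM). A hypothetical version of the kernel needing only 32 registers would give 4, but adding one `__local uint16` slot per work-item (256 × 64 bytes = 16 KB per group) caps it at 3, because 48 KB / 16 KB = 3. The register counts are illustrative, not measurements of any real kernel.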

Hope this helps.

2 Comments

__local uint16 somedata means all work-items in the group see the same variable instead of a private copy, so the behavior will be different.
Yes, you are right, I made a typo; it should have been __local uint16 somedata[WG_SIZE];. I'll correct it.
1

Without seeing the code: one way to try to "force" the use of registers is to use local copies in a limited scope. Perhaps only some of your variables are accessed in a given part of your code. You can then declare new variables in a scope and use those intensively. There is no guarantee, but I know it sometimes helps.

int a, b, d;
double x, y;

...

{
    // Copy into new variables whose short live range makes them
    // more likely (not guaranteed) to be kept in registers.
    int ra = a;
    double rx = x;

    ... use rx and ra intensively ...

    // Copy the results back.
    a = ra;
    x = rx;
}

...

4 Comments

Thanks a lot for all your answers. I have also tried reading the PTX code and searching for all the ".reg" declarations, but the PTX is unreadable: I don't know which register is used for which variable in my code, and I haven't found any way to get that correspondence!
It seems the best way to optimize register usage is to use {...} to define scopes. I'm surprised the compiler can't do this without any help!
Something else: in CUDA we can use cudaDeviceSetCacheConfig to increase the size of the L1 cache. Does anyone know whether there is a way to call it from OpenCL code, even a non-standard one?
Some related questions: 1) Are my structures also stored in registers, or in global memory, L1, etc.? 2) If my values are put in the L1 cache, can they also suffer from memory bank conflicts? Thanks
