Does nvcc optimize register usage?

Question

I have the following kernel:

    __global__ void version1(float *X, float *Y, int N) {
        int n;
        float x,y;

        n = blockIdx.x * blockDim.x + threadIdx.x;
        if (n >= N) return;

        x=X[n];
        x=x+1;
        X[n]=x;

        y=Y[n];
        y=y+1;
        Y[n]=y;
    }

and a second version

    __global__ void version2(float *X, float *Y, int N) {
        int n;
        float Xb47w;

        n = blockIdx.x * blockDim.x + threadIdx.x;
        if(n >= N) return;

        Xb47w=X[n];
        Xb47w=Xb47w+1;
        X[n]=Xb47w;

        Xb47w=Y[n];
        Xb47w=Xb47w+1;
        Y[n]=Xb47w;
    }

They produce the same result. However, version1 is easier to read, while version2 is harder to follow because Xb47w is used for X as well as for Y. So I would prefer version1, but it uses two registers (x and y) instead of the single Xb47w in version2. I have a lot of kernels where I save registers this way, but they are more difficult to read and maintain.

x is no longer used after X[n]=x so I wonder if the CUDA compiler understands that and makes version1 nearly identical to version2, thus saving one register?

einpoklum · Accepted Answer · 2020-07-31 16:45:57Z

2

> Does nvcc optimize register usage?

Yes, nvcc tries to compile your code to use fewer registers (although minimal register use is not in itself the goal).

> I wonder if the CUDA compiler understands that and makes version1 nearly identical to version2, thus saving one register?

Yes, it does. Or rather, it doesn't "understand" what your code does, but it notices redundant variables/values and removes them as part of the optimization process.

Thus, both versions of your function compile to the same PTX code (GodBolt.org).
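PTX works with virtual registers, and the final register allocation is done later by ptxas when it generates the machine code for a specific GPU. If you want to confirm the per-thread register count of the compiled kernels rather than read PTX, one option is to ask the runtime with `cudaFuncGetAttributes`. The following is only a sketch: the helper name `regsPerThread` is made up for illustration, and the kernels are the ones from the question with `__global__` added on the assumption that they are meant as kernels.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the question, with two separate temporaries.
__global__ void version1(float *X, float *Y, int N) {
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= N) return;

    float x = X[n];
    x = x + 1;
    X[n] = x;

    float y = Y[n];
    y = y + 1;
    Y[n] = y;
}

// Kernel from the question, reusing one temporary by hand.
__global__ void version2(float *X, float *Y, int N) {
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= N) return;

    float Xb47w = X[n];
    Xb47w = Xb47w + 1;
    X[n] = Xb47w;

    Xb47w = Y[n];
    Xb47w = Xb47w + 1;
    Y[n] = Xb47w;
}

// Hypothetical helper (not part of CUDA): returns the number of registers
// each thread of the kernel uses, or -1 if the query fails.
static int regsPerThread(void (*kernel)(float *, float *, int)) {
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, kernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return attr.numRegs;
}

int main() {
    // The counts depend on the target architecture and compiler version,
    // so compare the two numbers rather than expecting a specific value.
    printf("version1: %d registers per thread\n", regsPerThread(version1));
    printf("version2: %d registers per thread\n", regsPerThread(version2));
    return 0;
}
```

The same information is printed at compile time if you pass `-Xptxas -v` to nvcc, which may be more convenient when you have many kernels to check.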

Thomas Caissard · Accepted Answer · 2020-07-31 15:43:52Z

2

Internally, nvcc uses a C++ compiler to optimize the code (well, I'm oversimplifying; see link). The question would therefore be: would a C++ compiler save a register?

And the answer is: use Godbolt and compare the assembly of your two programs!

Edit: that's not the whole story. What you are going to see is the PTX representation of your program (which you can also obtain using nvcc). The next step would be to look at the GPU assembly itself, called SASS (which is card-dependent).

edited Jul 31, 2020 at 15:43

answered Jul 31, 2020 at 15:37

4 Comments

einpoklum Over a year ago

With GodBolt, you can compare the compiled "PTX" code of the kernels. That's not quite an assembly language; it's an intermediate representation that's close to the assembly language of NVIDIA GPUs and common to all of them - but isn't itself the assembly of any of them.

Thomas Caissard Over a year ago

Well yeah, hence the "oversimplifying things". But it would give you a rough idea of what it's doing.

YLS Over a year ago

So I understand that I cannot be sure that nvcc compiles version1 as version2. That means that I have to stick with this register management, which can be confusing.

einpoklum Over a year ago

@YLS: No, you misunderstand.
