I have the following kernel:

    __global__ void version1(float *X, float *Y, int N) {
        int n;
        float x, y;

        n = blockIdx.x * blockDim.x + threadIdx.x;
        if (n >= N) return;

        x = X[n];
        x = x + 1;
        X[n] = x;

        y = Y[n];
        y = y + 1;
        Y[n] = y;
    }

and a second version

    __global__ void version2(float *X, float *Y, int N) {
        int n;
        float Xb47w;

        n = blockIdx.x * blockDim.x + threadIdx.x;
        if(n >= N) return;

        Xb47w=X[n];
        Xb47w=Xb47w+1;
        X[n]=Xb47w;

        Xb47w=Y[n];
        Xb47w=Xb47w+1;
        Y[n]=Xb47w;
    }

They produce the same result. However, version1 is simpler to read, while version2 is harder because Xb47w is used for X as well as for Y. So I would prefer version1, but it uses two registers (x and y) instead of the single Xb47w in version2. I have a lot of kernels where I save registers this way, but they are more difficult to read and maintain.

x is no longer used after X[n]=x so I wonder if the CUDA compiler understands that and makes version1 nearly identical to version2, thus saving one register?

2 Answers

> Does nvcc optimize register usage?

Yes, nvcc tries to compile your code to use fewer registers (although minimal register use is not in itself the goal).

> I wonder if the CUDA compiler understands that and makes version1 nearly identical to version2, thus saving one register?

Yes, it does. Or rather, it doesn't "understand" what your code does, but it notices redundant variables/values and removes them as part of the optimization process.

Thus, both versions of your function compile to the same PTX code (GodBolt.org).

Internally, nvcc uses a C++ compiler to optimize the code (well, I'm oversimplifying). The question, therefore, becomes: would a C++ compiler save a register?

And the answer is: use Godbolt and compare the assembly of your two programs!

Edit: that's not the whole story. What you are going to see is the PTX representation of your program (which you can also obtain using nvcc). The next step would be to look at the GPU assembly itself, called SASS (which is card dependent).
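
Since the final register allocation happens when the SASS is generated, one way to check it without reading assembly is to ask the runtime how many registers each compiled kernel uses. The following is only a sketch: the helper name registersPerThread is made up for illustration, and the numbers depend on your GPU architecture and nvcc version, so none are shown here. It relies on cudaFuncGetAttributes and the numRegs field of cudaFuncAttributes from the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The two kernels from the question.
__global__ void version1(float *X, float *Y, int N) {
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= N) return;

    float x = X[n];
    x = x + 1;
    X[n] = x;

    float y = Y[n];
    y = y + 1;
    Y[n] = y;
}

__global__ void version2(float *X, float *Y, int N) {
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= N) return;

    float Xb47w = X[n];
    Xb47w = Xb47w + 1;
    X[n] = Xb47w;

    Xb47w = Y[n];
    Xb47w = Xb47w + 1;
    Y[n] = Xb47w;
}

// Hypothetical helper (not part of CUDA): returns the number of registers
// per thread allocated for a kernel, or -1 if the query fails
// (for example when no CUDA device is available).
int registersPerThread(const void *kernel) {
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, kernel) != cudaSuccess) return -1;
    return attr.numRegs;
}

int main() {
    int r1 = registersPerThread(reinterpret_cast<const void *>(&version1));
    int r2 = registersPerThread(reinterpret_cast<const void *>(&version2));
    std::printf("version1: %d registers per thread\n", r1);
    std::printf("version2: %d registers per thread\n", r2);
    return 0;
}
```

The same per-kernel figure is reported at compile time if you pass -Xptxas -v to nvcc, and cuobjdump -sass on the resulting binary shows the SASS itself if you want to compare the two kernels instruction by instruction.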

4 Comments

With GodBolt, you can compare the compiled "PTX" code of the kernels. That's not quite an assembly language; it's an intermediate representation that's close to the assembly language of NVIDIA GPUs and common to all of them - but isn't itself the assembly of any of them.
Well yeah, hence the oversimplifying. But it would give you a rough idea of what it's doing.
So I understand that I cannot be sure that nvcc compiles version1 as version2. That means that I have to stick with this register management, which can be confusing.
@YLS: No, you misunderstand.
