For several programs, I see that cache utilization (both L2 and unified) is low for most kernels: at most 3 on a scale of 1 to 10. These are not toy programs. Is that normal? The device is an M2000.

I would also like to know how cache utilization is measured. I didn't find any explanation of that in the documentation.

1 Answer

If the kernel is limited by some other factor, for example if it is compute bound or memory bound, then low cache utilization is normal. The only way to get really high cache utilization (7 or higher) is to have a lot of data reuse in that cache.

Cache utilization is presumably reported as a fraction of peak cache bandwidth on a scale from 0 to 10, where 10 corresponds to 100% (apparently with some normalization).

Often (this varies by GPU and is not clearly published), the available L2 cache bandwidth is around 2x or more the available memory (i.e. GPU DRAM) bandwidth. Therefore, to get a reading above 5 on this metric, the data bandwidth into your code as seen at the L2 would have to be higher than the memory bandwidth. This usually implies data reuse.
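To make the arithmetic concrete, here is a rough sketch of that reasoning in Python. It is only a model under stated assumptions, not the profiler's actual formula: the linear mapping to a 0 to 10 level, the helper name `utilization_level`, and the 100 GB/s DRAM figure are all hypothetical, and the 2x ratio is just the rough estimate from above.

```python
def utilization_level(achieved_gbps: float, peak_gbps: float) -> int:
    """Map achieved bandwidth to a 0-10 level as a fraction of peak.

    Assumption: a simple linear mapping. The real profiler apparently
    applies some normalization that is not documented.
    """
    fraction = min(max(achieved_gbps / peak_gbps, 0.0), 1.0)
    return round(fraction * 10)


# Hypothetical numbers, chosen only for illustration.
DRAM_PEAK_GBPS = 100.0   # assumed DRAM bandwidth, not a published spec
L2_TO_DRAM_RATIO = 2.0   # "around 2x or more" from the answer
L2_PEAK_GBPS = DRAM_PEAK_GBPS * L2_TO_DRAM_RATIO

# A streaming kernel with no reuse: every byte seen at L2 comes from DRAM,
# so L2 traffic cannot exceed DRAM bandwidth.
streaming = utilization_level(DRAM_PEAK_GBPS, L2_PEAK_GBPS)

# A kernel with reuse: L2 serves 1.6x the DRAM bandwidth because part of
# the traffic hits in cache.
reuse = utilization_level(1.6 * DRAM_PEAK_GBPS, L2_PEAK_GBPS)

print(streaming, reuse)
```

Under these assumptions, even a kernel that fully saturates DRAM tops out at level 5 on the L2 metric, and anything higher requires L2 hits that do not go to DRAM.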

It should be possible to write a test microbenchmark to explore this.

1 Comment

Thanks for that. I have also seen that for some kernels the reported L2 utilization is n/a. Does that mean the kernel doesn't use the cache at all? The L2 hit rate, however, is a number greater than zero.
