If the kernel is limited by some other factor, e.g. it is compute bound or memory bound, then it is normal for the cache utilization to be low. The only way to get a really high cache utilization reading (7 or higher) is to have a lot of data reuse in that cache.
The cache utilization metric is reported on a scale from 0 to 10, where 10 corresponds to 100% of peak cache bandwidth (apparently with some normalization).
Often (this varies by GPU and is not clearly published) the available L2 cache bandwidth is around 2x or more the available memory (i.e. GPU DRAM) bandwidth. A kernel that streams data from DRAM with no reuse can therefore drive the L2 at no more than about half of its peak, which corresponds to a reading of about 5. To get a reading above 5 on this metric, the data bandwidth into your code as seen at the L2 would have to be higher than memory bandwidth. This usually implies data reuse.
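To make the arithmetic concrete, here is a small sketch of how the 0 to 10 reading relates to bandwidth. It assumes the scale is a plain linear mapping of achieved L2 bandwidth onto peak L2 bandwidth (the actual normalization is not documented), and the bandwidth figures and the name `utilization_level` are hypothetical, chosen only for illustration.

```python
def utilization_level(l2_traffic_gbs: float, l2_peak_gbs: float) -> float:
    """Map achieved L2 bandwidth onto a 0-10 scale.

    ASSUMPTION: the mapping is linear; the real metric may apply
    additional normalization.
    """
    fraction = min(l2_traffic_gbs / l2_peak_gbs, 1.0)
    return 10.0 * fraction


# Hypothetical numbers, not taken from any specific GPU.
DRAM_PEAK_GBS = 300.0
L2_PEAK_GBS = 2.0 * DRAM_PEAK_GBS  # the "around 2x" ratio mentioned above

# A pure streaming kernel: every byte seen at the L2 comes from DRAM,
# so L2 traffic cannot exceed DRAM bandwidth.
streaming_level = utilization_level(DRAM_PEAK_GBS, L2_PEAK_GBS)

# A kernel with reuse: L2 traffic exceeds what DRAM alone could supply.
reuse_level = utilization_level(480.0, L2_PEAK_GBS)

print(f"streaming, no reuse: {streaming_level:.1f}")
print(f"with reuse:          {reuse_level:.1f}")
```

Under these assumptions the streaming case tops out at the midpoint of the scale, and only the case where L2 traffic exceeds DRAM bandwidth gets above it.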
It should be possible to write a test microbenchmark to explore this.
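Before writing the actual GPU kernel, it can help to predict what such a microbenchmark should show. The sketch below is only an idealized model, not a measurement: it assumes a kernel that reads a buffer of a given size a number of times, that the buffer stays resident in L2 whenever it fits, and that the reading is linear in achieved L2 bandwidth. The L2 size, the bandwidths, and the name `predicted_level` are all hypothetical.

```python
MIB = 1024 * 1024


def predicted_level(working_set_bytes: int, passes: int,
                    l2_size_bytes: int, dram_gbs: float,
                    l2_peak_gbs: float) -> float:
    """Idealized prediction of the 0-10 L2 utilization reading.

    ASSUMPTIONS: perfect L2 residency when the buffer fits, no reuse at
    all when it does not, and a linear 0-10 scale.
    """
    if working_set_bytes <= l2_size_bytes:
        # First pass is fed by DRAM, later passes hit in L2, so L2
        # traffic can run up to `passes` times the DRAM rate, capped
        # at the L2 peak.
        l2_gbs = min(l2_peak_gbs, dram_gbs * passes)
    else:
        # Buffer is evicted before it is re-read: every pass streams
        # from DRAM, so L2 traffic is capped at DRAM bandwidth.
        l2_gbs = dram_gbs
    return 10.0 * l2_gbs / l2_peak_gbs


# Hypothetical device parameters.
L2_SIZE = 4 * MIB
DRAM_GBS = 300.0
L2_GBS = 600.0

# Sweep buffer size and number of passes, as the real benchmark would.
sweep = {
    (size_mib, passes): predicted_level(size_mib * MIB, passes,
                                        L2_SIZE, DRAM_GBS, L2_GBS)
    for size_mib in (1, 64)
    for passes in (1, 4)
}

for (size_mib, passes), level in sweep.items():
    print(f"{size_mib:3d} MiB x {passes} passes -> predicted level {level:.1f}")
```

If the real microbenchmark behaves like this model, only the small, repeatedly read buffer should produce a reading well above the midpoint; the profiler output for the actual kernel is what would confirm or refute that.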