
These days, I'm trying to program on a mobile GPU (Adreno).

The algorithm I use for image processing has 'randomness' in its memory accesses.

For filtering, it refers to some pixels within a 'fixed' range.

BUT I can't know exactly which pixels will be referenced (it depends on the image).

As far as I understand, if multiple threads access the same local memory bank at the same time, it causes a bank conflict. So in my case there should be bank conflicts.

MY question: can I eliminate bank conflicts with random memory access?

Or can I at least reduce them?
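A rough way to see why data-dependent accesses cause conflicts is to model the banks explicitly. The Python sketch below assumes 32 banks of 4-byte words, a layout common on desktop GPUs; the constants `NUM_BANKS` and `WORD_BYTES` and the helper names are hypothetical, and Adreno's real local-memory organization may differ. It computes the conflict degree of one simultaneous access: the largest number of distinct words that land in the same bank.

```python
from collections import Counter

# Hypothetical bank model: 32 banks, each serving one 4-byte word per cycle.
# Adreno's actual local-memory layout may differ; check vendor documentation.
NUM_BANKS = 32
WORD_BYTES = 4


def bank_of(byte_address: int) -> int:
    """Bank that serves the 4-byte word containing byte_address."""
    return (byte_address // WORD_BYTES) % NUM_BANKS


def conflict_degree(byte_addresses: list[int]) -> int:
    """Worst-case number of serialized accesses for one simultaneous access.

    Threads reading the *same* word are usually served by a broadcast, so
    duplicate words are counted once. The result is the largest number of
    distinct words mapping to a single bank (1 means conflict-free).
    """
    distinct_words = {addr // WORD_BYTES for addr in byte_addresses}
    per_bank = Counter(word % NUM_BANKS for word in distinct_words)
    return max(per_bank.values(), default=0)
```

Consecutive 4-byte accesses (`[4 * t for t in range(32)]`) give a degree of 1, while a stride of 128 bytes puts every thread in bank 0 and gives 32. Random addresses fall somewhere in between, which is why the cost of a data-dependent filter varies from image to image.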

  • Adreno (Qualcomm) has nothing to do with CUDA; removing the CUDA tag. Commented May 9, 2015 at 3:10

2 Answers


Assuming that the distances to your randomly accessed pixels are roughly normally distributed, you could consider tiling your image into subimages.

What I mean: instead of working with a (let's say) 1024x1024 image, you might use 4x4 images of size 256x256. Each of them is kept together in memory, so "near" pixel accesses stay within the same image object. Only far-distance accesses need to touch different subimages.
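The tiling idea can be sketched as an index mapping. This is a minimal Python sketch, assuming the 1024x1024 image and 256x256 tiles from the example and storing all tiles in one linear buffer rather than as separate image objects; `tiled_index` and `row_major_index` are hypothetical helper names.

```python
# Sizes taken from the example above; any image size divisible by TILE works.
WIDTH = HEIGHT = 1024
TILE = 256
TILES_PER_ROW = WIDTH // TILE  # 4 tiles across


def tiled_index(x: int, y: int) -> int:
    """Linear offset of pixel (x, y) when the image is stored tile by tile.

    Each TILE x TILE subimage occupies one contiguous block of memory;
    inside a tile the pixels are stored in row-major order.
    """
    tile_x, local_x = divmod(x, TILE)
    tile_y, local_y = divmod(y, TILE)
    tile_id = tile_y * TILES_PER_ROW + tile_x
    return tile_id * TILE * TILE + local_y * TILE + local_x


def row_major_index(x: int, y: int) -> int:
    """Offset in the ordinary row-major layout, for comparison."""
    return y * WIDTH + x
```

With this layout, the pixel directly below (x, y) is 256 elements away instead of 1024, as long as both pixels sit in the same tile; only accesses that cross a tile boundary jump to a distant block.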

A second option: instead of using CLImage objects, try storing your data in an array. The data in the array can be ordered along a Z-order curve. This also keeps spatially nearby pixels closer together in memory (compared to row-major ordering).
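A Z-order (Morton) index interleaves the bits of the x and y coordinates. The Python sketch below uses the standard bit-interleaving trick for 16-bit coordinates; the function names are hypothetical, and on the GPU the same arithmetic would be done in the kernel when computing the array offset.

```python
def part1by1(v: int) -> int:
    """Spread the low 16 bits of v so that bit i moves to bit 2*i."""
    v &= 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def morton_index(x: int, y: int) -> int:
    """Z-order offset: bits of x in the even positions, bits of y in the odd ones."""
    return part1by1(x) | (part1by1(y) << 1)
```

Every aligned 2x2 block (and likewise every aligned 4x4, 8x8, ... block) occupies a contiguous run of indices, so a small neighborhood around a pixel usually maps to nearby addresses, unlike row-major order where vertical neighbors are a full row apart.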

But of course, this depends strongly on your image size.


2 Comments

The dependency is very strong; in fact, it often slows things down for larger images. AMD uses this 2D strategy with their CLImage 2D types, as does Nvidia. Always benchmark to see whether such strategies hurt or help! It may not do what you expect.
I agree that you should always benchmark a memory access pattern. But is there any hard data to support the claim that 'in fact it often slows things down for larger images'? What threshold determines whether an image is large? Or is this just a supposition?

There are a variety of ways to deal with bank conflicts: choosing the size of the elements you work with, adjusting the stride between rows, and shifting coordinates around to different memory addresses. It is never going to be as good as a non-random, conflict-free pattern, though, so you will see significantly different compute times depending on the image.
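Two of these techniques can be checked with a simple bank model. One reading of "the stride between rows" is padding each row of a local-memory array (for example, a row stride of 33 words instead of 32); one reading of "shifting coordinates around" is an XOR swizzle of the column index. The Python sketch below assumes 32 banks of one 4-byte word each, and all names are hypothetical. Both tricks fix regular patterns such as column reads; for truly random accesses they only change which accesses happen to collide.

```python
from collections import Counter

# Hypothetical bank model: 32 banks, one 4-byte word per bank per cycle.
NUM_BANKS = 32


def conflict_degree(word_indices) -> int:
    """Largest number of distinct words that fall into the same bank."""
    per_bank = Counter(w % NUM_BANKS for w in set(word_indices))
    return max(per_bank.values(), default=0)


def column_words(col: int, rows: int, row_stride: int) -> list[int]:
    """Words touched when `rows` threads each read column `col` of a row-major array."""
    return [r * row_stride + col for r in range(rows)]


def swizzled_word(row: int, col: int, width: int) -> int:
    """XOR swizzle: permute columns within a row so a column read spans all banks.

    Assumes width is a power of two and a multiple of NUM_BANKS, so the XOR
    only touches the low bits and the result stays inside the same row.
    """
    assert width % NUM_BANKS == 0 and width & (width - 1) == 0
    return row * width + (col ^ (row % NUM_BANKS))
```

A column read over a 32-word-wide array puts all 32 threads in one bank; padding the stride to 33, or swizzling the column index, spreads them across all 32 banks.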

See http://cuda-programming.blogspot.com/2013/02/bank-conflicts-in-shared-memory-in-cuda.html
