You can safely avoid that problem by checking that the computed index is in range. This is your complete kernel:
__global__ void UpdateParticle(float* position, float* velocity, float frameTime, int numParticles)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x; // Compute the global thread index
    if (idx < numParticles) {                        // Is this index valid?
        position[idx] = position[idx] - velocity[idx] * frameTime * 0.001f;
        ... // some more updates
    }
}
You might also want to precompute the frameTime * 0.001f bit in a register before anything else (just do float realTime = frameTime * 0.001f and use it instead) or, even better, pass it already scaled from host code. It won't be a problem for such a small number of operations, but the registers of a multiprocessor are shared among all the blocks resident on it, so registers (any non-qualified local variable inside your kernel, like idx in my examples) can become a bottleneck too. Bear it in mind!
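For reference, here is a minimal host-side sketch of how the two suggestions could fit together: the grid size is rounded up so every particle gets a thread (which is exactly why the kernel needs the bounds check), and the time is scaled once on the host. The names UpdateParticleScaled and scaledTime, the particle count, the block size, and the 16 ms frame time are all assumptions made for illustration, and managed memory is used only to keep the example short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of the kernel: it receives the time already scaled
// on the host (scaledTime = frameTime * 0.001f), so no thread recomputes it.
__global__ void UpdateParticleScaled(float* position, const float* velocity,
                                     float scaledTime, int numParticles)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numParticles) {   // threads past the end do nothing
        position[idx] -= velocity[idx] * scaledTime;
    }
}

int main()
{
    const int numParticles = 1000;   // deliberately not a multiple of the block size
    const int threadsPerBlock = 256; // assumed block size
    // Round up so the last, partially filled block is still launched.
    const int blocks = (numParticles + threadsPerBlock - 1) / threadsPerBlock;

    float* position = nullptr;
    float* velocity = nullptr;
    if (cudaMallocManaged(&position, numParticles * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&velocity, numParticles * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (int i = 0; i < numParticles; ++i) {
        position[i] = 1.0f;
        velocity[i] = 2.0f;
    }

    const float frameTime = 16.0f;               // assumed frame time value
    const float scaledTime = frameTime * 0.001f; // scaled once, on the host

    UpdateParticleScaled<<<blocks, threadsPerBlock>>>(position, velocity,
                                                      scaledTime, numParticles);
    cudaError_t err = cudaGetLastError();              // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // execution errors
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("position[0] = %f\n", position[0]);

    cudaFree(position);
    cudaFree(velocity);
    return 0;
}
```

With 1000 particles and 256 threads per block this launches 4 blocks, i.e. 1024 threads, so the last 24 threads are the ones the if (idx < numParticles) check keeps from writing out of bounds.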