deleted 12 characters in body
  • The scene has 87706 vertices, according to Blender's statistics.
  • I'm using glMultiDrawElementsIndirect with a single VAO.
  • Joint matrices for non-skinned meshes are identity matrices.
  • MeshUniform is a write-only, persistent, coherent SSBO mapping. It is only updated when needed.
  • I'm using the same calculation for the shadow maps, so that takes another 6 ms.
  • The GPU is a 1080 Ti.
  • The scene has 87706 vertices, according to Blender's statistics.
  • I'm using glMultiDrawElementsIndirect with a single VAO.
  • Joint matrices for non-skinned meshes are identity matrices.
  • MeshUniform is a persistent, coherent SSBO mapping. It is only updated when needed.
  • I'm using the same calculation for the shadow maps, so that takes another 6 ms.
  • The GPU is a 1080 Ti.
added 1415 characters in body

Edit with DMGregory's suggestions:

I tried:

  • Multiplying each joint matrix by the position vector, then summing the results.
  • Pre-multiplying the model matrix with the joint matrices on the CPU.

It looks like this now:

vec4 positionVec4 = vec4(position, 1.0);

vec4 sum =
  meshUniform.jointMatrices[joints[0]] * weights[0] * positionVec4 +
  meshUniform.jointMatrices[joints[1]] * weights[1] * positionVec4 +
  meshUniform.jointMatrices[joints[2]] * weights[2] * positionVec4 +
  meshUniform.jointMatrices[joints[3]] * weights[3] * positionVec4;

positionVec4 = sum;

It's still taking 5-6ms to run.
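
As a sanity check on the two changes listed above, here is a small CPU-side sketch in Python (not the shader, and not my actual code). It assumes plain row-major 4x4 lists, and all of the helper names (`mat_vec`, `mat_mul`, `translation`, `skin_blend_matrices`, `skin_sum_vectors`, `premultiply`) are made up for illustration. It only shows that blending the matrices first and summing the transformed vectors give the same position, and that folding the model matrix into the joint matrices ahead of time does not change the result, so neither change should alter the image, only the amount of arithmetic per vertex.

```python
# Hypothetical helpers for illustration only; matrices are row-major 4x4 lists.

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def mat_mul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def translation(x, y, z):
    """Build a translation matrix, used here as a stand-in joint matrix."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def skin_blend_matrices(mats, joints, weights, p):
    """Blend the four joint matrices by weight, then transform p once."""
    blended = [[sum(weights[i] * mats[joints[i]][r][c] for i in range(4))
                for c in range(4)] for r in range(4)]
    return mat_vec(blended, p)

def skin_sum_vectors(mats, joints, weights, p):
    """Transform p by each joint matrix, then sum the weighted results."""
    out = [0.0] * 4
    for i in range(4):
        t = mat_vec(mats[joints[i]], p)
        out = [out[k] + weights[i] * t[k] for k in range(4)]
    return out

def premultiply(model, joint_mats):
    """CPU-side step: fold the model matrix into every joint matrix once."""
    return [mat_mul(model, j) for j in joint_mats]
```

For example, with four translation matrices as joints, `skin_blend_matrices` and `skin_sum_vectors` return the same position for the same `joints` and `weights`, and skinning with `premultiply(model, joint_mats)` matches applying `model` to the skinned position afterwards.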


Someone on the LWJGL forums posted a question similar to mine in 2012:
http://forum.lwjgl.org/index.php?topic=4519.0

In his last message he said:

using a constant as the array index while accessing boneMatrixes brings performance up

Sure enough, if I remove the joints array lookup from the code above, like this:

vec4 positionVec4 = vec4(position, 1.0);

vec4 sum =
  meshUniform.jointMatrices[0] * weights[0] * positionVec4 +
  meshUniform.jointMatrices[1] * weights[1] * positionVec4 +
  meshUniform.jointMatrices[2] * weights[2] * positionVec4 +
  meshUniform.jointMatrices[3] * weights[3] * positionVec4;

positionVec4 = sum;

it renders in 1 ms, but of course the resulting image is not correct.

Maybe this will give some ideas to people more experienced with OpenGL.


edited title

OpenGL Vertex Shader "joint matrix * weight" multiplication performance
