You might want to take a look at sparse voxel octrees; you should be able to find several existing implementations.
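To show what "sparse" means here, below is a minimal Python sketch of an octree whose child nodes are only allocated when a voxel is actually inserted into them, so empty space costs nothing. The names (`OctreeNode`, `insert`, `solid_voxels`) are hypothetical, and it assumes integer unit voxels inside a root cube whose size is a power of two. Real implementations usually go further, for example by collapsing uniform regions into a single node.

```python
class OctreeNode:
    """Node covering the cube [x, x+size) x [y, y+size) x [z, z+size).

    Children are created lazily, so empty regions have no nodes at all.
    """

    def __init__(self, x, y, z, size):
        # Assumption: size is a power of two (1, 2, 4, 8, ...).
        self.x, self.y, self.z, self.size = x, y, z, size
        self.children = None  # None, or 8 slots that are each None or a node
        self.solid = False    # only used by size-1 leaf nodes

    def contains(self, vx, vy, vz):
        return (self.x <= vx < self.x + self.size and
                self.y <= vy < self.y + self.size and
                self.z <= vz < self.z + self.size)

    def insert(self, vx, vy, vz):
        """Mark the unit voxel at integer coordinates (vx, vy, vz) as solid."""
        if not self.contains(vx, vy, vz):
            raise ValueError("voxel outside this node")
        if self.size == 1:
            self.solid = True
            return
        half = self.size // 2
        # Pick the octant: bit 0 = upper x half, bit 1 = upper y, bit 2 = upper z.
        ix = int(vx >= self.x + half)
        iy = int(vy >= self.y + half)
        iz = int(vz >= self.z + half)
        idx = ix | (iy << 1) | (iz << 2)
        if self.children is None:
            self.children = [None] * 8
        if self.children[idx] is None:
            self.children[idx] = OctreeNode(self.x + ix * half,
                                            self.y + iy * half,
                                            self.z + iz * half, half)
        self.children[idx].insert(vx, vy, vz)

    def solid_voxels(self):
        """Yield the coordinates of every solid unit voxel under this node."""
        if self.size == 1:
            if self.solid:
                yield (self.x, self.y, self.z)
        elif self.children is not None:
            for child in self.children:
                if child is not None:
                    yield from child.solid_voxels()
```

For example, inserting `(1, 2, 3)` and `(7, 7, 7)` into `OctreeNode(0, 0, 0, 8)` allocates only two of the root's eight children; the other six octants stay `None`.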
You want a dynamic VBO. Every frame, work out which boxes in the octree you want to draw, calculate the positions of each box's corners and add them to the dynamic VBO as vertices. Then build a matching index buffer that specifies which vertices join up to form the triangles of each box (12 triangles, i.e. 36 indices, per box).
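Here is a rough Python sketch of that per-frame step: each box contributes its 8 corners to the vertex list and 36 indices (two triangles per face) to the index list, offset by where its corners start. The `(x, y, z, size)` box tuple and the names `BOX_INDICES` and `build_box_mesh` are assumptions for illustration; the winding is counter-clockwise seen from outside, which matches OpenGL's default front face.

```python
# Corner i of a box sits at offset (i & 1, (i >> 1) & 1, (i >> 2) & 1) * size.
# Each row is one face (two triangles), wound counter-clockwise when viewed
# from outside the box.
BOX_INDICES = [
    0, 2, 3, 0, 3, 1,  # -Z face
    4, 5, 7, 4, 7, 6,  # +Z face
    0, 4, 6, 0, 6, 2,  # -X face
    1, 3, 7, 1, 7, 5,  # +X face
    0, 1, 5, 0, 5, 4,  # -Y face
    2, 6, 7, 2, 7, 3,  # +Y face
]


def build_box_mesh(boxes):
    """boxes: iterable of (x, y, z, size) tuples for the boxes to draw.

    Returns (vertices, indices): vertices is a list of (x, y, z) corner
    positions, and indices refers into it with three entries per triangle.
    """
    vertices = []
    indices = []
    for (x, y, z, size) in boxes:
        base = len(vertices)  # index of this box's first corner
        for i in range(8):
            vertices.append((x + (i & 1) * size,
                             y + ((i >> 1) & 1) * size,
                             z + ((i >> 2) & 1) * size))
        indices.extend(base + i for i in BOX_INDICES)
    return vertices, indices
```

In a real renderer you would flatten `vertices` into a float array and upload both arrays each frame (for example with `glBufferData` and a `GL_DYNAMIC_DRAW` usage hint). Note that sharing 8 corners per box means a corner cannot carry different normals or texture coordinates for different faces; if you need those, emit 4 vertices per face (24 per box) instead.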
You can attach texture information to each box and keep a separate list for each texture. Alternatively, you can use one giant texture sheet (a texture atlas) and one giant VBO; that just means generating each vertex's texture coordinates a bit differently, so that they point at the right region of the sheet.
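With a texture atlas, the "bit differently" part is usually just mapping a tile number to a sub-rectangle of UV space. A minimal Python sketch follows, assuming a square atlas of equally sized tiles laid out row by row with tile 0 at the UV origin; `atlas_tile_uv` and `quad_uvs` are hypothetical helper names.

```python
def atlas_tile_uv(tile_index, tiles_per_row):
    """Return (u0, v0, u1, v1) for one tile of a square, evenly divided atlas."""
    col = tile_index % tiles_per_row
    row = tile_index // tiles_per_row
    step = 1.0 / tiles_per_row  # width and height of one tile in UV space
    return (col * step, row * step, (col + 1) * step, (row + 1) * step)


def quad_uvs(tile_index, tiles_per_row):
    """UVs for the four corners of one box face, in counter-clockwise order."""
    u0, v0, u1, v1 = atlas_tile_uv(tile_index, tiles_per_row)
    return [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]
```

For instance, tile 5 of a 4×4 atlas covers `(0.25, 0.25)` to `(0.5, 0.5)`. Because each face needs its own UVs, this goes together with emitting separate vertices per face. In practice you may also want to inset the rectangle slightly so filtering and mipmapping don't bleed colour in from neighbouring tiles.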
You can also use some culling tricks here to avoid adding vertices and indices for parts you can't see: for example, don't add any polygons for a visible box that is touched on all sides by other visible, non-transparent boxes, and don't add individual box faces that touch another visible, non-transparent box.
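Both tricks fall out of one per-face test: emit a face only if the neighbouring cell in that direction is empty. Here is a small Python sketch, assuming unit voxels stored as a set of integer coordinates and treating every stored voxel as opaque. Transparent voxels, or octree leaves of different sizes, would need a more careful neighbour check. `visible_faces` and `FACE_DIRECTIONS` are hypothetical names.

```python
FACE_DIRECTIONS = {
    "-x": (-1, 0, 0), "+x": (1, 0, 0),
    "-y": (0, -1, 0), "+y": (0, 1, 0),
    "-z": (0, 0, -1), "+z": (0, 0, 1),
}


def visible_faces(solid):
    """Return (voxel, face) pairs for faces not hidden by an opaque neighbour.

    solid: set of (x, y, z) integer coordinates, all assumed opaque.
    A voxel enclosed on all six sides produces no faces at all.
    """
    faces = []
    for (x, y, z) in solid:
        for name, (dx, dy, dz) in FACE_DIRECTIONS.items():
            if (x + dx, y + dy, z + dz) not in solid:
                faces.append(((x, y, z), name))
    return faces
```

For a solid 3×3×3 block this yields 54 faces (9 per side of the block) instead of 27 × 6 = 162, and the centre voxel contributes nothing.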
You can use a separate octree for each chunk, and you don't need to rebuild a chunk's vertex buffer if its octree didn't change in the last frame. However, if your culling depends on the camera (for example, skipping boxes outside the view or faces pointing away from it), you will need to rebuild whenever the camera moves.
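One simple way to get that behaviour is a per-chunk "dirty" flag: edits set it, and the mesh is rebuilt only when it is set. This Python sketch assumes a plain set of voxel coordinates as a stand-in for the per-chunk octree and takes the mesh builder as a parameter; `Chunk`, `set_voxel`, `invalidate` and `mesh` are hypothetical names.

```python
class Chunk:
    """Caches one chunk's mesh and rebuilds it only after the chunk changes.

    A set of voxel coordinates stands in for the per-chunk octree, and
    build_mesh is any function turning that set into vertex/index data.
    """

    def __init__(self, build_mesh):
        self.voxels = set()
        self._build_mesh = build_mesh
        self._mesh = None
        self._dirty = True      # nothing has been built yet
        self.rebuild_count = 0  # illustration only: how often we rebuilt

    def set_voxel(self, pos, solid):
        """Add or remove a voxel; mark dirty only if something really changed."""
        if solid and pos not in self.voxels:
            self.voxels.add(pos)
            self._dirty = True
        elif not solid and pos in self.voxels:
            self.voxels.remove(pos)
            self._dirty = True

    def invalidate(self):
        """Force a rebuild, e.g. after a camera move when culling is view-dependent."""
        self._dirty = True

    def mesh(self):
        """Return the cached mesh, rebuilding it first if the chunk is dirty."""
        if self._dirty:
            self._mesh = self._build_mesh(self.voxels)
            self._dirty = False
            self.rebuild_count += 1
        return self._mesh
```

Calling `mesh()` every frame is then cheap for untouched chunks. If you use view-dependent culling, call `invalidate()` on the affected chunks when the camera moves.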