The manifold assumption states that the high-dimensional input space comprises multiple lower-dimensional manifolds on which all data points lie, and that data points lying on the same manifold share the same label.
For an intuitive example, consider a sheet of paper crumpled into a ball. The location of any point on the crumpled surface can only be mapped with three-dimensional x, y, z coordinates. But if that crumpled ball is flattened back into a sheet of paper, those same points can be mapped with just two-dimensional x, y coordinates. This is called dimensionality reduction, and it can be achieved mathematically using methods like autoencoders or convolutions.
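The paper analogy can be sketched numerically. The snippet below is an illustrative toy, not the autoencoder-based methods the text mentions: it uses principal component analysis (PCA) via SVD, a simpler linear technique, to show that points "crumpled" from a flat 2-D sheet into 3-D space are still intrinsically two-dimensional.

```python
import numpy as np

# Sample points on a flat 2-D "sheet of paper".
rng = np.random.default_rng(0)
sheet = rng.uniform(-1.0, 1.0, size=(200, 2))

# "Embed" the sheet in 3-D with a random rotation. (A linear embedding is
# far tamer than a real crumple, but it suffices to illustrate the idea.)
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
points_3d = sheet @ rotation[:2, :]  # each point now needs x, y, z

# PCA via SVD: the fraction of variance explained by each component
# reveals how many dimensions the data actually occupies.
centered = points_3d - points_3d.mean(axis=0)
_, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
print(explained)  # third value is essentially zero: the data is 2-D
```

Because the embedding here is linear, PCA recovers the sheet exactly; a genuinely crumpled (nonlinear) surface is why nonlinear methods such as autoencoders are needed in practice.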
In machine learning, dimensions correspond not to familiar physical dimensions but to each attribute or feature of the data. For example, a small RGB image measuring 32x32 pixels has 3,072 dimensions: 1,024 pixels, each with three values (for red, green and blue). Comparing data points with so many dimensions is challenging, both because of the complexity and computational resources required and because most of that high-dimensional space carries no information meaningful to the task at hand.
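The dimension count above can be verified directly: flattening an image treats every per-pixel channel value as one feature. A minimal sketch (the image here is random data standing in for a real 32x32 RGB image):

```python
import numpy as np

# A random 32x32 RGB image: height x width x channels.
image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Flattening turns the image into one feature vector, with every
# pixel-channel value as a separate dimension.
vector = image.reshape(-1)
print(vector.shape)  # (3072,) -- 32 * 32 pixels * 3 channels
```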
The manifold assumption holds that once a model learns a suitable dimensionality reduction function and discards irrelevant information, disparate data points converge to a more meaningful representation in which the other SSL assumptions hold more reliably.