The Phong model (ambient + diffuse + specular) is just an approximation of how light behaves. The underlying quantity is what's called a BRDF, or bidirectional reflectance distribution function. It is a function which tells you, for a given point on the surface and a given incident ray of light, how much of that light is reflected in each outgoing direction (loosely, the probability of light bouncing in a given direction). The components (diffuse, specular) of the Phong model, or of any other model for that matter, approximate the BRDF by splitting it into roughly independent parts which are easy to evaluate separately and are then added together.
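To make the "added together" part concrete, here is a minimal sketch of Phong shading for a single light, in Python. The function name `phong` and the coefficients `ka`, `kd`, `ks` and `shininess` are illustrative values I picked, not anything standard, and the result is a single scalar intensity rather than an RGB color.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def phong(normal, to_light, to_eye, ka=0.1, kd=0.6, ks=0.3, shininess=32):
    """Scalar Phong intensity: ambient + diffuse + specular (assumed coefficients)."""
    n = normalize(normal)
    l = normalize(to_light)   # from the surface point toward the light
    v = normalize(to_eye)     # from the surface point toward the viewer
    n_dot_l = dot(n, l)

    ambient = ka
    if n_dot_l <= 0.0:
        # The light is behind the surface: only the ambient term remains.
        return ambient

    diffuse = kd * n_dot_l
    # Light direction mirrored about the normal: r = 2 (n . l) n - l
    r = tuple(2.0 * n_dot_l * ni - li for ni, li in zip(n, l))
    specular = ks * max(dot(r, v), 0.0) ** shininess
    return ambient + diffuse + specular
```

Each term is evaluated on its own and the results are simply summed, which is exactly the "independent components" simplification described above.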
What does this mean in practice? When you cast a ray and it hits a surface, you have to decide where to shoot the next ray or rays. For perfectly specular surfaces, the ray is mirrored about the surface normal, so the angle of reflection equals the angle of incidence. For glossy surfaces, how much the reflected rays deviate from that perfect reflection vector depends on the glossiness of the surface. For completely diffuse surfaces, the ray can be cast in any direction on the hemisphere above the point on the surface.
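The three cases above could be sketched like this. All function names are mine, and the glossy case uses an ad hoc jitter (the mirror direction plus a scaled random unit vector), which is only meant to illustrate the idea and is not a physically derived lobe.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def reflect(d, n):
    """Mirror the incoming direction d about the unit normal n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

def random_unit_vector(rng):
    """Direction uniform on the sphere (normalized Gaussian components)."""
    while True:
        v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        if dot(v, v) > 1e-12:
            return normalize(v)

def diffuse_direction(n, rng):
    """Uniform direction on the hemisphere around the unit normal n."""
    v = random_unit_vector(rng)
    return v if dot(v, n) >= 0.0 else tuple(-x for x in v)

def glossy_direction(d, n, roughness, rng):
    """Mirror direction jittered by roughness in [0, 1); 0 is a perfect mirror.

    d must be a unit vector. The jittered ray can end up below the surface;
    a real renderer would reject or resample those directions.
    """
    mirror = reflect(d, n)
    jitter = random_unit_vector(rng)
    return normalize(tuple(m + roughness * j for m, j in zip(mirror, jitter)))
```

For instance, `reflect((1, -1, 0), (0, 1, 0))` gives `(1.0, 1.0, 0.0)`: the component along the normal flips and the tangential component is kept.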
If you want to approximate diffuse and specular components independently, then you need to cast at least two other rays, one which will sample the diffuse contribution and another one which will sample the specular contribution. If you consider refraction as well, then you'll need a third ray.
In reality, when the reflection is not deterministic (when you don't have a perfectly reflective surface) you won't just shoot a single ray, but instead you'll shoot many rays to sample the surroundings. The more rays you shoot, the less noise you'll have in your final image, but it will become slower to compute.
How many rays to actually send, and where, is a very interesting and complex topic in itself. Shooting rays is not cheap, especially considering that the noise in the final image only decreases as the inverse square root of the number of rays you sample: to halve the noise you need about four times as many rays. So sampling efficiently is important.
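Here is a small experiment that shows how slowly the noise goes down. It is a toy setup of my own, not a real renderer: each "ray" returns cos(theta) for a direction sampled uniformly on the hemisphere (for such directions cos(theta) is uniformly distributed on [0, 1]), and the "noise" is the standard deviation of the per-pixel estimate over many repeated trials.

```python
import random
import statistics

def estimate(num_rays, rng):
    """Monte Carlo average of cos(theta) over uniform hemisphere directions.

    For directions uniform on the hemisphere, cos(theta) is uniform on [0, 1],
    so each sample is drawn directly with rng.random().
    """
    return sum(rng.random() for _ in range(num_rays)) / num_rays

def noise(num_rays, trials=400, seed=1):
    """Standard deviation of the estimate over many independent trials."""
    rng = random.Random(seed)
    estimates = [estimate(num_rays, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)

if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, noise(n))
```

Each time the ray count is multiplied by four, the measured noise should come out roughly half as large, which is why brute-force sampling gets expensive quickly.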
For example, for a particular material you might want to send a thousand rays, but send 90% of them to sample the diffuse component and only 10% to sample the specular component, because the diffuse part contributes more to the final color and the surface is very smooth, so those 10% of the rays will all follow pretty much the same path anyway.
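A budget split like that could be as simple as the following sketch. The helper `split_ray_budget` and its weights are hypothetical; how you actually choose the weights for a material is the hard part and is not covered here.

```python
def split_ray_budget(total_rays, diffuse_weight, specular_weight):
    """Split a ray budget between two components in proportion to their weights.

    Returns (diffuse_rays, specular_rays); the two counts always sum to total_rays.
    """
    total_weight = diffuse_weight + specular_weight
    diffuse_rays = round(total_rays * diffuse_weight / total_weight)
    return diffuse_rays, total_rays - diffuse_rays

diffuse_rays, specular_rays = split_ray_budget(1000, 0.9, 0.1)
```

With a budget of 1000 rays and weights 0.9 and 0.1, this assigns 900 rays to the diffuse component and 100 to the specular one. Each component's samples are then averaged separately before the two results are added together.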