You have to handle collisions with a sorted data structure, so you can have nlog(n) times instead of the terrible n^2. And nlog(n) is almost linear as you might know. One (classical) example is a quadtree, there's a quite simple and well written tutorial here, with graphics and code (Java) :
Rq : it is quite easy to find an implementation for QuadTrees in
any language. Still, you have to have to think about the right
'granularity' for the tree, and the bigger the tree size, the more
we have entities that does not fit inside a node.
Rq 2 : since your space partitioning is done for collision detection
only, you have perfect freedom to divide space as you like.
For instance, i would not divide into four egal parts but rather
i would use the baricenter of current level entities as a center for
the new divide. 1) the algorithm is still n*log(n), 2) you loose the
possibility to 'rebuild' the scene out of the tree -but you don't care-
and 3) you have a much more balanced tree, less overhead.
Rq3 : Once you have your tree, a 'collision' between the screen and the
entities give you... the visible entities !! in a time more like log(n),
so why not if n is big ? (worst case is obviously a time in n for this
approach.)