I have a simulation in which ~1e6 time series need to be clustered every few simulated days, based on statistical measures computed from each series. Most clustering methods I'm aware of require an affinity matrix to be constructed, which is quadratic in the number of series. Since I have limited memory, I would prefer a solution whose memory requirements are linear in the number of series, even if it takes longer to compute.
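To put rough numbers on the memory concern, here is a back-of-the-envelope sketch. It assumes a dense float64 affinity matrix and a hypothetical 10 statistical features per series; both are illustrative assumptions, not fixed properties of my problem.

```python
# Rough memory estimate for ~1e6 series.
# Assumptions: dense float64 storage, and a hypothetical 10 features per series.
n_series = 1_000_000
n_features = 10        # hypothetical; the real number depends on the measures used
bytes_per_float = 8    # float64

# A full pairwise affinity matrix grows as O(n^2).
affinity_bytes = n_series * n_series * bytes_per_float

# A per-series feature matrix grows as O(n).
feature_bytes = n_series * n_features * bytes_per_float

print(f"dense affinity matrix: {affinity_bytes / 1e12:.1f} TB")
print(f"feature matrix:        {feature_bytes / 1e6:.1f} MB")
```

Under these assumptions the affinity matrix is on the order of terabytes, while the feature matrix alone is on the order of tens of megabytes, which is why I'm looking for something that only needs the latter.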
I have not had much success finding a good set of algorithms to start looking into. k-means is one algorithm I'm considering, but it requires the number of clusters to be specified a priori, and I do not know that number in advance for my problem. So it is not the best fit for my purposes.
If you have any advice on this topic which could help me get started, I'd really appreciate it.