1

I have a list of network logs. There is a time stamp for each row and I need to pull the distinct timestamps such that I can use these for later queries. At the moment I am using the following query:

select distinct time from metastatus

Now I firstly want to confirm that the select distinct query returns sorted values?

Secondly I have already created an index on time which the database is using for the above query.

Lastly I did think about retrieving all the times and then finding the unique values through my code however there are too many value for me to retrieve all the times (160,000,000).

Thanks, James

3
  • I am currently running a query to cluster the database by time index i.e. cluster metastatus using timeindex Commented Dec 4, 2012 at 11:37
  • 1
    Select distinct doesn't guarantee sorted values, you need an order by on the end. The optimizer may decide to sort the values to produce the distinct list, but it's not the only approach it has. Commented Dec 4, 2012 at 11:44
  • Thank you, I had the order by but I removed it as I saw it returned ordered values and assumed it did this every time. I'll put it back. Commented Dec 4, 2012 at 11:48

1 Answer 1

1

The usual method to get DISTINCT values used by PostgreSQL is to sort the rows, then pick the first of each set. That's why you normally get a sorted result, but Postgres asserts nothing as to the order. Just add an ORDER BY that agrees with the DISTINCT clause. It's better from a documentation point of view in any case. Doesn't usually cost anything extra. Check with EXPLAIN ANALYZE to confirm.

The idea to retrieve all rows and sort outside of Postgres is nonsense. There is no chance this will even get close to the performance of the RBDMS which is optimized for just such a task. You would also transfer quite a bit of noise over the wire. Consider the answers under this related question.

Sign up to request clarification or add additional context in comments.

2 Comments

I understand that the performance of an SQL query is likely to be better than any code I write to sort rows. I am not doing it through my code anyway because there are too many rows and as you said it would likely be slower. I was just going to try both methods for a comparison of the time that's the only reason I mentioned it, but it's not possible.
This is bad advice. An order by on a large table can have even worse performance than the select distinct query.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.