I have the below table.

date        label  time
2014-04-06  A      12:05:56
2014-04-06  A      23:02:32
2014-04-06  B      8:39:25
2014-04-06  B      12:36:37
2014-04-06  C      12:20:43
2014-04-06  C      12:56:44
2014-04-06  D      20:52:22
2014-04-06  E      22:25:30
2014-04-06  F      12:16:15
2014-04-06  F      12:31:09
2014-04-06  F      17:12:06
2014-04-06  G      7:48:32
2014-04-06  H      17:58:11
2014-04-05  I      12:05:56
2014-04-05  I      20:02:32
2014-04-05  I      23:57:32
2014-04-05  J      12:36:37
2014-04-05  K      12:20:43
2014-04-05  L      12:56:44
2014-04-05  M      20:52:22
2014-04-05  M      22:25:30
2014-04-05  N      12:16:15
2014-04-05  O      12:31:09
2014-04-05  O      17:12:06
2014-04-05  P      7:48:32
2014-04-05  Q      17:58:11
2014-04-06  R      1:05:56
2014-04-06  R      5:02:32
2014-04-06  R      8:39:25
2014-04-06  R      12:36:37
2014-04-06  S      12:20:43
2014-04-06  S      12:56:44
2014-04-06  S      20:52:22
2014-04-06  T      22:25:30
2014-04-06  U      12:16:15
2014-04-06  V      12:31:09
2014-04-06  W      3:48:32
2014-04-06  W      7:48:32
2014-04-06  W      17:58:11

I'm trying to create a view with this output.

date        label  time      status
2014-04-06  A      12:05:56  Repeat
2014-04-06  A      23:02:32  Unique
2014-04-06  B      8:39:25   Repeat
2014-04-06  B      12:36:37  Unique
2014-04-06  C      12:20:43  Repeat
2014-04-06  C      12:56:44  Unique
2014-04-06  D      20:52:22  Unique
2014-04-06  E      22:25:30  Unique
2014-04-06  F      12:16:15  Repeat
2014-04-06  F      12:31:09  Repeat
2014-04-06  F      17:12:06  Unique
2014-04-06  G      7:48:32   Unique
2014-04-06  H      17:58:11  Unique
2014-04-05  I      12:05:56  Repeat
2014-04-05  I      20:02:32  Repeat
2014-04-05  I      23:57:32  Unique
2014-04-05  J      12:36:37  Unique
2014-04-05  K      12:20:43  Unique
2014-04-05  L      12:56:44  Unique
2014-04-05  M      20:52:22  Repeat
2014-04-05  M      22:25:30  Unique
2014-04-05  N      12:16:15  Unique
2014-04-05  O      12:31:09  Repeat
2014-04-05  O      17:12:06  Unique
2014-04-05  P      7:48:32   Unique
2014-04-05  Q      17:58:11  Unique
2014-04-06  R      1:05:56   Repeat
2014-04-06  R      5:02:32   Repeat
2014-04-06  R      8:39:25   Repeat
2014-04-06  R      12:36:37  Unique
2014-04-06  S      12:20:43  Repeat
2014-04-06  S      12:56:44  Repeat
2014-04-06  S      20:52:22  Unique
2014-04-06  T      22:25:30  Unique
2014-04-06  U      12:16:15  Unique
2014-04-06  V      12:31:09  Unique
2014-04-06  W      3:48:32   Repeat
2014-04-06  W      7:48:32   Repeat
2014-04-06  W      17:58:11  Unique

The criteria for the status column are as follows.

I want to go through the rows and derive the status column from the label and time columns.

If the label in the 1st row equals the label in the 2nd row, and the time difference between the 2nd row and the 1st row is greater than 24:00:00, then the status must be yes, otherwise no.

I do it like this in Excel.

=IF(AND(B2=B3,C3-C2>1),"Yes","No")

I'm new to PostgreSQL and databases.

Any suggestions or help would be much appreciated.

Thanks in advance.

  • You want a window function - specifically, lag. Commented Apr 7, 2014 at 9:03
  • I got this. But how do I use it? Commented Apr 7, 2014 at 9:09
  • I answered your question, but now I am actually not sure what you need. Can you elaborate why there is a yes for D and E? Commented Apr 7, 2014 at 11:06
  • Why is it yes for B? Commented Apr 7, 2014 at 11:12
  • I'll make it very clear. Suppose A is repeated twice within 24 hours. In this case, the first A is counted as a repeat and the second A is considered the unique one. For any label that occurs n times within a 24-hour span, only the last entry is unique and the rest are counted as repeats. With the answer provided, for the F label I get no, yes, no, but the actual values must be yes, yes, no. I hope that is clear now. Commented Apr 7, 2014 at 11:21

1 Answer

Notes:

  1. If your formula actually works in Excel, then you have stored dates in the cells, not times.
  2. For D and E, I do not understand how this should return 'yes' when the previous row does not have the same label.
  3. You have to add an ID column to your table (!). While Excel keeps the rows of a sheet in the same order (unless you change it explicitly), PostgreSQL does not. Thus, if the time column really contains only a time, there is no way to get the rows back in the same order as in your table, which leads to completely incorrect results.
  4. If you are using version 8.4 then your link is correct; however, it would be better to use the current documentation.

Data:

drop table if exists tmp.test;

create table tmp.test (id int, ddate date, label varchar, ttime time);

insert into tmp.test values

(1, '2014/6/4','A','12:05:56'),
(2, '2014/6/4','A','23:02:32'),
(3, '2014/6/4','B','8:39:25'),
(4, '2014/6/4','B','12:36:37'),
(5, '2014/6/4','C','12:20:43'),
(6, '2014/6/4','C','12:56:44'),
(7, '2014/6/4','D','20:52:22'),
(8, '2014/6/4','E','22:25:30'),
(9, '2014/6/4','F','12:16:15'),
(10, '2014/6/4','F','12:31:09'),
(11, '2014/6/4','F','7:12:06'),
(12, '2014/6/4','G','7:48:32'),
(13, '2014/6/4','H','17:58:11');

Query:

select
  id,
  ddate,
  label,
  ttime,
  -- previous row's time (same label, ordered by id) plus this row's time
  case when (lag(ttime) over(partition by label order by id))::interval
        + ttime::interval > interval '24 hours' then 'yes' else 'no' end as status
  -- ,(lag(ttime) over(partition by label order by ttime))::interval + ttime::interval
from
  tmp.test;

Explanation:

  1. The lag function returns the value from the previous row within the given partition. In our case, the partition is defined by label.
  2. The cast operator :: converts the time type to interval, so we can add times and get more than 24 hours.
  3. We compare the total to a 24-hour interval and display a nice label, yes or no.
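
To see what lag returns before building a condition on it, a minimal sketch can list the previous row's timestamp and the gap next to each row. It assumes the tmp.test table above and that ddate + ttime is the moment of each entry. The column names ts, prev_ts and gap are made up for illustration. Unlike the query above, it orders by the combined timestamp rather than by id, so the order can differ when rows were not inserted chronologically.

```sql
-- Sketch only: assumes the tmp.test table defined above.
-- ts, prev_ts and gap are hypothetical column names chosen for illustration.
select
  id,
  label,
  ddate + ttime as ts,                          -- date + time yields a timestamp
  (lag(ddate + ttime) over w) as prev_ts,       -- null for the first row of each label
  (ddate + ttime) - (lag(ddate + ttime) over w) as gap  -- interval since the previous row
from
  tmp.test
window w as (partition by label order by ddate + ttime)
order by
  label, ts;
```

Because gap is a difference between two timestamps, it can exceed 24 hours when the same label appears on different dates, which a comparison of bare time values cannot show.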

Update:

select
  id,
  ddate,
  label,
  ttime,
  -- 'yes' when a later row (by id) exists for the same label, 'no' for the last one
  case when lead(label) over(partition by label order by id) is null then 'no' else 'yes' end as status
from
  tmp.test;
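
If the rule is the one described in the comments on the question (an entry is a repeat when the same label appears again within 24 hours, and the last such entry is unique), one possible sketch adds a time check to the lead approach. It assumes that ddate + ttime gives the timestamp of each entry and that "within 24 hours" means strictly less than 24 hours until the next entry with the same label. Both are assumptions, as is the status column name.

```sql
-- Sketch only: assumes the tmp.test table above and the rule from the comments.
select
  id,
  ddate,
  label,
  ttime,
  case
    -- the same label occurs again less than 24 hours later: this row is a repeat
    when (lead(ddate + ttime) over w) - (ddate + ttime) < interval '24 hours'
      then 'Repeat'
    -- no later row for this label, or the next one is 24 hours or more away
    else 'Unique'
  end as status
from
  tmp.test
window w as (partition by label order by ddate + ttime)
order by
  id;
```

For the last row of each label, lead returns null, the comparison is null, and the case falls through to 'Unique'.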

6 Comments

In the updated answer there's no criteria for the time difference.
In your sample data, the difference between the following and the current value for the same label is never more than 24 hours, because all values are positive and all are lower than 24 hours. All of the rows even have the same date. It is really unclear what you would like to test.
I'm extremely sorry for making this so hard to understand. It's very simple: I'm looking for an analysis of how many uniques and repeats I have in the data. In the sample data I've given only one date and a few labels, but in the real data I've got many months. I just want to get the unique and repeat counts based on this logic: when a value in the label column is repeated n times within 24 hours, all of those entries are considered repeats except the last entry registered, when their difference from the previous value is less than 24 hours.
Can you please modify your sample data in such a way that there actually is an example demonstrating a difference of more than 24 hours?
Indicate at least one example (row) where difference is more than 24 hours.