I have the columns party_id and data_date. I want to sort the data_dates in ascending order for each party_id. I will always take the first data_date. After that, the data_date I select must be at least 30 days later than the previous one I selected (30 days or more: after 04.04.2023, 04.05.2023 would be OK).

For example, for party_id 12345, I have the following data_dates:

party_id data_date
12345 01.01.2023
12345 04.02.2023
12345 05.02.2023
12345 30.03.2023
12345 31.03.2023
12345 04.04.2023

For this party_id, the selected dates should be 01.01.2023, 04.02.2023, and 30.03.2023.
This is because:

  • I first selected 01.01.2023.
  • The difference between 01.01.2023 and 04.02.2023 is more than 30 days, so I choose 04.02.2023.
  • When I check 05.02.2023, the difference with 04.02.2023 is only 1 day, so I do not select this date.
  • Comparing 30.03.2023 with the last selected date, 04.02.2023, I see that the difference is more than 30 days, so I select 30.03.2023 as well.
  • I do not select 31.03.2023 or the last date, 04.04.2023, because their differences with the most recent selected date, 30.03.2023, are only 1 and 5 days.
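
The rule described above is a greedy scan over the sorted dates, which is easy to state procedurally even if it is awkward in a single SQL statement. As a reference for checking query output outside the database, here is a minimal Python sketch (not Oracle SQL). The names select_dates and min_gap_days are illustrative assumptions, not part of the question.

```python
from datetime import date, timedelta


def select_dates(dates, min_gap_days=30):
    """Greedy scan: keep the first date, then every date that is at least
    min_gap_days after the most recently kept one."""
    kept = []
    for d in sorted(dates):
        if not kept or d - kept[-1] >= timedelta(days=min_gap_days):
            kept.append(d)
    return kept


# Sample data for party_id 12345 from the question.
sample = [
    date(2023, 1, 1), date(2023, 2, 4), date(2023, 2, 5),
    date(2023, 3, 30), date(2023, 3, 31), date(2023, 4, 4),
]
for d in select_dates(sample):
    print(d.strftime("%d.%m.%Y"))
```

For the sample this keeps 01.01.2023, 04.02.2023 and 30.03.2023. The point that makes it hard in SQL is that each decision depends on the last kept date, not on the previous row.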

I tried to do this in Oracle SQL with a single query but could not get it to work. I need to do it in one step.

5 Answers

4

From Oracle 12, you can use MATCH_RECOGNIZE for row-by-row pattern matching:

SELECT *
FROM   table_name
       MATCH_RECOGNIZE(
         PARTITION BY party_id
         ORDER BY data_date
         ALL ROWS PER MATCH       
         PATTERN (first_row {- within_30_days* -})
         DEFINE
           within_30_days AS data_date < first_row.data_date + INTERVAL '30' DAY
       )

In Oracle 11, you can use a MODEL clause to keep track of the valid dates:

SELECT party_id, data_date
FROM   (
SELECT *
FROM   (
  SELECT party_id, data_date,
         ROW_NUMBER() OVER (
           PARTITION BY party_id
           ORDER BY data_date
         ) AS rn
  FROM   table_name
)
MODEL
  PARTITION BY (party_id)
  DIMENSION BY (rn)
  MEASURES (data_date, data_date AS latest_date, 1 AS is_valid)
  RULES AUTOMATIC ORDER (
    is_valid[rn>1]    = CASE
                        WHEN data_date[cv()] >= latest_date[cv()-1] + INTERVAL '30' DAY
                        THEN 1
                        ELSE 0
                        END,
    latest_date[rn>1] = CASE is_valid[cv()]
                        WHEN 1
                        THEN data_date[cv()]
                        ELSE latest_date[cv()-1]
                        END
  )
)
WHERE is_valid = 1;

Which, for the sample data:

CREATE TABLE table_name (party_id, data_date) AS
SELECT 12345, DATE '2023-01-01' FROM DUAL UNION ALL
SELECT 12345, DATE '2023-02-04' FROM DUAL UNION ALL
SELECT 12345, DATE '2023-02-04' FROM DUAL UNION ALL
SELECT 12345, DATE '2023-02-05' FROM DUAL UNION ALL
SELECT 12345, DATE '2023-03-30' FROM DUAL UNION ALL
SELECT 12345, DATE '2023-03-31' FROM DUAL UNION ALL
SELECT 12345, DATE '2023-04-04' FROM DUAL;

Both output:

PARTY_ID DATA_DATE
12345 2023-01-01 00:00:00
12345 2023-02-04 00:00:00
12345 2023-03-30 00:00:00

Oracle 21 fiddle Oracle 11 fiddle


2 Comments

I love your first query. I'll have to spend some quality time to learn MATCH_RECOGNIZE. The condition in the DEFINE clause should be changed from <= to < in order to generate the correct result when executed against the test data in my answer. The second query, however, only generates the first three rows for party_id = 90210, omitting the next seven which should be present.
Thanks, updated. The MODEL clause only needed RULES AUTOMATIC ORDER.
2

The table definition:

create table PARTY_TABLE
             ( PARTY_ID   number
             , PARTY_DATE date
             , primary key ( PARTY_ID, PARTY_DATE )
             )
;

The test data now contains a second party:

insert into PARTY_TABLE -- bronze
select 12345, date '2023-01-01' from dual union all
select 12345, date '2023-02-04' from dual union all
select 12345, date '2023-02-05' from dual union all
select 12345, date '2023-03-30' from dual union all
select 12345, date '2023-03-31' from dual union all
select 12345, date '2023-04-04' from dual union all
select 90210, date '2024-06-19' from dual union all
select 90210, date '2024-08-02' from dual union all
select 90210, date '2024-09-03' from dual union all
select 90210, date '2024-09-07' from dual union all
select 90210, date '2024-10-03' from dual union all
select 90210, date '2024-10-17' from dual union all
select 90210, date '2024-10-18' from dual union all
select 90210, date '2024-10-31' from dual union all
select 90210, date '2024-11-02' from dual union all
select 90210, date '2024-11-18' from dual union all
select 90210, date '2024-11-24' from dual union all
select 90210, date '2024-12-08' from dual union all
select 90210, date '2024-12-27' from dual union all
select 90210, date '2025-01-01' from dual union all
select 90210, date '2025-01-14' from dual union all
select 90210, date '2025-01-23' from dual union all
select 90210, date '2025-01-29' from dual union all
select 90210, date '2025-02-25' from dual union all
select 90210, date '2025-02-28' from dual union all
select 90210, date '2025-03-24' from dual union all
select 90210, date '2025-03-31' from dual union all
select 90210, date '2025-04-13' from dual union all
select 90210, date '2025-04-14' from dual union all
select 90210, date '2025-04-28' from dual union all
select 90210, date '2025-05-02' from dual union all
select 90210, date '2025-05-07' from dual union all
select 90210, date '2025-05-11' from dual union all
select 90210, date '2025-05-25' from dual
;
commit
;

The query:

with all_the_possibilities -- silver
  as (
       select aa.PARTY_ID
            , aa.PARTY_DATE
            , (
                select min( bb.PARTY_DATE ) -- we pick the next date here...
                  from PARTY_TABLE
                       bb
                 where bb.PARTY_ID = aa.PARTY_ID
                   and bb.PARTY_DATE >= aa.PARTY_DATE + 30 -- ... and here
              )
           as PARTY_DATE_NEXT
            , row_number()
         over (
                partition by aa.PARTY_ID
                    order by aa.PARTY_DATE
              )
           as PARTY_ORDER -- by imposing a partial ordering on the dates...
         from PARTY_TABLE
              aa
     )
   , just_the_answer -- gold
  as (
       select aa.PARTY_ID
            , aa.PARTY_DATE
         from all_the_possibilities
              aa
        start -- ... we facilitate the search for the earliest date
         with aa.PARTY_ORDER = 1
      connect
           by aa.PARTY_ID = prior aa.PARTY_ID
          and aa.PARTY_DATE = prior aa.PARTY_DATE_NEXT
     )
    -- and now, the proverbial presentation layer -- brownfield
select zz.*
  from just_the_answer
       zz
 order
    by zz.PARTY_ID
     , zz.PARTY_DATE
;

The result:

|----------+------------|
| PARTY_ID | PARTY_DATE |
|----------+------------|
|    12345 | 2023-01-01 |
|    12345 | 2023-02-04 |
|    12345 | 2023-03-30 |
|    90210 | 2024-06-19 |
|    90210 | 2024-08-02 |
|    90210 | 2024-09-03 |
|    90210 | 2024-10-03 |
|    90210 | 2024-11-02 |
|    90210 | 2024-12-08 |
|    90210 | 2025-01-14 |
|    90210 | 2025-02-25 |
|    90210 | 2025-03-31 |
|    90210 | 2025-05-02 |
|----------+------------|
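
The idea behind the query above (precompute, for every date, the earliest date at least 30 days later, then follow that chain from the first date) can be sketched outside the database. This is a minimal Python illustration under stated assumptions, not a translation of the SQL: follow_chain and next_idx are hypothetical names, duplicates are dropped to mirror the primary key, and min_gap_days is assumed to be positive.

```python
from bisect import bisect_left
from datetime import date, timedelta


def follow_chain(dates, min_gap_days=30):
    """Precompute each date's successor (earliest date >= date + gap),
    then walk the successor chain starting from the earliest date."""
    ordered = sorted(set(dates))  # unique dates, like the primary key
    gap = timedelta(days=min_gap_days)
    # next_idx[i] plays the role of PARTY_DATE_NEXT;
    # a value of len(ordered) means "no later date qualifies".
    next_idx = [bisect_left(ordered, d + gap) for d in ordered]
    picked, i = [], 0
    while i < len(ordered):
        picked.append(ordered[i])
        i = next_idx[i]  # always moves forward because gap > 0
    return picked


# First rows of party 90210 from the test data above.
party_90210 = [
    date(2024, 6, 19), date(2024, 8, 2), date(2024, 9, 3),
    date(2024, 9, 7), date(2024, 10, 3), date(2024, 10, 17),
]
print(follow_chain(party_90210))
```

Only the rows reachable from the first date are visited, which is why the successor can be computed for every row up front without knowing which rows will end up selected.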

1 Comment

Whoops, I realized that you were not the original poster, but I replied as if you were, and even modified the question accordingly :-) Either way, your example helped clarify the question, and as OP never replied to anything nor accepted an answer, let's say that your clarification is the new specification.
2

MODEL clause can do it:

with data(party_id, data_date)  as (
    -- "from dual" is required before Oracle 23c, which is the first version to allow SELECT without FROM
    select 12345, to_date('01.01.2023', 'dd.mm.yyyy') from dual union all
    select 12345, to_date('04.02.2023', 'dd.mm.yyyy') from dual union all
    select 12345, to_date('05.02.2023', 'dd.mm.yyyy') from dual union all
    select 12345, to_date('30.03.2023', 'dd.mm.yyyy') from dual union all
    select 12345, to_date('31.03.2023', 'dd.mm.yyyy') from dual union all
    select 12345, to_date('04.04.2023', 'dd.mm.yyyy') from dual -- union all
),
rdata as (
    select 
        row_number() over(partition by party_id order  by data_date) as id,
        party_id, data_date
    from data
)
select party_id, data_date from (
    select * from rdata
    model
        partition by (party_id)
        dimension by (id)
        measures( data_date as data_date, cast(null as date) as latest_date )
        rules
        (
            latest_date[any] = 
                nvl2(
                    latest_date[cv()-1],
                      case when data_date[cv()] < latest_date[cv()-1] + 30
                        then latest_date[cv()-1] else data_date[cv()] end
                    , data_date[cv()]
                ),      
            data_date[any] = 
                nvl2(latest_date[cv()-1],
                    case when
                        data_date[cv()] >= latest_date[cv()-1] + 30 then data_date[cv()] end,
                    data_date[cv()])
                    
        )
)
where data_date is not null
;

5 Comments

When I run the query against my whole table, for a party_id that I selected at random it outputs both '1.12.2024' and '2.12.2024', although the other output dates are correct. For another party_id I also noticed an incorrect date with only a 1-day difference. How can I fix this in the MODEL clause query above? Thanks in advance.
extract the offending data and supply it here.
This query produces incorrect results. When executed against the test data in my answer, for party_id = 90210 it generates a row for 2024-10-03 (correct) as well as a row for 2024-10-17 (incorrect), with a cascading effect on all subsequent dates for that party_id.
Indeed but it's just a matter of replacing the "case when data_date[cv()] <= latest_date[cv()-1] + 30" by "case when data_date[cv()] < latest_date[cv()-1] + 30"
@p3consulting, confirmed, the query generates the correct result once the change has been applied.
2

You can use a recursive Common Table Expression to automate what you do manually.
And in that recursive part, the lag() window function will help you compare each row with the preceding one.

By eliminating entries too close to previously chosen ones

/!\ Does not work on Oracle 11 (every date gets returned)

You start with all entries tagged as "hesitating" (whether or not each one starts a 30-day period),
then confirm as "firsts" the ones with no other date in the preceding 30 days,
confirm as "not firsts" those in the 30-day period after a just-confirmed "first",
then reevaluate whether this allows other "hesitating" entries to be confirmed as "firsts",
and so on.

with
    -- Index our entries:
    i as (select row_number() over (partition by party_id order by data_date) id, t.* from t),
    -- Know whose election as a "first" will make each entry masked for good.
    l(id, party_id, data_date, pass, kind) as
    (
        -- "pass" increments to keep hesitating entries queued for the next iteration
        -- "kind":
        --   1: first of a series
        --   0: don't know yet if first of a series or not
        --  -1: confirmed masked (follows a confirmed first)
        --  -2: first, but finished (has no more followers to evaluate)
        select i.*, 0 pass, 0 kind from i
            union all
        select id, party_id, data_date, pass + 1,
            case
                when kind = 0 then
                      case
                        -- No preceding entry, or the preceding entry is at least 30 days earlier? We're a first!
                        when lag(data_date) over (partition by party_id order by data_date) is null then 1
                        when lag(data_date) over (partition by party_id order by data_date) <= data_date - interval '30' day then 1
                        -- Else (if preceding is less than 30 days away), if said preceding entry is itself a first, we're marked to mask.
                        when lag(kind) over (partition by party_id order by data_date) = 1 then -1
                        -- Still not sure.
                        else 0
                    end
                -- If we are a first but have no more followers waiting for us, get away.
                when kind = 1 and coalesce(lead(kind) over (partition by party_id order by data_date), 1) <> 0 then -2
                else kind
            end
        from l
        where kind >= 0 -- Only work with confirmed firsts, and still hesitating ones.
        and pass < 99 -- In case I missed something...
    )
select party_id, data_date from i where (party_id, id) in (select party_id, id from l where kind = -2)
order by party_id, data_date;

Here is a demo for your 12345 and 90210 parties.

(this was slightly adapted from an answer for the same problem in PostgreSQL)

By jumping 30 days by 30 days

A more efficient way (still based on a recursive CTE and lag()) is, after each step of choosing to display a non-preceded date, to jump directly to "the next date after 30 days have passed".

This relies on range with interval, which is supported on Oracle 11g (and maybe before?).

with
  -- Identify unambiguous window starts: those with no predecessor in the 30 previous days.
  maybe as
  (
    select
      t.*,
      row_number() over (partition by party_id order by data_date) num, -- Will ease our reading of results.
      -- startpoint:
      -- - true: confirmed start of a new window
      -- - null: maybe, maybe not; will be decided later (depending on whether the previous data_date (less than 30 days ago) has itself been eaten by a previous window (thus letting us be a new start) or not (then the previous one is a start and we're eaten by it)).
      case when lag(data_date) over (partition by party_id order by data_date) >= data_date - interval '30' day then null else 1 end startpoint
    from t
  ),
  -- Continents: runs of data_date values where each one is less than 30 days after the one before it.
  c as
  (
    select
      maybe.*,
      -- Identify it by the num of the unambiguous starting point.
      max(case when startpoint = 1 then num end) over (partition by party_id order by data_date) continent,
      -- Now attributes for *hypothetical* new island starts:
      -- for each data_date, choose its successor in case this one becomes an island start
      -- The successor is the first row from the same continent, but further than 30 days from this one.
      min(num) over (partition by party_id order by data_date range between interval '30' day following and unbounded following) successor,
      -- Number of rows which would belong to this 30 days window (in case the current row is a window start).
      count(1) over (partition by party_id order by data_date range between current row and interval '30' day following) n_included
    from maybe
  ),
  -- Now iterate starting from the continents,
  -- to see if we can determine islands within them.
  -- (each start of island "eats" the 30 following days, so the first row after 30 days can be elected as the start of a new island)
  i(party_id, data_date, num, startpoint, continent, successor, n_included) as
  (
    select * from c where startpoint = 1
    union all
    select nexti.party_id, nexti.data_date, nexti.num, nexti.startpoint, nexti.continent, nexti.successor, nexti.n_included -- The * must be expanded explicitly on Oracle 11.
    -- Do not forget to filter on continent, as successor has been computed without this criterion (before we had determined islands).
    from i join c nexti on nexti.party_id = i.party_id and nexti.continent = i.continent
    -- EVERY filter has to be put in the "on" clause of the join, not in a separate "where";
    -- or we'll get an ORA-32044: cycle detected while executing recursive WITH query
    -- So here put an and instead of a where:
    and nexti.num = i.successor
  )
select * from i order by party_id, num;

This has been put in a fiddle.

The solution is a port of what I proposed on PostgreSQL.

10 Comments

Hello. I appreciate your help. Unfortunately I received an ORA-30484 "Invalid window function" error on the following line: coalesce(lag(data_date) over byparty < data_date - interval '30' day, true)
I adapted the query to work on Oracle 18 (but does not work on 11.2, seemingly due to a limitation on window functions in recursive CTEs back then). As asked in the answer, could you tell us the version of Oracle you are running on? This is the quickest way to adapt the query to work on it.
OCI: Version 11.1. Version 15.0.3.2059 (64bit).
@VadimK. As your original question did not contain any date difference of exactly 30 days, I couldn't know what to do in this case, so I supposed that "more than 30 days" meant "at least 31 days". The data that you posted on June, 8th is more clear about it, so I understand it is "30 days or more": then just change > data_date - interval '30' day to >= data_date - interval '30' day. I did it for the first query in dbfiddle.uk/aKuD0CHW, with your new dataset; I'll adapt the second query too and modify my answer.
the second query is no longer generating that error. This join vs where thing is a nice discovery, I'll have to add that to my notes. What a great question, with so many different approaches to generate an answer.
1

Using a recursive query.
First, we take prev_date for every row.
The next row in the desired sequence is the one whose date is at least 30 days after the selected date while its prev_date is less than 30 days after the selected date.
UPDATE1: The anchor part of the recursion is the first row for every party_id plus the rows whose dates are 30 days or more after the previous row. Such rows are obviously included in the result.
This significantly reduces the number of recursion levels needed.

We can see that no recursion was needed for PARTY_ID=12345.

PARTY_ID PARTY_DATE Anchor part   Recursive part
12345   01-JAN-23 -- first row  
12345   04-FEB-23 -- >=30 days  
12345   05-FEB-23     --          --  
12345   30-MAR-23 -- >=30 days  
12345   31-MAR-23     --          --   
12345   04-APR-23     --          --  

For PARTY_ID=90210, the anchor part includes 3 rows.

PARTY_ID PARTY_DATE Anchor part   Recursive part
90210   19-JUN-24 -- first row  
90210   02-AUG-24 -- >=30 days  
90210   03-SEP-24 -- >=30 days  
90210   07-SEP-24     --          --   
90210   03-OCT-24     --          V   
90210   17-OCT-24
90210   18-OCT-24
90210   31-OCT-24
90210   02-NOV-24     --          V   
90210   18-NOV-24
90210   24-NOV-24
90210   08-DEC-24     --          V   
...

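In the recursive part, we search only the remaining rows, those whose party_date is less than 30 days after the previous row's date, for the first one that is at least 30 days after an already selected date.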

with dataRanges(PARTY_ID,PARTY_DATE,prev_date) as(
   select t.*
     ,lag(PARTY_DATE)over(partition by party_id order by PARTY_DATE) prev_date
   from PARTY_TABLE t
)
,r(PARTY_ID,PARTY_DATE,prev_date,lvl) as (
   (select t.PARTY_ID,t.PARTY_DATE,t.prev_date, 0 lvl
    from dataRanges t
    where t.prev_date is null -- first row
        or t.PARTY_DATE>=(t.prev_date+30)
   )
  union all
   (select t.PARTY_ID,t.PARTY_DATE,t.prev_date, lvl+1 lvl
    from dataRanges t  inner join  r on  t.party_id=r.party_id 
        and t.PARTY_DATE<(t.prev_date+30)
        and t.PARTY_DATE>=(r.PARTY_DATE+30) 
        and t.prev_date<(r.PARTY_DATE+30)
   )  
)
select r.PARTY_ID,PARTY_DATE,lvl 
from r
order by PARTY_ID, PARTY_DATE;

PARTY_ID PARTY_DATE LVL
12345 01-JAN-23 0
12345 04-FEB-23 0
12345 30-MAR-23 0
90210 19-JUN-24 0
90210 02-AUG-24 0
90210 03-SEP-24 0
90210 03-OCT-24 1
90210 02-NOV-24 2
90210 08-DEC-24 3
90210 14-JAN-25 4
90210 25-FEB-25 5
90210 31-MAR-25 6
90210 02-MAY-25 7
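
The anchor rule described above (the first row, plus any row at least 30 days after its immediate predecessor) can be checked on its own with a small sketch. This is a hedged Python illustration rather than part of the query; anchor_rows is a hypothetical name introduced here.

```python
from datetime import date, timedelta


def anchor_rows(dates, min_gap_days=30):
    """Rows that need no recursion: the earliest date, plus every date at
    least min_gap_days after the date immediately before it."""
    ordered = sorted(dates)
    gap = timedelta(days=min_gap_days)
    return [
        d for i, d in enumerate(ordered)
        if i == 0 or d - ordered[i - 1] >= gap
    ]


# Party 12345: every selected row is already an anchor (lvl 0 above).
party_12345 = [
    date(2023, 1, 1), date(2023, 2, 4), date(2023, 2, 5),
    date(2023, 3, 30), date(2023, 3, 31), date(2023, 4, 4),
]
print(anchor_rows(party_12345))
```

A row that qualifies as an anchor is at least 30 days after every earlier row, including the last selected one, and no other row sits in between, so it is always part of the final result. Rows such as 03-OCT-24 for party 90210 (only 26 days after 07-SEP-24) are not anchors and have to be found by the recursive part.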

Detailed query output (see in fiddle)

PARTY_ID PARTY_DATE PREV_DATE LVL RN DIFF
12345 01-JAN-23 null 0 1 null
12345 04-FEB-23 01-JAN-23 0 2 34
12345 30-MAR-23 05-FEB-23 0 3 53
90210 19-JUN-24 null 0 1 null
90210 02-AUG-24 19-JUN-24 0 2 44
90210 03-SEP-24 02-AUG-24 0 3 32
90210 03-OCT-24 07-SEP-24 1 3 26
90210 02-NOV-24 31-OCT-24 2 3 2
90210 08-DEC-24 24-NOV-24 3 3 14
90210 14-JAN-25 01-JAN-25 4 3 13
90210 25-FEB-25 29-JAN-25 5 3 27
90210 31-MAR-25 24-MAR-25 6 3 7
90210 02-MAY-25 28-APR-25 7 3 4

fiddle

Source data from @VadimK. example see in fiddle.

OLD QUERY

with dataRanges(PARTY_ID,DATA_DATE,prev_date) as(
   select t.*
     ,lag(data_date,1,data_date)over(partition by party_id order by data_date) prev_date
   from test t
)
,r(PARTY_ID,DATA_DATE,prev_date) as (
  (  select t.PARTY_ID,t.DATA_DATE,t.prev_date
   from dataRanges t
   where t.data_date=(select min(data_date) from test t2 where t2.party_id=t.party_id)
  )
  union all
 (select t.PARTY_ID,t.DATA_DATE,t.prev_date
  from dataRanges t  inner join  r on  t.party_id=r.party_id 
     and t.data_date>(r.data_date+30) and  t.prev_date<(r.data_date+30)
 )  
)
select r.* from r;

With sample data

PARTY_ID DATA_DATE
12345 01-JAN-23
12345 04-FEB-23
12345 05-FEB-23
12345 30-MAR-23
12345 31-MAR-23
12345 04-APR-23

output is

PARTY_ID DATA_DATE PREV_DATE
12345 01-JAN-23 01-JAN-23
12345 04-FEB-23 01-JAN-23
12345 30-MAR-23 05-FEB-23

fiddle

2 Comments

This query generates incorrect results. When executed against the test data in my answer, for PARTY_ID = 90210 the first three rows are correct but all subsequent rows that should be in the output are missing.
@VadimK., thank you for the comment. I updated my answer with your test data, and changed the condition from >30 to >=30.
