Commit 80d67e4

antamel authored and Commitfest Bot committed
Add knn support to btree indexes
This commit implements support for kNN scans in B-tree indexes. When a kNN search is requested, the B-tree index is traversed in ascending and descending order simultaneously, and at each step the closest tuple is returned. Filtering operators can reduce a kNN scan to a regular ordered scan.

Ordering operators are added to the opfamilies of scalar datatypes. No extra support functions are required: the kNN B-tree algorithm works using the comparison function and the ordering operator itself.

Distance operators are not leakproof, because they throw an error on overflow. Therefore we relax the opr_sanity check for btree ordering operators: it's OK for them to be leaky while the comparison function is leakproof.

Catversion is bumped.

Discussion: https://postgr.es/m/ce35e97b-cf34-3f5d-6b99-2c25bae49999%40postgrespro.ru
Author: Nikita Glukhov
Reviewed-by: Robert Haas, Tom Lane, Anastasia Lubennikova, Alexander Korotkov
1 parent feafce1 commit 80d67e4

21 files changed, +2526 -171 lines changed


doc/src/sgml/btree.sgml

Lines changed: 47 additions & 0 deletions
@@ -200,6 +200,53 @@
    planner relies on them for optimization purposes.
   </para>
 
+  <para>
+   In order to implement the distance ordered (nearest-neighbor) search,
+   one needs to define a distance operator (usually it's called
+   <literal>&lt;-&gt;</literal>) with a corresponding operator family for
+   distance comparison in the operator class.  These operators must
+   satisfy the following assumptions for all non-null values
+   <replaceable>A</replaceable>, <replaceable>B</replaceable>,
+   <replaceable>C</replaceable> of the data type:
+
+   <itemizedlist>
+    <listitem>
+     <para>
+      <replaceable>A</replaceable> <literal>&lt;-&gt;</literal>
+      <replaceable>B</replaceable> <literal>=</literal>
+      <replaceable>B</replaceable> <literal>&lt;-&gt;</literal>
+      <replaceable>A</replaceable>
+      (<firstterm>symmetric law</firstterm>)
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      if <replaceable>A</replaceable> <literal>=</literal>
+      <replaceable>B</replaceable>, then <replaceable>A</replaceable>
+      <literal>&lt;-&gt;</literal> <replaceable>C</replaceable>
+      <literal>=</literal> <replaceable>B</replaceable>
+      <literal>&lt;-&gt;</literal> <replaceable>C</replaceable>
+      (<firstterm>distance equivalence</firstterm>)
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      if (<replaceable>A</replaceable> <literal>&lt;=</literal>
+      <replaceable>B</replaceable> and <replaceable>B</replaceable>
+      <literal>&lt;=</literal> <replaceable>C</replaceable>) or
+      (<replaceable>A</replaceable> <literal>&gt;=</literal>
+      <replaceable>B</replaceable> and <replaceable>B</replaceable>
+      <literal>&gt;=</literal> <replaceable>C</replaceable>),
+      then <replaceable>A</replaceable> <literal>&lt;-&gt;</literal>
+      <replaceable>B</replaceable> <literal>&lt;=</literal>
+      <replaceable>A</replaceable> <literal>&lt;-&gt;</literal>
+      <replaceable>C</replaceable>
+      (<firstterm>monotonicity</firstterm>)
+     </para>
+    </listitem>
+   </itemizedlist>
+  </para>
+
  </sect2>
 
  <sect2 id="btree-support-funcs">

doc/src/sgml/indices.sgml

Lines changed: 11 additions & 0 deletions
@@ -1193,6 +1193,17 @@ SELECT x FROM tab WHERE x = 'key' AND z &lt; 42;
    make this type of scan very useful in practice.
   </para>
 
+  <para>
+   B-tree indexes are also capable of optimizing <quote>nearest-neighbor</quote>
+   searches, such as
+<programlisting><![CDATA[
+SELECT * FROM events ORDER BY event_date <-> date '2017-05-05' LIMIT 10;
+]]>
+</programlisting>
+   which finds the ten events closest to a given target date.  The ability
+   to do this is again dependent on the particular operator class being used.
+  </para>
+
   <para>
    <indexterm>
     <primary><literal>INCLUDE</literal></primary>

doc/src/sgml/xindex.sgml

Lines changed: 6 additions & 1 deletion
@@ -131,6 +131,10 @@
        <entry>greater than</entry>
        <entry>5</entry>
       </row>
+      <row>
+       <entry>distance</entry>
+       <entry>6</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -1320,7 +1324,8 @@ SELECT sum(x) OVER (ORDER BY x RANGE BETWEEN 5 PRECEDING AND 10 FOLLOWING)
   <title>Ordering Operators</title>
 
   <para>
-   Some index access methods (currently, only GiST and SP-GiST) support the concept of
+   Some index access methods (currently, B-tree, GiST and SP-GiST)
+   support the concept of
    <firstterm>ordering operators</firstterm>.  What we have been discussing so far
    are <firstterm>search operators</firstterm>.  A search operator is one for which
    the index can be searched to find all rows satisfying

src/backend/access/brin/brin_minmax.c

Lines changed: 3 additions & 3 deletions
@@ -23,7 +23,7 @@
 typedef struct MinmaxOpaque
 {
     Oid         cached_subtype;
-    FmgrInfo    strategy_procinfos[BTMaxStrategyNumber];
+    FmgrInfo    strategy_procinfos[BTMaxSearchStrategyNumber];
 } MinmaxOpaque;
 
 static FmgrInfo *minmax_get_strategy_procinfo(BrinDesc *bdesc, uint16 attno,
@@ -264,7 +264,7 @@ minmax_get_strategy_procinfo(BrinDesc *bdesc, uint16 attno, Oid subtype,
     MinmaxOpaque *opaque;
 
     Assert(strategynum >= 1 &&
-           strategynum <= BTMaxStrategyNumber);
+           strategynum <= BTMaxSearchStrategyNumber);
 
     opaque = (MinmaxOpaque *) bdesc->bd_info[attno - 1]->oi_opaque;
 
@@ -277,7 +277,7 @@ minmax_get_strategy_procinfo(BrinDesc *bdesc, uint16 attno, Oid subtype,
     {
         uint16      i;
 
-        for (i = 1; i <= BTMaxStrategyNumber; i++)
+        for (i = 1; i <= BTMaxSearchStrategyNumber; i++)
             opaque->strategy_procinfos[i - 1].fn_oid = InvalidOid;
         opaque->cached_subtype = subtype;
     }

src/backend/access/nbtree/README

Lines changed: 22 additions & 0 deletions
@@ -1081,3 +1081,25 @@ item is irrelevant, and need not be stored at all. This arrangement
 corresponds to the fact that an L&Y non-leaf page has one more pointer
 than key.  Suffix truncation's negative infinity attributes behave in
 the same way.
+
+Nearest-neighbor search
+-----------------------
+
+B-tree supports a special scan strategy for nearest-neighbor (kNN) search,
+which is used for queries with an "ORDER BY indexed_column operator constant"
+clause, as in the following example:
+
+	SELECT * FROM tab WHERE col > const1 ORDER BY col <-> const2 LIMIT k
+
+Unlike GiST and SP-GiST, B-tree supports kNN with only a single ordering
+operator, applied to the first indexed column.
+
+At the beginning of a kNN scan, we determine the scan strategy to use: normal
+unidirectional or special bidirectional.  If the second distance operand falls
+into the scan range, then we use a bidirectional scan; otherwise we use a normal
+unidirectional scan.
+
+The bidirectional scan algorithm is quite simple.  We start both forward and
+backward scans from the tree location corresponding to the second distance
+operand.  Each time we need the next tuple, we return the nearer tuple of the
+two directions and advance the scan in the corresponding direction.
