mirror of https://github.com/FirebirdSQL/firebird.git synced 2025-01-22 17:23:03 +01:00

History

Dmitry Yemanov 213bc1c2b8 Front-copied BLOB_APPEND readme to master		2022-10-21 10:04:53 +03:00
..
README.aggregate_filter.md	Syntax correction FILTER	2018-11-30 14:56:52 +01:00
README.aggregate_tracking	Please give some comments	2003-07-29 00:55:34 +00:00
README.alternate_string_quoting.txt	- External Engines implementation.	2009-10-21 00:42:38 +00:00
README.autonomous_transactions.txt	Merge branch 'query_restarts_2'	2020-03-28 12:20:04 +02:00
README.blob_append.md	Front-copied BLOB_APPEND readme to master	2022-10-21 10:04:53 +03:00
README.builtin_functions.txt	Docs about FirebirdSQL/firebird-documentation#173	2022-07-13 22:19:48 -03:00
README.case	update readme's with method used for determing result data type	2005-02-01 23:55:56 +00:00
README.coalesce	update readme's with method used for determing result data type	2005-02-01 23:55:56 +00:00
README.column_type_psql.txt	The merge is being fixed.	2008-01-16 13:38:46 +00:00
README.common_table_expressions	Fixed some warnings	2009-04-03 10:49:07 +00:00
README.context_variables	Reworked syntax for new error context vars, based on feedback from the other devs.	2016-12-02 10:18:47 +03:00
README.context_variables2	Remove the WNET protocol (#7082 )	2022-01-05 10:34:07 +03:00
README.cumulative_roles.txt	Fixed CORE-5593: System function RDB$ROLE_IN_USE cannot take long role names with lower-case letters	2018-06-22 17:57:53 +03:00
README.current_time	Typo.	2005-10-21 09:49:22 +00:00
README.cursor_variables.txt	Fixed CORE-4488 - Wrong results of FOR SELECT <L> FROM <T> AS CURSOR <C> and table <T> is modified inside cursor's begin...end block.	2014-07-21 02:37:58 +00:00
README.cursors	Documentation for CORE-4403 - Allow referencing cursors as record variables in PSQL.	2014-06-16 16:04:55 +00:00
README.data_type_results_of_aggregations.txt	Corrections.	2007-04-09 14:39:03 +00:00
README.data_types	Documentation	2020-06-24 13:11:29 +03:00
README.db_triggers.txt	Misc.	2007-12-04 08:23:13 +00:00
README.ddl_access.txt	Correct description	2016-09-28 18:06:17 +03:00
README.ddl_triggers.txt	Improvement CORE-4923 - Add ability to track domains rename in DDL triggers.	2015-09-02 16:24:31 +00:00
README.ddl.txt	Merge branch 'work/profiler-plugin' into work/profiler-plugin-merge	2022-08-10 22:26:38 -03:00
README.default_parameters	Correctly written my name and e-mail	2008-05-13 07:26:32 +00:00
README.derived_tables.txt	Documentation	2020-04-06 11:32:31 +03:00
README.distinct	More docs.	2005-01-22 21:03:03 +00:00
README.domains_psql.txt	The merge is being fixed.	2008-01-16 13:38:46 +00:00
README.exception_handling	More context variables for error handlers, see CORE-1132 and CORE-2040. (#46 )	2016-09-06 21:12:03 +03:00
README.execute_block	Correctly written my name and e-mail	2008-05-13 07:26:32 +00:00
README.execute_statement	Misc, adding example.	2006-07-31 07:42:47 +00:00
README.execute_statement2	Let external transaction run with CONCURRENCY isolation mode if local transaction runs with READ COMMITED READ CONSISTENCY isolation and such isolation mode is not supported by external data source.	2019-11-15 10:54:50 +02:00
README.explicit_locks	Fix typo	2003-07-31 11:54:46 +00:00
README.expression_indices	Fixed some warnings	2009-04-03 10:49:07 +00:00
README.external_connections_pool	Let external transaction run with CONCURRENCY isolation mode if local transaction runs with READ COMMITED READ CONSISTENCY isolation and such isolation mode is not supported by external data source.	2019-11-15 10:54:50 +02:00
README.floating_point_types.md	CORE-6109 SQL standard FLOAT type syntax support (#218 )	2019-08-02 20:55:34 +03:00
README.global_temporary_tables	Fixed bug CORE-6043 : GTTs do not release used space	2019-04-12 15:44:18 +03:00
README.hex_literals.txt	Update readme for #5588 .	2021-05-13 14:11:29 -03:00
README.identity_columns.txt	Feature CORE-5463 - Support GENERATED ALWAYS identity columns and OVERRIDE clause.	2017-02-24 23:03:04 -03:00
README.iif	More docs.	2005-01-22 21:13:15 +00:00
README.isc_info_xxx	Renames requested by Vlad Khorsun	2020-06-16 14:10:18 +02:00
README.joins.txt	Doc. for new join types	2007-05-03 01:32:08 +00:00
README.keywords	Merge pull request #6793 from FirebirdSQL/work/gh-6788	2021-05-14 20:48:01 +03:00
README.leave_labels	Added some docs.	2005-01-06 18:32:22 +00:00
README.length	Merge INTL branch into HEAD	2005-05-27 22:45:31 +00:00
README.linger	Implemented CORE-4263: Database linger	2013-11-14 16:16:24 +00:00
README.list	Updated LIST documentation - thanks to Maycon Ferraça	2007-05-09 15:23:33 +00:00
README.management_statements_psql.md	Started with doc for a feature	2019-12-06 18:35:43 +03:00
README.mapping.html	Misc docs	2020-04-13 14:20:53 +03:00
README.merge.txt	Feature #6681 and fix for #6942 .	2021-09-02 10:00:15 -03:00
README.null_value	Placeholders for further docs.	2005-01-22 21:18:33 +00:00
README.nullif	update readme's with method used for determing result data type	2005-02-01 23:55:56 +00:00
README.offset_fetch.txt	CORE-4526 Support for SQL:2008 OFFSET and FETCH clauses	2014-11-29 15:39:59 +00:00
README.order_by_expressions_nulls	Corrections for v2.0 and above.	2007-07-03 12:10:22 +00:00
README.packages.txt	Correction.	2013-05-13 15:56:29 +00:00
README.partial_indices	Added the docs	2022-08-12 16:00:15 +03:00
README.plan	Minor fixes.	2005-01-25 06:24:30 +00:00
README.profiler.md	Design for future extensions with different timings types.	2022-06-22 22:25:47 -03:00
README.PSQL_stack_trace.txt	Correctly written my name and e-mail	2008-05-13 07:26:32 +00:00
README.regr_functions.txt	Cleanup: get rid of unused variables.	2021-09-21 09:47:30 -03:00
README.returning	Feature #6815 - Support multiple rows for DML RETURNING.	2021-08-25 14:56:20 -03:00
README.rows	CORE-4526 Support for SQL:2008 OFFSET and FETCH clauses	2014-11-29 15:39:59 +00:00
README.savepoints	Added docs for savepoints	2003-07-29 11:33:26 +00:00
README.scrollable_cursors.txt	Fixed a mistake.	2015-03-21 05:15:25 +00:00
README.select_expressions	Corrections, thanks to Claudio.	2006-01-10 07:59:54 +00:00
README.sequence_generators	Minor fixes.	2005-01-25 06:24:30 +00:00
README.set_bind.md	Fixed CORE-6303: Error writing to TIMESTAMP/TIME WITH TIME ZONE array	2020-07-21 19:07:04 +03:00
README.set_role	Added doc	2014-11-14 08:24:43 +00:00
README.set_transaction.txt	Add documentation for the extra options exposed through the DSQL's SET TRANSACTION command. This functionality exists already using TPB's.	2005-12-01 02:34:47 +00:00
README.similar_to.txt	SIMILAR TO also requires escape of }	2021-04-27 15:29:47 +02:00
README.sql_security.txt	Fixed CORE-5892: SQL SECURITY DEFINER context is not properly evaluated for monitoring tables (#196 )	2019-03-18 14:17:02 +03:00
README.statistical_functions.txt	Feature CORE-4717 - Aggregate statistical functions COVAR_SAMP, COVAR_POP and CORR - contributed by Hajime Nakagami.	2015-03-23 02:55:42 +00:00
README.subroutines.txt	Improvement #4769 - Allow sub-routines to access variables/parameters	2022-01-30 12:15:03 -03:00
README.substring_similar.txt	Fix syntax error in substring-similar doc	2021-04-05 13:07:39 +02:00
README.time_zone.md	Update README.time_zone.md	2021-11-07 18:00:42 +03:00
README.trim	Merge INTL branch into HEAD	2005-05-27 22:45:31 +00:00
README.universal_triggers	Fixed typos.	2005-12-03 11:09:39 +00:00
README.update_or_insert	Feature #6815 - Support multiple rows for DML RETURNING.	2021-08-25 14:56:20 -03:00
README.user_management	Implemented CORE-6279: Put options in user management statements in any order	2020-04-08 14:17:16 +03:00
README.view_updates	Placeholders for further docs.	2005-01-22 21:18:33 +00:00
README.window_functions.md	Improve named window documentation re. name scopes.	2018-12-16 18:27:00 -02:00

README.window_functions.md

Window Functions

By the SQL specification, window functions (also know as analytical functions) are a kind of aggregation, but which does not "filter" the result set of a query. The aggregated data is mixed with the query result set.

That sort of functions are used with the OVER clause. Window functions may appear only in the select list or the ORDER BY clause of a query.

Additional to the OVER clause, Firebird window functions may use partitions, order and frames (FB 4.0).

Syntax:

<window function> ::=
  <window function name>([<expr> [, <expr> ...]])
    OVER {<window specification> | <existing window name>}

<window specification> ::=
  ([<existing window name>] [<window partition>] [<window order>] [<window frame>])

<window partition> ::=
  PARTITION BY <expr> [, <expr> ...]

<window order> ::=
  ORDER BY <expr> [<direction>] [<nulls placement>] [, <expr> [<direction>] [<nulls placement>]] ...

<window frame> ::=
  {RANGE | ROWS} <window frame extent>

<window frame extent> ::=
  {<window frame start> | <window frame between>}

<window frame start> ::=
  {UNBOUNDED PRECEDING | <expr> PRECEDING | CURRENT ROW}

<window frame between> ::=
  BETWEEN {UNBOUNDED PRECEDING | <expr> PRECEDING | <expr> FOLLOWING | CURRENT ROW} AND
          {UNBOUNDED FOLLOWING | <expr> PRECEDING | <expr> FOLLOWING | CURRENT ROW}

<direction> ::=
  {ASC | DESC}

<nulls placement> ::=
  NULLS {FIRST | LAST}

<query spec> ::=
  SELECT
    [<limit clause>]
    [<distinct clause>]
    <select list>
    [<where clause>]
    [<group clause>]
    [<having clause>]
    [<named windows clause>]
    [<plan clause>]

<named windows clause> ::=
  WINDOW <window definition> [, <window definition>] ...

<window definition> ::=
  <new window name> AS <window specification>

1. Aggregate functions used as window functions

All aggregate functions may be used as window functions, adding the OVER clause. Imagine a table EMPLOYEE with columns ID, NAME and SALARY, and the need to show each employee with his respective salary and the percentage of his salary over the payroll. With a "normal" query, this is possible in the following manner:

select
    id,
    department,
    salary,
    salary / (select sum(salary) from employee) percentage
  from employee
  order by id;

Results:

id	department	salary	percentage
1	R & D	10.00	0.2040
2	SALES	12.00	0.2448
3	SALES	8.00	0.1632
4	R & D	9.00	0.1836
5	R & D	10.00	0.2040

It's necessary to repeat the query in a subquery and wait so much to see the results, specially if EMPLOYEE is a complex view.

The same query could be specified in much more elegant and faster way using a window function:

select
    id,
    department,
    salary,
    salary / sum(salary) over () percentage
  from employee
  order by id;

Here, sum(salary) over () is computed with the sum of all SALARY from the query (the employee table).

2. Partitioning

Like aggregate functions, that may operate alone or in relation to a group, window functions may also operate on a group, which is called "partition". Its syntax is:

When aggregation is done over a group, it could produce more than one row. So the result set generated by a partition is joined with the main query using the same expression list of the partition.

Continuing the employee example, instead of get the percentage of the employee salary over all employees, we would like to get the percentage based only on the employees in the same department:

select
    id,
    department,
    salary,
    salary / sum(salary) over (partition by department) percentage
  from employee
  order by id;

Results:

id	department	salary	percentage
1	R & D	10.00	0.3448
2	SALES	12.00	0.6000
3	SALES	8.00	0.4000
4	R & D	9.00	0.3103
5	R & D	10.00	0.3448

3. Ordering

The ORDER BY sub-clause can be used with or without partitions, and used with the standard aggregate functions and without frames, make them return the partial aggregations as the records are being processed. Example:

select
    id,
    salary,
    sum(salary) over (order by salary) running_salary
  from employee
  order by salary;

The result set produced will be:

id	salary	running_salary
3	8.00	8.00
4	9.00	17.00
1	10.00	37.00
5	10.00	37.00
2	12.00	49.00

Then running_salary returns the partial/accumulated (or running) aggregation (of the SUM function). It may appear strange that 37.00 is repeated for the ids 1 and 5, but that is how it should work. The ORDER BY keys are grouped together and the aggregation is computed once (but summing the two 10.00). To avoid this, you can add the ID field to the end of the ORDER BY clause.

The fact of running aggregations and grouping results by the order key happens because when an explicit frame is not used, it computes the window functions as a RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame. More on frames later in this document.

It's possible to use multiple windows with different orders, and ORDER BY parts like ASC / DESC and NULLS FIRST / NULLS LAST.

With a partition, ORDER BY works the same way, but at each partition boundary the aggregation is reset.

All aggregation functions are usable with ORDER BY, except the LIST function.

4. Exclusive window functions

Beyond aggregate functions, there is also exclusive window functions, currently divided in ranking and navigational categories.

Both set of functions can be used with/without partition/ordering, but the usage does not make much sense without ordering.

4.1. Ranking functions

Syntax:

<ranking window function> ::=
    DENSE_RANK() |
    RANK() |
    PERCENT_RANK() |
    CUME_DIST() |
    NTILE(<expr>) |
    ROW_NUMBER()

The rank functions compute the ordinal rank of a row within the window partition. In this category are the functions: DENSE_RANK, RANK and ROW_NUMBER.

With these functions, one can create different type of incremental counters. Think about SUM(1) OVER (ORDER BY SALARY), these functions do this type of thing, but all of them in different ways. Following is an example query, also comparing with the SUM behavior.

select
    id,
    salary,
    dense_rank() over (order by salary),
    rank() over (order by salary),
    percent_rank() over (order by salary),
    cume_dist() over (order by salary),
    ntile(3) over (order by salary),
    row_number() over (order by salary),
    sum(1) over (order by salary)
  from employee
  order by salary;

And the result set:

id	salary	dense_rank	rank	percent_rank	cume_dist	ntile	row_number	sum
3	8.00	1	1	0.000000000000000	0.2000000000000000	1	1	1
4	9.00	2	2	0.250000000000000	0.4000000000000000	1	2	2
1	10.00	3	3	0.500000000000000	0.8000000000000000	2	3	4
5	10.00	3	3	0.500000000000000	0.8000000000000000	2	4	4
2	12.00	4	5	1.000000000000000	1.0000000000000000	3	5	5

The difference between DENSE_RANK and RANK is that there is a gap related to duplicate rows (in relation to the window ordering) only in RANK. DENSE_RANK continues assigning sequential numbers after the duplicate salary. On the other hand, ROW_NUMBER always assigns sequential numbers, even when there is duplicate values.

PERCENT_RANK is a ratio of RANK to group count.

CUME_DIST is cumulative distribution of a value in a group.

NTILE distributes the rows into a specified number of groups. NTILE argument is restricted to integral positive literal, variable (:var) and DSQL parameter (?).

4.2. Navigational functions

Syntax:

<navigational window function> ::=
    FIRST_VALUE(<expr>) |
    LAST_VALUE(<expr>) |
    NTH_VALUE(<expr>, <offset>) [FROM FIRST | FROM LAST] |
    LAG(<expr> [ [, <offset> [, <default> ] ] ) |
    LEAD(<expr> [ [, <offset> [, <default> ] ] )

The navigational functions gets the simple (non-aggregated) value of an expression from another row (inside the same partition) of the query.

It's important to note that FIRST_VALUE, LAST_VALUE and NTH_VALUE also operates on a window frame, and the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This is likely to get strange results for NTH_VALUE and specially LAST_VALUE. More on frames later in this document.

select
    id,
    salary,
    first_value(salary) over (order by salary),
    last_value(salary) over (order by salary),
    nth_value(salary, 2) over (order by salary),
    lag(salary) over (order by salary),
    lead(salary) over (order by salary)
  from employee
  order by salary;

And the result set:

id	salary	first_value	last_value	nth_value	lag	lead
3	8.00	8.00	8.00	<null>	<null>	9.00
4	9.00	8.00	9.00	9.00	8.00	10.00
1	10.00	8.00	10.00	9.00	9.00	10.00
5	10.00	8.00	10.00	9.00	10.00	12.00
2	12.00	8.00	12.00	9.00	10.00	<null>

FIRST_VALUE and LAST_VALUE gets respectively the first and last value of the ordered partition.

NTH_VALUE gets the n-th value, starting from the first (default) or the last record, from the ordered partition. If offset is 1 from first, it's equivalent to FIRST_VALUE. If offset is 1 from last, it's equivalent to LAST_VALUE.

LAG and LEAD get the value within a distance respect to the current row and the offset (which defaults to 1) passed. In the case the offset points to outside of the partition, the default parameter (which defaults to NULL) is returned. LAG looks for a preceding row, and LEAD for a following row.

5. Frames (FB 4.0)

It's possible to specify the frame that some window functions work. The frame is divided in three piecies: unit, start bound and end bound.

The unit RANGE or ROWS defines how the bounds <expr> PRECEDING, <expr> FOLLOWING and CURRENT ROW works.

With RANGE, the ORDER BY should specify only one expression, and that expression should be of a numeric, date, time or timestamp type. For <expr> PRECEDING and <expr> FOLLOWING bounds, <expr> is respectively subtracted or added to the order expression, and for CURRENT ROW only the order expression is used. Then, all rows (inside the partition) between the bounds are considered part of the resulting window frame.

With ROWS, order expressions is not limited by number or types. In this case, <expr> PRECEDING, <expr> FOLLOWING and CURRENT ROW relates to the row position under the partition, and not to the order keys values.

UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING work identically with RANGE and ROWS. UNBOUNDED PRECEDING looks for the first row and UNBOUNDED FOLLOWING the last one, always inside the partition.

The frame syntax with <window frame start> specifies the start frame, with the end frame being CURRENT ROW.

When ORDER BY window clause is used but frame clause is omitted, it defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This fact makes the query below to produce "weird" behaviour for the "sum_salary" column. It sums from the partition start to the current key, instead of sum the whole partition.

select
    id,
    salary,
    sum(salary) over (order by salary) sum_salary
  from employee
  order by salary;

id	salary	sum_salary
3	8.00	8.00
4	9.00	17.00
1	10.00	37.00
5	10.00	37.00
2	12.00	49.00

You can explicitely set a frame to sum the whole partition as following query:

select
    id,
    salary,
    sum(salary) over (
      order by salary
      rows between unbounded preceding and unbounded following
    ) sum_salary
  from employee
  order by salary;

id	salary	sum_salary
3	8.00	49.00
4	9.00	49.00
1	10.00	49.00
5	10.00	49.00
2	12.00	49.00

While this query "fixes" the weird nature of the default frame clause, it produces a result similar to a simple OVER () clause without ORDER BY.

We can use a range frame to compute the count of employees with salaries between his salary - 1 and his salary + 1 with this query:

select
    id,
    salary,
    count(*) over (
      order by salary
      range between 1 preceding and 1 following
    ) range_count
  from employee
  order by salary;

id	salary	range_count
3	8.00	2
4	9.00	4
1	10.00	3
5	10.00	3
2	12.00	1

Some window functions discard frames. ROW_NUMBER, LAG and LEAD always work as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. And DENSE_RANK, RANK, PERCENT_RANK and CUME_DIST always work as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

FIRST_VALUE, LAST_VALUE and NTH_VALUE respect frames, but the RANGE unit works identically as ROWS.

6. Named windows (FB 4.0)

To avoid write repetitive or confusing expressions, windows can be named in a query with the WINDOW clause. A named window can be used in OVER to reference a window definition and can also be used as a base window of another named or inline (OVER) window. A window with frame (ROWS or RANGE clauses) can't be used as base window (but can be used with OVER <window name>). And a window with a base window can't have PARTITION BY nor can override ORDER BY of a base window.

In a query with multiple SELECT and WINDOW clauses (for example, with subqueries), the window name scope is bound only to its query context, that is, a window name from an inner or outer context could not be used in another context. As such, the same window name definition could be used at different contexts.

Example query with named windows:

select
    id,
    department,
    salary,
    count(*) over w1,
    first_value(salary) over w2,
    last_value(salary) over w2
  from employee
  window w1 as (partition by department),
         w2 as (w1 order by salary)
  order by department, salary;

Author: Adriano dos Santos Fernandes