8
0
mirror of https://github.com/FirebirdSQL/firebird.git synced 2025-01-23 00:03:02 +01:00
firebird-mirror/doc/sql.extensions/README.similar_to.txt
asfernandes 17c20e8392 1) Fixed CORE-1935 – SIMILAR TO character classes are incorrectly recognized.
2) Improve the documentation after some questions from Claudio.
2008-06-14 02:17:20 +00:00

223 lines
7.9 KiB
Plaintext

====================
SIMILAR TO predicate
====================
Function:
SIMILAR TO predicate verifies if a given regular expression (per the SQL standard) matches a
string. It may be used in all the places that accept boolean expressions, like in WHERE and check
constraints.
Author:
Adriano dos Santos Fernandes <adrianosf@uol.com.br>
Syntax:
<similar predicate> ::=
<value> [ NOT ] SIMILAR TO <similar pattern> [ ESCAPE <escape character> ]
<similar pattern> ::= <character value expression: regular expression>
<regular expression> ::=
<regular term>
| <regular expression> <vertical bar> <regular term>
<regular term> ::=
<regular factor>
| <regular term> <regular factor>
<regular factor> ::=
<regular primary>
| <regular primary> <asterisk>
| <regular primary> <plus sign>
| <regular primary> <question mark>
| <regular primary> <repeat factor>
<repeat factor> ::=
<left brace> <low value> [ <upper limit> ] <right brace>
<upper limit> ::= <comma> [ <high value> ]
<low value> ::= <unsigned integer>
<high value> ::= <unsigned integer>
<regular primary> ::=
<character specifier>
| <percent>
| <regular character set>
| <left paren> <regular expression> <right paren>
<character specifier> ::=
<non-escaped character>
| <escaped character>
<regular character set> ::=
<underscore>
| <left bracket> <character enumeration>... <right bracket>
| <left bracket> <circumflex> <character enumeration>... <right bracket>
| <left bracket> <character enumeration include>... <circumflex> <character enumeration exclude>... <right bracket>
<character enumeration include> ::= <character enumeration>
<character enumeration exclude> ::= <character enumeration>
<character enumeration> ::=
<character specifier>
| <character specifier> <minus sign> <character specifier>
| <left bracket> <colon> <character class identifier> <colon> <right bracket>
<character specifier> ::=
<non-escaped character>
| <escaped character>
<character class identifier> ::=
ALPHA
| UPPER
| LOWER
| DIGIT
| SPACE
| WHITESPACE
| ALNUM
Note:
1) <non-escaped character> is any character except <left bracket>, <right bracket>, <left paren>,
<right paren>, <vertical bar>, <circumflex>, <minus sign>, <plus sign>, <asterisk>, <underscore>,
<percent>, <question mark>, <left brace> and <escape character>.
2) <escaped character> is the <escape character> succeeded by one of <left bracket>, <right bracket>,
<left paren>, <right paren>, <vertical bar>, <circumflex>, <minus sign>, <plus sign>, <asterisk>,
<underscore>, <percent>, <question mark>, <left brace> or <escape character>.
Syntax description and examples:
Returns true for strings that matches <regular expression> or <regular term>:
<regular expression> <vertical bar> <regular term>
'ab' SIMILAR TO 'ab|cd|efg' -- true
'efg' SIMILAR TO 'ab|cd|efg' -- true
'a' SIMILAR TO 'ab|cd|efg' -- false
Matches zero or more occurrences of <regular primary>:
<regular primary> <asterisk>
'' SIMILAR TO 'a*' -- true
'a' SIMILAR TO 'a*' -- true
'aaa' SIMILAR TO 'a*' -- true
Matches one or more occurrences of <regular primary>:
<regular primary> <plus sign>
'' SIMILAR TO 'a+' -- false
'a' SIMILAR TO 'a+' -- true
'aaa' SIMILAR TO 'a+' -- true
Matches zero or one occurrence of <regular primary>:
<regular primary> <question mark>
'' SIMILAR TO 'a?' -- true
'a' SIMILAR TO 'a?' -- true
'aaa' SIMILAR TO 'a?' -- false
Matches exact <low value> occurrences of <regular primary>:
<regular primary> <left brace> <low value> <right brace>
'' SIMILAR TO 'a{2}' -- false
'a' SIMILAR TO 'a{2}' -- false
'aa' SIMILAR TO 'a{2}' -- true
'aaa' SIMILAR TO 'a{2}' -- false
Matches <low value> or more occurrences of <regular primary>:
<regular primary> <left brace> <low value> <comma> <right brace>
'' SIMILAR TO 'a{2,}' -- false
'a' SIMILAR TO 'a{2,}' -- false
'aa' SIMILAR TO 'a{2,}' -- true
'aaa' SIMILAR TO 'a{2,}' -- true
Matches <low value> to <high value> occurrences of <regular primary>:
<regular primary> <left brace> <low value> <comma> <high value> <right brace>
'' SIMILAR TO 'a{2,4}' -- false
'a' SIMILAR TO 'a{2,4}' -- false
'aa' SIMILAR TO 'a{2,4}' -- true
'aaa' SIMILAR TO 'a{2,4}' -- true
'aaaa' SIMILAR TO 'a{2,4}' -- true
'aaaaa' SIMILAR TO 'a{2,4}' -- false
Matches any (non-empty) character:
<underscore>
'' SIMILAR TO '_' -- false
'a' SIMILAR TO '_' -- true
'1' SIMILAR TO '_' -- true
'a1' SIMILAR TO '_' -- false
Matches a string of any length (including empty strings):
<percent>
'' SIMILAR TO '%' -- true
'az' SIMILAR TO 'a%z' -- true
'a123z' SIMILAR TO 'a%z' -- true
'azx' SIMILAR TO 'a%z' -- false
Groups a complete <regular expression> to use as one single <regular primary> as a sub-expression:
<left paren> <regular expression> <right paren>
'ab' SIMILAR TO '(ab){2}' -- false
'aabb' SIMILAR TO '(ab){2}' -- false
'abab' SIMILAR TO '(ab){2}' -- true
Matches a character identical to one of <character enumeration>:
<left bracket> <character enumeration>... <right bracket>
'b' SIMILAR TO '[abc]' -- true
'd' SIMILAR TO '[abc]' -- false
'9' SIMILAR TO '[0-9]' -- true
'9' SIMILAR TO '[0-8]' -- false
Matches a character not identical to one of <character enumeration>:
<left bracket> <circumflex> <character enumeration>... <right bracket>
'b' SIMILAR TO '[^abc]' -- false
'd' SIMILAR TO '[^abc]' -- true
Matches a character identical to one of <character enumeration include> but not identical to one
of <character enumeration exclude>:
<left bracket> <character enumeration include>... <circumflex> <character enumeration exclude>...
'3' SIMILAR TO '[[:DIGIT:]^3]' -- false
'4' SIMILAR TO '[[:DIGIT:]^3]' -- true
Matches a character identical to one character included in <character class identifier>.
See the table below. May be used with <circumflex> to invert the logic as above:
<left bracket> <colon> <character class identifier> <colon> <right bracket>
'4' SIMILAR TO '[[:DIGIT:]]' -- true
'a' SIMILAR TO '[[:DIGIT:]]' -- false
'4' SIMILAR TO '[^[:DIGIT:]]' -- false
'a' SIMILAR TO '[^[:DIGIT:]]' -- true
Character class identifiers:
Identifier Description
ALPHA All characters that are simple latin letters (a-z, A-Z).
Note: includes latin letters with accents when using accent-insensitive collation.
UPPER All characters that are simple latin uppercase letters (A-Z).
Important: Includes lowercase latters when using case-insensitive collation.
LOWER All characters that are simple latin lowercase letters (a-z).
Important: Includes uppercase latters when using case-insensitive collation.
DIGIT All characters that are numeric digits (0-9).
SPACE All characters that are the space character (ASCII 32).
WHITESPACE All characters that are whitespaces (vertical tab (9), newline (10),
horizontal tab (11), carriage return (13), formfeed (12), space (32)).
ALNUM All characters that are simple latin letters (ALPHA) or numeric digits (DIGIT).
Functional example:
create table department (
number numeric(3) not null,
name varchar(25) not null,
phone varchar(14) check (phone similar to '\([0-9]{3}\) [0-9]{3}\-[0-9]{4}' escape '\')
);
insert into department values ('000', 'Corporate Headquarters', '(408) 555-1234');
insert into department values ('100', 'Sales and Marketing', '(415) 555-1234');
insert into department values ('140', 'Field Office: Canada', '(416) 677-1000');
insert into department values ('600', 'Engineering', '(408) 555-123'); -- check constraint violation
select * from department
where phone not similar to '\([0-9]{3}\) 555\-%' escape '\';