What are different Types of Regular Expressions in
ORACLE. Explain with Examples?
REGEXP_SUBSTR - Returns the string
matching the regular expression. Not really similar to SUBSTR.
REGEXP_COUNT - Returns the number
of occurrences of the regular expression in the string.
REGEXP_INSTR - Similar to INSTR
except it uses a regular expression rather than a literal as the search string.
REGEXP_LIKE - Similar to LIKE
except it uses a regular expression as the search string. REGEXP_LIKE is really
an operator, not a function.
REGEXP_REPLACE - Similar to REPLACE
except it uses a regular expression as the search string.
SELECT
ENAME,
REGEXP_SUBSTR(ENAME,'RA') SUBSTR,
REGEXP_INSTR(ENAME, 'A') INSTR,
REGEXP_REPLACE(ENAME,'RA','@') REPLACE,
REGEXP_COUNT(ENAME, 'H') COUNT
FROM
EMP_TEST WHERE REGEXP_LIKE(ENAME,'R');
REGEXP_SUBSTR returns the substring
RA if found, REGEXP_INSTR returns the position of the first 'A', REGEXP_REPLACE
replaces the strings 'RA' with ‘@’ and REGEXP_COUNT counts the occurrences of
'H'. REGEXP_LIKE returns the strings that contain the pattern 'R'.
Let’s understand them in detail:
REGEXP_SUBSTR
EXAMPLE 1: We will create a Table and will
populate it with some data containing Years. Then we need to identify text with
Satisfying condition for the years. Let’s see how it works:
SQL> CREATE TABLE Tab1(DATA VARCHAR2(50));
Table created.
SQL> INSERT INTO Tab1 VALUES ('FALL 2014');
1 row created.
SQL> INSERT INTO Tab1 VALUES ('2014 CODE-B');
1 row created.
SQL> INSERT INTO Tab1 VALUES ('CODE-A 2014 CODE-D');
1 row created.
SQL> INSERT INTO Tab1 VALUES ('ADSHLHSALK');
1 row created.
SQL> INSERT INTO Tab1 VALUES ('FALL 2004');
1 row created.
SQL> COMMIT;
Commit complete.
SQL> SELECT * From Tab1;
DATA
------------------------------
FALL 2014
2014 CODE-B
CODE-A 2014 CODE-D
ADSHLHSALK
FALL 2004
If we needed to return rows
containing a specific year we could use the
LIKE
operator
(
WHERE data
LIKE '%2014%'), but how do we return rows using a comparison (<, <=,
>, >=, <>)?
One way to approach this is to pull out the 4-figure year and
convert it to a number, so we don't accidentally do an ASCII comparison. That's
pretty easy using regular expressions.
We can identify digits using the "\d" or
"[0-9]" operators. We want a group of four of them, which is
represented by the "{4}" operator. So our regular expression will be
"\d{4}" or "[0-9]{4}". The REGEXP_SUBSTR function
returns the string matching the regular expression, so that can be used to
extract the text of interest. We then just need to convert it to a number and
perform our comparison.
SELECT * From Tab1
WHERE TO_NUMBER (REGEXP_SUBSTR(data, '\d{4}')) >= 2004;
SQL> SELECT * From Tab1 WHERE TO_NUMBER (REGEXP_SUBSTR(data, '\d{4}')) >= 2004;
DATA
------------------------------
FALL 2014
2014 CODE-B
CODE-A 2014 CODE-D
FALL 2004
EXAMPLE 2: Given a source string, how do we
split it up into separate columns, based on changes of case and
alpha-to-numeric
ArtADB1234567e9876540
Becomes this.
Art
ADB 1234567 e 9876540
SQL> CREATE
TABLE Tab1(DATA VARCHAR2(50));
Table created.
SQL> INSERT
INTO Tab1 Values('ArtADB1234567e9876540');
1 row created.
The first part of the string
is an initcap word, so it starts with a capital letter between "A"
and "Z". We identify a single character using the "[]"
operator, and ranges are represented using "-", like "A-Z",
"a-z" or "0-9". So if we are looking for a single character
that is a capital letter, we need to look for "[A-Z]". That needs to
be followed by lower case letters, which we now know is "[a-z]", but
we need 1 or more of them, which is signified by the "+" operator. So
to find an initcap word, we need to search for "[A-Z][a-z]+". Since
we want the first occurrence of this, we can use the following.
REGEXP_SUBSTR(Data, '[A-Z][a-z]+', 1, 1)
The second part of the string
is a group of 1 or more uppercase letters. We know we need to use the
"[A-Z]+" pattern, but we need to make sure we don't get the first
capital letter, so we look for the second occurrence.
REGEXP_SUBSTR(data, '[A-Z]+', 1, 2)
The next part is the first
occurrence of a group of numbers.
REGEXP_SUBSTR(data, '[0-9]+', 1, 1)
The next part is a group of
lower case letters. We don't to pick up those from the initcap word, so we must
look for the second occurrence of lower case letters.
REGEXP_SUBSTR(data, '[a-z]+', 1, 2)
Finally, we have a group of
numbers, which is the second occurrence of this pattern.
REGEXP_SUBSTR(data, '[0-9]+', 1, 2)
Putting that all together, we have
the following query, which splits the data into separate columns.
COLUMN col1 FORMAT A15
COLUMN col2 FORMAT A15
COLUMN col3 FORMAT A15
COLUMN col4 FORMAT A15
COLUMN col5 FORMAT A15
SELECT REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1) col1,
REGEXP_SUBSTR(data, '[A-Z]+', 1, 2) col2,
REGEXP_SUBSTR(data, '[0-9]+', 1, 1) col3,
REGEXP_SUBSTR(data, '[a-z]+', 1, 2) col4,
REGEXP_SUBSTR(data, '[0-9]+', 1, 2) col5
FROM TAB1;
COL1 COL2 COL3 COL4 COL5
--------------- --------------- --------------- --------------- -------
Art ADB 1234567 e 9876540
1 row selected.
EXAMPLE 3: We need to pull out a group of characters
from a "/" delimited string, optionally enclosed by double quotes.
The data looks like this.
SQL> TRUNCATE TABLE TAB1;
Table truncated.
SQL> INSERT INTO TAB1 VALUES ('978/955086/GZ120804/10-FEB-12');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('97/95508/BANANA/10-FEB-12');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('97/95508/"APPLE"/10-FEB-12');
1 row created.
SQL> COMMIT;
Commit complete.
SQL> SELECT * From TAB1;
DATA
------------------------------
978/955086/GZ120804/10-FEB-12
97/95508/BANANA/10-FEB-12
97/95508/"APPLE"/10-FEB-12
We are looking for 1 or more
characters that are not "/", which we do using "[^/]+". The
"^" in the brackets represents NOT and "+" means 1 or more.
We also want to remove optional double quotes, so we add that as a character we
don't want, giving us "[^/"]+". So if we want the data from the
third column, we need the third occurrence of this pattern.
SELECT REGEXP_SUBSTR(Data, '[^/"]+', 1, 3) AS Element3
FROM TAB1;
ELEMENT3
-----------------------------------------------------------------------
GZ120804
BANANA
APPLE
3 rows selected.
REGEXP_REPLACE
EXAMPLE: We need to take an initcap string and
separate the words. The data looks like this.
We need to find each uppercase
character "[A-Z]". We want to keep that character we find, so we will
make that pattern a sub-expression "([A-Z])", allowing us to refer to
it later. For each match, we want to replace it with a space, plus the matching
character. The space is pretty obvious, but we need to use "\1" to
signify the text matching the first sub expression. So, we will replace the
matching pattern with a space and itself, " \1". We don't want to
replace the first letter of the string, so we will start at the second
occurrence.
SELECT REGEXP_REPLACE(Data, '([A-Z])', ' \1', 2) AS Hyphen_Text
FROM TAB1;
HYPHEN_TEXT
-----------------------------------------------------------------------
Social Security Number
House Number
2 rows selected.
REGEXP_INSTR
EXAMPLE: We have a specific pattern of digits (9
99:99:99) and we want to know the location of the pattern in our data.
SQL> TRUNCATE TABLE TAB1;
Table truncated.
SQL> INSERT INTO TAB1 VALUES ('1 01:01:01');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('.2 02:02:02');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('..3 03:03:03');
1 row created.
SQL> COMMIT;
Commit complete.
SELECT
REGEXP_INSTR(data, '[0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}') AS Loc_1,
REGEXP_INSTR(data, '\d \d{2}:\d{2}:\d{2}') AS Loc_2 FROM TAB1;
LOC_1 LOC_2
------------ ------------
1 1
2 2
3 3
3 rows selected.
REGEXP_LIKE
and REGEXP_SUBSTR
EXAMPLE: We have strings containing parentheses. We
want to return the text within the parentheses for those rows that contain
parentheses.
SQL> INSERT INTO TAB1 VALUES ('This is some text (with parentheses) in it.');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('This text has no parentheses.');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('This text has (parentheses too).');
1 row created.
SQL> COMMIT;
Commit complete.
SQL> SELECT * From TAB1;
DATA
--------------------------------------------------
This is some text (with parentheses) in it.
This text has no parentheses.
This text has (parentheses too).
The basic pattern for text between parentheses is "\(.*\)". The "\" characters are escapes for the parentheses, making them literals. Without the escapes they would be assumed to define a sub-expression. That pattern alone is fine to identify the rows of interest using a REGEXP_LIKE operator, but it is not appropriate in a REGEXP_SUBSTR, as it would return the parentheses also. To omit the parentheses we need to include a sub-expression inside the literal parentheses "\((.*)\)". We can then REGEXP_SUBSTR using the first sub expression.
COLUMN With_Parentheses FORMAT A20
COLUMN Without_Parentheses FORMAT A20
SELECT Data,
REGEXP_SUBSTR(Data, '\(.*\)') AS With_Parentheses,
REGEXP_SUBSTR(Data, '\((.*)\)', 1, 1, 'i', 1) AS Without_Parentheses
FROM TAB1
WHERE REGEXP_LIKE(Data, '\(.*\)');
DATA WITH_PARENTHESES WITHOUT_PARENTHESES
-------------------------------------------------- -------------------------------------------------------------------------------------------------------
This is some text (with parentheses) in it. (with parentheses) with parentheses
This text has (parentheses too). (parentheses too) parentheses too
2 rows selected.
REGEXP_COUNT
EXAMPLE: We need to know how many times a block of
4 digits appears in text. The data looks like this.
SQL> TRUNCATE TABLE TAB1;
Table truncated.
SQL> INSERT INTO TAB1 VALUES ('1234 1234');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('1234');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('1234 1234 1234 1234 1234');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('1234 1234 1234');
1 row created.
SQL> COMMIT;
Commit complete.
We can identify digits using
"\d" or "[0-9]" and the "{4}" operator signifies
4 of them, so using "\d{4}" or "[0-9]{4}" with the REGEXP_COUNT function seems
to be a valid option.
SQL> SELECT * From TAB1;
DATA
--------------------------------------------------
1234 1234
1234
1234 1234 1234 1234 1234
1234 1234 1234
SQL> SELECT REGEXP_COUNT(Data, '[0-9]{4}') AS Pattern_Count_1,
2 REGEXP_COUNT(Data, '\d{4}') AS Pattern_Count_2 FROM TAB1;
PATTERN_COUNT_1 PATTERN_COUNT_2
--------------- ---------------
2 2
1 1
5 5
3 3
4 rows selected.
REGEXP_LIKE
EXAMPLE: We need to identify invalid email
addresses. The data looks like this.
SQL> TRUNCATE TABLE TAB1;
Table truncated.
SQL> INSERT INTO TAB1 VALUES ('me@example.com');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('me@example');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('@example.com');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('me.me@example.com');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('me.me@ example.com');
1 row created.
SQL> INSERT INTO TAB1 VALUES ('me.me@example-example.com');
1 row created.
SQL> COMMIT;
Commit complete.
SELECT Data FROM TAB1
WHERE NOT REGEXP_LIKE(data, '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', 'i');
DATA
--------------------------------------------------
me@example
@example.com
me.me@ example.com
3 rows selected.
Source: https://oracle-base.com
Nice blog, Thanks For Sharing this informative Article.
ReplyDeleteOracle Fusion SCM Online Training
Oracle Fusion Financials Online Training