2

I was trying to make a Regex to extract some information with the SUBSTRING function, but the REGEX that I've tried on https://regex101.com/r/cFy11t/1 return an error on postgresql : "quantifier operand invalid"

The goal of this substring is to extract the last number in the string, and include complement as "A, B... G, or BIS, TER" if the number is like "12B of street X". If several numbers are in the string, it should pick the last one, except if the last one is at the end of the string.

For example, in the string "123 47F ABC 33 BIS", the result should be "47F"

I've already changed a lot the code since I wasn't getting the expected output, but I only get an error since I've forced the greedy quantifier "++" and "?+" to get the letter if possible

So here is the last version of my code where postgres raise an error: (but you can click on the link up to see it with color it might be more clear)

SELECT SUBSTRING(Adresse, '(\d++((?:\s)?([A-G]|BIS|TER|QUARTER)?+\s)(?!$))(?!(.*\d+(\s)?[A-G]?+\D))')  Numero,

...

Thank you for your time !

2
  • 1
    I think that is due to the possessive quantifiers. Try '^(?:.*\D)?(\d+(?!\s?(?:BIS|TER|QUARTER|[A-G])$)\s?(?:BIS|TER|QUARTER|[A-G])?\y)' in PostgreSQL (it won't work at regex101). Or, '^.*\y(\d+(?!\s?(?:BIS|TER|QUARTER|[A-G])$)\s?(?:BIS|TER|QUARTER|[A-G])?\y)' might do, too. Commented Jan 3, 2019 at 14:07
  • @WiktorStribiżew Thank you very much, I've tried the first one and from what I've seen it seems to get exactly what I need ! But I do not understand some of the code can you explain it to me ? For example, how does the (?:BIS|TER|QUARTER|[A-G])$) manage to appear in the result, as it shoult be non capturing, what does the \y stand for ? Finally, I'm curious to know if there is so place on the internet to know the differences between sur regex used on regex101 and one postgresql ? Thank you ! Commented Jan 3, 2019 at 14:29

2 Answers 2

1

You may use

'^.*\y(\d+(?!\s?(?:BIS|TER|QUARTER|[A-G])$)\s?(?:BIS|TER|QUARTER|[A-G])?\y)'

See the online demo

The main point is to get rid of possessive quantifiers and make sure you apply the end of string check taking into account that the trailing patterns are optional.

Details

  • ^ - start of string
  • .*\y - any 0+ chars, as many as possible, up to the last word boundary (followed with...)
  • (\d+(?!\s?(?:BIS|TER|QUARTER|[A-G])$)\s?(?:BIS|TER|QUARTER|[A-G])?\y) - Group 1 (this will be returned by the SUBSTRING):
    • \d+ - 1+ digits
    • (?!\s?(?:BIS|TER|QUARTER|[A-G])$) - not followed with an optional whitespace and then BIS, or TER, or QUARTER or a letter from A through G at the end of the string
    • \s? - an optional whitespace
    • (?:BIS|TER|QUARTER|[A-G])? - an optional match for BIS, or TER, or QUARTER or a letter from A through G
    • \y - a word boundary.
Sign up to request clarification or add additional context in comments.

Comments

0

I couldn't understand your question clearly but still, I gave it a try. If my answer is wrong, I would appreciate a few examples with the outcome you want.

Regex: '(\d+[A-G]+)'

Link to example: https://regex101.com/r/cFy11t/4

Update:

New Regex: '(?:\d+\w(?=\s\d+[A-z]{2,}))|(\d+[A-z])(?=\s\d+[A-z])'

Link to example: https://regex101.com/r/cFy11t/11

I hope this will fulfil your requirement. If not let me know.

1 Comment

Hello, thanks for your answer but it was not what I was searching for. I was looking for the last number before some other text, but some character were supposed to be considered as part of the text, sorry if my explanation isn't clear ! For exemple, '11 12F 13F' should return '12F', and '11 12FF 13F' should return '11'

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.