1

I am writing a custom search engine for my website and am trying to use MySQL's REGEXP feature. I would like to match a whole word delimited by spaces, so that I do not also match the word with a prefix or suffix attached. For example, when I search for "appreciate" I want appreciate, not appreciated, unappreciate, or unappreciated. Any ideas on how I could do this with MySQL's REGEXP? My idea was to look for spaces, like so:

^appreciate$|^appreciate[:space:]|[:space:]appreciate$|[:space:]appreciate[:space:]

I am sure there is a better way of doing it, and I have no idea if that even works.

2
  • What if I wanted to match multiple words and allow padding in case something appears between them? For example: I want something like "Fish Stick" but want to allow for something such as "I love Fish and a Stick". My idea was: [[:<:]]Fish[[:>:]]?.{1,}[[:<:]]Stick[[:>:]] Commented Dec 20, 2011 at 15:50
  • This seems to have worked for me. Well it appears like the more I play with RegEx the easier it becomes to understand and write it. I honestly felt like it was impossible for me to learn it! Commented Dec 20, 2011 at 15:56

3 Answers

2

I think what you want is something like this:

SELECT 'I appreciate you' REGEXP '[[:<:]]appreciate[[:>:]]'; /* matches */

[[:<:]] and [[:>:]] are word boundaries. From the manual:

These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

Edit: just to clarify, this also handles cases where the word is followed by a newline character, a comma, etc.
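To make the word-boundary behavior concrete, here is a minimal sketch in Python rather than MySQL, so it is only an analog: Python's `\b` stands in for `[[:<:]]` and `[[:>:]]`, and the sample strings are made up for illustration. As a side note, MySQL 8.0.4 and later use the ICU regex library, which does not accept `[[:<:]]` and `[[:>:]]`; there, `\b` (written `\\b` inside a string literal) is used instead.

```python
import re

# Assumption: Python's \b is used here as a stand-in for MySQL's
# [[:<:]] and [[:>:]] word-boundary markers. The engines differ in detail.
WORD = re.compile(r"\bappreciate\b")

# Hypothetical sample strings, not taken from any real data set.
samples = [
    "I appreciate you",
    "I appreciate, truly",
    "appreciate\nthe help",
    "much appreciated",
    "an unappreciated effort",
    "they unappreciate it",
]

# Map each sample to whether the whole word was found in it.
results = {text: bool(WORD.search(text)) for text in samples}

for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

The first three samples match because a space, a comma, or a newline all count as non-word characters; the last three do not, because the word is glued to a prefix or suffix.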


1 Comment

Works perfectly, thank you! I didn't read up on them, but I saw those and couldn't understand at the time what 'word boundaries' meant. Thanks!
0

What about:

^\s*appreciate(\s+.*)*$
  • Between the start and the word there may be 0+ whitespace parts
  • then comes the word
  • then if something comes after that, it has to start with whitespace
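A quick way to see what this pattern accepts is to run it in an engine that understands `\s`. The sketch below uses Python's `re` module purely as an illustration (the sample strings are invented); it is not a statement about how MySQL evaluates the pattern.

```python
import re

# The pattern from this answer, unchanged. Python supports \s,
# so it can be tried here even where MySQL's REGEXP rejects it.
PATTERN = re.compile(r"^\s*appreciate(\s+.*)*$")

# Hypothetical inputs chosen to probe each rule in the list above.
samples = [
    "appreciate",
    "  appreciate you",
    "appreciated",
    "appreciate, thanks",
    "I appreciate you",
]

# True when the whole string fits the pattern, False otherwise.
matches = {text: PATTERN.search(text) is not None for text in samples}

for text, ok in matches.items():
    print(f"{text!r}: {ok}")
```

Because the pattern is anchored with `^` and only allows whitespace before the word, it matches only when the word comes first in the string, and it rejects a word followed directly by punctuation. A search over arbitrary text would likely need something looser.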

1 Comment

MySQL's REGEXP function doesn't support \s.
0

You can look for non-alphabetic characters:

[^[:alpha:]]+

... or just word boundaries:

[[:<:]]foo[[:>:]]

Before making a choice, don't forget to run some tests with commas, periods and non-English characters. Also, take into account that MySQL does not fully support regular expressions in multi-byte strings (such as UTF-8).
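As a rough illustration of why such tests matter, the sketch below compares the two approaches in Python. It is an analog only: the ASCII class `[^A-Za-z]` stands in for `[^[:alpha:]]`, `\b` stands in for `[[:<:]]` and `[[:>:]]`, and the sample strings are hypothetical. MySQL's own results depend on its version and character set.

```python
import re

# Approach 1: require a non-alphabetic character (or the string edge)
# on both sides. The ASCII-only class is an assumption of this sketch.
DELIMITED = re.compile(r"(?:^|[^A-Za-z])foo(?:[^A-Za-z]|$)")

# Approach 2: word boundaries. In Python 3, \b is Unicode-aware for str.
BOUNDED = re.compile(r"\bfoo\b")

# Hypothetical inputs: a comma, a period, a suffix, and an accented prefix.
samples = ["foo, bar", "a foo.", "food", "\u00e9foo"]

# For each sample, record (delimiter approach, boundary approach).
comparison = {
    s: (bool(DELIMITED.search(s)), bool(BOUNDED.search(s))) for s in samples
}

for text, (delimited, bounded) in comparison.items():
    print(f"{text!r}: delimited={delimited} bounded={bounded}")
```

The two approaches agree on the punctuation cases but disagree on the accented one: the ASCII-only class treats the accented letter as a delimiter, while the Unicode-aware boundary treats it as part of the word. That kind of divergence is what the tests should try to surface.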

1 Comment

Works perfectly, thank you! I didn't read up on them, but I saw those and couldn't understand at the time what 'word boundaries' meant. Thanks!
