I am having difficulty matching two text strings. One of them appears to contain hidden characters.

I have a text string, "PR & Communications", stored in an SQL database. When it is pulled from there into $database_version, var_dump($database_version) reveals the string to have 19 bytes.

I have scraped (with permission) some text from a website into a variable, $web_version. Ostensibly the string is "PR & Communications", but it does not match the database version, i.e. if($database_version == $web_version) is NOT true.

var_dump() reveals $web_version to have 23 bytes. trim() has no effect, nor does strip_tags(), but preg_replace('/[^\PC\s]/u', '', $web_version) removes something, because afterwards var_dump() on the result reports only 14 bytes. It has clearly removed something, possibly too much, as the string still does not match $database_version.

Any ideas how to:

  1. find out what has been removed
  2. strip out just enough to match $database_version?

PS: I don't know how to view the variable's contents in hexadecimal.

  • When you compare with if( $database_version == $web_version ), are both variables strings? Try typecasting them and applying trim(). Commented Jan 27, 2016 at 16:09
  • You can try using utf8_decode($web_version) - php.net/manual/en/function.utf8-decode.php. Commented Jan 27, 2016 at 16:27
  • Debugging: to see the string as hex bytes, use var_dump($web_version, bin2hex($web_version), __FILE__.__LINE__);. To see what each byte represents, look it up in an ASCII table or in a complete character list for UTF-8. Commented Jan 27, 2016 at 17:01
  • Thank you Ryan, your var_dump formula revealed that one value had the '&' as a plain ampersand and the other as the HTML entity &amp;, hence the two values did not match. This helped me solve the problem. Commented Feb 3, 2016 at 13:17
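
The debugging steps from the comments can be sketched as follows. This is a minimal illustration, not the asker's actual code: the two literal values are hypothetical stand-ins, assuming the scraped string carries the ampersand as the HTML entity &amp;, which fits the 4-byte difference between 19 and 23. Using html_entity_decode() to normalise the scraped value is one possible fix, not something the thread prescribes.

```php
<?php
// Hypothetical stand-ins for the two values described in the question.
$database_version = 'PR & Communications';
$web_version      = 'PR &amp; Communications';

// strlen() counts bytes, the same figure var_dump() reports for a string.
var_dump(strlen($database_version), strlen($web_version));

// bin2hex() shows every byte, so hidden or unexpected characters become visible.
echo bin2hex($database_version), PHP_EOL;
echo bin2hex($web_version), PHP_EOL;

// Assumption: the difference is an HTML entity, so decode it before comparing.
$decoded = html_entity_decode($web_version, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Strict comparison avoids any type juggling that == might perform.
var_dump($decoded === $database_version);
```

Comparing the two hex dumps side by side shows where the strings diverge; in this sketch the extra bytes spell out "amp;" right after the ampersand.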

1 Answer

$v = preg_replace('/\s+|[[:^print:]]/', '', $string);

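trim() removes only " \t\n\r\0\x0B" from the ends of a string (see the docs), so use the snippet above to remove non-printable characters. Note that it removes all whitespace as well, including the spaces between words.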

3 Comments

This helped me resolve a slightly different issue. Perhaps you could explain what non-printable characters are and what this regex actually does?
[[:print:]] is the PCRE character class for printable characters (an alias for a more complex expression; there are more of them at php.net/manual/en/regexp.reference.character-classes.php). A printable character is one that is visible when the page is rendered. A ^ inside a character class means negation, so [[:^print:]] matches non-printable characters, the ones that are not visible after the page renders (a BOM, for example). The rest of the regex is simple: \s stands for any whitespace character (space, tab, newline, etc.), + means "repeat one or more times", and the pipe (|) means "or".
So, as a whole, it can be read as "find any run of whitespace or any non-printable character".
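
To make the effect of the pattern concrete, here is a small sketch. The input string is hypothetical (a UTF-8 byte order mark, the text from the question, and a trailing newline), and the second pattern, which drops the \s+ branch, is a variation added for comparison rather than part of the answer. It assumes PHP's default C locale, where bytes above 0x7F do not count as printable.

```php
<?php
// Hypothetical input: UTF-8 BOM bytes, visible text, and a trailing newline.
$string = "\xEF\xBB\xBFPR & Communications\n";

// The pattern from the answer: any run of whitespace, or any non-printable byte.
$v = preg_replace('/\s+|[[:^print:]]/', '', $string);

// Variation (an assumption, not from the answer): strip only non-printable
// bytes, which leaves ordinary spaces between words in place.
$kept = preg_replace('/[[:^print:]]/', '', $string);

var_dump($v, $kept);
```

Under those assumptions the first call also deletes the spaces around the ampersand, so its result would not equal "PR & Communications", whereas the variation keeps them.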
