How to remove hidden characters from text string in PHP?

Question

I am having difficulty matching two text strings. One of them contains some hidden characters.

I have a text string, "PR & Communications", stored in an SQL database. When I pull it from there into $database_version, var_dump($database_version) shows that the string has 19 bytes.

I have scraped (with permission) some text from a website into a variable, $web_version. Ostensibly the string is "PR & Communications", but it does not match the database version, i.e. if ($database_version == $web_version) is NOT true.

var_dump() shows that $web_version has 23 bytes. trim() has no effect, nor does strip_tags(), but preg_replace('/[^\PC\s]/u', '', $web_version) removes something, because afterwards var_dump($web_version) shows the string has only 14 bytes. It has clearly removed something, possibly too much, as the string still does not match $database_version.

Any ideas how to:

find out what has been removed
strip out just enough to match $database_version?

P.S. I don't know how to view the variable as hexadecimal.
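To answer "find out what has been removed", one option is to run the same pattern through preg_match_all() instead of preg_replace() and print each match with bin2hex(). This is a minimal sketch: the helper name removedChars() is hypothetical, and the sample input (a zero-width space, U+200B, after "PR") is an assumption, not the asker's actual data.

```php
<?php
// Hypothetical helper: list the characters a pattern would remove by
// matching instead of replacing, and return each match as a hex string.
function removedChars(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $matches);
    return array_map('bin2hex', $matches[0]);
}

// Assumed sample input: a zero-width space (U+200B) hidden after "PR".
$webVersion = "PR\u{200B} & Communications";

// The asker's pattern: characters in Unicode category C ("Other") that are not whitespace.
$pattern = '/[^\PC\s]/u';

print_r(removedChars($pattern, $webVersion)); // lists one removed character: e2808b (U+200B)

$cleaned = preg_replace($pattern, '', $webVersion);
var_dump($cleaned); // string(19) "PR & Communications"
```

Each hex string can then be looked up in a UTF-8 character table to see exactly which invisible character was stripped.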

When you're comparing if ($database_version == $web_version), are both variables strings? Try some typecasting and the trim() method. – Drone, Commented Jan 27, 2016 at 16:09
You can try using utf8_decode($web_version) - php.net/manual/en/function.utf8-decode.php. – Scott, Commented Jan 27, 2016 at 16:27
Debugging: to see the string as hex bytes, use var_dump($web_version, bin2hex($web_version), __FILE__.__LINE__); and to see what the characters represent, consult: ASCII Table and Description and Complete Character List for UTF-8. – Ryan Vincent, Commented Jan 27, 2016 at 17:01
Thank you Ryan, your var_dump formula revealed that one value had the '&' as a plain ampersand and the other as the HTML entity '&amp;', hence the two values did not match. This helped me solve the problem. – heroicadventures, Commented Feb 3, 2016 at 13:17
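The resolution above can be reproduced in a small sketch. The two values are assumptions modelled on the byte counts in the question (19 bytes for "PR & Communications", 23 bytes for "PR &amp; Communications", i.e. four extra bytes for "amp;"). bin2hex() exposes the difference, and decoding the entity with html_entity_decode() before comparing is one way to make the strings match.

```php
<?php
// Assumed values: the database holds a literal "&",
// the scraped copy still holds the HTML entity "&amp;".
$databaseVersion = 'PR & Communications';     // 19 bytes
$webVersion      = 'PR &amp; Communications'; // 23 bytes

// Dump both strings as hex so that invisible differences become visible.
echo bin2hex($databaseVersion), PHP_EOL; // 5052202620436f6d6d756e69636174696f6e73
echo bin2hex($webVersion), PHP_EOL;      // 50522026616d703b20436f6d6d756e69636174696f6e73

// Decode HTML entities in the scraped value before comparing.
$decoded = html_entity_decode($webVersion, ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump(strlen($webVersion));           // int(23)
var_dump($decoded === $databaseVersion); // bool(true)
```

In the hex output, "26" is the ampersand; in the scraped value it is followed by "616d703b", which is "amp;".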

Miguel V. · Accepted Answer · 2021-10-23 16:41:46Z


$v = preg_replace('/\s+|[[:^print:]]/', '', $string);

trim() removes only " \t\n\r\0\x0B" (see docs), and only from the beginning and end of the string, so use the snippet above to remove non-printing characters anywhere in the string.

edited Oct 23, 2021 at 16:41 by Miguel V.

answered Jan 27, 2016 at 16:34 by Aleksey Ratnikov
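A minimal sketch of the accepted answer's pattern, wrapped in a hypothetical stripHidden() function. The sample input (a leading tab, a NUL byte in the middle and a trailing newline) is an assumption chosen to contrast the result with trim(), which only touches the ends of the string.

```php
<?php
// Hypothetical wrapper around the accepted answer's pattern:
// runs of whitespace OR single non-printing bytes are deleted.
// There is no /u modifier, so the pattern works byte by byte.
function stripHidden(string $string): string
{
    return preg_replace('/\s+|[[:^print:]]/', '', $string);
}

// Assumed sample input: tab at the start, NUL byte in the middle, newline at the end.
$input = "\tPR &\0 Communications\n";

var_dump(stripHidden($input)); // string(17) "PR&Communications"

// trim() strips the tab and the newline, but the NUL byte in the middle survives.
var_dump(strlen(trim($input))); // int(20)
```

Note that \s+ also matches ordinary spaces, so "PR & Communications" becomes "PR&Communications"; for a comparison, the same function would have to be applied to both values. Also, because the pattern has no /u modifier, with PCRE's default character tables it deletes the individual bytes of multibyte UTF-8 characters such as "é" as well.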


3 Comments

Thomas Clowes Over a year ago

This helped me resolve a slightly different issue. Perhaps you could clarify what non-printing characters are and what this regex actually does?

Aleksey Ratnikov Over a year ago

[[:print:]] is PCRE syntax for the POSIX "print" character class (a shorthand for a longer character set; more of them: php.net/manual/en/regexp.reference.character-classes.php). A printing character is one that is visible when the page is rendered (the space also counts as printing). A ^ right after "[:" negates the class, so [[:^print:]] matches any non-printing character, i.e. one that is not visible after the page is rendered (a BOM, for example). The other parts of the regex are simple: \s stands for "any whitespace character" (space, tab, newline, etc.), + means "repeat one or more times", and the pipe (|) means "or".

Aleksey Ratnikov Over a year ago

So, as whole, it could be read as "find any space symbol or non-printable character".
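To illustrate the explanation above, the sketch below strips a UTF-8 byte order mark (the bytes EF BB BF) first with [[:^print:]] alone and then with the full pattern. It assumes PCRE's default (C-locale) character tables, under which only the bytes 0x20 to 0x7E count as printing characters; the sample string is made up for the example.

```php
<?php
// Assumed sample: a UTF-8 BOM (bytes EF BB BF) in front of the text.
$withBom = "\xEF\xBB\xBFPR & Communications";

// [[:^print:]] alone: the three BOM bytes are removed, but ordinary
// spaces are kept because a space counts as a printing character.
$noBom = preg_replace('/[[:^print:]]/', '', $withBom);
var_dump($noBom); // string(19) "PR & Communications"

// Adding the \s+ alternative, as in the accepted answer, removes the spaces too.
$noSpace = preg_replace('/\s+|[[:^print:]]/', '', $withBom);
var_dump($noSpace); // string(17) "PR&Communications"
```

If only invisible bytes should go, dropping the \s+ alternative keeps the text comparable with a value such as "PR & Communications".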
