
I am getting a string from an API, string2, that contains some kind of spacing. The spaces in string2 are not regular spaces (I'm not even sure whether they are tabs), and even after I try to replace them, string2 is still not equal to string1, which uses regular spaces.

// This string has normal spaces charCodeAt(4) displays '32'
const string1 = 'long string with spaces'
// This string has different spaces charCodeAt(4) displays '160'
const string2 = 'long string with spaces'.replace(/\s+/g, ' ')

console.log(string1)
console.log(string2)
console.log(string1 === string2)

--- Update

The problem was that string1 contained a mixture of normal spaces and non-breaking spaces, so it would never be equal to string2, no matter how I changed string2.

Since I do control string1, I corrected it to use normal spaces, and now it works.
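If neither string is under your control, a more defensive option is to run the same normalization on both sides before comparing, as one of the comments points out. A minimal sketch, assuming that collapsing every whitespace run (including \u00a0, which JavaScript's \s already matches) into one regular space is acceptable for your data; the helper name sameText is hypothetical:

```javascript
// Hypothetical helper: normalize whitespace on BOTH strings before comparing.
// In JavaScript, \s already matches U+00A0 (non-breaking space), so applying
// the same replacement to each side makes mixed spacing comparable.
function sameText(a, b) {
  const normalize = (s) => s.replace(/\s+/g, ' ')
  return normalize(a) === normalize(b)
}

const string1 = 'long string\u00a0with spaces'           // mixed: regular and non-breaking
const string2 = 'long\u00a0string\u00a0with\u00a0spaces' // only non-breaking

console.log(string1 === string2)         // false
console.log(sameText(string1, string2))  // true
```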

  • That will return true, not false. Commented Jan 30, 2021 at 10:56
  • This answer may help. Commented Jan 30, 2021 at 10:56
  • This probably means you have some invisible character in either string (maybe in the non-ASCII range). I turned your code into a snippet and it returns true. So please provide code in your question that reproduces the issue. Commented Jan 30, 2021 at 10:58
  • Check the .charCodeAt() of the characters. You will find some "spaces" with a code of 160, which is a non-breaking space (&nbsp;). Commented Jan 30, 2021 at 11:01
  • Indeed, inspection of string2 shows that it doesn't have regular spaces. If you had applied the same replacement to it, the strings would have been equal. Commented Jan 30, 2021 at 11:04
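To find the offending characters, as suggested in the comments, you can print the code of every character outside printable ASCII. A small debugging sketch; findUnusualChars is a hypothetical helper name, and it reports UTF-16 code units (what charCodeAt returns), not full code points:

```javascript
// Hypothetical debugging helper: list every character outside printable ASCII
// together with its index and code, so invisible characters become visible.
function findUnusualChars(str) {
  const result = []
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i)
    // 32-126 is the printable ASCII range; everything else is reported
    if (code < 32 || code > 126) {
      const hex = 'U+' + code.toString(16).toUpperCase().padStart(4, '0')
      result.push({ index: i, code, hex })
    }
  }
  return result
}

console.log(findUnusualChars('long\u00a0string with spaces'))
// logs one entry: { index: 4, code: 160, hex: 'U+00A0' }
```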

3 Answers


Codepoint 160 (\u00a0) is a non-breaking space.

If you don't need to support IE, you can use the Unicode property escape /\p{White_Space}+/gu as a Unicode-aware alternative to /\s+/. It matches \u00a0 along with every other character that Unicode classifies as whitespace.
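A quick check of the claims above, assuming a runtime with Unicode property escapes (ES2018 or later, so not IE). It also shows that in JavaScript \s already matches \u00a0; the two classes differ only at the edges, for example U+0085 (next line) and U+FEFF (byte order mark):

```javascript
const nbsp = '\u00a0'
console.log(nbsp.charCodeAt(0)) // 160
console.log(nbsp === ' ')       // false

// Both patterns collapse the non-breaking spaces into regular spaces
const input = 'long\u00a0string\u00a0with\u00a0spaces'
console.log(input.replace(/\s+/g, ' ') === 'long string with spaces')              // true
console.log(input.replace(/\p{White_Space}+/gu, ' ') === 'long string with spaces') // true

// Where they differ: U+0085 is White_Space but not \s; U+FEFF is the reverse
console.log(/\s/.test('\u0085'), /\p{White_Space}/u.test('\u0085')) // false true
console.log(/\s/.test('\ufeff'), /\p{White_Space}/u.test('\ufeff')) // true false
```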

If you need to support IE, you can instead generate your own whitespace-matching regex in an environment that does support Unicode property escapes, for example by running the following in the Chrome browser console:

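// Format a code point as a \uXXXX escape (e.g. 160 -> "\u00a0")
const toUnicodeEscape = x => '\\u' + x.toString(16).padStart(4, '0')

// Return the last element of an array (undefined if empty)
const last = arr => arr.slice(-1)[0]

// Collect every code point from 0 to 0xFFFE that has the White_Space property,
// group consecutive code points into runs, and render each run either as
// individual escapes or as an a-b character range
const charGroupings = [...new Array(0xffff).keys()]
    .map(k => String.fromCodePoint(k))
    .filter(x => /^\p{White_Space}+$/u.test(x))
    .map(x => x.codePointAt(0))
    .reduce((acc, n) => {
        const prev = last(acc)

        // Extend the current run if n directly follows its last code point
        if (prev && last(prev) === n - 1) {
            prev.push(n)
        } else {
            acc.push([n])
        }

        return acc
    }, [])
    // Runs of 1-2 code points are listed individually; longer runs become ranges
    .map(x => x.length <= 2
        ? x.map(toUnicodeEscape).join('')
        : `${toUnicodeEscape(x[0])}-${toUnicodeEscape(last(x))}`)
    .join('')

new RegExp(`[${charGroupings}]+`, 'g')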

This generates the regex /[\u0009-\u000d\u0020\u0085\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/g, which is exactly equivalent to /\p{White_Space}+/gu.
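As a sketch of how the generated pattern might be used in an IE-compatible script, it can be pasted in as a regex literal, which needs neither the u flag nor property escapes. The helper name normalizeWhiteSpace is hypothetical:

```javascript
// The regex generated above, written out literally so it needs neither the
// u flag nor property-escape support (var/function keep it ES5-compatible)
var whiteSpaceRegex = /[\u0009-\u000d\u0020\u0085\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/g

// Hypothetical helper: collapse each run of Unicode whitespace into one space
function normalizeWhiteSpace(str) {
  return str.replace(whiteSpaceRegex, ' ')
}

console.log(normalizeWhiteSpace('long\u00a0string\u3000with\tspaces')) // long string with spaces
```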


I can simply turn every character whose char code is greater than 126 into a space:

function formatWhiteSpaces(string) {
  return string.split('')
    // Char codes above 126 are outside printable ASCII and can render oddly:
    // String.fromCharCode(160) looks like " " and String.fromCharCode(173)
    // looks like "", so replace all of them with a regular space
    .map(a => a.charCodeAt() < 127 ? a : ' ')
    .join('')
}
var string1 = formatWhiteSpaces('long string with spaces')
var string2 = formatWhiteSpaces('long string with spaces')
console.log(string1)
console.log(string2)
console.log(string1 === string2)
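The same idea can be written with a single regex instead of split/map/join. A sketch; formatWhiteSpacesRegex is a hypothetical name, and note that, like the original, it also replaces legitimate non-ASCII text such as accented letters:

```javascript
// Same idea as formatWhiteSpaces above, written with a single regex:
// every UTF-16 code unit above 126 (0x7E) is replaced with a space.
// Caveat: this also replaces legitimate non-ASCII text such as "é".
function formatWhiteSpacesRegex(string) {
  return string.replace(/[^\u0000-\u007e]/g, ' ')
}

console.log(formatWhiteSpacesRegex('long\u00a0string with spaces')) // long string with spaces
console.log(formatWhiteSpacesRegex('caf\u00e9'))                    // "caf " (é became a space)
```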

In case that also turns characters I actually want into spaces, I can make it more specific: only String.fromCharCode(160) produces the space-like character, and String.fromCharCode(173) (a soft hyphen) renders as nothing, so those are the only two I handle:

function formatWhiteSpaces(string) {
  return string.split('')
    .map(a => {
      if (a.charCodeAt() === 160) { return ' ' } // 160: non-breaking space, looks like " "
      if (a.charCodeAt() === 173) { return '' }  // 173: soft hyphen, looks like ""
      return a
    })
    .join('')
}
var string1 = formatWhiteSpaces('long string with spaces')
var string2 = formatWhiteSpaces('long string with spaces')
console.log(string1)
console.log(string2)
console.log(string1 === string2)
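The targeted version can likewise be expressed with two replace calls. A minimal sketch, with formatWhiteSpacesTargeted as a hypothetical name; unlike the first approach it leaves other non-ASCII characters such as "é" untouched:

```javascript
// Targeted version: only U+00A0 (non-breaking space) becomes a regular space
// and only U+00AD (soft hyphen) is removed; all other characters are kept.
function formatWhiteSpacesTargeted(string) {
  return string
    .replace(/\u00a0/g, ' ') // 160: non-breaking space -> regular space
    .replace(/\u00ad/g, '')  // 173: soft hyphen -> removed
}

console.log(formatWhiteSpacesTargeted('caf\u00e9\u00a0au\u00adlait')) // café aulait
```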


If the strings contain words separated by different kinds of spaces, we can extract the words, rebuild each string with regular spaces, and then compare.

const cleanStr = str => [...str.matchAll(/\w+/g)].map(x => x[0]).join(' ')

// This string has normal spaces charCodeAt(4) displays '32'
const string1 = 'long string with spaces'
// This string has different spaces charCodeAt(4) displays '160'
const string2 = 'long string with spaces'

console.log(string1)
console.log(string2)
console.log(cleanStr(string1) === cleanStr(string2))

  • This will work the same for spaces without having to convert the string to an array: const cleanStr = str => str.replace(/\s+/g, ' ')
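One caveat with cleanStr: in JavaScript \w only matches ASCII letters, digits and underscore, so a word like "café" gets cut to "caf". A sketch of a Unicode-aware variant, assuming ES2018 property escapes are available; cleanStrUnicode is a hypothetical name:

```javascript
// Original approach: \w is ASCII-only, so "é" is treated as a separator
const cleanStr = str => [...str.matchAll(/\w+/g)].map(x => x[0]).join(' ')

// Hypothetical variant: letters (\p{L}), numbers (\p{N}) and underscore
// count as word characters; match() returns null when nothing matches
const cleanStrUnicode = str => (str.match(/[\p{L}\p{N}_]+/gu) || []).join(' ')

// "café", a non-breaking space, "au", two regular spaces, "lait"
const sample = 'caf\u00e9\u00a0au  lait'

console.log(cleanStr(sample))        // caf au lait
console.log(cleanStrUnicode(sample)) // café au lait
```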
