54

I have a column eventDate which contains trailing spaces. I am trying to remove them with the PostgreSQL function TRIM(). More specifically, I am running:

SELECT TRIM(both ' ' from eventDate) 
FROM EventDates;

However, the trailing spaces don't go away. Furthermore, when I try to trim another character from the date (such as a number), it doesn't get trimmed either. If I'm reading the manual correctly, this should work. Any thoughts?

  • Are you sure it's actually a space character and not some other non-visible whitespace character(s)? Commented Mar 27, 2014 at 21:40
  • @CodyCaughlan You're correct. It was some other non-visible whitespace character. Commented Mar 28, 2014 at 1:35
  • How did you fix this non-visible character? Commented Jan 7, 2022 at 6:16

5 Answers

108

There are many different invisible characters. Many of them have the property WSpace=Y ("whitespace") in Unicode. But some special characters are not considered "whitespace" and still have no visible representation. The excellent Wikipedia articles about space (punctuation) and whitespace characters should give you an idea.

<rant>Unicode sucks in this regard: introducing lots of exotic characters that mainly serve to confuse people.</rant>

The standard SQL trim() function by default only trims the basic Latin space character (Unicode: U+0020 / ASCII 32). The same goes for the rtrim() and ltrim() variants. Your call also only targets that particular character.
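A minimal sketch to see this in action, assuming a UTF8 database and the default standard_conforming_strings = on. The sample value is made up: a date followed by a no-break space (U+00A0) and a plain space. trim() removes only the plain space. Splitting the result into single characters with regexp_split_to_table() and showing ascii() for each reveals the leftover code point (160), which is also one way to find out which invisible character you are actually dealing with:

```sql
-- Made-up sample: date + no-break space (U+00A0) + plain space (U+0020)
SELECT length(E'2014-03-27\u00A0 ')       AS raw_len,      -- 12
       length(trim(E'2014-03-27\u00A0 ')) AS trimmed_len;  -- 11: only U+0020 removed

-- One row per character with its code point (ascii() returns the code point in UTF8)
SELECT pos, ch, ascii(ch) AS code_point
FROM   regexp_split_to_table(trim(E'2014-03-27\u00A0 '), '') WITH ORDINALITY AS t(ch, pos)
ORDER  BY pos;
-- the last row (pos 11) is the no-break space, code point 160
```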

Use regular expressions with regexp_replace() instead.

Trailing

To remove all trailing white space (but not white space inside the string):

SELECT regexp_replace(eventdate, '\s+$', '') FROM eventdates;

The regular expression explained:
\s ... regular expression class shorthand for [[:space:]]
    - which is the set of white-space characters - see limitations below
+ ... 1 or more consecutive matches
$ ... end of string

Demo:

SELECT regexp_replace('inner white   ', '\s+$', '') || '|'

Returns:

inner white|
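The demo uses plain spaces only. A slight variation (a sketch, assuming standard_conforming_strings = on so that '\s+$' reaches the regex engine as written) shows that the same pattern also strips a trailing tab, carriage return and newline, while the run of spaces inside the string stays:

```sql
-- E'' literal: \t, \r and \n become real tab, carriage return and newline characters
SELECT regexp_replace(E'inner   white \t\r\n', '\s+$', '') || '|' AS result;
-- result: inner   white|
```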

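Yes, that's a single backslash (\). With standard_conforming_strings = on (the default since PostgreSQL 9.1), a backslash in a plain string literal is just an ordinary character, so '\s+$' reaches the regular expression engine as written.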

Leading

To remove all leading white space (but not white space inside the string):

regexp_replace(eventdate, '^\s+', '')

^ ... start of string
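For example, with a made-up value that starts with a tab and two spaces (again a sketch assuming standard_conforming_strings = on):

```sql
-- Leading tab + spaces are removed; the inner space and the trailing space are untouched
SELECT '|' || regexp_replace(E'\t  inner white ', '^\s+', '') || '|' AS result;
-- result: |inner white |
```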

Both

To remove both, you can chain the above function calls:

regexp_replace(regexp_replace(eventdate, '^\s+', ''), '\s+$', '')

Or you can combine both in a single call with two branches.
Add 'g' as the 4th parameter to replace all matches, not just the first:

regexp_replace(eventdate, '^\s+|\s+$', '', 'g')
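A quick sketch comparing the variants on a made-up value, to show why the 'g' flag matters for the single-call form: without it, only the first match (the leading run) is replaced.

```sql
SELECT regexp_replace(regexp_replace(s, '^\s+', ''), '\s+$', '') AS chained,       -- 'inner white'
       regexp_replace(s, '^\s+|\s+$', '', 'g')                    AS one_call_g,    -- 'inner white'
       regexp_replace(s, '^\s+|\s+$', '')                         AS one_call_no_g  -- trailing ' \t' kept
FROM  (VALUES (E'  inner white \t')) AS t(s);
```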

But substring() should typically be faster:

substring(eventdate, '\S(?:.*\S)*')

\S ... everything but white space
(?:re) ... non-capturing set of parentheses
.* ... any string of 0-n characters
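A sketch of the substring() variant on made-up input. One behavioral difference worth knowing: substring() returns NULL when there is nothing to match (an all-whitespace string), whereas the regexp_replace() form returns an empty string.

```sql
SELECT substring(E'  inner white \t', '\S(?:.*\S)*') AS trimmed,    -- 'inner white'
       substring('   ', '\S(?:.*\S)*')               AS all_blank,  -- NULL: no match
       regexp_replace('   ', '^\s+|\s+$', '', 'g')   AS via_regexp; -- '' (empty string)
```

Wrap the substring() call in coalesce(..., '') if you need an empty string instead of NULL.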

Or one of these:

substring(eventdate, '^\s*(.*\S)')
substring(eventdate, '(\S.*\S)')  -- only works for 2+ printing characters

(re) ... capturing set of parentheses

Effectively takes the first non-whitespace character and everything up to the last non-whitespace character if available.
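To illustrate the caveat on the second form, a sketch with two made-up values, one with two printing characters and one with a single printing character:

```sql
SELECT s,
       substring(s, '^\s*(.*\S)') AS variant_1,  -- 'ab', then 'x'
       substring(s, '(\S.*\S)')   AS variant_2   -- 'ab', then NULL (needs 2+ printing chars)
FROM  (VALUES ('  ab  '), ('  x  ')) AS t(s);
```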

Whitespace?

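There are more related characters that are not contained in the character class [[:space:]]. Some are not classified as "whitespace" in Unicode at all (like the zero width characters); others, like the no-break spaces, are Unicode whitespace, but depending on the locale's ctype they are often left out of [[:space:]].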

Here are some common ones that print as invisible glyphs in pgAdmin for me:
"no-break space", "Mongolian vowel separator", "figure space", "zero width space", "zero width non-joiner", "zero width joiner", "narrow no-break space", "word joiner", "zero width no-break space"

SELECT E'\u00A0\u180E\u2007\u200B\u200C\u200D\u202F\u2060\uFEFF';

To remove all of these from the end of your string:

SELECT regexp_replace(eventdate, '[\s\u00A0\u180E\u2007\u200B\u200C\u200D\u202F\u2060\uFEFF]+$', '') FROM eventdates;
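A sketch on a made-up value that ends in a no-break space (U+00A0), a zero width space (U+200B) and a plain space. It assumes a UTF8 database (for the \u escapes in the E'' literal) and standard_conforming_strings = on, so the \uXXXX escapes in the pattern are handed to the regex engine, which interprets them as character-entry escapes:

```sql
SELECT regexp_replace(E'2014-03-27\u00A0\u200B ',
                      '[\s\u00A0\u180E\u2007\u200B\u200C\u200D\u202F\u2060\uFEFF]+$',
                      '') || '|' AS result;
-- result: 2014-03-27|
```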

You may have to add more characters to catch all "whitespace" characters.
Ultimately, whether characters are rendered visible or not also depends on the font used for display. Once you go beyond basic ASCII characters, things get complicated.

Limitations

There is also the POSIX character class [[:graph:]], which is supposed to represent "visible characters". Example:

substring(eventdate, '([[:graph:]].*[[:graph:]])')

It works for ASCII characters in every setup (where it boils down to [\x21-\x7E]), but beyond that you currently (Postgres 10 included) depend on information provided by the underlying OS (to define ctype) and possibly on locale settings.
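With ASCII-only input the result should not depend on the OS or locale, so a sketch like this one (made-up value with a leading tab and a trailing newline) should behave the same everywhere. Like the (\S.*\S) form, it needs at least two printing characters:

```sql
-- Leading space + tab and trailing space + newline are not in [[:graph:]]
SELECT substring(E' \t ab c \n', '([[:graph:]].*[[:graph:]])') AS result;  -- 'ab c'
```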

That's the case for every reference to a character class, but there seems to be more disagreement with the less commonly used ones like graph. The manual:

Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. These stand for the character classes defined in ctype. A locale can provide others.

Bold emphasis mine.

Another limitation has been fixed in Postgres 10:

Fix regular expressions' character class handling for large character codes, particularly Unicode characters above U+7FF (Tom Lane)

Previously, such characters were never recognized as belonging to locale-dependent character classes such as [[:alpha:]].


10 Comments

So... would SELECT regexp_replace(regexp_replace(eventdate, '^\s+', ''), '\s+$', '') FROM eventdates; work to strip all leading and trailing spaces?
@heliotrope: If it's just plain spaces, use trim(). Otherwise, consider the added bits above.
@ErwinBrandstetter There's an extra closing parenthesis on the last SQL query.
For trimming non-breaking spaces as well (for a UTF-8 encoded PostgreSQL DB), do SELECT regexp_replace(regexp_replace(eventdate, '^(\s|\u00a0|\ufeff|\u2007|\u180e|\u202f)+', ''), '(\s|\u00a0|\ufeff|\u2007|\u180e|\u202f)+$', '') FROM eventdates;
@ErwinBrandstetter I'm running PostgreSQL 9.5.11 and \s did not catch \u00a0, that's why I added it to the regex. Thanks for clarifying this great answer!
9

It should work the way you're handling it, but it's hard to say without knowing the specific string.

If you're only trimming trailing spaces, you might want to use the more concise form:

SELECT RTRIM(eventDate) 
FROM EventDates;

This is a little test to show you that it works. Tell us if it works out!
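A small sketch along those lines (made-up values; assumes a UTF8 database for the \u escape). rtrim() handles trailing plain spaces, but, like trim(), it leaves a no-break space in place:

```sql
SELECT rtrim('2014-03-27   ') || '|'       AS plain_spaces,  -- '2014-03-27|'
       length(rtrim(E'2014-03-27 \u00A0')) AS with_nbsp;     -- 12: nothing removed
```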


5

If your whitespace consists of more than just the plain space character, then you will need to use regexp_replace():

 SELECT '(' || REGEXP_REPLACE(eventDate, E'[[:space:]]', '', 'g') || ')' 
 FROM EventDates;

In the above example, I wrap the return value in ( and ) just so you can easily see in a psql prompt that the regex replace is working. You'll want to remove those in your code.
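A sketch of what the 'g' flag does here (made-up value; the E'' literal turns \t into a tab): with no anchor, every whitespace character goes, including the inner space. Anchoring the class with + and $ limits it to trailing whitespace:

```sql
SELECT '(' || regexp_replace(E'a b\t', '[[:space:]]', '', 'g') || ')' AS all_ws,       -- (ab)
       '(' || regexp_replace(E'a b\t', '[[:space:]]+$', '') || ')'    AS trailing_ws;  -- (a b)
```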

1 Comment

This wouldn't stop at trailing spaces, but kill all white space characters in the string.
1

A tested one that works like a charm:

UPDATE company SET name = TRIM (BOTH FROM name) where id > 0
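Note that trim(BOTH FROM ...) only removes plain spaces. A sketch with a made-up value: a leading tab survives, but trim() accepts a set of characters to strip, so listing the tab explicitly (here E' \t', space plus tab) removes it as well:

```sql
SELECT length(trim(BOTH FROM E'\tname  ')) AS len,       -- 5: tab kept, trailing spaces removed
       trim(BOTH E' \t' FROM E'\tname  ')  AS trimmed;   -- 'name'
```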

2 Comments

FYI: TRIM won't remove \t.
simple and effective
0
SELECT replace('       devo    system      ', ' ', '');

It gives: devosystem

