Ruby UTF8 encoding problem

Question

I have a Ruby/Rails app.

I have an artists table in my PostgreSQL database that I want to query by name. Some artists have Portuguese characters in their names, and I am having trouble querying them.

For example, one band is called Legião Urbana. If I search for the string "legiã" from my app, I get the following params:

{"action"=>"search_artist", "q"=>"legi\343", "controller"=>"home"}

However, the following query raises an error:

Artist.all(:conditions => "name LIKE '%#{params[:q]}%'")

PGError: ERROR:  invalid byte sequence for encoding "UTF8": 0xe32527

How should I convert the parameter to UTF-8, or otherwise prevent this error?

yfeldblum · Accepted Answer · 2010-12-02 17:07:15Z

5

You need to know what the encoding is of that parameter in the query-string.
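One way to investigate is to look at the raw bytes. The following is a minimal sketch for Ruby 1.9 or later, not a general encoding detector; the helper names `hex_bytes` and `valid_utf8?` are made up for illustration. The parameter in the question ends in the single byte `\343` (octal, `0xE3` in hex), which is "ã" in ISO-8859-1 (Latin-1) but only the start of a multi-byte sequence in UTF-8. Once the `%'` from the interpolated SQL follows it, PostgreSQL sees the bytes `e3 25 27`, which matches the sequence reported in the error.

```ruby
# Hypothetical helper: render a string's bytes as hex for inspection.
def hex_bytes(str)
  str.bytes.map { |b| format('%02x', b) }.join(' ')
end

# Hypothetical helper: would these bytes be valid if read as UTF-8?
def valid_utf8?(str)
  str.dup.force_encoding('UTF-8').valid_encoding?
end

raw = "legi\xE3"            # the bytes shown as "legi\343" in the params dump

puts hex_bytes(raw)         # the last byte is e3
puts valid_utf8?(raw)       # a lone e3 is not a complete UTF-8 sequence

# What the database receives after interpolation into '%...%':
puts hex_bytes("\xE3%'")    # e3 followed by the bytes for % and '
```

A byte string that fails the UTF-8 check and contains single bytes in the `0xA0`-`0xFF` range is often Latin-1 or Windows-1252, but that is an inference from the data, not something Ruby can confirm.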

Ruby 1.9 includes support for strings tagged with their encodings. In Ruby 1.9, you could:

params[:q].encoding # Rails 3 on 1.9 generally presents strings in UTF-8
params[:q].encode('utf-8') # ask Ruby to re-encode it to UTF-8

Then you need to convert the parameter from that encoding into UTF-8 before doing string-interpolation (#{...} syntax).
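As a hedged sketch of that conversion on Ruby 1.9 or later: the helper `to_utf8` below is hypothetical, and it assumes that any input which is not valid UTF-8 is ISO-8859-1, which fits the `\343` byte in the question but will not hold for every client. Note that calling `encode('utf-8')` on a string that is already tagged as UTF-8 generally leaves the bytes untouched, so the string first has to be re-tagged with its real encoding via `force_encoding`.

```ruby
# Hypothetical helper. Assumption: input that is not valid UTF-8 is ISO-8859-1.
def to_utf8(str, fallback = 'ISO-8859-1')
  utf8 = str.dup.force_encoding('UTF-8')
  return utf8 if utf8.valid_encoding?   # already fine, leave it alone

  # Re-tag the bytes with their assumed real encoding, then transcode.
  str.dup.force_encoding(fallback).encode('UTF-8')
end

latin1_bytes = "legi\xE3"               # what the app received
fixed = to_utf8(latin1_bytes)

puts fixed.encoding                     # UTF-8
puts fixed.valid_encoding?              # true
puts fixed == "legi\u00E3"              # true: the a-tilde is now two UTF-8 bytes
```

In a controller this would be applied to `params[:q]` before the value is handed to the query.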

Or you need to pass the parameter as a SQL parameter, not using string-interpolation.

Of course, this brings up the security consideration that, unless you know how to properly encode text for usage in SQL, you should never do string-interpolation to build SQL string fragments. Because SQL-fragments with parameters are quick and easy to do in Rails, you should use them.

# Rails 2
Artist.all(:conditions => ['name like ?', "%#{params[:q]}%"])
Artist.all(:conditions => ['name like :q', { :q=> "%#{params[:q]}%" }])

# Rails 3
Artist.where('name like ?', "%#{params[:q]}%")
Artist.where('name like :q', :q => "%#{params[:q]}%")

SQL injection is the security problem that arises when you build SQL by string-interpolation and your encoding of the interpolated strings yields correct SQL fragments for some input strings, but not for others. In languages/frameworks where parameters are more difficult to work with, string-interpolation or string-building can be acceptable, provided you research exhaustively how the interpolated strings must be encoded so that the resulting SQL fragment is correct for every possible input. Because SQL injection is so easy to avoid in Rails via ordered or named parameters (see the four samples above), you should have no trouble keeping all your SQL fragments safe.
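To make the difference concrete, here is a small plain-Ruby illustration that does not touch a database. The method names `interpolated_condition` and `bound_condition` are hypothetical; they only mimic the two styles of `:conditions` value discussed above. With interpolation, a quote in the input ends the SQL string literal early, so the rest of the input is parsed as SQL. With a placeholder, the SQL text never contains user input, and quoting the value is left to the database adapter.

```ruby
# Hypothetical: the style used in the question (unsafe).
def interpolated_condition(q)
  "name LIKE '%#{q}%'"
end

# Hypothetical: the placeholder style; the value travels separately.
def bound_condition(q)
  ['name LIKE ?', "%#{q}%"]
end

input = "x' OR '1'='1"

puts interpolated_condition(input)
# The quote after x closes the literal, so OR '1'=... becomes part of the SQL.

template, value = bound_condition(input)
puts template   # the SQL text is the same for every input
puts value      # the raw value, to be quoted by the adapter
```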

edited Dec 2, 2010 at 17:07

answered Dec 2, 2010 at 16:28

yfeldblum


3 Comments

johnnymire Over a year ago

Ok, it's the encoding/converting that I was more interested in. How do you find the encoding of the parameter? I'm aware of the SQL injection issue, but I just wanted to give an example of the query.

the Tin Man Over a year ago

+1 for "you need to pass the parameter as a SQL parameter, not using string-interpolation".

yfeldblum Over a year ago

Unfortunately I'm not too familiar with the 1.8 string-encoding situation. I do know that it can be rather difficult to deal with encoding issues in 1.8, and that if you are facing encoding problems, the best bet is to move to 1.9.

johnnymire · Accepted Answer · 2010-12-02 17:40:24Z

4

I think this might do it

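require 'iconv' # Ruby 1.8/1.9 only; Iconv was removed from the standard library in Ruby 2.0
# Arguments are (to, from, string): convert Latin-1 bytes such as "\343" to UTF-8
Iconv.conv("UTF-8", "ISO-8859-1", params[:q])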

answered Dec 2, 2010 at 17:40

johnnymire
