I am running an EC2 instance (Amazon Linux x64 running PostgreSQL) and need to take a file from EC2 and load it into a PostgreSQL database running on RDS. I am not sure how to go about that. Does anybody have experience doing so, or can somebody point me to instructions, a whitepaper, etc.?

Thank you in advance.

  • Usually, what you do is you write a script that reads the file, and talks to Postgres. You can use e.g. Python. In which case you'd use open to open a file, and maybe the csv module if your file is CSV. Then, you'd use the psycopg module to talk to Postgres. Commented Nov 30, 2013 at 17:31
  • What format is the file you are loading? Commented Nov 30, 2013 at 17:35
  • Amazon to the rescue! docs.aws.amazon.com/AmazonRDS/latest/UserGuide/… Commented Nov 30, 2013 at 18:00
  • Datasage - The file is a csv file Commented Dec 1, 2013 at 0:41
  • Are you trying to load it as a table, or as a field in a table? Commented Dec 21, 2013 at 3:27

1 Answer

Better late than never :-)

I've found that by far the easiest way to do this is to load the data from a web server or file-based endpoint using the COPY ... FROM command.

Postgres manual for copy

Practical example

Let's imagine you have the following CSV data:

1,Fred,Flintstone
2,Barney,Rubble
3,Wilma,Flintstone
4,Betty,Rubble

The columns are pkid, firstname and surname, in that order.

If you put this file on a web server (perhaps a server you're running locally that can be reached from the outside), you should then be able to type:

http://myserver.blah/flintstones.csv

into your browser and see the file appear.

Once you're able to do this, and assuming the server you've used is public facing (so that Amazon's servers can see it), you then need to fire up a tool such as pgAdmin, or anything else that lets you run SQL against your Postgres install.

How you run these commands is a matter for debate; I've used all manner of methods in the past.

One that works really well is to set up SSH login on your Amazon host, then use an SSH client to tunnel from your local machine to the RDS instance. Doing it this way lets you use programs such as pgAdmin as if the database were local.
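As a rough sketch of such a tunnel, assuming an OpenSSH client is installed on your local machine, the command can be built and started from Python like this. The host names, the ec2-user login and local port 5433 are placeholders for illustration, not values from the question.

```python
import subprocess


def tunnel_command(ec2_host, rds_host, local_port=5433, rds_port=5432, user="ec2-user"):
    """Build an ssh command that forwards local_port to the RDS endpoint."""
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:{rds_host}:{rds_port}",
        f"{user}@{ec2_host}",
    ]


def open_tunnel(ec2_host, rds_host):
    """Start the tunnel in the background; call terminate() on the result to close it."""
    return subprocess.Popen(tunnel_command(ec2_host, rds_host))
```

With the tunnel open, pgAdmin (or any other client) would connect to localhost on the chosen local port instead of the RDS endpoint.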

If you can't use a tunnel, you could always hack together a quick Ruby/PHP/Node.js script that runs the two SQL commands you need.

Once you have the ability to run SQL commands against your RDS instance, you need to do two things:

  • 1) Create the destination table
  • 2) Use the copy command to import the data

Creating the destination table is easy, that's just a simple create table command.

For our example:

CREATE TABLE theflintstones
(
  pkid integer primary key,
  firstname text,
  surname text
)

The second command is a little more tricky.

If you're going to load the data from a file system, you need to copy the CSV file to a location that the RDS instance itself can read.

In my experience, however, I've never been given direct file system access on an RDS instance, so you'll most likely have to use the remote HTTP method.

The problem with the HTTP method is that the RDS instance may not have either wget or curl installed.

In practice I've yet to come across a host that doesn't have at least wget installed, as the underlying OS often needs it to fetch things from the web. Often curl is installed too.

Once you're ready to import the data, use the following command:

COPY theflintstones FROM PROGRAM 'curl -s http://myserver/flintstones.csv' WITH(format csv)

Here 'myserver' should be replaced with the web or IP address where you stored the CSV data file, and 'flintstones.csv' with the actual name of the file you want to load.

'curl -s [url]' runs curl in silent mode. If you have to use wget, specify the program as 'wget -qO- [url]' instead.

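If all goes well, Postgres loads the CSV from the remote source and uses its contents to populate the columns in your table.

Be aware that COPY ... FROM PROGRAM (added in PostgreSQL 9.3) runs the command on the database server itself, so it requires superuser rights, or membership in the pg_execute_server_program role on PostgreSQL 11 and later. The RDS master user is not a true superuser, so RDS may reject this command. If it does, load the file from the client side instead, using psql's \copy or COPY ... FROM STDIN.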
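If the server refuses to run a program or read a file on your behalf, the data can be streamed over the client connection instead, which also fits the original question of loading a file that sits on the EC2 instance. The following is a minimal sketch, assuming the psycopg2 driver is installed on EC2. The names copy_csv and load_flintstones, the connection string and the file path are hypothetical, and the table name is assumed to be trusted input since it is placed directly into the SQL text.

```python
def copy_csv(cursor, table, csv_file):
    """Stream an open CSV file to the server with COPY ... FROM STDIN."""
    sql = f"COPY {table} FROM STDIN WITH (FORMAT csv)"
    # copy_expert is psycopg2's cursor method for running a COPY statement
    # against a file-like object on the client side.
    cursor.copy_expert(sql, csv_file)
    return sql


def load_flintstones(dsn, path):
    """Hypothetical entry point: dsn and path are placeholders you supply."""
    import psycopg2  # imported lazily so copy_csv itself needs no driver

    conn = psycopg2.connect(dsn)
    try:
        # "with conn" commits on success and rolls back on error.
        with conn, conn.cursor() as cur, open(path, newline="") as f:
            copy_csv(cur, "theflintstones", f)
    finally:
        conn.close()
```

A call might look like load_flintstones("host=... dbname=... user=... password=...", "/path/to/flintstones.csv"), run on the EC2 instance with the RDS endpoint as the host.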

If you only need to populate some columns in your table, use the table and column syntax:

COPY table (column, column, column ...)

The CSV will then populate only the named columns, and the rest are set to their default values.
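To make the column syntax concrete, here is a small sketch that builds such a statement for the example table. The helper name copy_statement is made up for illustration, it uses FROM STDIN as the source (an assumption, swap in your own source clause), and the table and column names are assumed to be trusted since they are placed directly into the SQL text.

```python
def copy_statement(table, columns):
    """Build a COPY statement that names the target columns explicitly."""
    if not columns:
        raise ValueError("name at least one column")
    column_list = ", ".join(columns)
    return f"COPY {table} ({column_list}) FROM STDIN WITH (FORMAT csv)"


# Load only pkid and firstname; surname is left at its default (NULL here).
print(copy_statement("theflintstones", ["pkid", "firstname"]))
```

With this statement each CSV row would need exactly two fields, one per named column.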
