
I'm trying to find a more efficient method of reading a file from a remote URL and saving it into a byte array. Here is what I currently have:

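import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.apache.commons.io.IOUtils;

private byte[] fetchRemoteFile(String location) throws Exception {
  URL url = new URL(location);
  byte[] bytes = null;
  // try-with-resources closes the stream even if reading fails
  try (InputStream is = url.openStream()) {
    bytes = IOUtils.toByteArray(is);
  } catch (IOException e) {
    // handle errors (bytes stays null in this case)
  }
  return bytes;
}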

As you can see, I currently pass the URL into the method, where it uses an InputStream object to read in the bytes of the file. This method uses Apache Commons IOUtils. However, this method call tends to take a relatively long time to run. When retrieving hundreds, thousands, or hundreds of thousands of files one right after another, it gets quite slow. Is there a way I could improve this method so that it runs more efficiently? I have considered multithreading but I would like to save that as a last resort.
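For comparison, here is a minimal sketch of the same read without Commons IO, assuming Java 9 or later for InputStream.readAllBytes(). The class name RemoteFetch is hypothetical. It is not expected to be noticeably faster than IOUtils.toByteArray, since for remote URLs the time is usually spent waiting on the network rather than copying bytes; it only removes the dependency and lets exceptions propagate instead of returning null.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

public class RemoteFetch {
    // Hypothetical helper: reads the whole resource at `location` into memory.
    // Requires Java 9+ for InputStream.readAllBytes().
    public static byte[] fetchRemoteFile(String location) throws IOException {
        // URI.create(...).toURL() avoids the URL(String) constructor deprecated in newer JDKs
        try (InputStream is = URI.create(location).toURL().openStream()) {
            return is.readAllBytes(); // stream is closed automatically afterwards
        }
    }
}
```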

  • Without multithreading, you're limited to one right after another. Commented Nov 18, 2014 at 18:57
  • There's nothing wrong with the code (aside from the general limitation that the byte[] must obviously fit in the heap and the inherent 2 GB array-size limit, but I assume you don't mind either). The perceived "slowness" probably comes from the URLs being HTTP URLs, which require a new network connection to retrieve each file (the overhead is notable if there are many small files). Aside from issuing multiple requests in parallel (that is, multithreading) or working directly with HTTP/1.1 and keeping the connection open, there isn't much potential to speed this up. Commented Nov 18, 2014 at 19:06
  • @Sotirios Yes, and I'm wondering if there's a way to make the above code more efficient so that, even running one right after another, it goes faster than it does now. I don't know if there's really anything I can do other than multithreading, but that's why I asked. Commented Nov 18, 2014 at 19:06
  • First find out where the bottleneck is. Is it the network, the server, or something else? Only once you know that can you think about optimizations. For example, if the network connection is slow, you will not benefit from multithreading. Commented Nov 18, 2014 at 19:07
  • Thanks, guys. That's what I was leaning towards. I am still putting together some performance tests to determine which parts are the slowest, but I just wanted to see if there was an obvious improvement I could make to what I already have. Much appreciated! Commented Nov 18, 2014 at 19:09
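Following the comments about per-connection overhead and finding the bottleneck, here is a rough sketch that swaps in java.net.http.HttpClient (Java 11+) instead of URL.openStream(). A single shared client keeps persistent HTTP/1.1 connections in a pool, so sequential requests to the same host can reuse a socket, and each request's elapsed time is printed to help spot which requests are slow. The class and method names, the 10-second connect timeout, and treating any non-200 status as an error are assumptions for illustration. Note that HttpURLConnection, which URL.openStream() uses for http URLs, already attempts keep-alive by default, so switching may gain little; the timing output is what tells you where the time goes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class SequentialFetcher {
    // One shared client: it pools persistent (keep-alive) connections,
    // so consecutive requests to the same host can reuse a socket.
    private final HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(10)) // assumed timeout
            .build();

    // Fetches one URL, prints how long it took, and returns the body.
    public byte[] fetch(String location) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(location)).GET().build();
        long start = System.nanoTime();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(location + " -> HTTP " + response.statusCode() + " in " + elapsedMs + " ms");
        // Assumed policy: anything other than 200 counts as a failure
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode() + " for " + location);
        }
        return response.body();
    }

    // Still one request right after another, but over reused connections.
    public List<byte[]> fetchAll(List<String> locations) throws IOException, InterruptedException {
        List<byte[]> results = new ArrayList<>();
        for (String location : locations) {
            results.add(fetch(location));
        }
        return results;
    }
}
```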

1 Answer


Your approach looks absolutely fine.

But since you say:

"However, this method call tends to take a relatively long time to run"

you may be facing one of the following problems:

  • Network or connection issues

  • Are you sure you are downloading each file in a separate thread?

If you do use multithreading for this, make sure the JVM heap options -XmsYYYYM and -XmxYYYYM are configured well; otherwise you may find that the processor is not using all of its cores. I ran into this problem some time ago.
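If you do go the multithreading route, a minimal sketch is a fixed-size thread pool from java.util.concurrent that keeps a bounded number of downloads in flight. The class name ParallelFetcher and the pool size are hypothetical; the right number of threads depends on the network and the server and is best found by measuring. Because every downloaded byte[] is kept in the returned list, the heap (the -Xmx setting mentioned above) has to be large enough to hold all of them at once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFetcher {
    // Reads one resource fully (Java 9+ for readAllBytes).
    static byte[] fetch(String location) throws IOException {
        try (InputStream is = URI.create(location).toURL().openStream()) {
            return is.readAllBytes();
        }
    }

    // Downloads all locations with at most `threads` requests in flight.
    // Results are returned in the same order as the input list.
    public static List<byte[]> fetchAll(List<String> locations, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (String location : locations) {
                Callable<byte[]> task = () -> fetch(location);
                futures.add(pool.submit(task));
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                results.add(f.get()); // a failed download surfaces as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown(); // let worker threads exit once tasks finish
        }
    }
}
```

For example, fetchAll(urls, 8) would download with up to eight concurrent requests.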
