3

The following code uploads multiple files from a browser in a serial fashion to a host. In order to make the process faster, how could the files be processed in parallel so there are multiple streams being sent to the server? Is this worth pursuing if the files are being sent to the same host? Will a browser allow multiple upload streams to the same host? If so, how might this work?

I assume that is not possible to break the file into parts and write them parallel similar to this example since they are being submitted by a browser rather then a Python client.

#!/usr/bin/python
import cgi, os
import shutil
import cgitb; cgitb.enable()  # for troubleshooting

form = cgi.FieldStorage()

print """\
Content-Type: text/html\n
<html><body>
"""

if 'file' in form:
   filefield = form['file']
   if not isinstance(filefield, list):
      filefield = [filefield]

   for fileitem in filefield:
       if fileitem.filename:
          fn = os.path.basename(fileitem.filename)
          # save file
          with open('/var/www/site/files/' + fn, 'wb') as f:
              shutil.copyfileobj(fileitem.file, f)
          # line breaks are not occuring between interations
          print 'File "' + fn + '" was uploaded successfully<br/>'
          message = 'All files uploaded'
else:
   message = 'No file was uploaded'

print """
<p>%s</p>
</body></html>
""" % (message)

2 Answers 2

3

A form submit produces a single HTTP request that naturally uses a single TCP connection to deliver the request.

To upload multiple files in parallel make multiple requests in parallel. You might need flash or java applet on the client. Check whether javascript (AJAX) allows concurrent multiple form submissions.

Sign up to request clarification or add additional context in comments.

Comments

-1

Improving performance in general is an interesting subject. The key is measuring to find out what the bottleneck actually is.

I would imagine that the bottleneck either the browser loading the files to upload or the available bandwidth during load (or both).

If the bottleneck is saving the files from buffers (or moving temp files to their final location) then converting the for loop into threads may help - presuming that the web server has the I/O capacity to write all those files. But threading would only help if there are a good number files to write otherwise work to repeatedly spawn threads (or load a queue and start a set of threads) would be slower than looping to processing a few small files.

For the fun of it, here is an article on python threads and a work queue: http://www.ibm.com/developerworks/aix/library/au-threadingpython/

Reference on Python CGI and file uploading and multiple file uploading via POST:

3 Comments

there is no point to use threads on the server if browser uploads files one by one sequentially. The code in the question already accepts multiple files
True, but running parallel uploads won't help either if the situation is large files over a low bandwith connect - sequential uploads will be just as slow.
Practically uploading files in parallel is about 6 times faster than uploading them one by one.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.