3

I'm trying to download a .png image via HTTP requests and upload it via HTTP to another location. My objective is to avoid saving the file on the disk so it's processed in-memory.

I have the code below:

  1. Download the file and convert it into a byte array:
resp = requests.get(
    'http://www.personal.psu.edu/crd5112/photos/PNG%20Example.png',
    stream=True)

img = BytesIO(resp.content)
  2. Upload the file to a remote HTTP repository:
data=open(img.getvalue()).read()

r = requests.post(url=url, data=data, headers=headers, auth=HTTPBasicAuth('user', 'user'))

I'm getting a ValueError exception "embedded null byte" when reading the byte array.

If I save the file onto the disk and load it as below, then there is no error:

with open('file.png', 'wb') as pic:
  pic.write(img.getvalue())

Any advice on how I could achieve this without saving the file to disk?

3 Answers

7

I believe the embedded null byte error occurs because some library call in your code expects a filename as input: raw file contents passed where a path is expected are rejected as soon as they contain a null byte. Wrapping the content in a BytesIO object instead presents it to that library "as if" it were an open file.
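To make the cause concrete, here is a minimal sketch using only the standard library. The payload is a made-up stand-in for downloaded image data (an assumption for illustration, not the actual file from the question): passing it to open() reproduces the error, while BytesIO reads it back without trouble.

```python
from io import BytesIO

# Stand-in for downloaded PNG data: the PNG signature followed by a few
# bytes that include null bytes, as real image data almost always does.
payload = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

# open() treats its first argument as a file path, so raw file contents
# containing a null byte are rejected before any file is touched.
try:
    open(payload)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# BytesIO wraps the same bytes in a file-like object that lives in memory.
buffer = BytesIO(payload)
first_eight = buffer.read(8)
print(first_eight)
```

Any API that accepts a file object rather than a path should be able to consume the BytesIO buffer in the same way.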

Here is sample code that I used when trying to address this same issue with a tar file. This code should be able to satisfy most file input requirements for various other libraries.

The key was wrapping remote_file.content in a BytesIO object and passing it to tarfile.open as a file object (the fileobj argument). Other techniques I attempted did not work.

from io import BytesIO
import requests
import tarfile

remote_file = requests.get('https://download.site.com/files/file.tar.gz')

# Open the tarball directly from memory
tar = tarfile.open(fileobj=BytesIO(remote_file.content))
# Optionally print all folders / files within the tarball
print(tar.getnames())
# Extract the contents to the target directory
tar.extractall('/home/users/Documents/target_directory/')

This eliminated the ValueError: embedded null byte and expected str, bytes or os.PathLike object, not _io.BytesIO errors that I was experiencing with other methods.
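For anyone who wants to try the fileobj technique without a network connection, the following is a rough, self-contained sketch. It builds a small tarball in memory to stand in for remote_file.content; the member name docs/hello.txt and its contents are hypothetical.

```python
import tarfile
from io import BytesIO

# Build a small gzip-compressed tarball in memory to stand in for the
# bytes that remote_file.content would hold after a download.
archive_bytes = BytesIO()
with tarfile.open(fileobj=archive_bytes, mode="w:gz") as tar:
    content = b"hello from memory\n"
    info = tarfile.TarInfo(name="docs/hello.txt")  # hypothetical member name
    info.size = len(content)
    tar.addfile(info, BytesIO(content))

# Read the archive back through a fresh BytesIO, never touching the disk.
with tarfile.open(fileobj=BytesIO(archive_bytes.getvalue())) as tar:
    names = tar.getnames()
    member = tar.extractfile("docs/hello.txt")
    text = member.read().decode("utf-8")

print(names)
print(text)
```

Reading members with extractfile keeps everything in memory, whereas extractall writes the contents to disk.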

3

Yes, you can do this without saving to disk. First, note that the error occurs on this line:

data=open(img.getvalue()).read()

open() expects a file path, but img.getvalue() returns the raw image bytes, which contain null bytes, hence the error. Use the Pillow library to work with image data:

from io import BytesIO
from PIL import Image    
img = BytesIO(resp.content)
-#data=open(img).read()
+data = Image.open(img)

This gives you an object of the following type:

<class 'PIL.PngImagePlugin.PngImageFile'>

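You can use this data variable as the basis for the upload POST request. Note that requests cannot send a PIL image object directly, so the image has to be written back to a byte stream first.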

2 Comments

How would this be solved if the file was a PDF instead of an image?
@rafi You can use a similar approach: pdf_file = BytesIO(r.content), then existing_pdf = PdfFileReader(pdf_file). PdfFileReader comes from PyPDF2, which you can install with pip install PyPDF2.
1

@AmilaMGunawardana Thanks for the pointer.

I just had to save the image into a separate byte stream to get it uploaded properly:

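from io import BytesIO
import requests
from PIL import Image
from requests.auth import HTTPBasicAuth

img = BytesIO(resp.content)
data = Image.open(img, 'r')

# Re-encode the image as PNG into a fresh in-memory buffer
buf = BytesIO()
data.save(buf, 'PNG')

r = requests.post(url=url, data=buf.getvalue(), headers=headers, auth=HTTPBasicAuth('user', 'user'))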
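If the image does not need to be re-encoded, it may be enough to forward the downloaded bytes unchanged, since resp.content is already a bytes object. The sketch below illustrates that idea; relay_file is a hypothetical helper name, and its session argument is assumed to be a requests.Session or any object with compatible get() and post() methods.

```python
def relay_file(session, source_url, target_url, headers=None, auth=None):
    """Download source_url and upload the same bytes to target_url in memory.

    `session` is assumed to be a requests.Session (or any object offering
    compatible get() and post() methods); the function name is hypothetical.
    """
    resp = session.get(source_url)
    resp.raise_for_status()  # stop early if the download failed

    # resp.content is already a bytes object, so it can be sent as-is
    # without open(), BytesIO, or a temporary file.
    upload = session.post(target_url, data=resp.content,
                          headers=headers, auth=auth)
    upload.raise_for_status()
    return upload
```

With requests installed, a call might look like relay_file(requests.Session(), source_url, url, headers=headers, auth=('user', 'user')), where url and headers are the values from the question.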

1 Comment

That's good, but if memory management matters, reuse the img variable to hold the new empty byte stream instead of creating a separate one; that should help with speed and memory.
