
I'm trying to run a Python script inside a Docker container. The Docker version is:

 Cloud integration: 1.0.4
 Version:           20.10.2
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        2291f61
 Built:             Mon Dec 28 16:12:42 2020
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8891c58
  Built:            Mon Dec 28 16:15:23 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I'm trying to read around 99k files from a directory using this code:

import os

# Prepare a list of file names
corpus_path = 'data/cnn/'
corpus_filenames = []
limit = 10000  # set to None to read every file
for entry in os.scandir(corpus_path):
    # Skip hidden entries and anything that is not a regular file
    if not entry.name.startswith('.') and entry.is_file():
        print(entry)
        if limit is not None and len(corpus_filenames) >= limit:
            break
        corpus_filenames.append(os.path.join(corpus_path, entry.name))
# What did we find?
N_files = len(corpus_filenames)
print(N_files)

And I'm getting the error

    for entry in os.scandir(corpus_path):
OSError: [Errno 5] Input/output error: 'data/cnn/'

This error occurs only inside the Docker container. If I run the script outside the container, it shows no error and simply reads the files from the directory.
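
To narrow down where the scan fails, a minimal sketch like the following may help. It wraps the same os.scandir() loop in a try/except and returns whatever was listed before the error, so the same call can be compared inside and outside the container. The function name list_files and its return shape are hypothetical, not part of the original script.

```python
import errno
import os


def list_files(corpus_path, limit=None):
    """Return (filenames, error); error is None if the scan completed."""
    filenames = []
    try:
        # The context manager closes the directory handle even on failure.
        with os.scandir(corpus_path) as entries:
            for entry in entries:
                if limit is not None and len(filenames) >= limit:
                    break
                if not entry.name.startswith('.') and entry.is_file():
                    filenames.append(os.path.join(corpus_path, entry.name))
    except OSError as exc:
        # Keep the partial result so the caller can see how far the scan got.
        return filenames, exc
    return filenames, None


if __name__ == '__main__':
    files, error = list_files('data/cnn/', limit=10000)
    print(len(files), 'files listed')
    if error is not None:
        name = errno.errorcode.get(error.errno, 'unknown')
        print('scan stopped early:', name, error)
```

If the count is zero and the error appears immediately, the directory itself is probably unreadable in the container. A non-zero count followed by an error would instead suggest the listing breaks partway through.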

Also, here's the Dockerfile:

FROM ubuntu:18.04
ENTRYPOINT [ "/bin/bash", "-l", "-i", "-c" ]

# Set the mirror for `apt-get` to talk to.  This seems to have helped in a situation where some packages below
# would sometimes work and sometimes give an IP Not Found error.  It's still not perfect.
RUN sed --in-place --regexp-extended "s/(\/\/)(archive\.ubuntu)/\us.\2/" /etc/apt/sources.list && \
    apt-get update && apt-get upgrade --yes

# delete all the apt list files since they're big and get stale quickly
RUN rm -rf /var/lib/apt/lists/*
# this forces "apt-get update" in dependent images, which is also good
# (see also https://bugs.launchpad.net/cloud-images/+bug/1699913)

# enable the universe
RUN sed -i 's/^#\s*\(deb.*universe\)$/\1/g' /etc/apt/sources.list

# make systemd-detect-virt return "docker"
# See: https://github.com/systemd/systemd/blob/aa0c34279ee40bce2f9681b496922dedbadfca19/src/basic/virt.c#L434
RUN mkdir -p /run/systemd && echo 'docker' > /run/systemd/container

# Clean cache and basic repository setup
RUN apt-get clean
RUN apt-get update && apt-get update --fix-missing
RUN apt-get install -y software-properties-common
RUN printf 'Y' | apt-get install apt-utils
RUN printf 'Y' | apt-get install vim
RUN apt-get update && export PATH
RUN apt-get install bc

# `libpython3.6-dev` is required for `python3-pip`
RUN printf 'Y' | apt-get install libpython3.6-dev
RUN printf 'Y' | apt-get install python3-pip

# AWS Python SDK and CLI installations
RUN apt-get install -y unzip
RUN apt-get install -y curl
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
RUN unzip awscliv2.zip
RUN ./aws/install

# Python dependencies
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# NLTK
RUN python3.6 -c "import nltk; nltk.download('stopwords'); \
    nltk.download('punkt'); \
    nltk.download('averaged_perceptron_tagger'); \
    nltk.download('maxent_ne_chunker'); \
    nltk.download('words');"
RUN cp -r /root/nltk_data /usr/share/nltk_data

# Set python 3.6 as the default for the container
RUN ln -s /usr/bin/python3.6 /usr/bin/python

# Set root password
RUN echo "root:##abc%%" | chpasswd

# Install sudo
RUN apt-get update && apt-get -y install sudo

# overwrite this with 'CMD []' in a dependent Dockerfile
CMD ["/bin/bash"]

# Create and boot into a development user instead of working as root
RUN groupadd -r sophia -g 901
RUN useradd -u 901 -r -g sophia sophia
RUN echo "sophia:##abc%%" | chpasswd
RUN adduser sophia sudo
RUN mkdir /home/sophia
RUN mkdir /home/sophia/project
RUN mkdir /home/sophia/logs
RUN chown -R sophia /home/sophia
USER sophia
WORKDIR /home/sophia/project

PLEASE HELP! I have been trying to fix this for a while!!!

EDIT: It seems that Docker is not mounting my local directories correctly. My docker run script looks like this:

docker run -i -t \
            --entrypoint /bin/bash \
            --net="host" \
            --name=$CONTAINER_NAME \
            -v $PWD:/home/sophia/project \
            -v $PWD/../logs:/home/sophia/logs \
            -v ~/.ssh/id_rsa:/root/.ssh/id_rsa \
            -e GEMFURY_TOKEN=$GEMFURY_TOKEN \
            $USER_NAME/$IMAGE_NAME:$VERSION
        ;;
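
Since the script opens the relative path data/cnn/ and the container's WORKDIR is /home/sophia/project, the directory should resolve inside the first bind mount (-v $PWD:/home/sophia/project), provided the script is started from that working directory. One way to check what the container actually sees is to look up the path in /proc/self/mountinfo. The sketch below is a hedged illustration: the helper name find_mount is hypothetical, and it ignores the octal escapes that mountinfo uses for spaces in paths.

```python
import os


def find_mount(mountinfo_text, path):
    """Return (mount_point, fs_type, source) for the mount containing path.

    mountinfo_text is the content of /proc/self/mountinfo. Each line has
    the mount point as its fifth field, then a ' - ' separator followed by
    the filesystem type and the mount source.
    """
    best = None
    for line in mountinfo_text.splitlines():
        left, sep, right = line.partition(' - ')
        if not sep:
            continue
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) < 5 or len(right_fields) < 2:
            continue
        mount_point = left_fields[4]
        prefix = mount_point.rstrip('/') + '/'
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on ties, the later line wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, right_fields[0], right_fields[1])
    return best


if __name__ == '__main__':
    target = os.path.abspath('data/cnn')
    try:
        with open('/proc/self/mountinfo') as fh:
            print(target, '->', find_mount(fh.read(), target))
    except FileNotFoundError:
        print('/proc/self/mountinfo is not available on this system')
```

Run inside the container, this shows whether data/cnn sits on the bind mount or on the container's own filesystem. On Docker Desktop for Mac, bind mounts pass through a file-sharing layer between macOS and the Linux VM, so the reported filesystem type may be a useful clue when searching for known issues.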
  • Can you include your Dockerfile and how you're running the container? Have you created the data/cnn directory in the container? Commented Jan 7, 2021 at 3:14
  • @tentative Yes, the directory is present inside the Docker container. I have also added the Dockerfile contents above. Any ideas? TY! Commented Jan 7, 2021 at 22:59
  • Are you running the container with stdout and tty flags? Like this: docker run -it CONTAINER_NAME /bin/bash. This question makes it seem like you'll have errors running Python scripts without stdout. Commented Jan 8, 2021 at 2:29
  • I found the problem but I don't know the solution. I can read my files outside the Docker container but not inside, so it seems the directory isn't mounting correctly. Inside the container, if I run 'ls', the directory appears empty. I've added the docker run script above! Commented Jan 11, 2021 at 20:14

1 Answer


os.scandir() is supposed to return an iterator of directory entries, but that does not seem to work when it runs inside Docker. As a workaround, I had to run the scan outside the container, save the results, and mount those results into the container.
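
One possible reading of this workaround, as a rough sketch: list the directory on the host, write the paths to a manifest file, and have the code in the container read that file instead of calling os.scandir(). The names write_manifest and read_manifest and the manifest file itself are hypothetical, and the sketch assumes no file name contains a newline.

```python
import os


def write_manifest(corpus_path, manifest_path):
    """Run on the host: write one path per regular, non-hidden file."""
    count = 0
    with os.scandir(corpus_path) as entries, open(manifest_path, 'w') as out:
        for entry in entries:
            if not entry.name.startswith('.') and entry.is_file():
                out.write(os.path.join(corpus_path, entry.name) + '\n')
                count += 1
    return count


def read_manifest(manifest_path, limit=None):
    """Run in the container: load the paths without scanning the directory."""
    with open(manifest_path) as fh:
        paths = [line.rstrip('\n') for line in fh if line.strip()]
    return paths if limit is None else paths[:limit]
```

Keeping the manifest outside the corpus directory avoids listing it as a corpus file. If corpus_path is given as the same relative path on both sides (for example data/cnn/), the recorded paths stay valid in the container as long as the working directory maps to the same project folder. This only replaces the directory listing; opening the individual files still goes through the same mount.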
