
I have been trying to set up logging with the logging module in a Python script, and I have got it working properly: it now logs to both the console and a log file. But it fails when I set up a Dask cluster. The first log message in the log file is replaced with null bytes, while the console output is fine.

A minimal example that reproduces the issue is given below:

import logging

from dask.distributed import LocalCluster as Cluster

logfile_name = 'test.log'

# Logger with a console handler and a file handler, both at DEBUG level.
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
log_format = logging.Formatter(
        '%(asctime)s %(levelname)s: %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S',
        )
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.DEBUG)
console_handler.setFormatter(log_format)
file_handler = logging.FileHandler(logfile_name, mode='w')
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(log_format)
logger.addHandler(file_handler)
logger.addHandler(console_handler)

if __name__ == '__main__':

    logger.info('First log message')

    cluster = Cluster(
            name='test', n_workers=1,
            threads_per_worker=1, memory_limit='1GiB',
            dashboard_address=':8000',
            )
    logger.info('Second log message')
    logger.info('Third log message')
    logger.info('Fourth log message')
    logger.info('Fifth log message')

    cluster.close()
    logging.shutdown()

The console log:

2024-09-22 15:28:30 INFO: First log message
2024-09-22 15:28:31 INFO: Second log message
2024-09-22 15:28:31 INFO: Third log message
2024-09-22 15:28:31 INFO: Fourth log message
2024-09-22 15:28:31 INFO: Fifth log message

And the log file contents (copy-pasted from Gedit):

\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\002024-09-22 15:41:46 INFO: Second log message
2024-09-22 15:41:46 INFO: Third log message
2024-09-22 15:41:46 INFO: Fourth log message
2024-09-22 15:41:46 INFO: Fifth log message

Note the null bytes at the start of the first line, and that the first log message is missing.

After some digging around, I found that if I remove the Dask cluster code, everything is fine. What is strange is that if I change the FileHandler mode to append ('a'), the issue also goes away, even with the Dask cluster initialisation still in place.
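I can reproduce what looks like the same effect without Dask at all, assuming the cause is a second handle re-opening the log file with mode='w' (as a spawned worker process re-running the module-level logging setup would) while the first handle keeps its old file offset. This sketch uses a throwaway file name, repro.log:

import os

# First handle: write a line and flush, leaving the offset at end-of-file.
f1 = open('repro.log', 'w')
f1.write('First log message\n')
f1.flush()

# Second handle: re-open the same file with mode='w', which truncates it
# to zero length. This simulates a worker process re-running the
# module-level FileHandler(..., mode='w') setup.
f2 = open('repro.log', 'w')
f2.close()

# The next write through the first handle lands at its old offset (18),
# so the file now has a gap before it, which reads back as null bytes.
f1.write('Second log message\n')
f1.close()

with open('repro.log', 'rb') as f:
    print(f.read())

os.remove('repro.log')

On my machine this prints a run of null bytes followed by only the second message, matching what I see in the Dask log file. It would also explain why mode='a' avoids the problem: an append-mode handle always writes at the current end of file rather than at a stale offset.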

How do I fix this issue without removing the Dask code or changing the file open mode?

  • This will happen if some other process truncates the file while it's still open by the Python process. Commented Sep 22, 2024 at 14:33
  • @Barmar I don't have any other programs using the file, except a text editor to view the file. The same issue happens even if the file is not open in the text editor. Commented Sep 23, 2024 at 5:46
  • The log file (test.log) looks good until cluster.close(). Also, null values appear instead of the text logged before cluster=Cluster( … ). Move that cluster=Cluster( … ) before the 1st logger.info( … ). Commented Sep 23, 2024 at 14:11
  • @JosefZ Huh, that's weird. Moving the cluster initialisation before any log messages fixed the issue. Any number of log messages I put before the cluster initialisation show the same issue. But if I put one before the if __name__=='__main__' block, it is correctly logged even though it runs before the cluster initialisation. Do you know why this happens? Commented Sep 23, 2024 at 17:15
