I have a Python script that scans large datasets for email addresses and extracts them. On my Mac it just prints every address to the terminal. The files are sometimes 1-2 GB, so it can take a while and the output is enormous. How easy is it in Python to save the results to a file instead of printing them all to the terminal?

I don't even need to see it all being dumped into the terminal.

Here is the script I am working with:

#!/usr/bin/env python
#
# Extracts email addresses from one or more plain text files.
#
# Notes:
# - Does not save to file (pipe the output to a file if you want it saved).
# - Does not check for duplicates (which can easily be done in the terminal).
#


from optparse import OptionParser
import os.path
import re

regex = re.compile(("([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    "{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    "\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))

def file_to_str(filename):
    """Returns the contents of filename as a string."""
    with open(filename) as f:
        return f.read().lower() # Case is lowered to prevent regex mismatches.

def get_emails(s):
    """Returns an iterator of matched emails found in string s."""
    # Removing lines that start with '//' because the regular expression
    # mistakenly matches patterns like 'http://[email protected]' as '//[email protected]'.
    return (email[0] for email in re.findall(regex, s) if not email[0].startswith('//'))

if __name__ == '__main__':
    parser = OptionParser(usage="Usage: python %prog [FILE]...")
    # No options added yet. Add them here if you ever need them.
    options, args = parser.parse_args()

    if not args:
        parser.print_usage()
        exit(1)

    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                print email
        else:
            print '"{}" is not a file.'.format(arg)
            parser.print_usage()

  • myProgram.py > outputFile.txt? Don't need to touch the program itself at all. Commented Aug 22, 2018 at 17:54
  • Thank you so much. Commented Aug 22, 2018 at 17:57

3 Answers

2

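Instead of printing, write each address to a file. Open the file once, outside the loop over emails: reopening it in 'w' mode for every email would truncate it each time and leave only the last address.

    with open('filename.txt', 'w') as f:
        for email in get_emails(file_to_str(arg)):
            f.write('{}\n'.format(email))

If you pass several input files, put the with statement around the for arg in args loop as well, so that a later input does not overwrite the results of an earlier one.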
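
To make the placement of the open call concrete, here is a minimal, self-contained sketch of writing to a file instead of printing. The names EMAIL_RE, extract_emails, and save_emails are hypothetical, and the short regex is only a simplified stand-in for the question's pattern, not a replacement for it. The output file is opened once, so 'w' mode truncates it only at the start of the run.

```python
import os
import re

# Simplified stand-in for the question's regex (an assumption, for brevity).
EMAIL_RE = re.compile(r"[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+")


def extract_emails(text):
    """Return all email-like matches in text, lowercasing it first."""
    return EMAIL_RE.findall(text.lower())


def save_emails(input_paths, output_path):
    """Write every address found in input_paths to output_path, one per line.

    Returns the number of addresses written.
    """
    count = 0
    # Open the output once, outside both loops, so 'w' truncates only here.
    with open(output_path, "w") as out:
        for path in input_paths:
            if not os.path.isfile(path):
                continue  # skip anything that is not a regular file
            with open(path) as src:
                for email in extract_emails(src.read()):
                    out.write(email + "\n")
                    count += 1
    return count
```

In the question's script, a call such as save_emails(args, 'emails.txt') could stand in for the printing loop; the output filename here is just an example.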


0

First, open the output file once, before the loop: out = open('output', 'w')

Then, instead of printing each email, write it to the file: out.write(email + '\n')

When the loop finishes, close the file with out.close() so that everything is flushed to disk.

You can also just redirect the program's output to a file at execution time, as jasonharper said.
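
If you would rather keep a print call than switch to write, print can also target an open file. The following is a minimal sketch, assuming Python 3 and a hypothetical helper named print_emails_to. The question's script is Python 2, where adding from __future__ import print_function at the top of the script enables the same syntax.

```python
def print_emails_to(out, emails):
    """Print each email to the file-like object out, one per line.

    out can be a file from open(path, 'w') or sys.stdout.
    """
    for email in emails:
        # file=out sends the text to out instead of the terminal.
        print(email, file=out)
```

Passing sys.stdout as out keeps the old terminal behaviour, so shell redirection continues to work unchanged.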

0

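Replace the print statement with a write call. In this loop:

    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                print email

change the print line so that it appends to a file:

    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                with open(tempfile, 'a') as writefile:
                    writefile.write(email + '\n')

Here tempfile is the path of your output file. Append mode is needed because the file is reopened for every email; it also means that results from earlier runs stay in the file unless you delete it first.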
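
The choice of append mode in the loop above matters. As a small illustration (the helpers write_each and read_back and the sample addresses are made up for this sketch), reopening a file for every item in 'w' mode keeps only the last item, while 'a' mode keeps them all. Note that tempfile below is Python's standard library module, not the answer's variable of the same name.

```python
import os
import tempfile  # standard library module, used here for a scratch directory


def write_each(path, mode, items):
    """Reopen path in the given mode for every item and write it on its own line."""
    for item in items:
        with open(path, mode) as f:
            f.write(item + "\n")


def read_back(path):
    """Return the full contents of path as a string."""
    with open(path) as f:
        return f.read()


workdir = tempfile.mkdtemp()
emails = ["alice@example.com", "bob@test.org"]

w_path = os.path.join(workdir, "w_mode.txt")
write_each(w_path, "w", emails)  # every open truncates the file first

a_path = os.path.join(workdir, "a_mode.txt")
write_each(a_path, "a", emails)  # every open adds to the end

w_result = read_back(w_path)  # only the last address survives
a_result = read_back(a_path)  # both addresses are present
```

Opening the file once outside the loop avoids both the truncation problem and the cost of a repeated open and close per address, which is likely to add up on inputs of a gigabyte or more.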
