I'm trying to read a log file from a GitHub URL, add some geographic info using the IP as a lookup key, and then write some log info and the geographic info to a file. I've got the reading from the log and the writing to a file working, but I'm not sure what lib to use for looking up coordinates and such from an IP address, nor how to really go about this part. I found the regex module, and by the time I started to understand it, I found out it's deprecated. Here's what I've got; any help would be great.

import urllib2 
apacheLog = 'https://raw.githubusercontent.com/myAccessLog.log'

data = urllib2.urlopen(apacheLog)
for line in data:
    with open('C:\LogCopy.txt','a') as f:
        f.write(line)
3
  • So, you are now trying to parse 'C:\LogCopy.txt'? Show what you have tried. Commented Feb 7, 2015 at 19:27
  • I'm writing to C:\LogCopy.txt from the file on github. The manipulation will happen before I write to LogCopy. I don't know what to use to break the lines up, besides some messy slicing, maybe. It looks like the file is in Common Log Format, and I think I can use %x to pull pieces out, but I don't know if that is just for use with regex or what. I'm just not sure where to start. I'm not asking for the answer, just a push in the right direction. Commented Feb 7, 2015 at 19:39
  • Without knowing what output you expect, it is pretty hard to give any reasonable answer. There is a re module you can use. Commented Feb 7, 2015 at 19:47

2 Answers

1
  1. The re module isn't deprecated, and is part of the standard library. Edit: here's the link for the 2.7 module
  2. Your for loop is opening and closing the file at each iteration. Probably not a big deal but it might be faster for large files to open the file once and write what needs to be written. Just swap the locations of the for and with lines.

So

data = urllib2.urlopen(apacheLog)
for line in data:
    with open('C:\LogCopy.txt','a') as f: # probably need a double backslash
        f.write(line)

becomes

data = urllib2.urlopen(apacheLog)
with open('C:\\LogCopy.txt', 'a') as f:  # backslash escaped
    # urlopen() returns a file-like object, so read() it before splitting
    for line in data.read().splitlines():
        f.write(line + '\n')  # splitlines() drops the newline, so add it back
  3. Similar question regarding geolocation Python library

Best of luck!

Edit: added the data.splitlines() call after reading Piotr Kempa's answer


1

Well, the first part is simple. Just use for line in data.read().split('\n'), assuming the lines end with a normal newline (they should). The read() is needed because urlopen() returns a file-like object, not a string.

Then you use the re module (import re) - I hope it was still in use in python 2.7... You can extract the IP address with something like re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", line); look up the re.search() function for details on how to use it.
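
A minimal sketch of that idea, in case it helps. The sample line is made up for illustration (it is not from the asker's log), and extract_ip is just a name I picked:

```python
import re

# Loose IPv4 pattern from the answer; it does not check that each octet
# is <= 255, which is usually acceptable for well-formed access logs.
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def extract_ip(line):
    """Return the first IPv4-looking substring in line, or None."""
    match = IP_PATTERN.search(line)
    return match.group(0) if match else None

# Hypothetical Common Log Format line, for illustration only.
sample = '203.0.113.7 - - [07/Feb/2015:19:27:00 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(extract_ip(sample))
```

Note that search() returns None when nothing matches, so check the result before calling group() on it.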

As for locating the IP geographically, it was already asked I think, try this question: What python libraries can tell me approximate location and time zone given an IP address?
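
Whichever library you end up with, the overall shape could look roughly like the sketch below. Everything named here is an assumption for illustration: lookup_location and FAKE_LOCATIONS stand in for a real geolocation library call, and the log lines and the tab-separated output layout are made up.

```python
import re

IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# Stand-in data: a real script would query a geolocation library here.
FAKE_LOCATIONS = {"203.0.113.7": ("Exampleville", 12.5, -45.25)}

def lookup_location(ip):
    """Hypothetical lookup; returns (city, latitude, longitude) or None."""
    return FAKE_LOCATIONS.get(ip)

def annotate(lines):
    """Yield each log line with tab-separated location fields appended."""
    cache = {}  # the same IP usually appears many times in an access log
    for line in lines:
        line = line.rstrip("\n")
        match = IP_PATTERN.search(line)
        ip = match.group(0) if match else None
        if ip not in cache:
            cache[ip] = lookup_location(ip) if ip else None
        city, lat, lon = cache[ip] or ("unknown", "", "")
        yield "%s\t%s\t%s\t%s" % (line, city, lat, lon)

# Made-up log lines, for illustration only.
log_lines = [
    '203.0.113.7 - - [07/Feb/2015:19:27:00 +0000] "GET / HTTP/1.1" 200 512\n',
    '198.51.100.9 - - [07/Feb/2015:19:28:00 +0000] "GET /a HTTP/1.1" 404 0\n',
]
for row in annotate(log_lines):
    print(row)
```

In the real script you would pass the downloaded lines to annotate() and write each row (plus a newline) to the output file inside a single with open(...) block. The cache is there because lookups can be slow and access logs tend to repeat the same addresses.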

1 Comment

Oops, we posted two similar answers :) The part about moving the open() outside of the loop is a great suggestion in the other answer, you should follow it too!
