Please help me with an issue I've run into while using the set() function. When I run the code below, I expect the file "iplist.txt" to contain:

192.168.248.2
192.168.248.20

but instead it contains:

1
.
4
2
0
9
6
8

And the output of print (a) is:

192.168.248.2
192.168.248.2
192.168.248.20
192.168.248.20

Here is the code:

for key, group in groupby(logfile, key=lambda e: e.split('.',1)[0]):
    for entry in group:
        c.update(re.findall(r'[0-9]+(?:\.[0-9]+){3}', entry))
    for ip, cnt in c.items():
       if cnt >= 5 and cnt <=10:
          newip.append(ip)
       elif cnt > 10:
          match = re.search(r'->\s*([0-9]+(?:\.[0-9]+){3})', entry)
          if match:
              a = match.group(1)
              print (a)

          with open("C:\\Users\Raz\\Desktop\\pythondemo\\iplist.txt", 'w+') as f:
              f.write('\n' .join(set(a))+'\n\n')
              f.close()
       else:
           print ("There are no malicious packets yet")

Here is the log.txt file containing IPs:

12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41673 -> 192.168.248.2:21
12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41676 -> 192.168.248.2:21
12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41673 -> 192.168.248.2:21

12/30-04:09:40.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.21:41676 -> 192.168.248.20:21
12/30-04:09:40.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.21:41673 -> 192.168.248.20:21

Now my questions are:

  1. Why does print (a) show duplicated IPs (no more and no fewer)?
  2. Why does set(a) extract unique characters when I want unique IPs?
  • Your code is wrongly indented and your output does not match the code. Please show the code that produces your output. Commented Jan 4, 2017 at 11:09
  • @Daniel it looks that way here because of spacing, but it is correct when I run it. Commented Jan 4, 2017 at 11:18
  • Please show the correct code. Indentation is important to understand the problem. Commented Jan 4, 2017 at 11:20
  • print(a) is executed many times, and each time it prints only one IP; it has no other IPs to compare against. set(a) evaluates set("192.168.248.2") because a is not a list of all IPs but a string holding a single IP. You have to collect every a in a list (e.g. all_IP) and, after the for loop finishes, call set(all_IP). Commented Jan 4, 2017 at 11:29
  • @Daniel I have edited with correct indentation Commented Jan 4, 2017 at 11:34

2 Answers

If the format of your log file stays exactly the same, you can also implement this with pandas, like this:

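import pandas as pd

# Split each log line on runs of whitespace; the file has no header row.
df = pd.read_csv('log.txt', sep=r'\s+', header=None)

# Column 16 holds "destination_ip:port"; keep only the IP part.
df[16] = df[16].apply(lambda x: x.split(':')[0])
print(df[16].unique().tolist())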

Output:

['192.168.248.2', '192.168.248.20']

If you don't want to use pandas then wait for other incoming answers.

Your first problem is that a is a string:

>>> set('192.168.248.20')
set(['.', '1', '0', '2', '4', '6', '9', '8'])

Your second problem is that you overwrite your file each time a new entry is found (mode 'w+' instead of 'a').

The third problem is that you never collect all the IPs to build a set.
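
A minimal sketch of how those three points could be addressed together, assuming the log lines look like the sample in the question. The names unique_destination_ips and write_ip_list are hypothetical helpers introduced here for illustration, and the per-IP count thresholds from the original code are left out to keep the example short:

```python
import re

# Same destination-IP pattern as in the question: the address after "->".
DEST_IP = re.compile(r'->\s*([0-9]+(?:\.[0-9]+){3})')


def unique_destination_ips(lines):
    """Collect each destination IP once, in order of first appearance."""
    seen = set()      # set of whole IP strings, not of characters
    ordered = []
    for line in lines:
        match = DEST_IP.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ordered.append(match.group(1))
    return ordered


def write_ip_list(path, ips):
    """Write all IPs in one go, after the loop, so nothing is overwritten."""
    with open(path, 'w') as f:   # the with block closes the file itself
        f.write('\n'.join(ips) + '\n')


# Shortened stand-ins for the log lines shown in the question.
sample = [
    "{ICMP} 192.168.232.2:41673 -> 192.168.248.2:21",
    "{ICMP} 192.168.232.2:41676 -> 192.168.248.2:21",
    "{ICMP} 192.168.232.21:41676 -> 192.168.248.20:21",
    "{ICMP} 192.168.232.21:41673 -> 192.168.248.20:21",
]

ips = unique_destination_ips(sample)
print(ips)
```

To produce the file, call write_ip_list with the path of your choice once the loop over the log has finished, for example write_ip_list("iplist.txt", ips). A plain set(all_ips) would also remove duplicates, but it does not preserve the order in which the IPs appeared.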
