How to use set() in Python

Question

Please help me with an issue I ran into while using the set() function. When I run the code below, I expect the file "iplist.txt" to contain:

192.168.248.2
192.168.248.20

but instead it is as below:

And the output of print (a) is as below:

192.168.248.2
192.168.248.2
192.168.248.20
192.168.248.20

Here is the code:

for key, group in groupby(logfile, key=lambda e: e.split('.',1)[0]):
    for entry in group:
        c.update(re.findall(r'[0-9]+(?:\.[0-9]+){3}', entry))
    for ip, cnt in c.items():
       if cnt >= 5 and cnt <=10:
          newip.append(ip)
       elif cnt > 10:
          match = re.search(r'->\s*([0-9]+(?:\.[0-9]+){3})', entry)
          if match:
              a = match.group(1)
              print (a)

          with open("C:\\Users\Raz\\Desktop\\pythondemo\\iplist.txt", 'w+') as f:
              f.write('\n' .join(set(a))+'\n\n')
              f.close()
       else:
           print ("There are no malicious packets yet")

Here is the log.txt file containing IPs:

12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41673 -> 192.168.248.2:21
12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41676 -> 192.168.248.2:21
12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41673 -> 192.168.248.2:21

12/30-04:09:40.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.21:41676 -> 192.168.248.20:21
12/30-04:09:40.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.21:41673 -> 192.168.248.20:21

Now my questions are:

Why does print (a) show each IP exactly twice (no more and no less)?
Why does set(a) extract unique characters when I want unique IPs?

Your code is wrongly indented and your output does not match the code. Please show the code that produces your output.
– Daniel, Commented Jan 4, 2017 at 11:09
@Daniel It appears that way here because of spacing, but it is correct when I run it.
– Raz Hamraz, Commented Jan 4, 2017 at 11:18
Please show the correct code. Indentation is important for understanding the problem.
– Daniel, Commented Jan 4, 2017 at 11:20
print(a) is executed many times, and each time it prints only one IP; it has no other IPs to compare against. set(a) does set("192.168.248.2") because a is not a list of all IPs but a string holding a single IP. You have to keep every a in a list (e.g. all_IP) and, after you leave the for loop, call set(all_IP).
– furas, Commented Jan 4, 2017 at 11:29

Mohammad Yusuf · Accepted Answer · 2017-01-04 11:10:51Z

1

If the format of your log file stays exactly the same, you can also implement it with pandas, like this:

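import pandas as pd

# Split each log line on runs of whitespace; the file has no header row.
df = pd.read_csv('log.txt', sep=r'\s+', header=None)

# Column 16 holds "destination_ip:port"; keep only the IP part.
df[16] = df[16].apply(lambda x: x.split(':')[0])
print(df[16].unique().tolist())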

Output:

['192.168.248.2', '192.168.248.20']

If you don't want to use pandas then wait for other incoming answers.
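
For readers who would rather stay with the standard library, here is a minimal sketch of the same idea. It reuses the "->" regular expression from the question to pick out the destination IP; the function name unique_dest_ips and the shortened sample lines are made up for illustration and are not part of the original answer.

```python
import re

# Matches the address that follows "->", i.e. the destination IP without the port.
DEST_IP = re.compile(r'->\s*([0-9]+(?:\.[0-9]+){3})')

def unique_dest_ips(lines):
    """Return destination IPs in first-seen order, without duplicates."""
    seen = set()      # fast membership test
    ordered = []      # keeps the order in which IPs first appear
    for line in lines:
        match = DEST_IP.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ordered.append(match.group(1))
    return ordered

# Shortened stand-ins for the log lines in the question (hypothetical sample).
sample = [
    "{ICMP} 192.168.232.2:41673 -> 192.168.248.2:21",
    "{ICMP} 192.168.232.2:41676 -> 192.168.248.2:21",
    "",
    "{ICMP} 192.168.232.21:41676 -> 192.168.248.20:21",
]
print(unique_dest_ips(sample))  # ['192.168.248.2', '192.168.248.20']
```

To run it on the real file, pass the open file object instead of the sample list, for example unique_dest_ips(open('log.txt')).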


Daniel · Accepted Answer · 2017-01-04 11:13:31Z

0

Your first problem is that a is a string, so set() splits it into its characters:

>>> set('192.168.248.20')
set(['.', '1', '0', '2', '4', '6', '9', '8'])

Your second problem is that you overwrite your file each time a new entry is found (mode 'w+' instead of 'a').

The third problem is that you never collect all the IPs to build a set from.
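
A rough sketch of how the three points could be addressed together, assuming the goal is to list destination IPs that appear more than some number of times. The names flagged_ips and write_ips, the shortened sample lines, and the threshold of 1 are all assumptions chosen so that the small sample produces output; the question itself uses a threshold of 10 and a different file path.

```python
import os
import re
import tempfile
from collections import Counter

DEST_IP = re.compile(r'->\s*([0-9]+(?:\.[0-9]+){3})')

def flagged_ips(lines, threshold):
    """Count destination IPs and return the set of those seen more than `threshold` times."""
    counts = Counter()
    for line in lines:
        match = DEST_IP.search(line)
        if match:
            counts[match.group(1)] += 1   # collect every IP, not just the last one
    return {ip for ip, cnt in counts.items() if cnt > threshold}

def write_ips(path, ips):
    """Write all IPs in one go, after the loop, so nothing gets overwritten."""
    with open(path, 'w') as f:            # the with block closes the file itself
        f.write('\n'.join(sorted(ips)) + '\n')

# Problem 1: a set built from a string holds characters, not the string.
print(len(set('192.168.248.20')))   # 8 distinct characters
print(len({'192.168.248.20'}))      # 1 element: the whole IP

# Shortened stand-ins for the log lines in the question (hypothetical sample).
sample = [
    "{ICMP} 192.168.232.2:41673 -> 192.168.248.2:21",
    "{ICMP} 192.168.232.2:41676 -> 192.168.248.2:21",
    "{ICMP} 192.168.232.2:41673 -> 192.168.248.2:21",
    "{ICMP} 192.168.232.21:41676 -> 192.168.248.20:21",
    "{ICMP} 192.168.232.21:41673 -> 192.168.248.20:21",
]
ips = flagged_ips(sample, threshold=1)

# Write to a temporary directory here; substitute the real iplist.txt path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "iplist.txt")
    write_ips(path, ips)
    with open(path) as f:
        written = f.read()
print(written)
```

Opening the file once with 'w' after the loop avoids both the overwrite problem and the need for append mode, since the set already holds every IP by then.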
