
I'm using SQLite with Python, and I'm implementing the POP3 protocol. I have a table:

msg_id text
date text
from_sender text
subject text
body text
hashkey text

Now I need to check for duplicate messages by comparing the message id of each retrieved message against the existing msg_id values in the table. I encrypted the msg_id using md5 and stored it in the hashkey column. Whenever I retrieve mail, I hash the message id and check it against the table values. Here's what I do:

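import hashlib
import sqlite3 as sql

def check_duplicate(new):
    conn = sql.connect("mail")
    c = conn.cursor()
    # hash the message id of the newly retrieved message
    m = hashlib.md5()
    m.update(new)
    # fetch every stored hashkey and compare each one against the new hash
    c.execute("select hashkey from mail")
    for row in c:
        if m.hexdigest() == row:
            return 0
        else:
            continue

    return 1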

It just doesn't work correctly. When I print the row value, it shows up as unicode; I think that's where the problem lies, since it can't be compared properly.

Is there a better way to do this, or to improve my method?

  • Just curious - why are you hashing the msg_id field before doing the comparison? Is there some reason you can't compare the msg_id's? Commented Nov 17, 2010 at 19:25
  • @Bob: O(1) for each comparison against existing strings in the table. (Instead of O(n).) This is known as interning strings, see: en.wikipedia.org/wiki/String_interning . Commented Nov 17, 2010 at 19:29
  • Also: MD5 is a hash algorithm, not "encryption". You're hashing the msg_id, not encrypting it. Commented Nov 17, 2010 at 19:30
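To make the hashing-versus-encryption point concrete, here is a minimal sketch, assuming Python 3 (where `hashlib` works on bytes, so a `str` message id must be encoded first). The helper `hash_msg_id` and the sample message id are hypothetical. The digest is deterministic, so the same id always produces the same 32-character hex string, but the id cannot be recovered from it.

```python
import hashlib

def hash_msg_id(msg_id):
    """Return the MD5 hex digest of a message id (a one-way hash, not encryption)."""
    # hashlib operates on bytes in Python 3, so encode the text first.
    return hashlib.md5(msg_id.encode("utf-8")).hexdigest()

# Hypothetical message id, for illustration only.
digest = hash_msg_id("<20101117192500.1234@example.com>")
print(digest)       # a 32-character lowercase hex string
print(len(digest))  # 32
```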


Well, if your only problem is with the comparison, then you could try:

if m.hexdigest() == row[0]:

since row is a tuple, not a string. But your basic strategy seems wrong to me. You're retrieving the hashkey of every row from the database and then doing your own search for the right one. It's much better to make the database do the search for you. The database is likely to be better at searching (since it probably has an index on the hashkey field; you did create an index for this field, didn't you?), and it only has to send one result back to you, which saves time. So you could issue a query like this to determine whether the message exists:

c.execute('select exists(select * from mail where hashkey=?)', (m.hexdigest(),))
exists = c.fetchone()[0]
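Putting the pieces together, here is a minimal sketch of the whole check, assuming Python 3 and the question's `mail` table. Two details matter: the query runs on a cursor (not on the md5 object), and the parameters must be passed as a sequence, so a single value needs a one-element tuple. Unlike the question's version, this `check_duplicate` takes an open connection rather than reconnecting on every call, and it returns `True` when the message is a duplicate (the question's code returned 0 in that case). The index name `idx_mail_hashkey` and the sample message ids are hypothetical.

```python
import hashlib
import sqlite3

def check_duplicate(conn, msg_id):
    """Return True if a message with this id is already stored in the mail table."""
    hashkey = hashlib.md5(msg_id.encode("utf-8")).hexdigest()
    cur = conn.cursor()
    # Parameters must be a sequence; note the trailing comma in the one-element tuple.
    cur.execute("SELECT EXISTS(SELECT 1 FROM mail WHERE hashkey = ?)", (hashkey,))
    # fetchone() returns a tuple such as (1,) or (0,), so take element 0.
    return bool(cur.fetchone()[0])

# Usage against an in-memory database with the question's columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (msg_id TEXT, date TEXT, from_sender TEXT, "
             "subject TEXT, body TEXT, hashkey TEXT)")
conn.execute("CREATE INDEX idx_mail_hashkey ON mail (hashkey)")
conn.execute("INSERT INTO mail (msg_id, hashkey) VALUES (?, ?)",
             ("<a@example.com>", hashlib.md5(b"<a@example.com>").hexdigest()))
print(check_duplicate(conn, "<a@example.com>"))  # True
print(check_duplicate(conn, "<b@example.com>"))  # False
```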

A final point of style: Python has True and False, so there's no need to use 1 and 0 for Booleans.


Though, curiously enough, it hasn't always had True and False. So you can do fun things like (False + 1) == 1, which is True. =)

It might be a good idea to ask SQLite to search for the hash key:

select count(*) from mail where hashkey = 'TheHashKey'
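From Python, that query is best run with a bound parameter rather than by pasting the hash into the SQL string. A minimal sketch, assuming Python 3's `sqlite3` module; `count_matches` and the sample rows are hypothetical. A message is a duplicate when the count is greater than zero; an `EXISTS` query can stop at the first match, while `COUNT(*)` counts every matching row.

```python
import sqlite3

def count_matches(conn, hashkey):
    """Return how many rows in mail have the given hashkey."""
    # Bind the value with ? instead of building the SQL string by hand.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM mail WHERE hashkey = ?", (hashkey,)
    ).fetchone()
    return count

# Usage with made-up hash values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (hashkey TEXT)")
conn.executemany("INSERT INTO mail VALUES (?)", [("abc",), ("abc",), ("def",)])
print(count_matches(conn, "abc"))  # 2
print(count_matches(conn, "zzz"))  # 0
```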


The main issue is that you're trying to compare a Python string (m.hexdigest()) with a tuple.
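This is easy to confirm with a minimal sketch using an in-memory database (the stored value is made up): with the default row factory, `sqlite3` returns each row as a tuple, even when only one column is selected. In Python 2, printing such a row shows something like `(u'd41d...',)`, which is likely the "unicode" the question mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (hashkey TEXT)")
conn.execute("INSERT INTO mail VALUES (?)", ("d41d8cd98f00b204e9800998ecf8427e",))

row = conn.execute("SELECT hashkey FROM mail").fetchone()
digest = "d41d8cd98f00b204e9800998ecf8427e"

print(type(row))         # <class 'tuple'>
print(row == digest)     # False: a 1-tuple never equals a str
print(row[0] == digest)  # True: compare the first (only) column
```

When looping over a cursor, unpacking with `for (hashkey,) in cursor:` achieves the same thing as indexing with `row[0]`.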

Additionally, another poster's suggestion to use SQL for the comparison is probably good advice. Another SQL suggestion would be to fix your column types: TEXT for everything probably isn't what you want. An index on your hashkey column is also very likely a good thing.
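As one illustration of the index suggestion, here is a minimal sketch, assuming Python 3 and keeping the question's column names. It adds `NOT NULL` on hashkey and a *unique* index, and then uses `INSERT OR IGNORE`, which is a different approach from the select-then-check method in the other answers: SQLite itself skips the insert when the hashkey already exists. The names `idx_mail_hashkey` and `store_message` are hypothetical, and this sketch does not address which types the other columns should use.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mail (
        msg_id      TEXT,
        date        TEXT,
        from_sender TEXT,
        subject     TEXT,
        body        TEXT,
        hashkey     TEXT NOT NULL
    )
""")
# A unique index speeds up hashkey lookups and lets SQLite reject duplicate hashkeys.
conn.execute("CREATE UNIQUE INDEX idx_mail_hashkey ON mail (hashkey)")

def store_message(conn, msg_id, body):
    """Insert a message; return True if stored, False if it was a duplicate."""
    hashkey = hashlib.md5(msg_id.encode("utf-8")).hexdigest()
    cur = conn.execute(
        "INSERT OR IGNORE INTO mail (msg_id, body, hashkey) VALUES (?, ?, ?)",
        (msg_id, body, hashkey),
    )
    # rowcount is 1 when a row was inserted and 0 when the insert was ignored.
    return cur.rowcount == 1

print(store_message(conn, "<a@example.com>", "hello"))  # True
print(store_message(conn, "<a@example.com>", "hello"))  # False
```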
