
I am setting up a hash function that takes the MD5 of an object and tacks on the first four bytes of the object to prevent collisions. These objects can be quite large, so I'd prefer to avoid serializing the entire object. What is the most space/time efficient way I can do this?

I've been looking at ObjectOutputStream and while it appears that there is a partial write function, it seems to require that I've already converted the object into a byte array.

  • What is the purpose of the serialization? Normally you'd like to be able to deserialize the byte stream to get the original object. Did you consider using transient fields? You can mark any field in your class as transient if you don't want to persist it. Commented Jul 11, 2014 at 21:36
  • What do you mean by "MD5 of an object"? What do you mean by "prevent collisions"? I don't think you're expressing your question very well. I also think ObjectOutputStream is already doing all of that, but you could be asking for something else; I'm not really sure. Commented Jul 11, 2014 at 21:38
  • A custom private void writeObject(ObjectOutputStream out) method in your class is what you are looking for; in it you can write logic to serialize the object partially. Commented Jul 11, 2014 at 21:40
  • But tacking on the 'first four bytes of the object' won't prevent collisions. They aren't unique. MD5 is already almost certainly strong enough. You won't get the first four bytes of the object via serialization easily, as there is a stream header, serialization protocol tags, etc. to be navigated first. 'Prevent collisions' and 'space efficient' are contrary goals. You don't need this. Commented Jul 11, 2014 at 21:57
  • I've answered that: 'MD5 is almost certainly strong enough'. Commented Jul 11, 2014 at 22:04
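The two mechanisms mentioned in the comments, transient fields and a class's private writeObject hook, can be sketched as below. The names PartialSerializationDemo and LargeRecord are hypothetical, chosen only for illustration. Note that this controls which fields go into the stream; it does not give you "the first four bytes of the object".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class PartialSerializationDemo {

    // Hypothetical class used only to illustrate the technique.
    static class LargeRecord implements Serializable {
        private static final long serialVersionUID = 1L;

        String id;               // written by default serialization
        transient byte[] cache;  // transient: skipped by default serialization

        LargeRecord(String id, byte[] cache) {
            this.id = id;
            this.cache = cache;
        }

        // ObjectOutputStream calls this private hook when serializing a LargeRecord.
        // Any custom "partial" logic would go here.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes only the non-transient fields (id)
        }
    }

    // Serializes an object into an in-memory byte array.
    static byte[] serialize(Serializable o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] small = serialize(new LargeRecord("a", new byte[10]));
        byte[] big = serialize(new LargeRecord("a", new byte[1_000_000]));
        // The transient cache never reaches the stream, so both outputs match.
        System.out.println(Arrays.equals(small, big)); // true
    }
}
```

Because cache is transient, both streams are byte-for-byte identical regardless of how large the cache is, and after deserialization that field would simply be null.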

1 Answer


I am setting up a hash function that takes the MD5 of an object and tacks on the first four bytes of the object to prevent collisions.

But tacking on the 'first four bytes of the object' won't prevent collisions. Those bytes aren't unique: different objects can easily share them. MD5 alone is already almost certainly strong enough.
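To put "almost certainly strong enough" in numbers: assuming MD5 behaves like a uniformly random 128-bit function on your inputs, the usual birthday approximation bounds the chance of any accidental collision among n objects. This assumption only covers accidental collisions; MD5 is not collision-resistant against inputs someone crafts deliberately.

```latex
P_{\text{collision}} \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{128}}\right) \approx \frac{n^2}{2^{129}}
\qquad
n = 2^{32} \;\Rightarrow\; P_{\text{collision}} \approx \frac{2^{64}}{2^{129}} = 2^{-65} \approx 2.7 \times 10^{-20}
```

Even if the four appended bytes were independent and uniformly random, they would only change the denominator from 2^129 to 2^161. Bytes taken from the start of a serialized stream are not random at all, so in practice they add nothing.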

These objects can be quite large, so I'd prefer to avoid serializing the entire object. What is the most space/time efficient way I can do this?

I've been looking at ObjectOutputStream and while it appears that there is a partial write function, it seems to require that I've already converted the object into a byte array.

You won't easily get the first four bytes of the object via serialization, because a stream header, serialization protocol tags, etc. come first and would have to be navigated. Not that there is any point.
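A quick way to see the point about the stream header: every stream written by ObjectOutputStream begins with STREAM_MAGIC (0xACED) followed by STREAM_VERSION (5), so the first four serialized bytes are identical for any object. The class and method names below are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class StreamHeaderDemo {

    // Serializes obj and returns only the first four bytes of the stream.
    static byte[] firstFourSerializedBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return Arrays.copyOf(bos.toByteArray(), 4);
    }

    public static void main(String[] args) throws IOException {
        // Three unrelated objects: a String, a boxed Integer and an int array.
        Serializable[] samples = {"hello", 42, new int[] {1, 2, 3}};
        for (Serializable o : samples) {
            byte[] head = firstFourSerializedBytes(o);
            // Prints "AC ED 00 05" for every object: the header, not object data.
            System.out.printf("%02X %02X %02X %02X%n", head[0], head[1], head[2], head[3]);
        }
    }
}
```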

Regarding your comments, 'prevent collisions' and 'space efficient' are contrary goals: making collisions rarer requires a longer digest, which takes more space.

You don't need this.
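If you do still want an MD5 of an object's serialized form without first building the whole byte array, one option (an alternative to the partial-write idea in the question, not something this answer proposes) is to let ObjectOutputStream write straight into a java.security.DigestOutputStream. This is a minimal sketch; StreamingMd5 and NullOutputStream are hypothetical names. It saves memory, not time, since the whole object graph is still serialized, and the digest covers class metadata as well as field values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamingMd5 {

    // Hypothetical helper that discards all bytes; only the digest side effect matters.
    // On Java 11+ OutputStream.nullOutputStream() could be used instead.
    private static final class NullOutputStream extends OutputStream {
        @Override public void write(int b) { }
        @Override public void write(byte[] b, int off, int len) { }
    }

    // Feeds the serialized bytes into MD5 as they are produced; no full byte array is kept.
    static byte[] md5OfSerializedForm(Serializable obj) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new DigestOutputStream(new NullOutputStream(), md5))) {
            oos.writeObject(obj);
        } // close() flushes ObjectOutputStream's internal buffer into the digest
        return md5.digest();
    }

    public static void main(String[] args) throws Exception {
        String obj = "some large object";
        byte[] streamed = md5OfSerializedForm(obj);

        // Reference result: serialize to a byte array first, then hash it.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        byte[] buffered = MessageDigest.getInstance("MD5").digest(bos.toByteArray());

        System.out.println(MessageDigest.isEqual(streamed, buffered)); // true
        System.out.println(streamed.length);                           // 16
    }
}
```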
