Hex diff encoder

Question

Write a codec that

encodes list of bytes by calculating the difference between the values (0-255) of consecutive bytes, converting those differences to their hexadecimal representation, and outputting them as fixed 2-character hex values. If a byte has a smaller value than the previous one, the difference should be treated as though the values "wrap around" using an overflow mechanism, as if subtracting within a circular range from 0 to 255.
decodes an encoded hexadecimal string back to its original form by reversing the encoding process

In both encoding and decoding, the first value is directly converted, while the following values are calculated as currentValue - previousValue

Encoding Example

Input: Hi!

Output: 4821B8

Explanation:

H has an ASCII value of 72 → 0x48.
The difference between i (ASCII 105) and H (ASCII 72) is 105 - 72 = 33 → 0x21.
The difference between ! (ASCII 33) and i (ASCII 105) is 33 - 105 = -72 + 256 = 184 → 0xB8.

Decoding Example

Input: 4821B8

Output: Hi!

Explanation:

The first 2 bytes 0x48 convert to decimal 72 → H.
The next 2 bytes 0x21 are 33 in decimal. Adding this to 72 gives 105 → i.
The last 2 bytes 0xB8 are 184 in decimal. Adding this to the previous value 105, accounting for the overflow, gives 105 + 184 = 289 mod 256 = 33 → !.

More examples

Hi! <-> 4821B8
Hello World! <-> 481D070003B1371803FAF8BD
aaaaaaaaaa <-> 61000000000000000000
9876543210 <-> 39FFFFFFFFFFFFFFFFFF
abcdefghijklmnopqrstuvwxyz <-> 6101010101010101010101010101010101010101010101010101
ZYXWVUTSRQPONMLKJIHGFEDCBA <-> 5AFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Rules

Hex values can be upper or lower case (i.e. 42 → 2A or 2a), as long as encoder and decoder are consistent with the format
The encoding input can be a string, a list/array of bytes, or any type in the language that can represent text. The output must be a string
The decoding input must be a string. The output can be a string, a list/array of bytes, or any type in the language that can represent text
All inputs can be considered valid, so no null-checking, empty strings or 0-sized arrays/lists
You can chose to implement the codec as two separate programs or as a single one, as long as both encoding and decoding are present
This is a code-golf challenge, shortest answer in bytes wins. If encoding and decoding are separated, the sum of the two will be added to calculate the score

No the decoding input must be the same as the encoding output, a string — mastaH
– mastaH, Commented Oct 22, 2024 at 11:30
Why would this even make sense? Arguably, the string form "4821B8" is more arbitrary than the byte form b"H!\xB8". — squareroot12621
– squareroot12621, Commented Oct 22, 2024 at 11:33
A byte sequence as input for the decoder renders the encoding into hex strings unnecessary, as the byte values can be directly interpreted in their decimal form. This eliminates the need to first convert hex pairs back into decimal, trivialising the decoding process by working with the raw byte data instead of an intermediary hex representation — mastaH
– mastaH, Commented Oct 22, 2024 at 11:53
The usual term for this is "delta encoding" (The wikipedia article conflates it with distributing files as a diff vs. the last version, but the code example is this byte-delta codec, without the hexdump part of course.) — Peter Cordes
– Peter Cordes, Commented Oct 23, 2024 at 22:08

Arnauld · Accepted Answer · 2024-10-22 16:44:55Z

4

JavaScript (Node.js), 85 bytes

Both functions expect and return a string.

Encoder (47 bytes)

s=>Buffer(s).map(c=>-p+(p=c),p=0).toString`hex`

Try it online!

Decoder (38 bytes)

s=>Buffer(s,"hex").map(c=>p+=c,p=0)+""

Try it online!

edited Oct 22, 2024 at 16:44

answered Oct 22, 2024 at 10:44

Arnauld

206k21 gold badges187 silver badges670 bronze badges

Add a comment |

Neil · Accepted Answer · 2024-10-22 15:25:24Z

Charcoal, 38 bytes

Encoder, 20 bytes

⭆θΦ⍘⁺⁵¹²⁻℅ι℅§⁺ψθκ¹⁶μ

Try it online! Link is to verbose version of code. Explanation:

 θ                      Input string
⭆                       Map over characters and join
          ι             Current character
         ℅              Take the ASCII code
        ⁻               Subtract
              ψ         Predefined variable null byte
             ⁺          Concatenated with
               θ        Input string
            §           Indexed by
                κ       Current index
           ℅            Take the ASCII code
    ⁺                   Plus
     ⁵¹²                Literal integer `512`
   ⍘                    Converted to base
                 ¹⁶     Literal integer `16`
  Φ                     Filtered where
                   μ    Not first character
                        Implicitly print

Decoder, 18 bytes:

Ｆ⪪Ｓ²℅﹪Σ⊞Ｏυ↨ι¹⁶¦²⁵⁶

Try it online! Link is to verbose version of code. Explanation:

Ｆ⪪Ｓ²

Split the input into pairs of hex digits and loop over each pair in turn.

℅﹪Σ⊞Ｏυ↨ι¹⁶¦²⁵⁶

Convert the hex to decimal and add it to the running total, reducing that modulo 256 and converting back to ASCII.

Peter Cordes · Accepted Answer · 2024-10-24 03:11:45Z

x86 32-bit machine code functions, 58 bytes

Encoder is 27 bytes, using Peter Ferrie's DAS trick for integer to hex.
(Would be 26 in 16-bit mode, with stosw not needing a prefix).

x86 doesn't have a reverse-subtract like ARM's rsb, so I ended up using sub/neg which feels like it could be smaller. (Future x86 with APX will have 3-operand support even for old instructions, but that's 64-bit mode only and IIRC might need a 4-byte EVEX prefix, or at least a 2-byte REX2, so no help here.) Maybe I could have used neg / xadd? But xadd is 3 bytes, same as sub+xchg

NASM listing with line numbers and addresses stripped, just machine code and source

                         ; EDI: char *output    ESI: const uint8_t *input     ECX: length (of input in bytes)
                         hexdelta_encode:
 31 D2                       xor  edx,edx         ; last = 0
                           .loop:
 AC                             lodsb            ; current = *input++
 92                             xchg    eax, edx  ; last(DL)=current , current=old_last
 28 D0                          sub     al, dl
 F6 D8                          neg     al        ; AL = old_last - current
 D4 10                          aam     16        ; AH = AL/16 = high nibble.  AL %= 16

 3C 0A                          cmp al, 10
 1C 69                          sbb al, 69h
 2F                             das              ; AL = ASCII hex digit

 86 E0                          xchg  al, ah     ; least-significant nibble to AH, giving correct printing order
 3C 0A                          cmp al, 10
 1C 69                          sbb al, 69h
 2F                             das
 66 AB                          stosw            ; 2 bytes.  *EDI++ = AX

 E2 E8                          loop  .loop
 C3                           ret

Decoder is 31 bytes; would be 29 in 16-bit code (both 66 prefix bytes disappear, the 32-bit xchg can use 16-bit operand-size).
I don't know a similar hack for hex->integer decoding so I just did it the obvious way. sub ax, '00' is 4 bytes, same as if I'd done sub al, '0' twice.

                         ; EDI: uint8_t *output    ESI: char *input     ECX: length (of output in bytes)
                         ;align 16
                         hexdelta_decode:
 31 D2                       xor  edx, edx
                          .loop:
 66 AD                       lodsw                       ; high digit in AL, low digit in AH
                             ; decode to a byte in AL
 66 2D 30 30                 sub  ax, '00'

 3C 09                       cmp  al, 9
 76 02                       jbe  .low_decimal
 2C 07                       sub  al, 7
                          .low_decimal:
 86 E0                      xchg al, ah            ; reverse from printing order and put the other digit in AL for short-form encodings
 3C 09                       cmp  al, 9
 76 02                       jbe  .high_decimal
 2C 07                       sub  al, 7
                          .high_decimal:

 D5 10                       aad  16               ; AL += AH<<4   AH becomes high nibble
 00 D0                       add  al, dl
 AA                          stosb
 92                          xchg eax, edx         ; save the reconstructed value for next time
 E2 E4                       loop .loop
 C3                          ret

Tested with 'Hi!' and "Hello World" on Linux, nasm -felf32 / ld -m elf_i386 to link a static executable.

align 16
writebuf:
  mov   ebx, 1
  mov   ecx, outbuf
  mov   eax, 4
  int   0x80       ; write(1, outbuf, 8)
  ret

global _start
_start:
  push 'Hi!'
  mov   esi, esp
  mov   edi, outbuf
  mov   ecx, 3 
  call  hexdelta_encode

  mov   edx, 6
  call  writebuf

  mov   esi, hello_hexdelta
  mov   edi, outbuf
  mov   ecx, hello_hexdelta.size / 2
  call  hexdelta_decode
  
  mov   edx, hello_hexdelta.size / 2
  call  writebuf

  mov  eax, 1
  xor  ebx, ebx
  int  0x80

section .data
  hello_hexdelta: db "481D070003B1371803FAF8BD"
  .size: equ $-hello_hexdelta

section .bss
  outbuf: resb 1024

If you wanted this to run fast for non-tiny inputs, encode could process a SIMD vector in parallel (two loads offset by 1 byte to feed vpsubb), with AVX-512VBMI2 making conversion to hex possible in a couple single-uop instructions per vector of data. (https://stackoverflow.com/questions/53823756/how-to-convert-a-binary-integer-number-to-a-hex-string)

Decode isn't trivially parallel, but SIMD prefix-sums are a thing, e.g. see this for 4-byte integers, and my comments there for links to other stuff. And vpsadbw against a zeroed vector to sum bytes within each qword of a vector gives fast lookahead which could help ILP.

doubleunary · Accepted Answer · 2024-10-24 13:21:36Z

2

Google Sheets, 221 bytes

Encoder, 106 bytes

=sort(let(l,len(A1),a,code(mid(A1,sequence(1,l),1)),b,join(,dec2hex(mod({a,0}-{0,a},256),2)),left(b,2*l)))

Decoder, 115 bytes

=sort(reduce(,hex2dec(mid(B1,sequence(1,len(B1)/2,1,2),2)),lambda(a,c,a&char(mod(iferror(code(right(a)))+c,256)))))

screenshot

Ungolfed:

=sort(let( 
  a, code(mid(A1, sequence(1, len(A1)), 1)), 
  b, join(, dec2hex(mod({ a, 0 } - { 0, a }, 256), 2)), 
  left(b, len(b) - 2) 
))

=sort(let( 
  v, hex2dec(mid(B1, sequence(1, len(B1) / 2, 1, 2), 2)), 
  reduce(, v, lambda(a, c, a & char(mod(iferror(code(right(a))) + c, 256)))) 
))

edited Oct 24, 2024 at 13:21

answered Oct 22, 2024 at 14:35

doubleunary

4,6721 gold badge6 silver badges16 bronze badges

Add a comment |

squareroot12621 · Accepted Answer · 2024-10-22 11:49:28Z

1

Python 3.8 (pre-release), 177 bytes

Encoder, 98 bytes

lambda s:bytes([*s]and[ord(s[0])]+[(ord(j)-ord(i))%256for i,j in zip(s[:-1],s[1:])]).hex().upper()

-8 bytes if the .upper() isn't required, and -7 bytes if the empty strings don't have to be handled.

Try it online!

Decoder, 79 bytes

lambda b:''.join(chr(sum(bytes.fromhex(b)[:i+1])%256)for i in range(len(b)//2))

-12 bytes (lambda b:[sum(bytes.fromhex(b)[:i+1])%256for i in range(len(b)//2)]) if the result can be a list of Unicode codepoints.

Try it online!

answered Oct 22, 2024 at 11:49

squareroot12621

1,2371 gold badge3 silver badges18 bronze badges

Add a comment |

lyxal · Accepted Answer · 2024-10-23 00:51:28Z

Vyxal, 24 bytes

Encoder: 16 bytes

C₌h¨p‡$-₈%JH2↳›∑

Try it Online!

Decoder: 8 bytes

2ẇH¦₈%C∑

Try it Online!

Try a test suite of encode and decode at the same time

Encoder Explained

C₌h¨p‡$-₈%JH2↳›∑⁡‎‎⁪⁡⁪⁠⁪⁡⁪‏‏⁡⁠⁡‌⁢‎‎⁪⁡⁪⁠⁪⁢⁪‏⁠‎⁪⁡⁪⁠⁪⁣⁪‏‏⁡⁠⁡‌⁣‎‎⁪⁡⁪⁠⁪⁤⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁢⁪‏‏⁡⁠⁡‌⁤‎‎⁪⁡⁪⁠⁪⁢⁣⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁤⁪‏‏⁡⁠⁡‌⁢⁡‎‎⁪⁡⁪⁠⁪⁣⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁣⁢⁪‏‏⁡⁠⁡‌⁢⁢‎‎⁪⁡⁪⁠⁪⁣⁣⁪‏‏⁡⁠⁡‌⁢⁣‎‎⁪⁡⁪⁠⁪⁣⁤⁪‏‏⁡⁠⁡‌⁢⁤‎‎⁪⁡⁪⁠⁪⁤⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁤⁢⁪‏⁠‎⁪⁡⁪⁠⁪⁤⁣⁪‏‏⁡⁠⁡‌⁣⁡‎‎⁪⁡⁪⁠⁪⁤⁤⁪‏‏⁡⁠⁡‌
C                 # ‎⁡Convert input to list of character codes
 ₌h               # ‎⁢Extract the head of that list for later, and also
   ¨p‡            # ‎⁣to each overlapping pair, reduce by:
      $-          # ‎⁤  subtraction with reversed arguments
        ₈%        # ‎⁢⁡Modulo each value by 256 in the result
          J       # ‎⁢⁢And put the original head character back
           H      # ‎⁢⁣Convert each number to hexadecimal
            2↳›   # ‎⁢⁤Pad with 0s as needed
               ∑  # ‎⁣⁡and join into a single string
💎

Created with the help of Luminespire.

Decoder Explained

2ẇH¦₈%C∑⁡‎‎⁪⁡⁪⁠⁪⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁪‏‏⁡⁠⁡‌⁢‎‎⁪⁡⁪⁠⁪⁣⁪‏‏⁡⁠⁡‌⁣‎‎⁪⁡⁪⁠⁪⁤⁪‏‏⁡⁠⁡‌⁤‎‎⁪⁡⁪⁠⁪⁢⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁢⁪‏‏⁡⁠⁡‌⁢⁡‎‎⁪⁡⁪⁠⁪⁢⁣⁪‏‏⁡⁠⁡‌⁢⁢‎‎⁪⁡⁪⁠⁪⁢⁤⁪‏‏⁡⁠⁡‌
2ẇ        # ‎⁡Split the input into pairs
  H       # ‎⁢Convert each pair from hexadecimal
   ¦      # ‎⁣Cumulative sums
    ₈%    # ‎⁤modulo 256
      C   # ‎⁢⁡Convert each number to the character corresponding to the character code
       ∑  # ‎⁢⁢Join into a single string
💎

Created with the help of Luminespire.

Xcali · Accepted Answer · 2024-10-23 16:05:40Z

1

Perl 5 77 bytes

Encoder:

Perl 5 `-F`, 47 bytes

map{printf'%02x',(256-ord($p)+ord)%256;$p=$_}@F

Try it online!

Decoder:

Perl 5 `-p`, 30 bytes

s/../chr($t=($t+hex$&)%256)/ge

Try it online!

answered Oct 23, 2024 at 16:05

Xcali

17.1k2 gold badges17 silver badges42 bronze badges

\$\begingroup\$ Shaves 6 bytes off encoder: Try it online! And 2 bytes off decoder: Try it online! \$\endgroup\$

Kjetil S
– Kjetil S

2024-10-24 23:01:41 +00:00
Commented Oct 24, 2024 at 23:01

Add a comment |

jdt · Accepted Answer · 2024-10-23 16:42:24Z

1

C (gcc), 116 bytes

Encoder: 50 bytes

f(char*t){*t&&printf("%02X",*t-t[-1]&255)+f(t+1);}

Exploiting a quirk in GCC when run in TIO, where string constants appear to be null-terminated at both the beginning and the end.

Try it online!

Decoder: 66 bytes

l;f(char*h){!*h?l=0:putchar(l+=strtol(strndup(h,2),0,16))+f(h+2);}

Try it online!

edited Oct 23, 2024 at 16:42

answered Oct 23, 2024 at 14:29

jdt

5,1441 gold badge16 silver badges21 bronze badges

1

\$\begingroup\$ re: the quirk: is it possible that this happens because string constants are placed after other, also null-t string constants, thus str[-1] is the null terminator of the previous string constant stored there? \$\endgroup\$

Themoonisacheese
– Themoonisacheese

2024-10-24 09:15:32 +00:00
Commented Oct 24, 2024 at 9:15

Add a comment |

NotEvenIndian · Accepted Answer · 2024-10-24 10:39:42Z

Python 3.8 (pre-release), 149 139 bytes

Encoder (72 69 bytes)

Contributors: Arnauld -2 bytes; Lucenaposition -1 byte;

lambda t:"".join(map(lambda a,b:f"{ord(b)-ord(a)&255:02x}","\0"+t,t))

Try it online!

Takes a string as input and returns a string. Can handle empty string.

Decoder (77 70 bytes)

Contributors: squareroot12621 -5 bytes; Lucenaposition -2 bytes;

f=lambda h,l=0:h and chr(int(h[:2],16)+l&255)+f(h[2:],int(h[:2],16)+l)

Try it online!

Takes a string as input and returns a string. Can handle empty string.

Expanded code:

# This is for the original submission
def decoder(hex_string, last_value = 0):

    # Stop recursion if end of string is reached
    if hex_string == "":
        return ""

    # Take first character from the hex input
    decoded_from_hex = int(hex_string[:2],16)

    # Calculate the current character's value:
    current_value = decoded_from_hex + last_value

    # Recursively evaluate the remaining part of the input string and return decoded message
    return chr(current_value % 256) + decoder(hex_string[2:], current_value)

You can do ord(b)-ord(a)&255 to get rid of the parentheses. — Arnauld
– Arnauld, Commented Oct 22, 2024 at 16:15
72 bytes for the decoder: f=lambda h,l=0:h and chr((int(h[:2],16)+l)%256)+f(h[2:],int(h[:2],16)+l) — squareroot12621
– squareroot12621, Commented Oct 22, 2024 at 21:14

Kevin Cruijssen · Accepted Answer · 2024-10-24 13:14:58Z

05AB1E, 21 20 (12+8) bytes

Encoder:

Ç0š¥₁%₁+h€¦J

Input as string (or character-list); output as string.

Try it online or verify all test cases.

Decoder:

2ôHηO₁%ç

Input as string; output as character-list.

Try it online or verify all test cases.

Explanation:

Ç             # Convert each char of the (implicit) input to a codepoint-integer
 0š           # Prepend a 0 to this list
   ¥          # Pop and take its forward differences
    ₁%        # Modulo-256 each
      ₁+      # Add 256 to each
        h     # Convert those integers to hexadecimal triplets
         €¦   # Remove the leading "1" from each triplet
           J  # Join this list of hexadecimal-pairs together
              # (after which the result is output implicitly)

2ô            # Split the (implicit) input-string into pairs
  H           # Convert each pair from hexadecimal to a base-10 integer
   η          # Get the prefixes of this list
    O         # Sum each inner prefix
     ₁%       # Modulo-256
       ç      # Convert each integer to an ASCII character with this codepoint
              # (after which the result is output implicitly)

Antonio Ponce · Accepted Answer · 2024-10-24 15:26:07Z

0

Haskell, 130 bytes

f x n=mapM(\_->x)[0..n]
e s=do(x,y)<-zip<*>(0:)$fromEnum<$>s;f"0123456789ABCDEF"1!!mod(x-y)256
d s=[y|y<-[0..]>>=f['\0'..],e y==s]

e is the encoder, and d is the decoder. The output of the decoder is represented as answer:invalid1:invalid2:...:_|_. If this representation is invalid, the decoder may be rewritten as:

d s=head[y|y<-[0..]>>=f['\0'..],e y==s]

for an additional 4 bytes.

Try it online!

edited Oct 24, 2024 at 15:26

answered Oct 24, 2024 at 15:07

Antonio Ponce

3111 silver badge5 bronze badges

Add a comment |

Hex diff encoder

Encoding Example

Decoding Example

More examples

Rules

11 Answers 11

JavaScript (Node.js), 85 bytes

Encoder (47 bytes)

Decoder (38 bytes)

Charcoal, 38 bytes

Encoder, 20 bytes

Decoder, 18 bytes:

x86 32-bit machine code functions, 58 bytes

Google Sheets, 221 bytes

Encoder, 106 bytes

Decoder, 115 bytes

Python 3.8 (pre-release), 177 bytes

Encoder, 98 bytes

Decoder, 79 bytes

Vyxal, 24 bytes

Encoder: 16 bytes

Decoder: 8 bytes

Encoder Explained

Decoder Explained

Perl 5 77 bytes

Encoder:

Perl 5 -F, 47 bytes

Decoder:

Perl 5 -p, 30 bytes

C (gcc), 116 bytes

Encoder: 50 bytes

Decoder: 66 bytes

Python 3.8 (pre-release), 149 139 bytes

Encoder (72 69 bytes)

Decoder (77 70 bytes)

Expanded code:

05AB1E, 21 20 (12+8) bytes

Haskell, 130 bytes

Your Answer

Sign up or log in

Post as a guest

Linked

Related

Hot Network Questions

Perl 5 `-F`, 47 bytes

Perl 5 `-p`, 30 bytes