6
\$\begingroup\$

Write a codec that

  • encodes list of bytes by calculating the difference between the values (0-255) of consecutive bytes, converting those differences to their hexadecimal representation, and outputting them as fixed 2-character hex values. If a byte has a smaller value than the previous one, the difference should be treated as though the values "wrap around" using an overflow mechanism, as if subtracting within a circular range from 0 to 255.
  • decodes an encoded hexadecimal string back to its original form by reversing the encoding process

In both encoding and decoding, the first value is directly converted, while the following values are calculated as currentValue - previousValue

Encoding Example

Input: Hi!

Output: 4821B8

Explanation:

  • H has an ASCII value of 720x48.
  • The difference between i (ASCII 105) and H (ASCII 72) is 105 - 72 = 330x21.
  • The difference between ! (ASCII 33) and i (ASCII 105) is 33 - 105 = -72 + 256 = 1840xB8.

Decoding Example

Input: 4821B8

Output: Hi!

Explanation:

  • The first 2 bytes 0x48 convert to decimal 72H.
  • The next 2 bytes 0x21 are 33 in decimal. Adding this to 72 gives 105i.
  • The last 2 bytes 0xB8 are 184 in decimal. Adding this to the previous value 105, accounting for the overflow, gives 105 + 184 = 289 mod 256 = 33!.

More examples

Hi! <-> 4821B8
Hello World! <-> 481D070003B1371803FAF8BD
aaaaaaaaaa <-> 61000000000000000000
9876543210 <-> 39FFFFFFFFFFFFFFFFFF
abcdefghijklmnopqrstuvwxyz <-> 6101010101010101010101010101010101010101010101010101
ZYXWVUTSRQPONMLKJIHGFEDCBA <-> 5AFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Rules

  • Hex values can be upper or lower case (i.e. 422A or 2a), as long as encoder and decoder are consistent with the format
  • The encoding input can be a string, a list/array of bytes, or any type in the language that can represent text. The output must be a string
  • The decoding input must be a string. The output can be a string, a list/array of bytes, or any type in the language that can represent text
  • All inputs can be considered valid, so no null-checking, empty strings or 0-sized arrays/lists
  • You can chose to implement the codec as two separate programs or as a single one, as long as both encoding and decoding are present
  • This is a code-golf challenge, shortest answer in bytes wins. If encoding and decoding are separated, the sum of the two will be added to calculate the score
\$\endgroup\$
5
  • \$\begingroup\$ Can the decoding input also be a list of bytes? \$\endgroup\$ Commented Oct 22, 2024 at 10:52
  • \$\begingroup\$ No the decoding input must be the same as the encoding output, a string \$\endgroup\$ Commented Oct 22, 2024 at 11:30
  • \$\begingroup\$ Why would this even make sense? Arguably, the string form "4821B8" is more arbitrary than the byte form b"H!\xB8". \$\endgroup\$ Commented Oct 22, 2024 at 11:33
  • 1
    \$\begingroup\$ A byte sequence as input for the decoder renders the encoding into hex strings unnecessary, as the byte values can be directly interpreted in their decimal form. This eliminates the need to first convert hex pairs back into decimal, trivialising the decoding process by working with the raw byte data instead of an intermediary hex representation \$\endgroup\$ Commented Oct 22, 2024 at 11:53
  • \$\begingroup\$ The usual term for this is "delta encoding" (The wikipedia article conflates it with distributing files as a diff vs. the last version, but the code example is this byte-delta codec, without the hexdump part of course.) \$\endgroup\$ Commented Oct 23, 2024 at 22:08

11 Answers 11

4
\$\begingroup\$

JavaScript (Node.js), 85 bytes

Both functions expect and return a string.

Encoder (47 bytes)

s=>Buffer(s).map(c=>-p+(p=c),p=0).toString`hex`

Try it online!

Decoder (38 bytes)

s=>Buffer(s,"hex").map(c=>p+=c,p=0)+""

Try it online!

\$\endgroup\$
2
\$\begingroup\$

Charcoal, 38 bytes

Encoder, 20 bytes

⭆θΦ⍘⁺⁵¹²⁻℅ι℅§⁺ψθκ¹⁶μ

Try it online! Link is to verbose version of code. Explanation:

 θ                      Input string
⭆                       Map over characters and join
          ι             Current character
         ℅              Take the ASCII code
        ⁻               Subtract
              ψ         Predefined variable null byte
             ⁺          Concatenated with
               θ        Input string
            §           Indexed by
                κ       Current index
           ℅            Take the ASCII code
    ⁺                   Plus
     ⁵¹²                Literal integer `512`
   ⍘                    Converted to base
                 ¹⁶     Literal integer `16`
  Φ                     Filtered where
                   μ    Not first character
                        Implicitly print

Decoder, 18 bytes:

F⪪S²℅﹪Σ⊞Oυ↨ι¹⁶¦²⁵⁶

Try it online! Link is to verbose version of code. Explanation:

F⪪S²

Split the input into pairs of hex digits and loop over each pair in turn.

℅﹪Σ⊞Oυ↨ι¹⁶¦²⁵⁶

Convert the hex to decimal and add it to the running total, reducing that modulo 256 and converting back to ASCII.

\$\endgroup\$
2
\$\begingroup\$

x86 32-bit machine code functions, 58 bytes

Encoder is 27 bytes, using Peter Ferrie's DAS trick for integer to hex.
(Would be 26 in 16-bit mode, with stosw not needing a prefix).

x86 doesn't have a reverse-subtract like ARM's rsb, so I ended up using sub/neg which feels like it could be smaller. (Future x86 with APX will have 3-operand support even for old instructions, but that's 64-bit mode only and IIRC might need a 4-byte EVEX prefix, or at least a 2-byte REX2, so no help here.) Maybe I could have used neg / xadd? But xadd is 3 bytes, same as sub+xchg

NASM listing with line numbers and addresses stripped, just machine code and source

                         ; EDI: char *output    ESI: const uint8_t *input     ECX: length (of input in bytes)
                         hexdelta_encode:
 31 D2                       xor  edx,edx         ; last = 0
                           .loop:
 AC                             lodsb            ; current = *input++
 92                             xchg    eax, edx  ; last(DL)=current , current=old_last
 28 D0                          sub     al, dl
 F6 D8                          neg     al        ; AL = old_last - current
 D4 10                          aam     16        ; AH = AL/16 = high nibble.  AL %= 16

 3C 0A                          cmp al, 10
 1C 69                          sbb al, 69h
 2F                             das              ; AL = ASCII hex digit

 86 E0                          xchg  al, ah     ; least-significant nibble to AH, giving correct printing order
 3C 0A                          cmp al, 10
 1C 69                          sbb al, 69h
 2F                             das
 66 AB                          stosw            ; 2 bytes.  *EDI++ = AX

 E2 E8                          loop  .loop
 C3                           ret 

Decoder is 31 bytes; would be 29 in 16-bit code (both 66 prefix bytes disappear, the 32-bit xchg can use 16-bit operand-size).
I don't know a similar hack for hex->integer decoding so I just did it the obvious way. sub ax, '00' is 4 bytes, same as if I'd done sub al, '0' twice.

                         ; EDI: uint8_t *output    ESI: char *input     ECX: length (of output in bytes)
                         ;align 16
                         hexdelta_decode:
 31 D2                       xor  edx, edx
                          .loop:
 66 AD                       lodsw                       ; high digit in AL, low digit in AH
                             ; decode to a byte in AL
 66 2D 30 30                 sub  ax, '00'

 3C 09                       cmp  al, 9
 76 02                       jbe  .low_decimal
 2C 07                       sub  al, 7
                          .low_decimal:
 86 E0                      xchg al, ah            ; reverse from printing order and put the other digit in AL for short-form encodings
 3C 09                       cmp  al, 9
 76 02                       jbe  .high_decimal
 2C 07                       sub  al, 7
                          .high_decimal:

 D5 10                       aad  16               ; AL += AH<<4   AH becomes high nibble
 00 D0                       add  al, dl
 AA                          stosb
 92                          xchg eax, edx         ; save the reconstructed value for next time
 E2 E4                       loop .loop
 C3                          ret

Tested with 'Hi!' and "Hello World" on Linux, nasm -felf32 / ld -m elf_i386 to link a static executable.

align 16
writebuf:
  mov   ebx, 1
  mov   ecx, outbuf
  mov   eax, 4
  int   0x80       ; write(1, outbuf, 8)
  ret

global _start
_start:
  push 'Hi!'
  mov   esi, esp
  mov   edi, outbuf
  mov   ecx, 3 
  call  hexdelta_encode

  mov   edx, 6
  call  writebuf

  mov   esi, hello_hexdelta
  mov   edi, outbuf
  mov   ecx, hello_hexdelta.size / 2
  call  hexdelta_decode
  
  mov   edx, hello_hexdelta.size / 2
  call  writebuf

  mov  eax, 1
  xor  ebx, ebx
  int  0x80

section .data
  hello_hexdelta: db "481D070003B1371803FAF8BD"
  .size: equ $-hello_hexdelta

section .bss
  outbuf: resb 1024 

If you wanted this to run fast for non-tiny inputs, encode could process a SIMD vector in parallel (two loads offset by 1 byte to feed vpsubb), with AVX-512VBMI2 making conversion to hex possible in a couple single-uop instructions per vector of data. (https://stackoverflow.com/questions/53823756/how-to-convert-a-binary-integer-number-to-a-hex-string)

Decode isn't trivially parallel, but SIMD prefix-sums are a thing, e.g. see this for 4-byte integers, and my comments there for links to other stuff. And vpsadbw against a zeroed vector to sum bytes within each qword of a vector gives fast lookahead which could help ILP.

\$\endgroup\$
2
\$\begingroup\$

Google Sheets, 221 bytes

Encoder, 106 bytes

=sort(let(l,len(A1),a,code(mid(A1,sequence(1,l),1)),b,join(,dec2hex(mod({a,0}-{0,a},256),2)),left(b,2*l)))

Decoder, 115 bytes

=sort(reduce(,hex2dec(mid(B1,sequence(1,len(B1)/2,1,2),2)),lambda(a,c,a&char(mod(iferror(code(right(a)))+c,256)))))

screenshot

Ungolfed:

=sort(let( 
  a, code(mid(A1, sequence(1, len(A1)), 1)), 
  b, join(, dec2hex(mod({ a, 0 } - { 0, a }, 256), 2)), 
  left(b, len(b) - 2) 
))
=sort(let( 
  v, hex2dec(mid(B1, sequence(1, len(B1) / 2, 1, 2), 2)), 
  reduce(, v, lambda(a, c, a & char(mod(iferror(code(right(a))) + c, 256)))) 
))
\$\endgroup\$
1
\$\begingroup\$

Python 3.8 (pre-release), 177 bytes

Encoder, 98 bytes

lambda s:bytes([*s]and[ord(s[0])]+[(ord(j)-ord(i))%256for i,j in zip(s[:-1],s[1:])]).hex().upper()

-8 bytes if the .upper() isn't required, and -7 bytes if the empty strings don't have to be handled.

Try it online!

Decoder, 79 bytes

lambda b:''.join(chr(sum(bytes.fromhex(b)[:i+1])%256)for i in range(len(b)//2))

-12 bytes (lambda b:[sum(bytes.fromhex(b)[:i+1])%256for i in range(len(b)//2)]) if the result can be a list of Unicode codepoints.

Try it online!

\$\endgroup\$
1
\$\begingroup\$

Vyxal, 24 bytes

Encoder: 16 bytes

C₌h¨p‡$-₈%JH2↳›∑

Try it Online!

Decoder: 8 bytes

2ẇH¦₈%C∑

Try it Online!

Try a test suite of encode and decode at the same time


Encoder Explained

C₌h¨p‡$-₈%JH2↳›∑­⁡​‎‎⁡⁠⁡‏‏​⁡⁠⁡‌⁢​‎‎⁡⁠⁢‏⁠‎⁡⁠⁣‏‏​⁡⁠⁡‌⁣​‎‎⁡⁠⁤‏⁠‎⁡⁠⁢⁡‏⁠‎⁡⁠⁢⁢‏‏​⁡⁠⁡‌⁤​‎‎⁡⁠⁢⁣‏⁠‎⁡⁠⁢⁤‏‏​⁡⁠⁡‌⁢⁡​‎‎⁡⁠⁣⁡‏⁠‎⁡⁠⁣⁢‏‏​⁡⁠⁡‌⁢⁢​‎‎⁡⁠⁣⁣‏‏​⁡⁠⁡‌⁢⁣​‎‎⁡⁠⁣⁤‏‏​⁡⁠⁡‌⁢⁤​‎‎⁡⁠⁤⁡‏⁠‎⁡⁠⁤⁢‏⁠‎⁡⁠⁤⁣‏‏​⁡⁠⁡‌⁣⁡​‎‎⁡⁠⁤⁤‏‏​⁡⁠⁡‌­
C                 # ‎⁡Convert input to list of character codes
 ₌h               # ‎⁢Extract the head of that list for later, and also
   ¨p‡            # ‎⁣to each overlapping pair, reduce by:
      $-          # ‎⁤  subtraction with reversed arguments
        ₈%        # ‎⁢⁡Modulo each value by 256 in the result
          J       # ‎⁢⁢And put the original head character back
           H      # ‎⁢⁣Convert each number to hexadecimal
            2↳›   # ‎⁢⁤Pad with 0s as needed
               ∑  # ‎⁣⁡and join into a single string
💎

Created with the help of Luminespire.

Decoder Explained

2ẇH¦₈%C∑­⁡​‎‎⁡⁠⁡‏⁠‎⁡⁠⁢‏‏​⁡⁠⁡‌⁢​‎‎⁡⁠⁣‏‏​⁡⁠⁡‌⁣​‎‎⁡⁠⁤‏‏​⁡⁠⁡‌⁤​‎‎⁡⁠⁢⁡‏⁠‎⁡⁠⁢⁢‏‏​⁡⁠⁡‌⁢⁡​‎‎⁡⁠⁢⁣‏‏​⁡⁠⁡‌⁢⁢​‎‎⁡⁠⁢⁤‏‏​⁡⁠⁡‌­
2ẇ        # ‎⁡Split the input into pairs
  H       # ‎⁢Convert each pair from hexadecimal
   ¦      # ‎⁣Cumulative sums
    ₈%    # ‎⁤modulo 256
      C   # ‎⁢⁡Convert each number to the character corresponding to the character code
       ∑  # ‎⁢⁢Join into a single string
💎

Created with the help of Luminespire.

\$\endgroup\$
1
\$\begingroup\$

Perl 5 77 bytes

Encoder:

Perl 5 -F, 47 bytes

map{printf'%02x',(256-ord($p)+ord)%256;$p=$_}@F

Try it online!

Decoder:

Perl 5 -p, 30 bytes

s/../chr($t=($t+hex$&)%256)/ge

Try it online!

\$\endgroup\$
1
  • \$\begingroup\$ Shaves 6 bytes off encoder: Try it online! And 2 bytes off decoder: Try it online! \$\endgroup\$ Commented Oct 24, 2024 at 23:01
1
\$\begingroup\$

C (gcc), 116 bytes

Encoder: 50 bytes

f(char*t){*t&&printf("%02X",*t-t[-1]&255)+f(t+1);}

Exploiting a quirk in GCC when run in TIO, where string constants appear to be null-terminated at both the beginning and the end.

Try it online!

Decoder: 66 bytes

l;f(char*h){!*h?l=0:putchar(l+=strtol(strndup(h,2),0,16))+f(h+2);}

Try it online!

\$\endgroup\$
1
  • 1
    \$\begingroup\$ re: the quirk: is it possible that this happens because string constants are placed after other, also null-t string constants, thus str[-1] is the null terminator of the previous string constant stored there? \$\endgroup\$ Commented Oct 24, 2024 at 9:15
1
\$\begingroup\$

Python 3.8 (pre-release), 149 139 bytes

Encoder (72 69 bytes)

Contributors: Arnauld -2 bytes; Lucenaposition -1 byte;

lambda t:"".join(map(lambda a,b:f"{ord(b)-ord(a)&255:02x}","\0"+t,t))

Try it online!

Takes a string as input and returns a string. Can handle empty string.

Decoder (77 70 bytes)

Contributors: squareroot12621 -5 bytes; Lucenaposition -2 bytes;

f=lambda h,l=0:h and chr(int(h[:2],16)+l&255)+f(h[2:],int(h[:2],16)+l)

Try it online!

Takes a string as input and returns a string. Can handle empty string.

Expanded code:

# This is for the original submission
def decoder(hex_string, last_value = 0):

    # Stop recursion if end of string is reached
    if hex_string == "":
        return ""

    # Take first character from the hex input
    decoded_from_hex = int(hex_string[:2],16)

    # Calculate the current character's value:
    current_value = decoded_from_hex + last_value

    # Recursively evaluate the remaining part of the input string and return decoded message
    return chr(current_value % 256) + decoder(hex_string[2:], current_value)
\$\endgroup\$
2
  • \$\begingroup\$ You can do ord(b)-ord(a)&255 to get rid of the parentheses. \$\endgroup\$ Commented Oct 22, 2024 at 16:15
  • \$\begingroup\$ 72 bytes for the decoder: f=lambda h,l=0:h and chr((int(h[:2],16)+l)%256)+f(h[2:],int(h[:2],16)+l) \$\endgroup\$ Commented Oct 22, 2024 at 21:14
1
\$\begingroup\$

05AB1E, 21 20 (12+8) bytes

Encoder:

Ç0š¥₁%₁+h€¦J

Input as string (or character-list); output as string.

Try it online or verify all test cases.

Decoder:

2ôHηO₁%ç

Input as string; output as character-list.

Try it online or verify all test cases.

Explanation:

Ç             # Convert each char of the (implicit) input to a codepoint-integer
 0š           # Prepend a 0 to this list
   ¥          # Pop and take its forward differences
    ₁%        # Modulo-256 each
      ₁+      # Add 256 to each
        h     # Convert those integers to hexadecimal triplets
         €¦   # Remove the leading "1" from each triplet
           J  # Join this list of hexadecimal-pairs together
              # (after which the result is output implicitly)

2ô            # Split the (implicit) input-string into pairs
  H           # Convert each pair from hexadecimal to a base-10 integer
   η          # Get the prefixes of this list
    O         # Sum each inner prefix
     ₁%       # Modulo-256
       ç      # Convert each integer to an ASCII character with this codepoint
              # (after which the result is output implicitly)
\$\endgroup\$
0
\$\begingroup\$

Haskell, 130 bytes

f x n=mapM(\_->x)[0..n]
e s=do(x,y)<-zip<*>(0:)$fromEnum<$>s;f"0123456789ABCDEF"1!!mod(x-y)256
d s=[y|y<-[0..]>>=f['\0'..],e y==s]

e is the encoder, and d is the decoder. The output of the decoder is represented as answer:invalid1:invalid2:...:_|_. If this representation is invalid, the decoder may be rewritten as:

d s=head[y|y<-[0..]>>=f['\0'..],e y==s]

for an additional 4 bytes.

Try it online!

\$\endgroup\$

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.