Here is the skinny (scroll down to see the problem): I am doing Huffman encoding to compress a file using PHP (for a project). I have built the code map and encoded everything into a string like so:

00101010001100001110011101001101111011111011

Now I need to convert that into an actual binary string; in its current state, it is only a string of 1s and 0s.

Here is the problem:

The string of 1s and 0s is 17,747,595 characters long, and the conversion really slows down at around 550,000 characters.

This is the code I have:

<?php

$i = 0;
$out = '';
$len = strlen($binaryString);

while ($i < $len){
    $section = substr($binaryString,$i,$i+8);
    $out .= chr(bindec($section));
    $i=$i+8;
}

?>

How can I make this efficient enough to run the 17 million character string?

Thanks very much for any support!

  • Did you take a look at stackoverflow.com/questions/6382738/… ? Commented Nov 6, 2012 at 19:52
  • Yes, base_convert won't accept it because it is too long :P Commented Nov 6, 2012 at 19:55
  • Don't write it to a variable as a whole, but to some file cache after X bytes. That way, the whole string is not loaded on each iteration just to append the next few bytes. Commented Nov 6, 2012 at 19:58
  • Yes, the original file being encoded is 4MB, and then broken out via Huffman to the 17m… I know there has to be an efficient way of doing this, I just don't know what it is lol. Commented Nov 6, 2012 at 19:59
  • Why didn't you try to make a bit stream instead of a bit string? I mean, just use the 8 bits of each byte from the beginning. It's because of locality of reference. Commented Nov 6, 2012 at 20:06
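
The bit-stream idea from the last comment could look roughly like the sketch below: instead of building a 17-million-character string of 1s and 0s, the encoder shifts each bit into an integer and emits a byte every 8 bits. The function name packCodes and the shape of its input (a list of '0'/'1' code strings, one per encoded symbol) are assumptions made for illustration; they are not from the original post.

```php
<?php
// Hypothetical helper: packs Huffman codes into raw bytes as they are produced.
// $codes is assumed to be an array of '0'/'1' strings such as '0010'.
function packCodes($codes) {
    $out = '';
    $acc = 0;    // bits collected so far, held as an integer
    $nbits = 0;  // how many bits of $acc are in use
    foreach ($codes as $code) {
        $n = strlen($code);
        for ($j = 0; $j < $n; $j++) {
            // Shift the next bit in from the right.
            $acc = ($acc << 1) | ($code[$j] === '1' ? 1 : 0);
            if (++$nbits === 8) {
                $out .= chr($acc);  // a full byte is ready
                $acc = 0;
                $nbits = 0;
            }
        }
    }
    if ($nbits > 0) {
        // Pad the final partial byte with zero bits on the right.
        $out .= chr($acc << (8 - $nbits));
    }
    return $out;
}

echo bin2hex(packCodes(array('0010', '1010', '001'))); // prints 2a20
```

To follow the file-cache suggestion as well, $out could be written out with fwrite() and reset every few kilobytes instead of being returned in one piece. The decoder also needs to know how many padding bits were added, for example by storing the total bit count alongside the data.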

1 Answer

You don't need to loop; you can use GMP together with pack():

$file = "binary.txt";
$string = file_get_contents($file);
$start = microtime(true);

// Convert the string
$string = simpleConvert($string);
//echo $string ;

var_dump(number_format(filesize($file),2),microtime(true)- $start);

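function simpleConvert($string) {
    // Treat the whole bit string as one base-2 integer and re-express it in hex.
    $hex = gmp_strval(gmp_init($string, 2), 16);
    // GMP drops leading zeros, so restore them (one hex digit per 4 bits).
    // This assumes strlen($string) is a multiple of 8.
    $hex = str_pad($hex, (int) (strlen($string) / 4), '0', STR_PAD_LEFT);
    return pack('H*', $hex);
}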

Output

string '25,648,639.00' (length=13) <---- Length greater than 17,747,595
float 1.0633520126343  <---------------- Total Conversion Time 

Note: this solution requires the GMP functions.
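
Where GMP is not available, a plain loop may also be workable. One thing worth checking in the loop from the question is the third argument of substr(): it is a length, not an end offset, so passing $i + 8 makes every slice longer than the one before, which is a likely cause of the slowdown. The sketch below keeps the length fixed at 8; the function name bitsToBytes and the right-padding of the last byte are assumptions, not part of the original code.

```php
<?php
// Hypothetical name; converts a string of '0'/'1' characters to raw bytes.
function bitsToBytes($bits) {
    // Pad on the right so the length is a multiple of 8.
    $rem = strlen($bits) % 8;
    if ($rem !== 0) {
        $bits .= str_repeat('0', 8 - $rem);
    }
    $out = '';
    $len = strlen($bits);
    for ($i = 0; $i < $len; $i += 8) {
        // substr() takes (string, offset, length): the length stays 8.
        $out .= chr(bindec(substr($bits, $i, 8)));
    }
    return $out;
}

echo bin2hex(bitsToBytes('00101010001100001')); // prints 2a3080
```

Each iteration now handles exactly 8 characters, so the total work grows linearly with the input length.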

3 Comments

Wow! I like that approach, but I seem to be getting a "Segmentation fault" upon initializing GMP in gmp_init($string, 2); any ideas what that is about? (Yes, I have GMP installed :)
What versions of PHP and GMP?
Ahh… I was running it on PHP/5.2.10 GD/4.1.4, but I moved it to my server with PHP/5.4.7 GMP/4.3.2 and it works like a charm :) Well done! Thanks @Baba
