This is a bit of a strange problem I have here. I'm working with C# 6 on the .NET platform on a binary compression algorithm. Multiple stages of compression are working great, even far better than expected! However, converting the unoptimized binary back into a file is proving to be a bit more of a headache than I expected.

The Case

Binary is being read from an arbitrary file, and passed along within the program as a string. Multiple waves of optimization work on the string, converting it into an intermediate representation, which is written as the compressed object. Then, deoptimization turns the intermediate form back into pure binary, ready to be written.

The Code

Binary Input

BinaryString = "";
Filename = filename;
StringBuilder sb = new StringBuilder();
// Append each byte as eight '0'/'1' characters, most significant bit first.
foreach (byte b in File.ReadAllBytes(filename))
{
    sb.Append(Convert.ToString(b, 2).PadLeft(8, '0'));
}
BinaryString = sb.ToString();

This is how I'm accepting input. It will return a literal binary string, in the form of 11001010110001

The conversion from its intermediate form returns exactly the same string.

Binary Output

Currently, I'm trying to directly write a binary file as bytes, as such:

List<Byte> bytes = new List<byte>();
foreach(char c in binary)
   bytes.Add(Convert.ToByte(c));
File.WriteAllBytes(filename, bytes.ToArray());

The Problem

The method I'm trying right now for binary output is simply writing the binary outright to a text file, rather than writing a binary object to the filesystem.
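As a small illustration of why the output ends up as text, here is a minimal sketch (separate from the compression code) of what Convert.ToByte does with a single character versus an eight-character group. The class name is only for the example.

```csharp
using System;

static class CharVersusBits
{
    static void Main()
    {
        // Convert.ToByte(char) returns the character's code unit,
        // not the numeric value of the digit it displays.
        Console.WriteLine(Convert.ToByte('0')); // 48, the code for the character '0'
        Console.WriteLine(Convert.ToByte('1')); // 49, the code for the character '1'

        // Parsing eight characters as a base-2 number yields one real byte.
        Console.WriteLine(Convert.ToByte("00110001", 2)); // 49
    }
}
```

So each '0' or '1' in the string becomes a whole byte holding that character's code, and the resulting file is the digit string itself, eight times larger than the data it represents.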

We're compressing pictures, executables, text, git objects, etc. So it's obviously not feasible whatsoever to have it written like this.

Does there exist a method in C#/.NET that will easily let me translate the binary back into a file, or is this a more involved problem than I'm thinking?

  • Why in the world would you turn the binary data into a string in the first place instead of just dealing with the data as bytes? Commented Dec 12, 2016 at 18:31
  • ^^see comment above Commented Dec 12, 2016 at 18:32
  • Binary data usually has a format associated with it. You can't just read or write the data arbitrarily. For example, if you write a 32-bit integer followed by two 8-bit bytes, you have to read the data back in the same sizes. So when you write binary, you have to read it back exactly the same way it was written. Commented Dec 12, 2016 at 18:36
  • I suggest you rewrite all your code that is dealing with "binary strings" to work with the raw bytes instead. I would imagine it would result in a huge performance increase. Commented Dec 12, 2016 at 18:37
  • Because of how the compression itself is working, it's impossible for the algorithm to work on bytes as opposed to strings, at least when it comes to output. Its storage as text is an integral feature of the algorithm. So, it's going to be completely untenable to make the algorithm workable? Commented Dec 12, 2016 at 18:48

1 Answer

There are 8 bits in a byte, but you are trying to convert 1 bit at a time to a byte. You need to gather up 8 bits first, then convert that group to a byte using the Convert.ToByte() overload that accepts fromBase.

Replace this code:

// Note: I assume you meant to reference `BinaryString` here
// and not "binary" which isn't defined in your example
foreach(char c in BinaryString)
   bytes.Add(Convert.ToByte(c));

With this:

var thisByte = string.Empty;
foreach (char c in BinaryString)
{
    thisByte += c;
    if (thisByte.Length == 8)
    {
        bytes.Add(Convert.ToByte(thisByte, 2));
        thisByte = string.Empty;
    }
}
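
For reference, the same idea can be sketched as a self-contained round trip. This is only an illustration under some assumptions: the class and method names (BitStringCodec, ToBitString, FromBitString) are hypothetical and not from the question, and the sketch chooses to throw when the string length is not a multiple of 8, whereas the loop above silently drops any leftover bits.

```csharp
using System;
using System.Text;

// Hypothetical helper names, used only for this sketch.
static class BitStringCodec
{
    // Bytes -> string of '0'/'1' characters, eight per byte (same as the question's input code).
    public static string ToBitString(byte[] data)
    {
        var sb = new StringBuilder(data.Length * 8);
        foreach (byte b in data)
            sb.Append(Convert.ToString(b, 2).PadLeft(8, '0'));
        return sb.ToString();
    }

    // String of '0'/'1' characters -> bytes, parsing eight characters at a time as base 2.
    public static byte[] FromBitString(string bits)
    {
        if (bits.Length % 8 != 0)
            throw new ArgumentException("Bit string length must be a multiple of 8.", nameof(bits));

        var bytes = new byte[bits.Length / 8];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(bits.Substring(i * 8, 8), 2);
        return bytes;
    }
}

static class Program
{
    static void Main()
    {
        byte[] original = { 0x00, 0xCA, 0xFF, 0x31 };
        string bits = BitStringCodec.ToBitString(original);
        byte[] restored = BitStringCodec.FromBitString(bits);

        Console.WriteLine(bits);                            // 00000000110010101111111100110001
        Console.WriteLine(BitConverter.ToString(restored)); // 00-CA-FF-31
    }
}
```

The array returned by FromBitString can be passed straight to File.WriteAllBytes, as in the question. Using Substring with a preallocated array also avoids building each 8-character group through repeated string concatenation, which may matter for large files.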