I have some old C++ source code that parses an input file. I removed the unrelated parts and reproduced it below (the logic is the same):

#define _CRT_SECURE_NO_DEPRECATE
#include <stdio.h>
#include <iostream>

typedef struct
{
    unsigned short int id;
    unsigned long size;
    unsigned long time;
    unsigned short int day;
    unsigned short int year;
    unsigned short int source;
    unsigned short int destination;
} header;

#define READ_SIZE    1024 * 2
#define MAX_LENGTH   1024
#define BUFFER_SIZE    READ_SIZE + MAX_LENGTH
const int SIZEOF_CHAR = sizeof(unsigned char);

int main()
{
    int index = 0;
    int offset = 0;
    header* hdrPtr;
    FILE* fp;
    char buff[BUFFER_SIZE];

    fp = fopen("data.dat", "rb");
    int numRead = fread(&buff[offset], SIZEOF_CHAR, READ_SIZE, fp);
    hdrPtr = (header*)(unsigned short*)&buff[index];

    return 0;
}

The input data.dat file is a binary file with a specific format (honestly, at this point I'm not sure what the rules are; my job is to figure them out and then translate the code into a new C#/.NET codebase). To make things easier, I tested with a dummy text file and with some random binary files such as a .PDF or an .MP4, and in every case I got a result in hdrPtr. However, I still can't make sense of those results.

For example, I tested with a data.dat text file with the content of:

Hello
World

I got a pointer to a header with these values (I don't understand where these numbers come from):

id          25928       unsigned short
size        1460276591  unsigned long
time        1684828783  unsigned long
day         52428       unsigned short
year        52428       unsigned short
source      52428       unsigned short
destination 52428       unsigned short

The same happens with other data.dat input files, including real binary ones. I had no experience with C++ until last week, and this looks rather weird to me! How can we cast the address of the first element of the array to a pointer to unsigned short, and then cast that to a pointer to the header struct (!?).

I'm struggling to convert the above code to C#. Any help/hint and recommendation is appreciated!

  • Technically this code is illegal. Any object can be viewed as an array of char, but the reverse is not true. What's happened is the bit pattern read from the file is being viewed as a structure, and since the bit pattern in the file isn't a structure the results are insane. Commented Aug 28, 2021 at 0:21
  • (header*) is an explicit type conversion. You should avoid these most of the time because they tell the compiler to turn off its brain and do exactly what you told it to do. You have to be absolutely certain you are correct and you have to actually be correct because if you aren't, there will be no warning. The compiler produces code that does exactly what you asked for no matter what the outcome will be at runtime. My rule of thumb when I see one of these casts is to assume there's a bug and examine the code more closely. Commented Aug 28, 2021 at 0:25
  • The int numRead = fread(&buff[offset], SIZEOF_CHAR, READ_SIZE, fp); would have been a lot safer as header* hdr; int numRead = fread(&hdr, sizeof(hdr), 1, fp);, but there are still a few gotchas. The file might not contain a valid header and there's no way to check other than to read the header, check numRead to ensure you read enough bytes, and then sanity-check the values read. The byte order of the numbers could be backward. The size of the integers could be different from what was written. Believe it or not, I've seen 32 bit short. Commented Aug 28, 2021 at 0:36
  • @user If you are going to use fread(&hdr, sizeof(hdr), 1, fp); then header* hdr; should be header hdr; instead Commented Aug 28, 2021 at 2:16
  • Yep. Error above. header* hdr; int numRead = fread(&hdr, sizeof(hdr), 1, fp); Should be header hdr; int numRead = fread(&hdr, sizeof(hdr), 1, fp); I removed the ptr from the identifier, but neglected to remove the * that made the variable a pointer in the first place. Commented Aug 29, 2021 at 15:45
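
The corrected suggestion from the comments above (read straight into a header object and check the result of fread) could look roughly like the sketch below. FileHeader and readHeader are hypothetical names, and the fixed-width types are an assumption that the file was written with 2-byte shorts and 4-byte longs. Reading directly into the struct also assumes the file uses the same padding and byte order as the machine running the code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical fixed-width version of the question's struct.
// Assumption: the file was written with 16-bit shorts and 32-bit longs.
struct FileHeader {
    std::uint16_t id;
    std::uint32_t size;
    std::uint32_t time;
    std::uint16_t day;
    std::uint16_t year;
    std::uint16_t source;
    std::uint16_t destination;
};

// Reads exactly one header from the current file position.
// Returns false if fewer than sizeof(FileHeader) bytes were available.
inline bool readHeader(std::FILE* fp, FileHeader& out) {
    assert(fp != nullptr);  // the caller must have checked fopen's result
    // fread returns the number of complete items read, so anything
    // other than 1 means the header is truncated or missing.
    return std::fread(&out, sizeof out, 1, fp) == 1;
}
```

Even when readHeader returns true, the values still need sanity-checking against whatever the real format rules turn out to be, because any sufficiently long file will "succeed".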

1 Answer

Something like this?

using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

public struct Header
{
    public ushort id;
    public uint size;   // C++ unsigned long is 4 bytes here (see the sample output), so uint, not ulong
    public uint time;
    public ushort day;
    public ushort year;
    public ushort source;
    public ushort destination;

    public static Header ReadHeader(FileStream fs, int index = 0)
    {
        var br = new BinaryReader(fs);
        Header header = default(Header);
        int offset = Marshal.SizeOf(header) * index; // 20 bytes per header, padding included
        br.ReadBytes(offset); // eat up bytes until header with index is reached
        header.id = br.ReadUInt16();
        br.ReadBytes(2); // skip the 2 padding bytes the C++ compiler inserts after id
        header.size = br.ReadUInt32();
        header.time = br.ReadUInt32();
        header.day = br.ReadUInt16();
        header.year = br.ReadUInt16();
        header.source = br.ReadUInt16();
        header.destination = br.ReadUInt16();
        return header;
    }
}

class Program
{
    static void Main(string[] args)
    {
        // Dispose the stream when done
        using (var fs = File.OpenRead("data.dat"))
        {
            int index = 0;
            var header = Header.ReadHeader(fs, index);

            Debug.WriteLine($"ID={header.id}, Size={header.size}");
        }
    }
}
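
As a cross-check against the sample output in the question, here is a minimal C++ sketch that decodes the same bytes by hand. It assumes the test file used Windows line endings (Hello\r\nWorld, 12 bytes), that unsigned short is 2 bytes and unsigned long is 4 bytes, and that the compiler inserts 2 padding bytes after id. The 0xCC fill stands in for the part of buff that fread never touched; that is the pattern MSVC debug builds typically write into uninitialized stack memory, which is most likely where 52428 (0xCCCC) comes from. DecodedHeader, readLE16, readLE32, decodeHeader and decodeHelloWorld are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the question's struct with explicit widths.
struct DecodedHeader {
    std::uint16_t id;
    std::uint32_t size;
    std::uint32_t time;
    std::uint16_t day, year, source, destination;
};

// Assemble little-endian integers byte by byte, independent of host byte order.
inline std::uint16_t readLE16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
inline std::uint32_t readLE32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Decode one 20-byte header using the assumed field offsets.
inline DecodedHeader decodeHeader(const unsigned char* buf, std::size_t len) {
    assert(len >= 20);
    DecodedHeader h;
    h.id          = readLE16(buf + 0);
    h.size        = readLE32(buf + 4);   // offsets 2..3 are alignment padding
    h.time        = readLE32(buf + 8);
    h.day         = readLE16(buf + 12);
    h.year        = readLE16(buf + 14);
    h.source      = readLE16(buf + 16);
    h.destination = readLE16(buf + 18);
    return h;
}

// Rebuild the question's buffer: 0xCC everywhere, then the 12 file bytes.
inline DecodedHeader decodeHelloWorld() {
    unsigned char buf[20];
    std::memset(buf, 0xCC, sizeof buf);
    const unsigned char file[] = {'H', 'e', 'l', 'l', 'o', '\r', '\n',
                                  'W', 'o', 'r', 'l', 'd'};
    std::memcpy(buf, file, sizeof file);
    return decodeHeader(buf, sizeof buf);
}
```

decodeHelloWorld() yields id 25928 ("He"), size 1460276591 ("o\r\nW"), time 1684828783 ("orld") and 52428 for the four remaining fields, matching the values in the question. The two "l" characters at offsets 2 and 3 fall into the padding and are never read, which is why a C# port has to skip them too.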

7 Comments

My C# is virtually non-existent, so I gotta ask: Is endian taken into consideration at all?
I assume the magic behind ReadUInt64() reads the file with the correct endianness, but I could be wrong.
I can't see how it can do that since it doesn't know the endianness of the numbers in the file.
@PaulSanders ReadUInt64() assumes the number is stored in little-endian only. So if it is stored in big-endian instead, you are going to have trouble.
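
To illustrate the endianness point in the comments above, here is a small C++ sketch showing how the same four bytes give different numbers depending on the assumed byte order. decodeLittleEndian32, decodeBigEndian32 and swapBytes32 are hypothetical helper names. Whether the real data.dat is little-endian or big-endian is not known from the question, so this only shows how to test both interpretations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Least significant byte first (the order BinaryReader expects).
inline std::uint32_t decodeLittleEndian32(const unsigned char* p, std::size_t len) {
    assert(len >= 4);
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Most significant byte first ("network order").
inline std::uint32_t decodeBigEndian32(const unsigned char* p, std::size_t len) {
    assert(len >= 4);
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Convert a value that was decoded with the wrong byte order.
inline std::uint32_t swapBytes32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

For the bytes 6F 0D 0A 57 (the size field of the question's test file), the little-endian reading is 1460276591 and the big-endian reading is 0x6F0D0A57. If the fields of a real file only look plausible after swapping, the file is big-endian and each value read with BinaryReader needs its bytes reversed.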