30

I am looking to convert a string array to a byte array in Go so I can write it to disk. What is an optimal way to encode and decode a string array ([]string) to a byte array ([]byte)?

I was thinking of iterating the string array twice: a first pass to compute the size needed for the byte array, and a second pass to write the length and the actual bytes ([]byte(str)) of each element.

The solution must be able to convert the other way as well: from a []byte back to a []string.
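Something like the following is what I have in mind, as a rough, untested sketch (the 4-byte big-endian length prefix is just one possible choice; needs import "encoding/binary"):

func encode(strs []string) []byte {
    // First pass: compute the total size so the buffer is allocated only once.
    size := 0
    for _, s := range strs {
        size += 4 + len(s) // 4-byte length prefix + the string's bytes
    }
    buf := make([]byte, 0, size)
    // Second pass: write each length followed by the string's bytes.
    for _, s := range strs {
        var l [4]byte
        binary.BigEndian.PutUint32(l[:], uint32(len(s)))
        buf = append(buf, l[:]...)
        buf = append(buf, s...)
    }
    return buf
}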

1 Comment
    We need a little more to go on to suggest a good solution. Do you only need to read and write this file from Go? If so, encoding/gob is a great solution. Is there a delimiter such as \n that you could use instead of writing the lengths? If so, strings.Join and strings.Split might be good. Otherwise, what are the requirements for the file format? Does it need to be human readable? Note that hardly any solutions require you to convert []string to []byte before writing to the disk. More generally I think you want to serialize []string to a file and then be able to read it back in again. Commented Nov 26, 2012 at 22:32

8 Answers

31

Let's ignore the fact that this is Go for a second. The first thing you need is a serialization format to marshal the []string into.

There are many options here. You could build your own or use a library. I am going to assume you don't want to build your own and jump to the serialization formats Go supports.

In all examples, data is the []string and fp is the file you are reading/writing. Errors are ignored for brevity; check the return values of these functions to handle errors.

Gob

Gob is a Go-only binary format. It should be relatively space efficient as the number of strings increases.

enc := gob.NewEncoder(fp)
enc.Encode(data)

Reading is also simple

var data []string
dec := gob.NewDecoder(fp)
dec.Decode(&data)

Gob is simple and to the point. However, the format is only readable with other Go code.

JSON

Next is JSON. JSON is a format used just about everywhere. This format is just as easy to use.

enc := json.NewEncoder(fp)
enc.Encode(data)

And for reading:

var data []string
dec := json.NewDecoder(fp)
dec.Decode(&data)

XML

XML is another common format. However, it has pretty high overhead and is not as easy to use. While you could do the same thing you did for gob and JSON, proper XML requires a root element. In this case, we use the root element "Strings" and wrap each string in an "S" element.

type Strings struct {
    S []string
}

enc := xml.NewEncoder(fp)
enc.Encode(Strings{data})

var x Strings
dec := xml.NewDecoder(fp)
dec.Decode(&x)
data := x.S

CSV

CSV is different from the others. You have two options: use one record with n fields, or n records with one field each. The following example uses n records. It would be boring if I used one record; it would look too much like the others. CSV can ONLY hold strings. A sketch of the one-record variant is shown after the reading example below.

enc := csv.NewWriter(fp)
for _, v := range data {
    enc.Write([]string{v})
}
enc.Flush()

To read:

var err error
var data []string
dec := csv.NewReader(fp)
for err == nil {        // reading ends when an error is reached (perhaps io.EOF)
    var s []string

    s, err = dec.Read()
    if len(s) > 0 {
        data = append(data, s[0])
    }
}
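For completeness, here is a minimal sketch of the one-record variant mentioned above (all strings in a single row; same data and fp as before):

// Write all strings as a single CSV record (one row).
enc := csv.NewWriter(fp)
enc.Write(data)
enc.Flush()

// Read the single record back.
dec := csv.NewReader(fp)
dec.FieldsPerRecord = -1 // don't enforce a fixed field count
record, _ := dec.Read()  // record is the []string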

Which format you use is a matter of preference. There are many other possible encodings that I have not mentioned. For example, there is an external library called bencode. I don't personally like bencode, but it works. It is the same encoding used by BitTorrent metadata files.

If you want to make your own encoding, encoding/binary is a good place to start. That would allow you to make the most compact file possible, but I hardly think it is worth the effort.
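For illustration only, a rough sketch of that route (length-prefixed strings; errors ignored; needs "encoding/binary" and "io"):

// Encode: write a count, then each string as a uint32 length followed by its bytes.
binary.Write(fp, binary.BigEndian, uint32(len(data)))
for _, s := range data {
    binary.Write(fp, binary.BigEndian, uint32(len(s)))
    fp.Write([]byte(s))
}

// Decode: read the count, then each length-prefixed string.
var n uint32
binary.Read(fp, binary.BigEndian, &n)
data = make([]string, n)
for i := range data {
    var l uint32
    binary.Read(fp, binary.BigEndian, &l)
    buf := make([]byte, l)
    io.ReadFull(fp, buf)
    data[i] = string(buf)
}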



12

The gob package will do this for you http://godoc.org/encoding/gob

Example to play with http://play.golang.org/p/e0FEZm-qiS

The same source code is below.

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

func main() {
    // store to byte array
    strs := []string{"foo", "bar"}
    buf := &bytes.Buffer{}
    gob.NewEncoder(buf).Encode(strs)
    bs := buf.Bytes()
    fmt.Printf("%q", bs)

    // Decode it back
    strs2 := []string{}
    gob.NewDecoder(buf).Decode(&strs2)
    fmt.Printf("%v", strs2)
}


2

to convert []string to []byte

var str = []string{"str1","str2"}
var x = []byte{}

for i:=0; i<len(str); i++{
    b := []byte(str[i])
    for j:=0; j<len(b); j++{
        x = append(x,b[j])
    }
}

to convert []byte to string

str := ""
var x = []byte{'c','a','t'}
for i := 0; i < len(x); i++ {
    str += string(x[i])
}

3 Comments

The code doesn't compile; it's not valid Go code. The loop could be written more idiomatically and simply as: for _, s := range str { x = append(x, s...) }. It doesn't solve the problem: "The solution must be able to convert it the other-way; from a []byte to a string[]."
No, convert (encode) from []string to []byte then convert (decode) from []byte to []string. This is harder than you think.
I've posted a simple solution, as an answer, to illustrate the problem.
2

To illustrate the problem, convert []string to []byte and then convert []byte back to []string, here's a simple solution:

package main

import (
    "encoding/binary"
    "fmt"
)

const maxInt32 = 1<<(32-1) - 1

func writeLen(b []byte, l int) []byte {
    if 0 > l || l > maxInt32 {
        panic("writeLen: invalid length")
    }
    var lb [4]byte
    binary.BigEndian.PutUint32(lb[:], uint32(l))
    return append(b, lb[:]...)
}

func readLen(b []byte) ([]byte, int) {
    if len(b) < 4 {
        panic("readLen: invalid length")
    }
    l := binary.BigEndian.Uint32(b)
    if l > maxInt32 {
        panic("readLen: invalid length")
    }
    return b[4:], int(l)
}

func Decode(b []byte) []string {
    b, ls := readLen(b)
    s := make([]string, ls)
    for i := range s {
        b, ls = readLen(b)
        s[i] = string(b[:ls])
        b = b[ls:]
    }
    return s
}

func Encode(s []string) []byte {
    var b []byte
    b = writeLen(b, len(s))
    for _, ss := range s {
        b = writeLen(b, len(ss))
        b = append(b, ss...)
    }
    return b
}

func codecEqual(s []string) bool {
    return fmt.Sprint(s) == fmt.Sprint(Decode(Encode(s)))
}

func main() {
    var s []string
    fmt.Println("equal", codecEqual(s))
    s = []string{"", "a", "bc"}
    e := Encode(s)
    d := Decode(e)
    fmt.Println("s", len(s), s)
    fmt.Println("e", len(e), e)
    fmt.Println("d", len(d), d)
    fmt.Println("equal", codecEqual(s))
}

Output:

equal true
s 3 [ a bc]
e 19 [0 0 0 3 0 0 0 0 0 0 0 1 97 0 0 0 2 98 99]
d 3 [ a bc]
equal true

1 Comment

how is something this simple not included in the go standard library?
2

It can be done easily using the strings package. First you need to convert the slice of strings to a single string.

func Join(elems []string, sep string) string

You need to pass the slice of strings and the separator you want between the elements in the resulting string (for example, a space or a comma).

Then you can easily convert the string to a slice of bytes by type conversion.

package main

import (
    "fmt"
    "strings"
)

func main() {
    //Slice of Strings
    sliceStr := []string{"a","b","c","d"}
    fmt.Println(sliceStr) //prints [a b c d]

    //Converting slice of String to String
    str := strings.Join(sliceStr,"")
    fmt.Println(str)  // prints abcd

    //Converting String to slice of Bytes
    sliceByte := []byte(str)
    fmt.Println(sliceByte) //prints [97 98 99 100]

    //Converting slice of bytes a String
    str2 := string(sliceByte)
    fmt.Println(str2) // prints abcd

    //Converting string to a slice of Strings
    sliceStr2 := strings.Split(str2,"")
    fmt.Println(sliceStr2) //prints [a b c d]
}

1 Comment

Hm! Now that's a truly intriguing solution. I'm only curious about the performance; it looks scary under many of the other answers, but yours seems to be OK and leaves the nitty-gritty details to the strings package. I've been using strings.Join() all this time but never thought to use it to convert the whole array of strings into an array of bytes... you need more upvotes on this answer :-)
1

I would suggest using binary.PutUvarint and binary.Uvarint for storing/retrieving len(s), and using []byte(str) to pass str to some io.Writer. With the string length known from Uvarint, one can buf := make([]byte, n) and pass the buf to some io.Reader.

Prepend the whole thing with the length of the string array and repeat the above for all of its items. Reading the whole thing back means first reading the outer length and then repeating the item read n times.
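A minimal, untested sketch of that scheme (the function names are just for illustration; errors are ignored; needs "bufio", "encoding/binary", and "io"):

// Encode: outer count, then each string as a uvarint length followed by its bytes.
func encodeTo(w io.Writer, s []string) {
    var lb [binary.MaxVarintLen64]byte
    n := binary.PutUvarint(lb[:], uint64(len(s)))
    w.Write(lb[:n])
    for _, str := range s {
        n = binary.PutUvarint(lb[:], uint64(len(str)))
        w.Write(lb[:n])
        w.Write([]byte(str))
    }
}

// Decode: read the outer count, then that many length-prefixed strings.
func decodeFrom(r *bufio.Reader) []string {
    count, _ := binary.ReadUvarint(r)
    out := make([]string, count)
    for i := range out {
        l, _ := binary.ReadUvarint(r)
        buf := make([]byte, l)
        io.ReadFull(r, buf)
        out[i] = string(buf)
    }
    return out
}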


1

You can do something like this:

var lines []string
var ctx = []byte{}
for _, s := range lines {
    ctx = append(ctx, []byte(s)...)
}


0

I had a similar problem and ran across this question. This may not address all use cases, but I preferred it to using gob, json, etc. This will convert an array of strings into one long string with newlines separating the original elements, and then convert that string into a byte array.

    mystrings := []string{"ay", "bee", "see"} // array of strings
    longstr := strings.Join(mystrings, "\n") // long string with newlines

    bytes := []byte(longstr) // the resulting byte array.
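Converting back is just a split on the same delimiter (this assumes none of the original strings contain a newline):

    longstr2 := string(bytes)                   // byte array back to one long string
    mystrings2 := strings.Split(longstr2, "\n") // back to []string{"ay", "bee", "see"}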

1 Comment

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
