December 15, 2017

Fixing a corrupt archive in Golang

Recently I had an issue sending JSON via API to our analytics tool. After a few days of troubleshooting, a weekend gave it away: with less data on weekends, the uploaded file was smaller, which pointed to file size as the problem. Naturally, I decided to zip its contents, only to run into a corrupted archive.

The process we use to upload data to our analytics tool is as follows:

  1. Query the datastore for the last 24 hours of data (applying filters, calculations, etc.)

  2. Upload the file to GCP Storage (so that, if a later step fails, we can upload it manually)

  3. Download the file from GCP Storage and send it in an HTTP request

A cron job runs this every midnight. As mentioned earlier, one of the entities had trouble exporting because of its size.

I decided to compress the file using archive/zip, but ended up with corrupted archives. On a Mac, opening the archive failed with an error.

This is how I tried to compress my file:

```go
import (
	"archive/zip"
	"bytes"
	"context"
	"io"
	"io/ioutil"

	"cloud.google.com/go/storage"
	"google.golang.org/appengine"
)

// bucketName (defined elsewhere) maps an App Engine app ID to a bucket name.

func GetZip(c context.Context, path string) ([]byte, error) {
	client, err := storage.NewClient(c)
	if err != nil {
		return nil, err
	}
	defer client.Close()

	appID := appengine.AppID(c)
	bucketH := client.Bucket(bucketName[appID])

	// Downloading file from bucket (GCP Storage)
	reader, err := bucketH.Object(path + "/~.json").NewReader(c)
	if err != nil {
		return nil, err
	}
	defer reader.Close()

	bts, err := ioutil.ReadAll(reader)
	if err != nil {
		return nil, err
	}

	// Fixing JSON issues we had
	bts = bytes.TrimSuffix(bts, []byte(","))
	content := bytes.NewReader(bts)

	buf := new(bytes.Buffer)
	w := zip.NewWriter(buf)

	// The problem: this deferred Close runs only after buf.Bytes() below has been evaluated.
	defer w.Close()

	f, err := w.Create("report.json")
	if err != nil {
		return nil, err
	}

	_, err = f.Write([]byte("["))
	if err != nil {
		return nil, err
	}

	_, err = io.Copy(f, content)
	if err != nil {
		return nil, err
	}

	_, err = f.Write([]byte("]"))
	if err != nil {
		return nil, err
	}

	return buf.Bytes(), nil
}
```

Notice the defer w.Close() line. Searching for a solution led me to the archive/zip tests on golang.org, which close the zip writer explicitly at the end instead of deferring the call. Making this small change to my code (err = w.Close() instead of defer w.Close()) fixed the issue. In the end, this is what I got (minus the JSON alteration part):

```go
import (
	"archive/zip"
	"bytes"
	"context"
	"io"

	"cloud.google.com/go/storage"
	"google.golang.org/appengine"
)

// bucketName (defined elsewhere) maps an App Engine app ID to a bucket name.

func GetZip(c context.Context, path string) ([]byte, error) {
	client, err := storage.NewClient(c)
	if err != nil {
		return nil, err
	}
	defer client.Close()

	appID := appengine.AppID(c)
	bucketH := client.Bucket(bucketName[appID])

	reader, err := bucketH.Object(path + "/~.json").NewReader(c)
	if err != nil {
		return nil, err
	}
	defer reader.Close()

	buf := new(bytes.Buffer)
	w := zip.NewWriter(buf)

	f, err := w.Create("report.json")
	if err != nil {
		return nil, err
	}

	_, err = io.Copy(f, reader)
	if err != nil {
		return nil, err
	}

	// Close writes the central directory, so it must run before buf.Bytes() is read.
	err = w.Close()
	if err != nil {
		return nil, err
	}

	return buf.Bytes(), nil
}
```

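This happens because closing the zip writer “finishes writing the zip file by writing the central directory” (as the archive/zip documentation for Writer.Close puts it); Close also flushes the writer's internal buffer. Deferred calls, on the other hand, run only after the return statement's values have been evaluated, so buf.Bytes() is called before Close() runs, and the returned bytes lack the “central directory” (and possibly some still-buffered data).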

2018 © Emir Ribic - Some rights reserved; please attribute properly and link back. Code snippets are MIT Licensed