May 26, 2021 03:26 pm GMT

Fetching and reading files from S3 using Go

Figuring out how to do simple tasks with the AWS SDK for a particular service can be difficult, given that the AWS documentation is sometimes limited and gives you only the bare minimum. Today I'll show you how to fetch and read particular files from S3 using Go. This tutorial collates many hours of research into what should be a simple problem.

Prerequisites include:

  • Go installed / previous experience with Go.
  • AWS-SDK set up / previous development with AWS-SDK.

Basic imports

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"
    "strings"

    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

Defining global variables and structs.

Start off by defining some basic structs and global variables.

type S3Bucket struct {
    Bucket string `json:"bucket"`
    Key    string `json:"key"`
}

type Metrics struct {
    RMSE string `json:"rmse"`
    MAE  string `json:"mae"`
    MAPE string `json:"mape"`
}

var pageNum int = 0
var s3Buckets []S3Bucket
var finalMetrics []Metrics
var sess *session.Session

Initiating a session.

Firstly we initialise a session that the SDK uses to load credentials from the shared credentials file ~/.aws/credentials, and create a new Amazon S3 service client.

sess, err := session.NewSession(&aws.Config{
    Region: aws.String(conf.AWS_REGION),
})
if err != nil {
    exitErrorf("Unable to create a new session %v", err)
}

Listing items in a bucket with pagination.

The AWS docs only give an example of accessing a bucket's files using the ListObjectsV2 function. The problem I encountered with this function is that it does not allow us to apply our own custom function to the results in order to filter them further. Another problem is that it returns up to 1,000 of the objects in a bucket with each request, including sub-paths to the files you wish to read.

ListObjectsV2 lists all objects in our S3 bucket tree, even objects that do not contain files. If we want to target certain objects, we have to apply a function of our own. So instead we'll use ListObjectsV2Pages, which iterates over the pages of a ListObjectsV2 operation, calling a function with the response data for each page. To stop iterating, the function returns false.

As shown below, I wish to target only the .json files in each page and append them to the s3Buckets slice. This part is important, as it lets us know the location of each file so we can then access its contents!

We pass our main bucket name in as conf.S3_BUCKET and our object path (if there is one) as conf.S3_PREFIX.

svc := s3.New(sess)

err = svc.ListObjectsV2Pages(
    &s3.ListObjectsV2Input{
        Bucket: aws.String(conf.S3_BUCKET),
        Prefix: aws.String(conf.S3_PREFIX),
    },
    func(page *s3.ListObjectsV2Output, lastPage bool) bool {
        pageNum++
        for _, item := range page.Contents {
            if strings.Contains(*item.Key, "json") {
                s3Buckets = append(s3Buckets, S3Bucket{Bucket: conf.S3_BUCKET, Key: *item.Key})
            }
        }
        return pageNum < 100
    })
if err != nil {
    exitErrorf("Unable to list items in bucket %q, %v", conf.S3_BUCKET, err)
}
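The filtering step can also be looked at in isolation. The filterJSONKeys helper below is hypothetical (it is not part of the post's code); it uses strings.HasSuffix, which is stricter than the strings.Contains check above, since Contains would also match a key such as "backup/json-notes.txt":

```go
package main

import (
	"fmt"
	"strings"
)

// filterJSONKeys keeps only the object keys that end in ".json",
// a stricter match than strings.Contains(key, "json").
func filterJSONKeys(keys []string) []string {
	var out []string
	for _, k := range keys {
		if strings.HasSuffix(k, ".json") {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{
		"models/run-1/metrics.json",
		"models/run-1/model.bin",
		"backup/json-notes.txt",
	}
	fmt.Println(filterJSONKeys(keys))
	// → [models/run-1/metrics.json]
}
```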

Accessing the object contents.

Using the s3Buckets slice, we take the Bucket and Key from each struct, build a GetObjectInput request for the object information (in other words, the file), and then fetch the object itself.

for _, item := range s3Buckets {
    requestInput := &s3.GetObjectInput{
        Bucket: aws.String(item.Bucket),
        Key:    aws.String(item.Key),
    }

    result, err := svc.GetObject(requestInput)
    if err != nil {
        log.Print(err)
        continue // skip this object rather than reading a failed response
    }

Reading the contents into a struct.

The JSON file 'result' is read with the ioutil.ReadAll() function, which returns a byte slice that is then decoded into a Metrics struct instance using the json.Unmarshal() function.

The best tutorial I have found regarding reading JSON into a struct is this one: Parsing JSON

    // Close the body explicitly: a defer inside a loop would not
    // run until the whole function returns.
    body, err := ioutil.ReadAll(result.Body)
    result.Body.Close()
    if err != nil {
        log.Print(err)
        continue
    }

    var metrics Metrics
    err = json.Unmarshal(body, &metrics)
    if err != nil {
        log.Printf("error unmarshalling metrics for %s: %v", item.Key, err)
        continue
    }

    finalMetrics = append(finalMetrics, metrics)
}

And that's it! You have now fetched JSON files from a given bucket and parsed the results into a struct. In my opinion, fetching the contents of an S3 file is hugely important, especially in machine learning: as engineers we constantly want to, for example, compare past models' performance or fetch additional data features to append to our models.


Original Link: https://dev.to/seanyboi/fetching-and-reading-files-from-s3-using-go-4180
