
Streaming files from AWS S3 using NodeJS Stream API with Typescript

The AWS s3 SDK and NodeJS read/write streams make it easy to download files from an AWS bucket. However, what if you wanted to stream the files instead?

There is a timeout on connections to an AWS s3 instance set to 120000ms (2 minutes). Unless you have very small files, this just won't cut it for streaming.

One option is to simply raise that timeout, but then how much should you raise it? Since the timeout covers the total time a connection can last, you would have to either make the timeout some ridiculous amount, or guess how long it will take to stream the file and update the timeout accordingly. This also doesn't account for the stream closing due to HTTP(S)'s own timeouts.
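For reference, here is a minimal sketch of that naive fix using the aws-sdk v2 client config; the 10-minute value is an arbitrary guess, which is exactly the problem:

import { S3 } from 'aws-sdk';

// Raising the connection timeout (default 120000ms) just trades one guess for another
const s3 = new S3({
    httpOptions: {
        timeout: 10 * 60 * 1000 // 10 minutes -- still a guess for a large file
    }
});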

Instead of making guesses and fighting random bugs, we can make use of the NodeJS Stream API and create our very own custom readable "smart stream".

Before we begin

I am assuming you have used the AWS s3 SDK to download files successfully and now want to convert that functionality to a proper stream. As such, I will omit the AWS implementation details and instead show a simple example of how, and where, to instantiate this "smart stream" class.

I am also assuming you have a (basic) understanding of NodeJS and NodeJS read/write streams.

Smart Streaming

The idea is to create a stream that uses the power of the AWS s3 ability to grab a range of data with a single request. We can then grab another range of data with a new request, and so on. This stream will pause when its buffer is full, only requesting new data on an as-needed basis. This allows us to process the data as we grab it without fear of running into an issue with the timeout!
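Under the hood this relies on S3's support for ranged reads via the HTTP Range header. As a rough sketch, assuming an S3 client instance named s3 (the bucket and key names are placeholders), a single ranged request looks like this:

// A single ranged request for the first 64 KiB of an object
s3.getObject(
    { Bucket: 'my-bucket', Key: 'my-key', Range: 'bytes=0-65535' },
    (error, data) => {
        if (error) throw error;
        console.log((data.Body as Buffer).length); // at most 65536 bytes
    }
);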

We will start by creating the "smart stream" class:

import { Readable, ReadableOptions } from 'stream';
import type { S3 } from 'aws-sdk';

export class SmartStream extends Readable {
    _currentCursorPosition = 0; // Holds the current starting position for our range queries
    _s3DataRange = 64 * 1024; // Amount of bytes to grab
    _maxContentLength: number; // Total number of bytes in the file
    _s3: S3; // AWS.S3 instance
    _s3StreamParams: S3.GetObjectRequest; // Parameters passed into s3.getObject method

    constructor(
        parameters: S3.GetObjectRequest,
        s3: S3,
        maxLength: number,
        // You can pass any ReadableStream options to the NodeJS Readable super class here
        // For this example we won't use this, however I left it in to be more robust
        nodeReadableStreamOptions?: ReadableOptions
    ) {
        super(nodeReadableStreamOptions);
        this._maxContentLength = maxLength;
        this._s3 = s3;
        this._s3StreamParams = parameters;
    }

    _read() {
        if (this._currentCursorPosition > this._maxContentLength) {
            // If the current position is greater than the amount of bytes in the file
            // We push null into the buffer, NodeJS ReadableStream will see this as the end of file (EOF) and emit the 'end' event
            this.push(null);
        } else {
            // Calculate the range of bytes we want to grab
            const range = this._currentCursorPosition + this._s3DataRange;
            // If the range is greater than the total number of bytes in the file
            // We adjust the range to grab the remaining bytes of data
            const adjustedRange = range < this._maxContentLength ? range : this._maxContentLength;
            // Set the Range property on our s3 stream parameters
            this._s3StreamParams.Range = `bytes=${this._currentCursorPosition}-${adjustedRange}`;
            // Update the current range beginning for the next go
            this._currentCursorPosition = adjustedRange + 1;
            // Grab the range of bytes from the file
            this._s3.getObject(this._s3StreamParams, (error, data) => {
                if (error) {
                    // If we encounter an error grabbing the bytes
                    // We destroy the stream, NodeJS ReadableStream will emit the 'error' event
                    this.destroy(error);
                } else {
                    // We push the data into the stream buffer
                    this.push(data.Body);
                }
            });
        }
    }
}

Let's break this down a bit

We are extending the Readable class from the NodeJS Stream API to add some functionality needed to implement our "smart stream". I have placed underscores (_) before some of the properties to separate our custom implementation from functionality we get, right out of the box, from the Readable super class.

The Readable class has a buffer that we can push data into. Once this buffer is full, we stop requesting more data from our AWS s3 instance and instead push the data to another stream (or wherever we want the data to go). When we have room in the buffer, we make another request to grab a range of bytes. We repeat this until the entire file is read.
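The size of that internal buffer is governed by the Readable highWaterMark option (16 KiB by default for byte streams), which is exactly what the ReadableOptions parameter in our constructor lets you tune. A hypothetical example, assuming the bucketParams, s3, and contentLength variables are already in scope:

// Hypothetical usage: a larger internal buffer means fewer pauses between range requests
const stream = new SmartStream(bucketParams, s3, contentLength, {
    highWaterMark: 256 * 1024 // buffer up to 256 KiB before _read() stops being called
});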

The beauty of this simple implementation is that you have access to all of the event listeners and functionality you would expect from a NodeJS readStream. You can even pipe this stream into 'gzip' and stream zipped files!
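For instance, here is a rough sketch of that gzip idea, assuming you already have a SmartStream instance named smartStream (the output path is a placeholder):

import { createGzip } from 'zlib';
import { createWriteStream } from 'fs';

// Pipe the ranged S3 data through gzip and into a local file
smartStream
    .pipe(createGzip())
    .pipe(createWriteStream('/tmp/my-file.gz'))
    .on('finish', () => console.log('Download complete!'));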

Now that we have the SmartStream class coded, we are ready to wire it into a program.

Implementing with AWS S3

For this next part, as I am assuming you understand the AWS s3 SDK, I am simply going to offer an example of how to establish the stream.

import { SmartStream } from <Path to SmartStream file>;

export async function createAWSStream(): Promise<SmartStream> {
    return new Promise((resolve, reject) => {
        const bucketParams = {
            Bucket: <Your Bucket>,
            Key: <Your Key>
        };
        try {
            const s3 = resolveS3Instance();
            s3.headObject(bucketParams, (error, data) => {
                if (error) {
                    // Throwing inside this async callback would not be caught by the
                    // try/catch below, so we reject the promise directly
                    reject(error);
                    return;
                }
                // After getting the data we want from the call to s3.headObject
                // We have everything we need to instantiate our SmartStream class
                // If you want to pass ReadableOptions to the Readable class, you pass the object as the fourth parameter
                const stream = new SmartStream(bucketParams, s3, data.ContentLength);
                resolve(stream);
            });
        } catch (error) {
            reject(error);
        }
    });
}
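From there, wiring it into a program is just ordinary stream handling. A minimal usage sketch (the event handlers here are only illustrative):

const stream = await createAWSStream();

stream.on('data', (chunk: Buffer) => {
    // Process each ranged chunk as it arrives
    console.log(`received ${chunk.length} bytes`);
});
stream.on('error', (error) => console.error(error));
stream.on('end', () => console.log('Finished streaming the file!'));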

Thank you for reading! If you would like a part 2 where we use this stream (possibly to stream data to a frontend) let me know in the comments below!

Further Reading

This is only one example of the amazing things you can do with the NodeJS standard Stream API. For further reading, check out the NodeJS Stream API docs!


Original Link: https://dev.to/about14sheep/streaming-data-from-aws-s3-using-nodejs-stream-api-and-typescript-3dj0
