October 6, 2022 04:40 pm GMT

How I got my new bicycle thanks to AWS!

I bought my first bicycle in 2019. The pandemic had just started, and during that period all team sports were suspended. The bike was already a few years old, and I bought it for a few hundred euros. I really started to enjoy cycling, and I joined a small cycling team.

During the summer of 2021, I started looking around for a new bike. It was time to trade my old one in for a new one. I became interested in a Canyon Endurance CF SL 8, but unfortunately it was always sold out. In September 2021, I subscribed to the mailing list so I would get an email when the model was back in stock.

Canyon CF SL 8

After a few months I started to wonder why I never got an email. I saw that similar bikes were being sold, and after a little investigation on Reddit, I read that the notification system wasn't working properly. The emails were probably sent in small batches, and it didn't take more than a few minutes before all Canyon Endurance CF SL 8s were sold out again.

It was time to look for another method. I read about a tool called Distill, which can check a site every few seconds. The free version works through a browser plugin; to run it without your browser, you need a paid plan.

So I decided to create something similar, but cheaper: a Lambda function to scrape* the web page and check whether the bicycle in my size was still unavailable. I checked the content of the div.

Canyon source code

I made use of the Python module BeautifulSoup to scrape the webpage and find a match for the "Coming soon" text.
If the text changed, I would be notified by email using SNS.

""" Scrape Canyon site."""import requestsimport osimport boto3from bs4 import BeautifulSoupclient = boto3.client("sns")url = "https://www.canyon.com/xxx"def lambda_handler(event, context):    """Main."""    page = requests.get(url)    results = BeautifulSoup(page.content, "html.parser")    items = []    for div in results.findAll(        "div", attrs={"class": "productConfiguration__availabilityMessage"}    ):        text = div.text        items.append(text.strip())    # size small is 4th of the list    small_item = items[3]    print("item: " + small_item)    if "Soon" not in small_item:        print("alert!")        client.publish(            TopicArn=os.environ["TOPIC"],            Message="Time to buy a Canyon!",            Subject="Time to buy a Canyon!",        )

The code (and actually the whole solution) is really basic. I didn't need advanced integrations or checks.
I used a simple EventBridge rule to trigger the Lambda every minute (except when I need to sleep!).

  Event:
    Type: AWS::Events::Rule
    Properties:
      Description: Trigger every minute
      Name: ScraperEvent
      # Run every minute when I don't sleep
      ScheduleExpression: cron(0/1 6-23 * * ? *)
      Targets:
        - Arn: !GetAtt Scraper.Arn # Lambda Arn
          Id: canyon-scraper

  LambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt Scraper.Arn
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt Event.Arn

SNS was used to send notifications.

  Topic:
    Type: AWS::SNS::Topic
    Properties:
      DisplayName: canyon-topic
      Subscription:
        - Endpoint: [email protected]
          Protocol: email
      TopicName: canyon-topic

I also made use of Lambda layers to make the function as fast and lightweight as possible. I used Docker to build my layer and I uploaded it to S3.

$ docker run --rm \
    --volume=$(pwd):/lambda-build \
    -w=/lambda-build \
    lambci/lambda:build-python3.8 \
    pip install -r requirements.txt --target python
$ zip -vr python.zip python/
$ aws s3 cp python.zip s3://xxx-layers/python.zip

I configured my Lambda to make use of this layer.

  Scraper:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: canyon-scraper
      CodeUri: src/
      Handler: lambda.lambda_handler
      Runtime: python3.8
      Role: !GetAtt ScraperRole.Arn
      Environment:
        Variables:
          TOPIC: !Ref Topic
      Layers:
        - !Ref libs

  libs:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: python-lib
      Description: Dependencies for the canyon scraper
      ContentUri: s3://xxx-layers/python.zip
      CompatibleRuntimes:
        - python3.8

The function was fast enough to run with the minimum amount of memory and with a timeout of a few seconds.
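In a SAM template, those settings live on the function resource. The values below are illustrative (not taken from the original setup):

```yaml
  Scraper:
    Type: AWS::Serverless::Function
    Properties:
      MemorySize: 128  # minimum Lambda memory allocation, in MB
      Timeout: 10      # seconds; plenty for a single page fetch
```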

Function Architecture

This basic solution monitored the website every minute. It's important to note that cron expressions with rates faster than 1 minute are not supported. If you need a faster solution, check out this blog post by a fellow Community Builder! Be sure that you're allowed to scrape and that you're not flooding the server!
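One workaround for the one-minute floor (a sketch of mine, not part of the original setup) is to perform several checks within a single invocation, sleeping between them; check_site is a hypothetical stand-in for the scrape-and-compare logic above:

```python
import time


def check_site():
    """Hypothetical stand-in for the scrape-and-compare logic."""
    return False  # True would mean the bike is available


def lambda_handler(event, context, checks=4, interval=15.0):
    """Run several checks per one-minute EventBridge invocation.

    With checks=4 and interval=15.0 the site is polled roughly
    every 15 seconds; keep checks * interval under the Lambda timeout.
    """
    for _ in range(checks):
        if check_site():
            return "alert"
        time.sleep(interval)
    return "no change"
```

Keep in mind this multiplies the request rate, so the warning above about not flooding the server applies even more.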

Now I just had to wait, and after a month I got an email...

Phone Notification

And I was able to order my favorite bicycle! It gave me great satisfaction. Like many, I'm interested in new AWS features, fancy integrations, and big setups, but sometimes you don't need the latest shiny tools to accomplish your goal.

All code is available here. Feel free to fork it and adapt it to your needs.

*Web scraping is legal if you follow the rules (avoid scraping personal data or intellectual property, check the copyright notice and robots.txt of the website, and don't flood the servers!).


Original Link: https://dev.to/aws-builders/how-i-got-my-new-bicycle-thanks-to-aws-1d8e
