November 23, 2022 02:00 pm GMT

Generate an XML Sitemap for a Static Website in GitHub Actions

I use GitHub Pages for my personal website, as well as for several project sites. Although some static site generators include support for sitemap generation (e.g., Jekyll has a plugin for sitemaps), my personal website is generated by a custom static site generator that I built for a few specialized reasons, and most of my project sites for Java libraries consist of a single hand-written HTML page combined with javadoc-generated documentation.

So a while back I implemented a GitHub Action, generate-sitemap, that generates an XML sitemap by crawling a GitHub repository containing the HTML of the site. It uses the last commit date of each file to produce the <lastmod> tags. By default, it includes URLs for HTML and PDF files in the sitemap and skips all other file extensions in the repository, but it can be configured to include URLs for whatever file extensions you want. It checks the head of each HTML page for noindex meta tags and excludes such pages from the sitemap, and it likewise excludes any file that matches a Disallow rule in your robots.txt. The generate-sitemap action can be configured in a few other ways as well (see the documentation in the GitHub repository for all details). It is implemented in Python as a container action.
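To make the exclusion rules described above concrete, here is a minimal Python sketch of that kind of filtering. It is not the action's actual source code: the names RobotsMetaParser, has_noindex, and should_include are hypothetical, the sketch scans the whole page rather than only the head, and it relies on the standard library's robots.txt parser, which may differ from how the action itself reads robots.txt.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag containing noindex was seen."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if (attributes.get("name", "").lower() == "robots"
                and "noindex" in attributes.get("content", "").lower()):
            self.noindex = True


def has_noindex(html_text):
    """Return True if the page declares a robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.noindex


def should_include(url_path, html_text, robots_txt, extensions=("html", "pdf")):
    """Hypothetical helper: apply the three exclusion rules in order."""
    # Rule 1: only the configured file extensions are listed.
    extension = url_path.rsplit(".", 1)[-1].lower() if "." in url_path else ""
    if extension not in extensions:
        return False
    # Rule 2: skip paths matched by a Disallow rule in robots.txt.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("*", url_path):
        return False
    # Rule 3: skip HTML pages that declare noindex.
    if extension == "html" and html_text is not None and has_noindex(html_text):
        return False
    return True
```

Passing a different tuple for extensions corresponds loosely to configuring which file types are included.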

Table of Contents: This post is organized as follows:

  • Prerequisite Workflow Step
  • Example Workflow
  • Learn More
  • Where You Can Find Me

Prerequisite Workflow Step

For the <lastmod> dates to be determined correctly, the step that checks out your repository must set actions/checkout's optional input fetch-depth: 0 to fetch the full git history, as in the following step:

    steps:
    - name: Checkout the repo
      uses: actions/checkout@v2
      with:
        fetch-depth: 0
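The reason the full history matters: with a shallow checkout, git only knows about the most recent commit, so every file appears to have been last modified in that one commit. The following Python sketch shows one way a last-commit date could be looked up and turned into a <lastmod> value. It is an illustration under my own assumptions, not the action's code; the function names are hypothetical, and the fallback to the current date mirrors the behavior described for files that have not been committed yet.

```python
import subprocess
from datetime import datetime, timezone


def git_last_commit_date(path):
    """Ask git for the committer date (strict ISO 8601) of the last commit
    that touched path. Returns an empty string if no commit touched it."""
    result = subprocess.run(
        ["git", "log", "-1", "--format=%cI", "--", path],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def choose_lastmod(git_output, now=None):
    """Use the git date when there is one; otherwise fall back to the
    current time in UTC (the uncommitted-file case)."""
    date = git_output.strip()
    if date:
        return date
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).isoformat(timespec="seconds")
```

In a repository checked out with fetch-depth: 0, choose_lastmod(git_last_commit_date("index.html")) would then reflect the commit that actually last changed that file.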

Example Workflow

Here is an example workflow. It runs on pushes to the branch main. It then starts with the checkout as described above. The generate-sitemap action assumes that the entire repository is the website by default (you can change that behavior with the input path-to-root). The most important input is probably base-url-path, which specifies the URL to the root of your site. This example workflow includes html and pdf files in the sitemap by default. There are optional inputs that can be used to exclude either of these, and an optional input additional-extensions that can be used to additionally include files of any specific type you desire in the sitemap.

    name: Generate xml sitemap

    on:
      push:
        branches: [ main ]

    jobs:
      sitemap_job:
        runs-on: ubuntu-latest
        name: Generate a sitemap
        steps:
        - name: Checkout the repo
          uses: actions/checkout@v3
          with:
            fetch-depth: 0
        - name: Generate the sitemap
          uses: cicirello/generate-sitemap@v1
          with:
            base-url-path: https://www.example.com/
        - name: Commit and push
          run: |
            if [[ `git status --porcelain sitemap.xml` ]]; then
              git config --global user.name 'github-actions'
              git config --global user.email '41898282+github-actions[bot]@users.noreply.github.com'
              git add sitemap.xml
              git commit -m "Automated sitemap update" sitemap.xml
              git push
            fi

The generate-sitemap action doesn't commit and push, so you need a step in your workflow to do that. In the above example workflow, the last step uses a simple shell script to commit and push. This example does the commit as the github-actions bot. If you'd rather be the committer, then adjust that step as necessary. There are also actions in the GitHub Marketplace that can be used for the commit and push step if you prefer.
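Since base-url-path is the input you are most likely to set, it may help to see roughly how a file path in the repository could be combined with it to form a sitemap URL. The Python sketch below is only an illustration: url_for is a hypothetical name, and mapping index.html to its directory URL is a common convention that I am assuming here rather than a documented guarantee.

```python
def url_for(base_url, relative_path):
    """Join the site's base URL with a path relative to the site root."""
    path = relative_path
    if path.startswith("./"):
        path = path[2:]
    path = path.lstrip("/")
    # Assumed convention: list a directory's index page by the directory URL.
    if path == "index.html" or path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return base_url.rstrip("/") + "/" + path
```

For example, with a base URL of https://www.example.com/, the file ./docs/index.html would be listed as https://www.example.com/docs/ under these assumptions.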

Learn More

You can find more information about this GitHub Action in its GitHub repository:

cicirello / generate-sitemap

Generate an XML sitemap for a GitHub Pages site using GitHub Actions

cicirello/generate-sitemap - Generate XML sitemaps for static websites in GitHub Actions

Check out all of our GitHub Actions: https://actions.cicirello.org/

The generate-sitemap GitHub action generates a sitemap for a website hosted on GitHub Pages, and has the following features:

  • Support for both xml and txt sitemaps (you choose using one of the action's inputs).
  • When generating an xml sitemap, it uses the last commit date of each file to generate the <lastmod> tag in the sitemap entry. If the file was created during that workflow run, but not yet committed, then it instead uses the current date (however, we recommend if possible committing newly created files first).
  • Supports URLs for html and pdf files in the sitemap, and has inputs to control the included file types (defaults include both html and pdf files in the sitemap).
  • Now also supports including URLs for a user-specified list of additional file extensions in the sitemap.
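To show how the two sitemap formats in the feature list differ, here is a short Python sketch that writes the same entries both ways. The XML layout follows the sitemaps.org protocol and the txt layout is simply one URL per line. The function names are hypothetical, and the exact whitespace the action emits may differ from this sketch.

```python
from xml.sax.saxutils import escape


def to_xml_sitemap(entries):
    """entries: iterable of (url, lastmod) pairs, with lastmod in ISO 8601."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url, lastmod in entries:
        lines += [
            "<url>",
            f"<loc>{escape(url)}</loc>",  # escape &, <, > for valid XML
            f"<lastmod>{lastmod}</lastmod>",
            "</url>",
        ]
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def to_txt_sitemap(entries):
    """A txt sitemap has no lastmod: just one URL per line."""
    return "".join(url + "\n" for url, _ in entries)
```

The txt format is smaller and simpler, but only the xml format can carry the <lastmod> dates taken from the commit history.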

You can also find information about this GitHub Action, as well as others I've implemented and maintain, at the following site (which, by the way, is served via GitHub Pages and uses this action to generate its sitemap):

Vincent Cicirello - Open source GitHub Actions for workflow automation

Features information on several open source GitHub Actions for workflow automation that we have developed to automate parts of the CI/CD pipeline, and other repetitive tasks. The GitHub Actions featured include jacoco-badge-generator, generate-sitemap, user-statistician, and javadoc-cleanup.

actions.cicirello.org

Where You Can Find Me

Follow me on GitHub:

cicirello / cicirello

My GitHub Profile

Vincent A. Cicirello

Sites where you can find me or my work
  • Web and social media: Personal Website, LinkedIn, DEV Profile
  • Software development: GitHub, Maven Central, PyPI, Docker Hub
  • Publications: Google Scholar, ORCID, DBLP, ACM Digital Library, IEEE Xplore, ResearchGate, arXiv

My bibliometrics

My GitHub Activity

If you want to generate the equivalent to the above for your own GitHub profile, check out the cicirello/user-statistician GitHub Action.

Or visit my website:

Vincent A. Cicirello - Professor of Computer Science

Vincent A. Cicirello - Professor of Computer Science at Stockton University - is a researcher in artificial intelligence, evolutionary computation, swarm intelligence, and computational intelligence, with a Ph.D. in Robotics from Carnegie Mellon University. He is an ACM Senior Member, IEEE Senior Member, AAAI Life Member, EAI Distinguished Member, and SIAM Member.

cicirello.org

Original Link: https://dev.to/cicirello/generate-an-xml-sitemap-for-a-static-website-in-github-actions-20do
