Generate an XML Sitemap for a Static Website in GitHub Actions
I use GitHub Pages for my personal website, as well as for several project sites. Although some static site generators include support for sitemap generation (e.g., Jekyll has a plugin for sitemaps), my personal website is generated by a custom static site generator that I built for a few specialized reasons, and most of my project sites for Java libraries consist of a single hand-written HTML page combined with javadoc-generated documentation. So a while back I implemented a GitHub Action, generate-sitemap, that generates an XML sitemap by crawling a GitHub repository containing the HTML of the site. It uses the last commit date of each file to produce the <lastmod> tags. By default, it includes URLs for HTML and PDF files in the sitemap and skips other file extensions in the repository, but it can be configured to include URLs for whatever file extensions you want. It checks the head of each HTML page for a noindex meta tag and excludes such pages from the sitemap. It likewise excludes files that match a Disallow rule in your robots.txt. The generate-sitemap action can be configured in a few other ways as well (see the documentation in the GitHub repository for all details). It is implemented in Python as a container action.
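To make the two exclusion rules concrete, here is a minimal Python sketch of how a crawler could detect a noindex meta tag in the head of a page and match a URL path against Disallow rules. This is an illustration under simplifying assumptions, not the action's actual source code: the names RobotsMetaParser, has_noindex, parse_disallow_rules, and is_blocked are hypothetical, and the robots.txt handling only looks at the "User-agent: *" group and treats each rule as a plain path prefix.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag inside <head> contains noindex."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            # Attribute values can be None (valueless attribute), hence "or".
            name = (attributes.get("name") or "").lower()
            content = (attributes.get("content") or "").lower()
            if name == "robots" and "noindex" in content:
                self.noindex = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def has_noindex(html_text):
    """Return True if the page's head asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.noindex


def parse_disallow_rules(robots_txt):
    """Collect Disallow path prefixes from the 'User-agent: *' group (simplified)."""
    rules = []
    applies = False
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            applies = value == "*"
        elif field == "disallow" and applies and value:
            rules.append(value)
    return rules


def is_blocked(url_path, rules):
    """Return True if url_path starts with any Disallow prefix."""
    return any(url_path.startswith(rule) for rule in rules)
```

A crawler built this way would skip a file when either has_noindex(...) is true for its contents or is_blocked(...) is true for its path relative to the site root.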
Table of Contents: This post is organized as follows:
- Prerequisite Workflow Step
- Example Workflow
- Learn More
- Where You Can Find Me
Prerequisite Workflow Step
In order for the <lastmod> dates to be correctly determined, the step that checks out your repository must use the optional input fetch-depth: 0 of actions/checkout in order to get the full git history, such as with a step like the following:
    steps:
    - name: Checkout the repo
      uses: actions/checkout@v2
      with:
        fetch-depth: 0
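To see why the full history matters, consider how a last-modified date can be read from git. The following Python sketch is one way to do it, and is not necessarily how the action does it internally. The function names last_commit_date and lastmod_tag are hypothetical. It asks git for the committer date of the most recent commit that touched a file. In a shallow clone of depth 1, git only knows about a single commit, so every file would appear to have been last modified in that commit.

```python
import subprocess


def last_commit_date(path, repo_dir="."):
    """Return the ISO 8601 committer date of the last commit touching path.

    Returns None if git is unavailable, repo_dir is not a repository,
    or the file has never been committed.
    """
    try:
        result = subprocess.run(
            ["git", "log", "-1", "--format=%cI", "--", path],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # The git executable is not installed or not on the PATH.
        return None
    date = result.stdout.strip()
    return date or None


def lastmod_tag(iso_date):
    """Wrap an ISO 8601 date string in a sitemap <lastmod> element."""
    return f"<lastmod>{iso_date}</lastmod>"
```

For example, lastmod_tag(last_commit_date("index.html")) would produce the element for index.html, provided the file has been committed at least once.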
Example Workflow
Here is an example workflow. It runs on pushes to the branch main. It starts with the checkout step as described above. The generate-sitemap action assumes that the entire repository is the website by default (you can change that behavior with the input path-to-root). The most important input is probably base-url-path, which specifies the URL to the root of your site. This example workflow includes html and pdf files in the sitemap by default. There are optional inputs that can be used to exclude either of these, and an optional input additional-extensions that can be used to include files of any other specific type you desire in the sitemap.
    name: Generate xml sitemap

    on:
      push:
        branches: [ main ]

    jobs:
      sitemap_job:
        runs-on: ubuntu-latest
        name: Generate a sitemap
        steps:
        - name: Checkout the repo
          uses: actions/checkout@v3
          with:
            fetch-depth: 0

        - name: Generate the sitemap
          uses: cicirello/generate-sitemap@v1
          with:
            base-url-path: https://www.example.com/

        - name: Commit and push
          run: |
            if [[ `git status --porcelain sitemap.xml` ]]; then
              git config --global user.name 'github-actions'
              git config --global user.email '41898282+github-actions[bot]@users.noreply.github.com'
              git add sitemap.xml
              git commit -m "Automated sitemap update" sitemap.xml
              git push
            fi
The generate-sitemap action doesn't commit and push, so you need a step in your workflow to do that. In the above example workflow, the last step uses a simple shell script to commit and push. This example does the commit as the github-actions bot. If you'd rather be the committer, then adjust that step as necessary. There are also actions in the GitHub Marketplace that can be used for the commit and push step if you prefer.
Learn More
You can find more information about this GitHub Action in its GitHub repository:
cicirello / generate-sitemap
Generate an XML sitemap for a GitHub Pages site using GitHub Actions
generate-sitemap
Check out all of our GitHub Actions: https://actions.cicirello.org/
About
The generate-sitemap GitHub action generates a sitemap for a website hosted on GitHub Pages, and has the following features:
- Support for both xml and txt sitemaps (you choose using one of the action's inputs).
- When generating an xml sitemap, it uses the last commit date of each file to generate the <lastmod> tag in the sitemap entry. If the file was created during that workflow run, but not yet committed, then it instead uses the current date (however, we recommend if possible committing newly created files first).
- Supports URLs for html and pdf files in the sitemap, and has inputs to control the included file types (defaults include both html and pdf files in the sitemap).
- Now also supports including URLs for a user specified list of additional file extensions in the sitemap.
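As a rough illustration of the difference between the two sitemap formats, the following Python sketch writes the same list of URLs as either an xml sitemap (with an optional <lastmod> per entry) or a txt sitemap (one URL per line). It is a simplified sketch and not the action's own code. The function names xml_sitemap and txt_sitemap and the (url, lastmod) pair layout are assumptions made for this example.

```python
from xml.sax.saxutils import escape


def xml_sitemap(entries):
    """Build an xml sitemap from (url, lastmod) pairs; lastmod may be None."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url, lastmod in entries:
        lines.append("<url>")
        # Escape characters such as & so that the output is well-formed XML.
        lines.append(f"<loc>{escape(url)}</loc>")
        if lastmod:
            lines.append(f"<lastmod>{lastmod}</lastmod>")
        lines.append("</url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def txt_sitemap(entries):
    """Build a plain-text sitemap: one URL per line, no dates."""
    return "".join(f"{url}\n" for url, _ in entries)
```

Note that the txt format has no place for a last-modified date, which is why the <lastmod> feature above applies only to xml sitemaps.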
You can also find information about this GitHub Action, as well as others I've implemented and maintain, at the following site (which, by the way, is served via GitHub Pages and uses this action to generate its sitemap):
Where You Can Find Me
Follow me here on DEV:
Follow me on GitHub:
Vincent A Cicirello
If you want to generate the equivalent to the above for your own GitHub profile, check out the cicirello/user-statistician GitHub Action.
Or visit my website:
Original Link: https://dev.to/cicirello/generate-an-xml-sitemap-for-a-static-website-in-github-actions-20do