July 10, 2021 09:14 am GMT

Go regular expressions

This time, I wanted to tackle the regular expression package in Go. I had to use this library in one of my recent projects, and I have to confess it's not straightforward at first sight. I hope this article makes it easier to approach.

To use the library (https://golang.org/pkg/regexp/), just add:

import (
    ...
    "regexp"
    ...
)

to your code.

What follows is a collection of tips, each solving a specific problem. Throughout the examples, I used the assert Go package to make the expected results easier to read.

Compiling the regular expression

Before calling any method on a regular expression, you need to compile it with Compile() or MustCompile(). The difference between them is that Compile() returns an error when the pattern is invalid, whereas MustCompile() panics. If your code doesn't involve a regex provided at runtime, MustCompile() is the convenient choice: there is no error value to handle, and you know immediately whether your regex syntax is correct.
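To make the difference concrete, here is a minimal sketch. The helpers compileUserPattern and panics are hypothetical names of mine, not part of the regexp package; only regexp.Compile and regexp.MustCompile come from the library.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileUserPattern (hypothetical helper) compiles a pattern that is only
// known at runtime and returns a syntax error instead of panicking.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re, nil
}

// panics (hypothetical helper) reports whether regexp.MustCompile panics
// on the given pattern.
func panics(pattern string) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	regexp.MustCompile(pattern)
	return false
}

func main() {
	// The opening parenthesis is never closed, so the pattern is invalid.
	if _, err := compileUserPattern(`(\d{3}`); err != nil {
		fmt.Println("Compile reported:", err)
	}
	fmt.Println("MustCompile panicked:", panics(`(\d{3}`))
}
```

In real code you would rarely recover from MustCompile; the recover here only makes the panic visible.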

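Note that the POSIX versions, CompilePOSIX() and MustCompilePOSIX(), restrict the pattern to POSIX ERE (egrep) syntax rather than Perl-like syntax, and they switch the match semantics to leftmost-longest.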
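A small sketch of the change in semantics, using a pattern that is valid in both flavors. The helper firstMatch is a hypothetical name used only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch (hypothetical helper) returns what each flavor finds in s
// for the same pattern.
func firstMatch(pattern, s string) (perl, posix string) {
	perl = regexp.MustCompile(pattern).FindString(s)
	posix = regexp.MustCompilePOSIX(pattern).FindString(s)
	return perl, posix
}

func main() {
	// Perl-like matching takes the first alternative that works ("a");
	// POSIX matching takes the longest match at that position ("ab").
	perl, posix := firstMatch(`a|ab`, "ab")
	fmt.Printf("Perl-like: %q, POSIX: %q\n", perl, posix)
}
```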

Does my string match the regexp?

To just check whether a string matches your regex, you can use either the package-level function or the method of a compiled regexp:

// don't worry, fake phone number
phone := "202-555-0147"
phoneRE := `\d{3}\-\d{3}\-\d{4}`

// returns: true <nil>
matched, err := regexp.MatchString(phoneRE, phone)
assert.True(matched)
assert.Nil(err)

// or compile and test
re := regexp.MustCompile(phoneRE)
assert.True(re.MatchString(phone))

// but this returns true as well!
assert.True(re.MatchString("202-555-0147mkljhfQDHMFJ"))

// but not this
re = regexp.MustCompile(`\d{3}\-\d{3}\-\d{4}$`)
assert.False(re.MatchString("202-555-0147mkljhfQDHMFJ"))

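Beware that the string 202-555-0147mkljhfQDHMFJ is matched by the first regex: MatchString() reports whether the string contains any match, and that pattern is not anchored. The trailing $ in the second regex anchors the end of the match only; use ^...$ if the whole string has to match. Which match is picked when several are possible is explained in the Compile() documentation, by the term leftmost: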

When matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses the one that a backtracking search would have found first. This so-called leftmost-first matching is the same semantics that Perl, Python, and other implementations use, although this package implements it without the expense of backtracking. 

You can influence which match is picked with the Longest() method, which makes subsequent searches on that regexp prefer the leftmost-longest match.
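Here is a short sketch of what Longest() changes and what it does not. The helper findBeforeAndAfterLongest is a hypothetical name; the pattern a(|b) is the one used in the package documentation's own example.

```go
package main

import (
	"fmt"
	"regexp"
)

// findBeforeAndAfterLongest (hypothetical helper) runs the same search
// twice on one regexp: first with the default leftmost-first rule, then
// after calling Longest().
func findBeforeAndAfterLongest(pattern, s string) (first, longest string) {
	re := regexp.MustCompile(pattern)
	first = re.FindString(s)
	re.Longest() // affects all later searches on re
	longest = re.FindString(s)
	return first, longest
}

func main() {
	first, longest := findBeforeAndAfterLongest(`a(|b)`, "ab")
	fmt.Printf("%q %q\n", first, longest) // "a" "ab"

	// Longest() changes which match is picked, not whether one exists:
	// an unanchored pattern still matches inside a longer string.
	re := regexp.MustCompile(`\d{3}-\d{3}-\d{4}`)
	re.Longest()
	fmt.Println(re.MatchString("202-555-0147mkljhfQDHMFJ")) // true
}
```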

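If you have to reuse your regex, it's wiser to compile it once with MustCompile() or Compile() and call its methods, because the package-level regexp.MatchString() compiles the pattern again on every call.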
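Putting the two points of this section together, a sketch of how I would check a whole string: compile once at package level and anchor the pattern at both ends. The variable names containsPhone and isPhone are mine, chosen for the illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are compiled once, when the package is initialized.
var (
	// Unanchored: true as soon as the pattern occurs anywhere in the input.
	containsPhone = regexp.MustCompile(`\d{3}-\d{3}-\d{4}`)
	// Anchored at both ends: the whole input must be a phone number.
	isPhone = regexp.MustCompile(`^\d{3}-\d{3}-\d{4}$`)
)

func main() {
	inputs := []string{
		"202-555-0147",
		"202-555-0147mkljhfQDHMFJ",
		"call 202-555-0147",
	}
	for _, s := range inputs {
		fmt.Printf("%q contains=%t whole=%t\n",
			s, containsPhone.MatchString(s), isPhone.MatchString(s))
	}
}
```

Only the first input passes the anchored check, while all three contain a phone number.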

Using capture groups

Capture groups are called submatches in the regexp package. You get access to capture groups with the FindStringSubmatch() method:

// use capture groups
phoneRECaps := `(\d{3})\-(\d{3})\-(\d{4})$`
re = regexp.MustCompile(phoneRECaps)

// matches is a slice of strings, where matches[0] is the whole match,
// matches[1] == "202" etc
matches := re.FindStringSubmatch(phone)

// there are 3 capture groups
assert.Equal(re.NumSubexp(), 3)
assert.Equal(matches[0], "202-555-0147")
assert.Equal(matches[1], "202")
assert.Equal(matches[2], "555")
assert.Equal(matches[3], "0147")
assert.ElementsMatch(matches, []string{"202-555-0147", "202", "555", "0147"})
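One thing the snippet above takes for granted is that the string matches. FindStringSubmatch() returns nil when it does not, so indexing the result blindly would panic. A minimal sketch with two hypothetical helpers, areaCode and areaCodes, the second one using FindAllStringSubmatch() to collect every match in a text:

```go
package main

import (
	"fmt"
	"regexp"
)

// Unanchored on purpose, so that it can find several numbers in one text.
var phoneCaps = regexp.MustCompile(`(\d{3})-(\d{3})-(\d{4})`)

// areaCode (hypothetical helper) returns the first capture group of the
// first match, or false when s contains no phone number.
func areaCode(s string) (string, bool) {
	m := phoneCaps.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// areaCodes (hypothetical helper) collects the first capture group of
// every match in text.
func areaCodes(text string) []string {
	var codes []string
	for _, m := range phoneCaps.FindAllStringSubmatch(text, -1) {
		codes = append(codes, m[1])
	}
	return codes
}

func main() {
	fmt.Println(areaCode("202-555-0147"))
	fmt.Println(areaCode("no number here"))
	fmt.Println(areaCodes("202-555-0147 or 303-555-0199"))
}
```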

Using named capture groups

Named capture groups use the Python-like (?P<name>...) syntax, but you don't have direct access to the value of the submatch for a particular name. You only have indirect and unwieldy access: first get all the names, then get the index corresponding to a name (SubexpIndex() requires Go 1.15 or later), and finally fetch the capture group string at that index:

// use named capture groups
phoneRENamedCaps := `(?P<area>\d{3})\-(?P<exchange>\d{3})\-(?P<line>\d{4})$`
re = regexp.MustCompile(phoneRENamedCaps)

// names is [ area exchange line]; note that the first element is the empty string
names := re.SubexpNames()
assert.ElementsMatch(names, []string{"", "area", "exchange", "line"})

// indirect access to values through names
matches = re.FindStringSubmatch(phone)
assert.Len(matches, 4)

capName := names[1]
nameIndex := re.SubexpIndex(capName)
assert.Equal(matches[nameIndex], "202")

capName = names[2]
nameIndex = re.SubexpIndex(capName)
assert.Equal(matches[nameIndex], "555")

capName = names[3]
nameIndex = re.SubexpIndex(capName)
assert.Equal(matches[nameIndex], "0147")
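If you need this in more than one place, a small wrapper hides the indirection. This is only a sketch: namedGroups is a hypothetical helper of mine, built on SubexpNames() and FindStringSubmatch(), and it assumes group names are unique within the pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

var phoneNamed = regexp.MustCompile(`(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})$`)

// namedGroups (hypothetical helper) returns a map from capture group name
// to captured text, or nil when s does not match.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	groups := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // skip the whole match and unnamed groups
			groups[name] = m[i]
		}
	}
	return groups
}

func main() {
	groups := namedGroups(phoneNamed, "202-555-0147")
	fmt.Println(groups["area"], groups["exchange"], groups["line"])

	// For a single known name, SubexpIndex (Go 1.15+) is enough.
	m := phoneNamed.FindStringSubmatch("202-555-0147")
	fmt.Println(m[phoneNamed.SubexpIndex("line")])
}
```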

Splitting a string

It might be useful sometimes to split a string delimited with characters matching a regexp:

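csv := "a;b;c;;;;d;e;f;;;g"

// a single ";" as delimiter keeps the empty fields
split1 := regexp.MustCompile(";").Split(csv, -1)

// ";+" treats a run of ";" as one delimiter; ";*" gives the same result
// on this input, but it also matches the empty string
split2 := regexp.MustCompile(";+").Split(csv, -1)

assert.Len(split1, 12)
assert.ElementsMatch(split1, []string{"a", "b", "c", "", "", "", "d", "e", "f", "", "", "g"})
assert.Len(split2, 7)
assert.ElementsMatch(split2, []string{"a", "b", "c", "d", "e", "f", "g"})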
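One thing to watch with Split(): a delimiter pattern that can match the empty string, such as ;*, also splits between characters as soon as the fields are longer than one character. The sketch below compares it with ;+ and shows the effect of the second argument, which caps the number of pieces. The helper splitJoined is a hypothetical name; it joins the pieces with "|" only to make them easy to print.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitJoined (hypothetical helper) splits s on pattern into at most n
// pieces (n < 0 means no limit) and joins the pieces with "|".
func splitJoined(pattern, s string, n int) string {
	return strings.Join(regexp.MustCompile(pattern).Split(s, n), "|")
}

func main() {
	// ";*" can match the empty string, so "ab" and "cd" get cut apart.
	fmt.Println(splitJoined(";*", "ab;;cd", -1)) // a|b|c|d

	// ";+" needs at least one ";" and keeps the fields whole.
	fmt.Println(splitJoined(";+", "ab;;cd", -1)) // ab|cd

	// With n = 2, the remainder stays unsplit in the last piece.
	fmt.Println(splitJoined(";+", "a;b;c", 2)) // a|b;c
}
```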

Replacing strings

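You can replace strings by providing a template made of references to matched capture groups. Use $1 (or ${1}) to refer to the first submatch, $2 for the second, etc. The name after $ is taken to be as long as possible, so $10 means the tenth submatch, not the first one followed by a 0: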

csv = "a;b;c;;;;d;e;f;;;g"
split := regexp.MustCompile("(;+)")

// collapses each run of ";" into a single one: "a;b;c;d;e;f;g"
assert.Equal(split.ReplaceAllString(csv, ";"), "a;b;c;d;e;f;g")

// strings.Repeat requires importing the "strings" package
digits := "0123456789"
digitsRe := regexp.MustCompile(strings.Repeat(`(\d)`, 10))
assert.Equal(digitsRe.ReplaceAllString(digits, "$10$9$8$7$6$5$4$3$2$1"), "9876543210")
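The way references are parsed in a template has a classic trap: when text follows a reference directly, it can be swallowed into the name. A minimal sketch, where expand is a hypothetical helper applying a template to the fixed input "key=value":

```go
package main

import (
	"fmt"
	"regexp"
)

var keyValue = regexp.MustCompile(`(\w+)=(\w+)`)

// expand (hypothetical helper) applies template to the input "key=value".
func expand(template string) string {
	return keyValue.ReplaceAllString("key=value", template)
}

func main() {
	fmt.Printf("%q\n", expand("$2=$1")) // "value=key"

	// "$1_id" is read as the group named "1_id", which does not exist,
	// so it expands to the empty string.
	fmt.Printf("%q\n", expand("$1_id")) // ""

	// Braces end the name explicitly.
	fmt.Printf("%q\n", expand("${1}_id")) // "key_id"

	// ReplaceAllLiteralString inserts the replacement without expansion.
	fmt.Printf("%q\n", keyValue.ReplaceAllLiteralString("key=value", "$1")) // "$1"
}
```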

You can use names instead:

// using names rather than indexes
digitsRe = regexp.MustCompile(`(?P<zero>\d)(?P<one>\d)(?P<two>\d)(?P<three>\d)(?P<four>\d)(?P<five>\d)(?P<six>\d)(?P<seven>\d)(?P<eight>\d)(?P<nine>\d)`)
assert.Equal(
    digitsRe.ReplaceAllString(digits, "nine=${nine}, eight=${eight}, seven=${seven}, six=${six}, five=${five}, four=${four}, three=${three}, two=${two}, one=${one}, zero=${zero}"),
    "nine=9, eight=8, seven=7, six=6, five=5, four=4, three=3, two=2, one=1, zero=0",
)

Hope this helps!

Photo by Mick Haupt on Unsplash


Original Link: https://dev.to/dandyvica/go-regular-expressions-53dn
