January 11, 2022 01:55 am GMT

Does /[A-z]/ Work For Case-Insensitive Regex?

A while back, I came across an example online for matching a single letter regardless of case. It looked like this:

/[A-z]/

Just to make sure it's crystal clear, that's a range from uppercase A to lowercase z.

I thought this would be a great and concise way to do the job, but I came to realize it was not quite doing what I thought it would.

The Issue With /[A-z]/

While I initially thought this would be fine, I was getting some unexpected results when I used it. Here's a screenshot from Rubular showing what I started seeing:

[Screenshot: Regex example]

The expression is successfully finding lowercase and uppercase letters, but it's also grabbing a few extra symbols. Where is this coming from?

If we take a look at the ASCII table, we'll notice that the uppercase letters are codes 65 through 90 and the lowercase letters are codes 97 through 122. A range in a character class covers every code between its two endpoints, and there are six additional characters sitting between the two sets of letters! Below is a portion of the table showing the six characters and their decimal codes:

Character Code    Character
...               ...
88                X
89                Y
90                Z
91                [
92                \
93                ]
94                ^
95                _
96                `
97                a
98                b
99                c
...               ...

Because of these additional symbols, what I thought was shorthand was a completely different expression.
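To see this without the ASCII table, here is a minimal sketch in Ruby (an assumption on my part, since the screenshot came from Rubular). It walks codes 65 through 122 and collects every character that /[A-z]/ matches but that is not actually a letter. The variable names are only illustrative.

```ruby
# Walk every ASCII code from "A" (65) to "z" (122) and keep the
# characters that /[A-z]/ matches but /[A-Za-z]/ does not.
range_pattern  = /[A-z]/
letter_pattern = /[A-Za-z]/

extras = (65..122).map(&:chr).select do |char|
  range_pattern.match?(char) && !letter_pattern.match?(char)
end

# Print each extra character next to its decimal code.
extras.each { |char| puts "#{char.ord} #{char}" }
```

Running this should print the same six rows as the middle of the table above, codes 91 through 96.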

This expression:

/[A-z]/

Really evaluates to something more like:

/[A-Z\[\\\]^_`a-z]/

Now it's much clearer why the expression wasn't working the way I intended it to!
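If you want to convince yourself that the two expressions really are equivalent, a quick check is to compare them against every 7-bit ASCII character. This is a rough sketch, again assuming Ruby; short_form, expanded_form, and mismatches are names I made up for the example.

```ruby
short_form    = /[A-z]/
expanded_form = /[A-Z\[\\\]^_`a-z]/

# Collect any 7-bit ASCII character on which the two patterns disagree.
mismatches = (0..127).map(&:chr).reject do |char|
  short_form.match?(char) == expanded_form.match?(char)
end

if mismatches.empty?
  puts "patterns agree on all ASCII characters"
else
  puts "patterns differ on: #{mismatches.inspect}"
end
```

The check only covers ASCII, which is the range this article is concerned with.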

What To Use Instead

If the whole regex can be case-insensitive, the easiest thing to do is use the case-insensitive modifier, i:

/[a-z]/i
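As a small illustration (assuming Ruby, with a handful of sample characters I picked myself), the modifier accepts letters in either case while rejecting the symbols that /[A-z]/ let through:

```ruby
pattern = /[a-z]/i

# A lowercase letter, an uppercase letter, two of the stray symbols, and a digit.
samples = ["a", "Q", "_", "^", "7"]

samples.each do |sample|
  puts "#{sample.inspect} -> #{pattern.match?(sample)}"
end
```

Only "a" and "Q" should come back as true.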

If only a certain part of your expression can be case-insensitive, there are a couple of options. For example, let's say we are looking for a three-character, letter-only string where the first and last character are lowercase but the middle character can be uppercase or lowercase. How would we write that?

One option would be to do this:

/[a-z][a-zA-Z][a-z]/

Another way to do this would be to specify modes inline with the expression. This allows you to turn on the case-insensitive mode for just a portion of your expression. Here's what that could look like:

/[a-z](?i)[a-z](?-i)[a-z]/

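Note: Inline modes are not supported by every regex engine, and some engines only accept them at the very start of the expression, so check the documentation for your programming language.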
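Here is a sketch comparing the options on the three-character example, assuming Ruby, whose regex engine accepts inline modes mid-expression. I added the \A and \z anchors so the whole string is tested, and the third pattern, which uses a scoped group (?i:...), is an extra variant of mine rather than something from the examples above. The sample strings are made up.

```ruby
explicit = /\A[a-z][a-zA-Z][a-z]\z/
inline   = /\A[a-z](?i)[a-z](?-i)[a-z]\z/
scoped   = /\A[a-z](?i:[a-z])[a-z]\z/  # case-insensitive only inside the group

samples = %w[abc aBc Abc abC a_c]

samples.each do |sample|
  # Run the same string through all three patterns.
  results = [explicit, inline, scoped].map { |pattern| pattern.match?(sample) }
  puts "#{sample}: #{results.inspect}"
end
```

All three patterns should accept "abc" and "aBc" and reject the rest, since only the middle character is allowed to be uppercase.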

Wrapping Up

I'm actually glad I came across this issue. It was a great learning experience and it was helpful to see how the ranges in regex work. I thought it would be good to share this here in case anyone else comes across this in the future. Hopefully this is helpful to someone!

Thanks for reading!


Original Link: https://dev.to/fitzgeraldkd/does-a-z-work-for-case-insensitive-regex-47fn
