October 17, 2021 10:58 pm GMT

Making the Parser

Continuing from last week, let's make a parser.

It will be really simple and basically cover nothing, but it will do as an example.

This article contains code snippets of Regex that may be disturbing for some programmers.
Reader discretion is advised.

First, you have to think about what you want your parser to cover.

For this example it will be: single words, "quoted words" and tag:value.

So, let's make a draft of what it will do:

function MyBasicParser(string) {
  const singleValues = parseSingleValues(string);
  const quotedValues = parseQuotedValues(string);
  const tagValues = parseTagValues(string);
  return [singleValues, quotedValues, tagValues];
}

Let's start with the simplest one:

function parseSingleValues(string) {
  // let's just split by spaces!
  return string.split(' ');
}

parseSingleValues('some random string');
// returns: [ 'some', 'random', 'string' ]

Nice, looking good and easy!
(And we didn't even have to use Regex!)
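One caveat worth keeping in mind: splitting on a single space leaves empty strings behind whenever the input has consecutive spaces. Here is a minimal sketch of that behavior and one possible way around it. The helper name `splitOnWhitespace` is made up for illustration, and yes, it does reach for a tiny Regex:

```javascript
// Splitting on a single space keeps empty strings between consecutive spaces.
const naive = 'some  random string'.split(' ');
// naive is [ 'some', '', 'random', 'string' ]

// Hypothetical helper: split on runs of whitespace and drop empty pieces.
function splitOnWhitespace(string) {
  return string.split(/\s+/).filter(Boolean);
}

const cleaned = splitOnWhitespace('  some  random string ');
// cleaned is [ 'some', 'random', 'string' ]
```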

Next is the quoted values:

function parseQuotedValues(string) {
  const quotesRegex = /(?<quote>["']).*?\k<quote>/g;
  return string
    // matches and returns everything that matched (this will include the quotes)
    .match(quotesRegex)
    // we can clear the quotes by slicing the first and last character off the string
    // and since no matches means null, we use the optional chaining here
    ?.map(s => s.substring(1, s.length - 1));
}

parseQuotedValues(`something "quoted here" not here 'here again'`);
// returns: [ 'quoted here', 'here again' ]

Ok... ok... don't fret now.

First, the Regex:

(?<quote>["']) this will match either single or double quotes and give it a name (to easily reference later)

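.*? matches anything, but lazily: as few characters as possible, so it stops at the first closing quote

\k<quote> matches the same quote character the named group captured, so an opening " needs a closing " (and ' needs ')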

g so it doesn't stop at the first match

Regex101 will explain it a lot better than me.

The Regex alone, using the .match function would return [ '"quoted here"', '\'here again\'' ].

So we just slice the first and last and there you go!
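As an alternative to slicing, the inner text can be captured directly with a second named group. The sketch below is just one possible variant, not the version used in the rest of this article: the function name `parseQuotedValuesWithGroups` and the group name `content` are made up for illustration. Note that it returns an empty array instead of `undefined` when nothing matches, since `matchAll` never returns `null`.

```javascript
// Hypothetical variant of parseQuotedValues: capture the inner text directly.
function parseQuotedValuesWithGroups(string) {
  // "quote" remembers which quote opened the match,
  // "content" lazily captures everything up to the matching closing quote
  const quotesRegex = /(?<quote>["'])(?<content>.*?)\k<quote>/g;
  // matchAll returns an iterator of match objects (never null),
  // so an input without quotes simply gives an empty array
  return [...string.matchAll(quotesRegex)].map(match => match.groups.content);
}

const quoted = parseQuotedValuesWithGroups(`something "quoted here" not here 'here again'`);
// quoted is [ 'quoted here', 'here again' ]

const mixed = parseQuotedValuesWithGroups(`say "it's fine"`);
// mixed is [ "it's fine" ] because the backreference only accepts a closing "
```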

Finally the tags!

function parseTagValues(string) {
  const tagRegex = /\S+:\S+/g;
  // inclusive variant, shown only for comparison (not used below)
  const tagRegexInclusive = /\S*:\S*/g;
  return string
    // matches the tag:value pairs
    .match(tagRegex)
    // split at the colon (if there were matches)
    ?.map(s => s.split(':'));
}

parseTagValues('tag:value something alone: and other:tag :value');
// returns: [ [ 'tag', 'value' ], [ 'other', 'tag' ] ]

Not so scary, right?

But why two Regexes, you might ask?

\S this matches any non-white space character

: matches the colon

\S and another match of non-white space

And the difference between them is:

+ will match ONE or more of the token

* will match ZERO or more of the token

Regex101 to the rescue again.

While + matches only tag:value and other:tag, * will, in addition to those, also match alone: and :value. For this example, I will just not treat those last two as tags.
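To see the difference side by side, here is a small sketch that runs both Regexes against the same input (the variable names are just for illustration):

```javascript
const input = 'tag:value something alone: and other:tag :value';

// + requires at least one non-whitespace character on each side of the colon
const strict = input.match(/\S+:\S+/g);
// strict is [ 'tag:value', 'other:tag' ]

// * also accepts nothing on either side of the colon
const inclusive = input.match(/\S*:\S*/g);
// inclusive is [ 'tag:value', 'alone:', 'other:tag', ':value' ]
```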

But that won't do...

Some of you might already be expecting this... but let's show it to everyone else:

// let's call MyBasicParser with all the values we used
MyBasicParser(
  `some random string something "quoted here" not here 'here again' tag:value something alone: and other:tag :value`
);

// it returns:
/*
  [
    [ 'some','random','string','something','"quoted','here"','not','here','\'here','again\'','tag:value','something','alone:','and','other:tag',':value' ],
    [ 'quoted here', 'here again' ],
    [['tag', 'value'], ['other', 'tag']]
  ]
*/

OOPS!

The refactoring!

For each piece, I will want to return the string without the part that was parsed.

I also know that I will want to change the order, because as it stands the first step just parses everything as "single values".

So the order matters: the quotes are parsed first, then the tags from whatever is left, and finally everything that remains is parsed as single values.

Let's see the code:

function MyBasicParser(string) {
  // this now also returns the string after the parsing
  const { quotedValues, afterQuotedString } = parseQuotedValues(string);
  // that the next one will use and will give the same
  const { tagValues, afterTagString } = parseTagValues(afterQuotedString);
  // this one, being the last, will be the same
  const singleValues = parseSingleValues(afterTagString);
  // I've just changed here so it would return an object
  // too many arrays were being returned and with the order changing... what was what?
  // now, the consumer of the parser will know exactly what is what
  return { singleValues, quotedValues, tagValues };
}

I know, I could make it even better, maybe with a Fluent Interface or something... but hey... just an example!

And as for the methods:

function parseSingleValues(string) {
  // I've added here a filter to filter empty string values
  // because as we clean the strings, a lot of spaces will be left there
  return string.split(' ').filter(Boolean);
}

// new helper function!
function tryRegexAndCleanTheString(string, regex) {
  // take the matches as before (null when nothing matched)
  const regexMatches = string.match(regex);
  // clean the string by simply replacing the match value with an empty string
  // (falling back to an empty array so a null result doesn't throw)
  const cleanedString = (regexMatches ?? []).reduce((acc, cur) => acc.replace(cur, ''), string);
  return { regexMatches, cleanedString };
}

// both are still the same, except that they use the helper function
// then they return an object with the matches (still dealing with each in their own way)
// and the cleaned string for the next step to use
function parseQuotedValues(string) {
  const quotesRegex = /(?<quote>["']).*?\k<quote>/g;
  const { regexMatches, cleanedString } = tryRegexAndCleanTheString(string, quotesRegex);
  return {
    quotedValues: regexMatches?.map(s => s.substring(1, s.length - 1)),
    afterQuotedString: cleanedString,
  };
}

function parseTagValues(string) {
  const tagRegex = /\S+:\S+/g;
  const { regexMatches, cleanedString } = tryRegexAndCleanTheString(string, tagRegex);
  return {
    tagValues: regexMatches?.map(s => s.split(':')),
    afterTagString: cleanedString,
  };
}
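Since `match` returns `null` when nothing matches, the helper has to be careful with inputs that contain no quotes or tags. One possible alternative, sketched here under the made-up name `collectAndRemoveMatches`, sidesteps that by letting a single `replace` call both collect the matches and clean the string. Keep in mind it returns an empty array rather than `null` when there are no matches, so it is not a drop-in replacement.

```javascript
// Hypothetical alternative to tryRegexAndCleanTheString:
// collect the matches and clean the string in a single replace call.
function collectAndRemoveMatches(string, regex) {
  const regexMatches = [];
  // with a global regex, replace calls the callback once per match;
  // returning '' removes that match from the resulting string
  const cleanedString = string.replace(regex, (match) => {
    regexMatches.push(match);
    return '';
  });
  return { regexMatches, cleanedString };
}

const withTags = collectAndRemoveMatches('tag:value something other:tag', /\S+:\S+/g);
// withTags.regexMatches is [ 'tag:value', 'other:tag' ]
// withTags.cleanedString is ' something '

const withoutTags = collectAndRemoveMatches('nothing to see', /\S+:\S+/g);
// withoutTags.regexMatches is [] and the string comes back unchanged
```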

The end result

MyBasicParser(
  `some random string something "quoted here" not here 'here again' tag:value something alone: and other:tag :value`
);

// it returns:
/*
  {
    singleValues: [ 'some','random','string','something','not','here','something','alone:','and',':value' ],
    quotedValues: [ 'quoted here', 'here again' ],
    tagValues: [['tag', 'value'], ['other', 'tag']]
  }
*/

The next step

This is but a really, REALLY simple version of my own parser:

https://www.npmjs.com/package/@noriller/easy-filter-parser

That I use in:

https://www.npmjs.com/package/@noriller/easy-filter

And the "continuation" of this series will build on both of them.

As for today... that's all!

Next time we will be doing a basic version of the filter!

Cover Photo by Melanie Wasser on Unsplash and badly edited by yours truly.

Original Link: https://dev.to/noriller/making-the-parser-32lp
