Your Web News in One Place

Help Webnuz

Referal links:

Sign up for GreenGeeks web hosting
January 14, 2022 07:37 pm GMT

100 Languages Speedrun: Episode 55: Better Thue Interpreter in Crystal

When I started the series I said that some of the episodes will be about creating a new programming language, but so far it's just been RPN calculators.

So let's create a better Thue! I made an episode about Thue(https://dev.to/taw/100-languages-speedrun-episode-43-thue-143i), so read that for some background.

Thue is a fun esoteric language, but suffers from two annoying issues:

  • newline handling requires special weird rules
  • a lot of nearly identical rules needed to be copied and pasted as there are no wildcard matches

I'll be using Crystal to code the interpreter(https://taw.hashnode.dev/100-languages-speedrun-episode-49-crystal).

Basic Interpreter

Before I start adding new features, let's write a base interpreter.

class ThueRule  getter :left  def initialize(@left : String, @right : String)  end  def apply(str, idx)    before = str[0, idx]    after = str[idx+@left.size .. -1]    if @right == "~"      print "
" replacement = "" elsif @right[0] == '~' print @right[1..-1] replacement = "" elsif @right == ":::" replacement = STDIN.gets.not_nil!.chomp else replacement = @right end before + replacement + after end def to_s(io) io << "Rule<#{@left}::=#{@right}>" endendclass ThueProgram def initialize @rules = [] of ThueRule @initial = "" @state = "" end def load(path) lines = File.read_lines(path).map(&.chomp).zip(1..) while lines.size > 0 line, line_no = lines.shift # Ignoring invalid rules, they are sometimes used as comments next unless line =~ /\A(.*)::=(.*)\z/ break if $1 == "" @rules << ThueRule.new($1, $2) end @initial = lines.map(&.first).join("
") end def random_match match = nil matches_count = 0 @rules.each do |rule| idx = 0 loop do match_idx = @state.index(rule.left, idx) break unless match_idx idx = match_idx + 1 matches_count += 1 next unless rand(matches_count) == 0 match = {rule: rule, idx: match_idx} end end match end def run @state = @initial while match = random_match rule = match[:rule] idx = match[:idx] if ENV["DEBUG"]? STDERR.puts "Applying rule #{rule} at #{idx} to #{@state.inspect}" end @state = rule.apply(@state, idx) end endendunless ARGV[0] STDERR.puts "Usage: #{$0} <file.thue>" exit 1endprog = ThueProgram.newprog.load(ARGV[0])# Crystal doesn't handle SIGPIPE well and we want to support:# crystal thue.cr examples/fizzbuzz.thue | head -n 100begin prog.runrescue e : IO::Error exit if e.os_error == Errno::EPIPE raise eend

Step by step:

  • ThueRule represents a Thue rule like a::=b
  • ThueRule knows how to apply itself to a string at certain position, so it handles input and output itself
  • ThueProgram is defined as a collection of rules and initial state
  • ThueProgram#load knows how to read itself from Thue file
  • ThueProgram#random_match picks a random rule at random index according to fair algorithm, without building the whole list in memory
  • ThueProgram#run runs the program, starting from initial state
  • if we run it without passing Thue file name, we print short message and exit
  • if STDOUT gets closed, we catch an exception and check if it's the right error, then exit quietly. Crystal lacks any good way to quit on SIGPIPE the way Ruby's trap("PIPE", "EXIT") does it

We can confirm the program is working on a few test programs:

$ crystal thue.cr examples/fizzbuzz.thue | head -n 1012Fizz4BuzzFizz78FizzBuzz$ crystal thue.cr examples/many_dice2.thue6,4,2$ DEBUG=1 crystal thue.cr examples/hello.thueApplying rule Rule<_::=~Hello, World!> at 0 to "_"Hello, World!

Better newline handling

The first issue with Thue I want to address is newline handling. A printing rule cannot print newlines, except the ~ rule which prints just the newline. It makes Thue I/O unnecessarily annoying. The easy way to fix it is by supporting escape codes in Rules.
and \\ are all we need for now. But since we're fixing things we might just as well add one extra escape code \s for space. This is mainly because sometimes Thue code needs space at end of the line, and a lot of editors strip that automatically.

This only required rewriting ThueRule a little. I also added some more output triggered by ENV["DEBUG"].

class ThueRule  @left : String  @right : String  getter :left  def initialize(left, right)    @left = process_backslashes(left)    @right = process_backslashes(right)    @right = "~
" if @right == "~" end def apply(str, idx) before = str[0, idx] after = str[idx+@left.size .. -1] if @right[0] == '~' print @right[1..-1] replacement = "" elsif @right == ":::" replacement = STDIN.gets.not_nil!.chomp else replacement = @right end before + replacement + after end def to_s(io) io << "Rule<#{@left.inspect}::=#{@right.inspect}>" end def process_backslashes(s) s.gsub(/\\./) do |m| if m[1] == 'n' "
" else m[1] end end endend

Let's see how nice this is now. We can start with this dice rolling program in improved Thue:

_::=~How many dice do you want to roll (0-9)?
<A::=:::<B::=~Here are your rolls:
0::=B1::=B@2::=B@@3::=B@@@4::=B@@@@5::=B@@@@@6::=B@@@@@@7::=B@@@@@@@8::=B@@@@@@@@9::=B@@@@@@@@@^@::=^%%::=~1
%::=~2
%::=~3
%::=~4
%::=~5
%::=~6
^$::=~Thank you for playing ThueDice!
::=^<<_A$
$ crystal thue_nl.cr examples_nl/many_dice.thueHow many dice do you want to roll (0-9)?4Here are your rolls:2563Thank you for playing ThueDice!$ crystal thue_nl.cr examples_nl/many_dice.thueHow many dice do you want to roll (0-9)?1Here are your rolls:4Thank you for playing ThueDice!$ crystal thue_nl.cr examples_nl/many_dice.thueHow many dice do you want to roll (0-9)?9Here are your rolls:366236331Thank you for playing ThueDice!

We can also see the new debug output:

$ DEBUG=1 crystal thue_nl.cr examples_nl/many_dice.thueRule<"_"::="~How many dice do you want to roll (0-9)?
">Rule<"<A"::=":::">Rule<"<B"::="~Here are your rolls:
">Rule<"0"::="B">Rule<"1"::="B@">Rule<"2"::="B@@">Rule<"3"::="B@@@">Rule<"4"::="B@@@@">Rule<"5"::="B@@@@@">Rule<"6"::="B@@@@@@">Rule<"7"::="B@@@@@@@">Rule<"8"::="B@@@@@@@@">Rule<"9"::="B@@@@@@@@@">Rule<"^@"::="^%">Rule<"%"::="~1
">Rule<"%"::="~2
">Rule<"%"::="~3
">Rule<"%"::="~4
">Rule<"%"::="~5
">Rule<"%"::="~6
">Rule<"^$"::="~Thank you for playing ThueDice!
">Applying rule Rule<"_"::="~How many dice do you want to roll (0-9)?
"> at 3 to "^<<_A$"How many dice do you want to roll (0-9)?Applying rule Rule<"<A"::=":::"> at 2 to "^<<A$"2Applying rule Rule<"2"::="B@@"> at 2 to "^<2$"Applying rule Rule<"<B"::="~Here are your rolls:
"> at 1 to "^<B@@$"Here are your rolls:Applying rule Rule<"^@"::="^%"> at 0 to "^@@$"Applying rule Rule<"%"::="~4
"> at 1 to "^%@$"4Applying rule Rule<"^@"::="^%"> at 0 to "^@$"Applying rule Rule<"%"::="~5
"> at 1 to "^%$"5Applying rule Rule<"^$"::="~Thank you for playing ThueDice!
"> at 0 to "^$"Thank you for playing ThueDice!No more matchesn. Final state: ""

Interestingly this kind of debug output is much more difficult in the original Thue, as we'd be printing debug information between line contents and its newline.

And of course we can now have correct Hello, World! that prints that final newline

_::=~Hello, World!
::=_
$ crystal thue_nl.cr examples_nl/hello.thueHello, World!

We can also fix the program to say hello to someone, these are updated rules, the other 166 are the same:

!<::=~Hello,\s^*$::=~!
$ crystal thue_nl.cr examples_nl/hello2.thueAlexanderHello, Alexander!

Designing a better Thue

Now we face much bigger issue. Thue programs need crazy amount of repetition as every letter we pass needs its own rule. So for example the program that asks for your name (letters only), has rules like these 52 times each, once for every lowercase and uppercase letter:

a<::=<a^*a::=^*>a>a::=~a

FizzBuzz program likewise had rules like these, multiplied for every digit:

0::=00::=00::=000::=~0

There were also 10 rules for adding 1 to a digit, which aren't quite the same, but there's a lot of redundancy:

0::=11::=22::=33::=44::=55::=66::=77::=88::=99::=0

We'd really like to deal with this copypasta. The most obvious idea is to allow character ranges like [0-9] or [a-zA-Z]. But then would we also have to do parentheses () and backreferences \1 so we get rules like these?

([a-zA-Z])<::=<\1

So I had a different idea. The only thing we really need to match is character classes, so why not do this:

[a-zA-Z]<::=<[a-zA-Z]

Doing it like that has nice additional benefit that we can do shifted character classes very neatly, like these rules for addition:

[0-8]::=[1-9]9::=0

We can also repeat the same class multiple times:

[0-9]::=[0-9][0-9]

This supports any N-to-N, and N-to-1 mapping. If you have something more complicated, then you'll need to be creative and use multiple rules. I think this is great common ground, where you don't get any crazy additional powers, but it also saves you from pointless copy&paste programming.

And to make it really clear, all such rules are expanded into their regular form. So [0-9]::=[0-9][0-9] actually becomes 10 regular Thue rules (well regular plus fixed newline handling).

An extra feature that I didn't really think about, is that you could have a range on the right side only, which in effect means random choice.

Some programs in Better Thue

Before I show the whole interpreter, let's just go through a few programs, as they're much more concise.

Here's the dice rolling program, it has 6 rules of what ROLL can turn into:

^roll::=^ROLL^,::=^COMMA^$::=~ROLL::=~[1-6]COMMA::=~,::=^roll,roll,roll$
$ crystal thue_rx.cr examples_rx/many_dice2.thue1,5,3

Loop uses triple range rule ([0-9]::=[0-9][0-9]) and different range rule ([0-8]::=[1-9]):

# rules for adding one to a digit ::=# only 9 is special ::=[0-8]::=[1-9]9::=0# if extra carry arrived, add leading 1 ::=^::=^1# we're done, so just let  pass through ::=[0-9]::=[0-9]# print the result ::=# first activate each letter with , but leave a copy behind ::=^::=^[0-9]::=[0-9][0-9]# now print the activated letter ::=[0-9]::=~[0-9]# once printing finished, switch to increment mode $::=$::=~::=^1$
$ crystal thue_rx.cr examples_rx/addone.thue419420

Hello says hello to you. It's the program that benefited the most:

_::=:::# let the < pass through to the left ::=[a-zA-Z]<::=<[a-zA-Z]# if < reached ^ then we are ready to start printing ::=# we need this step so it waits for user input before it starts to print ::=^<::=^*!<!<::=~Hello,\s# activate print letter command ::=^*[a-zA-Z]::=^*>[a-zA-Z]# execute print letter ::=>[a-zA-Z]::=~[a-zA-Z]# we're done, print exclamation mark and newline ::=^*$::=~!
::=^_<$
$ crystal thue_rx.cr examples_rx/hello2.thueBitcoinellaHello, Bitcoinella!

Loop loops forever:

# rules for adding one to a digit ::=# only 9 is special ::=[0-8]::=[1-9]9::=0# if extra carry arrived, add leading 1 ::=^::=^1# we're done, so just let  pass through ::=[0-9]::=[0-9]# print the result ::=# first activate each letter with , but leave a copy behind ::=^::=^[0-9]::=[0-9][0-9]# now print the activated letter ::=[0-9]::=~[0-9]# once printing finished, switch to increment mode $::=$::=~::=^1$
$ crystal thue_rx.cr examples_rx/loop.thue | head -n 201234567891011121314151617181920

And FizzBuzz obviously FizzBuzzes:

# rules for adding one to a digit ::=# only 9 is special ::=[0-8]::=[1-9]9::=0# if number ended but extra carry arrived, add leading 1 ::=^::=^1,::=,1# If we reach a comma that means another number starts ::=# we also want to increment it so switch back to  mode ::=,::=,# we're done, so just let  pass through ::=[0-9]::=[0-9]# start the printer ::=# it's simpler to look at both counter at once ::=# we reset the counters, then start printer in proper mode ::=^::=^1,1,::=1,1,2,1,::=2,1,3,1,::=0,1,F1,2,::=1,2,2,2,::=2,2,3,2,::=0,2,F1,3,::=1,3,2,3,::=2,3,3,3,::=0,3,F1,4,::=1,4,2,4,::=2,4,3,4,::=0,4,F1,5,::=1,0,B2,5,::=2,0,B3,5,::=0,0,X# if we have F, B, or X, we just print that ::=# then we pass  to the right without printing number ::=F::=~FizzB::=~BuzzX::=~FizzBuzz[0-9]::=[0-9]# print the result if we're in number print mode ::=# first activate each letter with , but leave a copy behind ::=[0-9]::=[0-9][0-9]# now print the activated letter ::=[0-9]::=~[0-9]# once printing finished, switch to increment mode $::=$$::=$::=~::=^0,0,0$
$ crystal thue_rx.cr examples_rx/fizzbuzz.thue | head -n 2012Fizz4BuzzFizz78FizzBuzz11Fizz1314FizzBuzz1617Fizz19Buzz

For FizzBuzz it looks like the list of 15 remainders combinations could use some more complex regexps to be made more readable, but I guess that's a puzzle you'll need to solve yourself.

Final Source Code

Here's the final source code:

class ThueRule  getter :left  def initialize(@left : String, @right : String)    @right = "~
" if @right == "~" end def apply(str, idx) before = str[0, idx] after = str[idx+@left.size .. -1] if @right[0] == '~' print @right[1..-1] replacement = "" elsif @right == ":::" replacement = STDIN.gets.not_nil!.chomp else replacement = @right end before + replacement + after end def to_s(io) io << "Rule<#{@left.inspect}::=#{@right.inspect}>" endendclass ThueSideParser getter :results @toparse : Array(Char) def initialize(@str : String) @results = [""] @toparse = @str.chars parse end private def parse until @toparse.empty? case @toparse[0] when '[' chars = parse_range if @results.size == 1 @results = chars.map{|c| @results[0]+c} elsif @results.size == chars.size @results = @results.zip(chars).map{|s,c| s+c} else raise "Sizes of character classes mismatch in #{@str}" end else c = parse_character @results = @results.map{|s| s + c} end end @results end private def parse_character if @toparse[0] == '\\' @toparse.shift raise "Unmatched \\ in #{@str}" if eos? c = @toparse.shift case c when 'n' '
'
when 's' ' ' else c end else @toparse.shift end end private def parse_range chars = [] of Char @toparse.shift loop do raise "Character range never closed in #{@str}" if eos? if @toparse[0] == ']' @toparse.shift return chars end c = parse_character raise "Character range never closed in #{@str}" if eos? if @toparse[0] == '-' @toparse.shift e = parse_character raise "Invalid character range in #{@str}" if e < c chars.concat(c..e) else chars << c end end end private def eos? @toparse.empty? endendclass ThueRuleParser def initialize(@str : String) if @str =~ /\A(.*)::=(.*)\z/ @valid = true @left = $1 @right = $2 else @left = "" @right = "" @valid = false end end def valid_rule? @valid end def empty_rule? @valid && @left.empty? end def call lefts = ThueSideParser.new(@left).results rights = ThueSideParser.new(@right).results # Support N-to-1 and 1-to-N rules lefts *= rights.size if lefts.size == 1 rights *= lefts.size if rights.size == 1 unless lefts.size == rights.size raise "Mismatched side of rule #{@str}" end lefts.zip(rights).map do |left, right| ThueRule.new(left, right) end endendclass ThueProgram def initialize @rules = [] of ThueRule @initial = "" @state = "" end def load(path) lines = File.read_lines(path).map(&.chomp).zip(1..) while lines.size > 0 line, line_no = lines.shift # Ignoring invalid rules, they are sometimes used as comments parser = ThueRuleParser.new(line) next unless parser.valid_rule? break if parser.empty_rule? @rules.concat parser.call end @initial = lines.map(&.first).join("
") end def random_match match = nil matches_count = 0 @rules.each do |rule| idx = 0 loop do match_idx = @state.index(rule.left, idx) break unless match_idx idx = match_idx + 1 matches_count += 1 next unless rand(matches_count) == 0 match = {rule: rule, idx: match_idx} end end match end def run(debug) @state = @initial if debug @rules.each do |rule| STDERR.puts rule end end while match = random_match rule = match[:rule] idx = match[:idx] if debug STDERR.puts "Applying rule #{rule} at #{idx} to #{@state.inspect}" end @state = rule.apply(@state, idx) end if debug STDERR.puts "No more matches. Final state: #{@state.inspect}" end endendunless ARGV[0] STDERR.puts "Usage: #{$0} <file.thue>" exit 1endprog = ThueProgram.newprog.load(ARGV[0])# Crystal doesn't handle SIGPIPE well and we want to support:# crystal thue.cr examples/fizzbuzz.thue | head -n 100begin prog.run(!!ENV["DEBUG"]?)rescue e : IO::Error exit if e.os_error == Errno::EPIPE raise eend

Next steps

The biggest issue with this interpreter is that every step checks every rule, so it's O(number of rules) * O(state size).

It would be fairly straightforward to compile list of states into a finite automaton without backtracking, even if specifics (uniformly random match) are a bit different from the usual matching algorithm, so it's technically not a DFA or an NFA.

Overall I think this language is a much better Thue than the original Thue. Writing actually programs would be just as much of a challenge, but amount of copying and pasting is dramatically lower.

Another Thue with regexps language is Rue, but it supports unlimited regexps, and that makes some coding challenges pretty much trivial.

As for coding it all in Crystal, it was generally a positive experience. The code contains very few pointless type annotations, and I didn't need to torture it into a different shape than I wanted just to make the type checker happy. I mostly chose it over Ruby, as I wanted to do the very fast finite automaton, but in the end I didn't quite get there.

Code

All code examples for the series will be in this repository.

Code for the Better Thue Interpreter in Crystal episode is available here.


Original Link: https://dev.to/taw/100-languages-speedrun-episode-55-better-thue-interpreter-in-crystal-3obp

Share this article:    Share on Facebook
View Full Article

Dev To

An online community for sharing and discovering great ideas, having debates, and making friends

More About this Source Visit Dev To