An Interest In:
Web News this Week
- April 1, 2024
- March 31, 2024
- March 30, 2024
- March 29, 2024
- March 28, 2024
- March 27, 2024
- March 26, 2024
Learning F A Simple Parser
A couple of weeks have passed since my last post and I still really enjoy learning F#. So when I found this interesting post called "Emulator basics: a stack and register machine" in which the author builds a virtual machine for running simple C programs, I decided to port the code from EcmaScript.
I won't go into details regarding the actual VM design and its instruction set, the original post already explains it nicely. Let's focus on the type of source code we'll be parsing instead:
plus: ADD RDI, RSI MOV RAX, RDI RETmain: MOV RDI, 1 MOV RSI, 2 CALL plus RET
This program will start at the label main:
, add 2 numbers into registers and then call the procedure defined after the label plus:
to add them. The result of the addition (3) will be the exit code of this program.
The Parser
I decided to represent the result of the parser as a record type called ParseResult
which consists of a list of tuples containing instructions and their optional arguments (e.g. ("add", ["RDI"; "RSI"])
or ("ret", [])
) as well as a map which tracks labels and their corresponding line numbers:
type ParseResult = { instructions: (string * string list) list labels: Map<string, int> }
Patterns
F#'s active patterns are one of my favorite language features. Like most newcomers to the language I tend to overuse them, though in this specific case they seem like a good fit and a lightweight alternative to parser combinators or similar solutions.
let (|Empty|Directive|LineComment|Label|Instruction|) (line : string) = if line = "" then Empty else if line.StartsWith(".") then Directive else if line.StartsWith(";") || line.StartsWith("#") then LineComment else if line.Contains(":") then Label (line.Split(":").[0]) else Instruction (line.Split([|';'; '#'|]).[0])
Active patterns allow us to define named partitions of data, which can be used in pattern matching expressions just like constructors of discriminated unions. This specific pattern first checks if the line is empty, in which case it returns the Empty
identifier. Lines that start with a .
character are directives (Directive
) and will be ignored, just like full line comments (LineComment
) starting with ;
or #
. The Label
pattern matches any line containing a :
and will only return the part before the colon (e.g. main:
becomes main
). All other lines represent instructions and will return their content stripped of comments.
Parsing
The actual parsing logic is relatively simple:
let parse (source : string) : ParseResult = source.Split("\n") |> Array.fold (fun result (line : string) -> match line.Trim() with | Directive | Empty | LineComment -> result | Label l -> let labels = Map.add l result.instructions.Length result.labels { result with labels = labels } | Instruction i -> let instruction = parseInstruction i { result with instructions = (instruction :: result.instructions) } ) { instructions=[]; labels=Map.empty } |> fun result -> { result with instructions = List.rev result.instructions }
The input source is split into lines and then Array.fold
is used for pattern matching every line and updating the ParseResult
record as needed. Directive
, Empty
and LineComment
have no effect on the program, so the result will be returned unmodified. Label
uses a copy and update record expression to add a new value to the map, while Instruction
does something similar for the list of tuples, using the following parseInstruction
helper function:
let private parseInstruction (instruction : string) = let operation = instruction.Split(" ").[0].ToLower() let operands = instruction.Substring(operation.Length).Split(",") |> Array.map (fun (s : string) -> s.Trim()) |> Array.filter ((<>) "") |> Array.toList operation, operands
This splits the operation and its operands and then cleans up the latter.
The Result
Calling this with the example program listed above, gives the following output in F# Interactive:
parse source;;val it : ParseResult = {instructions = [("add", ["RDI"; "RSI"]); ("mov", ["RAX"; "RDI"]); ("ret", []); ("mov", ["RDI"; "1"]); ("mov", ["RSI"; "2"]); ("call", ["plus"]); ("ret", [])]; labels = map [("main", 3); ("plus", 0)];}
Summary
Like the original parser this is brittle. On the other hand it shows off a nice F# feature (active patterns) and the whole code weighs in at only about 30 SLOC. Given that I'm still relatively new to the language this is probably not the nicest piece of code, but it was fast to write and pretty much worked on first try once the types checked out. Stay tuned for the next installment in this series where I'll tackle the interpreter part of the original post.
Original Link: https://dev.to/citizen428/learning-f-a-simple-parser-46f9
Dev To
An online community for sharing and discovering great ideas, having debates, and making friendsMore About this Source Visit Dev To