OSDev.org

The Place to Start for Operating System Developers
It is currently Fri Mar 29, 2024 1:26 am

All times are UTC - 6 hours




Post new topic Reply to topic  [ 1 post ] 
Author Message
 Post subject: Parser Generator Tools
PostPosted: Fri Aug 07, 2015 11:03 pm 
Offline
Member
Member

Joined: Wed Mar 05, 2008 4:41 pm
Posts: 46
Location: San Francisco, California, USA
Hey guys!

I've made a set of insanely simple LALR parser generator tools that you can use in your projects. I'll go ahead and explain how you can use it.

The idea is to eventually rewrite the tools in my own language and integrate these tools into my OS, so it's self-hosting and self-compiling. I have yet to get there, but if you want to use the tools to do the same, be my guest!
See my topic: Self hosting OS/Compiler from scratch

It took a while to create the parser generator. I've been wanting to write one for a long time, and I finally got my head wrapped around it. It's not just any parser, I wrote one years ago. This one should be able to handle most any programming language you throw at it. It only took 20 hours or so to complete, after I understood it for the most part.

Here's the URL: https://github.com/joshwyant/ParserGenerator

So without further ado...

First, you create your terminals and nonterminals:
Code:
enum Terminal {
  Unknown, Ident, Var, Equals, DoubleEquals, Eof
}
enum Nonterminal {
  Init, Start, Expression, Factor
}


Then, you define your grammar:
Code:
partial class MyGrammar : LALRGrammar<Terminal, Nonterminal> {
  public MyGrammar() : base(Unknown, Eof, Init, Start) {
    var start = DefineProduction(Start);
    var expression = DefineProduction(Expression);
    var factor = DefineProduction(Factor);

    ProductionRule t;
    t = start % expression;
    t = expression % Expression / Equals / Factor;
    t = expression % Expression / DoubleEquals / Factor;
    t = expression % Factor;
    t = factor % Ident;
  }
}

The syntax to define the grammar is known as combinator style, which allows you to write a grammar directly in code, rather than having to pass a grammar file to a tool.
You're passing in the 4 system terminals and nonterminals to the base constructor (like "Start" and "Eof"), so the framework can handle those correctly.

Finally, you create a lexer. This one demonstrates the the typical complexity, except for whitespace:
Code:
partial class MyGrammar {
  public class Lexer : LexerBase {
    Dictionary<string, Terminal> ReservedWords
      = new Dictionary<string, Terminal> { "var", Var };

    public Lexer(TextReader reader) : base(reader) { }

    protected override IEnumerable<Token> Lex() {
      while (Reader.HasNext()) {
        var c = Reader.Peek();
        if (char.IsLetter(c) || c == "_") {
          while (char.IsLetterOrDigit(Reader.Peek()) || Reader.Peek() == "_")
            Reader.Read();
          if (ReservedWords.Contains(...))
            yield return new Token(...);
          else
            yield return new Token(Ident);
        }
        else {
          switch (Reader.Read()) {
          case '=':
            if (Reader.Peek() == '=') {
              Reader.Read(); yield return new Token(DoubleEquals);
            }
            else yield return new Token(Equals);
            break;
          // ...
          default:
            yield return new Token(Terminal.Unknown);
            break;
          }
        }
      }
      yield return new Token(Eof);
    }
  }

  public override Lexer GetLexer(TextReader reader) {
    return new Lexer(reader);
  }
}

This can be placed in its own file, but it is nested under your `MyGrammar` class, so you can use your `Terminal` and `Nonterminal` enums. You're overriding `GetLexer()` in the grammar to return your new fancy lexer.
For a full lexer, see: https://github.com/joshwyant/ParserGenerator/blob/master/TestLanguage/Lexer.cs

Now, you're ready to use your parser:
Code:
var grammar = new MyGrammar();
var abstractSyntaxTree = grammar.Parse(".......");

... and that's it!

Now, I don't believe there are many tools like this. You can even save, load, cache, or JIT compile the parse table to a function or save it to an assembly.

How would you generate code from the abstract syntax tree?
Well, semantic analysis comes first. You will have a lot of junk from your grammar in the tree that doesn't make sense in terms of semantics. So, you'll create elements for your syntax tree:
Code:
class FullyQualifiedName {
  public string[] NameParts;
}
class IfStatement : Statement {
  public Expression Condition;
  public Statement Body;
  public Statement Else;
}


You can extract information from abstract syntax tree nodes by flattening them out. The framework makes this ridiculously easy, allowing you to read and peek tokens that make up the node:
Code:
var tokens = astNode.Flatten();
if (tokens.Peek() == If)
  return scanIf(tokens);
...


When you finally have your tree based on your semantic analysis, you can generate your output code using the visitor pattern. To show you how painfully simple that is, I will give you a full working example in less than 1k lines of code here. This comes from a scripting language that I wrote using a simpler parser, but the code generation method is the same. The principle is to have a subroutine for each node type, and recursively call the subroutines on the node's children to generate your machine code.

Well, there you have it. If only I knew how to do this a long time ago, if only I either had the code in front of me, or someone explained it to me in plain english.

Have a nice day! :D

Josh

EDIT:
A fluent interface is also supported for the grammar. I actually like it better:
Code:
partial class MyGrammar : LALRGrammar<Terminal, Nonterminal> {
  public MyGrammar() : base(Unknown, Eof, Init, Start)
  {
    DefineProduction(Start)
      .As(Expression);

    DefineProduction(Expression)
      .As(Expression, Equals, Factor)
      .Or(Expression, DoubleEquals, Factor)
      .Or(Factor);

    DefineProduction(Factor)
      .As(Ident);
  }
}

See https://github.com/joshwyant/ParserGenerator/blob/master/TestLanguage/Grammar.cs for a full example language similar to c# or Java.

_________________
https://github.com/joshwyant
https://linkedin.com/in/joshwyant


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 1 post ] 

All times are UTC - 6 hours


Who is online

Users browsing this forum: No registered users and 33 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group