Reactive templating: Parsing and code generation

This is the second part of a three-part series.

The lexer is now generating tokens. Cool.

At a high level, a parser works much like a lexer. The main difference is that instead of operating character by character, it operates token by token. When it sees a token, it sets expectations for the tokens that follow and checks them until some end condition is met.

While iterating, a parser usually builds a data representation of the tokens it has seen, such as an abstract syntax tree (AST). My parser doesn't, because my use case doesn't require it. Parsing is also a fine opportunity to handle syntax errors, which show up as unexpected tokens.

In my case, the parser is responsible for creating the lexer, doing the lexing, normalizing certain token values and throwing errors if it discovers an obvious problem with the template expression.

Parsing

// Lexer, Token and TokenType are defined alongside the lexer (see part one).
export class Parser {
  lexer: Lexer;
  tokens: Token[] = [];
  parseTokens: Token[] = [];
  done: boolean = false;

  constructor(str: string) {
    this.lexer = new Lexer(str);
    // Each "token" event carries the token in its `detail` property.
    this.lexer.emitter.addEventListener("token", (e: Event) => {
      this.tokens.push((e as CustomEvent<Token>).detail);
    });
    this.lexer.lex();
    this.normalizeTokens();

    // We want to preserve tokens, so we copy them into `this.parseTokens`.
    // parse() will mutate `this.parseTokens` and leave `this.tokens` untouched.
    // `this.parseTokens` is used for catching and reporting syntax errors.
    this.parseTokens = [...this.tokens];

    // Depletes all tokens in `this.parseTokens`.
    // Leaves `this.tokens` untouched.
    this.parse();
  }

  normalizeTokens() {
    // ...
  }

  take(...types: TokenType[]): Token | undefined {
    const token = this.parseTokens.shift();
    // If we're looking for a specific token type...
    if (token && types.length) {
      // Return it if it matches the type we're looking for.
      if (types.includes(token.type)) {
        return token;
      }
      // Otherwise, put it back.
      this.parseTokens.unshift(token);
    }
    // If we didn't pass any types, we're just looking for any token.
    if (token && !types.length) {
      return token;
    }
    // No (matching) token.
    return undefined;
  }

  parse() {
    while (!this.done) {
      const token = this.take();

      if (!token) {
        this.done = true;
        continue;
      }

      switch (token.type) {
        case TokenType.filterPipe: {
          // A filter pipe must be followed by a filter function.
          const next = this.take(TokenType.filterFn);
          if (!next) {
            // The pipe may also be the last token, so guard against undefined.
            const actualNext = this.take();
            throw new Error(
              actualNext
                ? `Expected to see a filter function, instead saw a type "${actualNext.type}" with value "${actualNext.value}".`
                : "Expected to see a filter function, instead reached the end of the expression."
            );
          }
          break;
        }
      }
    }
  }
}

Above, parse will iterate over tokens until there are none remaining.

In this example, I’m only throwing an error for one of the many possible syntax errors I could imagine.

I take tokens until I see a filter pipe. If the token that follows is not a TokenType.filterFn, an error with a helpful message is thrown.

Following this pattern, I’ll consider all of the other error conditions I can think of and throw on each.
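As one example of that pattern, here is a minimal sketch of a check for unbalanced parentheses. It is standalone and hypothetical: the Token shape, the TokenType values and checkParens are assumptions made for illustration, and in the real parser this logic would more likely live inside the switch in parse().

```typescript
// Hypothetical token shapes for this sketch only; the real Token and
// TokenType from the lexer may differ.
const TokenType = {
  parenOpen: "paren-open",
  parenClose: "paren-close",
} as const;

interface Token {
  type: string;
  value: string | number;
}

// Throws on unbalanced parentheses. Like parse(), it walks the tokens one at a
// time and tracks an expectation: how many ")" tokens are still owed.
function checkParens(tokens: Token[]): void {
  let depth = 0;
  for (const token of tokens) {
    if (token.type === TokenType.parenOpen) {
      depth++;
    } else if (token.type === TokenType.parenClose) {
      depth--;
      if (depth < 0) {
        throw new Error(`Saw ")" with no matching "(".`);
      }
    }
  }
  // Reaching the end of input with open parens is also an error.
  if (depth > 0) {
    throw new Error(`Expected ${depth} more ")" before the end of the expression.`);
  }
}
```

Counting depth rather than just matching the next token lets the check handle nested calls such as a(b()) without any lookahead.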

Generating code

Parsing is complete and no errors were thrown. Nice. The tokens I have should be valid for use in code generation.

This template expression:

`The coin landed on ${Math.random() > 0.5 ? 'heads' : 'tails'}! Great!` | uppercase

Gives me these tokens:

[
    { "type": "string", "value": "`The coin landed on ${" },
    { "type": "variable", "value": "[\"Math\"][\"random\"]" },
    { "type": "paren-open", "value": "(" },
    { "type": "paren-close", "value": ")" },
    { "type": "operator", "value": ">" },
    { "type": "number", "value": 0.5 },
    { "type": "operator", "value": "?" },
    { "type": "string", "value": "'heads'" },
    { "type": "operator", "value": ":" },
    { "type": "string", "value": "'tails'" },
    { "type": "string", "value": "}! Great!`" },
    { "type": "filterPipe", "value": "|" },
    { "type": "filter-fn", "value": "[\"uppercase\"]" }
]

So that I can generate this (normally minified):

 1  // This takes a string like '["hello"]["world"]["how"]["are"]["you"]'
 2  // and returns an array like: ['hello', 'world', 'how', 'are', 'you'].
 3  function spl(str) {
 4      return str
 5          .split(/\[|\]/)
 6          .filter(Boolean)
 7          .map((s) => s.replace(/^['"]/g, ""))
 8          .map((s) => s.replace(/['"]$/g, ""));
 9  }
10  // This takes an object and an array like ['hello', 'world', 'how', 'are', 'you']
11  // and returns true if o['hello']['world']['how']['are']['you'] is defined.
12  function has(o, a) {
13      let cur = o;
14      for (let i = 0; i < a.length; i++) {
15          if (cur[a[i]] === undefined) {
16              return false;
17          }
18          cur = cur[a[i]];
19      }
20      return true;
21  }
22  // This returns true if the provided value is a non-null object that looks like a ref.
23  function i(r) {
24      return r !== null && typeof r === "object" && "isRef" in r && r.isRef;
25  }
26  // This unwraps the value of a ref, if whatever is provided is a ref.
27  function u(r) {
28      return i(r) ? r.value : r;
29  }
30  let val = `The coin landed on ${(has(ctx, spl('["Math"]["random"]')) ? u(ctx["Math"]["random"]) : window["Math"]["random"])() > 0.5 ? "heads" : "tails"}! Great!`;
31  val = ctx["uppercase"](val);
32  return val;

The above is the body of a function that is created at runtime and cached, so I don't end up re-creating the same function every time the same expression is evaluated. Lines 1 through 29 are helper functions included in every generated function.
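The caching could be as simple as a Map keyed by the expression string. This is a minimal sketch, not the author's code: compileExpression, compiledCache and the generate callback (which stands in for parsing and code generation) are hypothetical names.

```typescript
type CompiledExpression = (ctx: Record<string, unknown>) => unknown;

// Compiled functions, keyed by the source template expression.
const compiledCache = new Map<string, CompiledExpression>();

// Returns the cached function for `expression`, generating and compiling the
// body only the first time that expression is seen.
function compileExpression(
  expression: string,
  generate: (expr: string) => string
): CompiledExpression {
  const cached = compiledCache.get(expression);
  if (cached) {
    return cached;
  }
  // The Function constructor compiles `body` with a single parameter, `ctx`.
  const fn = new Function("ctx", generate(expression)) as CompiledExpression;
  compiledCache.set(expression, fn);
  return fn;
}
```

Keying on the expression rather than on the generated body also skips the parse and generate steps on a cache hit, not just the compile step.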

I create it like this:

const code = "..."; // the above code
const fn = new Function("ctx", code);

I then call it with my model as its only argument, like this:

const model = {
  uppercase: (str: string) => str.toUpperCase(),
};

fn(model);

The generated function attempts to access model["Math"]["random"], which doesn’t exist, so it falls back to window["Math"]["random"], which does exist.
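The same lookup, written as plain TypeScript rather than generated code, may make the fallback order easier to follow. This is a sketch with hypothetical names (isRef, lookup, resolve). It uses globalThis in place of window so it also runs outside a browser, and unlike the generated code it returns undefined instead of throwing when neither the model nor the global object has the path.

```typescript
interface Ref<T = unknown> {
  isRef: true;
  value: T;
}

// True for non-null objects that carry a truthy `isRef` flag.
function isRef(r: unknown): r is Ref {
  return typeof r === "object" && r !== null && "isRef" in r && Boolean((r as { isRef?: unknown }).isRef);
}

// Walks `path` on `obj`, returning undefined as soon as a segment is missing.
function lookup(obj: unknown, path: string[]): unknown {
  let cur: any = obj;
  for (const key of path) {
    if (cur == null || cur[key] === undefined) {
      return undefined;
    }
    cur = cur[key];
  }
  return cur;
}

// Model first (unwrapping refs), then the global object.
function resolve(ctx: object, path: string[]): unknown {
  const fromModel = lookup(ctx, path);
  if (fromModel !== undefined) {
    return isRef(fromModel) ? fromModel.value : fromModel;
  }
  return lookup(globalThis, path);
}
```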

Line 30 above is hard to read. Here it is with slightly better formatting:

let val = `The coin landed on ${
  (
    has(ctx, spl('["Math"]["random"]')) ? u(ctx["Math"]["random"]) : window["Math"]["random"]
  )() > 0.5 ? "heads" : "tails"
  }! Great!`;

Every token of type variable is wrapped in these helpers. Here’s what would happen with the following model and expression:

Model:

{ foo: "bar" }

Expression:

<div>{{ foo }}</div>

Tokens:

[ { "type": "variable", "value": "[\"foo\"]" } ]

Generated:

// ...helpers removed
let val = (has(ctx, spl('["foo"]')) ? u(ctx["foo"]) : window["foo"]);
return val; // "bar"
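To connect a token list to output like this, here is a minimal sketch of the emitting step. It assumes tokens shaped like the JSON token lists in this post, leaves out the helper prelude, and uses hypothetical names (generate, wrapVariable); the real generator in this series may be structured differently. For the foo example it produces the same two lines shown above, minus the trailing comment.

```typescript
// Hypothetical token shape, matching the JSON token lists in this post.
interface Token {
  type: string;
  value: string | number;
}

// Wraps a normalized variable path such as '["Math"]["random"]' in the
// has/spl/u lookup with a window fallback, as in the generated code.
function wrapVariable(path: string): string {
  return `(has(ctx, spl('${path}')) ? u(ctx${path}) : window${path})`;
}

// Emits a function body (without the helper prelude) from a token list.
// Tokens before the first "|" form the main expression; each "filter-fn"
// token becomes one `val = ctx[...](val);` line, applied in order.
// Other tokens after the pipe are ignored.
function generate(tokens: Token[]): string {
  const expr: string[] = [];
  const filters: string[] = [];
  let inFilters = false;

  for (const token of tokens) {
    if (token.type === "filterPipe") {
      inFilters = true;
    } else if (token.type === "filter-fn") {
      filters.push(`val = ctx${token.value}(val);`);
    } else if (!inFilters) {
      // Variables get wrapped; everything else is emitted verbatim.
      expr.push(token.type === "variable" ? wrapVariable(String(token.value)) : String(token.value));
    }
  }

  // Joining with spaces keeps adjacent tokens (e.g. keywords) from merging.
  return [`let val = ${expr.join(" ")};`, ...filters, "return val;"].join("\n");
}
```

Because token values are pasted into source code verbatim, this only makes sense after the parser has rejected malformed input.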