* [PATCH olang v2 1/3] lexer: add tokenize support to binary op tokens
2024-03-17 21:29 [PATCH olang v2 0/3] frontend: add binary operation expr support Johnny Richard
@ 2024-03-17 21:29 ` Johnny Richard
2024-03-18 0:30 ` Carlos Maniero
2024-03-17 21:29 ` [PATCH olang v2 2/3] ast: create binary operation ast node Johnny Richard
2024-03-17 21:29 ` [PATCH olang v2 3/3] parser: add all binary operation expressions Johnny Richard
2 siblings, 1 reply; 7+ messages in thread
From: Johnny Richard @ 2024-03-17 21:29 UTC (permalink / raw)
To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard
In order to parse the compare not-equals token (!=), I also added the unary not token (!).
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
---
v2: Add support for tokenizing every binary operation token
examples/expression.ol | 3 +
src/lexer.c | 182 ++++++++++++++++++++++++++++++++--
src/lexer.h | 26 +++++
tests/integration/cli_test.c | 56 ++++++++++-
tests/integration/proc_exec.h | 3 +-
5 files changed, 261 insertions(+), 9 deletions(-)
create mode 100644 examples/expression.ol
diff --git a/examples/expression.ol b/examples/expression.ol
new file mode 100644
index 0000000..efa4ab5
--- /dev/null
+++ b/examples/expression.ol
@@ -0,0 +1,3 @@
+fn main(): u32 {
+ return (10 + 1 * 2) - (10 - (1 + 1) / 2)
+}
diff --git a/src/lexer.c b/src/lexer.c
index dd6f11d..14c2962 100644
--- a/src/lexer.c
+++ b/src/lexer.c
@@ -37,6 +37,9 @@ lexer_current_char(lexer_t *lexer);
static void
lexer_skip_char(lexer_t *lexer);
+static char
+lexer_peek_next_char(lexer_t *lexer);
+
static bool
lexer_is_eof(lexer_t *lexer);
@@ -101,6 +104,118 @@ lexer_next_token(lexer_t *lexer, token_t *token)
}
switch (current_char) {
+ case '=': {
+ size_t start_offset = lexer->offset;
+
+ if (lexer_peek_next_char(lexer) == '=') {
+ lexer_skip_char(lexer);
+ lexer_skip_char(lexer);
+ lexer_init_str_value_token(lexer, token, TOKEN_CMP_EQ, start_offset);
+ return;
+ }
+
+ lexer_init_char_value_token(lexer, token, TOKEN_EQ);
+ lexer_skip_char(lexer);
+ return;
+ }
+ case '!': {
+ size_t start_offset = lexer->offset;
+
+ if (lexer_peek_next_char(lexer) == '=') {
+ lexer_skip_char(lexer);
+ lexer_skip_char(lexer);
+ lexer_init_str_value_token(lexer, token, TOKEN_CMP_NEQ, start_offset);
+ return;
+ }
+
+ lexer_init_char_value_token(lexer, token, TOKEN_BANG);
+ lexer_skip_char(lexer);
+ return;
+ }
+ case '&': {
+ size_t start_offset = lexer->offset;
+
+ if (lexer_peek_next_char(lexer) == '&') {
+ lexer_skip_char(lexer);
+ lexer_skip_char(lexer);
+ lexer_init_str_value_token(lexer, token, TOKEN_LOGICAL_AND, start_offset);
+ return;
+ }
+
+ lexer_init_char_value_token(lexer, token, TOKEN_AND);
+ lexer_skip_char(lexer);
+ return;
+ }
+ case '|': {
+ size_t start_offset = lexer->offset;
+
+ if (lexer_peek_next_char(lexer) == '|') {
+ lexer_skip_char(lexer);
+ lexer_skip_char(lexer);
+ lexer_init_str_value_token(lexer, token, TOKEN_LOGICAL_OR, start_offset);
+ return;
+ }
+
+ lexer_init_char_value_token(lexer, token, TOKEN_PIPE);
+ lexer_skip_char(lexer);
+ return;
+ }
+ case '<': {
+ size_t start_offset = lexer->offset;
+
+ switch (lexer_peek_next_char(lexer)) {
+ case '<': {
+ lexer_skip_char(lexer);
+ lexer_skip_char(lexer);
+ lexer_init_str_value_token(lexer, token, TOKEN_BITWISE_LSHIFT, start_offset);
+ return;
+ }
+ case '=': {
+ lexer_skip_char(lexer);
+ lexer_skip_char(lexer);
+ lexer_init_str_value_token(lexer, token, TOKEN_CMP_LEQ, start_offset);
+ return;
+ }
+ default: {
+ lexer_init_char_value_token(lexer, token, TOKEN_LT);
+ lexer_skip_char(lexer);
+ return;
+ }
+ }
+ }
+ case '>': {
+ size_t start_offset = lexer->offset;
+
+ switch (lexer_peek_next_char(lexer)) {
+ case '>': {
+ lexer_skip_char(lexer);
+ lexer_skip_char(lexer);
+ lexer_init_str_value_token(lexer, token, TOKEN_BITWISE_RSHIFT, start_offset);
+ return;
+ }
+ case '=': {
+ lexer_skip_char(lexer);
+ lexer_skip_char(lexer);
+ lexer_init_str_value_token(lexer, token, TOKEN_CMP_GEQ, start_offset);
+ return;
+ }
+ default: {
+ lexer_init_char_value_token(lexer, token, TOKEN_GT);
+ lexer_skip_char(lexer);
+ return;
+ }
+ }
+ }
+ case '^': {
+ lexer_init_char_value_token(lexer, token, TOKEN_CIRCUMFLEX);
+ lexer_skip_char(lexer);
+ return;
+ }
+ case '%': {
+ lexer_init_char_value_token(lexer, token, TOKEN_PERCENT);
+ lexer_skip_char(lexer);
+ return;
+ }
case '(': {
lexer_init_char_value_token(lexer, token, TOKEN_OPAREN);
lexer_skip_char(lexer);
@@ -126,6 +241,26 @@ lexer_next_token(lexer_t *lexer, token_t *token)
lexer_skip_char(lexer);
return;
}
+ case '+': {
+ lexer_init_char_value_token(lexer, token, TOKEN_PLUS);
+ lexer_skip_char(lexer);
+ return;
+ }
+ case '-': {
+ lexer_init_char_value_token(lexer, token, TOKEN_DASH);
+ lexer_skip_char(lexer);
+ return;
+ }
+ case '*': {
+ lexer_init_char_value_token(lexer, token, TOKEN_STAR);
+ lexer_skip_char(lexer);
+ return;
+ }
+ case '/': {
+ lexer_init_char_value_token(lexer, token, TOKEN_SLASH);
+ lexer_skip_char(lexer);
+ return;
+ }
case '\n': {
lexer_init_char_value_token(lexer, token, TOKEN_LF);
lexer_skip_char(lexer);
@@ -146,12 +281,38 @@ lexer_next_token(lexer_t *lexer, token_t *token)
}
static char *token_kind_str_table[] = {
- [TOKEN_UNKNOWN] = "unknown", [TOKEN_IDENTIFIER] = "identifier",
- [TOKEN_NUMBER] = "number", [TOKEN_FN] = "fn",
- [TOKEN_RETURN] = "return", [TOKEN_LF] = "line_feed",
- [TOKEN_OPAREN] = "(", [TOKEN_CPAREN] = ")",
- [TOKEN_COLON] = ":", [TOKEN_OCURLY] = "{",
- [TOKEN_CCURLY] = "}", [TOKEN_EOF] = "EOF",
+ [TOKEN_UNKNOWN] = "unknown",
+ [TOKEN_IDENTIFIER] = "identifier",
+ [TOKEN_NUMBER] = "number",
+ [TOKEN_FN] = "fn",
+ [TOKEN_RETURN] = "return",
+ [TOKEN_LF] = "line_feed",
+ [TOKEN_OPAREN] = "(",
+ [TOKEN_CPAREN] = ")",
+ [TOKEN_COLON] = ":",
+ [TOKEN_OCURLY] = "{",
+ [TOKEN_CCURLY] = "}",
+ [TOKEN_PLUS] = "+",
+ [TOKEN_DASH] = "-",
+ [TOKEN_STAR] = "*",
+ [TOKEN_SLASH] = "/",
+ [TOKEN_EQ] = "=",
+ [TOKEN_CMP_EQ] = "==",
+ [TOKEN_BANG] = "!",
+ [TOKEN_CMP_NEQ] = "!=",
+ [TOKEN_LT] = "<",
+ [TOKEN_GT] = ">",
+ [TOKEN_CMP_LEQ] = "<=",
+ [TOKEN_CMP_GEQ] = ">=",
+ [TOKEN_PERCENT] = "%",
+ [TOKEN_BITWISE_LSHIFT] = "<<",
+ [TOKEN_BITWISE_RSHIFT] = ">>",
+ [TOKEN_CIRCUMFLEX] = "^",
+ [TOKEN_PIPE] = "|",
+ [TOKEN_LOGICAL_OR] = "||",
+ [TOKEN_AND] = "&",
+ [TOKEN_LOGICAL_AND] = "&&",
+ [TOKEN_EOF] = "EOF",
};
char *
@@ -167,6 +328,15 @@ lexer_current_char(lexer_t *lexer)
return lexer->source.chars[lexer->offset];
}
+static char
+lexer_peek_next_char(lexer_t *lexer)
+{
+ if (lexer->offset + 1 >= lexer->source.size) {
+ return 0;
+ }
+ return lexer->source.chars[lexer->offset + 1];
+}
+
static void
lexer_skip_char(lexer_t *lexer)
{
diff --git a/src/lexer.h b/src/lexer.h
index cb91d7e..5ed777b 100644
--- a/src/lexer.h
+++ b/src/lexer.h
@@ -39,7 +39,33 @@ typedef enum token_kind
TOKEN_FN,
TOKEN_RETURN,
+ // Equality operators
+ TOKEN_CMP_EQ,
+ TOKEN_CMP_NEQ,
+ TOKEN_CMP_LEQ,
+ TOKEN_CMP_GEQ,
+
+ // Logical Operators
+ TOKEN_LOGICAL_OR,
+ TOKEN_LOGICAL_AND,
+
+ // Bitwise Operators
+ TOKEN_BITWISE_LSHIFT,
+ TOKEN_BITWISE_RSHIFT,
+
// Single char
+ TOKEN_BANG,
+ TOKEN_GT,
+ TOKEN_LT,
+ TOKEN_PERCENT,
+ TOKEN_AND,
+ TOKEN_PIPE,
+ TOKEN_CIRCUMFLEX,
+ TOKEN_EQ,
+ TOKEN_PLUS,
+ TOKEN_DASH,
+ TOKEN_SLASH,
+ TOKEN_STAR,
TOKEN_LF,
TOKEN_OPAREN,
TOKEN_CPAREN,
diff --git a/tests/integration/cli_test.c b/tests/integration/cli_test.c
index 8cc22f9..d46471b 100644
--- a/tests/integration/cli_test.c
+++ b/tests/integration/cli_test.c
@@ -20,7 +20,7 @@
#include <stdio.h>
static MunitResult
-test_cli_dump_tokens(const MunitParameter params[], void *user_data_or_fixture)
+test_cli_dump_tokens_example_main_exit(const MunitParameter params[], void *user_data_or_fixture)
{
cli_result_t compilation_result = cli_runner_compiler_dump_tokens("../../examples/main_exit.ol");
munit_assert_int(compilation_result.exec.exit_code, ==, 0);
@@ -42,6 +42,47 @@ test_cli_dump_tokens(const MunitParameter params[], void *user_data_or_fixture)
return MUNIT_OK;
}
+static MunitResult
+test_cli_dump_tokens_example_expression(const MunitParameter params[], void *user_data_or_fixture)
+{
+ cli_result_t compilation_result = cli_runner_compiler_dump_tokens("../../examples/expression.ol");
+ munit_assert_int(compilation_result.exec.exit_code, ==, 0);
+ munit_assert_string_equal(compilation_result.exec.stdout_buf,
+ "../../examples/expression.ol:1:1: <fn>\n"
+ "../../examples/expression.ol:1:4: <identifier>\n"
+ "../../examples/expression.ol:1:8: <(>\n"
+ "../../examples/expression.ol:1:9: <)>\n"
+ "../../examples/expression.ol:1:10: <:>\n"
+ "../../examples/expression.ol:1:12: <identifier>\n"
+ "../../examples/expression.ol:1:16: <{>\n"
+ "../../examples/expression.ol:1:17: <line_feed>\n"
+ "../../examples/expression.ol:2:3: <return>\n"
+ "../../examples/expression.ol:2:10: <(>\n"
+ "../../examples/expression.ol:2:11: <number>\n"
+ "../../examples/expression.ol:2:14: <+>\n"
+ "../../examples/expression.ol:2:16: <number>\n"
+ "../../examples/expression.ol:2:18: <*>\n"
+ "../../examples/expression.ol:2:20: <number>\n"
+ "../../examples/expression.ol:2:21: <)>\n"
+ "../../examples/expression.ol:2:23: <->\n"
+ "../../examples/expression.ol:2:25: <(>\n"
+ "../../examples/expression.ol:2:26: <number>\n"
+ "../../examples/expression.ol:2:29: <->\n"
+ "../../examples/expression.ol:2:31: <(>\n"
+ "../../examples/expression.ol:2:32: <number>\n"
+ "../../examples/expression.ol:2:34: <+>\n"
+ "../../examples/expression.ol:2:36: <number>\n"
+ "../../examples/expression.ol:2:37: <)>\n"
+ "../../examples/expression.ol:2:39: </>\n"
+ "../../examples/expression.ol:2:41: <number>\n"
+ "../../examples/expression.ol:2:42: <)>\n"
+ "../../examples/expression.ol:2:43: <line_feed>\n"
+ "../../examples/expression.ol:3:1: <}>\n"
+ "../../examples/expression.ol:3:2: <line_feed>\n"
+ "../../examples/expression.ol:4:1: <EOF>\n");
+ return MUNIT_OK;
+}
+
static MunitResult
test_cli_compile_minimal_program(const MunitParameter params[], void *user_data_or_fixture)
{
@@ -62,7 +103,18 @@ test_cli_compile_minimal_program(const MunitParameter params[], void *user_data_
}
static MunitTest tests[] = {
- { "/test_cli_dump_tokens", test_cli_dump_tokens, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
+ { "/test_cli_dump_tokens_example_main_exit",
+ test_cli_dump_tokens_example_main_exit,
+ NULL,
+ NULL,
+ MUNIT_TEST_OPTION_NONE,
+ NULL },
+ { "/test_cli_dump_tokens_example_expression",
+ test_cli_dump_tokens_example_expression,
+ NULL,
+ NULL,
+ MUNIT_TEST_OPTION_NONE,
+ NULL },
{ "/test_cli_compile_minimal_program", test_cli_compile_minimal_program, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
{ NULL, NULL, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL }
};
diff --git a/tests/integration/proc_exec.h b/tests/integration/proc_exec.h
index 135aa6a..45c2977 100644
--- a/tests/integration/proc_exec.h
+++ b/tests/integration/proc_exec.h
@@ -21,7 +21,8 @@
typedef struct proc_exec_result
{
int exit_code;
- char stdout_buf[1024];
+ // FIXME: output buffer shouldn't be fixed size
+ char stdout_buf[2048];
} proc_exec_result_t;
typedef struct proc_exec_command
--
2.44.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH olang v2 2/3] ast: create binary operation ast node
2024-03-17 21:29 [PATCH olang v2 0/3] frontend: add binary operation expr support Johnny Richard
2024-03-17 21:29 ` [PATCH olang v2 1/3] lexer: add tokenize support to binary op tokens Johnny Richard
@ 2024-03-17 21:29 ` Johnny Richard
2024-03-17 21:29 ` [PATCH olang v2 3/3] parser: add all binary operation expressions Johnny Richard
2 siblings, 0 replies; 7+ messages in thread
From: Johnny Richard @ 2024-03-17 21:29 UTC (permalink / raw)
To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
---
v2: Map every binary operation kind possible at this moment
src/ast.c | 14 ++++++++++++++
src/ast.h | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 48 insertions(+)
diff --git a/src/ast.c b/src/ast.c
index ab56c96..c9f33a4 100644
--- a/src/ast.c
+++ b/src/ast.c
@@ -52,6 +52,20 @@ ast_new_node_fn_def(arena_t *arena, string_view_t identifier, type_t return_type
return node_fn_def;
}
+ast_node_t *
+ast_new_node_bin_op(arena_t *arena, ast_binary_op_kind_t kind, ast_node_t *lhs, ast_node_t *rhs)
+{
+ ast_node_t *node_bin_op = (ast_node_t *)arena_alloc(arena, sizeof(ast_node_t));
+ assert(node_bin_op);
+
+ node_bin_op->kind = AST_NODE_BINARY_OP;
+ node_bin_op->data.as_bin_op.kind = kind;
+ node_bin_op->data.as_bin_op.lhs = lhs;
+ node_bin_op->data.as_bin_op.rhs = rhs;
+
+ return node_bin_op;
+}
+
ast_node_t *
ast_new_node_literal_u32(arena_t *arena, uint32_t value)
{
diff --git a/src/ast.h b/src/ast.h
index 2b42781..4e146f8 100644
--- a/src/ast.h
+++ b/src/ast.h
@@ -30,6 +30,7 @@ typedef enum
AST_NODE_PROGRAM,
AST_NODE_BLOCK,
AST_NODE_FN_DEF,
+ AST_NODE_BINARY_OP,
AST_NODE_RETURN_STMT,
AST_NODE_LITERAL,
AST_NODE_UNKNOWN
@@ -73,6 +74,35 @@ typedef struct ast_literal
ast_literal_value_t value;
} ast_literal_t;
+typedef enum ast_binary_op_kind
+{
+ AST_BINOP_ADDITION,
+ AST_BINOP_SUBTRACTION,
+ AST_BINOP_MULTIPLICATION,
+ AST_BINOP_DIVISION,
+ AST_BINOP_REMINDER,
+ AST_BINOP_BITWISE_LSHIFT,
+ AST_BINOP_BITWISE_RSHIFT,
+ AST_BINOP_BITWISE_XOR,
+ AST_BINOP_BITWISE_AND,
+ AST_BINOP_BITWISE_OR,
+ AST_BINOP_CMP_LT,
+ AST_BINOP_CMP_GT,
+ AST_BINOP_CMP_LEQ,
+ AST_BINOP_CMP_GEQ,
+ AST_BINOP_CMP_EQ,
+ AST_BINOP_CMP_NEQ,
+ AST_BINOP_LOGICAL_AND,
+ AST_BINOP_LOGICAL_OR,
+} ast_binary_op_kind_t;
+
+typedef struct ast_binary_op
+{
+ ast_binary_op_kind_t kind;
+ ast_node_t *lhs;
+ ast_node_t *rhs;
+} ast_binary_op_t;
+
typedef struct ast_return_stmt
{
ast_node_t *data;
@@ -82,6 +112,7 @@ typedef union
{
ast_program_t as_program;
ast_fn_definition_t as_fn_def;
+ ast_binary_op_t as_bin_op;
ast_literal_t as_literal;
ast_block_t as_block;
ast_return_stmt_t as_return_stmt;
@@ -99,6 +130,9 @@ ast_new_program(arena_t *arena, ast_node_t *fn_def);
ast_node_t *
ast_new_node_fn_def(arena_t *arena, string_view_t identifier, type_t return_type, ast_node_t *block);
+ast_node_t *
+ast_new_node_bin_op(arena_t *arena, ast_binary_op_kind_t kind, ast_node_t *lhs, ast_node_t *rhs);
+
ast_node_t *
ast_new_node_literal_u32(arena_t *arena, uint32_t value);
--
2.44.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH olang v2 3/3] parser: add all binary operation expressions
2024-03-17 21:29 [PATCH olang v2 0/3] frontend: add binary operation expr support Johnny Richard
2024-03-17 21:29 ` [PATCH olang v2 1/3] lexer: add tokenize support to binary op tokens Johnny Richard
2024-03-17 21:29 ` [PATCH olang v2 2/3] ast: create binary operation ast node Johnny Richard
@ 2024-03-17 21:29 ` Johnny Richard
2024-03-17 20:37 ` [olang/patches/.build.yml] build success builds.sr.ht
2 siblings, 1 reply; 7+ messages in thread
From: Johnny Richard @ 2024-03-17 21:29 UTC (permalink / raw)
To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard
This implementation uses precedence climbing method to guarantee
precedence.
Right now it is pretty difficult to visualize the AST since we don't have an
AST pretty printer. I have validated it by running **gdb**.
Testing the ast with unit tests is quite annoying without a good "DSL".
We can design such DSL in the near future. =^)
Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
---
v2: Replace the previous solution with precedence based on function
calls to climbing method. The solution is more extensible and smaller
since it already maps every possible binary operation at this very
moment.
src/lexer.c | 28 +++++++
src/lexer.h | 3 +
src/parser.c | 209 +++++++++++++++++++++++++++++++++++++++++++++++----
3 files changed, 227 insertions(+), 13 deletions(-)
diff --git a/src/lexer.c b/src/lexer.c
index 14c2962..437d0b1 100644
--- a/src/lexer.c
+++ b/src/lexer.c
@@ -322,6 +322,34 @@ token_kind_to_cstr(token_kind_t kind)
return token_kind_str_table[kind];
}
+bool
+token_kind_is_binary_op(token_kind_t kind)
+{
+ switch (kind) {
+ case TOKEN_PLUS:
+ case TOKEN_DASH:
+ case TOKEN_SLASH:
+ case TOKEN_STAR:
+ case TOKEN_PERCENT:
+ case TOKEN_BITWISE_LSHIFT:
+ case TOKEN_BITWISE_RSHIFT:
+ case TOKEN_LT:
+ case TOKEN_CMP_LEQ:
+ case TOKEN_GT:
+ case TOKEN_CMP_GEQ:
+ case TOKEN_CMP_EQ:
+ case TOKEN_CMP_NEQ:
+ case TOKEN_AND:
+ case TOKEN_CIRCUMFLEX:
+ case TOKEN_PIPE:
+ case TOKEN_LOGICAL_AND:
+ case TOKEN_LOGICAL_OR:
+ return true;
+ default:
+ return false;
+ }
+}
+
static char
lexer_current_char(lexer_t *lexer)
{
diff --git a/src/lexer.h b/src/lexer.h
index 5ed777b..80fb25a 100644
--- a/src/lexer.h
+++ b/src/lexer.h
@@ -104,6 +104,9 @@ lexer_lookahead(lexer_t *lexer, token_t *token, size_t n);
char *
token_kind_to_cstr(token_kind_t kind);
+bool
+token_kind_is_binary_op(token_kind_t kind);
+
string_view_t
lexer_get_token_line(lexer_t *lexer, token_t *token);
diff --git a/src/parser.c b/src/parser.c
index 5b7d39a..76ef91a 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -39,6 +39,12 @@ parser_parse_block(parser_t *parser);
ast_node_t *
parser_parse_fn_definition(parser_t *parser);
+static ast_node_t *
+parser_parse_expr(parser_t *parser);
+
+static ast_node_t *
+parser_parse_factor(parser_t *parser);
+
static void
skip_line_feeds(lexer_t *lexer);
@@ -57,10 +63,189 @@ ast_node_t *
parser_parse_program(parser_t *parser)
{
ast_node_t *fn = parser_parse_fn_definition(parser);
+ if (fn == NULL) {
+ return NULL;
+ }
return ast_new_program(parser->arena, fn);
}
+static ast_binary_op_kind_t
+token_kind_to_binary_op_kind(token_kind_t kind)
+{
+ switch (kind) {
+ case TOKEN_PLUS:
+ return AST_BINOP_ADDITION;
+ case TOKEN_DASH:
+ return AST_BINOP_SUBTRACTION;
+ case TOKEN_SLASH:
+ return AST_BINOP_DIVISION;
+ case TOKEN_STAR:
+ return AST_BINOP_MULTIPLICATION;
+ case TOKEN_PERCENT:
+ return AST_BINOP_REMINDER;
+ case TOKEN_BITWISE_LSHIFT:
+ return AST_BINOP_BITWISE_LSHIFT;
+ case TOKEN_BITWISE_RSHIFT:
+ return AST_BINOP_BITWISE_RSHIFT;
+ case TOKEN_CIRCUMFLEX:
+ return AST_BINOP_BITWISE_XOR;
+ case TOKEN_AND:
+ return AST_BINOP_BITWISE_AND;
+ case TOKEN_PIPE:
+ return AST_BINOP_BITWISE_OR;
+ case TOKEN_LT:
+ return AST_BINOP_CMP_LT;
+ case TOKEN_GT:
+ return AST_BINOP_CMP_GT;
+ case TOKEN_CMP_LEQ:
+ return AST_BINOP_CMP_LEQ;
+ case TOKEN_CMP_GEQ:
+ return AST_BINOP_CMP_GEQ;
+ case TOKEN_CMP_EQ:
+ return AST_BINOP_CMP_EQ;
+ case TOKEN_CMP_NEQ:
+ return AST_BINOP_CMP_NEQ;
+ case TOKEN_LOGICAL_AND:
+ return AST_BINOP_LOGICAL_AND;
+ case TOKEN_LOGICAL_OR:
+ return AST_BINOP_LOGICAL_OR;
+ default: {
+ fprintf(stderr, "error: token kind (%s) not compatible with binary op kind\n", token_kind_to_cstr(kind));
+ assert(false);
+ }
+ }
+}
+
+typedef enum
+{
+ BINOP_MIN_PREC,
+ BINOP_LOGICAL_OR_PREC,
+ BINOP_LOGICAL_AND_PREC,
+ BINOP_BITWISE_OR_PREC,
+ BINOP_BITWISE_XOR_PREC,
+ BINOP_BITWISE_AND_PREC,
+ BINOP_CMP_EQ_AND_NEQ_PREC,
+ BINOP_CMP_LT_AND_GT_PREC,
+ BINOP_BITWISE_SHIFT_PREC,
+ BINOP_ADDITIVE_PREC,
+ BINOP_MULTIPLICATIVE_PREC,
+} binary_op_precedence_t;
+
+static binary_op_precedence_t
+get_binary_op_precedence(token_kind_t kind)
+{
+ switch (kind) {
+ case TOKEN_PLUS:
+ case TOKEN_DASH:
+ return BINOP_ADDITIVE_PREC;
+ case TOKEN_SLASH:
+ case TOKEN_STAR:
+ case TOKEN_PERCENT:
+ return BINOP_MULTIPLICATIVE_PREC;
+ case TOKEN_BITWISE_LSHIFT:
+ case TOKEN_BITWISE_RSHIFT:
+ return BINOP_BITWISE_SHIFT_PREC;
+ case TOKEN_LT:
+ case TOKEN_GT:
+ return BINOP_CMP_LT_AND_GT_PREC;
+ case TOKEN_CMP_EQ:
+ case TOKEN_CMP_NEQ:
+ return BINOP_CMP_EQ_AND_NEQ_PREC;
+ case TOKEN_AND:
+ return BINOP_BITWISE_AND_PREC;
+ case TOKEN_CIRCUMFLEX:
+ return BINOP_BITWISE_XOR_PREC;
+ case TOKEN_PIPE:
+ return BINOP_BITWISE_OR_PREC;
+ case TOKEN_LOGICAL_AND:
+ return BINOP_LOGICAL_AND_PREC;
+ case TOKEN_LOGICAL_OR:
+ return BINOP_LOGICAL_OR_PREC;
+ default:
+ assert(false);
+ }
+}
+
+static ast_node_t *
+parser_parse_expr_1(parser_t *parser, ast_node_t *lhs, size_t prev_precedence)
+{
+ ast_node_t *expr = NULL;
+
+ token_t lookahead_token;
+ lexer_peek_next(parser->lexer, &lookahead_token);
+
+ while (token_kind_is_binary_op(lookahead_token.kind) &&
+ get_binary_op_precedence(lookahead_token.kind) >= prev_precedence) {
+ token_t token_op;
+ lexer_next_token(parser->lexer, &token_op);
+
+ ast_node_t *rhs = parser_parse_factor(parser);
+ if (rhs == NULL) {
+ return NULL;
+ }
+
+ lexer_peek_next(parser->lexer, &lookahead_token);
+
+ while (token_kind_is_binary_op(lookahead_token.kind) &&
+ get_binary_op_precedence(lookahead_token.kind) > get_binary_op_precedence(token_op.kind)) {
+ rhs = parser_parse_expr_1(parser, rhs, get_binary_op_precedence(token_op.kind));
+ lexer_peek_next(parser->lexer, &lookahead_token);
+ }
+
+ expr = ast_new_node_bin_op(parser->arena, token_kind_to_binary_op_kind(token_op.kind), lhs, rhs);
+ if (expr == NULL) {
+ return NULL;
+ }
+ }
+
+ if (expr == NULL) {
+ return lhs;
+ }
+
+ return expr;
+}
+
+static ast_node_t *
+parser_parse_expr(parser_t *parser)
+{
+ ast_node_t *lhs = parser_parse_factor(parser);
+ if (lhs == NULL) {
+ return NULL;
+ }
+
+ return parser_parse_expr_1(parser, lhs, BINOP_MIN_PREC);
+}
+
+static ast_node_t *
+parser_parse_factor(parser_t *parser)
+{
+ token_t token;
+ lexer_next_token(parser->lexer, &token);
+
+ switch (token.kind) {
+ case TOKEN_NUMBER:
+ return ast_new_node_literal_u32(parser->arena, string_view_to_u32(token.value));
+
+ case TOKEN_OPAREN: {
+ ast_node_t *expr = parser_parse_expr(parser);
+ if (expr == NULL) {
+ return NULL;
+ }
+
+ if (!skip_expected_token(parser, TOKEN_CPAREN)) {
+ return NULL;
+ }
+
+ return expr;
+ }
+ default: {
+ fprintf(stderr, "error: parse_factor: unsupported or invalid token (%s)\n", token_kind_to_cstr(token.kind));
+ assert(false);
+ }
+ }
+}
+
ast_node_t *
parser_parse_fn_definition(parser_t *parser)
{
@@ -131,41 +316,39 @@ parser_parse_type(parser_t *parser, type_t *type)
static ast_node_t *
parser_parse_block(parser_t *parser)
{
- token_t number_token;
if (!skip_expected_token(parser, TOKEN_OCURLY)) {
- return false;
+ return NULL;
}
skip_line_feeds(parser->lexer);
ast_node_t *node_block = ast_new_node_block(parser->arena);
+ if (node_block == NULL) {
+ return NULL;
+ }
if (!skip_expected_token(parser, TOKEN_RETURN)) {
- return false;
+ return NULL;
}
ast_node_t *node_return_stmt = ast_new_node_return_stmt(parser->arena);
assert(node_return_stmt);
- if (!expected_token(parser, &number_token, TOKEN_NUMBER)) {
- return false;
+ ast_node_t *expr = parser_parse_expr(parser);
+ if (expr == NULL) {
+ return NULL;
}
- ast_node_t *literal_node = ast_new_node_literal_u32(parser->arena, string_view_to_u32(number_token.value));
- assert(literal_node);
-
- node_return_stmt->data.as_return_stmt.data = literal_node;
+ node_return_stmt->data.as_return_stmt.data = expr;
list_append(node_block->data.as_block.nodes, node_return_stmt);
-
if (!skip_expected_token(parser, TOKEN_LF)) {
- return false;
+ return NULL;
}
skip_line_feeds(parser->lexer);
-
if (!skip_expected_token(parser, TOKEN_CCURLY)) {
- return false;
+ return NULL;
}
return node_block;
--
2.44.0
^ permalink raw reply [flat|nested] 7+ messages in thread