public inbox for ~johnnyrichard/olang-devel@lists.sr.ht
* [olang/patches/.build.yml] build success
  2024-02-28 19:04 ` [PATCH olang v1 4/4] parser: create simplified parser for tiny AST Johnny Richard
@ 2024-02-28 18:11   ` builds.sr.ht
  2024-03-01  3:34   ` [PATCH olang v1 4/4] parser: create simplified parser for tiny AST Carlos Maniero
  1 sibling, 0 replies; 11+ messages in thread
From: builds.sr.ht @ 2024-02-28 18:11 UTC (permalink / raw)
  To: Johnny Richard; +Cc: ~johnnyrichard/olang-devel

olang/patches/.build.yml: SUCCESS in 49s

[create initial syntax analysis logic][0] from [Johnny Richard][1]

[0]: https://lists.sr.ht/~johnnyrichard/olang-devel/patches/49873
[1]: mailto:johnny@johnnyrichard.com

✓ #1159030 SUCCESS olang/patches/.build.yml https://builds.sr.ht/~johnnyrichard/job/1159030

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH olang v1 0/4] create initial syntax analysis logic
@ 2024-02-28 19:04 Johnny Richard
  2024-02-28 19:04 ` [PATCH olang v1 1/4] string_view: add string view formatter for printf fmt Johnny Richard
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Johnny Richard @ 2024-02-28 19:04 UTC (permalink / raw)
  To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard

This patchset implements the initial syntax analysis logic for a small
portion of the language.

Since the type system contains only a u32 type, the semantic analysis
can be postponed.

This patchset also introduces a very basic Abstract Syntax Tree (AST)
data structure.

Johnny Richard (4):
  string_view: add string view formatter for printf fmt
  string_view: add string view conversion to uint32_t
  lexer: add token lookahead capability
  parser: create simplified parser for tiny AST

 src/ast.c                     |  79 ++++++++++++++
 src/ast.h                     | 101 ++++++++++++++++++
 src/lexer.c                   |  38 +++++++
 src/lexer.h                   |  10 ++
 src/parser.c                  | 193 ++++++++++++++++++++++++++++++++++
 src/parser.h                  |  38 +++++++
 src/string_view.c             |  11 ++
 src/string_view.h             |   8 +-
 tests/unit/parser_test.c      |  88 ++++++++++++++++
 tests/unit/string_view_test.c |  61 +++++++++++
 10 files changed, 626 insertions(+), 1 deletion(-)
 create mode 100644 src/ast.c
 create mode 100644 src/ast.h
 create mode 100644 src/parser.c
 create mode 100644 src/parser.h
 create mode 100644 tests/unit/parser_test.c
 create mode 100644 tests/unit/string_view_test.c

-- 
2.43.2



* [PATCH olang v1 1/4] string_view: add string view formatter for printf fmt
  2024-02-28 19:04 [PATCH olang v1 0/4] create initial syntax analysis logic Johnny Richard
@ 2024-02-28 19:04 ` Johnny Richard
  2024-02-28 19:04 ` [PATCH olang v1 2/4] string_view: add string view conversion to uint32_t Johnny Richard
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Johnny Richard @ 2024-02-28 19:04 UTC (permalink / raw)
  To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard

Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
---
 src/string_view.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/string_view.h b/src/string_view.h
index 367ef6b..a212ab9 100644
--- a/src/string_view.h
+++ b/src/string_view.h
@@ -19,6 +19,10 @@
 
 #include <stdbool.h>
 #include <stddef.h>
+#include <stdint.h>
+
+#define SV_FMT "%.*s"
+#define SV_ARG(sv) (int)(sv).size, (sv).chars
 
 typedef struct string_view
 {
-- 
2.43.2



* [PATCH olang v1 2/4] string_view: add string view conversion to uint32_t
  2024-02-28 19:04 [PATCH olang v1 0/4] create initial syntax analysis logic Johnny Richard
  2024-02-28 19:04 ` [PATCH olang v1 1/4] string_view: add string view formatter for printf fmt Johnny Richard
@ 2024-02-28 19:04 ` Johnny Richard
  2024-02-29 15:16   ` Carlos Maniero
  2024-02-28 19:04 ` [PATCH olang v1 3/4] lexer: add token lookahead capability Johnny Richard
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Johnny Richard @ 2024-02-28 19:04 UTC (permalink / raw)
  To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard

Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
---
 src/string_view.c             | 11 +++++++
 src/string_view.h             |  4 ++-
 tests/unit/string_view_test.c | 61 +++++++++++++++++++++++++++++++++++
 3 files changed, 75 insertions(+), 1 deletion(-)
 create mode 100644 tests/unit/string_view_test.c

diff --git a/src/string_view.c b/src/string_view.c
index 122eaa2..58bf197 100644
--- a/src/string_view.c
+++ b/src/string_view.c
@@ -17,6 +17,8 @@
 #include "string_view.h"
 
 #include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
 #include <string.h>
 
 bool
@@ -33,3 +35,12 @@ string_view_eq_to_cstr(string_view_t str, char *cstr)
     }
     return i == cstr_len;
 }
+
+uint32_t
+string_view_to_u32(string_view_t str)
+{
+    char ret[str.size + 1];
+    memset(ret, 0, str.size + 1);
+    memcpy(ret, str.chars, str.size);
+    return atoi(ret);
+}
diff --git a/src/string_view.h b/src/string_view.h
index a212ab9..d5d2e6c 100644
--- a/src/string_view.h
+++ b/src/string_view.h
@@ -31,8 +31,10 @@ typedef struct string_view
 
 } string_view_t;
 
-// TODO: missing unit test
 bool
 string_view_eq_to_cstr(string_view_t str, char *cstr);
 
+uint32_t
+string_view_to_u32(string_view_t str);
+
 #endif /* STRING_VIEW_T */
diff --git a/tests/unit/string_view_test.c b/tests/unit/string_view_test.c
new file mode 100644
index 0000000..b5d1fcf
--- /dev/null
+++ b/tests/unit/string_view_test.c
@@ -0,0 +1,61 @@
+/*
+ * Copyright (C) 2024 olang maintainers
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <https://www.gnu.org/licenses/>.
+ */
+#define MUNIT_ENABLE_ASSERT_ALIASES
+#include "munit.h"
+#include "string_view.h"
+
+#include <string.h>
+
+static MunitResult
+string_view_eq_to_cstr_test(const MunitParameter params[], void *user_data_or_fixture)
+{
+    char *name = "John Doe";
+
+    string_view_t str = { .chars = name, .size = strlen(name) };
+
+    assert_true(string_view_eq_to_cstr(str, "John Doe"));
+    assert_false(string_view_eq_to_cstr(str, "Doe"));
+
+    return MUNIT_OK;
+}
+
+static MunitResult
+string_view_to_u32_test(const MunitParameter params[], void *user_data_or_fixture)
+{
+    char *number = "69";
+
+    string_view_t str = { .chars = number, .size = strlen(number) };
+
+    assert_uint32(string_view_to_u32(str), ==, 69);
+
+    return MUNIT_OK;
+}
+
+static MunitTest tests[] = {
+    { "/eq_to_cstr_test", string_view_eq_to_cstr_test, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
+    { "/to_u32_test", string_view_to_u32_test, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
+    { NULL, NULL, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL }
+};
+
+static const MunitSuite suite = { "/string_view", tests, NULL, 1, MUNIT_SUITE_OPTION_NONE };
+
+int
+main(int argc, char *argv[])
+{
+    return munit_suite_main(&suite, NULL, argc, argv);
+    return EXIT_SUCCESS;
+}
-- 
2.43.2



* [PATCH olang v1 3/4] lexer: add token lookahead capability
  2024-02-28 19:04 [PATCH olang v1 0/4] create initial syntax analysis logic Johnny Richard
  2024-02-28 19:04 ` [PATCH olang v1 1/4] string_view: add string view formatter for printf fmt Johnny Richard
  2024-02-28 19:04 ` [PATCH olang v1 2/4] string_view: add string view conversion to uint32_t Johnny Richard
@ 2024-02-28 19:04 ` Johnny Richard
  2024-02-28 19:04 ` [PATCH olang v1 4/4] parser: create simplified parser for tiny AST Johnny Richard
  2024-03-01 22:33 ` [PATCH olang v1 0/4] create initial syntax analysis logic Johnny Richard
  4 siblings, 0 replies; 11+ messages in thread
From: Johnny Richard @ 2024-02-28 19:04 UTC (permalink / raw)
  To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard

In order to skip line breaks (LF), we have to be able to peek at the next
token without consuming it.

This patch also adds **lexer_peek_next**, which is equivalent to
**lexer_lookahead** with n = 1.
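
The save-and-restore idea behind the lookahead can be illustrated with a toy
cursor. This is only a sketch under assumptions: toy_lexer_t, toy_next,
toy_lookahead and toy_peek_next are hypothetical names, and a "token" here is
a single byte rather than a real token_t.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cursor standing in for lexer_t; every name here is hypothetical. */
typedef struct toy_lexer
{
    const char *source;
    size_t size;
    size_t offset;
} toy_lexer_t;

/* Consumes one "token" (a single byte in this toy); returns 0 at the end. */
char
toy_next(toy_lexer_t *lexer)
{
    assert(lexer != NULL);
    if (lexer->offset >= lexer->size) {
        return 0;
    }
    return lexer->source[lexer->offset++];
}

/* Returns the n-th upcoming token, then rewinds so nothing is consumed. */
char
toy_lookahead(toy_lexer_t *lexer, size_t n)
{
    size_t previous_offset = lexer->offset;
    char token = 0;

    for (size_t i = 0; i < n; ++i) {
        token = toy_next(lexer);
    }

    /* Restore the saved position: the caller sees an untouched lexer. */
    lexer->offset = previous_offset;
    return token;
}

/* Peeking is simply a lookahead of one token. */
char
toy_peek_next(toy_lexer_t *lexer)
{
    return toy_lookahead(lexer, 1);
}
```

The real lexer also has to restore row and bol, since consuming a line feed
changes both.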

Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
---
 src/lexer.c | 22 ++++++++++++++++++++++
 src/lexer.h |  6 ++++++
 2 files changed, 28 insertions(+)

diff --git a/src/lexer.c b/src/lexer.c
index b107762..c7756a6 100644
--- a/src/lexer.c
+++ b/src/lexer.c
@@ -233,3 +233,25 @@ lexer_str_to_token_kind(string_view_t text)
 
     return TOKEN_IDENTIFIER;
 }
+
+void
+lexer_peek_next(lexer_t *lexer, token_t *token)
+{
+    lexer_lookahead(lexer, token, 1);
+}
+
+void
+lexer_lookahead(lexer_t *lexer, token_t *token, size_t n)
+{
+    size_t previous_offset = lexer->offset;
+    size_t previous_row = lexer->row;
+    size_t previous_bol = lexer->bol;
+
+    for (size_t i = 0; i < n; ++i) {
+        lexer_next_token(lexer, token);
+    }
+
+    lexer->offset = previous_offset;
+    lexer->row = previous_row;
+    lexer->bol = previous_bol;
+}
diff --git a/src/lexer.h b/src/lexer.h
index 8c09e02..729c957 100644
--- a/src/lexer.h
+++ b/src/lexer.h
@@ -68,6 +68,12 @@ lexer_init(lexer_t *lexer, string_view_t source);
 void
 lexer_next_token(lexer_t *lexer, token_t *token);
 
+void
+lexer_peek_next(lexer_t *lexer, token_t *token);
+
+void
+lexer_lookahead(lexer_t *lexer, token_t *token, size_t n);
+
 char *
 token_kind_to_cstr(token_kind_t kind);
 
-- 
2.43.2



* [PATCH olang v1 4/4] parser: create simplified parser for tiny AST
  2024-02-28 19:04 [PATCH olang v1 0/4] create initial syntax analysis logic Johnny Richard
                   ` (2 preceding siblings ...)
  2024-02-28 19:04 ` [PATCH olang v1 3/4] lexer: add token lookahead capability Johnny Richard
@ 2024-02-28 19:04 ` Johnny Richard
  2024-02-28 18:11   ` [olang/patches/.build.yml] build success builds.sr.ht
  2024-03-01  3:34   ` [PATCH olang v1 4/4] parser: create simplified parser for tiny AST Carlos Maniero
  2024-03-01 22:33 ` [PATCH olang v1 0/4] create initial syntax analysis logic Johnny Richard
  4 siblings, 2 replies; 11+ messages in thread
From: Johnny Richard @ 2024-02-28 19:04 UTC (permalink / raw)
  To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard

This commit introduces a simple and restricted parser designed to handle
a small program structure. Its purpose is to lay the foundation for
future optimizations.

Error handling during syntax analysis is rudimentary. If an error
occurs, it will be printed, and the program will abort without further
parsing to detect additional syntax errors.

Additionally, it's important to note that semantic analysis will be
conducted at a later stage in the compiler pipeline. As only the u32 type
is currently implemented, a separate type checker will not be developed
for now. Consequently, the AST generated during syntax analysis can be
passed directly to the backend.

Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
---
 src/ast.c                |  79 ++++++++++++++++
 src/ast.h                | 101 ++++++++++++++++++++
 src/lexer.c              |  16 ++++
 src/lexer.h              |   4 +
 src/parser.c             | 193 +++++++++++++++++++++++++++++++++++++++
 src/parser.h             |  38 ++++++++
 tests/unit/parser_test.c |  88 ++++++++++++++++++
 7 files changed, 519 insertions(+)
 create mode 100644 src/ast.c
 create mode 100644 src/ast.h
 create mode 100644 src/parser.c
 create mode 100644 src/parser.h
 create mode 100644 tests/unit/parser_test.c

diff --git a/src/ast.c b/src/ast.c
new file mode 100644
index 0000000..ad3124d
--- /dev/null
+++ b/src/ast.c
@@ -0,0 +1,79 @@
+/*
+ * Copyright (C) 2024 olang maintainers
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <https://www.gnu.org/licenses/>.
+ */
+
+#include <assert.h>
+#include <stdint.h>
+
+#include "arena.h"
+#include "ast.h"
+#include "string_view.h"
+
+ast_node_t *
+ast_make_node_fn_def(arena_t *arena, string_view_t identifier, type_t return_type, ast_node_t *block)
+{
+    ast_node_t *node_fn_def = (ast_node_t *)arena_alloc(arena, sizeof(ast_node_t));
+    assert(node_fn_def);
+
+    node_fn_def->kind = AST_NODE_FN_DEF;
+    ast_fn_definition_t *fn_def = &node_fn_def->data.as_fn_def;
+
+    fn_def->identifier = identifier;
+    fn_def->return_type = return_type;
+    fn_def->block = block;
+
+    return node_fn_def;
+}
+
+ast_node_t *
+ast_make_node_literal_u32(arena_t *arena, uint32_t value)
+{
+    ast_node_t *node_literal = (ast_node_t *)arena_alloc(arena, sizeof(ast_node_t));
+    assert(node_literal);
+
+    node_literal->kind = AST_NODE_LITERAL;
+    node_literal->data.as_literal.kind = AST_LITERAL_U32;
+    node_literal->data.as_literal.value.as_u32 = value;
+
+    return node_literal;
+}
+
+ast_node_t *
+ast_make_node_return_stmt(arena_t *arena)
+{
+    ast_node_t *node_return_stmt = (ast_node_t *)arena_alloc(arena, sizeof(ast_node_t));
+    assert(node_return_stmt);
+
+    node_return_stmt->kind = AST_NODE_RETURN_STMT;
+
+    return node_return_stmt;
+}
+
+ast_node_t *
+ast_make_node_block(arena_t *arena)
+{
+    ast_node_t *node_block = (ast_node_t *)arena_alloc(arena, sizeof(ast_node_t));
+    assert(node_block);
+
+    node_block->kind = AST_NODE_BLOCK;
+
+    node_block->data.as_block.nodes = (list_t *)arena_alloc(arena, sizeof(list_t));
+    assert(node_block->data.as_block.nodes);
+
+    list_init(node_block->data.as_block.nodes, arena);
+
+    return node_block;
+}
diff --git a/src/ast.h b/src/ast.h
new file mode 100644
index 0000000..b80c067
--- /dev/null
+++ b/src/ast.h
@@ -0,0 +1,101 @@
+/*
+ * Copyright (C) 2024 olang maintainers
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <https://www.gnu.org/licenses/>.
+ */
+#ifndef AST_H
+#define AST_H
+
+#include <stdint.h>
+
+#include "arena.h"
+#include "list.h"
+#include "string_view.h"
+
+typedef struct ast_node ast_node_t;
+
+typedef enum
+{
+    AST_NODE_BLOCK,
+    AST_NODE_FN_DEF,
+    AST_NODE_RETURN_STMT,
+    AST_NODE_LITERAL,
+    AST_NODE_UNKNOWN
+} ast_node_kind_t;
+
+typedef enum
+{
+    TYPE_U32
+} type_t;
+
+typedef struct ast_block
+{
+    list_t *nodes;
+} ast_block_t;
+
+typedef struct ast_fn_definition
+{
+    string_view_t identifier;
+    type_t return_type;
+    ast_node_t *block;
+} ast_fn_definition_t;
+
+typedef enum
+{
+    AST_LITERAL_U32
+} ast_literal_kind_t;
+
+typedef union
+{
+    uint32_t as_u32;
+} ast_literal_value_t;
+
+typedef struct ast_literal
+{
+    ast_literal_kind_t kind;
+    ast_literal_value_t value;
+} ast_literal_t;
+
+typedef struct ast_return_stmt
+{
+    ast_node_t *data;
+} ast_return_stmt_t;
+
+typedef union
+{
+    ast_fn_definition_t as_fn_def;
+    ast_literal_t as_literal;
+    ast_block_t as_block;
+    ast_return_stmt_t as_return_stmt;
+} ast_node_data_t;
+
+typedef struct ast_node
+{
+    ast_node_kind_t kind;
+    ast_node_data_t data;
+} ast_node_t;
+
+ast_node_t *
+ast_make_node_fn_def(arena_t *arena, string_view_t identifier, type_t return_type, ast_node_t *block);
+
+ast_node_t *
+ast_make_node_literal_u32(arena_t *arena, uint32_t value);
+
+ast_node_t *
+ast_make_node_return_stmt(arena_t *arena);
+
+ast_node_t *
+ast_make_node_block(arena_t *arena);
+
+#endif /* AST_H */
diff --git a/src/lexer.c b/src/lexer.c
index c7756a6..e9e97d4 100644
--- a/src/lexer.c
+++ b/src/lexer.c
@@ -19,6 +19,7 @@
 #include <assert.h>
 #include <ctype.h>
 #include <stdbool.h>
+#include <stdio.h>
 
 void
 lexer_init(lexer_t *lexer, string_view_t source)
@@ -255,3 +256,18 @@ lexer_lookahead(lexer_t *lexer, token_t *token, size_t n)
     lexer->row = previous_row;
     lexer->bol = previous_bol;
 }
+
+void
+lexer_print_token_highlight(lexer_t *lexer, token_t *token, FILE *stream)
+{
+    size_t offset = token->location.bol;
+    char *str = lexer->source.chars + offset;
+
+    size_t i = 0;
+    while ((i + offset) < lexer->source.size && str[i] != '\n' && str[i] != 0) {
+        ++i;
+    }
+    string_view_t line = { .chars = str, .size = i };
+    fprintf(stream, "" SV_FMT "\n", SV_ARG(line));
+    fprintf(stream, "%*s\n", (int)(token->location.offset - token->location.bol + 1), "^");
+}
diff --git a/src/lexer.h b/src/lexer.h
index 729c957..d836b91 100644
--- a/src/lexer.h
+++ b/src/lexer.h
@@ -19,6 +19,7 @@
 
 #include "string_view.h"
 #include <stdint.h>
+#include <stdio.h>
 
 typedef struct lexer
 {
@@ -77,4 +78,7 @@ lexer_lookahead(lexer_t *lexer, token_t *token, size_t n);
 char *
 token_kind_to_cstr(token_kind_t kind);
 
+void
+lexer_print_token_highlight(lexer_t *lexer, token_t *token, FILE *stream);
+
 #endif /* LEXER_H */
diff --git a/src/parser.c b/src/parser.c
new file mode 100644
index 0000000..f50b61a
--- /dev/null
+++ b/src/parser.c
@@ -0,0 +1,193 @@
+/*
+ * Copyright (C) 2024 olang maintainers
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <https://www.gnu.org/licenses/>.
+ */
+
+#include <assert.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "lexer.h"
+#include "parser.h"
+
+static bool
+skip_expected_token(parser_t *parser, token_kind_t expected_kind);
+
+static bool
+expected_token(parser_t *parser, token_t *token, token_kind_t kind);
+
+static bool
+parser_parse_type(parser_t *parser, type_t *type);
+
+static ast_node_t *
+parser_parse_block(parser_t *parser);
+
+static void
+skip_line_feeds(lexer_t *lexer);
+
+void
+parser_init(parser_t *parser, lexer_t *lexer, arena_t *arena, char *file_path)
+{
+    assert(parser && "parser is required");
+    assert(lexer && "lexer is required");
+    assert(file_path && "file_path is required");
+    parser->lexer = lexer;
+    parser->arena = arena;
+    parser->file_path = file_path;
+}
+
+ast_node_t *
+parser_parse_fn_definition(parser_t *parser)
+{
+    if (!skip_expected_token(parser, TOKEN_FN))
+        return NULL;
+
+    skip_line_feeds(parser->lexer);
+
+    token_t fn_name_token;
+
+    if (!expected_token(parser, &fn_name_token, TOKEN_IDENTIFIER))
+        return NULL;
+
+    skip_line_feeds(parser->lexer);
+
+    if (!skip_expected_token(parser, TOKEN_OPAREN))
+        return NULL;
+
+    skip_line_feeds(parser->lexer);
+
+    if (!skip_expected_token(parser, TOKEN_CPAREN))
+        return NULL;
+
+    skip_line_feeds(parser->lexer);
+
+    if (!skip_expected_token(parser, TOKEN_COLON))
+        return NULL;
+
+    skip_line_feeds(parser->lexer);
+
+    type_t fn_return_type;
+    if (!parser_parse_type(parser, &fn_return_type)) {
+        return NULL;
+    }
+
+    skip_line_feeds(parser->lexer);
+
+    ast_node_t *block = parser_parse_block(parser);
+    if (block == NULL) {
+        return NULL;
+    }
+
+    return ast_make_node_fn_def(parser->arena, fn_name_token.value, fn_return_type, block);
+}
+
+static bool
+parser_parse_type(parser_t *parser, type_t *type)
+{
+    token_t token;
+
+    if (!expected_token(parser, &token, TOKEN_IDENTIFIER)) {
+        return false;
+    }
+
+    if (string_view_eq_to_cstr(token.value, "u32")) {
+        *type = TYPE_U32;
+        return true;
+    }
+
+    return false;
+}
+
+static ast_node_t *
+parser_parse_block(parser_t *parser)
+{
+    token_t number_token;
+    if (!skip_expected_token(parser, TOKEN_OCURLY)) {
+        return NULL;
+    }
+
+    skip_line_feeds(parser->lexer);
+
+    ast_node_t *node_block = ast_make_node_block(parser->arena);
+
+    if (!skip_expected_token(parser, TOKEN_RETURN)) {
+        return NULL;
+    }
+
+    ast_node_t *node_return_stmt = ast_make_node_return_stmt(parser->arena);
+    assert(node_return_stmt);
+
+    if (!expected_token(parser, &number_token, TOKEN_NUMBER)) {
+        return NULL;
+    }
+
+    ast_node_t *literal_node = ast_make_node_literal_u32(parser->arena, string_view_to_u32(number_token.value));
+    assert(literal_node);
+
+    node_return_stmt->data.as_return_stmt.data = literal_node;
+
+    list_append(node_block->data.as_block.nodes, node_return_stmt);
+
+    if (!skip_expected_token(parser, TOKEN_LF)) {
+        return NULL;
+    }
+
+    skip_line_feeds(parser->lexer);
+
+    if (!skip_expected_token(parser, TOKEN_CCURLY)) {
+        return NULL;
+    }
+
+    return node_block;
+}
+
+static bool
+skip_expected_token(parser_t *parser, token_kind_t expected_kind)
+{
+    token_t token;
+    return expected_token(parser, &token, expected_kind);
+}
+
+static bool
+expected_token(parser_t *parser, token_t *token, token_kind_t expected_kind)
+{
+    lexer_next_token(parser->lexer, token);
+
+    if (token->kind != expected_kind) {
+        fprintf(stderr,
+                "%s:%lu:%lu: error: got <%s> token but expect <%s>\n",
+                parser->file_path,
+                token->location.row + 1,
+                (token->location.offset - token->location.bol) + 1,
+                token_kind_to_cstr(token->kind),
+                token_kind_to_cstr(expected_kind));
+        lexer_print_token_highlight(parser->lexer, token, stderr);
+        return false;
+    }
+    return true;
+}
+
+static void
+skip_line_feeds(lexer_t *lexer)
+{
+    token_t token;
+    lexer_peek_next(lexer, &token);
+
+    while (token.kind == TOKEN_LF) {
+        lexer_next_token(lexer, &token);
+        lexer_peek_next(lexer, &token);
+    }
+}
diff --git a/src/parser.h b/src/parser.h
new file mode 100644
index 0000000..3f1a00b
--- /dev/null
+++ b/src/parser.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2024 olang maintainers
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <https://www.gnu.org/licenses/>.
+ */
+#ifndef PARSER_H
+#define PARSER_H
+
+#include "arena.h"
+#include "ast.h"
+#include "lexer.h"
+
+typedef struct parser
+{
+    lexer_t *lexer;
+    arena_t *arena;
+    // TODO: we should define a better place to file_path string
+    char *file_path;
+} parser_t;
+
+void
+parser_init(parser_t *parser, lexer_t *lexer, arena_t *arena, char *file_path);
+
+ast_node_t *
+parser_parse_fn_definition(parser_t *parser);
+
+#endif /* PARSER_H */
diff --git a/tests/unit/parser_test.c b/tests/unit/parser_test.c
new file mode 100644
index 0000000..32ebc8e
--- /dev/null
+++ b/tests/unit/parser_test.c
@@ -0,0 +1,88 @@
+/*
+ * Copyright (C) 2024 olang maintainers
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <https://www.gnu.org/licenses/>.
+ */
+#define MUNIT_ENABLE_ASSERT_ALIASES
+
+#include "arena.h"
+#include "ast.h"
+#include "lexer.h"
+#include "list.h"
+#include "munit.h"
+#include "parser.h"
+#include "string_view.h"
+
+#define ARENA_CAPACITY (1024 * 1024)
+
+static MunitResult
+parse_fn_definition_test(const MunitParameter params[], void *user_data_or_fixture)
+{
+    arena_t arena = arena_new(ARENA_CAPACITY);
+
+    char *file_path = "main.0";
+    char *source_value = "fn main(): u32 {\n\treturn 69\n}";
+
+    lexer_t lexer;
+    string_view_t source = { .chars = source_value, .size = strlen(source_value) };
+    lexer_init(&lexer, source);
+
+    parser_t parser;
+    parser_init(&parser, &lexer, &arena, file_path);
+
+    ast_node_t *node_fn_def = parser_parse_fn_definition(&parser);
+    assert_not_null(node_fn_def);
+    assert_uint(node_fn_def->kind, ==, AST_NODE_FN_DEF);
+
+    ast_fn_definition_t *fn = &node_fn_def->data.as_fn_def;
+    assert_memory_equal(fn->identifier.size, fn->identifier.chars, "main");
+    assert_uint(fn->return_type, ==, TYPE_U32);
+
+    ast_node_t *block = fn->block;
+    assert_not_null(block);
+
+    assert_uint(block->kind, ==, AST_NODE_BLOCK);
+    assert_uint(list_size(block->data.as_block.nodes), ==, 1);
+    list_item_t *block_item = list_get(block->data.as_block.nodes, 0);
+    assert_not_null(block_item);
+    assert_not_null(block_item->value);
+
+    ast_node_t *node = (ast_node_t *)block_item->value;
+    assert_not_null(node);
+    assert_uint(node->kind, ==, AST_NODE_RETURN_STMT);
+
+    ast_node_t *number_node = node->data.as_return_stmt.data;
+    assert_not_null(number_node);
+    assert_uint(number_node->kind, ==, AST_NODE_LITERAL);
+    assert_uint(number_node->data.as_literal.kind, ==, AST_LITERAL_U32);
+    assert_uint(number_node->data.as_literal.value.as_u32, ==, 69);
+
+    arena_free(&arena);
+
+    return MUNIT_OK;
+}
+
+static MunitTest tests[] = {
+    { "/parse_fn_definition", parse_fn_definition_test, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
+    { NULL, NULL, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL }
+};
+
+static const MunitSuite suite = { "/parser", tests, NULL, 1, MUNIT_SUITE_OPTION_NONE };
+
+int
+main(int argc, char *argv[])
+{
+    return munit_suite_main(&suite, NULL, argc, argv);
+    return EXIT_SUCCESS;
+}
-- 
2.43.2



* Re: [PATCH olang v1 2/4] string_view: add string view conversion to uint32_t
  2024-02-28 19:04 ` [PATCH olang v1 2/4] string_view: add string view conversion to uint32_t Johnny Richard
@ 2024-02-29 15:16   ` Carlos Maniero
  2024-02-29 22:40     ` Johnny Richard
  0 siblings, 1 reply; 11+ messages in thread
From: Carlos Maniero @ 2024-02-29 15:16 UTC (permalink / raw)
  To: Johnny Richard, ~johnnyrichard/olang-devel

First of all, thank you for adding tests to *string_view*. I have just
a few small adjustments to ask for:

> +    memset(ret, 0, str.size + 1);
> +    memcpy(ret, str.chars, str.size);
This iterates over *str.size* bytes twice. Instead of using *memset*, you
could set only the *NUL terminator* as the last char of *ret*.
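
A minimal sketch of that suggestion, under assumptions: sv_sketch_t is a
local stand-in for string_view_t, sv_sketch_to_u32 is a hypothetical name,
and strtoul is swapped in for atoi so the result stays unsigned. It is not
the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for the project's string_view_t (assumed layout). */
typedef struct sv_sketch
{
    const char *chars;
    size_t size;
} sv_sketch_t;

/* Hypothetical variant of string_view_to_u32: copies the view once and
 * writes a single terminator instead of zeroing the whole buffer first. */
uint32_t
sv_sketch_to_u32(sv_sketch_t str)
{
    assert(str.chars != NULL);

    char ret[str.size + 1];
    memcpy(ret, str.chars, str.size);
    ret[str.size] = '\0';

    /* strtoul returns unsigned long; values above UINT32_MAX are
     * truncated by the cast, so range checking is left to the caller. */
    return (uint32_t)strtoul(ret, NULL, 10);
}
```

Only *str.size* bytes are copied, so a view over the first digit of "69"
converts to 6 rather than 69.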

> +static MunitResult
> +string_view_eq_to_cstr_test(const MunitParameter params[], void *user_data_or_fixture)
> +{
> +    char *name = "John Doe";
> +
> +    string_view_t str = { .chars = name, .size = strlen(name) };
> +
> +    assert_true(string_view_eq_to_cstr(str, "John Doe"));
> +    assert_false(string_view_eq_to_cstr(str, "Doe"));
> +
> +    return MUNIT_OK;
> +}
It would be great if you also added a test that takes just a portion of
the string, to make sure *.size* is respected.
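
Such a test could look roughly like the sketch below. The names
sv_sketch_t, sv_sketch_eq_to_cstr and sv_sketch_slice_check are
hypothetical, and the comparison function is a stand-in, not the project's
string_view_eq_to_cstr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for the project's string_view_t (assumed layout). */
typedef struct sv_sketch
{
    const char *chars;
    size_t size;
} sv_sketch_t;

/* Stand-in comparison: equal only when lengths and bytes both match. */
bool
sv_sketch_eq_to_cstr(sv_sketch_t str, const char *cstr)
{
    assert(str.chars != NULL && cstr != NULL);
    return strlen(cstr) == str.size && memcmp(str.chars, cstr, str.size) == 0;
}

/* Builds a view over the first 4 bytes of "John Doe" and checks that it
 * matches "John" but neither the whole string nor another substring. */
bool
sv_sketch_slice_check(void)
{
    const char *name = "John Doe";
    sv_sketch_t first_name = { .chars = name, .size = 4 };

    return sv_sketch_eq_to_cstr(first_name, "John")
           && !sv_sketch_eq_to_cstr(first_name, "John Doe")
           && !sv_sketch_eq_to_cstr(first_name, "Doe");
}
```

In the munit suite the same three checks would be written with assert_true
and assert_false.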

> +static MunitResult
> +string_view_to_u32_test(const MunitParameter params[], void *user_data_or_fixture)
> +{
> +    char *number = "69";
> +
> +    string_view_t str = { .chars = number, .size = strlen(number) };
> +
> +    assert_uint32(string_view_to_u32(str), ==, 69);
> +
> +    return MUNIT_OK;
> +}
Same here. Instead of taking the entire number you can get just a digit.


* Re: [PATCH olang v1 2/4] string_view: add string view conversion to uint32_t
  2024-02-29 15:16   ` Carlos Maniero
@ 2024-02-29 22:40     ` Johnny Richard
  0 siblings, 0 replies; 11+ messages in thread
From: Johnny Richard @ 2024-02-29 22:40 UTC (permalink / raw)
  To: Carlos Maniero; +Cc: ~johnnyrichard/olang-devel

On Thu, Feb 29, 2024 at 12:16:07PM -0300, Carlos Maniero wrote:
> > +    memset(ret, 0, str.size + 1);
> > +    memcpy(ret, str.chars, str.size);
> This iterates over *str.size* bytes twice. Instead of using *memset*, you
> could set only the *NUL terminator* as the last char of *ret*.

Sure, I will fix it in another revision.

> > +static MunitResult
> > +string_view_eq_to_cstr_test(const MunitParameter params[], void *user_data_or_fixture)
> > +{
> > +    char *name = "John Doe";
> > +
> > +    string_view_t str = { .chars = name, .size = strlen(name) };
> > +
> > +    assert_true(string_view_eq_to_cstr(str, "John Doe"));
> > +    assert_false(string_view_eq_to_cstr(str, "Doe"));
> > +
> > +    return MUNIT_OK;
> > +}
> It would be great if you also added a test that takes just a portion of
> the string, to make sure *.size* is respected.

Hmm... I think it's better to create a separate test for this.

> > +static MunitResult
> > +string_view_to_u32_test(const MunitParameter params[], void *user_data_or_fixture)
> > +{
> > +    char *number = "69";
> > +
> > +    string_view_t str = { .chars = number, .size = strlen(number) };
> > +
> > +    assert_uint32(string_view_to_u32(str), ==, 69);
> > +
> > +    return MUNIT_OK;
> > +}
> Same here. Instead of taking the entire number you can get just a digit.

I will create another test for this, something like a test that creates a
slice over a cstr.


* Re: [PATCH olang v1 4/4] parser: create simplified parser for tiny AST
  2024-02-28 19:04 ` [PATCH olang v1 4/4] parser: create simplified parser for tiny AST Johnny Richard
  2024-02-28 18:11   ` [olang/patches/.build.yml] build success builds.sr.ht
@ 2024-03-01  3:34   ` Carlos Maniero
  2024-03-01 22:23     ` Johnny Richard
  1 sibling, 1 reply; 11+ messages in thread
From: Carlos Maniero @ 2024-03-01  3:34 UTC (permalink / raw)
  To: Johnny Richard, ~johnnyrichard/olang-devel

> +ast_node_t *
> +ast_make_node_fn_def(arena_t *arena, string_view_t identifier, type_t return_type, ast_node_t *block);
> +
> +ast_node_t *
> +ast_make_node_literal_u32(arena_t *arena, uint32_t value);
> +
> +ast_node_t *
> +ast_make_node_return_stmt(arena_t *arena);
> +
> +ast_node_t *
> +ast_make_node_block(arena_t *arena);

s/ast_make/ast_new just to keep consistency with other functions that
allocate memory.

> +void
> +lexer_print_token_highlight(lexer_t *lexer, token_t *token, FILE *stream)
> +{
> +    size_t offset = token->location.bol;
> +    char *str = lexer->source.chars + offset;
> +
> +    size_t i = 0;
> +    while ((i + offset) < lexer->source.size && str[i] != '\n' && str[i] != 0) {
> +        ++i;
> +    }
> +    string_view_t line = { .chars = str, .size = i };
> +    fprintf(stream, "" SV_FMT "\n", SV_ARG(line));
> +    fprintf(stream, "%*s\n", (int)(token->location.offset - token->location.bol + 1), "^");
> +}

1. It bothers me a little that the lexer is performing IO operations.
   IMO, the parser should be responsible for error handling. But I can live
   with this if you think this is the right place for this function.

2. An alternative is to have the lexer return the line *string_view*
   and let the parser do the rest.

3. nitpick: Isn't *(i + offset) < lexer->source.size* and *str[i] != 0*
   redundant? The last char will (or at least should) always be a NULL
   terminator.

4. I spent some time trying to understand what this function was
   supposed to do. I believe that having a function that just returns
   the token's line could help (2). But an alternative here is using the
   *string_view* struct instead of having *str* and *i*. Something like
   this:

     void
     lexer_print_token_highlight(lexer_t *lexer, token_t *token, FILE *stream)
     {
         size_t offset = token->location.bol;
         string_view_t line = { .chars = lexer->source.chars + offset, .size = 0 };
     
         while (line.chars[i] != '\n' && line.chars[i] != 0) {
             ++line.size;
         }

         fprintf(stream, "" SV_FMT "\n", SV_ARG(line));
         fprintf(stream, "%*s\n", (int)(token->location.offset - token->location.bol + 1), "^");
     }

> +    if (!skip_expected_token(parser, TOKEN_COLON))
> +        return NULL;
> +
> +    skip_line_feeds(parser->lexer);
> +
> +    type_t fn_return_type;
> +    if (!parser_parse_type(parser, &fn_return_type)) {
> +        return NULL;
> +    }

nitpick: I’ve spotted some inconsistency with the use of brackets in
if statements. Just throwing it out there, but I believe we should be
using brackets in all if statements. However, I’m totally fine with
removing them in guard clauses, as long as we maintain a consistent
style.

This is great work, man! I'm excited about how close we are now to
starting the back-end.

* Re: [PATCH olang v1 4/4] parser: create simplified parser for tiny AST
  2024-03-01  3:34   ` [PATCH olang v1 4/4] parser: create simplified parser for tiny AST Carlos Maniero
@ 2024-03-01 22:23     ` Johnny Richard
  0 siblings, 0 replies; 11+ messages in thread
From: Johnny Richard @ 2024-03-01 22:23 UTC (permalink / raw)
  To: Carlos Maniero; +Cc: ~johnnyrichard/olang-devel

On Fri, Mar 01, 2024 at 12:34:59AM -0300, Carlos Maniero wrote:
> > +ast_node_t *
> > +ast_make_node_fn_def(arena_t *arena, string_view_t identifier, type_t return_type, ast_node_t *block);
> > +
> > +ast_node_t *
> > +ast_make_node_literal_u32(arena_t *arena, uint32_t value);
> > +
> > +ast_node_t *
> > +ast_make_node_return_stmt(arena_t *arena);
> > +
> > +ast_node_t *
> > +ast_make_node_block(arena_t *arena);
> 
> s/ast_make/ast_new just to keep consistency with other functions that
> allocate memory.

Sure!

> > +void
> > +lexer_print_token_highlight(lexer_t *lexer, token_t *token, FILE *stream)
> > +{
> > +    size_t offset = token->location.bol;
> > +    char *str = lexer->source.chars + offset;
> > +
> > +    size_t i = 0;
> > +    while ((i + offset) < lexer->source.size && str[i] != '\n' && str[i] != 0) {
> > +        ++i;
> > +    }
> > +    string_view_t line = { .chars = str, .size = i };
> > +    fprintf(stream, "" SV_FMT "\n", SV_ARG(line));
> > +    fprintf(stream, "%*s\n", (int)(token->location.offset - token->location.bol + 1), "^");
> > +}
> 
> 1. It bothers me a little that the lexer is performing IO operations.
>    IMO, the parser should be responsible for error handling. But I can live
>    with this if you think this is the right place for this function.

After analyzing the code and thinking about the future typechecker
implementation, I kind of agree with you. I will change it in the next patch.

> 2. An alternative is to have the lexer return the line *string_view*
>    and let the parser do the rest.

Great, I will do that.

> 3. nitpick: Isn't *(i + offset) < lexer->source.size* and *str[i] != 0*
>    redundant? The last char will (or at least should) always be a NULL
>    terminator.

I will keep the check; I don't see a problem with keeping it.  This will
make the code more resilient in case of bugs.

> 4. I spent some time trying to understand what this function was
>    supposed to do. I believe that having a function that just returns
>    the token's line could help (2). But an alternative here is using the
>    *string_view* struct instead of having *str* and *i*. Something like
>    this:
> 
>      void
>      lexer_print_token_highlight(lexer_t *lexer, token_t *token, FILE *stream)
>      {
>          size_t offset = token->location.bol;
>          string_view_t line = { .chars = lexer->source.chars + offset, .size = 0 };
>      
>          while (line.chars[i] != '\n' && line.chars[i] != 0) {

Oops, you forgot to remove the **i** here. :x

>              ++line.size;
>          }
> 
>          fprintf(stream, "" SV_FMT "\n", SV_ARG(line));
>          fprintf(stream, "%*s\n", (int)(token->location.offset - token->location.bol + 1), "^");
>      }

Yeah, I will make this change as well, thanks.
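
A rough sketch of how points 2 to 4 might fit together: one helper that only computes the line, and a separate reporter that does the IO. The bounds check stays, and `line.size` is used as the index so the stray `i` is gone. The names `source_line_at` and `print_token_highlight` with these signatures are hypothetical, `string_view_t` is a stand-in, and a plain `"%.*s"` is used in place of `SV_FMT` / `SV_ARG`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the project's string_view_t (fields assumed from the patch). */
typedef struct {
    const char *chars;
    size_t size;
} string_view_t;

/* Hypothetical lexer-side helper: return the line starting at `bol`
 * (the beginning-of-line offset) without performing any IO.
 * The size check is kept so a source without a NUL terminator cannot
 * be read past its end. */
string_view_t
source_line_at(string_view_t source, size_t bol)
{
    assert(bol <= source.size);

    string_view_t line = { .chars = source.chars + bol, .size = 0 };

    while ((bol + line.size) < source.size
           && line.chars[line.size] != '\n'
           && line.chars[line.size] != '\0') {
        ++line.size;
    }
    return line;
}

/* Hypothetical parser-side reporter: print the line, then a caret under
 * the zero-based `column` (token offset minus bol). "%*s" right-aligns
 * the "^" in a field of column + 1 characters. */
void
print_token_highlight(FILE *stream, string_view_t line, size_t column)
{
    fprintf(stream, "%.*s\n", (int)line.size, line.chars);
    fprintf(stream, "%*s\n", (int)(column + 1), "^");
}
```

With this split the lexer never touches a `FILE *`, and the typechecker could reuse the same line helper for its own diagnostics later.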

> > +    if (!skip_expected_token(parser, TOKEN_COLON))
> > +        return NULL;
> > +
> > +    skip_line_feeds(parser->lexer);
> > +
> > +    type_t fn_return_type;
> > +    if (!parser_parse_type(parser, &fn_return_type)) {
> > +        return NULL;
> > +    }
>
> nitpick: I’ve spotted some inconsistency with the use of brackets in
> if statements. Just throwing it out there, but I believe we should be
> using brackets in all if statements. However, I’m totally fine with
> removing them in guard clauses, as long as we maintain a consistent
> style.

Let's be consistent and keep **{** and **}** on every flow control
statement. Changed.

> This is great work, man! I'm excited about how close we are now to
> starting the back-end.

Thanks for reviewing it. I will make almost every change you requested
in a new patch revision.

* Re: [PATCH olang v1 0/4] create initial syntax analysis logic
  2024-02-28 19:04 [PATCH olang v1 0/4] create initial syntax analysis logic Johnny Richard
                   ` (3 preceding siblings ...)
  2024-02-28 19:04 ` [PATCH olang v1 4/4] parser: create simplified parser for tiny AST Johnny Richard
@ 2024-03-01 22:33 ` Johnny Richard
  4 siblings, 0 replies; 11+ messages in thread
From: Johnny Richard @ 2024-03-01 22:33 UTC (permalink / raw)
  To: ~johnnyrichard/olang-devel

A new revision (v2) of this patchset has been sent.

Code repositories for project(s) associated with this public inbox

	https://git.johnnyrichard.com/olang.git