* [PATCH olang v3 0/2] Create --dump-tokens on compiler cli @ 2024-02-19 1:38 Johnny Richard 2024-02-19 1:38 ` [PATCH olang v3 1/2] utils: create string_view data structure Johnny Richard ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Johnny Richard @ 2024-02-19 1:38 UTC (permalink / raw) To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard This patchset creates the lexer and leave 3 TODO for man documents and test (unit / integration). Johnny Richard (2): utils: create string_view data structure lexer: create --dump-tokens cli command .gitignore | 1 + examples/main_exit.0 | 3 + src/0c.c | 121 +++++++++++++++++- src/lexer.c | 224 +++++++++++++++++++++++++++++++++ src/lexer.h | 74 +++++++++++ src/string_view.c | 35 ++++++ src/string_view.h | 34 +++++ tests/integration/cli_runner.c | 4 +- tests/integration/cli_runner.h | 2 +- tests/integration/cli_test.c | 2 +- 10 files changed, 494 insertions(+), 6 deletions(-) create mode 100644 examples/main_exit.0 create mode 100644 src/lexer.c create mode 100644 src/lexer.h create mode 100644 src/string_view.c create mode 100644 src/string_view.h -- 2.43.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH olang v3 1/2] utils: create string_view data structure 2024-02-19 1:38 [PATCH olang v3 0/2] Create --dump-tokens on compiler cli Johnny Richard @ 2024-02-19 1:38 ` Johnny Richard 2024-02-19 1:44 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Johnny Richard 2024-02-19 21:07 ` [PATCH olang v3 0/2] Create --dump-tokens on compiler cli Johnny Richard 2 siblings, 0 replies; 9+ messages in thread From: Johnny Richard @ 2024-02-19 1:38 UTC (permalink / raw) To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> --- src/string_view.c | 35 +++++++++++++++++++++++++++++++++++ src/string_view.h | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 src/string_view.c create mode 100644 src/string_view.h diff --git a/src/string_view.c b/src/string_view.c new file mode 100644 index 0000000..122eaa2 --- /dev/null +++ b/src/string_view.c @@ -0,0 +1,35 @@ +/* + * Copyright (C) 2024 olang maintainers + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. + */ +#include "string_view.h" + +#include <stdbool.h> +#include <string.h> + +bool +string_view_eq_to_cstr(string_view_t str, char *cstr) +{ + size_t cstr_len = strlen(cstr); + if (str.size != cstr_len) { + return false; + } + + size_t i = 0; + while (i < cstr_len && str.chars[i] == cstr[i]) { + i++; + } + return i == cstr_len; +} diff --git a/src/string_view.h b/src/string_view.h new file mode 100644 index 0000000..367ef6b --- /dev/null +++ b/src/string_view.h @@ -0,0 +1,34 @@ +/* + * Copyright (C) 2024 olang maintainers + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. + */ +#ifndef STRING_VIEW_T +#define STRING_VIEW_T + +#include <stdbool.h> +#include <stddef.h> + +typedef struct string_view +{ + char *chars; + size_t size; + +} string_view_t; + +// TODO: missing unit test +bool +string_view_eq_to_cstr(string_view_t str, char *cstr); + +#endif /* STRING_VIEW_T */ -- 2.43.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
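The patch above leaves a "TODO: missing unit test" for string_view_eq_to_cstr. Below is a minimal sketch of what such a test could look like, assuming the same munit framework the project's integration tests already use; the file placement, test names, and suite wiring are illustrative guesses and not part of the submitted patch.

#include "munit.h"
#include "string_view.h"

static MunitResult
test_eq_to_cstr(const MunitParameter params[], void *user_data_or_fixture)
{
    /* Deliberately not NUL-terminated: a string_view only needs chars + size. */
    char buf[] = { 'f', 'n', '(', ')' };
    string_view_t sv = { .chars = buf, .size = 2 }; /* views "fn" */

    munit_assert_true(string_view_eq_to_cstr(sv, "fn"));
    munit_assert_false(string_view_eq_to_cstr(sv, "f"));   /* shorter cstr, sizes differ */
    munit_assert_false(string_view_eq_to_cstr(sv, "fn(")); /* longer cstr, sizes differ */
    munit_assert_false(string_view_eq_to_cstr(sv, "if"));  /* same size, different chars */

    return MUNIT_OK;
}

static MunitTest tests[] = {
    { "/eq_to_cstr", test_eq_to_cstr, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
    { NULL, NULL, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL }
};

static const MunitSuite suite = { "/string_view", tests, NULL, 1, MUNIT_SUITE_OPTION_NONE };

int
main(int argc, char *argv[])
{
    return munit_suite_main(&suite, NULL, argc, argv);
}

Exercising the comparison through a buffer that is not NUL-terminated checks the property string_view exists to provide: equality driven by the stored size rather than by a terminator.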
* [PATCH olang v3 2/2] lexer: create --dump-tokens cli command 2024-02-19 1:38 [PATCH olang v3 0/2] Create --dump-tokens on compiler cli Johnny Richard 2024-02-19 1:38 ` [PATCH olang v3 1/2] utils: create string_view data structure Johnny Richard @ 2024-02-19 1:44 ` Johnny Richard 2024-02-19 0:47 ` [olang/patches/.build.yml] build success builds.sr.ht ` (2 more replies) 2024-02-19 21:07 ` [PATCH olang v3 0/2] Create --dump-tokens on compiler cli Johnny Richard 2 siblings, 3 replies; 9+ messages in thread From: Johnny Richard @ 2024-02-19 1:44 UTC (permalink / raw) To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard This patch introduces the dump tokens interface and create the initial setup for lexical analysis. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> --- Changes: - V2 fix linter - V3 fix integration tests .gitignore | 1 + examples/main_exit.0 | 3 + src/0c.c | 121 +++++++++++++++++- src/lexer.c | 224 +++++++++++++++++++++++++++++++++ src/lexer.h | 74 +++++++++++ tests/integration/cli_runner.c | 4 +- tests/integration/cli_runner.h | 2 +- tests/integration/cli_test.c | 2 +- 8 files changed, 425 insertions(+), 6 deletions(-) create mode 100644 examples/main_exit.0 create mode 100644 src/lexer.c create mode 100644 src/lexer.h diff --git a/.gitignore b/.gitignore index fe64668..92496d7 100644 --- a/.gitignore +++ b/.gitignore @@ -2,3 +2,4 @@ build *.o docs/site.tar.gz +tests/integration/*_test diff --git a/examples/main_exit.0 b/examples/main_exit.0 new file mode 100644 index 0000000..c86fc68 --- /dev/null +++ b/examples/main_exit.0 @@ -0,0 +1,3 @@ +fn main(): u32 { + return 0 +} diff --git a/src/0c.c b/src/0c.c index 33ac945..e5199a7 100644 --- a/src/0c.c +++ b/src/0c.c @@ -14,8 +14,125 @@ * You should have received a copy of the GNU General Public License * along with this program. If not, see <https://www.gnu.org/licenses/>. 
*/ +#include <errno.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#include "lexer.h" +#include "string_view.h" + +typedef struct cli_args +{ + int argc; + char **argv; +} cli_args_t; + +char * +cli_args_shift(cli_args_t *args); + +typedef struct cli_opts +{ + // TODO: create man page instruction for --dump-tokens option + bool dump_tokens; + char *file_path; +} cli_opts_t; + +void +print_usage(FILE *stream, char *prog); + +string_view_t +read_entire_file(char *file_path); + int -main(void) +main(int argc, char **argv) +{ + cli_args_t args = { .argc = argc, .argv = argv }; + cli_opts_t opts = { 0 }; + + char *prog = cli_args_shift(&args); + + if (argc != 3) { + print_usage(stderr, prog); + return EXIT_FAILURE; + } + + for (char *arg = cli_args_shift(&args); arg != NULL; arg = cli_args_shift(&args)) { + if (strcmp(arg, "--dump-tokens") == 0) { + opts.dump_tokens = true; + } else { + opts.file_path = arg; + } + } + + if (!opts.dump_tokens) { + print_usage(stderr, prog); + return EXIT_FAILURE; + } + + string_view_t file_content = read_entire_file(opts.file_path); + + // TODO: missing integration test for lexer tokenizing + lexer_t lexer = { 0 }; + lexer_init(&lexer, file_content); + + token_t token = { 0 }; + lexer_next_token(&lexer, &token); + while (token.kind != TOKEN_EOF) { + printf("%s:%lu:%lu: <%s>\n", + opts.file_path, + token.location.row + 1, + (token.location.offset - token.location.bol) + 1, + token_kind_to_cstr(token.kind)); + lexer_next_token(&lexer, &token); + } + + free(file_content.chars); + + return EXIT_SUCCESS; +} + +char * +cli_args_shift(cli_args_t *args) +{ + if (args->argc == 0) + return NULL; + --(args->argc); + return *(args->argv)++; +} + +void +print_usage(FILE *stream, char *prog) +{ + fprintf(stream, "usage: %s <source.0> --dump-tokens\n", prog); +} + +string_view_t +read_entire_file(char *file_path) { - return 0; + FILE *stream = fopen(file_path, "rb"); + + if (stream == NULL) { + fprintf(stderr, "Could not open file %s: %s\n", file_path, strerror(errno)); + exit(EXIT_FAILURE); + } + + string_view_t file_content = { 0 }; + + fseek(stream, 0, SEEK_END); + file_content.size = ftell(stream); + fseek(stream, 0, SEEK_SET); + + file_content.chars = (char *)malloc(file_content.size); + + if (file_content.chars == NULL) { + fprintf(stderr, "Could not read file %s: %s\n", file_path, strerror(errno)); + exit(EXIT_FAILURE); + } + + fread(file_content.chars, 1, file_content.size, stream); + fclose(stream); + + return file_content; } diff --git a/src/lexer.c b/src/lexer.c new file mode 100644 index 0000000..544a54d --- /dev/null +++ b/src/lexer.c @@ -0,0 +1,224 @@ +/* + * Copyright (C) 2024 olang maintainers + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. 
+ */ +#include "lexer.h" + +#include <assert.h> +#include <ctype.h> +#include <stdbool.h> + +void +lexer_init(lexer_t *lexer, string_view_t source) +{ + assert(lexer); + lexer->source = source; + lexer->offset = 0; + lexer->row = 0; + lexer->bol = 0; +} + +static char +lexer_next_char(lexer_t *lexer); + +static void +lexer_skip_char(lexer_t *lexer); + +static bool +lexer_is_eof(lexer_t *lexer); + +static bool +lexer_is_not_eof(lexer_t *lexer); + +static bool +_isspace(char c); + +static void +lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind); + +static void +lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset); + +static token_kind_t +lexer_str_to_token_kind(string_view_t text); + +void +lexer_next_token(lexer_t *lexer, token_t *token) +{ + if (lexer_is_eof(lexer)) { + *token = (token_t){ .kind = TOKEN_EOF }; + return; + } + + char current_char = lexer_next_char(lexer); + + if (_isspace(current_char)) { + while (_isspace(current_char) && lexer_is_not_eof(lexer)) { + lexer_skip_char(lexer); + current_char = lexer_next_char(lexer); + } + } + + while (lexer_is_not_eof(lexer)) { + if (isalpha(current_char)) { + size_t start_offset = lexer->offset; + while (isalnum(current_char) && lexer_is_not_eof(lexer)) { + lexer_skip_char(lexer); + current_char = lexer_next_char(lexer); + } + + string_view_t text = { .chars = lexer->source.chars + start_offset, .size = lexer->offset - start_offset }; + + lexer_init_str_token(lexer, token, lexer_str_to_token_kind(text), start_offset); + return; + } + + if (isdigit(current_char)) { + size_t start_offset = lexer->offset; + while (isdigit(current_char) && lexer_is_not_eof(lexer)) { + lexer_skip_char(lexer); + current_char = lexer_next_char(lexer); + } + + lexer_init_str_token(lexer, token, TOKEN_NUMBER, start_offset); + return; + } + + switch (current_char) { + case '(': { + lexer_init_char_token(lexer, token, TOKEN_OPAREN); + lexer_skip_char(lexer); + return; + } + case ')': { + lexer_init_char_token(lexer, token, TOKEN_CPAREN); + lexer_skip_char(lexer); + return; + } + case ':': { + lexer_init_char_token(lexer, token, TOKEN_COLON); + lexer_skip_char(lexer); + return; + } + case '{': { + lexer_init_char_token(lexer, token, TOKEN_OCURLY); + lexer_skip_char(lexer); + return; + } + case '}': { + lexer_init_char_token(lexer, token, TOKEN_CCURLY); + lexer_skip_char(lexer); + return; + } + case '\n': { + lexer_init_char_token(lexer, token, TOKEN_LF); + lexer_skip_char(lexer); + return; + } + default: { + lexer_init_char_token(lexer, token, TOKEN_UNKNOWN); + lexer_skip_char(lexer); + return; + } + } + } + + if (lexer_is_eof(lexer)) { + *token = (token_t){ .kind = TOKEN_EOF }; + return; + } +} + +static char *token_kind_str_table[] = { + [TOKEN_UNKNOWN] = "unknown", [TOKEN_IDENTIFIER] = "identifier", + [TOKEN_NUMBER] = "number", [TOKEN_FN] = "fn", + [TOKEN_RETURN] = "return", [TOKEN_LF] = "line_feed", + [TOKEN_OPAREN] = "(", [TOKEN_CPAREN] = ")", + [TOKEN_COLON] = ":", [TOKEN_OCURLY] = "{", + [TOKEN_CCURLY] = "}", [TOKEN_EOF] = "EOF", +}; + +char * +token_kind_to_cstr(token_kind_t kind) +{ + assert(kind < sizeof(token_kind_str_table)); + return token_kind_str_table[kind]; +} + +static char +lexer_next_char(lexer_t *lexer) +{ + return lexer->source.chars[lexer->offset]; +} + +static void +lexer_skip_char(lexer_t *lexer) +{ + assert(lexer->offset < lexer->source.size); + if (lexer->source.chars[lexer->offset] == '\n') { + lexer->row++; + lexer->bol = ++lexer->offset; + } else { + lexer->offset++; + } +} + 
+static bool +lexer_is_eof(lexer_t *lexer) +{ + return lexer->offset >= lexer->source.size; +} + +static bool +lexer_is_not_eof(lexer_t *lexer) +{ + return !lexer_is_eof(lexer); +} + +static bool +_isspace(char c) +{ + return c == ' ' || c == '\f' || c == '\r' || c == '\t' || c == '\v'; +} + +static void +lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind) +{ + string_view_t str = { .chars = lexer->source.chars + lexer->offset, .size = 1 }; + token_loc_t location = { .offset = lexer->offset, .row = lexer->row, .bol = lexer->bol }; + *token = (token_t){ .kind = kind, .value = str, .location = location }; +} + +static void +lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset) +{ + string_view_t str = { .chars = lexer->source.chars + start_offset, .size = lexer->offset - start_offset }; + token_loc_t location = { .offset = start_offset, .row = lexer->row, .bol = lexer->bol }; + *token = (token_t){ .kind = kind, .value = str, .location = location }; +} + +static token_kind_t +lexer_str_to_token_kind(string_view_t text) +{ + if (string_view_eq_to_cstr(text, "return")) { + return TOKEN_RETURN; + } + + if (string_view_eq_to_cstr(text, "fn")) { + return TOKEN_FN; + } + + return TOKEN_IDENTIFIER; +} diff --git a/src/lexer.h b/src/lexer.h new file mode 100644 index 0000000..8c09e02 --- /dev/null +++ b/src/lexer.h @@ -0,0 +1,74 @@ +/* + * Copyright (C) 2024 olang maintainers + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. 
+ */ +#ifndef LEXER_H +#define LEXER_H + +#include "string_view.h" +#include <stdint.h> + +typedef struct lexer +{ + string_view_t source; + size_t offset; + size_t row; + size_t bol; +} lexer_t; + +typedef enum token_kind +{ + TOKEN_UNKNOWN, + TOKEN_IDENTIFIER, + TOKEN_NUMBER, + + // Keywords + TOKEN_FN, + TOKEN_RETURN, + + // Single char + TOKEN_LF, + TOKEN_OPAREN, + TOKEN_CPAREN, + TOKEN_COLON, + TOKEN_OCURLY, + TOKEN_CCURLY, + TOKEN_EOF +} token_kind_t; + +typedef struct token_loc +{ + size_t offset; + size_t row; + size_t bol; +} token_loc_t; + +typedef struct token +{ + token_kind_t kind; + string_view_t value; + token_loc_t location; +} token_t; + +void +lexer_init(lexer_t *lexer, string_view_t source); + +void +lexer_next_token(lexer_t *lexer, token_t *token); + +char * +token_kind_to_cstr(token_kind_t kind); + +#endif /* LEXER_H */ diff --git a/tests/integration/cli_runner.c b/tests/integration/cli_runner.c index 4e0f7c4..0531bcc 100644 --- a/tests/integration/cli_runner.c +++ b/tests/integration/cli_runner.c @@ -62,7 +62,7 @@ create_tmp_file_name(char *file_name) } cli_result_t -cli_runner_compile_file(char *src) +cli_runner_compiler_dump_tokens(char *src) { assert_compiler_exists(); @@ -70,7 +70,7 @@ cli_runner_compile_file(char *src) create_tmp_file_name(result.program_path); char command[1024]; - sprintf(command, "%s -o %s %s", OLANG_COMPILER_PATH, result.program_path, src); + sprintf(command, "%s %s --dump-tokens", OLANG_COMPILER_PATH, src); result.exit_code = system(command); return result; diff --git a/tests/integration/cli_runner.h b/tests/integration/cli_runner.h index 5caa319..8f4d69a 100644 --- a/tests/integration/cli_runner.h +++ b/tests/integration/cli_runner.h @@ -23,5 +23,5 @@ typedef struct cli_result_t } cli_result_t; cli_result_t -cli_runner_compile_file(char *src); +cli_runner_compiler_dump_tokens(char *src); #endif diff --git a/tests/integration/cli_test.c b/tests/integration/cli_test.c index c7a9557..ce2ed91 100644 --- a/tests/integration/cli_test.c +++ b/tests/integration/cli_test.c @@ -21,7 +21,7 @@ static MunitResult test_cli_hello_file(const MunitParameter params[], void *user_data_or_fixture) { - cli_result_t compilation_result = cli_runner_compile_file("../../examples/hello.olang"); + cli_result_t compilation_result = cli_runner_compiler_dump_tokens("../../examples/main_exit.0"); munit_assert_int(compilation_result.exit_code, ==, 0); return MUNIT_OK; } -- 2.43.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
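For reference, given the examples/main_exit.0 file added above and the "%s:%lu:%lu: <%s>" format printed by 0c.c, a --dump-tokens run would be expected to emit one line per token. This is a sketch of the expected output, not captured from a real run: the ./0c path assumes the compiler binary is built at the repository root (as the integration tests assume), and the rows and columns follow from the location bookkeeping in lexer.c.

$ ./0c examples/main_exit.0 --dump-tokens
examples/main_exit.0:1:1: <fn>
examples/main_exit.0:1:4: <identifier>
examples/main_exit.0:1:8: <(>
examples/main_exit.0:1:9: <)>
examples/main_exit.0:1:10: <:>
examples/main_exit.0:1:12: <identifier>
examples/main_exit.0:1:16: <{>
examples/main_exit.0:1:17: <line_feed>
examples/main_exit.0:2:3: <return>
examples/main_exit.0:2:10: <number>
examples/main_exit.0:2:11: <line_feed>
examples/main_exit.0:3:1: <}>
examples/main_exit.0:3:2: <line_feed>

Note that this revision does not print the EOF token, since the dump loop in main() stops as soon as token.kind becomes TOKEN_EOF; that point comes up in the review below.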
* [olang/patches/.build.yml] build success 2024-02-19 1:44 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Johnny Richard @ 2024-02-19 0:47 ` builds.sr.ht 2024-02-19 3:30 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Carlos Maniero 2024-02-19 10:01 ` Carlos Maniero 2 siblings, 0 replies; 9+ messages in thread From: builds.sr.ht @ 2024-02-19 0:47 UTC (permalink / raw) To: Johnny Richard; +Cc: ~johnnyrichard/olang-devel olang/patches/.build.yml: SUCCESS in 34s [Create --dump-tokens on compiler cli][0] v3 from [Johnny Richard][1] [0]: https://lists.sr.ht/~johnnyrichard/olang-devel/patches/49645 [1]: mailto:johnny@johnnyrichard.com ✓ #1153060 SUCCESS olang/patches/.build.yml https://builds.sr.ht/~johnnyrichard/job/1153060 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH olang v3 2/2] lexer: create --dump-tokens cli command 2024-02-19 1:44 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Johnny Richard 2024-02-19 0:47 ` [olang/patches/.build.yml] build success builds.sr.ht @ 2024-02-19 3:30 ` Carlos Maniero 2024-02-19 19:51 ` Johnny Richard 2024-02-19 10:01 ` Carlos Maniero 2 siblings, 1 reply; 9+ messages in thread From: Carlos Maniero @ 2024-02-19 3:30 UTC (permalink / raw) To: Johnny Richard, ~johnnyrichard/olang-devel Nice work, man! I just have a few comments: > + while (token.kind != TOKEN_EOF) { > + printf("%s:%lu:%lu: <%s>\n", > + opts.file_path, > + token.location.row + 1, > + (token.location.offset - token.location.bol) + 1, > + token_kind_to_cstr(token.kind)); > + lexer_next_token(&lexer, &token); > + } IMO, the EOF token should be printed too, as it is a token returned by the lexer. > + if (lexer_is_eof(lexer)) { > + *token = (token_t){ .kind = TOKEN_EOF }; > + return; > + } The token location is missing here. I know it seems silly to track the EOF position, but it is useful for parser error messages such as "expected } found EOF". Remember that this code appears twice, before and after the while loop. > +lexer_next_char(lexer_t *lexer) s/lexer_next_char/lexer_current_char: the current name of the function gives me the impression that it changes the offset. > + if (lexer->source.chars[lexer->offset] == '\n') { Call lexer_next_char/lexer_current_char here instead. > +static bool > +_isspace(char c) > +{ > + return c == ' ' || c == '\f' || c == '\r' || c == '\t' || c == '\v'; > +} What do you think about just adding the *\n* guard before calling *isspace*? That way it is clear to someone reading the code why you have to reimplement the function: return c != '\n' && isspace(c); > +static void > +lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind); > + > +static void > +lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset); > + > +static token_kind_t > +lexer_str_to_token_kind(string_view_t text); I don't have a concrete suggestion here, but IMO *lexer_init_char_token* and *lexer_init_str_token* make it sound like we are initializing a "char" token and a "string" token. I don't have a better name, though; I thought of calling them *lexer_init_single_char_token* and *lexer_init_multi_char_token*, but I am not sure that is really better. ^ permalink raw reply [flat|nested] 9+ messages in thread
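A possible shape for the _isspace suggestion above, kept next to the review for context. The unsigned char cast is an extra defensive detail of this sketch (the ctype functions take an int whose value must be representable as unsigned char); it is not something requested in the thread.

#include <ctype.h>
#include <stdbool.h>

/* Skip every whitespace character except '\n', which the lexer emits
 * as a TOKEN_LF token.  The cast keeps isspace() well-defined when
 * char is signed and c holds a negative value. */
static bool
_isspace(char c)
{
    return c != '\n' && isspace((unsigned char)c);
}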
* Re: [PATCH olang v3 2/2] lexer: create --dump-tokens cli command 2024-02-19 3:30 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Carlos Maniero @ 2024-02-19 19:51 ` Johnny Richard 2024-02-19 19:17 ` Carlos Maniero 0 siblings, 1 reply; 9+ messages in thread From: Johnny Richard @ 2024-02-19 19:51 UTC (permalink / raw) To: Carlos Maniero; +Cc: ~johnnyrichard/olang-devel Thanks for review it. Here I have the changes you requested. Let me know if this changes are enough. Johnny Richard -------->8-------- Subject: fixup: review comments --- 0c.c | 20 +++++++++++++++----- lexer.c | 57 ++++++++++++++++++++++++++++++++++----------------------- 2 files changed, 49 insertions(+), 28 deletions(-) diff --git a/src/0c.c b/src/0c.c index e5199a7..0af9caa 100644 --- a/src/0c.c +++ b/src/0c.c @@ -42,6 +42,9 @@ typedef struct cli_opts void print_usage(FILE *stream, char *prog); +static void +print_token(char *file_path, token_t *token); + string_view_t read_entire_file(char *file_path); @@ -80,13 +83,10 @@ main(int argc, char **argv) token_t token = { 0 }; lexer_next_token(&lexer, &token); while (token.kind != TOKEN_EOF) { - printf("%s:%lu:%lu: <%s>\n", - opts.file_path, - token.location.row + 1, - (token.location.offset - token.location.bol) + 1, - token_kind_to_cstr(token.kind)); + print_token(opts.file_path, &token); lexer_next_token(&lexer, &token); } + print_token(opts.file_path, &token); free(file_content.chars); @@ -136,3 +136,13 @@ read_entire_file(char *file_path) return file_content; } + +static void +print_token(char *file_path, token_t *token) +{ + printf("%s:%lu:%lu: <%s>\n", + file_path, + token->location.row + 1, + (token->location.offset - token->location.bol) + 1, + token_kind_to_cstr(token->kind)); +} diff --git a/src/lexer.c b/src/lexer.c index 544a54d..b107762 100644 --- a/src/lexer.c +++ b/src/lexer.c @@ -31,7 +31,7 @@ lexer_init(lexer_t *lexer, string_view_t source) } static char -lexer_next_char(lexer_t *lexer); +lexer_current_char(lexer_t *lexer); static void lexer_skip_char(lexer_t *lexer); @@ -46,10 +46,13 @@ static bool _isspace(char c); static void -lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind); +lexer_init_char_value_token(lexer_t *lexer, token_t *token, token_kind_t kind); static void -lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset); +lexer_init_str_value_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset); + +static void +lexer_init_eof_token(lexer_t *lexer, token_t *token); static token_kind_t lexer_str_to_token_kind(string_view_t text); @@ -58,16 +61,16 @@ void lexer_next_token(lexer_t *lexer, token_t *token) { if (lexer_is_eof(lexer)) { - *token = (token_t){ .kind = TOKEN_EOF }; + lexer_init_eof_token(lexer, token); return; } - char current_char = lexer_next_char(lexer); + char current_char = lexer_current_char(lexer); if (_isspace(current_char)) { while (_isspace(current_char) && lexer_is_not_eof(lexer)) { lexer_skip_char(lexer); - current_char = lexer_next_char(lexer); + current_char = lexer_current_char(lexer); } } @@ -76,12 +79,12 @@ lexer_next_token(lexer_t *lexer, token_t *token) size_t start_offset = lexer->offset; while (isalnum(current_char) && lexer_is_not_eof(lexer)) { lexer_skip_char(lexer); - current_char = lexer_next_char(lexer); + current_char = lexer_current_char(lexer); } string_view_t text = { .chars = lexer->source.chars + start_offset, .size = lexer->offset - start_offset }; - lexer_init_str_token(lexer, token, 
lexer_str_to_token_kind(text), start_offset); + lexer_init_str_value_token(lexer, token, lexer_str_to_token_kind(text), start_offset); return; } @@ -89,46 +92,46 @@ lexer_next_token(lexer_t *lexer, token_t *token) size_t start_offset = lexer->offset; while (isdigit(current_char) && lexer_is_not_eof(lexer)) { lexer_skip_char(lexer); - current_char = lexer_next_char(lexer); + current_char = lexer_current_char(lexer); } - lexer_init_str_token(lexer, token, TOKEN_NUMBER, start_offset); + lexer_init_str_value_token(lexer, token, TOKEN_NUMBER, start_offset); return; } switch (current_char) { case '(': { - lexer_init_char_token(lexer, token, TOKEN_OPAREN); + lexer_init_char_value_token(lexer, token, TOKEN_OPAREN); lexer_skip_char(lexer); return; } case ')': { - lexer_init_char_token(lexer, token, TOKEN_CPAREN); + lexer_init_char_value_token(lexer, token, TOKEN_CPAREN); lexer_skip_char(lexer); return; } case ':': { - lexer_init_char_token(lexer, token, TOKEN_COLON); + lexer_init_char_value_token(lexer, token, TOKEN_COLON); lexer_skip_char(lexer); return; } case '{': { - lexer_init_char_token(lexer, token, TOKEN_OCURLY); + lexer_init_char_value_token(lexer, token, TOKEN_OCURLY); lexer_skip_char(lexer); return; } case '}': { - lexer_init_char_token(lexer, token, TOKEN_CCURLY); + lexer_init_char_value_token(lexer, token, TOKEN_CCURLY); lexer_skip_char(lexer); return; } case '\n': { - lexer_init_char_token(lexer, token, TOKEN_LF); + lexer_init_char_value_token(lexer, token, TOKEN_LF); lexer_skip_char(lexer); return; } default: { - lexer_init_char_token(lexer, token, TOKEN_UNKNOWN); + lexer_init_char_value_token(lexer, token, TOKEN_UNKNOWN); lexer_skip_char(lexer); return; } @@ -136,7 +139,7 @@ lexer_next_token(lexer_t *lexer, token_t *token) } if (lexer_is_eof(lexer)) { - *token = (token_t){ .kind = TOKEN_EOF }; + lexer_init_eof_token(lexer, token); return; } } @@ -158,7 +161,7 @@ token_kind_to_cstr(token_kind_t kind) } static char -lexer_next_char(lexer_t *lexer) +lexer_current_char(lexer_t *lexer) { return lexer->source.chars[lexer->offset]; } @@ -167,7 +170,7 @@ static void lexer_skip_char(lexer_t *lexer) { assert(lexer->offset < lexer->source.size); - if (lexer->source.chars[lexer->offset] == '\n') { + if (lexer_current_char(lexer) == '\n') { lexer->row++; lexer->bol = ++lexer->offset; } else { @@ -190,11 +193,11 @@ lexer_is_not_eof(lexer_t *lexer) static bool _isspace(char c) { - return c == ' ' || c == '\f' || c == '\r' || c == '\t' || c == '\v'; + return c != '\n' && isspace(c); } static void -lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind) +lexer_init_char_value_token(lexer_t *lexer, token_t *token, token_kind_t kind) { string_view_t str = { .chars = lexer->source.chars + lexer->offset, .size = 1 }; token_loc_t location = { .offset = lexer->offset, .row = lexer->row, .bol = lexer->bol }; @@ -202,13 +205,21 @@ lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind) } static void -lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset) +lexer_init_str_value_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset) { string_view_t str = { .chars = lexer->source.chars + start_offset, .size = lexer->offset - start_offset }; token_loc_t location = { .offset = start_offset, .row = lexer->row, .bol = lexer->bol }; *token = (token_t){ .kind = kind, .value = str, .location = location }; } +static void +lexer_init_eof_token(lexer_t *lexer, token_t *token) +{ + string_view_t str = { 0 }; + token_loc_t 
location = { .offset = lexer->offset, .row = lexer->row, .bol = lexer->bol }; + *token = (token_t){ .kind = TOKEN_EOF, .value = str, .location = location }; +} + static token_kind_t lexer_str_to_token_kind(string_view_t text) { ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH olang v3 2/2] lexer: create --dump-tokens cli command 2024-02-19 19:51 ` Johnny Richard @ 2024-02-19 19:17 ` Carlos Maniero 0 siblings, 0 replies; 9+ messages in thread From: Carlos Maniero @ 2024-02-19 19:17 UTC (permalink / raw) To: Johnny Richard; +Cc: ~johnnyrichard/olang-devel Great work! Can you share a v4? That way the pipeline will be triggered. Thank you! ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH olang v3 2/2] lexer: create --dump-tokens cli command 2024-02-19 1:44 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Johnny Richard 2024-02-19 0:47 ` [olang/patches/.build.yml] build success builds.sr.ht 2024-02-19 3:30 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Carlos Maniero @ 2024-02-19 10:01 ` Carlos Maniero 2 siblings, 0 replies; 9+ messages in thread From: Carlos Maniero @ 2024-02-19 10:01 UTC (permalink / raw) To: Johnny Richard, ~johnnyrichard/olang-devel I'm sending here the integration tests of the --dump-tokens, in case you want to send a new patch with one less TODO. But feel free to ignore this message, I can send a new patch after this one is merged. Just to let you know the cli_runner was extended to intercept the program stdout. -- >8 -- tests/integration/cli_runner.c | 47 ++++++++++++++++++++++++++++++---- tests/integration/cli_runner.h | 1 + tests/integration/cli_test.c | 14 ++++++++++ 3 files changed, 57 insertions(+), 5 deletions(-) diff --git a/tests/integration/cli_runner.c b/tests/integration/cli_runner.c index 0531bcc..7e4fe9a 100644 --- a/tests/integration/cli_runner.c +++ b/tests/integration/cli_runner.c @@ -20,6 +20,7 @@ #include <stdio.h> #include <stdlib.h> #include <string.h> +#include <sys/wait.h> #include <unistd.h> #define OLANG_COMPILER_PATH "../../0c" @@ -62,16 +63,52 @@ create_tmp_file_name(char *file_name) } cli_result_t -cli_runner_compiler_dump_tokens(char *src) +cli_runner_compiler(char *src, char *args[]) { assert_compiler_exists(); - cli_result_t result; + cli_result_t result = { 0 }; create_tmp_file_name(result.program_path); - char command[1024]; - sprintf(command, "%s %s --dump-tokens", OLANG_COMPILER_PATH, src); + int fd_link[2]; + + if (pipe(fd_link) == -1) { + perror("pipe error."); + exit(1); + } + + pid_t pid = fork(); + + if (pid == -1) { + perror("fork error."); + exit(1); + } + + if (pid == 0) { + dup2(fd_link[1], STDOUT_FILENO); + close(fd_link[0]); + close(fd_link[1]); + + execv(OLANG_COMPILER_PATH, args); + perror("execl error."); + exit(127); + } else { + close(fd_link[1]); + if (read(fd_link[0], result.compiler_output, sizeof(result.compiler_output)) == -1) { + perror("read error."); + exit(1); + } + int status; + waitpid(pid, &status, 0); + result.exit_code = WEXITSTATUS(status); + } - result.exit_code = system(command); return result; } + +cli_result_t +cli_runner_compiler_dump_tokens(char *src) +{ + char *program_args[] = { "0c", "--dump-tokens", src, NULL }; + return cli_runner_compiler(src, program_args); +} diff --git a/tests/integration/cli_runner.h b/tests/integration/cli_runner.h index 8f4d69a..7ce4e7b 100644 --- a/tests/integration/cli_runner.h +++ b/tests/integration/cli_runner.h @@ -20,6 +20,7 @@ typedef struct cli_result_t { int exit_code; char program_path[255]; + char compiler_output[1024]; } cli_result_t; cli_result_t diff --git a/tests/integration/cli_test.c b/tests/integration/cli_test.c index ce2ed91..1fd70c7 100644 --- a/tests/integration/cli_test.c +++ b/tests/integration/cli_test.c @@ -23,6 +23,20 @@ test_cli_hello_file(const MunitParameter params[], void *user_data_or_fixture) { cli_result_t compilation_result = cli_runner_compiler_dump_tokens("../../examples/main_exit.0"); munit_assert_int(compilation_result.exit_code, ==, 0); + munit_assert_string_equal(compilation_result.compiler_output, + "../../examples/main_exit.0:1:1: <fn>\n" + "../../examples/main_exit.0:1:4: <identifier>\n" + "../../examples/main_exit.0:1:8: <(>\n" + "../../examples/main_exit.0:1:9: 
<)>\n" + "../../examples/main_exit.0:1:10: <:>\n" + "../../examples/main_exit.0:1:12: <identifier>\n" + "../../examples/main_exit.0:1:16: <{>\n" + "../../examples/main_exit.0:1:17: <line_feed>\n" + "../../examples/main_exit.0:2:3: <return>\n" + "../../examples/main_exit.0:2:10: <number>\n" + "../../examples/main_exit.0:2:11: <line_feed>\n" + "../../examples/main_exit.0:3:1: <}>\n" + "../../examples/main_exit.0:3:2: <line_feed>\n"); return MUNIT_OK; } -- 2.34.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH olang v3 0/2] Create --dump-tokens on compiler cli 2024-02-19 1:38 [PATCH olang v3 0/2] Create --dump-tokens on compiler cli Johnny Richard 2024-02-19 1:38 ` [PATCH olang v3 1/2] utils: create string_view data structure Johnny Richard 2024-02-19 1:44 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Johnny Richard @ 2024-02-19 21:07 ` Johnny Richard 2 siblings, 0 replies; 9+ messages in thread From: Johnny Richard @ 2024-02-19 21:07 UTC (permalink / raw) To: ~johnnyrichard/olang-devel Patchset SUPERSEDED by v4 Link: https://lists.sr.ht/~johnnyrichard/olang-devel/%3C20240219210541.25624-1-johnny%40johnnyrichard.com%3E ^ permalink raw reply [flat|nested] 9+ messages in thread
Code repositories for project(s) associated with this public inbox: https://git.johnnyrichard.com/olang.git