* [PATCH olang v3 0/2] Create --dump-tokens on compiler cli @ 2024-02-19 1:38 Johnny Richard 2024-02-19 1:38 ` [PATCH olang v3 1/2] utils: create string_view data structure Johnny Richard ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Johnny Richard @ 2024-02-19 1:38 UTC (permalink / raw) To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard This patchset creates the lexer and leave 3 TODO for man documents and test (unit / integration). Johnny Richard (2): utils: create string_view data structure lexer: create --dump-tokens cli command .gitignore | 1 + examples/main_exit.0 | 3 + src/0c.c | 121 +++++++++++++++++- src/lexer.c | 224 +++++++++++++++++++++++++++++++++ src/lexer.h | 74 +++++++++++ src/string_view.c | 35 ++++++ src/string_view.h | 34 +++++ tests/integration/cli_runner.c | 4 +- tests/integration/cli_runner.h | 2 +- tests/integration/cli_test.c | 2 +- 10 files changed, 494 insertions(+), 6 deletions(-) create mode 100644 examples/main_exit.0 create mode 100644 src/lexer.c create mode 100644 src/lexer.h create mode 100644 src/string_view.c create mode 100644 src/string_view.h -- 2.43.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH olang v3 1/2] utils: create string_view data structure 2024-02-19 1:38 [PATCH olang v3 0/2] Create --dump-tokens on compiler cli Johnny Richard @ 2024-02-19 1:38 ` Johnny Richard 2024-02-19 1:44 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Johnny Richard 2024-02-19 21:07 ` [PATCH olang v3 0/2] Create --dump-tokens on compiler cli Johnny Richard 2 siblings, 0 replies; 9+ messages in thread From: Johnny Richard @ 2024-02-19 1:38 UTC (permalink / raw) To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> --- src/string_view.c | 35 +++++++++++++++++++++++++++++++++++ src/string_view.h | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 src/string_view.c create mode 100644 src/string_view.h diff --git a/src/string_view.c b/src/string_view.c new file mode 100644 index 0000000..122eaa2 --- /dev/null +++ b/src/string_view.c @@ -0,0 +1,35 @@ +/* + * Copyright (C) 2024 olang maintainers + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. + */ +#include "string_view.h" + +#include <stdbool.h> +#include <string.h> + +bool +string_view_eq_to_cstr(string_view_t str, char *cstr) +{ + size_t cstr_len = strlen(cstr); + if (str.size != cstr_len) { + return false; + } + + size_t i = 0; + while (i < cstr_len && str.chars[i] == cstr[i]) { + i++; + } + return i == cstr_len; +} diff --git a/src/string_view.h b/src/string_view.h new file mode 100644 index 0000000..367ef6b --- /dev/null +++ b/src/string_view.h @@ -0,0 +1,34 @@ +/* + * Copyright (C) 2024 olang maintainers + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. + */ +#ifndef STRING_VIEW_T +#define STRING_VIEW_T + +#include <stdbool.h> +#include <stddef.h> + +typedef struct string_view +{ + char *chars; + size_t size; + +} string_view_t; + +// TODO: missing unit test +bool +string_view_eq_to_cstr(string_view_t str, char *cstr); + +#endif /* STRING_VIEW_T */ -- 2.43.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
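The patch above leaves a "TODO: missing unit test" for string_view_eq_to_cstr. Below is a minimal sketch of what such a test could look like, assuming the same munit framework the project's integration tests already use; the file placement, test names, and suite wiring are illustrative guesses and not part of the submitted patch.

#include "munit.h"
#include "string_view.h"

static MunitResult
test_eq_to_cstr(const MunitParameter params[], void *user_data_or_fixture)
{
    /* Deliberately not NUL-terminated: a string_view only needs chars + size. */
    char buf[] = { 'f', 'n', '(', ')' };
    string_view_t sv = { .chars = buf, .size = 2 }; /* views "fn" */

    munit_assert_true(string_view_eq_to_cstr(sv, "fn"));
    munit_assert_false(string_view_eq_to_cstr(sv, "f"));   /* shorter cstr, sizes differ */
    munit_assert_false(string_view_eq_to_cstr(sv, "fn(")); /* longer cstr, sizes differ */
    munit_assert_false(string_view_eq_to_cstr(sv, "if"));  /* same size, different chars */

    return MUNIT_OK;
}

static MunitTest tests[] = {
    { "/eq_to_cstr", test_eq_to_cstr, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
    { NULL, NULL, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL }
};

static const MunitSuite suite = { "/string_view", tests, NULL, 1, MUNIT_SUITE_OPTION_NONE };

int
main(int argc, char *argv[])
{
    return munit_suite_main(&suite, NULL, argc, argv);
}

Exercising the comparison through a buffer that is not NUL-terminated checks the property string_view exists to provide: equality driven by the stored size rather than by a terminator.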
* [PATCH olang v3 2/2] lexer: create --dump-tokens cli command 2024-02-19 1:38 [PATCH olang v3 0/2] Create --dump-tokens on compiler cli Johnny Richard 2024-02-19 1:38 ` [PATCH olang v3 1/2] utils: create string_view data structure Johnny Richard @ 2024-02-19 1:44 ` Johnny Richard 2024-02-19 0:47 ` [olang/patches/.build.yml] build success builds.sr.ht ` (2 more replies) 2024-02-19 21:07 ` [PATCH olang v3 0/2] Create --dump-tokens on compiler cli Johnny Richard 2 siblings, 3 replies; 9+ messages in thread From: Johnny Richard @ 2024-02-19 1:44 UTC (permalink / raw) To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard This patch introduces the dump tokens interface and create the initial setup for lexical analysis. Signed-off-by: Johnny Richard <johnny@johnnyrichard.com> --- Changes: - V2 fix linter - V3 fix integration tests .gitignore | 1 + examples/main_exit.0 | 3 + src/0c.c | 121 +++++++++++++++++- src/lexer.c | 224 +++++++++++++++++++++++++++++++++ src/lexer.h | 74 +++++++++++ tests/integration/cli_runner.c | 4 +- tests/integration/cli_runner.h | 2 +- tests/integration/cli_test.c | 2 +- 8 files changed, 425 insertions(+), 6 deletions(-) create mode 100644 examples/main_exit.0 create mode 100644 src/lexer.c create mode 100644 src/lexer.h diff --git a/.gitignore b/.gitignore index fe64668..92496d7 100644 --- a/.gitignore +++ b/.gitignore @@ -2,3 +2,4 @@ build *.o docs/site.tar.gz +tests/integration/*_test diff --git a/examples/main_exit.0 b/examples/main_exit.0 new file mode 100644 index 0000000..c86fc68 --- /dev/null +++ b/examples/main_exit.0 @@ -0,0 +1,3 @@ +fn main(): u32 { + return 0 +} diff --git a/src/0c.c b/src/0c.c index 33ac945..e5199a7 100644 --- a/src/0c.c +++ b/src/0c.c @@ -14,8 +14,125 @@ * You should have received a copy of the GNU General Public License * along with this program. If not, see <https://www.gnu.org/licenses/>. 
*/ +#include <errno.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#include "lexer.h" +#include "string_view.h" + +typedef struct cli_args +{ + int argc; + char **argv; +} cli_args_t; + +char * +cli_args_shift(cli_args_t *args); + +typedef struct cli_opts +{ + // TODO: create man page instruction for --dump-tokens option + bool dump_tokens; + char *file_path; +} cli_opts_t; + +void +print_usage(FILE *stream, char *prog); + +string_view_t +read_entire_file(char *file_path); + int -main(void) +main(int argc, char **argv) +{ + cli_args_t args = { .argc = argc, .argv = argv }; + cli_opts_t opts = { 0 }; + + char *prog = cli_args_shift(&args); + + if (argc != 3) { + print_usage(stderr, prog); + return EXIT_FAILURE; + } + + for (char *arg = cli_args_shift(&args); arg != NULL; arg = cli_args_shift(&args)) { + if (strcmp(arg, "--dump-tokens") == 0) { + opts.dump_tokens = true; + } else { + opts.file_path = arg; + } + } + + if (!opts.dump_tokens) { + print_usage(stderr, prog); + return EXIT_FAILURE; + } + + string_view_t file_content = read_entire_file(opts.file_path); + + // TODO: missing integration test for lexer tokenizing + lexer_t lexer = { 0 }; + lexer_init(&lexer, file_content); + + token_t token = { 0 }; + lexer_next_token(&lexer, &token); + while (token.kind != TOKEN_EOF) { + printf("%s:%lu:%lu: <%s>\n", + opts.file_path, + token.location.row + 1, + (token.location.offset - token.location.bol) + 1, + token_kind_to_cstr(token.kind)); + lexer_next_token(&lexer, &token); + } + + free(file_content.chars); + + return EXIT_SUCCESS; +} + +char * +cli_args_shift(cli_args_t *args) +{ + if (args->argc == 0) + return NULL; + --(args->argc); + return *(args->argv)++; +} + +void +print_usage(FILE *stream, char *prog) +{ + fprintf(stream, "usage: %s <source.0> --dump-tokens\n", prog); +} + +string_view_t +read_entire_file(char *file_path) { - return 0; + FILE *stream = fopen(file_path, "rb"); + + if (stream == NULL) { + fprintf(stderr, "Could not open file %s: %s\n", file_path, strerror(errno)); + exit(EXIT_FAILURE); + } + + string_view_t file_content = { 0 }; + + fseek(stream, 0, SEEK_END); + file_content.size = ftell(stream); + fseek(stream, 0, SEEK_SET); + + file_content.chars = (char *)malloc(file_content.size); + + if (file_content.chars == NULL) { + fprintf(stderr, "Could not read file %s: %s\n", file_path, strerror(errno)); + exit(EXIT_FAILURE); + } + + fread(file_content.chars, 1, file_content.size, stream); + fclose(stream); + + return file_content; } diff --git a/src/lexer.c b/src/lexer.c new file mode 100644 index 0000000..544a54d --- /dev/null +++ b/src/lexer.c @@ -0,0 +1,224 @@ +/* + * Copyright (C) 2024 olang maintainers + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. 
+ */ +#include "lexer.h" + +#include <assert.h> +#include <ctype.h> +#include <stdbool.h> + +void +lexer_init(lexer_t *lexer, string_view_t source) +{ + assert(lexer); + lexer->source = source; + lexer->offset = 0; + lexer->row = 0; + lexer->bol = 0; +} + +static char +lexer_next_char(lexer_t *lexer); + +static void +lexer_skip_char(lexer_t *lexer); + +static bool +lexer_is_eof(lexer_t *lexer); + +static bool +lexer_is_not_eof(lexer_t *lexer); + +static bool +_isspace(char c); + +static void +lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind); + +static void +lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset); + +static token_kind_t +lexer_str_to_token_kind(string_view_t text); + +void +lexer_next_token(lexer_t *lexer, token_t *token) +{ + if (lexer_is_eof(lexer)) { + *token = (token_t){ .kind = TOKEN_EOF }; + return; + } + + char current_char = lexer_next_char(lexer); + + if (_isspace(current_char)) { + while (_isspace(current_char) && lexer_is_not_eof(lexer)) { + lexer_skip_char(lexer); + current_char = lexer_next_char(lexer); + } + } + + while (lexer_is_not_eof(lexer)) { + if (isalpha(current_char)) { + size_t start_offset = lexer->offset; + while (isalnum(current_char) && lexer_is_not_eof(lexer)) { + lexer_skip_char(lexer); + current_char = lexer_next_char(lexer); + } + + string_view_t text = { .chars = lexer->source.chars + start_offset, .size = lexer->offset - start_offset }; + + lexer_init_str_token(lexer, token, lexer_str_to_token_kind(text), start_offset); + return; + } + + if (isdigit(current_char)) { + size_t start_offset = lexer->offset; + while (isdigit(current_char) && lexer_is_not_eof(lexer)) { + lexer_skip_char(lexer); + current_char = lexer_next_char(lexer); + } + + lexer_init_str_token(lexer, token, TOKEN_NUMBER, start_offset); + return; + } + + switch (current_char) { + case '(': { + lexer_init_char_token(lexer, token, TOKEN_OPAREN); + lexer_skip_char(lexer); + return; + } + case ')': { + lexer_init_char_token(lexer, token, TOKEN_CPAREN); + lexer_skip_char(lexer); + return; + } + case ':': { + lexer_init_char_token(lexer, token, TOKEN_COLON); + lexer_skip_char(lexer); + return; + } + case '{': { + lexer_init_char_token(lexer, token, TOKEN_OCURLY); + lexer_skip_char(lexer); + return; + } + case '}': { + lexer_init_char_token(lexer, token, TOKEN_CCURLY); + lexer_skip_char(lexer); + return; + } + case '\n': { + lexer_init_char_token(lexer, token, TOKEN_LF); + lexer_skip_char(lexer); + return; + } + default: { + lexer_init_char_token(lexer, token, TOKEN_UNKNOWN); + lexer_skip_char(lexer); + return; + } + } + } + + if (lexer_is_eof(lexer)) { + *token = (token_t){ .kind = TOKEN_EOF }; + return; + } +} + +static char *token_kind_str_table[] = { + [TOKEN_UNKNOWN] = "unknown", [TOKEN_IDENTIFIER] = "identifier", + [TOKEN_NUMBER] = "number", [TOKEN_FN] = "fn", + [TOKEN_RETURN] = "return", [TOKEN_LF] = "line_feed", + [TOKEN_OPAREN] = "(", [TOKEN_CPAREN] = ")", + [TOKEN_COLON] = ":", [TOKEN_OCURLY] = "{", + [TOKEN_CCURLY] = "}", [TOKEN_EOF] = "EOF", +}; + +char * +token_kind_to_cstr(token_kind_t kind) +{ + assert(kind < sizeof(token_kind_str_table)); + return token_kind_str_table[kind]; +} + +static char +lexer_next_char(lexer_t *lexer) +{ + return lexer->source.chars[lexer->offset]; +} + +static void +lexer_skip_char(lexer_t *lexer) +{ + assert(lexer->offset < lexer->source.size); + if (lexer->source.chars[lexer->offset] == '\n') { + lexer->row++; + lexer->bol = ++lexer->offset; + } else { + lexer->offset++; + } +} + 
+static bool +lexer_is_eof(lexer_t *lexer) +{ + return lexer->offset >= lexer->source.size; +} + +static bool +lexer_is_not_eof(lexer_t *lexer) +{ + return !lexer_is_eof(lexer); +} + +static bool +_isspace(char c) +{ + return c == ' ' || c == '\f' || c == '\r' || c == '\t' || c == '\v'; +} + +static void +lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind) +{ + string_view_t str = { .chars = lexer->source.chars + lexer->offset, .size = 1 }; + token_loc_t location = { .offset = lexer->offset, .row = lexer->row, .bol = lexer->bol }; + *token = (token_t){ .kind = kind, .value = str, .location = location }; +} + +static void +lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset) +{ + string_view_t str = { .chars = lexer->source.chars + start_offset, .size = lexer->offset - start_offset }; + token_loc_t location = { .offset = start_offset, .row = lexer->row, .bol = lexer->bol }; + *token = (token_t){ .kind = kind, .value = str, .location = location }; +} + +static token_kind_t +lexer_str_to_token_kind(string_view_t text) +{ + if (string_view_eq_to_cstr(text, "return")) { + return TOKEN_RETURN; + } + + if (string_view_eq_to_cstr(text, "fn")) { + return TOKEN_FN; + } + + return TOKEN_IDENTIFIER; +} diff --git a/src/lexer.h b/src/lexer.h new file mode 100644 index 0000000..8c09e02 --- /dev/null +++ b/src/lexer.h @@ -0,0 +1,74 @@ +/* + * Copyright (C) 2024 olang maintainers + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. 
+ */ +#ifndef LEXER_H +#define LEXER_H + +#include "string_view.h" +#include <stdint.h> + +typedef struct lexer +{ + string_view_t source; + size_t offset; + size_t row; + size_t bol; +} lexer_t; + +typedef enum token_kind +{ + TOKEN_UNKNOWN, + TOKEN_IDENTIFIER, + TOKEN_NUMBER, + + // Keywords + TOKEN_FN, + TOKEN_RETURN, + + // Single char + TOKEN_LF, + TOKEN_OPAREN, + TOKEN_CPAREN, + TOKEN_COLON, + TOKEN_OCURLY, + TOKEN_CCURLY, + TOKEN_EOF +} token_kind_t; + +typedef struct token_loc +{ + size_t offset; + size_t row; + size_t bol; +} token_loc_t; + +typedef struct token +{ + token_kind_t kind; + string_view_t value; + token_loc_t location; +} token_t; + +void +lexer_init(lexer_t *lexer, string_view_t source); + +void +lexer_next_token(lexer_t *lexer, token_t *token); + +char * +token_kind_to_cstr(token_kind_t kind); + +#endif /* LEXER_H */ diff --git a/tests/integration/cli_runner.c b/tests/integration/cli_runner.c index 4e0f7c4..0531bcc 100644 --- a/tests/integration/cli_runner.c +++ b/tests/integration/cli_runner.c @@ -62,7 +62,7 @@ create_tmp_file_name(char *file_name) } cli_result_t -cli_runner_compile_file(char *src) +cli_runner_compiler_dump_tokens(char *src) { assert_compiler_exists(); @@ -70,7 +70,7 @@ cli_runner_compile_file(char *src) create_tmp_file_name(result.program_path); char command[1024]; - sprintf(command, "%s -o %s %s", OLANG_COMPILER_PATH, result.program_path, src); + sprintf(command, "%s %s --dump-tokens", OLANG_COMPILER_PATH, src); result.exit_code = system(command); return result; diff --git a/tests/integration/cli_runner.h b/tests/integration/cli_runner.h index 5caa319..8f4d69a 100644 --- a/tests/integration/cli_runner.h +++ b/tests/integration/cli_runner.h @@ -23,5 +23,5 @@ typedef struct cli_result_t } cli_result_t; cli_result_t -cli_runner_compile_file(char *src); +cli_runner_compiler_dump_tokens(char *src); #endif diff --git a/tests/integration/cli_test.c b/tests/integration/cli_test.c index c7a9557..ce2ed91 100644 --- a/tests/integration/cli_test.c +++ b/tests/integration/cli_test.c @@ -21,7 +21,7 @@ static MunitResult test_cli_hello_file(const MunitParameter params[], void *user_data_or_fixture) { - cli_result_t compilation_result = cli_runner_compile_file("../../examples/hello.olang"); + cli_result_t compilation_result = cli_runner_compiler_dump_tokens("../../examples/main_exit.0"); munit_assert_int(compilation_result.exit_code, ==, 0); return MUNIT_OK; } -- 2.43.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
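For reference, given the examples/main_exit.0 file added above and the "%s:%lu:%lu: <%s>" format printed by 0c.c, a --dump-tokens run would be expected to emit one line per token. This is a sketch of the expected output, not captured from a real run: the ./0c path assumes the compiler binary is built at the repository root (as the integration tests assume), and the rows and columns follow from the location bookkeeping in lexer.c.

$ ./0c examples/main_exit.0 --dump-tokens
examples/main_exit.0:1:1: <fn>
examples/main_exit.0:1:4: <identifier>
examples/main_exit.0:1:8: <(>
examples/main_exit.0:1:9: <)>
examples/main_exit.0:1:10: <:>
examples/main_exit.0:1:12: <identifier>
examples/main_exit.0:1:16: <{>
examples/main_exit.0:1:17: <line_feed>
examples/main_exit.0:2:3: <return>
examples/main_exit.0:2:10: <number>
examples/main_exit.0:2:11: <line_feed>
examples/main_exit.0:3:1: <}>
examples/main_exit.0:3:2: <line_feed>

Note that this revision does not print the EOF token, since the dump loop in main() stops as soon as token.kind becomes TOKEN_EOF; that point comes up in the review below.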
* [olang/patches/.build.yml] build success 2024-02-19 1:44 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Johnny Richard @ 2024-02-19 0:47 ` builds.sr.ht 2024-02-19 3:30 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Carlos Maniero 2024-02-19 10:01 ` Carlos Maniero 2 siblings, 0 replies; 9+ messages in thread From: builds.sr.ht @ 2024-02-19 0:47 UTC (permalink / raw) To: Johnny Richard; +Cc: ~johnnyrichard/olang-devel olang/patches/.build.yml: SUCCESS in 34s [Create --dump-tokens on compiler cli][0] v3 from [Johnny Richard][1] [0]: https://lists.sr.ht/~johnnyrichard/olang-devel/patches/49645 [1]: mailto:johnny@johnnyrichard.com ✓ #1153060 SUCCESS olang/patches/.build.yml https://builds.sr.ht/~johnnyrichard/job/1153060 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH olang v3 2/2] lexer: create --dump-tokens cli command 2024-02-19 1:44 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Johnny Richard 2024-02-19 0:47 ` [olang/patches/.build.yml] build success builds.sr.ht @ 2024-02-19 3:30 ` Carlos Maniero 2024-02-19 19:51 ` Johnny Richard 2024-02-19 10:01 ` Carlos Maniero 2 siblings, 1 reply; 9+ messages in thread From: Carlos Maniero @ 2024-02-19 3:30 UTC (permalink / raw) To: Johnny Richard, ~johnnyrichard/olang-devel Nice work, man! I just have a few comments: > + while (token.kind != TOKEN_EOF) { > + printf("%s:%lu:%lu: <%s>\n", > + opts.file_path, > + token.location.row + 1, > + (token.location.offset - token.location.bol) + 1, > + token_kind_to_cstr(token.kind)); > + lexer_next_token(&lexer, &token); > + } IMO, the EOF token should be printed too, as it is a token returned by the lexer. > + if (lexer_is_eof(lexer)) { > + *token = (token_t){ .kind = TOKEN_EOF }; > + return; > + } The token location is missing here. I know it seems silly to track the EOF position, but it is useful for parser error messages such as "expected } found EOF". Remember that this code appears twice, before and after the while loop. > +lexer_next_char(lexer_t *lexer) s/lexer_next_char/lexer_current_char: the current name of the function gives me the impression that it changes the offset. > + if (lexer->source.chars[lexer->offset] == '\n') { Call lexer_next_char/lexer_current_char here instead. > +static bool > +_isspace(char c) > +{ > + return c == ' ' || c == '\f' || c == '\r' || c == '\t' || c == '\v'; > +} What do you think about just adding the *\n* guard before calling *isspace*? That way it is clear to someone reading the code why you have to reimplement the function: return c != '\n' && isspace(c); > +static void > +lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind); > + > +static void > +lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset); > + > +static token_kind_t > +lexer_str_to_token_kind(string_view_t text); I don't have a concrete suggestion here, but IMO *lexer_init_char_token* and *lexer_init_str_token* make it sound like we are initializing a "char" token and a "string" token. I don't have a better name, though; I thought of calling them *lexer_init_single_char_token* and *lexer_init_multi_char_token*, but I am not sure that is really better. ^ permalink raw reply [flat|nested] 9+ messages in thread
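A possible shape for the _isspace suggestion above, kept next to the review for context. The unsigned char cast is an extra defensive detail of this sketch (the ctype functions take an int whose value must be representable as unsigned char); it is not something requested in the thread.

#include <ctype.h>
#include <stdbool.h>

/* Skip every whitespace character except '\n', which the lexer emits
 * as a TOKEN_LF token.  The cast keeps isspace() well-defined when
 * char is signed and c holds a negative value. */
static bool
_isspace(char c)
{
    return c != '\n' && isspace((unsigned char)c);
}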
* Re: [PATCH olang v3 2/2] lexer: create --dump-tokens cli command 2024-02-19 3:30 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Carlos Maniero @ 2024-02-19 19:51 ` Johnny Richard 2024-02-19 19:17 ` Carlos Maniero 0 siblings, 1 reply; 9+ messages in thread From: Johnny Richard @ 2024-02-19 19:51 UTC (permalink / raw) To: Carlos Maniero; +Cc: ~johnnyrichard/olang-devel Thanks for review it. Here I have the changes you requested. Let me know if this changes are enough. Johnny Richard -------->8-------- Subject: fixup: review comments --- 0c.c | 20 +++++++++++++++----- lexer.c | 57 ++++++++++++++++++++++++++++++++++----------------------- 2 files changed, 49 insertions(+), 28 deletions(-) diff --git a/src/0c.c b/src/0c.c index e5199a7..0af9caa 100644 --- a/src/0c.c +++ b/src/0c.c @@ -42,6 +42,9 @@ typedef struct cli_opts void print_usage(FILE *stream, char *prog); +static void +print_token(char *file_path, token_t *token); + string_view_t read_entire_file(char *file_path); @@ -80,13 +83,10 @@ main(int argc, char **argv) token_t token = { 0 }; lexer_next_token(&lexer, &token); while (token.kind != TOKEN_EOF) { - printf("%s:%lu:%lu: <%s>\n", - opts.file_path, - token.location.row + 1, - (token.location.offset - token.location.bol) + 1, - token_kind_to_cstr(token.kind)); + print_token(opts.file_path, &token); lexer_next_token(&lexer, &token); } + print_token(opts.file_path, &token); free(file_content.chars); @@ -136,3 +136,13 @@ read_entire_file(char *file_path) return file_content; } + +static void +print_token(char *file_path, token_t *token) +{ + printf("%s:%lu:%lu: <%s>\n", + file_path, + token->location.row + 1, + (token->location.offset - token->location.bol) + 1, + token_kind_to_cstr(token->kind)); +} diff --git a/src/lexer.c b/src/lexer.c index 544a54d..b107762 100644 --- a/src/lexer.c +++ b/src/lexer.c @@ -31,7 +31,7 @@ lexer_init(lexer_t *lexer, string_view_t source) } static char -lexer_next_char(lexer_t *lexer); +lexer_current_char(lexer_t *lexer); static void lexer_skip_char(lexer_t *lexer); @@ -46,10 +46,13 @@ static bool _isspace(char c); static void -lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind); +lexer_init_char_value_token(lexer_t *lexer, token_t *token, token_kind_t kind); static void -lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset); +lexer_init_str_value_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset); + +static void +lexer_init_eof_token(lexer_t *lexer, token_t *token); static token_kind_t lexer_str_to_token_kind(string_view_t text); @@ -58,16 +61,16 @@ void lexer_next_token(lexer_t *lexer, token_t *token) { if (lexer_is_eof(lexer)) { - *token = (token_t){ .kind = TOKEN_EOF }; + lexer_init_eof_token(lexer, token); return; } - char current_char = lexer_next_char(lexer); + char current_char = lexer_current_char(lexer); if (_isspace(current_char)) { while (_isspace(current_char) && lexer_is_not_eof(lexer)) { lexer_skip_char(lexer); - current_char = lexer_next_char(lexer); + current_char = lexer_current_char(lexer); } } @@ -76,12 +79,12 @@ lexer_next_token(lexer_t *lexer, token_t *token) size_t start_offset = lexer->offset; while (isalnum(current_char) && lexer_is_not_eof(lexer)) { lexer_skip_char(lexer); - current_char = lexer_next_char(lexer); + current_char = lexer_current_char(lexer); } string_view_t text = { .chars = lexer->source.chars + start_offset, .size = lexer->offset - start_offset }; - lexer_init_str_token(lexer, token, 
lexer_str_to_token_kind(text), start_offset); + lexer_init_str_value_token(lexer, token, lexer_str_to_token_kind(text), start_offset); return; } @@ -89,46 +92,46 @@ lexer_next_token(lexer_t *lexer, token_t *token) size_t start_offset = lexer->offset; while (isdigit(current_char) && lexer_is_not_eof(lexer)) { lexer_skip_char(lexer); - current_char = lexer_next_char(lexer); + current_char = lexer_current_char(lexer); } - lexer_init_str_token(lexer, token, TOKEN_NUMBER, start_offset); + lexer_init_str_value_token(lexer, token, TOKEN_NUMBER, start_offset); return; } switch (current_char) { case '(': { - lexer_init_char_token(lexer, token, TOKEN_OPAREN); + lexer_init_char_value_token(lexer, token, TOKEN_OPAREN); lexer_skip_char(lexer); return; } case ')': { - lexer_init_char_token(lexer, token, TOKEN_CPAREN); + lexer_init_char_value_token(lexer, token, TOKEN_CPAREN); lexer_skip_char(lexer); return; } case ':': { - lexer_init_char_token(lexer, token, TOKEN_COLON); + lexer_init_char_value_token(lexer, token, TOKEN_COLON); lexer_skip_char(lexer); return; } case '{': { - lexer_init_char_token(lexer, token, TOKEN_OCURLY); + lexer_init_char_value_token(lexer, token, TOKEN_OCURLY); lexer_skip_char(lexer); return; } case '}': { - lexer_init_char_token(lexer, token, TOKEN_CCURLY); + lexer_init_char_value_token(lexer, token, TOKEN_CCURLY); lexer_skip_char(lexer); return; } case '\n': { - lexer_init_char_token(lexer, token, TOKEN_LF); + lexer_init_char_value_token(lexer, token, TOKEN_LF); lexer_skip_char(lexer); return; } default: { - lexer_init_char_token(lexer, token, TOKEN_UNKNOWN); + lexer_init_char_value_token(lexer, token, TOKEN_UNKNOWN); lexer_skip_char(lexer); return; } @@ -136,7 +139,7 @@ lexer_next_token(lexer_t *lexer, token_t *token) } if (lexer_is_eof(lexer)) { - *token = (token_t){ .kind = TOKEN_EOF }; + lexer_init_eof_token(lexer, token); return; } } @@ -158,7 +161,7 @@ token_kind_to_cstr(token_kind_t kind) } static char -lexer_next_char(lexer_t *lexer) +lexer_current_char(lexer_t *lexer) { return lexer->source.chars[lexer->offset]; } @@ -167,7 +170,7 @@ static void lexer_skip_char(lexer_t *lexer) { assert(lexer->offset < lexer->source.size); - if (lexer->source.chars[lexer->offset] == '\n') { + if (lexer_current_char(lexer) == '\n') { lexer->row++; lexer->bol = ++lexer->offset; } else { @@ -190,11 +193,11 @@ lexer_is_not_eof(lexer_t *lexer) static bool _isspace(char c) { - return c == ' ' || c == '\f' || c == '\r' || c == '\t' || c == '\v'; + return c != '\n' && isspace(c); } static void -lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind) +lexer_init_char_value_token(lexer_t *lexer, token_t *token, token_kind_t kind) { string_view_t str = { .chars = lexer->source.chars + lexer->offset, .size = 1 }; token_loc_t location = { .offset = lexer->offset, .row = lexer->row, .bol = lexer->bol }; @@ -202,13 +205,21 @@ lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind) } static void -lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset) +lexer_init_str_value_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset) { string_view_t str = { .chars = lexer->source.chars + start_offset, .size = lexer->offset - start_offset }; token_loc_t location = { .offset = start_offset, .row = lexer->row, .bol = lexer->bol }; *token = (token_t){ .kind = kind, .value = str, .location = location }; } +static void +lexer_init_eof_token(lexer_t *lexer, token_t *token) +{ + string_view_t str = { 0 }; + token_loc_t 
location = { .offset = lexer->offset, .row = lexer->row, .bol = lexer->bol }; + *token = (token_t){ .kind = TOKEN_EOF, .value = str, .location = location }; +} + static token_kind_t lexer_str_to_token_kind(string_view_t text) { ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH olang v3 2/2] lexer: create --dump-tokens cli command 2024-02-19 19:51 ` Johnny Richard @ 2024-02-19 19:17 ` Carlos Maniero 0 siblings, 0 replies; 9+ messages in thread From: Carlos Maniero @ 2024-02-19 19:17 UTC (permalink / raw) To: Johnny Richard; +Cc: ~johnnyrichard/olang-devel Great work! Can you share a v4? That way the pipeline will be triggered. Thank you! ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH olang v3 2/2] lexer: create --dump-tokens cli command 2024-02-19 1:44 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Johnny Richard 2024-02-19 0:47 ` [olang/patches/.build.yml] build success builds.sr.ht 2024-02-19 3:30 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Carlos Maniero @ 2024-02-19 10:01 ` Carlos Maniero 2 siblings, 0 replies; 9+ messages in thread From: Carlos Maniero @ 2024-02-19 10:01 UTC (permalink / raw) To: Johnny Richard, ~johnnyrichard/olang-devel I'm sending here the integration tests of the --dump-tokens, in case you want to send a new patch with one less TODO. But feel free to ignore this message, I can send a new patch after this one is merged. Just to let you know the cli_runner was extended to intercept the program stdout. -- >8 -- tests/integration/cli_runner.c | 47 ++++++++++++++++++++++++++++++---- tests/integration/cli_runner.h | 1 + tests/integration/cli_test.c | 14 ++++++++++ 3 files changed, 57 insertions(+), 5 deletions(-) diff --git a/tests/integration/cli_runner.c b/tests/integration/cli_runner.c index 0531bcc..7e4fe9a 100644 --- a/tests/integration/cli_runner.c +++ b/tests/integration/cli_runner.c @@ -20,6 +20,7 @@ #include <stdio.h> #include <stdlib.h> #include <string.h> +#include <sys/wait.h> #include <unistd.h> #define OLANG_COMPILER_PATH "../../0c" @@ -62,16 +63,52 @@ create_tmp_file_name(char *file_name) } cli_result_t -cli_runner_compiler_dump_tokens(char *src) +cli_runner_compiler(char *src, char *args[]) { assert_compiler_exists(); - cli_result_t result; + cli_result_t result = { 0 }; create_tmp_file_name(result.program_path); - char command[1024]; - sprintf(command, "%s %s --dump-tokens", OLANG_COMPILER_PATH, src); + int fd_link[2]; + + if (pipe(fd_link) == -1) { + perror("pipe error."); + exit(1); + } + + pid_t pid = fork(); + + if (pid == -1) { + perror("fork error."); + exit(1); + } + + if (pid == 0) { + dup2(fd_link[1], STDOUT_FILENO); + close(fd_link[0]); + close(fd_link[1]); + + execv(OLANG_COMPILER_PATH, args); + perror("execl error."); + exit(127); + } else { + close(fd_link[1]); + if (read(fd_link[0], result.compiler_output, sizeof(result.compiler_output)) == -1) { + perror("read error."); + exit(1); + } + int status; + waitpid(pid, &status, 0); + result.exit_code = WEXITSTATUS(status); + } - result.exit_code = system(command); return result; } + +cli_result_t +cli_runner_compiler_dump_tokens(char *src) +{ + char *program_args[] = { "0c", "--dump-tokens", src, NULL }; + return cli_runner_compiler(src, program_args); +} diff --git a/tests/integration/cli_runner.h b/tests/integration/cli_runner.h index 8f4d69a..7ce4e7b 100644 --- a/tests/integration/cli_runner.h +++ b/tests/integration/cli_runner.h @@ -20,6 +20,7 @@ typedef struct cli_result_t { int exit_code; char program_path[255]; + char compiler_output[1024]; } cli_result_t; cli_result_t diff --git a/tests/integration/cli_test.c b/tests/integration/cli_test.c index ce2ed91..1fd70c7 100644 --- a/tests/integration/cli_test.c +++ b/tests/integration/cli_test.c @@ -23,6 +23,20 @@ test_cli_hello_file(const MunitParameter params[], void *user_data_or_fixture) { cli_result_t compilation_result = cli_runner_compiler_dump_tokens("../../examples/main_exit.0"); munit_assert_int(compilation_result.exit_code, ==, 0); + munit_assert_string_equal(compilation_result.compiler_output, + "../../examples/main_exit.0:1:1: <fn>\n" + "../../examples/main_exit.0:1:4: <identifier>\n" + "../../examples/main_exit.0:1:8: <(>\n" + "../../examples/main_exit.0:1:9: 
<)>\n" + "../../examples/main_exit.0:1:10: <:>\n" + "../../examples/main_exit.0:1:12: <identifier>\n" + "../../examples/main_exit.0:1:16: <{>\n" + "../../examples/main_exit.0:1:17: <line_feed>\n" + "../../examples/main_exit.0:2:3: <return>\n" + "../../examples/main_exit.0:2:10: <number>\n" + "../../examples/main_exit.0:2:11: <line_feed>\n" + "../../examples/main_exit.0:3:1: <}>\n" + "../../examples/main_exit.0:3:2: <line_feed>\n"); return MUNIT_OK; } -- 2.34.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH olang v3 0/2] Create --dump-tokens on compiler cli 2024-02-19 1:38 [PATCH olang v3 0/2] Create --dump-tokens on compiler cli Johnny Richard 2024-02-19 1:38 ` [PATCH olang v3 1/2] utils: create string_view data structure Johnny Richard 2024-02-19 1:44 ` [PATCH olang v3 2/2] lexer: create --dump-tokens cli command Johnny Richard @ 2024-02-19 21:07 ` Johnny Richard 2 siblings, 0 replies; 9+ messages in thread From: Johnny Richard @ 2024-02-19 21:07 UTC (permalink / raw) To: ~johnnyrichard/olang-devel Patchset SUPERSEDED by v4 Link: https://lists.sr.ht/~johnnyrichard/olang-devel/%3C20240219210541.25624-1-johnny%40johnnyrichard.com%3E ^ permalink raw reply [flat|nested] 9+ messages in thread
Code repositories for project(s) associated with this public inbox: https://git.johnnyrichard.com/olang.git