From mboxrd@z Thu Jan 1 00:00:00 1970
From: Johnny Richard
To: ~johnnyrichard/olang-devel@lists.sr.ht
Cc: Johnny Richard
Subject: [PATCH olang v2 1/3] lexer: add tokenize support to binary op tokens
Date: Sun, 17 Mar 2024 22:29:22 +0100
Message-ID: <20240317213638.131057-2-johnny@johnnyrichard.com>
In-Reply-To: <20240317213638.131057-1-johnny@johnnyrichard.com>
References: <20240317213638.131057-1-johnny@johnnyrichard.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-ID: ~johnnyrichard/olang-devel <~johnnyrichard/olang-devel.lists.sr.ht>
Sender: ~johnnyrichard/olang-devel <~johnnyrichard/olang-devel@lists.sr.ht>

In order to tokenize the not-equals comparison (!=), I also added the
unary not token (!).

Signed-off-by: Johnny Richard
---
v2: Add support for tokenizing every binary operator token

 examples/expression.ol        |   3 +
 src/lexer.c                   | 182 ++++++++++++++++++++++++++++++++--
 src/lexer.h                   |  26 +++++
 tests/integration/cli_test.c  |  56 ++++++++++-
 tests/integration/proc_exec.h |   3 +-
 5 files changed, 261 insertions(+), 9 deletions(-)
 create mode 100644 examples/expression.ol

diff --git a/examples/expression.ol b/examples/expression.ol
new file mode 100644
index 0000000..efa4ab5
--- /dev/null
+++ b/examples/expression.ol
@@ -0,0 +1,3 @@
+fn main(): u32 {
+  return (10 + 1 * 2) - (10 - (1 + 1) / 2)
+}
diff --git a/src/lexer.c b/src/lexer.c
index dd6f11d..14c2962 100644
--- a/src/lexer.c
+++ b/src/lexer.c
@@ -37,6 +37,9 @@ lexer_current_char(lexer_t *lexer);
 static void
 lexer_skip_char(lexer_t *lexer);
 
+static char
+lexer_peek_next_char(lexer_t *lexer);
+
 static bool
 lexer_is_eof(lexer_t *lexer);
 
@@ -101,6 +104,118 @@ lexer_next_token(lexer_t *lexer, token_t *token)
     }
 
     switch (current_char) {
+        case '=': {
+            size_t start_offset = lexer->offset;
+
+            if (lexer_peek_next_char(lexer) == '=') {
+                lexer_skip_char(lexer);
+                lexer_skip_char(lexer);
+                lexer_init_str_value_token(lexer, token, TOKEN_CMP_EQ, start_offset);
+                return;
+            }
+
+            lexer_init_char_value_token(lexer, token, TOKEN_EQ);
+            lexer_skip_char(lexer);
+            return;
+        }
+        case '!': {
+            size_t start_offset = lexer->offset;
+
+            if (lexer_peek_next_char(lexer) == '=') {
+                lexer_skip_char(lexer);
+                lexer_skip_char(lexer);
+                lexer_init_str_value_token(lexer, token, TOKEN_CMP_NEQ, start_offset);
+                return;
+            }
+
+            lexer_init_char_value_token(lexer, token, TOKEN_BANG);
+            lexer_skip_char(lexer);
+            return;
+        }
+        case '&': {
+            size_t start_offset = lexer->offset;
+
+            if (lexer_peek_next_char(lexer) == '&') {
+                lexer_skip_char(lexer);
+                lexer_skip_char(lexer);
+                lexer_init_str_value_token(lexer, token, TOKEN_LOGICAL_AND, start_offset);
+                return;
+            }
+
+            lexer_init_char_value_token(lexer, token, TOKEN_AND);
+            lexer_skip_char(lexer);
+            return;
+        }
+        case '|': {
+            size_t start_offset = lexer->offset;
+
+            if (lexer_peek_next_char(lexer) == '|') {
+                lexer_skip_char(lexer);
+                lexer_skip_char(lexer);
+                lexer_init_str_value_token(lexer, token, TOKEN_LOGICAL_OR, start_offset);
+                return;
+            }
+
+            lexer_init_char_value_token(lexer, token, TOKEN_PIPE);
+            lexer_skip_char(lexer);
+            return;
+        }
+        case '<': {
+            size_t start_offset = lexer->offset;
+
+            switch (lexer_peek_next_char(lexer)) {
+                case '<': {
+                    lexer_skip_char(lexer);
+                    lexer_skip_char(lexer);
+                    lexer_init_str_value_token(lexer, token, TOKEN_BITWISE_LSHIFT, start_offset);
+                    return;
+                }
+                case '=': {
+                    lexer_skip_char(lexer);
+                    lexer_skip_char(lexer);
+                    lexer_init_str_value_token(lexer, token, TOKEN_CMP_LEQ, start_offset);
+                    return;
+                }
+                default: {
+                    lexer_init_char_value_token(lexer, token, TOKEN_LT);
+                    lexer_skip_char(lexer);
+                    return;
+                }
+            }
+        }
+        case '>': {
+            size_t start_offset = lexer->offset;
+
+            switch (lexer_peek_next_char(lexer)) {
+                case '>': {
+                    lexer_skip_char(lexer);
+                    lexer_skip_char(lexer);
+                    lexer_init_str_value_token(lexer, token, TOKEN_BITWISE_RSHIFT, start_offset);
+                    return;
+                }
+                case '=': {
+                    lexer_skip_char(lexer);
+                    lexer_skip_char(lexer);
+                    lexer_init_str_value_token(lexer, token, TOKEN_CMP_GEQ, start_offset);
+                    return;
+                }
+                default: {
+                    lexer_init_char_value_token(lexer, token, TOKEN_GT);
+                    lexer_skip_char(lexer);
+                    return;
+                }
+            }
+        }
+        case '^': {
+            lexer_init_char_value_token(lexer, token, TOKEN_CIRCUMFLEX);
+            lexer_skip_char(lexer);
+            return;
+        }
+        case '%': {
+            lexer_init_char_value_token(lexer, token, TOKEN_PERCENT);
+            lexer_skip_char(lexer);
+            return;
+        }
         case '(': {
             lexer_init_char_value_token(lexer, token, TOKEN_OPAREN);
             lexer_skip_char(lexer);
@@ -126,6 +241,26 @@ lexer_next_token(lexer_t *lexer, token_t *token)
             lexer_skip_char(lexer);
             return;
         }
+        case '+': {
+            lexer_init_char_value_token(lexer, token, TOKEN_PLUS);
+            lexer_skip_char(lexer);
+            return;
+        }
+        case '-': {
+            lexer_init_char_value_token(lexer, token, TOKEN_DASH);
+            lexer_skip_char(lexer);
+            return;
+        }
+        case '*': {
+            lexer_init_char_value_token(lexer, token, TOKEN_STAR);
+            lexer_skip_char(lexer);
+            return;
+        }
+        case '/': {
+            lexer_init_char_value_token(lexer, token, TOKEN_SLASH);
+            lexer_skip_char(lexer);
+            return;
+        }
         case '\n': {
             lexer_init_char_value_token(lexer, token, TOKEN_LF);
             lexer_skip_char(lexer);
@@ -146,12 +281,38 @@ lexer_next_token(lexer_t *lexer, token_t *token)
 }
 
 static char *token_kind_str_table[] = {
-    [TOKEN_UNKNOWN] = "unknown", [TOKEN_IDENTIFIER] = "identifier",
-    [TOKEN_NUMBER] = "number", [TOKEN_FN] = "fn",
-    [TOKEN_RETURN] = "return", [TOKEN_LF] = "line_feed",
-    [TOKEN_OPAREN] = "(", [TOKEN_CPAREN] = ")",
-    [TOKEN_COLON] = ":", [TOKEN_OCURLY] = "{",
-    [TOKEN_CCURLY] = "}", [TOKEN_EOF] = "EOF",
+    [TOKEN_UNKNOWN] = "unknown",
+    [TOKEN_IDENTIFIER] = "identifier",
+    [TOKEN_NUMBER] = "number",
+    [TOKEN_FN] = "fn",
+    [TOKEN_RETURN] = "return",
+    [TOKEN_LF] = "line_feed",
+    [TOKEN_OPAREN] = "(",
+    [TOKEN_CPAREN] = ")",
+    [TOKEN_COLON] = ":",
+    [TOKEN_OCURLY] = "{",
+    [TOKEN_CCURLY] = "}",
+    [TOKEN_PLUS] = "+",
+    [TOKEN_DASH] = "-",
+    [TOKEN_STAR] = "*",
+    [TOKEN_SLASH] = "/",
+    [TOKEN_EQ] = "=",
+    [TOKEN_CMP_EQ] = "==",
+    [TOKEN_BANG] = "!",
+    [TOKEN_CMP_NEQ] = "!=",
+    [TOKEN_LT] = "<",
+    [TOKEN_GT] = ">",
+    [TOKEN_CMP_LEQ] = "<=",
+    [TOKEN_CMP_GEQ] = ">=",
+    [TOKEN_PERCENT] = "%",
+    [TOKEN_BITWISE_LSHIFT] = "<<",
+    [TOKEN_BITWISE_RSHIFT] = ">>",
+    [TOKEN_CIRCUMFLEX] = "^",
+    [TOKEN_PIPE] = "|",
+    [TOKEN_LOGICAL_OR] = "||",
+    [TOKEN_AND] = "&",
+    [TOKEN_LOGICAL_AND] = "&&",
+    [TOKEN_EOF] = "EOF",
 };
 
 char *
@@ -167,6 +328,15 @@ lexer_current_char(lexer_t *lexer)
     return lexer->source.chars[lexer->offset];
 }
 
+static char
+lexer_peek_next_char(lexer_t *lexer)
+{
+    if (lexer->offset + 1 >= lexer->source.size) {
+        return 0;
+    }
+    return lexer->source.chars[lexer->offset + 1];
+}
+
 static void
 lexer_skip_char(lexer_t *lexer)
 {
diff --git a/src/lexer.h b/src/lexer.h
index cb91d7e..5ed777b 100644
--- a/src/lexer.h
+++ b/src/lexer.h
@@ -39,7 +39,33 @@ typedef enum token_kind
     TOKEN_FN,
     TOKEN_RETURN,
 
+    // Equality operators
+    TOKEN_CMP_EQ,
+    TOKEN_CMP_NEQ,
+    TOKEN_CMP_LEQ,
+    TOKEN_CMP_GEQ,
+
+    // Logical Operators
+    TOKEN_LOGICAL_OR,
+    TOKEN_LOGICAL_AND,
+
+    // Bitwise Operators
+    TOKEN_BITWISE_LSHIFT,
+    TOKEN_BITWISE_RSHIFT,
+
     // Single char
+    TOKEN_BANG,
+    TOKEN_GT,
+    TOKEN_LT,
+    TOKEN_PERCENT,
+    TOKEN_AND,
+    TOKEN_PIPE,
+    TOKEN_CIRCUMFLEX,
+    TOKEN_EQ,
+    TOKEN_PLUS,
+    TOKEN_DASH,
+    TOKEN_SLASH,
+    TOKEN_STAR,
     TOKEN_LF,
     TOKEN_OPAREN,
     TOKEN_CPAREN,
diff --git a/tests/integration/cli_test.c b/tests/integration/cli_test.c
index 8cc22f9..d46471b 100644
--- a/tests/integration/cli_test.c
+++ b/tests/integration/cli_test.c
@@ -20,7 +20,7 @@
 #include
 
 static MunitResult
-test_cli_dump_tokens(const MunitParameter params[], void *user_data_or_fixture)
+test_cli_dump_tokens_example_main_exit(const MunitParameter params[], void *user_data_or_fixture)
 {
     cli_result_t compilation_result = cli_runner_compiler_dump_tokens("../../examples/main_exit.ol");
     munit_assert_int(compilation_result.exec.exit_code, ==, 0);
@@ -42,6 +42,47 @@ test_cli_dump_tokens(const MunitParameter params[], void *user_data_or_fixture)
     return MUNIT_OK;
 }
 
+static MunitResult
+test_cli_dump_tokens_example_expression(const MunitParameter params[], void *user_data_or_fixture)
+{
+    cli_result_t compilation_result = cli_runner_compiler_dump_tokens("../../examples/expression.ol");
+    munit_assert_int(compilation_result.exec.exit_code, ==, 0);
+    munit_assert_string_equal(compilation_result.exec.stdout_buf,
+                              "../../examples/expression.ol:1:1: <fn>\n"
+                              "../../examples/expression.ol:1:4: <identifier>\n"
+                              "../../examples/expression.ol:1:8: <(>\n"
+                              "../../examples/expression.ol:1:9: <)>\n"
+                              "../../examples/expression.ol:1:10: <:>\n"
+                              "../../examples/expression.ol:1:12: <identifier>\n"
+                              "../../examples/expression.ol:1:16: <{>\n"
+                              "../../examples/expression.ol:1:17: <line_feed>\n"
+                              "../../examples/expression.ol:2:3: <return>\n"
+                              "../../examples/expression.ol:2:10: <(>\n"
+                              "../../examples/expression.ol:2:11: <number>\n"
+                              "../../examples/expression.ol:2:14: <+>\n"
+                              "../../examples/expression.ol:2:16: <number>\n"
+                              "../../examples/expression.ol:2:18: <*>\n"
+                              "../../examples/expression.ol:2:20: <number>\n"
+                              "../../examples/expression.ol:2:21: <)>\n"
+                              "../../examples/expression.ol:2:23: <->\n"
+                              "../../examples/expression.ol:2:25: <(>\n"
+                              "../../examples/expression.ol:2:26: <number>\n"
+                              "../../examples/expression.ol:2:29: <->\n"
+                              "../../examples/expression.ol:2:31: <(>\n"
+                              "../../examples/expression.ol:2:32: <number>\n"
+                              "../../examples/expression.ol:2:34: <+>\n"
+                              "../../examples/expression.ol:2:36: <number>\n"
+                              "../../examples/expression.ol:2:37: <)>\n"
+                              "../../examples/expression.ol:2:39: </>\n"
+                              "../../examples/expression.ol:2:41: <number>\n"
+                              "../../examples/expression.ol:2:42: <)>\n"
+                              "../../examples/expression.ol:2:43: <line_feed>\n"
+                              "../../examples/expression.ol:3:1: <}>\n"
+                              "../../examples/expression.ol:3:2: <line_feed>\n"
+                              "../../examples/expression.ol:4:1: <EOF>\n");
+    return MUNIT_OK;
+}
+
 static MunitResult
 test_cli_compile_minimal_program(const MunitParameter params[], void *user_data_or_fixture)
 {
@@ -62,7 +103,18 @@ test_cli_compile_minimal_program(const MunitParameter params[], void *user_data_
 }
 
 static MunitTest tests[] = {
-    { "/test_cli_dump_tokens", test_cli_dump_tokens, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
+    { "/test_cli_dump_tokens_example_main_exit",
+      test_cli_dump_tokens_example_main_exit,
+      NULL,
+      NULL,
+      MUNIT_TEST_OPTION_NONE,
+      NULL },
+    { "/test_cli_dump_tokens_example_expression",
+      test_cli_dump_tokens_example_expression,
+      NULL,
+      NULL,
+      MUNIT_TEST_OPTION_NONE,
+      NULL },
     { "/test_cli_compile_minimal_program", test_cli_compile_minimal_program, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
     { NULL, NULL, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL }
 };
diff --git a/tests/integration/proc_exec.h b/tests/integration/proc_exec.h
index 135aa6a..45c2977 100644
--- a/tests/integration/proc_exec.h
+++ b/tests/integration/proc_exec.h
@@ -21,7 +21,8 @@
 typedef struct proc_exec_result
 {
     int exit_code;
-    char stdout_buf[1024];
+    // FIXME: output buffer shouldn't be fixed size
+    char stdout_buf[2048];
 } proc_exec_result_t;
 
 typedef struct proc_exec_command
-- 
2.44.0
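
Note for reviewers: the two-character operators in this patch all rely on the same one-character lookahead. The standalone sketch below illustrates that idea in isolation; it is not the code from src/lexer.c. All sk_-prefixed names (sk_lexer_t, sk_kind_t, sk_peek_next_char, sk_next_operator) are hypothetical and exist only for this illustration, and it covers only a small subset of the operators.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; the patch defines many more. */
typedef enum {
    SK_BANG,    /* !  */
    SK_CMP_NEQ, /* != */
    SK_LT,      /* <  */
    SK_CMP_LEQ, /* <= */
    SK_LSHIFT,  /* << */
    SK_UNKNOWN
} sk_kind_t;

/* Hypothetical minimal lexer state: a buffer, its size and a cursor. */
typedef struct {
    const char *chars;
    size_t size;
    size_t offset;
} sk_lexer_t;

/* Return the character after the current one, or 0 at the end of input. */
static char
sk_peek_next_char(const sk_lexer_t *lexer)
{
    if (lexer->offset + 1 >= lexer->size) {
        return 0;
    }
    return lexer->chars[lexer->offset + 1];
}

/* Classify the operator at the cursor and advance past it.
 * Two-character operators are tried first (longest match wins). */
static sk_kind_t
sk_next_operator(sk_lexer_t *lexer)
{
    assert(lexer->offset < lexer->size);

    char current = lexer->chars[lexer->offset];
    char next = sk_peek_next_char(lexer);

    if (current == '!' && next == '=') {
        lexer->offset += 2;
        return SK_CMP_NEQ;
    }
    if (current == '<' && next == '<') {
        lexer->offset += 2;
        return SK_LSHIFT;
    }
    if (current == '<' && next == '=') {
        lexer->offset += 2;
        return SK_CMP_LEQ;
    }

    /* No two-character match: consume a single character. */
    lexer->offset += 1;
    switch (current) {
        case '!':
            return SK_BANG;
        case '<':
            return SK_LT;
        default:
            return SK_UNKNOWN;
    }
}
```

Because the peek returns 0 at the end of the buffer, a trailing '!' or '<' falls through to the single-character branch without reading out of bounds.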