From mboxrd@z Thu Jan 1 00:00:00 1970
From: Johnny Richard
To: ~johnnyrichard/olang-devel@lists.sr.ht
Cc: Johnny Richard
Subject: [PATCH olang v3 1/3] lexer: add tokenize support to binary op tokens
Date: Mon, 18 Mar 2024 09:39:51 +0100
Message-ID: <20240318084254.142417-2-johnny@johnnyrichard.com>
In-Reply-To: <20240318084254.142417-1-johnny@johnnyrichard.com>
References: <20240318084254.142417-1-johnny@johnnyrichard.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-ID: ~johnnyrichard/olang-devel <~johnnyrichard/olang-devel.lists.sr.ht>
Sender: ~johnnyrichard/olang-devel <~johnnyrichard/olang-devel@lists.sr.ht>

To tokenize the not-equals comparison token (!=), I also added the
unary not token (!).

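For readers who want to see the one-or-two-character operator handling in
isolation, here is a minimal standalone sketch. It is not the code from
src/lexer.c: every name prefixed with sk_ is hypothetical and exists only
for this example. It assumes a NUL-terminated input and handles only
'=', '==', '!' and '!='. The idea it illustrates is the one the v3 changelog
mentions: consume the first character, then inspect the new current
character, so no separate peek helper is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch only. */
typedef enum { SK_EOF, SK_EQ, SK_CMP_EQ, SK_BANG, SK_CMP_NEQ, SK_UNKNOWN } sk_kind_t;

typedef struct
{
    const char *src; /* NUL-terminated input (assumption of this sketch) */
    size_t offset;   /* index of the current character */
} sk_lexer_t;

typedef struct
{
    sk_kind_t kind;
    size_t start; /* offset of the token's first character */
    size_t len;   /* 1 or 2 for operators, 0 for EOF */
} sk_token_t;

/* Build a token spanning [start, current offset). */
static sk_token_t
sk_make(const sk_lexer_t *lx, sk_kind_t kind, size_t start)
{
    assert(lx->offset >= start);
    sk_token_t tok = { kind, start, lx->offset - start };
    return tok;
}

/* Return the next token, advancing the lexer past it. */
static sk_token_t
sk_next(sk_lexer_t *lx)
{
    /* Skip spaces between tokens. */
    while (lx->src[lx->offset] == ' ') {
        lx->offset++;
    }

    size_t start = lx->offset;
    char c = lx->src[lx->offset];

    if (c == '\0') {
        return sk_make(lx, SK_EOF, start);
    }

    /* Consume the first character; the "current" character is now the
     * one that decides between the short and the long operator. */
    lx->offset++;

    switch (c) {
        case '=':
            if (lx->src[lx->offset] == '=') {
                lx->offset++;
                return sk_make(lx, SK_CMP_EQ, start);
            }
            return sk_make(lx, SK_EQ, start);
        case '!':
            if (lx->src[lx->offset] == '=') {
                lx->offset++;
                return sk_make(lx, SK_CMP_NEQ, start);
            }
            return sk_make(lx, SK_BANG, start);
        default:
            return sk_make(lx, SK_UNKNOWN, start);
    }
}
```

Calling sk_next() repeatedly on the input "!= ! == =" should yield the
kinds SK_CMP_NEQ, SK_BANG, SK_CMP_EQ, SK_EQ and then SK_EOF. Because '!'
alone has to produce some token when it is not followed by '=', a lexer
built this way needs a standalone token for it, which is why the patch
introduces the unary not token alongside the not-equals token.
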
Signed-off-by: Johnny Richard
---
v2: Add support to tokenize every binary operation token
v3: Remove peek next char (lookahead)

 examples/expression.ol        |   3 +
 src/lexer.c                   | 162 ++++++++++++++++++++++++++++++++--
 src/lexer.h                   |  26 ++++++
 tests/integration/cli_test.c  |  56 +++++++++++-
 tests/integration/proc_exec.h |   3 +-
 5 files changed, 241 insertions(+), 9 deletions(-)
 create mode 100644 examples/expression.ol

diff --git a/examples/expression.ol b/examples/expression.ol
new file mode 100644
index 0000000..efa4ab5
--- /dev/null
+++ b/examples/expression.ol
@@ -0,0 +1,3 @@
+fn main(): u32 {
+  return (10 + 1 * 2) - (10 - (1 + 1) / 2)
+}
diff --git a/src/lexer.c b/src/lexer.c
index dd6f11d..23f0326 100644
--- a/src/lexer.c
+++ b/src/lexer.c
@@ -101,6 +101,110 @@ lexer_next_token(lexer_t *lexer, token_t *token)
         }
 
         switch (current_char) {
+            case '=': {
+                size_t start_offset = lexer->offset;
+                lexer_skip_char(lexer);
+
+                if (lexer_current_char(lexer) == '=') {
+                    lexer_skip_char(lexer);
+                    lexer_init_str_value_token(lexer, token, TOKEN_CMP_EQ, start_offset);
+                    return;
+                }
+
+                lexer_init_str_value_token(lexer, token, TOKEN_EQ, start_offset);
+                return;
+            }
+            case '!': {
+                size_t start_offset = lexer->offset;
+                lexer_skip_char(lexer);
+
+                if (lexer_current_char(lexer) == '=') {
+                    lexer_skip_char(lexer);
+                    lexer_init_str_value_token(lexer, token, TOKEN_CMP_NEQ, start_offset);
+                    return;
+                }
+
+                lexer_init_str_value_token(lexer, token, TOKEN_BANG, start_offset);
+                return;
+            }
+            case '&': {
+                size_t start_offset = lexer->offset;
+                lexer_skip_char(lexer);
+
+                if (lexer_current_char(lexer) == '&') {
+                    lexer_skip_char(lexer);
+                    lexer_init_str_value_token(lexer, token, TOKEN_LOGICAL_AND, start_offset);
+                    return;
+                }
+
+                lexer_init_str_value_token(lexer, token, TOKEN_AND, start_offset);
+                return;
+            }
+            case '|': {
+                size_t start_offset = lexer->offset;
+                lexer_skip_char(lexer);
+
+                if (lexer_current_char(lexer) == '|') {
+                    lexer_skip_char(lexer);
+                    lexer_init_str_value_token(lexer, token, TOKEN_LOGICAL_OR, start_offset);
+                    return;
+                }
+
+                lexer_init_str_value_token(lexer, token, TOKEN_PIPE, start_offset);
+                return;
+            }
+            case '<': {
+                size_t start_offset = lexer->offset;
+                lexer_skip_char(lexer);
+
+                switch (lexer_current_char(lexer)) {
+                    case '<': {
+                        lexer_skip_char(lexer);
+                        lexer_init_str_value_token(lexer, token, TOKEN_BITWISE_LSHIFT, start_offset);
+                        return;
+                    }
+                    case '=': {
+                        lexer_skip_char(lexer);
+                        lexer_init_str_value_token(lexer, token, TOKEN_CMP_LEQ, start_offset);
+                        return;
+                    }
+                    default: {
+                        lexer_init_str_value_token(lexer, token, TOKEN_LT, start_offset);
+                        return;
+                    }
+                }
+            }
+            case '>': {
+                size_t start_offset = lexer->offset;
+                lexer_skip_char(lexer);
+
+                switch (lexer_current_char(lexer)) {
+                    case '>': {
+                        lexer_skip_char(lexer);
+                        lexer_init_str_value_token(lexer, token, TOKEN_BITWISE_RSHIFT, start_offset);
+                        return;
+                    }
+                    case '=': {
+                        lexer_skip_char(lexer);
+                        lexer_init_str_value_token(lexer, token, TOKEN_CMP_GEQ, start_offset);
+                        return;
+                    }
+                    default: {
+                        lexer_init_str_value_token(lexer, token, TOKEN_GT, start_offset);
+                        return;
+                    }
+                }
+            }
+            case '^': {
+                lexer_init_char_value_token(lexer, token, TOKEN_CIRCUMFLEX);
+                lexer_skip_char(lexer);
+                return;
+            }
+            case '%': {
+                lexer_init_char_value_token(lexer, token, TOKEN_PERCENT);
+                lexer_skip_char(lexer);
+                return;
+            }
             case '(': {
                 lexer_init_char_value_token(lexer, token, TOKEN_OPAREN);
                 lexer_skip_char(lexer);
@@ -126,6 +230,26 @@ lexer_next_token(lexer_t *lexer, token_t *token)
                 lexer_skip_char(lexer);
                 return;
             }
+            case '+': {
+                lexer_init_char_value_token(lexer, token, TOKEN_PLUS);
+                lexer_skip_char(lexer);
+                return;
+            }
+            case '-': {
+                lexer_init_char_value_token(lexer, token, TOKEN_DASH);
+                lexer_skip_char(lexer);
+                return;
+            }
+            case '*': {
+                lexer_init_char_value_token(lexer, token, TOKEN_STAR);
+                lexer_skip_char(lexer);
+                return;
+            }
+            case '/': {
+                lexer_init_char_value_token(lexer, token, TOKEN_SLASH);
+                lexer_skip_char(lexer);
+                return;
+            }
             case '\n': {
                 lexer_init_char_value_token(lexer, token, TOKEN_LF);
                 lexer_skip_char(lexer);
@@ -146,12 +270,38 @@ lexer_next_token(lexer_t *lexer, token_t *token)
 }
 
 static char *token_kind_str_table[] = {
-    [TOKEN_UNKNOWN] = "unknown", [TOKEN_IDENTIFIER] = "identifier",
-    [TOKEN_NUMBER] = "number", [TOKEN_FN] = "fn",
-    [TOKEN_RETURN] = "return", [TOKEN_LF] = "line_feed",
-    [TOKEN_OPAREN] = "(", [TOKEN_CPAREN] = ")",
-    [TOKEN_COLON] = ":", [TOKEN_OCURLY] = "{",
-    [TOKEN_CCURLY] = "}", [TOKEN_EOF] = "EOF",
+    [TOKEN_UNKNOWN] = "unknown",
+    [TOKEN_IDENTIFIER] = "identifier",
+    [TOKEN_NUMBER] = "number",
+    [TOKEN_FN] = "fn",
+    [TOKEN_RETURN] = "return",
+    [TOKEN_LF] = "line_feed",
+    [TOKEN_OPAREN] = "(",
+    [TOKEN_CPAREN] = ")",
+    [TOKEN_COLON] = ":",
+    [TOKEN_OCURLY] = "{",
+    [TOKEN_CCURLY] = "}",
+    [TOKEN_PLUS] = "+",
+    [TOKEN_DASH] = "-",
+    [TOKEN_STAR] = "*",
+    [TOKEN_SLASH] = "/",
+    [TOKEN_EQ] = "=",
+    [TOKEN_CMP_EQ] = "==",
+    [TOKEN_BANG] = "!",
+    [TOKEN_CMP_NEQ] = "!=",
+    [TOKEN_LT] = "<",
+    [TOKEN_GT] = ">",
+    [TOKEN_CMP_LEQ] = "<=",
+    [TOKEN_CMP_GEQ] = ">=",
+    [TOKEN_PERCENT] = "%",
+    [TOKEN_BITWISE_LSHIFT] = "<<",
+    [TOKEN_BITWISE_RSHIFT] = ">>",
+    [TOKEN_CIRCUMFLEX] = "^",
+    [TOKEN_PIPE] = "|",
+    [TOKEN_LOGICAL_OR] = "||",
+    [TOKEN_AND] = "&",
+    [TOKEN_LOGICAL_AND] = "&&",
+    [TOKEN_EOF] = "EOF",
 };
 
 char *
diff --git a/src/lexer.h b/src/lexer.h
index cb91d7e..5ed777b 100644
--- a/src/lexer.h
+++ b/src/lexer.h
@@ -39,7 +39,33 @@ typedef enum token_kind
     TOKEN_FN,
     TOKEN_RETURN,
 
+    // Equality operators
+    TOKEN_CMP_EQ,
+    TOKEN_CMP_NEQ,
+    TOKEN_CMP_LEQ,
+    TOKEN_CMP_GEQ,
+
+    // Logical Operators
+    TOKEN_LOGICAL_OR,
+    TOKEN_LOGICAL_AND,
+
+    // Bitwise Operators
+    TOKEN_BITWISE_LSHIFT,
+    TOKEN_BITWISE_RSHIFT,
+
     // Single char
+    TOKEN_BANG,
+    TOKEN_GT,
+    TOKEN_LT,
+    TOKEN_PERCENT,
+    TOKEN_AND,
+    TOKEN_PIPE,
+    TOKEN_CIRCUMFLEX,
+    TOKEN_EQ,
+    TOKEN_PLUS,
+    TOKEN_DASH,
+    TOKEN_SLASH,
+    TOKEN_STAR,
     TOKEN_LF,
     TOKEN_OPAREN,
     TOKEN_CPAREN,
diff --git a/tests/integration/cli_test.c b/tests/integration/cli_test.c
index 8cc22f9..d46471b 100644
--- a/tests/integration/cli_test.c
+++ b/tests/integration/cli_test.c
@@ -20,7 +20,7 @@
 #include
 
 static MunitResult
-test_cli_dump_tokens(const MunitParameter params[], void *user_data_or_fixture)
+test_cli_dump_tokens_example_main_exit(const MunitParameter params[], void *user_data_or_fixture)
 {
     cli_result_t compilation_result = cli_runner_compiler_dump_tokens("../../examples/main_exit.ol");
     munit_assert_int(compilation_result.exec.exit_code, ==, 0);
@@ -42,6 +42,47 @@ test_cli_dump_tokens(const MunitParameter params[], void *user_data_or_fixture)
     return MUNIT_OK;
 }
 
+static MunitResult
+test_cli_dump_tokens_example_expression(const MunitParameter params[], void *user_data_or_fixture)
+{
+    cli_result_t compilation_result = cli_runner_compiler_dump_tokens("../../examples/expression.ol");
+    munit_assert_int(compilation_result.exec.exit_code, ==, 0);
+    munit_assert_string_equal(compilation_result.exec.stdout_buf,
+                              "../../examples/expression.ol:1:1: <fn>\n"
+                              "../../examples/expression.ol:1:4: <identifier>\n"
+                              "../../examples/expression.ol:1:8: <(>\n"
+                              "../../examples/expression.ol:1:9: <)>\n"
+                              "../../examples/expression.ol:1:10: <:>\n"
+                              "../../examples/expression.ol:1:12: <identifier>\n"
+                              "../../examples/expression.ol:1:16: <{>\n"
+                              "../../examples/expression.ol:1:17: <line_feed>\n"
+                              "../../examples/expression.ol:2:3: <return>\n"
+                              "../../examples/expression.ol:2:10: <(>\n"
+                              "../../examples/expression.ol:2:11: <number>\n"
+                              "../../examples/expression.ol:2:14: <+>\n"
+                              "../../examples/expression.ol:2:16: <number>\n"
+                              "../../examples/expression.ol:2:18: <*>\n"
+                              "../../examples/expression.ol:2:20: <number>\n"
+                              "../../examples/expression.ol:2:21: <)>\n"
+                              "../../examples/expression.ol:2:23: <->\n"
+                              "../../examples/expression.ol:2:25: <(>\n"
+                              "../../examples/expression.ol:2:26: <number>\n"
+                              "../../examples/expression.ol:2:29: <->\n"
+                              "../../examples/expression.ol:2:31: <(>\n"
+                              "../../examples/expression.ol:2:32: <number>\n"
+                              "../../examples/expression.ol:2:34: <+>\n"
+                              "../../examples/expression.ol:2:36: <number>\n"
+                              "../../examples/expression.ol:2:37: <)>\n"
+                              "../../examples/expression.ol:2:39: </>\n"
+                              "../../examples/expression.ol:2:41: <number>\n"
+                              "../../examples/expression.ol:2:42: <)>\n"
+                              "../../examples/expression.ol:2:43: <line_feed>\n"
+                              "../../examples/expression.ol:3:1: <}>\n"
+                              "../../examples/expression.ol:3:2: <line_feed>\n"
+                              "../../examples/expression.ol:4:1: <EOF>\n");
+    return MUNIT_OK;
+}
+
 static MunitResult
 test_cli_compile_minimal_program(const MunitParameter params[], void *user_data_or_fixture)
 {
@@ -62,7 +103,18 @@ test_cli_compile_minimal_program(const MunitParameter params[], void *user_data_
 }
 
 static MunitTest tests[] = {
-    { "/test_cli_dump_tokens", test_cli_dump_tokens, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
+    { "/test_cli_dump_tokens_example_main_exit",
+      test_cli_dump_tokens_example_main_exit,
+      NULL,
+      NULL,
+      MUNIT_TEST_OPTION_NONE,
+      NULL },
+    { "/test_cli_dump_tokens_example_expression",
+      test_cli_dump_tokens_example_expression,
+      NULL,
+      NULL,
+      MUNIT_TEST_OPTION_NONE,
+      NULL },
     { "/test_cli_compile_minimal_program", test_cli_compile_minimal_program, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL },
     { NULL, NULL, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL }
 };
diff --git a/tests/integration/proc_exec.h b/tests/integration/proc_exec.h
index 135aa6a..45c2977 100644
--- a/tests/integration/proc_exec.h
+++ b/tests/integration/proc_exec.h
@@ -21,7 +21,8 @@
 typedef struct proc_exec_result
 {
     int exit_code;
-    char stdout_buf[1024];
+    // FIXME: output buffer shouldn't be fixed size
+    char stdout_buf[2048];
 } proc_exec_result_t;
 
 typedef struct proc_exec_command
-- 
2.44.0