From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp1.migadu.com ([2001:41d0:403:4876::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms5.migadu.com with LMTPS id wI8tBZLw92X2XwAA62LTzQ:P1 (envelope-from ) for ; Mon, 18 Mar 2024 08:43:14 +0100 Received: from aspmx1.migadu.com ([2001:41d0:403:4876::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp1.migadu.com with LMTPS id wI8tBZLw92X2XwAA62LTzQ (envelope-from ) for ; Mon, 18 Mar 2024 08:43:14 +0100 X-Envelope-To: patches@johnnyrichard.com Authentication-Results: aspmx1.migadu.com; none Received: from mail-a.sr.ht (mail-a.sr.ht [46.23.81.152]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 05D5B7CA6E for ; Mon, 18 Mar 2024 08:43:14 +0100 (CET) DKIM-Signature: a=rsa-sha256; bh=Yew43PNXzQMH0z8KT69FeTgodxCurCJi7TdZ0wenu1g=; c=simple/simple; d=lists.sr.ht; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-Unsubscribe:List-Subscribe:List-Archive:List-Post:List-ID; q=dns/txt; s=20240113; t=1710747793; v=1; b=eJDJo9swRe0Kmeo1igacqX2lnfZnFNt7OU95q3NVTx83quw8NZurjKAI0b06jNxtwDsK5kCa Ey7X87eWx8xROcS66Zx9b/DASIvZ1093xnyTDQQfhO5IOuw3if2/OGIBMhejMejxhBaV6JKjVJP 8lma9+bVigA4r+/HIZrFjrmuHWp5QC1Ov6e6nF5r0CSp1zD++2ayID3iPjWvnX8Tbzxb/L6lua/ pviem6XscwVp23MGVshab8G6kOk5lOezpQDozGP6FEV8SUfaIRZeY/9S2gvf2VA4tsdwieEBMmS ylIZ5nr4ghxD8a5AFrJhaHFOjfbq17BBN3AWLjQ6bzXVA== Received: from lists.sr.ht (unknown [46.23.81.154]) by mail-a.sr.ht (Postfix) with ESMTPSA id C740720102 for ; Mon, 18 Mar 2024 07:43:13 +0000 (UTC) Received: from out-186.mta0.migadu.com (out-186.mta0.migadu.com [91.218.175.186]) by mail-a.sr.ht (Postfix) with ESMTPS id 0DC5F200F1 for <~johnnyrichard/olang-devel@lists.sr.ht>; Mon, 18 Mar 2024 07:43:13 +0000 (UTC) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=johnnyrichard.com; s=key1; t=1710747792; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YeuUpm9ouEdH7l6AJzS3sQBiH6wEtPVijgJJXfCLCk8=; b=qW8hKL0MYt5kSgKhE4tN+39VqUWaYxtSQcFGAmzj1OgjnqhY/6Z49a31EfUFyh/5e+vbDw doXn10rLRt1G4DaWpBqKwEUOfHthWgvf+fVTu7U53pEKmjU/tFmZ4IhHy/l1j+o2iOUgck IU2r2u7n8ifQlQq9Do7b7ht9iA5gjJkTL1mueQh4FCka1iz1WUUMj0uqpTMh2J+jZjSd4T TLfvyYmtsyuH+R2GdB6Ij+S0EJN0mhYXopGmS6wEwHDVTc+aph2+cXvdzjo1IeJ6G6GRTQ qf1D+2WghroeeJYW6kTdFciUo8Fm3TmioMv3ljhxtp6G1uEL9qZpnb0nSktieg== From: Johnny Richard To: ~johnnyrichard/olang-devel@lists.sr.ht Cc: Johnny Richard Subject: [PATCH olang v3 3/3] parser: add all binary operation expressions Date: Mon, 18 Mar 2024 09:39:53 +0100 Message-ID: <20240318084254.142417-4-johnny@johnnyrichard.com> In-Reply-To: <20240318084254.142417-1-johnny@johnnyrichard.com> References: <20240318084254.142417-1-johnny@johnnyrichard.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Sourcehut-Patchset-Status: PROPOSED List-Unsubscribe: List-Subscribe: List-Archive: Archived-At: List-Post: List-ID: ~johnnyrichard/olang-devel <~johnnyrichard/olang-devel.lists.sr.ht> Sender: ~johnnyrichard/olang-devel <~johnnyrichard/olang-devel@lists.sr.ht> X-Migadu-Flow: FLOW_IN X-Migadu-Country: NL X-Migadu-Scanner: mx10.migadu.com X-Migadu-Spam-Score: -4.00 X-Spam-Score: -4.00 X-Migadu-Queue-Id: 05D5B7CA6E X-TUID: NoRxy2exTpZm This implementation uses precedence climbing method to guarantee precedence. Now is pretty difficult to visualize the ast tree since we don't have a ast pretty printer. I have validated it by running **gdb**. Testing the ast with unit tests is quite annoying without a good "DSL". We can design such DSL in the near future. =^) Signed-off-by: Johnny Richard --- v2: Replace the previous solution with precedence based on function calls to climbing method. The solution is more extensible and smaller since it already maps every possible binary operation at this very moment. src/lexer.c | 28 +++++++ src/lexer.h | 3 + src/parser.c | 209 +++++++++++++++++++++++++++++++++++++++++++++++---- 3 files changed, 227 insertions(+), 13 deletions(-) diff --git a/src/lexer.c b/src/lexer.c index 23f0326..801e4d0 100644 --- a/src/lexer.c +++ b/src/lexer.c @@ -311,6 +311,34 @@ token_kind_to_cstr(token_kind_t kind) return token_kind_str_table[kind]; } +bool +token_kind_is_binary_op(token_kind_t kind) +{ + switch (kind) { + case TOKEN_PLUS: + case TOKEN_DASH: + case TOKEN_SLASH: + case TOKEN_STAR: + case TOKEN_PERCENT: + case TOKEN_BITWISE_LSHIFT: + case TOKEN_BITWISE_RSHIFT: + case TOKEN_LT: + case TOKEN_CMP_LEQ: + case TOKEN_GT: + case TOKEN_CMP_GEQ: + case TOKEN_CMP_EQ: + case TOKEN_CMP_NEQ: + case TOKEN_AND: + case TOKEN_CIRCUMFLEX: + case TOKEN_PIPE: + case TOKEN_LOGICAL_AND: + case TOKEN_LOGICAL_OR: + return true; + default: + return false; + } +} + static char lexer_current_char(lexer_t *lexer) { diff --git a/src/lexer.h b/src/lexer.h index 5ed777b..80fb25a 100644 --- a/src/lexer.h +++ b/src/lexer.h @@ -104,6 +104,9 @@ lexer_lookahead(lexer_t *lexer, token_t *token, size_t n); char * token_kind_to_cstr(token_kind_t kind); +bool +token_kind_is_binary_op(token_kind_t kind); + string_view_t lexer_get_token_line(lexer_t *lexer, token_t *token); diff --git a/src/parser.c b/src/parser.c index 5b7d39a..76ef91a 100644 --- a/src/parser.c +++ b/src/parser.c @@ -39,6 +39,12 @@ parser_parse_block(parser_t *parser); ast_node_t * parser_parse_fn_definition(parser_t *parser); +static ast_node_t * +parser_parse_expr(parser_t *parser); + +static ast_node_t * +parser_parse_factor(parser_t *parser); + static void skip_line_feeds(lexer_t *lexer); @@ -57,10 +63,189 @@ ast_node_t * parser_parse_program(parser_t *parser) { ast_node_t *fn = parser_parse_fn_definition(parser); + if (fn == NULL) { + return NULL; + } return ast_new_program(parser->arena, fn); } +static ast_binary_op_kind_t +token_kind_to_binary_op_kind(token_kind_t kind) +{ + switch (kind) { + case TOKEN_PLUS: + return AST_BINOP_ADDITION; + case TOKEN_DASH: + return AST_BINOP_SUBTRACTION; + case TOKEN_SLASH: + return AST_BINOP_DIVISION; + case TOKEN_STAR: + return AST_BINOP_MULTIPLICATION; + case TOKEN_PERCENT: + return AST_BINOP_REMINDER; + case TOKEN_BITWISE_LSHIFT: + return AST_BINOP_BITWISE_LSHIFT; + case TOKEN_BITWISE_RSHIFT: + return AST_BINOP_BITWISE_RSHIFT; + case TOKEN_CIRCUMFLEX: + return AST_BINOP_BITWISE_XOR; + case TOKEN_AND: + return AST_BINOP_BITWISE_AND; + case TOKEN_PIPE: + return AST_BINOP_BITWISE_OR; + case TOKEN_LT: + return AST_BINOP_CMP_LT; + case TOKEN_GT: + return AST_BINOP_CMP_GT; + case TOKEN_CMP_LEQ: + return AST_BINOP_CMP_LEQ; + case TOKEN_CMP_GEQ: + return AST_BINOP_CMP_GEQ; + case TOKEN_CMP_EQ: + return AST_BINOP_CMP_EQ; + case TOKEN_CMP_NEQ: + return AST_BINOP_CMP_NEQ; + case TOKEN_LOGICAL_AND: + return AST_BINOP_LOGICAL_AND; + case TOKEN_LOGICAL_OR: + return AST_BINOP_LOGICAL_OR; + default: { + fprintf(stderr, "error: token kind (%s) not compatible with binary op kind\n", token_kind_to_cstr(kind)); + assert(false); + } + } +} + +typedef enum +{ + BINOP_MIN_PREC, + BINOP_LOGICAL_OR_PREC, + BINOP_LOGICAL_AND_PREC, + BINOP_BITWISE_OR_PREC, + BINOP_BITWISE_XOR_PREC, + BINOP_BITWISE_AND_PREC, + BINOP_CMP_EQ_AND_NEQ_PREC, + BINOP_CMP_LT_AND_GT_PREC, + BINOP_BITWISE_SHIFT_PREC, + BINOP_ADDITIVE_PREC, + BINOP_MULTIPLICATIVE_PREC, +} binary_op_precedence_t; + +static binary_op_precedence_t +get_binary_op_precedence(token_kind_t kind) +{ + switch (kind) { + case TOKEN_PLUS: + case TOKEN_DASH: + return BINOP_ADDITIVE_PREC; + case TOKEN_SLASH: + case TOKEN_STAR: + case TOKEN_PERCENT: + return BINOP_MULTIPLICATIVE_PREC; + case TOKEN_BITWISE_LSHIFT: + case TOKEN_BITWISE_RSHIFT: + return BINOP_BITWISE_SHIFT_PREC; + case TOKEN_LT: + case TOKEN_GT: + return BINOP_CMP_LT_AND_GT_PREC; + case TOKEN_EQ: + case TOKEN_CMP_NEQ: + return BINOP_CMP_EQ_AND_NEQ_PREC; + case TOKEN_AND: + return BINOP_BITWISE_AND_PREC; + case TOKEN_CIRCUMFLEX: + return BINOP_BITWISE_XOR_PREC; + case TOKEN_PIPE: + return BINOP_BITWISE_OR_PREC; + case TOKEN_LOGICAL_AND: + return BINOP_LOGICAL_AND_PREC; + case TOKEN_LOGICAL_OR: + return BINOP_LOGICAL_OR_PREC; + default: + assert(false); + } +} + +static ast_node_t * +parser_parse_expr_1(parser_t *parser, ast_node_t *lhs, size_t prev_precedence) +{ + ast_node_t *expr = NULL; + + token_t lookahead_token; + lexer_peek_next(parser->lexer, &lookahead_token); + + while (token_kind_is_binary_op(lookahead_token.kind) && + get_binary_op_precedence(lookahead_token.kind) >= prev_precedence) { + token_t token_op; + lexer_next_token(parser->lexer, &token_op); + + ast_node_t *rhs = parser_parse_factor(parser); + if (rhs == NULL) { + return NULL; + } + + lexer_peek_next(parser->lexer, &lookahead_token); + + while (token_kind_is_binary_op(lookahead_token.kind) && + get_binary_op_precedence(lookahead_token.kind) > get_binary_op_precedence(token_op.kind)) { + rhs = parser_parse_expr_1(parser, rhs, get_binary_op_precedence(token_op.kind)); + lexer_peek_next(parser->lexer, &lookahead_token); + } + + expr = ast_new_node_bin_op(parser->arena, token_kind_to_binary_op_kind(token_op.kind), lhs, rhs); + if (expr == NULL) { + return NULL; + } + } + + if (expr == NULL) { + return lhs; + } + + return expr; +} + +static ast_node_t * +parser_parse_expr(parser_t *parser) +{ + ast_node_t *lhs = parser_parse_factor(parser); + if (lhs == NULL) { + return NULL; + } + + return parser_parse_expr_1(parser, lhs, BINOP_MIN_PREC); +} + +static ast_node_t * +parser_parse_factor(parser_t *parser) +{ + token_t token; + lexer_next_token(parser->lexer, &token); + + switch (token.kind) { + case TOKEN_NUMBER: + return ast_new_node_literal_u32(parser->arena, string_view_to_u32(token.value)); + + case TOKEN_OPAREN: { + ast_node_t *expr = parser_parse_expr(parser); + if (expr == NULL) { + return NULL; + } + + if (!skip_expected_token(parser, TOKEN_CPAREN)) { + return NULL; + } + + return expr; + } + default: { + fprintf(stderr, "error: parse_factor: unsupported or invalid token (%s)\n", token_kind_to_cstr(token.kind)); + assert(false); + } + } +} + ast_node_t * parser_parse_fn_definition(parser_t *parser) { @@ -131,41 +316,39 @@ parser_parse_type(parser_t *parser, type_t *type) static ast_node_t * parser_parse_block(parser_t *parser) { - token_t number_token; if (!skip_expected_token(parser, TOKEN_OCURLY)) { - return false; + return NULL; } skip_line_feeds(parser->lexer); ast_node_t *node_block = ast_new_node_block(parser->arena); + if (node_block == NULL) { + return NULL; + } if (!skip_expected_token(parser, TOKEN_RETURN)) { - return false; + return NULL; } ast_node_t *node_return_stmt = ast_new_node_return_stmt(parser->arena); assert(node_return_stmt); - if (!expected_token(parser, &number_token, TOKEN_NUMBER)) { - return false; + ast_node_t *expr = parser_parse_expr(parser); + if (expr == NULL) { + return NULL; } - ast_node_t *literal_node = ast_new_node_literal_u32(parser->arena, string_view_to_u32(number_token.value)); - assert(literal_node); - - node_return_stmt->data.as_return_stmt.data = literal_node; + node_return_stmt->data.as_return_stmt.data = expr; list_append(node_block->data.as_block.nodes, node_return_stmt); - if (!skip_expected_token(parser, TOKEN_LF)) { - return false; + return NULL; } skip_line_feeds(parser->lexer); - if (!skip_expected_token(parser, TOKEN_CCURLY)) { - return false; + return NULL; } return node_block; -- 2.44.0