From: Johnny Richard
To: ~johnnyrichard/olang-devel@lists.sr.ht
Cc: Johnny Richard
Subject: [PATCH olang v1] lexer: add lexer cursor abstraction
Date: Fri, 4 Oct 2024 19:51:45 +0200
Message-ID: <20241004175213.36138-1-johnny@johnnyrichard.com>
List-ID: ~johnnyrichard/olang-devel <~johnnyrichard/olang-devel.lists.sr.ht>

In order to simplify navigation and lexer state handling, use a common
cursor structure shared between tokens and the lexer.
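For reference, the idea of the patch can be boiled down to the following standalone sketch. It is not the patch itself: `string_view_t` is replaced by a bare pointer/length pair so the snippet is self-contained, but the cursor fields and the `lexer_skip_char()` logic mirror the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Cursor shared by the lexer and tokens: offset/row/bol travel
 * together, so saving and restoring lexer state becomes a single
 * struct assignment instead of three field copies. */
typedef struct lexer_cursor
{
    size_t offset; /* index of the current character in the source */
    size_t row;    /* zero-based line number */
    size_t bol;    /* offset of the beginning of the current line */
} lexer_cursor_t;

typedef struct lexer
{
    const char *chars; /* stand-in for string_view_t source */
    size_t size;
    lexer_cursor_t cur;
} lexer_t;

/* Mirrors lexer_skip_char() from the patch: advance one character and
 * keep the line bookkeeping inside the cursor. */
static void
lexer_skip_char(lexer_t *lexer)
{
    assert(lexer->cur.offset < lexer->size);
    if (lexer->chars[lexer->cur.offset] == '\n') {
        lexer->cur.row++;
        lexer->cur.bol = ++lexer->cur.offset;
    } else {
        lexer->cur.offset++;
    }
}
```

With this shape, lookahead reduces to `lexer_cursor_t saved = lexer->cur; ... lexer->cur = saved;`, which is exactly the simplification `lexer_lookahead()` gains in the diff.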
Signed-off-by: Johnny Richard
---
 src/lexer.c  | 96 ++++++++++++++++++++++++----------------------
 src/lexer.h  | 18 +++++-----
 src/main.c   |  4 +--
 src/parser.c |  6 ++--
 4 files changed, 58 insertions(+), 66 deletions(-)

diff --git a/src/lexer.c b/src/lexer.c
index 6fe0151..8de40a0 100644
--- a/src/lexer.c
+++ b/src/lexer.c
@@ -26,9 +26,9 @@ lexer_init(lexer_t *lexer, string_view_t source)
 {
     assert(lexer);
     lexer->source = source;
-    lexer->offset = 0;
-    lexer->row = 0;
-    lexer->bol = 0;
+    lexer->cur.offset = 0;
+    lexer->cur.row = 0;
+    lexer->cur.bol = 0;
 }
 
 static char
@@ -50,7 +50,7 @@ static void
 lexer_init_char_value_token(lexer_t *lexer, token_t *token, token_kind_t kind);
 
 static void
-lexer_init_str_value_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset);
+lexer_init_str_value_token(lexer_t *lexer, token_t *token, token_kind_t kind, lexer_cursor_t cur);
 
 static void
 lexer_init_eof_token(lexer_t *lexer, token_t *token);
@@ -84,120 +84,121 @@ lexer_next_token(lexer_t *lexer, token_t *token)
     }
 
     if (isalpha(current_char)) {
-        size_t start_offset = lexer->offset;
+        lexer_cursor_t start_cur = lexer->cur;
 
         while (isalnum(current_char) && lexer_is_not_eof(lexer)) {
             lexer_skip_char(lexer);
             current_char = lexer_current_char(lexer);
         }
 
-        string_view_t text = { .chars = lexer->source.chars + start_offset, .size = lexer->offset - start_offset };
+        string_view_t text = { .chars = lexer->source.chars + start_cur.offset,
+                               .size = lexer->cur.offset - start_cur.offset };
 
-        lexer_init_str_value_token(lexer, token, lexer_str_to_token_kind(text), start_offset);
+        lexer_init_str_value_token(lexer, token, lexer_str_to_token_kind(text), start_cur);
         return;
     }
 
     if (isdigit(current_char)) {
-        size_t start_offset = lexer->offset;
+        lexer_cursor_t start_cur = lexer->cur;
 
         while (isdigit(current_char) && lexer_is_not_eof(lexer)) {
             lexer_skip_char(lexer);
             current_char = lexer_current_char(lexer);
         }
 
-        lexer_init_str_value_token(lexer, token, TOKEN_NUMBER, start_offset);
+        lexer_init_str_value_token(lexer, token, TOKEN_NUMBER, start_cur);
         return;
     }
 
     switch (current_char) {
         case '=': {
-            size_t start_offset = lexer->offset;
+            lexer_cursor_t start_cur = lexer->cur;
             lexer_skip_char(lexer);
 
             if (lexer_current_char(lexer) == '=') {
                 lexer_skip_char(lexer);
-                lexer_init_str_value_token(lexer, token, TOKEN_CMP_EQ, start_offset);
+                lexer_init_str_value_token(lexer, token, TOKEN_CMP_EQ, start_cur);
                 return;
             }
 
-            lexer_init_str_value_token(lexer, token, TOKEN_EQ, start_offset);
+            lexer_init_str_value_token(lexer, token, TOKEN_EQ, start_cur);
             return;
         }
         case '!': {
-            size_t start_offset = lexer->offset;
+            lexer_cursor_t start_cur = lexer->cur;
             lexer_skip_char(lexer);
 
             if (lexer_current_char(lexer) == '=') {
                 lexer_skip_char(lexer);
-                lexer_init_str_value_token(lexer, token, TOKEN_CMP_NEQ, start_offset);
+                lexer_init_str_value_token(lexer, token, TOKEN_CMP_NEQ, start_cur);
                 return;
             }
 
-            lexer_init_str_value_token(lexer, token, TOKEN_BANG, start_offset);
+            lexer_init_str_value_token(lexer, token, TOKEN_BANG, start_cur);
             return;
         }
         case '&': {
-            size_t start_offset = lexer->offset;
+            lexer_cursor_t start_cur = lexer->cur;
             lexer_skip_char(lexer);
 
             if (lexer_current_char(lexer) == '&') {
                 lexer_skip_char(lexer);
-                lexer_init_str_value_token(lexer, token, TOKEN_LOGICAL_AND, start_offset);
+                lexer_init_str_value_token(lexer, token, TOKEN_LOGICAL_AND, start_cur);
                 return;
             }
 
-            lexer_init_str_value_token(lexer, token, TOKEN_AND, start_offset);
+            lexer_init_str_value_token(lexer, token, TOKEN_AND, start_cur);
             return;
         }
         case '|': {
-            size_t start_offset = lexer->offset;
+            lexer_cursor_t start_cur = lexer->cur;
             lexer_skip_char(lexer);
 
             if (lexer_current_char(lexer) == '|') {
                 lexer_skip_char(lexer);
-                lexer_init_str_value_token(lexer, token, TOKEN_LOGICAL_OR, start_offset);
+                lexer_init_str_value_token(lexer, token, TOKEN_LOGICAL_OR, start_cur);
                 return;
             }
 
-            lexer_init_str_value_token(lexer, token, TOKEN_PIPE, start_offset);
+            lexer_init_str_value_token(lexer, token, TOKEN_PIPE, start_cur);
             return;
         }
         case '<': {
-            size_t start_offset = lexer->offset;
+            lexer_cursor_t start_cur = lexer->cur;
             lexer_skip_char(lexer);
 
             switch (lexer_current_char(lexer)) {
                 case '<': {
                     lexer_skip_char(lexer);
-                    lexer_init_str_value_token(lexer, token, TOKEN_BITWISE_LSHIFT, start_offset);
+                    lexer_init_str_value_token(lexer, token, TOKEN_BITWISE_LSHIFT, start_cur);
                     return;
                 }
                 case '=': {
                     lexer_skip_char(lexer);
-                    lexer_init_str_value_token(lexer, token, TOKEN_CMP_LEQ, start_offset);
+                    lexer_init_str_value_token(lexer, token, TOKEN_CMP_LEQ, start_cur);
                     return;
                 }
                 default: {
-                    lexer_init_str_value_token(lexer, token, TOKEN_LT, start_offset);
+                    lexer_init_str_value_token(lexer, token, TOKEN_LT, start_cur);
                     return;
                 }
             }
         }
         case '>': {
-            size_t start_offset = lexer->offset;
+            lexer_cursor_t start_cur = lexer->cur;
             lexer_skip_char(lexer);
 
             switch (lexer_current_char(lexer)) {
                 case '>': {
                     lexer_skip_char(lexer);
-                    lexer_init_str_value_token(lexer, token, TOKEN_BITWISE_RSHIFT, start_offset);
+                    lexer_init_str_value_token(lexer, token, TOKEN_BITWISE_RSHIFT, start_cur);
                     return;
                 }
                 case '=': {
                     lexer_skip_char(lexer);
-                    lexer_init_str_value_token(lexer, token, TOKEN_CMP_GEQ, start_offset);
+                    lexer_init_str_value_token(lexer, token, TOKEN_CMP_GEQ, start_cur);
                     return;
                 }
                 default: {
-                    lexer_init_str_value_token(lexer, token, TOKEN_GT, start_offset);
+                    lexer_init_str_value_token(lexer, token, TOKEN_GT, start_cur);
                     return;
                 }
             }
@@ -358,25 +359,25 @@ token_kind_is_binary_op(token_kind_t kind)
 static char
 lexer_current_char(lexer_t *lexer)
 {
-    return lexer->source.chars[lexer->offset];
+    return lexer->source.chars[lexer->cur.offset];
 }
 
 static void
 lexer_skip_char(lexer_t *lexer)
 {
-    assert(lexer->offset < lexer->source.size);
+    assert(lexer->cur.offset < lexer->source.size);
     if (lexer_current_char(lexer) == '\n') {
-        lexer->row++;
-        lexer->bol = ++lexer->offset;
+        lexer->cur.row++;
+        lexer->cur.bol = ++lexer->cur.offset;
     } else {
-        lexer->offset++;
+        lexer->cur.offset++;
     }
 }
 
 static bool
 lexer_is_eof(lexer_t *lexer)
 {
-    return lexer->offset >= lexer->source.size;
+    return lexer->cur.offset >= lexer->source.size;
 }
 
 static bool
@@ -394,25 +395,22 @@ _isspace(char c)
 static void
 lexer_init_char_value_token(lexer_t *lexer, token_t *token, token_kind_t kind)
 {
-    string_view_t str = { .chars = lexer->source.chars + lexer->offset, .size = 1 };
-    token_loc_t location = { .offset = lexer->offset, .row = lexer->row, .bol = lexer->bol };
-    *token = (token_t){ .kind = kind, .value = str, .location = location };
+    string_view_t str = { .chars = lexer->source.chars + lexer->cur.offset, .size = 1 };
+    *token = (token_t){ .kind = kind, .value = str, .cur = lexer->cur };
 }
 
 static void
-lexer_init_str_value_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset)
+lexer_init_str_value_token(lexer_t *lexer, token_t *token, token_kind_t kind, lexer_cursor_t cur)
 {
-    string_view_t str = { .chars = lexer->source.chars + start_offset, .size = lexer->offset - start_offset };
-    token_loc_t location = { .offset = start_offset, .row = lexer->row, .bol = lexer->bol };
-    *token = (token_t){ .kind = kind, .value = str, .location = location };
+    string_view_t str = { .chars = lexer->source.chars + cur.offset, .size = lexer->cur.offset - cur.offset };
+    *token = (token_t){ .kind = kind, .value = str, .cur = cur };
 }
 
 static void
 lexer_init_eof_token(lexer_t *lexer, token_t *token)
 {
     string_view_t str = { 0 };
-    token_loc_t location = { .offset = lexer->offset, .row = lexer->row, .bol = lexer->bol };
-    *token = (token_t){ .kind = TOKEN_EOF, .value = str, .location = location };
+    *token = (token_t){ .kind = TOKEN_EOF, .value = str, .cur = lexer->cur };
 }
 
 static token_kind_t
@@ -450,23 +448,19 @@ lexer_peek_next(lexer_t *lexer, token_t *token)
 void
 lexer_lookahead(lexer_t *lexer, token_t *token, size_t n)
 {
-    size_t previous_offset = lexer->offset;
-    size_t previous_row = lexer->row;
-    size_t previous_bol = lexer->bol;
+    lexer_cursor_t previous_cur = lexer->cur;
 
     for (size_t i = 0; i < n; ++i) {
         lexer_next_token(lexer, token);
     }
 
-    lexer->offset = previous_offset;
-    lexer->row = previous_row;
-    lexer->bol = previous_bol;
+    lexer->cur = previous_cur;
 }
 
 string_view_t
 lexer_get_token_line(lexer_t *lexer, token_t *token)
 {
-    size_t offset = token->location.bol;
+    size_t offset = token->cur.bol;
     string_view_t line = { .chars = lexer->source.chars + offset, .size = 0 };
 
     while ((line.size + offset) < lexer->source.size && line.chars[line.size] != '\n' && line.chars[line.size] != 0) {
diff --git a/src/lexer.h b/src/lexer.h
index 2746e3e..1aecb11 100644
--- a/src/lexer.h
+++ b/src/lexer.h
@@ -21,12 +21,17 @@
 #include 
 #include 
 
-typedef struct lexer
+typedef struct lexer_cursor
 {
-    string_view_t source;
     size_t offset;
     size_t row;
     size_t bol;
+} lexer_cursor_t;
+
+typedef struct lexer
+{
+    string_view_t source;
+    lexer_cursor_t cur;
 } lexer_t;
 
 typedef enum token_kind
@@ -79,18 +84,11 @@ typedef enum token_kind
     TOKEN_EOF
 } token_kind_t;
 
-typedef struct token_loc
-{
-    size_t offset;
-    size_t row;
-    size_t bol;
-} token_loc_t;
-
 typedef struct token
 {
     token_kind_t kind;
     string_view_t value;
-    token_loc_t location;
+    lexer_cursor_t cur;
 } token_t;
 
 void
diff --git a/src/main.c b/src/main.c
index 60b17bf..9d66455 100644
--- a/src/main.c
+++ b/src/main.c
@@ -246,7 +246,7 @@ print_token(char *file_path, token_t *token)
 {
     printf("%s:%lu:%lu: <%s>\n",
            file_path,
-           token->location.row + 1,
-           (token->location.offset - token->location.bol) + 1,
+           token->cur.row + 1,
+           (token->cur.offset - token->cur.bol) + 1,
            token_kind_to_cstr(token->kind));
 }
diff --git a/src/parser.c b/src/parser.c
index a025ed4..26e5465 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -623,14 +623,14 @@ expected_token(parser_t *parser, token_t *token, token_kind_t expected_kind)
     fprintf(stderr,
             "%s:%lu:%lu: error: got '" SV_FMT "' token but expect <%s>\n",
             parser->file_path,
-            token->location.row + 1,
-            (token->location.offset - token->location.bol) + 1,
+            token->cur.row + 1,
+            (token->cur.offset - token->cur.bol) + 1,
             SV_ARG(token->value),
             token_kind_to_cstr(expected_kind));
 
     string_view_t line = lexer_get_token_line(parser->lexer, token);
     fprintf(stderr, "" SV_FMT "\n", SV_ARG(line));
-    fprintf(stderr, "%*s\n", (int)(token->location.offset - token->location.bol + 1), "^");
+    fprintf(stderr, "%*s\n", (int)(token->cur.offset - token->cur.bol + 1), "^");
 
     exit(EXIT_FAILURE);
 }

base-commit: 9a9b1e51387cc60eb2a388713431f659cf4703c9
-- 
2.46.0