From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 19 Feb 2024 00:30:36 -0300
To: "Johnny Richard", <~johnnyrichard/olang-devel@lists.sr.ht>
Subject: Re: [PATCH olang v3 2/2] lexer: create --dump-tokens cli command
From: "Carlos Maniero"
References: <20240219013843.15707-1-johnny@johnnyrichard.com> <20240219013843.15707-4-johnny@johnnyrichard.com>
In-Reply-To:
<20240219013843.15707-4-johnny@johnnyrichard.com>

Nice work man! I just have a few comments:

> +    while (token.kind != TOKEN_EOF) {
> +        printf("%s:%lu:%lu: <%s>\n",
> +               opts.file_path,
> +               token.location.row + 1,
> +               (token.location.offset - token.location.bol) + 1,
> +               token_kind_to_cstr(token.kind));
> +        lexer_next_token(&lexer, &token);
> +    }

IMO, the EOF token should be printed too, as it is a token returned by
the lexer.

> +    if (lexer_is_eof(lexer)) {
> +        *token = (token_t){ .kind = TOKEN_EOF };
> +        return;
> +    }

Missing token location. I know it seems silly to have the EOF position,
but it is useful for parser error messages such as "expected } found
EOF". Remember that this code appears twice, before and after the
while.

> +lexer_next_char(lexer_t *lexer)

s/lexer_next_char/lexer_current_char

The current name of the function gives me the impression that it
changes the offset.

> +    if (lexer->source.chars[lexer->offset] == '\n') {

Call lexer_next_char/lexer_current_char here instead of indexing the
buffer directly.

> +static bool
> +_isspace(char c)
> +{
> +    return c == ' ' || c == '\f' || c == '\r' || c == '\t' || c == '\v';
> +}

What do you think about just adding the *\n* guard before calling
*isspace*? That way it is clear to someone reading the code why you
have to reimplement the function.

    return c != '\n' && isspace(c);

> +static void
> +lexer_init_char_token(lexer_t *lexer, token_t *token, token_kind_t kind);
> +
> +static void
> +lexer_init_str_token(lexer_t *lexer, token_t *token, token_kind_t kind, size_t start_offset);
> +
> +static token_kind_t
> +lexer_str_to_token_kind(string_view_t text);

I don't have a concrete suggestion here, but IMO the names
*lexer_init_char_token* and *lexer_init_str_token* give the impression
that we are initializing a "string" token and a 'char' token. I don't
have a better name; I thought of calling them
*lexer_init_single_char_token* and *lexer_init_multi_char_token*, but
IDK if that is really better.
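
To make the EOF and isspace comments above concrete, here is a rough,
self-contained sketch. Every name in it (the sk_ prefix, the types, the
three token kinds) is made up for illustration and is not the code from
the patch; it only assumes a lexer that tracks offset, row and bol the
way the quoted printf suggests. It shows an EOF token that carries its
location, a whitespace check that delegates to isspace behind a '\n'
guard, and a dump loop that also prints the EOF token.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified types; NOT the ones from the patch. */
typedef enum { SK_TOKEN_WORD, SK_TOKEN_LF, SK_TOKEN_EOF } sk_kind_t;

typedef struct {
    size_t offset; /* absolute offset into the source */
    size_t row;    /* zero-based line index */
    size_t bol;    /* offset of the beginning of the current line */
} sk_location_t;

typedef struct {
    sk_kind_t kind;
    sk_location_t location;
} sk_token_t;

typedef struct {
    const char *chars;
    size_t size;
    size_t offset, row, bol;
} sk_lexer_t;

/* Same as isspace(), except '\n' is kept because it is a token here. */
bool
sk_is_skippable(char c)
{
    return c != '\n' && isspace((unsigned char)c);
}

void
sk_next_token(sk_lexer_t *lexer, sk_token_t *token)
{
    assert(lexer != NULL && token != NULL);

    while (lexer->offset < lexer->size && sk_is_skippable(lexer->chars[lexer->offset])) {
        lexer->offset++;
    }

    /* Every token, EOF included, records where it starts. */
    token->location = (sk_location_t){ lexer->offset, lexer->row, lexer->bol };

    if (lexer->offset >= lexer->size) {
        token->kind = SK_TOKEN_EOF;
        return;
    }

    if (lexer->chars[lexer->offset] == '\n') {
        token->kind = SK_TOKEN_LF;
        lexer->offset++;
        lexer->row++;
        lexer->bol = lexer->offset;
        return;
    }

    /* Anything else is treated as a run of non-space characters. */
    token->kind = SK_TOKEN_WORD;
    while (lexer->offset < lexer->size && !isspace((unsigned char)lexer->chars[lexer->offset])) {
        lexer->offset++;
    }
}

/* Prints every token, the final EOF included; returns how many were printed. */
size_t
sk_dump_tokens(const char *path, const char *source, FILE *out)
{
    static const char *names[] = { "word", "line_feed", "EOF" };
    sk_lexer_t lexer = { .chars = source, .size = strlen(source) };
    sk_token_t token;
    size_t count = 0;

    do {
        sk_next_token(&lexer, &token);
        fprintf(out, "%s:%zu:%zu: <%s>\n", path,
                token.location.row + 1,
                (token.location.offset - token.location.bol) + 1,
                names[token.kind]);
        count++;
    } while (token.kind != SK_TOKEN_EOF);

    return count;
}
```

The do/while is just one way to get the EOF line printed without
repeating the printf after the loop. Setting the location before the
EOF check means there is a single place that fills it in, so the parser
could point at the end of the file in an "expected } found EOF" style
message.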