Message-ID: <88cb1a82-809e-4db5-95cd-2bbe828d0166@gmail.com>
Date: Thu, 14 Mar 2024 01:29:09 -0300
To: ~johnnyrichard/olang-devel@lists.sr.ht
Cc: "builds.sr.ht"
Subject: Re: [olang/patches/.build.yml] build success
From: Ricardo Kagawa

>> This grammar adds the token SEMICOLON (';') for every statement. I know we
>> agreed make it optional, but the SEMICOLON makes the parser much more
>> convenient to implement.
>>
>> And this is the first topic I would like to discuss. Let me know if you
>> agree otherwise I can adapt the grammar to make SEMICOLON optional.
>
> (...) Therefore, I'm curious about your statement that using a
> semicolon makes the parser much more convenient to implement. Could you
> elaborate on this? Have you encountered any new considerations that might
> complicate the implementation?

My limited understanding is that the semicolon would indeed be more
convenient, as it would be a definitive end-of-statement symbol, requiring
no lookahead to resolve as such. The LF token could be ambiguous on its own
(between end-of-statement and white space), so some lookahead would be
required to resolve it.
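To illustrate the lookahead difference, here is a minimal sketch. Everything
in it is an assumption made for the example: the token stream is a plain
array of strings, `LF` stands for a line feed token, and the
`CONTINUES_STATEMENT` set is a made-up continuation rule, not something the
grammar defines.

```javascript
// With ';', a statement ends as soon as the token is seen: no lookahead.
function splitOnSemicolon(tokens) {
  const statements = [];
  let current = [];
  for (const tok of tokens) {
    if (tok === 'LF') continue; // LF is always white space in this mode
    if (tok === ';') {
      statements.push(current); // definitive end-of-statement
      current = [];
    } else {
      current.push(tok);
    }
  }
  if (current.length > 0) statements.push(current);
  return statements;
}

// Hypothetical rule: these tokens continue the previous statement when they
// appear right after a line feed. Invented for this sketch only.
const CONTINUES_STATEMENT = new Set(['(', '+', '-', '.']);

// With optional semicolons, LF is ambiguous, so we peek at the next token.
function splitOnLineFeed(tokens) {
  const statements = [];
  let current = [];
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    if (tok !== 'LF') {
      current.push(tok);
      continue;
    }
    // Lookahead: skip consecutive LF tokens to find the next real token.
    let j = i + 1;
    while (j < tokens.length && tokens[j] === 'LF') j++;
    const next = tokens[j];
    const isTerminator = next === undefined || !CONTINUES_STATEMENT.has(next);
    if (isTerminator && current.length > 0) {
      statements.push(current);
      current = [];
    }
  }
  if (current.length > 0) statements.push(current);
  return statements;
}
```

With this made-up rule, `splitOnLineFeed(['example', 'LF', '(', 'x', ')'])`
yields a single statement, because the line feed is resolved as white space
only after looking at the `(` that follows it.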
But it should be alright as long as the language remains context-free, even
if it becomes ambiguous, non-deterministic, or requires a long lookahead.
Ideally it should be deterministic for linear time performance, but it seems
there are parsers that can run close to it in the average case, as long as
the language remains close to deterministic.

And I don't have a strong opinion on the semicolon issue, except that it
must be an option. But whatever we do, we must avoid the following pitfall
from JavaScript:

```javascript
example
;(x)
```

The semicolon is mandatory here, because otherwise `(x)` is handled as an
argument list, and `example` would be called as a function. That is, it
would be a multi-line statement, instead of two separate statements.

And why would anyone do this?

```javascript
const x = y.example
;(() => {
  console.log(x)
})()
```

Immediately invoked function expressions are a thing in JavaScript, and it
would not be uncommon to have some expression ending with an identifier
right before them.

>> The grammar was made by using a EBNF evaluator tool[1].
>>
>> [1]: https://mdkrajnak.github.io/ebnftest/
>
> I would add this link at the markdown, so then people can play with it.

I would make an even stronger argument for including the link in the docs.
A good language specification also states which grammar notation is used
for the specification itself. And EBNF in particular is not properly
standardized, so you really need to specify which EBNF variant you are
using.

The link should thus be good enough to refer to the EBNF implementation used
in this specification, although a permanent (version locked) link would be
better.

----

As for my revision of the grammar:

- Separated rules into sections.
- Added optional white space around the program.
- You don't actually need non-terminal symbols for keywords. Especially if
  you are including the keyword in the symbol name.
- You don't need non-terminal symbols for symbols either, unless you have a
  more "semantic" name for them. There should not be another "semicolon"
  besides `;`, for example.
- In Johnny's version the function name is a single identifier. I don't know
  why Carlos's version made it multiple. I have made it single again.
- In Johnny's version the space before the return type is optional. I don't
  know why Carlos's version made it mandatory. I have made it optional
  again.
- Replaced `` in `` with `` to express that this identifier is the name of
  the declared function. Then, `` is just ``.
- Renamed `` to ``, since parameters are the variables in a function
  declaration, while arguments are the values bound to those variables
  during function calls.
- Replaced `` with `` in `` to express that this type identifier is the
  return type of the function. Then, `` is just ``.
- Replaced `` in `` with `` to express that this block is the body of the
  declared function.
- Reworked ``, `` and `` to allow for:
  - Single statement followed by optional end-of-statement;
  - Statement list with mandatory end-of-statement between statements;
  - But the statements could be made optional, yet I did not in this
    version, as there is no `void` return type, currently.
- Replaced `` in `` with `` to prepare for them in the future. The only
  allowed expression is still an integer literal, though.
- Renamed `` to ``, and reworked it to actually represent decimal integer
  literals. Leading zeros are now forbidden, but a lone zero digit is still
  allowed.
- Reworked `` to better express that it starts with `` or underscore,
  followed by zero or more ``, `` or underscore.
- Removed `_` from `` to better reflect the name (as the underscore is not
  an alphabetic character).
- Renamed `` to `` to avoid ambiguity with the character U+0020 Space, and
  made it a one-or-more list. Also introduced `` for "optional white space".
  Shorter names were preferred here because these symbols in particular are
  used very frequently.
- Also introduced `` as either LF, CR or CRLF. Otherwise the CRLF sequence
  would be parsed as two separate line breaks. Not that it would matter that
  much, except maybe for mapping line numbers.

```
(* Entry Point *)
 ::=

(* Functions *)
 ::= 'fn' ':'
 ::=
 ::= '(' ')'
 ::=
 ::=

(* Statements *)
 ::= '{' ( )* ? '}'
 ::= ';' |
 ::=
 ::= 'return'

(* Expressions *)
 ::=

(* Identifiers *)
 ::= 'u32'
 ::= ( | '_') ( | | '_')*

(* Literals *)
 ::=
 ::= #'[1-9]' * | '0'

(* Utilities *)
 ::= +
 ::= *
 ::= |
 ::= '\n' | '\r' | '\r\n'
 ::= #'[ \t]'
 ::= #'[a-zA-Z]'
 ::= #'[0-9]'
```

Further discussion:

- Is the language going to support Unicode? If so, `` could use the
  _L:Letter_ Unicode category instead of being limited to `[a-zA-Z]`. But
  the EBNF tool does not support Unicode categories in its regular
  expressions (it does not support flags). Also don't forget to rename it to
  `` in that case.
  - It would help developers in non-English speaking countries, but it could
    be difficult to work with multi-byte characters and Unicode
    normalization.
- There are more linear space and line break characters than the ones
  included here, even within ASCII, although they are not all that
  important. There are even more in Unicode (some under _Cc:Other/control_,
  others under _Z:Separator_). Should we support them?
- The function definition could accept a single expression as an alternative
  to its ``, similar to Kotlin.
- The integer literal could include optional underscore separators for
  readability. We just need to be careful not to start with an underscore,
  to avoid ambiguity with identifiers.
- I guess we don't have to support the full set of Unicode digits, since we
  don't know if these digits would even be decimal in the first place. The
  numbering system could be very different from our own, so it is likely not
  feasible to support them.
- I have not checked if this syntax would avoid that edge case with
  JavaScript I mentioned in the beginning. I might check that next time (I'm
  still not sure how).
- It might seem strange that I included semantic non-terminals here, despite
  having removed non-terminals for symbols and keywords. I can't say for
  sure, since this is my first time trying this style, but I suspect that
  besides making the language specification easier to understand, the
  important bits to hook into in the parser will be around these symbols.
  That is, it could simplify some work on the parser.
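
For reference, the lexical rules discussed in this message (decimal integer
literal, identifier, and line break) can be approximated with regular
expressions. This is only a sketch of my reading of those rules: the
constant names and the `countLines` helper are hypothetical, and the
patterns are restricted to ASCII like the current grammar.

```javascript
// Decimal integer literal: no leading zeros, but a lone '0' is allowed.
const INTEGER = /^(?:[1-9][0-9]*|0)$/;

// Identifier: a letter or underscore, then letters, digits or underscores.
const IDENTIFIER = /^[a-zA-Z_][a-zA-Z0-9_]*$/;

// Line break: CRLF is tried first so it counts as a single line break.
const LINE_BREAK = /\r\n|\r|\n/g;

// Hypothetical helper: number of lines in a source text, which is where
// treating CRLF as one break makes a visible difference.
function countLines(source) {
  const breaks = source.match(LINE_BREAK);
  return (breaks ? breaks.length : 0) + 1;
}
```

For example, `INTEGER.test('007')` is false while `INTEGER.test('0')` is
true, and `countLines('a\r\nb')` counts two lines rather than three.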