public inbox for ~johnnyrichard/olang-devel@lists.sr.ht
From: Ricardo Kagawa <ricardo.kagawa@gmail.com>
To: ~johnnyrichard/olang-devel@lists.sr.ht
Cc: "builds.sr.ht" <builds@sr.ht>
Subject: Re: [olang/patches/.build.yml] build success
Date: Thu, 14 Mar 2024 01:29:09 -0300
Message-ID: <88cb1a82-809e-4db5-95cd-2bbe828d0166@gmail.com>
In-Reply-To: <CZOQWPXD2V2J.1V3XBHT3ZQUED@fra01>

>> This grammar adds the token SEMICOLON (';') for every statement.  I know we
>> agreed make it optional, but the SEMICOLON makes the parser much more
>> convenient to implement.
>>
>> And this is the first topic I would like to discuss. Let me know if you
>> agree otherwise I can adapt the grammar to make SEMICOLON optional.
>
> (...) Therefore, I'm curious about your statement that using a
> semicolon makes the parser much more convenient to implement. Could you
> elaborate on this? Have you encountered any new considerations that might
> complicate the implementation?

My limited understanding is that the semicolon would indeed be more
convenient, as it is a definitive end-of-statement symbol that requires
no lookahead to be recognized as such. A line feed (LF) on its own is
ambiguous (it could be either an end-of-statement or plain white space),
so some lookahead would be required to resolve it.
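To make the lookahead point concrete, here is a minimal C sketch (C being the language of the olang compiler). The helpers `skip_blank` and `ends_statement` are hypothetical, and the rule that a `(` on the next line continues the statement is an assumption borrowed from JavaScript, not part of the proposed grammar:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: skips spaces, tabs and line breaks from pos on. */
static size_t skip_blank(const char *src, size_t pos)
{
    while (src[pos] == ' ' || src[pos] == '\t' ||
           src[pos] == '\n' || src[pos] == '\r')
        pos++;
    return pos;
}

/*
 * Returns true if the character at src[pos] terminates the current statement.
 * A ';' is decided on the spot. A line break is ambiguous, so the function has
 * to peek at the next non-blank character. ASSUMPTION (borrowed from
 * JavaScript, not part of the proposed grammar): a '(' there continues the
 * statement as a call.
 */
static bool ends_statement(const char *src, size_t pos)
{
    if (src[pos] == ';')
        return true;                /* no lookahead needed */

    if (src[pos] == '\n' || src[pos] == '\r') {
        size_t next = skip_blank(src, pos);
        return src[next] != '(';    /* lookahead past the line break */
    }

    return false;                   /* not an end-of-statement candidate */
}
```

For example, `ends_statement("return 0;", 8)` and `ends_statement("return 0\n}", 8)` both return true, while `ends_statement("example\n(x)", 7)` returns false: the continuation rule joins the two lines, which is exactly the JavaScript pitfall discussed below.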



But it should be fine, as long as the language remains context-free,
even if it becomes ambiguous, non-deterministic, or requires a long
lookahead. Ideally it should be deterministic, for linear-time parsing,
but it seems there are parsers that run close to linear time in the
average case, as long as the language remains close to deterministic.

And I don't have a strong opinion on the semicolon issue, except that
it must be an option. But whatever we do, we must avoid the following
pitfall from JavaScript:

```javascript
example
;(x)
```

The semicolon is mandatory here, because otherwise `(x)` is parsed as
an argument list and `example` is called as a function. That is, the two
lines become a single multi-line statement instead of two separate
statements.

And why would anyone do this?

```javascript
const x = y.example
;(() => {
  console.log(x)
})()
```

Immediately invoked function expressions (IIFEs) are a common idiom in
JavaScript, and it is not unusual for the statement right before one to
end with an identifier.

>> The grammar was made by using a EBNF evaluator tool[1].
>>
>> [1]: https://mdkrajnak.github.io/ebnftest/
>
> I would add this link at the markdown, so then people can play with it.

I would make an even stronger argument for including the link in the
docs. A good language specification also states which grammar notation
(metalanguage) the specification itself is written in. EBNF in
particular is not consistently standardized, and tools implement many
different dialects, so you really need to specify which EBNF variant
you are using.

The link should therefore be good enough to identify the EBNF
implementation used in this specification, although a permanent
(version-locked) link would be better.

----

As for my revision of the grammar:

- Separated rules into sections.
- Added optional white space around the program.
- You don't actually need non-terminal symbols for keywords, especially
  if you are including the keyword in the symbol name.
- You don't need non-terminal symbols for punctuation symbols either,
  unless you have a more "semantic" name for them. There should not be
  another "semicolon" besides `;`, for example.
- In Johnny's version the function name is a single identifier. I don't
  know why Carlos's version made it multiple. I have made it single
  again.
- In Johnny's version the space before the return type is optional. I
  don't know why Carlos's version made it mandatory. I have made it
  optional again.
- Replaced `<identifier>` in `<function-definition>` with
  `<function-name>` to express that this identifier is the name of the
  declared function. `<function-name>` is then just `<identifier>`.
- Renamed `<fn-args>` to `<function-parameters>`, since parameters are
  the variables in a function declaration, while arguments are the
  values bound to those variables during function calls.
- Replaced `<type>` with `<return-type>` in `<function-definition>` to
  express that this type identifier is the return type of the function.
  `<return-type>` is then just `<type>`.
- Replaced `<block>` with `<function-body>` in `<function-definition>`
  to express that this block is the body of the declared function.
- Reworked `<block>`, `<statement>` and `<end-of-statement>` to allow
  for:
  - A single statement followed by an optional end-of-statement;
  - A statement list with a mandatory end-of-statement between
    statements;
  - The statements could also be made optional, but I did not do that
    in this version, since there is currently no `void` return type.
- Replaced `<number>` in `<return-statement>` with `<expression>` to
  prepare for more kinds of expressions in the future. The only allowed
  expression is still an integer literal, though.
- Renamed `<number>` to `<integer>`, and reworked it to actually
  represent decimal integer literals. Leading zeros are now forbidden,
  but a lone zero digit is still allowed.
- Reworked `<identifier>` to better express that it starts with
  `<alpha>` or underscore, followed by zero or more `<alpha>`, `<digit>`
  or underscore.
- Removed `_` from `<alpha>` to better reflect the name (as underscore
  is not an alphabetic character).
- Renamed `<space>` to `<ws>` to avoid ambiguity with the character
  U+0020 Space, and made it a one-or-more list. Also introduced `<ows>`
  for "optional white space". Shorter names were preferred here because
  these symbols in particular are used very frequently.
- Also introduced `<line-break>` as either LF, CR or CRLF. Otherwise the
  CRLF sequence would be parsed as two separate line breaks. Not that it
  would matter that much, except maybe for mapping line numbers.

```
(* Entry Point *)
<program>             ::= <ows> <function-definition> <ows>

(* Functions *)
<function-definition> ::= 'fn' <ws> <function-name> <ows> <function-parameters> <ows> ':' <ows> <return-type> <ows> <function-body>
<function-name>       ::= <identifier>
<function-parameters> ::= '(' <ows> ')'
<return-type>         ::= <type>
<function-body>       ::= <block>

(* Statements *)
<block>               ::= '{' <ows> <statement> <ows> (<end-of-statement> <ows> <statement> <ows>)* <end-of-statement>? <ows> '}'
<end-of-statement>    ::= ';' | <line-break>
<statement>           ::= <return-statement>
<return-statement>    ::= 'return' <ws> <expression>

(* Expressions *)
<expression>          ::= <integer>

(* Identifiers *)
<type>                ::= 'u32'
<identifier>          ::= (<alpha> | '_') (<alpha> | <digit> | '_')*

(* Literals *)
<integer>             ::= <integer-base10>
<integer-base10>      ::= #'[1-9]' <digit>* | '0'

(* Utilities *)
<ws>                  ::= <white-space>+
<ows>                 ::= <white-space>*
<white-space>         ::= <linear-space> | <line-break>
<line-break>          ::= '\n' | '\r' | '\r\n'
<linear-space>        ::= #'[ \t]'
<alpha>               ::= #'[a-zA-Z]'
<digit>               ::= #'[0-9]'
```
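As a cross-check of the lexical rules above, here is a minimal C sketch of recognizers for `<identifier>`, `<integer-base10>` and `<line-break>`. The function names are hypothetical, they check whole NUL-terminated strings rather than scanning a token stream, and `<alpha>` is ASCII-only, as in the grammar:

```c
#include <stdbool.h>
#include <stddef.h>

/* <alpha> ::= #'[a-zA-Z]' (ASCII only, as in the grammar) */
static bool is_alpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* <digit> ::= #'[0-9]' */
static bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* <identifier> ::= (<alpha> | '_') (<alpha> | <digit> | '_')* */
static bool is_identifier(const char *s)
{
    if (!is_alpha(s[0]) && s[0] != '_')
        return false;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!is_alpha(s[i]) && !is_digit(s[i]) && s[i] != '_')
            return false;
    }
    return true;
}

/* <integer-base10> ::= #'[1-9]' <digit>* | '0' */
static bool is_integer_base10(const char *s)
{
    if (s[0] == '0')
        return s[1] == '\0';        /* lone zero only: no leading zeros */
    if (s[0] < '1' || s[0] > '9')
        return false;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!is_digit(s[i]))
            return false;
    }
    return true;
}

/*
 * <line-break> ::= '\n' | '\r' | '\r\n'
 * Returns the length in bytes of the line break starting at s (0 if none).
 * CRLF is tried first so that it counts as a single line break.
 */
static size_t line_break_length(const char *s)
{
    if (s[0] == '\r' && s[1] == '\n')
        return 2;
    if (s[0] == '\n' || s[0] == '\r')
        return 1;
    return 0;
}
```

Note that `is_identifier` also accepts keywords such as `return`, just as the grammar's `<identifier>` rule does; telling them apart would be up to the lexer or parser.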



Further discussion:

- Is the language going to support Unicode? If so, `<alpha>` could use
  the _L:Letter_ Unicode category instead of being limited to
  `[a-zA-Z]`. But the EBNF tool does not support Unicode categories in
  its regular expressions (it does not support regex flags). Also, don't
  forget to rename it to `<letter>` in that case.
  - It would help developers in non-English-speaking countries, but
    handling multi-byte characters and Unicode normalization could be
    difficult.
- There are more linear-space and line-break characters than the ones
  included here, even within ASCII (e.g. vertical tab and form feed),
  although they are not that important. Unicode has even more (some
  under _Cc:Other/control_, others under _Z:Separator_). Should we
  support them?
- The function definition could accept a single expression as an
  alternative to its `<block>`, similar to Kotlin's single-expression
  functions.
- The integer literal could include optional underscore separators for
  readability. We just need to be careful not to start a literal with an
  underscore, to avoid ambiguity with identifiers.
- I guess we don't have to support the full set of Unicode digits, since
  we don't know whether those digits would even be decimal in the first
  place. The numbering system could be very different from our own, so
  supporting them is likely not feasible.
- I have not checked whether this syntax avoids the JavaScript edge case
  I mentioned at the beginning. I might check that next time (I'm still
  not sure how).
- It might seem strange that I included semantic non-terminals here,
  despite having removed non-terminals for keywords and punctuation
  symbols. I can't say for sure, since this is my first time trying this
  style, but I suspect that, besides making the specification easier to
  understand, these non-terminals will mark the important points to hook
  into in the parser. That is, they could simplify some of the parser
  work.
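If underscore separators were adopted, one possible rule is sketched in C below. The function name is hypothetical, and the extra constraints (every `_` sits between two digits, and the leading-zero rule of `<integer-base10>` still applies) are assumptions, not agreed syntax. In the same EBNF dialect it could read `#'[1-9]' ('_'? <digit>)* | '0'`.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical rule for decimal literals with '_' separators, e.g. 1_000_000.
 * ASSUMPTIONS: the literal starts with a digit (never '_', so it cannot look
 * like an identifier), every '_' sits between two digits, and the
 * leading-zero rule of <integer-base10> still applies.
 * EBNF equivalent: #'[1-9]' ('_'? <digit>)* | '0'
 */
static bool is_integer_with_separators(const char *s)
{
    if (s[0] == '0')
        return s[1] == '\0';            /* lone zero only */
    if (s[0] < '1' || s[0] > '9')
        return false;                   /* also rejects a leading '_' */

    for (size_t i = 1; s[i] != '\0'; i++) {
        if (s[i] == '_') {
            /* a separator must be followed by a digit: no "1__0", no "1_" */
            if (s[i + 1] < '0' || s[i + 1] > '9')
                return false;
        } else if (s[i] < '0' || s[i] > '9') {
            return false;
        }
    }
    return true;
}
```

Under these assumptions `1_000_000` and `0` are accepted, while `_1`, `1__0`, `1_` and `0_1` are rejected.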


Thread overview:
2024-03-09  0:05 [RFC PATCH olang v1] docs: create zero programming language specification Johnny Richard
2024-03-08 23:09 ` [olang/patches/.build.yml] build success builds.sr.ht
2024-03-14  4:29   ` Ricardo Kagawa [this message]
2024-03-14 22:43     ` Johnny Richard
2024-03-09  0:36 ` [RFC PATCH olang v1] docs: create zero programming language specification Johnny Richard
2024-03-09  5:09 ` Carlos Maniero
2024-03-19 20:21 ` Johnny Richard
2024-03-23 23:31 ` Carlos Maniero

Code repositories for project(s) associated with this public inbox

	https://git.johnnyrichard.com/olang.git
