Re: [RFC PATCH olang v1] docs: create zero programming language specification

public inbox for ~johnnyrichard/olang-devel@lists.sr.ht

From: Ricardo Kagawa <ricardo.kagawa@gmail.com>
To: ~johnnyrichard/olang-devel@lists.sr.ht
Subject: Re: [RFC PATCH olang v1] docs: create zero programming language specification
Date: Fri, 15 Mar 2024 17:54:10 -0300
Message-ID: <11b1f29a-7a4a-4b46-9376-98bd52c9edd4@gmail.com>

> You've replied to the CI build reply.  Next time try to reply to the
> right thread.

I just used the "Reply to thread" link from sourcehut's web interface.
It automatically filled in the To, CC and Subject fields and a thread ID
header, and I simply trusted those values. It seems the UI is not to be
trusted, but then, which values should I use? The address in To seems to
have bounced my reply, so I thought it hadn't even made it to sourcehut.

> Your message has few weird line breaks.

I'm not sure myself, but I suspect it is an issue with the file format
`vim` generates. My default `email` file type seems to force the `dos`
format (CRLF line breaks), which might be interpreted as two separate
line breaks somewhere between Thunderbird, Gmail and sourcehut. I'll see
if I can force it to use the `unix` format (LF only) and whether that
fixes things.
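A small Python sketch of the suspected effect, assuming (unverified) that some tool in the mail chain splits lines on `\r` and `\n` independently. Both function names are made up for this illustration:

```python
import re


def naive_lines(text: str) -> list[str]:
    # Treats '\r' and '\n' each as a separate line break, so every CRLF
    # pair produces an extra, empty line.
    return re.split(r"[\r\n]", text)


def crlf_aware_lines(text: str) -> list[str]:
    # Tries '\r\n' first, so a CRLF pair counts as a single line break.
    return re.split(r"\r\n|[\r\n]", text)


body = "first line\r\nsecond line"
print(naive_lines(body))       # ['first line', '', 'second line']
print(crlf_aware_lines(body))  # ['first line', 'second line']
```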

> I'm not sure how you want to version lock this variant.  Should I add
> a specific github/git tag version to the document?

Yes, sort of. The web tool itself cannot be version-locked, since it
simply does not offer that option, but it does link to its GitHub
project. The project does not describe its EBNF syntax itself, but it
links to the library it uses to implement its EBNF parser, which in turn
documents the syntax. You could include a version-locked link to
[that][1].

[1]: https://github.com/Engelberg/instaparse/tree/v1.4.12

> > Is the language going to support Unicode?
>
> I would say to keep it simple as much as we can on this earlier stage
> (ASCII only) unless you have a big concern.

I guess that would be OK. I don't think it would be too difficult to
migrate later. Maybe tricky, but not difficult, since Unicode is a
superset of ASCII: the first 128 code points are the ASCII characters,
and UTF-8 encodes them as the same single bytes. We just need to be
careful not to depend too much on the fact that each ASCII character
fits in one 8-bit byte, as UTF-8 uses variable-length characters: a
single character takes one to four bytes, so the length of a string in
bytes is not necessarily its length in characters.
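A quick Python illustration of the difference (Python's `len` counts code points, while `.encode("utf-8")` exposes the underlying bytes); `utf8_width` is just a helper made up for this example:

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# ASCII text is encoded byte-for-byte the same in ASCII and in UTF-8.
print("olang".encode("ascii") == "olang".encode("utf-8"))  # True

# Other characters take two to four bytes each.
for ch in ("a", "\u00e1", "\u20ac", "\U0001D11E"):
    print(repr(ch), utf8_width(ch))  # 1, 2, 3 and 4 bytes respectively

# So a string's length in characters and in bytes can differ.
text = "ol\u00e1 \u20ac"
print(len(text), len(text.encode("utf-8")))  # 5 8
```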

> If we don't add a token in here like **=** it will be very weird.

Actually, I also mentioned Kotlin to imply that there would be an
equals sign before the expression.

> > - I have not checked if this syntax would avoid that edge case with
> >   JavaScript I mentioned in the beginning. I might check that next
> >   time (I'm still not sure of how).
>
> Maybe we are going to discovery it on the implementation process.

I _suspect_ it would be enough to give precedence to interpreting line
breaks as end-of-statement, and if so, there might be a way to represent
that precedence in the EBNF grammar (by convention). I would still need
to mull it over for a while to be sure.

Another revision:

- The function body can now be a single expression (`= <expression>`),
  as an alternative to a block.
  - This introduced the `<end-of-file>` token, which is not an actual
    sequence of characters. It allows the function body expression at
    the end of the program without a following line break or
    semicolon. Earlier declarations must include a line break or
    semicolon.
- `\v` (vertical tab) and `\f` (form feed) are included as line breaks
  for completeness over ASCII (based on the `\s` regex class, which
  agrees with the Unicode properties over the ASCII range).
- Integer literals can now include underscores as separators.
  - The literal is allowed to end with an arbitrarily long sequence of
    separators, though.
  - It would be possible to restrict the last character to a digit,
    but maybe it is not worth the trouble?
- Introducing hexadecimal integer literals.
  - Hexadecimal literals are allowed to have an arbitrarily long
    sequence of zeroes to the left, after the `0x` prefix. This is
    intentional, and the parser should ignore excess zeroes.
  - The alphabetic characters in the literal should be parsed
    case-insensitively. There is no difference between the `0x` and
    `0X` prefixes in literals. If mixed case is not desirable, let the
    linter or formatter restrict that.
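A minimal Python sketch of how a parser might compute the value of these literals, assuming the two regexes are a faithful transcription of `<integer-base10>` and `<integer-base16>` from the grammar that follows; `parse_integer` is a hypothetical helper, not part of olang:

```python
import re

# Transcribed from <integer-base10> and <integer-base16>; trailing '_'
# separators and leading hexadecimal zeroes are allowed, as in the grammar.
BASE10 = re.compile(r"[1-9][0-9_]*|0")
BASE16 = re.compile(r"0[Xx][0-9a-fA-F][0-9a-fA-F_]*")


def parse_integer(literal: str) -> int:
    """Return the value of an integer literal, or raise ValueError."""
    if BASE16.fullmatch(literal):
        # Drop the 0x/0X prefix and the separators; int() ignores the
        # leading zeroes and accepts both letter cases.
        return int(literal[2:].replace("_", ""), 16)
    if BASE10.fullmatch(literal):
        return int(literal.replace("_", ""), 10)
    raise ValueError(f"not an integer literal: {literal!r}")


for literal in ("0", "1_000_000", "42__", "0x00ff", "0XDEAD_beef"):
    print(literal, "->", parse_integer(literal))
```

Python's own `int()` accepts `_` only between digits, so the separators are stripped first; this also makes trailing separators harmless. Inputs such as `01`, `0_` or `0x_1` are rejected, as the grammar rejects them.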

```
(* Entry Point *)
<program>             ::= <ows> <function-definition> <ows> <end-of-file>

(* Functions *)
<function-definition> ::= 'fn' <ws> <function-name> <ows>
                          <function-parameters> <ows> ':' <ows> <return-type> <ows> <function-body>
<function-name>       ::= <identifier>
<function-parameters> ::= '(' <ows> ')'
<return-type>         ::= <type>
<function-body>       ::= <block> | '=' <ows> <expression> <ows>
                          (<end-of-statement> | <end-of-file>)

(* Statements *)
<block>               ::= '{' <ows> <statement> <ows>
                          (<end-of-statement> <ows> <statement> <ows>)* <end-of-statement>? <ows> '}'
<end-of-statement>    ::= ';' | <line-break>
<statement>           ::= <return-statement>
<return-statement>    ::= 'return' <ws> <expression>

(* Expressions *)
<expression>          ::= <integer>

(* Identifiers *)
<type>                ::= 'u32'
<identifier>          ::= (<alpha> | '_') (<alpha> | <digit> | '_')*

(* Literals *)
<integer>             ::= <integer-base10> | <integer-base16>
<integer-base10>      ::= #'[1-9]' (<digit> | '_')* | '0'
<integer-base16>      ::= #'0[Xx]' <hex-digit> (<hex-digit> | '_')*

(* Utilities *)
<ws>                  ::= <white-space>+
<ows>                 ::= <white-space>*
<white-space>         ::= <linear-space> | <line-break>
<line-break>          ::= #'[\n\v\f\r]' | '\r\n'
<linear-space>        ::= #'[ \t]'
<alpha>               ::= #'[a-zA-Z]'
<digit>               ::= #'[0-9]'
<hex-digit>           ::= <digit> | #'[a-fA-F]'
<end-of-file>         ::= #'$'
```
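To experiment with the grammar without Instaparse, here is a rough sketch of a hand-written recursive-descent recognizer in Python. It is only an illustration under assumptions, not the olang implementation: it only reports whether a string matches `<program>`, resolves choices greedily (longest white-space runs, hexadecimal before decimal) instead of exploring every parse, and treats `<end-of-file>` as the end of the string. All names in it are hypothetical.

```python
import re
from typing import NoReturn

LINE_BREAKS = "\n\v\f\r"  # <line-break>; a "\r\n" pair is skipped as two of these
WHITE_SPACE = " \t" + LINE_BREAKS  # <white-space> = <linear-space> | <line-break>
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INTEGER = re.compile(r"0[Xx][0-9a-fA-F][0-9a-fA-F_]*|[1-9][0-9_]*|0")  # hex tried first


class Recognizer:  # hypothetical recognizer for <program>, not olang's parser
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def fail(self, what: str) -> NoReturn:
        raise SyntaxError(f"expected {what} at offset {self.pos}")

    def peek(self, token: str) -> bool:
        return self.text.startswith(token, self.pos)

    def expect(self, token: str) -> None:
        if not self.peek(token):
            self.fail(repr(token))
        self.pos += len(token)

    def match(self, pattern: "re.Pattern[str]", what: str) -> None:
        found = pattern.match(self.text, self.pos)
        if found is None:
            self.fail(what)
        self.pos = found.end()

    def ows(self) -> bool:  # skip <ows> greedily; True if it held a line break
        saw_break = False
        while self.pos < len(self.text) and self.text[self.pos] in WHITE_SPACE:
            saw_break = saw_break or self.text[self.pos] in LINE_BREAKS
            self.pos += 1
        return saw_break

    def ws(self) -> None:  # <ws>: at least one white-space character
        start = self.pos
        self.ows()
        if self.pos == start:
            self.fail("white space")

    def program(self) -> None:
        self.ows()
        self.function_definition()
        self.ows()
        if self.pos != len(self.text):
            self.fail("end of file")

    def function_definition(self) -> None:
        self.expect("fn")
        self.ws()
        self.match(IDENTIFIER, "function name")
        for token in ("(", ")", ":", "u32"):
            self.ows()
            self.expect(token)
        self.ows()
        if self.peek("{"):
            self.block()
        else:
            self.expect("=")
            self.ows()
            self.match(INTEGER, "integer")
            # Any line break is skipped here; program() checks end of file.
            self.ows()
            if self.peek(";"):
                self.expect(";")

    def block(self) -> None:
        self.expect("{")
        self.ows()
        while True:
            self.return_statement()
            separated = self.ows()  # a line break ends the statement
            if self.peek(";"):
                self.expect(";")
                separated = True
                self.ows()
            if self.peek("}"):
                self.expect("}")
                return
            if not separated:
                self.fail("';' or a line break")

    def return_statement(self) -> None:
        self.expect("return")
        self.ws()
        self.match(INTEGER, "integer")


def is_valid_program(text: str) -> bool:
    try:
        Recognizer(text).program()
    except SyntaxError:
        return False
    return True
```

For example, `is_valid_program("fn main(): u32 {\n  return 1\n  return 2;\n}")` is `True`, while `is_valid_program("fn main(): u32 { return 1 return 2 }")` is `False`, because two statements need a `;` or a line break between them. It also accepts `return` followed by a line break and then the integer, since `<ws>` includes line breaks.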

Thread overview: 8+ messages
2024-03-15 20:54 Ricardo Kagawa [this message]
2024-03-17 15:41 ` Carlos Maniero
2024-03-18  9:58 ` Johnny Richard
  -- strict thread matches above, loose matches on Subject: below --
2024-03-09  0:05 Johnny Richard
2024-03-09  0:36 ` Johnny Richard
2024-03-09  5:09 ` Carlos Maniero
2024-03-19 20:21 ` Johnny Richard
2024-03-23 23:31 ` Carlos Maniero

Code repositories for project(s) associated with this public inbox

	https://git.johnnyrichard.com/olang.git