* [RFC PATCH olang v1] docs: create zero programming language specification
From: Johnny Richard @ 2024-03-09  0:05 UTC (permalink / raw)
  To: ~johnnyrichard/olang-devel; +Cc: Johnny Richard

This document specifies the semantics and behavior of the Zero Programming
Language so that compiler programmers know how the language is designed.

This document will help newcomers understand what the language looks
like and will serve as a DRAFT guide for the language design discussions.

The grammar was made using an EBNF evaluator tool[1].

[1]: https://mdkrajnak.github.io/ebnftest/

Signed-off-by: Johnny Richard <johnny@johnnyrichard.com>
---
This grammar requires the SEMICOLON token (';') after every statement.  I
know we agreed to make it optional, but the SEMICOLON makes the parser much
more convenient to implement.

This is the first topic I would like to discuss. Let me know if you
agree; otherwise I can adapt the grammar to make SEMICOLON optional.

 docs/pages/language_specification.md | 41 ++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 docs/pages/language_specification.md

diff --git a/docs/pages/language_specification.md b/docs/pages/language_specification.md
new file mode 100644
index 0000000..9d27eda
--- /dev/null
+++ b/docs/pages/language_specification.md
@@ -0,0 +1,41 @@
+zero programming language specification
+=======================================
+
+ABSTRACT
+--------
+
+This document specifies the semantics and behavior of the Zero Programming
+Language so that compiler programmers know how the language is designed.
+
+This specification is a DRAFT and will evolve through discussions on the
+olang-devel mailing list.
+
+Language Syntax
+---------------
+
+This is the Zero Programming Language EBNF grammar specification.
+
+NOTE: This grammar spec is a DRAFT and it covers only a small portion of the
+language.
+
+```
+<program>               ::= <function-definition>
+<function-definition>   ::= <fn_keyword> <space>+ <identifier> <space>* <f-args> <space>* <colon> <space>* <type> <space>* <block>
+<identifier>            ::= <alpha>+
+                          | <alpha>+ <number>*
+                          ;
+<f-args>                ::= '(' <space>* ')'
+<block>                 ::= <ocurly> <space>* <statement>* <space>* <ccurly>
+<statement>             ::= <return-statement>
+<return-statement>      ::= <return_keyword> <space>* <number>* <space>* <semicolon>
+<semicolon>             ::= ';'
+<ocurly>                ::= '{'
+<ccurly>                ::= '}'
+<type>                  ::= 'u32'
+<colon>                 ::= ':'
+<alpha>                 ::= #'[a-zA-Z_]'
+<number>                ::= #'[0-9]'
+<fn_keyword>            ::= 'fn'
+<return_keyword>        ::= 'return'
+<space>                 ::= #'[ \t\r\n]'
+```
-- 
2.44.0


* Re: [RFC PATCH olang v1] docs: create zero programming language specification
From: Ricardo Kagawa @ 2024-03-15 20:54 UTC (permalink / raw)
  To: ~johnnyrichard/olang-devel

> You've replied to the CI build reply.  Next time try to reply to the
> right thread.

I just opened the "Reply to thread" link from sourcehut's web interface.
It automatically filled the TO, CC, Subject and some thread ID header,
whose values I just trusted. It seems the UI is not to be trusted, but
then, which values should I use? The address in TO seems to have bounced
my reply, so I thought it didn't even make it to sourcehut.

> Your message has few weird line breaks.

I am not sure myself, but I suspect it is an issue with the file format
generated by `vim`. My default `email` file type seems to be forcing the
`dos` format (CRLF line breaks), which might be interpreted as two
separate line breaks somewhere between Thunderbird, Gmail and sourcehut.
I'll see if I can force it to use the `unix` format (LF only) and if
that fixes things at all.
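
To illustrate the suspicion above, here is a minimal sketch of how a CRLF
pair can turn into two line breaks. It does not reproduce what any of the
mail tools actually do; the function names are made up for the example.

```python
import re


def split_naive(text: str) -> list[str]:
    # Treats CR and LF as two independent line breaks.
    return re.split(r"[\r\n]", text)


def split_crlf_aware(text: str) -> list[str]:
    # Tries the CRLF pair first, so it counts as a single line break.
    return re.split(r"\r\n|\r|\n", text)


message = "first line\r\nsecond line\r\n"
print(split_naive(message))
print(split_crlf_aware(message))
```

The naive split yields an empty string between the two lines, which would
show up as an extra blank line; the CRLF-aware split does not.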

> I'm not sure how you want to version lock this variant.  Should I add
> a specific github/git tag version to the document?

Yes, sort of. The web tool itself cannot be version locked, since it
simply does not have that option, but it does link to its GitHub
project. The project does not itself contain the description of its
EBNF syntax, but it does have a link to what it uses to implement its
EBNF parser, which in turn describes its syntax. You could include a
version locked link to [that][1].

[1]: https://github.com/Engelberg/instaparse/tree/v1.4.12

> > Is the language going to support Unicode?
>
> I would say to keep it simple as much as we can on this earlier stage
> (ASCII only) unless you have a big concern.

I guess that would be OK. I don't think it would be too difficult to
migrate later. Maybe tricky, but not difficult, since Unicode is a
superset of ASCII. We just need to be careful not to depend too much on
the fact that ASCII characters fit in a single 8-bit variable, as
Unicode encodings such as UTF-8 use variable-length characters (the
length varies within a string, but is always a multiple of 8 bits).
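
A small sketch of the point about variable-length characters, assuming
UTF-8 as the encoding (the thread has not settled on one). The helper
name is hypothetical.

```python
def utf8_length(ch: str) -> int:
    # Number of 8-bit units needed to store one character in UTF-8.
    return len(ch.encode("utf-8"))


for ch in ["a", "é", "€", "😀"]:
    print(repr(ch), utf8_length(ch), ch.encode("utf-8").hex(" "))

# Pure ASCII text has the same bytes in both encodings, which is why
# an ASCII-only lexer could be migrated later.
assert "fn main".encode("ascii") == "fn main".encode("utf-8")
```

A lexer that indexes its input one byte at a time keeps working for
ASCII-only sources, but it cannot assume one byte per character once
non-ASCII input is accepted.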

> If we don't add a token in here like **=** it will be very weird.

Actually, I mentioned Kotlin also to imply that there would be an
equals sign before the expression.

> > - I have not checked if this syntax would avoid that edge case with
> >   JavaScript I mentioned in the beginning. I might check that next
> >   time (I'm still not sure of how).
>
> Maybe we are going to discovery it on the implementation process.

I _suspect_ it would be enough to give precedence to interpreting line
breaks as end-of-statement, and if so, there might be a way to represent
that precedence in the EBNF grammar (by convention). I would still need
to mull over it for a while to be sure.

Another revision:

- Function body now accepts a single expression.
     - This introduced the `<end-of-file>` token, which is not an actual
       sequence of characters. It allows the function body expression at
       the end of the program without a following line break or
       semicolon. Earlier declarations must include a line break or
       semicolon.
- `\v` (vertical tab) and `\f` (form feed) included as line breaks for
   completeness over ASCII (based on `\s` regex class, which agrees with
   Unicode properties over the ASCII range).
- Integer literals can now include underscores as separators.
     - The literal is allowed to terminate with an arbitrarily long
       sequence of separators, though.
     - It would be possible to restrict the last character to be a digit,
       but maybe it is not worth the trouble?
- Introducing hexadecimal integer literals.
     - Hexadecimal literals are allowed to have an arbitrarily long
       sequence of zeroes to the left, after the `0x` prefix. This is
       intentional, and the parser should ignore excess zeroes.
     - The alphabetic characters in the literal should be parsed without
       case-sensitivity. There is no difference between the `0x` and `0X`
       prefixes in literals. If mixed case is not desirable, let the
       linter or formatter restrict that.
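
As a rough illustration of the literal rules listed above, here is a
sketch of how a parser might turn a literal into a value. The two
patterns mirror the `<integer-base10>` and `<integer-base16>` rules of
the grammar in this message; `parse_integer` is a hypothetical name, not
part of any existing olang code.

```python
import re

# Decimal: a non-zero digit followed by digits or separators, or a lone '0'.
DECIMAL = re.compile(r"[1-9][0-9_]*|0")
# Hexadecimal: case-insensitive prefix, one hex digit, then digits or separators.
HEXADECIMAL = re.compile(r"0[Xx][0-9a-fA-F][0-9a-fA-F_]*")


def parse_integer(text: str) -> int:
    if HEXADECIMAL.fullmatch(text):
        # Separators are dropped; excess leading zeroes do not change the value.
        return int(text[2:].replace("_", ""), 16)
    if DECIMAL.fullmatch(text):
        return int(text.replace("_", ""), 10)
    raise ValueError(f"not an integer literal: {text!r}")


print(parse_integer("1_000__"))  # trailing separators are accepted
print(parse_integer("0x00ff"))   # leading zeroes are ignored
print(parse_integer("0XFF_"))    # prefix and digits are case-insensitive
```

Under these rules `01`, `_1` and `0x` are rejected, since a decimal
literal other than `0` must start with a non-zero digit and a
hexadecimal literal needs at least one digit after the prefix.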

```
(* Entry Point *)
<program>             ::= <ows> <function-definition> <ows> <end-of-file>

(* Functions *)
<function-definition> ::= 'fn' <ws> <function-name> <ows> <function-parameters> <ows> ':' <ows> <return-type> <ows> <function-body>
<function-name>       ::= <identifier>
<function-parameters> ::= '(' <ows> ')'
<return-type>         ::= <type>
<function-body>       ::= <block> | '=' <ows> <expression> <ows> (<end-of-statement> | <end-of-file>)

(* Statements *)
<block>               ::= '{' <ows> <statement> <ows> (<end-of-statement> <ows> <statement> <ows>)* <end-of-statement>? <ows> '}'
<end-of-statement>    ::= ';' | <line-break>
<statement>           ::= <return-statement>
<return-statement>    ::= 'return' <ws> <expression>

(* Expressions *)
<expression>          ::= <integer>

(* Identifiers *)
<type>                ::= 'u32'
<identifier>          ::= (<alpha> | '_') (<alpha> | <digit> | '_')*

(* Literals *)
<integer>             ::= <integer-base10> | <integer-base16>
<integer-base10>      ::= #'[1-9]' (<digit> | '_')* | '0'
<integer-base16>      ::= #'0[Xx]' <hex-digit> (<hex-digit> | '_')*

(* Utilities *)
<ws>                  ::= <white-space>+
<ows>                 ::= <white-space>*
<white-space>         ::= <linear-space> | <line-break>
<line-break>          ::= #'[\n\v\f\r]' | '\r\n'
<linear-space>        ::= #'[ \t]'
<alpha>               ::= #'[a-zA-Z]'
<digit>               ::= #'[0-9]'
<hex-digit>           ::= <digit> | #'[a-fA-F]'
<end-of-file>         ::= #'$'
```
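
To try the grammar above against a few inputs without the web tool, here
is a minimal recognizer sketch. It only works because the grammar is not
recursive yet, so each rule can be written as a regular expression; a
real parser would not be built this way. All Python names are made up
for the example, and `\Z` stands in for `<end-of-file>`.

```python
import re

OWS = r"[ \t\n\v\f\r]*"                      # <ows>
WS = r"[ \t\n\v\f\r]+"                       # <ws>
LINE_BREAK = r"(?:\r\n|[\n\v\f\r])"          # <line-break>
EOS = rf"(?:;|{LINE_BREAK})"                 # <end-of-statement>
INTEGER = r"(?:0[Xx][0-9a-fA-F][0-9a-fA-F_]*|[1-9][0-9_]*|0)"
IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"
STATEMENT = rf"return{WS}{INTEGER}"          # <return-statement>
BLOCK = (
    rf"\{{{OWS}{STATEMENT}{OWS}"
    rf"(?:{EOS}{OWS}{STATEMENT}{OWS})*{EOS}?{OWS}\}}"
)
BODY = rf"(?:{BLOCK}|={OWS}{INTEGER}{OWS}(?:{EOS}|\Z))"
PROGRAM = re.compile(
    rf"{OWS}fn{WS}{IDENTIFIER}{OWS}\({OWS}\){OWS}:{OWS}u32{OWS}{BODY}{OWS}"
)


def is_program(source: str) -> bool:
    # True when the whole input matches <program>.
    return PROGRAM.fullmatch(source) is not None


print(is_program("fn main(): u32 {\n  return 0\n}\n"))
print(is_program("fn main(): u32 { return 0x2A; }"))
print(is_program("fn main(): u32 = 42"))
print(is_program("fn main(): u32 { return 0 return 1 }"))
```

The first three inputs are accepted. The last one is rejected because
two statements on the same line need a `;` or a line break between them.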

