[RFC] Namespaces in OLANG - Carlos Maniero


From: "Carlos Maniero" <carlos@maniero.me>
To: <~johnnyrichard/olang-devel@lists.sr.ht>
Subject: [RFC] Namespaces in OLANG
Date: Sat, 23 Mar 2024 23:46:46 -0300
Message-ID: <D01MXLCQ33C1.16Z0UFCP9WOF0@maniero.me>

One of the inconveniences in C programming is the necessity to manually
namespace our functions to prevent conflicts.

Consider the example below:

  File: list.h
  ...
  void add(list_t* list, void* value);
  ...

  File: math.h
  ...
  int add(int a, int b);
  ...

  File: main.c
  #include "list.h"
  #include "math.h"

This will result in a compilation error, as both *list.h* and *math.h* define a
function named *add*.

The reason for this behavior in C is straightforward: C uses function names as
assembly labels. This means that in the example above, there will be an attempt
to create two assembly labels named add, which is not allowed.
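
For reference, the usual workaround in C is to prefix every function by hand.
The sketch below is only an illustration: *list_add*, *math_add* and the
fixed-size *list_t* are hypothetical stand-ins for the two headers above, and
the list stores plain ints to keep it short.

```c
/* Hypothetical stand-in for list.h: a tiny fixed-capacity list of ints. */
typedef struct {
    int items[8];
    int len;
} list_t;

/* Would be plain add() in list.h; the prefix avoids the label clash. */
void list_add(list_t *list, int value) {
    if (list->len < 8) {
        list->items[list->len++] = value;
    }
}

/* Would be plain add() in math.h; the prefix avoids the label clash. */
int math_add(int a, int b) {
    return a + b;
}
```

With the prefixes in place both functions can live in the same program, for
example *list_add(&list, math_add(1, 2))*, because the two assembly labels are
now distinct.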

How do other languages deal with that?
--------------------------------------

Many languages solve this issue by creating custom labels for functions.

Let's examine an example with C++:

  int sum(int a, int b) {
      return a + b;
  }

  int main() {
      return sum(1, 2);
  }

Assembly:

  _Z3sumii:
  ...
          movl    %edi, -4(%rbp)
          movl    %esi, -8(%rbp)
          movl    -4(%rbp), %edx
          movl    -8(%rbp), %eax
          addl    %edx, %eax
  ...
          ret
  ...
  main:
  ...
          movl    $2, %esi
          movl    $1, %edi
          call    _Z3sumii
  ...

As you can see, the sum function was given the label _Z3sumii. While this is a
good way to deal with name conflicts, it completely breaks compatibility with
C.
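
To make the label above less magical, here is a minimal sketch of how it is
built. It assumes the Itanium C++ ABI scheme that GCC and Clang follow (other
compilers, such as MSVC, mangle differently) and only covers free functions
whose parameters are all *int*. The helper name *mangle_int_fn* is
hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Builds the label of a free function whose parameters are all `int`:
 *   _Z <length of name> <name> <one 'i' per int parameter>
 * An empty parameter list is encoded as a single 'v' (void).
 * Returns 0 on success, -1 if n_params is negative or out is too small. */
int mangle_int_fn(const char *name, int n_params, char *out, size_t size) {
    if (n_params < 0) {
        return -1;
    }
    size_t n_codes = n_params == 0 ? 1 : (size_t)n_params;
    int written = snprintf(out, size, "_Z%zu%s", strlen(name), name);
    if (written < 0 || (size_t)written + n_codes + 1 > size) {
        return -1;
    }
    /* Append the parameter type codes and terminate the string. */
    memset(out + written, n_params == 0 ? 'v' : 'i', n_codes);
    out[written + n_codes] = '\0';
    return 0;
}
```

Calling *mangle_int_fn("sum", 2, buf, sizeof buf)* fills *buf* with _Z3sumii,
the same label seen in the assembly above.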

Let’s say we need to call this function from C. Even assuming that the mangled
name is deterministic and the same across all C++ compilers, we would need to
refer to it from C, which leads to a fragile implementation.

  File: main.c

  extern int _Z3sumii(int a, int b);

To make the C++ and C integration more reliable, the *extern "C"* linkage
specification can be used.

  extern "C" int sum(int a, int b) {
      return a + b;
  }

The example above will produce an assembly function with the label *sum*,
which is easy to reference from C.

  sum:
  ...
          movl    %edi, -4(%rbp)
          movl    %esi, -8(%rbp)
          movl    -4(%rbp), %edx
          movl    -8(%rbp), %eax
          addl    %edx, %eax

However, the use of *extern "C"* comes with its own set of challenges. While it
eases the integration of C and C++ code, the unmangled label brings back the
naming collisions that mangling was avoiding. It also adds noise to the code.

Using namespaces in olang
-------------------------

Before we begin, I want to ensure that we’re all on the same page regarding the
following points:

1. Deterministic Code Generation: Our goal is to be able to examine an olang
function and precisely predict the assembly code that will be generated.

2. Full Compatibility with C: We aim for seamless integration with C. We don’t
want to introduce features to the language solely for compatibility with C. Any
code compiled by olang should be usable in C without requiring any additional
boilerplate.

3. Manual namespacing is inconvenient: While it’s possible to create a manual
namespace for your functions in C by prefixing them with a namespace, this
approach can be cumbersome and inconvenient.

To address conflicts while still ensuring predictable code generation and
compatibility with C, without the need for manual namespaces, I propose the
*ns* statement.

  ns olang.core.math

  fn add(a: u32, b: u32): u32 {
    return a + b
  }

This could generate an assembly label called *olang_core_math__add*. Let's
evaluate this solution against our three key criteria:

1. Deterministic Code Generation:
   It is deterministic! The function label is always {ns}__{fn_name}, with the
   dots of the namespace replaced by underscores.

2. Full Compatibility with C:

   It is completely compatible with C!

     int olang_core_math__add(int, int);

   If you think it is ugly to call a function that way directly, you can create
   macros in C to improve readability, but this is completely optional.

     #define NSMATH(name) olang_core_math__##name

     int NSMATH(add)(int a, int b);

     int main() {
       return NSMATH(add)(2, 3);
     }

3. Manual namespacing is inconvenient:

   You don't need to manually namespace every function, at the cost of
   starting every file with an *ns* statement.
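
The label rule is simple enough to sketch in a few lines of C. This is not
compiler code, just an illustration of the {ns}__{fn_name} scheme from the
example above; the helper name *olang_label* is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "{ns}__{fn_name}" into out, turning every '.'
 * of the namespace into '_'.
 * Returns 0 on success, -1 if the label does not fit in out. */
int olang_label(const char *ns, const char *fn_name, char *out, size_t size) {
    int written = snprintf(out, size, "%s__%s", ns, fn_name);
    if (written < 0 || (size_t)written >= size) {
        return -1;
    }
    /* Only the namespace part is rewritten; the function name is kept. */
    size_t ns_len = strlen(ns);
    for (size_t i = 0; i < ns_len; i++) {
        if (out[i] == '.') {
            out[i] = '_';
        }
    }
    return 0;
}
```

For the namespace *olang.core.math* and the function *add*, this yields
*olang_core_math__add*.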

An important note on *ns* usage is that the namespace must match the directory
structure. The path of a file that declares the namespace *olang.core.math*
must end with *olang/core/math.ol*. This requirement is needed for future
import resolution.
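
A rough sketch of how that check could look, in C. The helper name
*ns_matches_path* is hypothetical, and I am assuming '/' as the path separator
and that the match has to start at a directory boundary (so *molang/core/math.ol*
would not count).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check: returns 1 when path ends with the namespace spelled as
 * a relative file path ("olang.core.math" -> "olang/core/math.ol"), else 0. */
int ns_matches_path(const char *ns, const char *path) {
    char suffix[256];
    int written = snprintf(suffix, sizeof suffix, "%s.ol", ns);
    if (written < 0 || (size_t)written >= sizeof suffix) {
        return 0;
    }
    /* Turn the dots of the namespace (not the ".ol" one) into slashes. */
    size_t ns_len = strlen(ns);
    for (size_t i = 0; i < ns_len; i++) {
        if (suffix[i] == '.') {
            suffix[i] = '/';
        }
    }
    size_t suffix_len = (size_t)written;
    size_t path_len = strlen(path);
    if (path_len < suffix_len) {
        return 0;
    }
    if (strcmp(path + path_len - suffix_len, suffix) != 0) {
        return 0;
    }
    /* Assumption: the suffix must begin at a path component boundary. */
    return path_len == suffix_len || path[path_len - suffix_len - 1] == '/';
}
```

With this, a file at */home/me/src/olang/core/math.ol* is accepted for
*olang.core.math*, while *src/olang/core/list.ol* is rejected.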

Alternatives:
-------------

1. Automatically create namespaces based on the filename:

Automatically creating namespaces based on the filename presents a unique set
of challenges. The primary hurdle is determining the starting point of the
namespace. For instance, if we have a file located at */a/b/c/d.ol*, it’s
unclear where the namespace should start.

This is a route that Python has taken, and it’s not uncommon to encounter
developers wrestling with import-related issues as a result.

2. Manual namespaces: ...

Conclusion
----------

In my opinion, the introduction of a namespace statement offers numerous
benefits:

- It aids in resolving function name conflicts.
- It facilitates deterministic code generation while maintaining compatibility
  with C.
- It simplifies the resolution of imports.

These advantages come with the minor stipulation of initiating all files with a
namespace statement, which, in my view, is a small price to pay for the
benefits gained.

Note that *ns* is just a suggestion; we can go with *module* or any other
keyword. I don't know if we have plans for C++-like namespaces, which are a
totally different thing.

I also started to think about this because I'm working on adding debugging
information to the code we generate. I need to represent a file in the AST, and
the program node does not seem appropriate for it.

If we go with *ns*, the olang spec will look like this:

(* Namespace *)
<namespace>           ::= 'ns' <ws> <namespace-name> <ows> <end-of-statement> <program>
<namespace-name>      ::= <identifier> ('.' <identifier>)*

(* Entry Point *)
<program>             ::= <ows> <function-definition> <ows> <end-of-file>

(* Functions *)
<function-definition> ::= 'fn' <ws> <function-name> <ows>
                          <function-parameters> <ows> ':' <ows>
                          <return-type> <ows> <function-body>
<function-name>       ::= <identifier>
<function-parameters> ::= '(' <ows> ')'
<return-type>         ::= <type>
<function-body>       ::= <block>

(* Statements *)
<block>               ::= '{' <ows> <statement> <ows>
                          (<end-of-statement> <ows> <statement> <ows>)*
                          <end-of-statement>? <ows> '}'
<end-of-statement>    ::= ';' | <line-break>
<statement>           ::= <return-statement>
<return-statement>    ::= 'return' <ws> <expression>

(* Expressions *)
<expression>          ::= <integer>

(* Identifiers *)
<type>                ::= 'u32'
<identifier>          ::= (<alpha> | '_') (<alpha> | <digit> | '_')*

(* Literals *)
<integer>             ::= <integer-base10> | <integer-base16>
<integer-base10>      ::= #'[1-9]' (<digit> | '_')* | '0'
<integer-base16>      ::= #'0[Xx]' <hex-digit> (<hex-digit> | '_')*

(* Utilities *)
<ws>                  ::= <white-space>+
<ows>                 ::= <white-space>*
<white-space>         ::= <linear-space> | <line-break>
<line-break>          ::= #'[\n\v\f\r]' | '\r\n'
<linear-space>        ::= #'[ \t]'
<alpha>               ::= #'[a-zA-Z]'
<digit>               ::= #'[0-9]'
<hex-digit>           ::= <digit> | #'[a-fA-F]'
<end-of-file>         ::= #'$'
