[RFC] Namespaces in OLANG

public inbox for ~johnnyrichard/olang-devel@lists.sr.ht

* [RFC] Namespaces in OLANG
@ 2024-03-24  2:46 Carlos Maniero
  2024-03-27 18:39 ` Johnny Richard
From: Carlos Maniero @ 2024-03-24  2:46 UTC
  To: ~johnnyrichard/olang-devel

One of the inconveniences in C programming is the necessity to manually
namespace our functions to prevent conflicts.

Consider the example below:

  File: list.h
  ...
  void add(list_t* list, void* value);
  ...

  File: math.h
  ...
  void add(int a, int b);
  ...

  File: main.c
  #include "list.h"
  #include "math.h"

This will result in a compilation error, as both *list.h* and *math.h* declare a
function named *add* with different signatures.

The reason for this behavior in C is straightforward: C has no overloading and
uses function names as assembly labels. In the example above, there would be an
attempt to create two assembly labels named *add*, which is not allowed.
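
For reference, the manual workaround in C looks roughly like the sketch below.
The *list_* and *math_* prefixes and the tiny *list_t* type are illustrative
assumptions, not code taken from any existing project.

```c
#include <stddef.h>

/* Hypothetical minimal list type, only here to make the sketch compile. */
typedef struct list {
    void *items[8];
    size_t len;
} list_t;

/* Manual namespace: every function from list.h carries a list_ prefix... */
void list_add(list_t *list, void *value) {
    if (list->len < 8) {
        list->items[list->len++] = value;
    }
}

/* ...and every function from math.h a math_ prefix, so both can coexist. */
int math_add(int a, int b) {
    return a + b;
}
```

Both functions now produce distinct labels (*list_add* and *math_add*), at the
price of repeating the prefix on every definition and every call.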

How do other languages deal with this?
--------------------------------------

Many languages solve this issue by creating custom labels for functions.

Let's examine an example with C++:

  int sum(int a, int b) {
      return a + b;
  }

  int main() {
      return sum(1, 2);
  }

Assembly:

  _Z3sumii:
  ...
          movl    %edi, -4(%rbp)
          movl    %esi, -8(%rbp)
          movl    -4(%rbp), %edx
          movl    -8(%rbp), %eax
          addl    %edx, %eax
  ...
          ret
  ...
  main:
  ...
          movl    $2, %esi
          movl    $1, %edi
          call    _Z3sumii
  ...

As you can see, the *sum* function was emitted as *_Z3sumii*. While this is a
good way to deal with name conflicts, it breaks compatibility with C.

Let's say we need to call this function from C. Even assuming that the mangled
name is deterministic and the same for all C++ compilers, we would need to refer
to it by that name in C, which could lead to a fragile implementation.

  File: main.c

  extern int _Z3sumii(int a, int b);

To make the C++ and C integration more reliable, the *extern "C"* linkage
specification can be used.

  extern "C" int sum(int a, int b) {
      return a + b;
  }

The example above will produce an assembly function with the *sum* label, which
is easy to reference from C.

  sum:
  ...
          movl    %edi, -4(%rbp)
          movl    %esi, -8(%rbp)
          movl    -4(%rbp), %edx
          movl    -8(%rbp), %eax
          addl    %edx, %eax

However, the use of *extern "C"* comes with its own set of challenges. While it
facilitates the integration of C and C++ code, it simultaneously exposes us to
potential naming collisions. Additionally, it can introduce an element of
distraction within the code.
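
A common way to keep that noise in one place is to guard the declarations of a
shared header so that the linkage specification is only visible to a C++
compiler. The sketch below is an illustration under that assumption; it is
plain C, and the guard simply disappears when a C compiler reads it.

```c
/* Declarations written so that the same text is valid in both C and C++. */
#ifdef __cplusplus
extern "C" {   /* only seen by a C++ compiler: disables name mangling */
#endif

int sum(int a, int b);

#ifdef __cplusplus
}
#endif

/* Definition; compiled as C it keeps the unmangled name "sum". */
int sum(int a, int b) {
    return a + b;
}
```

Every function declared inside the guard shares one flat set of names again,
which is exactly where the collision risk mentioned above comes from.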

Using namespaces in olang
-------------------------

Before we begin, I want to ensure that we’re all on the same page regarding the
following points:

1. Deterministic Code Generation: Our goal is to be able to examine an olang
function and precisely predict the assembly code that will be generated.

2. Full Compatibility with C: We aim for seamless integration with C. We don’t
want to introduce features to the language solely for compatibility with C. Any
code compiled by olang should be usable in C without requiring any additional
boilerplate.

3. Manual namespacing is inconvenient: While it’s possible to create a manual
namespace for your functions in C by prefixing them with a namespace, this
approach can be cumbersome and inconvenient.

To address conflicts while still ensuring predictable code generation and
compatibility with C, without the need for manual namespaces, I propose the
*ns* statement.

  ns olang.core.math

  fn add(a: u32, b: u32): u32 {
    return a + b
  }

This could generate an assembly label called *olang_core_math__add*. Let's
evaluate this solution against our three key criteria:

1. Deterministic Code Generation:
   It is deterministic! The function label is always {ns}__{fn_name}.

2. Full Compatibility with C:

   It is completely compatible with C!

     int olang_core_math__add(int, int);

   If you think it is ugly to call a function that way directly, you can create
   macros in C to improve readability, but this is completely optional.

     #define NSMATH(name) olang_core_math__##name

     int NSMATH(add)(int a, int b);

     int main() {
       return NSMATH(add)(2, 3);
     }

3. Manual namespacing is inconvenient: 

   You don't need to manually namespace every function, at the cost of starting
   every single file with an *ns* statement.

An important observation about the *ns* statement is that it must match the
directory structure. The path of a file that declares the namespace
*olang.core.math* must end with *olang/core/math.ol*. This requirement is needed
for future import resolution.
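
A minimal sketch of how a compiler could check this rule is shown below. The
function name *ns_matches_path*, the '/' separator and the requirement that the
match starts at a path component boundary are assumptions made for
illustration; none of them is part of the proposal itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true when `path` ends with the file name implied by namespace `ns`,
 * e.g. ns "olang.core.math" requires a path ending in "olang/core/math.ol".
 * Hypothetical helper: name, separator and buffer size are assumptions.
 */
bool ns_matches_path(const char *ns, const char *path) {
    char expected[256];
    size_t n = strlen(ns);

    /* Leave room for the ".ol" suffix and the terminating NUL. */
    if (n == 0 || n + 4 > sizeof(expected)) {
        return false;
    }

    /* Turn each '.' of the namespace into a directory separator. */
    for (size_t i = 0; i < n; i++) {
        expected[i] = (ns[i] == '.') ? '/' : ns[i];
    }
    memcpy(expected + n, ".ol", 4); /* copies the terminating NUL too */

    size_t plen = strlen(path);
    size_t elen = n + 3;
    if (plen < elen) {
        return false;
    }
    if (strcmp(path + (plen - elen), expected) != 0) {
        return false;
    }

    /* The match must start at a path component boundary. */
    return plen == elen || path[plen - elen - 1] == '/';
}
```

With this check, */home/user/src/olang/core/math.ol* is accepted for
*olang.core.math*, while */src/olang/math.ol* is rejected.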

Alternatives:
-------------

1. Automatically create namespaces based on the filename:

Automatically creating namespaces based on the filename presents a unique set
of challenges. The primary hurdle is determining the starting point of the
namespace. For instance, if we have a file located at */a/b/c/d.ol*, it’s
unclear where the namespace should start.

This is a route that Python has taken, and it’s not uncommon to encounter
developers wrestling with import-related issues as a result.

2. Manual namespaces: ...

Conclusion
----------

In my opinion, the introduction of a namespace statement offers numerous
benefits:

- It aids in resolving function name conflicts.
- It facilitates deterministic code generation while maintaining compatibility
  with C.
- It simplifies the resolution of imports.

These advantages come with the minor stipulation of initiating all files with a
namespace statement, which, in my view, is a small price to pay for the
benefits gained.

Note that *ns* is just a suggestion; we can go with *module* or any other
keyword. I don't know if we have plans to have C++-like namespaces, which are a
totally different thing.

Also, I started to think about this because I'm working on adding debugging
information to the code we generate, and I need to represent a source file
in the AST; the program node does not seem appropriate for it.

If we go with *ns*, the olang spec will look like this:

(* Namespace *)
<namespace>           ::= 'ns' <ws> <namespace-name> <ows> <end-of-statement> <program>
<namespace-name>      ::= <identifier> ('.' <identifier>)*

(* Entry Point *)
<program>             ::= <ows> <function-definition> <ows> <end-of-file>

(* Functions *)
<function-definition> ::= 'fn' <ws> <function-name> <ows>
                          <function-parameters> <ows> ':' <ows> <return-type> <ows> <function-body>
<function-name>       ::= <identifier>
<function-parameters> ::= '(' <ows> ')'
<return-type>         ::= <type>
<function-body>       ::= <block>

(* Statements *)
<block>               ::= '{' <ows> <statement> <ows> (<end-of-statement>
                          <ows> <statement> <ows>)* <end-of-statement>? <ows> '}'
<end-of-statement>    ::= ';' | <line-break>
<statement>           ::= <return-statement>
<return-statement>    ::= 'return' <ws> <expression>

(* Expressions *)
<expression>          ::= <integer>

(* Identifiers *)
<type>                ::= 'u32'
<identifier>          ::= (<alpha> | '_') (<alpha> | <digit> | '_')*

(* Literals *)
<integer>             ::= <integer-base10> | <integer-base16>
<integer-base10>      ::= #'[1-9]' (<digit> | '_')* | '0'
<integer-base16>      ::= #'0[Xx]' <hex-digit> (<hex-digit> | '_')*

(* Utilities *)
<ws>                  ::= <white-space>+
<ows>                 ::= <white-space>*
<white-space>         ::= <linear-space> | <line-break>
<line-break>          ::= #'[\n\v\f\r]' | '\r\n'
<linear-space>        ::= #'[ \t]'
<alpha>               ::= #'[a-zA-Z]'
<digit>               ::= #'[0-9]'
<hex-digit>           ::= <digit> | #'[a-fA-F]'
<end-of-file>         ::= #'$'
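
To make the *<identifier>* rule concrete, here is a small sketch of a
hand-written recognizer for it. The function names are hypothetical and this is
not the lexer used by the olang compiler; it only mirrors the three grammar
rules quoted in the comments.

```c
#include <stdbool.h>

/* <alpha> ::= #'[a-zA-Z]' */
static bool is_alpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* <digit> ::= #'[0-9]' */
static bool is_digit(char c) {
    return c >= '0' && c <= '9';
}

/* <identifier> ::= (<alpha> | '_') (<alpha> | <digit> | '_')* */
bool is_identifier(const char *s) {
    if (!is_alpha(s[0]) && s[0] != '_') {
        return false; /* also rejects the empty string */
    }
    for (const char *p = s + 1; *p != '\0'; p++) {
        if (!is_alpha(*p) && !is_digit(*p) && *p != '_') {
            return false;
        }
    }
    return true;
}
```

Note that a dotted name such as *olang.core.math* is not a single
*<identifier>* under this rule, since '.' is not an allowed character.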


* Re: [RFC] Namespaces in OLANG
  2024-03-24  2:46 [RFC] Namespaces in OLANG Carlos Maniero
@ 2024-03-27 18:39 ` Johnny Richard
  2024-03-28 13:41   ` Carlos Maniero
From: Johnny Richard @ 2024-03-27 18:39 UTC
  To: Carlos Maniero; +Cc: ~johnnyrichard/olang-devel

Thank you very much for providing this insightful reading material.

Although I've appended a few comments, I'm hesitant to make significant
decisions at this early stage of language development due to my limited
expertise. However, if you're confident in the direction proposed, I
won't obstruct progress.

On Sat, Mar 23, 2024 at 11:46:46PM -0300, Carlos Maniero wrote:
> Using namespaces in olang
> -------------------------
> 
> Before we begin, I want to ensure that we’re all on the same page regarding the
> following points:
> 
> 1. Deterministic Code Generation: Our goal is to be able to examine an olang
> function and precisely predict the assembly code that will be generated.
> 
> 2. Full Compatibility with C: We aim for seamless integration with C. We don’t
> want to introduce features to the language solely for compatibility with C. Any
> code compiled by olang should be usable in C without requiring any additional
> boilerplate.
> 
> 3. Manual namespacing is inconvenient: While it’s possible to create a manual
> namespace for your functions in C by prefixing them with a namespace, this
> approach can be cumbersome and inconvenient.
> 
> To address conflicts while still ensuring predictable code generation,
> compatibility with C and without the need for manual namespaces, I purpose the
> *ns* statement.
> 
>   ns olang.core.math
> 
>   fn add(a: u32, b: u32): u32 {
>     return a + b
>   }
> 
> This could generate an assembly label called *olang_core_math__add*. Let's
> evaluate this solution against our three key criteria:
> 
> 1. Deterministic Code Generation:
>    It is deterministic! The function label is always {ns}__{fn_name}.
> 
> 2. Full Compatibility with C:
> 
>    It is completely compatible with C!
> 
>      int olang_core_math__add(int, int);
> 
>    If you think it is ugly to call a function that way directly, you can create
>    macros in C to improve the readability, but completely optional.
> 
>      #define NSMATH(name) olang_core_math__##name

I think this is too ugly and very hacky.  I would prefer to call
olang_core_math_add instead.

> 3. Manual namespacing is inconvenient: 
> 
>    You don't need to manually namespace every function with the cost of start
>    every single file with a *ns* statement.

If we keep managing names manually, we already get *1* and *2* for
free.  So, the only benefit of namespacing would be to avoid the
inconvenience of adding it manually.

> An important observation of the *ns* usage is that it must match the directory
> structure. The path of a file that declares the namespace *olang.core.math*
> must ends with *olang/core/math.ol*. This requirement is need for future
> import resolution.
> 
> Alternatives:
> -------------
> 
> 1. Automatically create namespaces based on the filename:

I know we haven't written down the goals of the language properly, but I
prefer being explicit and avoiding convention over configuration.

> 2. Manual namespaces: ...
> 
> Conclusion
> ----------
> 
> In my opinion, the introduction of a namespace statement offers numerous
> benefits:
> 
> - It aids in resolving function name conflicts.
> - It facilitates deterministic code generation while maintaining compatibility
>   with C.

The current suggestion doesn't fully solve compatibility with C.  We
have to provide a way of calling a C function from olang code without
namespacing (in case namespaces are mandatory).

> - It simplifies the resolution of imports.

I would suggest not going much further with import resolution (unless
you already want to define modules).  Perhaps we could have namespaces
do nothing other than namespacing...

> These advantages come with the minor stipulation of initiating all files with a
> namespace statement, which, in my view, is a small price to pay for the
> benefits gained.

I'm not keen on the idea of enforcing strict adherence to the folder
structure.

How about we introduce a namespace block instead? Within this
block, everything would automatically have the namespace added as a
prefix. This could offer more flexibility while still maintaining
organization.

> Note that *ns* is just a suggestion, we can go with *module* or any other
> keyword. I don't know if we have plans to have C++ like namespaces which is a
> totally different thing.

I think module has a different meaning.  If you want to have modules, for
sure we have to discuss import resolution.  IMHO namespace shouldn't do
anything else than namespacing.

> Also I started to think about it because I'm working in adding debugging
> information on the code we generate and I have the need to represent a d file
> in the AST and the program node does not seems appropriated for it.

Today we can use the "new" <translation-unit> as the main entry point for
a source file.

Could we use this AST node to attach the file path to it?  Or even
better, is there a way to embed the .ol file into the ELF binary?


* Re: [RFC] Namespaces in OLANG
  2024-03-27 18:39 ` Johnny Richard
@ 2024-03-28 13:41   ` Carlos Maniero
  2024-04-06 16:51     ` Johnny Richard
From: Carlos Maniero @ 2024-03-28 13:41 UTC
  To: Johnny Richard; +Cc: ~johnnyrichard/olang-devel

> Thank you very much for providing this insightful reading material.
>
> However, if you're confident in the direction proposed, I
> won't obstruct progress.

I believe it's crucial that we take the necessary time to thoroughly
define the right approach. No need to rush this process. I just want to
make sure we can clearly outline the direction we want for olang from a
user-experience perspective.

By discussing these subjects, I believe we can clearly write down the
goals of the language, which are still subjective.

> > 2. Full Compatibility with C:
> > 
> >    It is completely compatible with C!
> > 
> >      int olang_core_math__add(int, int);
> > 
> >    If you think it is ugly to call a function that way directly, you can create
> >    macros in C to improve the readability, but completely optional.
> > 
> >      #define NSMATH(name) olang_core_math__##name
>
> I think this is too ugly and very hack.  I would prefer to call
> olang_core_math_add instead.

You quoted the entire text (which I truncated), do you mean the macro is
hacky? Or everything? The macro is just a suggestion.

You mentioned that you would prefer to use *olang_core_math_add*. Did you
mean *olang_core_math__add*, or are you opposed to the use of double
underscores to separate the namespace from the identifier?

> > 3. Manual namespacing is inconvenient: 
> > 
> >    You don't need to manually namespace every function with the cost of start
> >    every single file with a *ns* statement.
>
> If we keep managing names manually, we already have the *1* and *2* for
> free.  So, the only benefit of namespacing would be to avoid the
> inconvenience of adding it manually.

That's partially correct. I mentioned points 1 and 2 because most modern
system languages, such as C++ and Rust, mangle names to avoid conflicts.
However, I made a mistake by not including this solution in the
Alternatives section.

> > An important observation of the *ns* usage is that it must match the directory
> > structure. The path of a file that declares the namespace *olang.core.math*
> > must ends with *olang/core/math.ol*. This requirement is need for future
> > import resolution.
> > 
> > Alternatives:
> > -------------
> > 
> > 1. Automatically create namespaces based on the filename:
>
> I know we don't have written down nicely the goal of the language, but I
> prefer being explicit and avoid convention over configuration.

Agree! That's why namespaced files are great \o/

> > 2. Manual namespaces: ...
> > 
> > Conclusion
> > ----------
> > 
> > In my opinion, the introduction of a namespace statement offers numerous
> > benefits:
> > 
> > - It aids in resolving function name conflicts.
> > - It facilitates deterministic code generation while maintaining compatibility
> >   with C.
>
> The current suggestion doesn't solve the all compatibility with C.  We
> have to provide a way of calling a C function from olang code without
> namespacing (in case of namespace being mandatory).

Good catch! In my opinion, we should follow C's approach on this matter.

  extern fn pow(base: u32, power: u32): u32

In this case, the extern identifier matches exactly with the assembly
symbol. Please note that the extern statement is merely a semantic tool;
it does not generate any code.

Do you think that translating namespaces between C and olang is
necessary? In our arena implementation, all functions have the *arena_*
prefix. By using *extern* in the way I'm proposing, we will call these
functions in olang by their exact names. For instance, *arena_alloc*
will be invoked as *arena_alloc*, not just *alloc*. In my opinion, this
is acceptable since it is an external function.

> > - It simplifies the resolution of imports.
>
> I would suggest to not go much further with import resolution (unless
> you already want to define modules).  Perhaps we could have namespace
> doing nothing else than namespacing...
>

If by "modules" you are referring to the file level, and not to
something like packages or libraries, then that is exactly what I want
to define! Influenced by Clojure, I recommended calling it a
"namespace". However, I believe that naming it 'mod' or 'module' is more
suitable for its purpose.

  mod olang.core.math

  fn add(a: u32, b: u32): u32 {
    return a + b
  }

> > These advantages come with the minor stipulation of initiating all files with a
> > namespace statement, which, in my view, is a small price to pay for the
> > benefits gained.
>
> I'm not keen on the idea of enforcing strict adherence to the folder
> structure.
>
> How about we introduce a namespace block instead? Within this
> block, everything would automatically have the namespace added as a
> prefix. This could offer more flexibility while still maintaining
> organization.

Don't you think that in practice almost every single file will declare a
namespace? C++ follows this pattern: look at this Qt mirror [1], 6k
files, all namespaced. They even created a macro to facilitate the
work.

[1] https://github.com/search?q=repo%3Aradekp%2Fqt+%2FQT_BEGIN_NAMESPACE%5Cn%2F&type=code&p=1

> I think module has a different meaning.  If you want to have modules,
> for sure we have to discuss import resolution.  IMHO namespace shouldn't
> do anything else than namespacing.

I believe you're right. It's almost impossible to discuss modules
without bringing up imports. To me, the way C handles this is one of the
most painful things in my life (hehe).

The main issue is that I never know where something is coming from,
which is especially painful when I'm trying to replicate something I've
already done. This also often leads to unused includes over time
because it's hard to determine if an include is actually being used.

If we abandon modules and just go with C++-like namespaces, I believe
we may have to endure C's painful include system. This is because the
language won't have control over function names. The way the include
system is designed sends a message to developers that including a file
is akin to concatenating all the definitions into a single file. Still,
I think it would be OK to have named imports even if the language
doesn't control the names, though they would be just a semantic tool.

Named Imports
-------------

  mod myprog

  import olang.core.math

  fn main(): u32 {
    return olang.core.math::sum(1, 2)
  }

We could even associate an identifier with it.

  mod myprog

  import olang.core.math as ocm

  fn main(): u32 {
    return ocm::sum(1, 2)
  }

Note that there is no actual difference between mangling and my
module proposal, except that modules generate deterministic and
friendly names that can be easily used in C. Code generation is also
easy: to generate the assembly symbol of *olang.core.math::sum* we can
just replace dots with underscores and the double colon with double
underscores.
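
The replacement rule is small enough to sketch in a few lines of C. The helper
name *olang_symbol* and its signature are assumptions made for illustration,
not existing compiler code.

```c
#include <stddef.h>

/*
 * Writes the assembly symbol for a qualified olang name into `out`.
 * Rule from the text: '.' becomes '_' and "::" becomes "__".
 * Returns 0 on success, -1 if `out` is too small.
 * Hypothetical helper; the name and signature are assumptions.
 */
int olang_symbol(const char *qualified, char *out, size_t out_size) {
    size_t o = 0;

    for (size_t i = 0; qualified[i] != '\0'; i++) {
        char c = qualified[i];

        if (c == ':' && qualified[i + 1] == ':') {
            /* "::" -> "__": two output chars, both colons consumed. */
            if (o + 2 >= out_size) {
                return -1;
            }
            out[o++] = '_';
            out[o++] = '_';
            i++;
            continue;
        }

        /* Keep one byte free for the terminating NUL. */
        if (o + 1 >= out_size) {
            return -1;
        }
        out[o++] = (c == '.') ? '_' : c;
    }

    if (out_size == 0) {
        return -1;
    }
    out[o] = '\0';
    return 0;
}
```

Calling it with *olang.core.math::sum* fills the buffer with
*olang_core_math__sum*, the same name a C caller would declare.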

External Linkage
----------------

We probably don't want to make every function global for external
linkage. So we may need a visibility keyword. And to me, everything that
can be imported should also be available for external linkage, even if
we decide not to generate an object file per module. I would
recommend the usage of *export* or *pub*. I like *export* better.

Note that this is required no matter how we decide to handle imports.

No mangling
-----------

But what if I really need something to have the exact name? Let's say you
want to integrate with a bootloader that is integrated in the link process and
expects a symbol called *kmain* to jump into?

Ok, I admit, in that case namespaces are gonna be a pain in the ass. But
the good news is that these cases are exceptional: you don't need this for
your entire application but only for a few functions.

I was wondering whether we could have a *global* keyword, where everything
that is global keeps its own name and always has public visibility.

  mod myprog

  import olang.core.math as ocm

  global fn main(): u32 {
    return ocm::sum(1, 2)
  }

If we decide that we want both *mod* and *ns* in the language, we
could even have a global ns.

  mod myprog

  import olang.core.math as ocm

  ns global {
    fn main(): u32 {
      return ocm::sum(1, 2)
    }
  }

Summarizing
-----------

We have a few options on the table.

1. Use the names the way they are. (C approach)

Pros:
- Simple, no magic, it is what it is.
- Easy to produce debug info since the assembly symbol will be the
  function name.

Cons:
- More challenging to keep the code free of name conflicts in large
  codebases.
- Requires developers to manually namespace functions.

2. Use mangled names (C++, Rust approach)

Pros:
- Keep the code out of naming conflicts

Cons:
- Names are hard to predict: the mangling scheme depends on the compiler
  and ABI.
- Since the mangled name is not portable, you are required to use a
  no-mangle statement to integrate with C.
- More debug info is required.

3. Use modules (Zig approach (I think))

Pros:
- Keep the code out of naming conflicts
- Deterministic assembly symbols allow integration with C without any
  magic, you just need to follow the convention ns__fn.

Cons:
- If you really need to have a specific name for your function you're
  gonna need a no-mangle approach.
- More debug info is required.
- It is not entirely free of name conflicts: you can force a conflict by
  creating a function whose name starts with double underscores, which C
  discourages since these names are reserved.

I haven't talked much about Zig, but here is a fun fact: Zig uses
dots in its symbol names, which solves the name conflict I described
above, since you cannot create an identifier that contains a dot. But it
also makes C integration non-viable without an ABI.
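
One way such a conflict could arise is sketched below, assuming the label is
built as *{ns}__{fn}* exactly as described earlier. The helper *make_label* is
hypothetical and only concatenates the two parts.

```c
#include <stdio.h>

/*
 * Builds "{ns}__{fn}" into `out`, assuming `ns` already had its dots
 * replaced by underscores. Hypothetical helper for illustration only.
 */
void make_label(const char *ns, const char *fn, char *out, size_t out_size) {
    snprintf(out, out_size, "%s__%s", ns, fn);
}
```

Under this rule, the function *__b* in namespace *a* and the function *b* in
namespace *a__* both end up with the label *a____b*.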


* Re: [RFC] Namespaces in OLANG
  2024-03-28 13:41   ` Carlos Maniero
@ 2024-04-06 16:51     ` Johnny Richard
  2024-04-07 20:49       ` Carlos Maniero
From: Johnny Richard @ 2024-04-06 16:51 UTC
  To: Carlos Maniero; +Cc: ~johnnyrichard/olang-devel

Thanks for the reply. 

I understand you want to build a very complex build system and module
system because you are frustrated with how C tackles these problems.  To
be honest, I prefer to keep these problems stupid simple, as C does.  This
would also reduce the cost of integrating with C (I mean, almost no
effort).

But if you really want to define right now how we organize the files, no
problem: pick the one you think is best. I have no knowledge to
contribute here.  But I can share my thoughts on this anyway.

On Thu, Mar 28, 2024 at 10:41:20AM -0300, Carlos Maniero wrote:
> > > 2. Full Compatibility with C:
> > > 
> > >    It is completely compatible with C!
> > > 
> > >      int olang_core_math__add(int, int);
> > > 
> > >    If you think it is ugly to call a function that way directly, you can create
> > >    macros in C to improve the readability, but completely optional.
> > > 
> > >      #define NSMATH(name) olang_core_math__##name
> >
> > I think this is too ugly and very hack.  I would prefer to call
> > olang_core_math_add instead.
> 
> You quoted the entire text (which I truncated), do you mean the macro is
> hacky? Or everything? The macro is just a suggestion.

Yeah, I was talking about the macro suggestion.

> You mentioned that you would prefer to use *olang_core_math_add*. Did you
> mean *olang_core_math__add*, or are you opposed to the use of double
> underscores to separate the namespace from the identifier?

Not really.  I think it's okay if we keep the double underscore to
separate the namespace from the function.

> > > 3. Manual namespacing is inconvenient: 
> > > 
> > >    You don't need to manually namespace every function with the cost of start
> > >    every single file with a *ns* statement.
> >
> > If we keep managing names manually, we already have the *1* and *2* for
> > free.  So, the only benefit of namespacing would be to avoid the
> > inconvenience of adding it manually.
> 
> That's partially correct. I mentioned points 1 and 2 because most modern
> system languages, such as C++ and Rust, mangle names to avoid conflicts.
> However, I made a mistake by not including this solution in the
> Alternatives section.

Sure, no problems.
 
> > > An important observation of the *ns* usage is that it must match the directory
> > > structure. The path of a file that declares the namespace *olang.core.math*
> > > must ends with *olang/core/math.ol*. This requirement is need for future
> > > import resolution.
> > > 
> > > Alternatives:
> > > -------------
> > > 
> > > 1. Automatically create namespaces based on the filename:
> >
> > I know we don't have written down nicely the goal of the language, but I
> > prefer being explicit and avoid convention over configuration.
> 
> Agree! That's why namespaced files are great \o/

Not really.  I mean, adding a prefix to the binary symbol automatically
based on the directory structure is a kind of convention over configuration.

> > > 2. Manual namespaces: ...
> > > 
> > > Conclusion
> > > ----------
> > > 
> > > In my opinion, the introduction of a namespace statement offers numerous
> > > benefits:
> > > 
> > > - It aids in resolving function name conflicts.
> > > - It facilitates deterministic code generation while maintaining compatibility
> > >   with C.
> >
> > The current suggestion doesn't solve the all compatibility with C.  We
> > have to provide a way of calling a C function from olang code without
> > namespacing (in case of namespace being mandatory).
> 
> Good catch! In my opinion, we should follow C's approach on this matter.
> 
>   extern fn pow(base: u32, power: u32)

Sure, I am okay with this one.

> Do you think that namespaces translation in between C and olang are
> necessary? In our arena implementation, all functions have the *arena_*
> prefix. By using *extern* the way I'm proposing we will call these
> functions in olang with their exactly name, ie, *arena_alloc* will be
> called using *arena_alloc* not just *alloc*. IMO, it is ok since it is
> an external.

I prefer to follow exactly the same extern name in this case.

> > > - It simplifies the resolution of imports.
> >
> > I would suggest to not go much further with import resolution (unless
> > you already want to define modules).  Perhaps we could have namespace
> > doing nothing else than namespacing...
> 
> If by "modules" you are referring to the file level, and not to
> something like packages or libraries, then that is exactly what I want
> to define! Influenced by Clojure, I recommended calling it a
> "namespace". However, I believe that naming it ‘mod' or ‘module' is more
> suitable for its purpose.
> 
>   mod olang.core.math
> 
>   fn add(a: u32, b: u32) {
>     return a + b
>   }

Not sure I understand what you meant here. Maybe it would be better to
clarify what you understand by module.  This concept is still blurry to me.

> > > These advantages come with the minor stipulation of initiating all files with a
> > > namespace statement, which, in my view, is a small price to pay for the
> > > benefits gained.
> >
> > I'm not keen on the idea of enforcing strict adherence to the folder
> > structure.
> >
> > How about we introduce a namespace block instead? Within this
> > block, everything would automatically have the namespace added as a
> > prefix. This could offer more flexibility while still maintaining
> > organization.
> 
> Don't you think that in practice almost every single file will
> namespace? C++ follows this pattern, and look at this Qt mirror [1], 6k
> files, all namespaced, they even created a macro to facilitate the
> work.
> 
> [1] https://github.com/search?q=repo%3Aradekp%2Fqt+%2FQT_BEGIN_NAMESPACE%5Cn%2F&type=code&p=1

I prefer being explicit in this case (even if we need to write it in
every file).  If you land on a random file, you can tell which binary
symbols the code will translate to without checking the folder structure.
(Dir tree organization is completely up to the developers.)

> > I think module has a different meaning.  If you want to have modules,
> > for sure we have to discuss import resolution.  IMHO namespace shouldn't
> > do anything else than namespacing.
> 
> I believe you're right. It's almost impossible to discuss modules
> without bringing up imports. To me, the way C handles this is one of the
> most painful things in my life (hehe).
> 
> The main issue is that I never know where something is coming from,
> which is especially painful when I'm trying to replicate something I've
> already done. This also, often leads to unused includes over time
> because it's hard to determine if an include is actually being used.

Could you please enlighten me about the unused definitions problem?  Does
it have any performance implications?

If this is so bad, why can't we make the compiler smart enough to
detect them?

> If we abandon modules and just go with C++-like namespaces, I believe we
> we may want to endure C's painful include system. This is because the
> language won't have control over function names. The way the include
> system is designed sends a message to developers that including a file
> is akin to concatenating all the definitions into a single file. But
> yet I think it would be ok to have names imports even if we don't
> control the language names but it would be just a semantic tool.
> 
> Named Imports
> -------------
> 
>   mod myprog
> 
>   import olang.core.math
> 
>   fn main(): u32 {
>     return olang.core.math::sum(1, 2)
>   }
> 
> And even associate identifiers to it.
> 
>   mod myprog
> 
>   import olang.core.math as ocm
> 
>   fn main(): u32 {
>     return ocm::sum(1, 2)
>   }
> 
> Note that there is no actually difference in between mangling and my
> module purpose, except the fact modules generates deterministic and
> friendly names that can be easily used in C and also easy to gen code,
> once to generate the assembly symbol of *olang.core.math::sum* we can
> just replace dots by underscores and double column to double
> underscores.
> 
> External Linkage
> ----------------
> 
> We probably don't wanna to make all function global for external
> linkage.

Could you please clarify what you mean by external linkage?

> So we may need a visibility keyword.

C already has this functionality: the *static* keyword.
Everything else is "public" by default.  Do you see a problem with
following the same pattern?

> And to me, everything that can be imported should also be available
> for external linkage, even if we decide to do not generate an object
> file per module. I would recommend the usage of *export* or *pub*. I
> like *export* better.

Why does this have to be so complex?  It seems like we would have to
implement a very complex system to compile a program with multiple
files.  I would love to have something very simple where you compile
down to a `.o` file and the linking is done as simply as GAS or GCC
does it.

> Note that this is required no matter how we decide to handle imports.
> 
> No mangling
> -----------
> 
> But what if I really need something to have the exact name? Lets say you
> wanna integrate with a bootloader that is integrated on the link process and
> expects a symbol called *kmain* to jump into?
> 
> Ok, I admit, in that case namespaces it is gonna be a pain in the ass. But
> the good news is that these are exceptional, you don't need this for
> your entire application but only for a few functions.
> 
> I was wondering that we could have a *global* keyword where everything
> that is global assumes its own name and is always have public visibility.

You are assuming that every file has a mandatory namespace.  I still
prefer the namespace to be optional.  By default, no symbol would get a
prefix, so source names map 1 to 1 to ELF binary symbols.

IMHO this is what makes the language as simple as C.  Adding extra
functionality to protect the programmer from doing things, and then
enabling those things through a lot of gymnastics, feels like an
unpleasant experience.

> Summarizing
> -----------
> 
> We have a few options in the table.
> 
> 1. Use the names the way they are. (C approach)
> 
> Pros:
> - Simple, no magic, it is what it is.
> - Easy to produce debug info since the assembly symbol will be the
>   function name.

- No build gymnastics either. The *make* program would solve our
  problems for free (even with partial compilation).

> 
> Cons:
> - More challenge to keep the code out of name conflicts in large
>   codebases.
> - Requires developers to manually namespace functions.
> 
> 2. Use mangled names (C++, Rust approach)

Mangled names exist basically to enable function overloading.  I think
we don't want this functionality in the language.  I think we can ignore
this one.

> 3. Use modules (Zig approach (I think))
> 
> Pros:
> - Keep the code out of naming conflicts
> - Deterministic assembly symbols permit integrate with C without any
>   magic, you just need to follow the convention ns__fn.
> 
> Cons:
> - If you really need to have a specific name for your function you gonna
>   need a no-mangle approach.

Why? I mean you can still have the namespace and the freedom to use
namespace where you want.  Are you assuming the namespace is mandatory
here?

> - More debug info is required.
> - It is not entirely free of name conflicts, you can force a conflict by
>   create function that starts with double underscores which is not
>   recommended by C, since these names are reserved.


* Re: [RFC] Namespaces in OLANG
  2024-04-06 16:51     ` Johnny Richard
@ 2024-04-07 20:49       ` Carlos Maniero
  2024-04-07 20:58         ` Carlos Maniero
  0 siblings, 1 reply; 7+ messages in thread
From: Carlos Maniero @ 2024-04-07 20:49 UTC (permalink / raw)
  To: Johnny Richard; +Cc: ~johnnyrichard/olang-devel

I confess that this is my last attempt to explain what I'm
suggesting. I believe neither you nor I have the energy to continue
discussing this subject. So if you don't get it I believe we could just
drop it.

> > > > - It simplifies the resolution of imports.
> > >
> > > I would suggest to not go much further with import resolution (unless
> > > you already want to define modules).  Perhaps we could have namespace
> > > doing nothing else than namespacing...
> > 
> > If by "modules" you are referring to the file level, and not to
> > something like packages or libraries, then that is exactly what I want
> > to define! Influenced by Clojure, I recommended calling it a
> > "namespace". However, I believe that naming it ‘mod' or ‘module' is more
> > suitable for its purpose.
> > 
> >   mod olang.core.math
> > 
> >   fn add(a: u32, b: u32) {
> >     return a + b
> >   }
>
> Not sure if I understand what you meant here. Maybe would be better
> clarify what you understands as module.  This concept is still blurry to me.

module = namespaced file.

All symbols inside a file share the same namespace, which implies that
the symbols in the object file will start with the name you gave to the
module. I give some examples below.

> > > > These advantages come with the minor stipulation of initiating all files with a
> > > > namespace statement, which, in my view, is a small price to pay for the
> > > > benefits gained.
> > >
> > > I'm not keen on the idea of enforcing strict adherence to the folder
> > > structure.
> > >
> > > How about we introduce a namespace block instead? Within this
> > > block, everything would automatically have the namespace added as a
> > > prefix. This could offer more flexibility while still maintaining
> > > organization.
> > 
> > Don't you think that in practice almost every single file will
> > namespace? C++ follows this pattern, and look at this Qt mirror [1], 6k
> > files, all namespaced, they even created a macro to facilitate the
> > work.
> > 
> > [1] https://github.com/search?q=repo%3Aradekp%2Fqt+%2FQT_BEGIN_NAMESPACE%5Cn%2F&type=code&p=1
>
> I prefer being explicit in this case (even if we need to write it for
> every file).  If you land on a random file, you can say for what the
> binary the code will translate to without checking the folder structure.
> (Dir tree organization is completely up to the developers). 

The approach I suggested allows you to explicitly declare the module in
each file. This way, anyone who opens a file can immediately understand
what the code will translate to, without having to check the folder
structure.

  mod olang.core.math # <--- THIS LINE IS PART OF THE FILE

  fn add(a: u32, b: u32) {
    return a + b
  }

I never suggested we should automatically create namespaces based on the
file structure. Actually, I described this approach in the Alternative
section "1. Automatically create namespaces based on the filename". I
may not have made it clear enough, but I dislike it a lot.

> If you land on a random file, you can say for what the
> binary the code will translate to without checking the folder structure

In the code organization example I provided, the first line of each file
explicitly declares the module. Just check the first line and you are
done! Just like a Java/Go package declaration.

> (Dir tree organization is completely up to the developers). 

And it is!

Let's say you have this code organization:

  - shared/
    - arena.ol {mod arena}
    - list.ol {mod list}
    - hashmap.ol {mod hashmap}
  - fe/
    - ast.ol {mod fe.ast}
    - parser.ol {mod fe.parser}
  - main.ol {mod main}

Note that everything inside {} is the first line of the file. To compile
this program we could use the following command line:

  olang -I shared/ -o main main.ol

Assuming *main.ol* starts with the following lines:

  mod main

  import arena
  import list
  import hashmap
  import fe.ast
  import fe.parser

The olang compiler can automatically find the files inside the *fe*
directory, because they match the file structure. But *arena*, *list*,
and *hashmap* do not follow the file structure, so you have to specify
the directory where the compiler can find them (the *-I shared/* above).

The only convention is that dots in the module name mean slashes in the
path location. This is required to avoid the ambiguity of having slashes
meaning both division and module names.
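
To make the dots-to-slashes convention concrete, here is a minimal C
sketch. The helper name *module_to_path* is hypothetical and is not part
of olang; the *.ol* extension is taken from the example tree above. It
only builds the relative path; trying that path against the current
directory and each *-I* directory is left out.

```c
#include <stddef.h>

/*
 * Hypothetical helper (an assumption, not olang code): maps a module
 * name such as "fe.ast" to the relative path "fe/ast.ol".
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int
module_to_path(const char *module, char *out, size_t out_size)
{
    static const char ext[] = ".ol";
    size_t j = 0;

    /* Copy the module name, turning every dot into a slash. */
    for (size_t i = 0; module[i] != '\0'; i++) {
        if (j + 1 >= out_size) return -1;
        out[j++] = (module[i] == '.') ? '/' : module[i];
    }

    /* Append the file extension. */
    for (size_t i = 0; ext[i] != '\0'; i++) {
        if (j + 1 >= out_size) return -1;
        out[j++] = ext[i];
    }

    if (j >= out_size) return -1;
    out[j] = '\0';
    return 0;
}
```

With this sketch, *fe.ast* becomes *fe/ast.ol*, and *arena* becomes
*arena.ol*, which would then have to be found through an include
directory such as *shared/*.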

However, in my opinion, if a developer creates a file named
*veggies/zucchini.ol* and attempts to declare the module as *mod
veg.zuc*, it would lead to confusion and errors. This is because the
module name does not align with the file structure.

A practical solution in this scenario would be to rename the file to
*veg_zuc.ol* and declare the module as *mod veg_zuc*. This way, the
module name matches the filename, making it easier to understand and
manage.

When compiling, you would specify the directory of *veg_zuc.ol* in the
olang command. The generated symbols would remain largely the same (since
*veg.zuc* translates to *veg_zuc*), maintaining the integrity of the
structure.

In practice, it is up to you how you use modules. You can create modules
matching your entire project structure and the compiler will do the
work without any extra arguments, or you can create any structure you
want and declare the include directories.

> > > I think module has a different meaning.  If you want to have modules,
> > > for sure we have to discuss import resolution.  IMHO namespace shouldn't
> > > do anything else than namespacing.
> > 
> > I believe you're right. It's almost impossible to discuss modules
> > without bringing up imports. To me, the way C handles this is one of the
> > most painful things in my life (hehe).
> > 
> > The main issue is that I never know where something is coming from,
> > which is especially painful when I'm trying to replicate something I've
> > already done. This also, often leads to unused includes over time
> > because it's hard to determine if an include is actually being used.
>
> Could you please enlighten me the unused definitions problem?  Does it
> have any implications with performance?
>
> If this is so bad, why we cannot make the compiler smart enough to
> detect them?

I don't know what the implications are in terms of performance. I guess
the object file may be larger if the header file contains more than
prototypes. But if you don't find unused includes a big deal for code
maintenance, I have nothing else to say.

> > If we abandon modules and just go with C++-like namespaces, I believe we
> > we may want to endure C's painful include system. This is because the
> > language won't have control over function names. The way the include
> > system is designed sends a message to developers that including a file
> > is akin to concatenating all the definitions into a single file. But
> > yet I think it would be ok to have names imports even if we don't
> > control the language names but it would be just a semantic tool.
> > 
> > Named Imports
> > -------------
> > 
> >   mod myprog
> > 
> >   import olang.core.math
> > 
> >   fn main(): u32 {
> >     return olang.core.math::sum(1, 2)
> >   }
> > 
> > And even associate identifiers to it.
> > 
> >   mod myprog
> > 
> >   import olang.core.math as ocm
> > 
> >   fn main(): u32 {
> >     return ocm::sum(1, 2)
> >   }
> > 
> > Note that there is actually no difference between mangling and my
> > module proposal, except for the fact that modules generate
> > deterministic and friendly names that can be easily used in C and are
> > also easy to generate, since to generate the assembly symbol of
> > *olang.core.math::sum* we can just replace dots with underscores and
> > double colons with double underscores.
> > 
> > External Linkage
> > ----------------
> > 
> > We probably don't wanna to make all function global for external
> > linkage.
>
> Could you please clarify what you mean with external linkage?

  When you write an implementation file (.cpp, .cxx, etc) your compiler
  generates a translation unit. This is the source file from your
  implementation plus all the headers you #included in it.

  Internal linkage refers to everything only in scope of a translation
  unit.

  External linkage refers to things that exist beyond a particular
  translation unit. In other words, accessible through the whole program,
  which is the combination of all translation units (or object files).

  https://stackoverflow.com/questions/1358400/what-is-external-linkage-and-internal-linkage

But looking at your answer above, it seems you understood the concept.

> > So we may need a visibility keyword.
>
> The C already has this functionality, the *static* keyword...
> Everything else is "public" by default.  Do you see a problem on
> following the same pattern?

I prefer everything to be private by default to prevent leaking
implementation details, which also results in fewer symbols in the
object file.
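
For reference, this is how the distinction looks in C today, where the
default is the opposite of what I'm suggesting. It is only a sketch; the
function names are made up for illustration.

```c
/* Internal linkage: visible only inside this translation unit. */
static int
square(int x)
{
    return x * x;
}

/* External linkage (the default in C): other object files can call it. */
int
sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```

After compiling this to an object file, a tool like *nm* would typically
list *sum_of_squares* as a global symbol, while *square* is local (or
missing entirely if the compiler inlined it). With a private-by-default
rule, *square* would need no keyword and *sum_of_squares* would be the
one marked with *export* or *pub*.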

> > And to me, everything that can be imported should also be available
> > for external linkage, even if we decide to do not generate an object
> > file per module. I would recommend the usage of *export* or *pub*. I
> > like *export* better.
>
> Why this has to be so complex?  Seems like we would have to implement a
> very complex system for compile a program with multiple files.  I would
> love to have something very simple where you compile down to a `.o` file
> and the linking is done simple as GAS or GCC does.

I said, and I quote:

> > ... even if we decide to do not generate an object
> > file per module

"EVEN IF". I'm not suggesting we do anything other than create a *.o*
file. I'm not suggesting anything complex, and I don't know what made
you understand it that way.

Basically what I said is: let's not make anything public by default.

I just mentioned that we may not want to generate an object file per
module because olang has no header files, meaning that we always need to
examine all files of the project anyway. But it just added more noise to
the discussion, so you can ignore it.

>
> > Note that this is required no matter how we decide to handle imports.
> > 
> > No mangling
> > -----------
> > 
> > But what if I really need something to have the exact name? Lets say you
> > wanna integrate with a bootloader that is integrated on the link process and
> > expects a symbol called *kmain* to jump into?
> > 
> > Ok, I admit, in that case namespaces it is gonna be a pain in the ass. But
> > the good news is that these are exceptional, you don't need this for
> > your entire application but only for a few functions.
> > 
> > I was wondering that we could have a *global* keyword where everything
> > that is global assumes its own name and is always have public visibility.
>
> You are assuming that all files has a mandatory namespace.  I still
> prefer the namespace being optional.  The default behavior would be
> every symbol wont have prefix. (1 to 1) with ELF binary symbols.
>
> IMHO this is what makes the language simple as C.  Adding extra
> functionality to protect the programming doing stuff and enabling
> functionality with a lot of gymnastics feels like an unpleasant
> experience.

You are going to need a lot of gymnastics anyway. Actually, we do a lot
of gymnastics every single time we prefix a C function to avoid
conflicts. And many projects create macros to do the same job.
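
As an illustration of that kind of gymnastics, here is a minimal C
sketch. The *LIST_NS* macro and the function names are hypothetical,
loosely based on the *list.h*/*math.h* example from the first message.

```c
#include <stddef.h>

/* Hypothetical macro: prefixes every name owned by the "list" code. */
#define LIST_NS(name) list_##name

typedef struct {
    size_t count;
} LIST_NS(t); /* expands to list_t */

/* Expands to list_add, so it does not clash with math_add below. */
void
LIST_NS(add)(LIST_NS(t) *list)
{
    list->count++;
}

/* The same idea with the prefix written by hand. */
int
math_add(int a, int b)
{
    return a + b;
}
```

Both functions are conceptually just *add*; the prefix exists only to
keep the two symbols apart at link time.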

I'm proposing that the language should provide the appropriate tooling
for it. But I agree that we could make modules optional and you could
provide the include directory in this case.

> > Summarizing
> > -----------
> > 
> > We have a few options in the table.
> > 
> > 1. Use the names the way they are. (C approach)
> > 
> > Pros:
> > - Simple, no magic, it is what it is.
> > - Easy to produce debug info since the assembly symbol will be the
> >   function name.
>
> - No build gymnastics as well. The *make* program would solve our
>   problems for free (Even with partial compilation).

There are no gymnastics in the modules suggestion. You can use make. You
can do partial compilation. I never proposed the opposite.

> > 
> > Cons:
> > - More challenge to keep the code out of name conflicts in large
> >   codebases.
> > - Requires developers to manually namespace functions.
> > 
> > 2. Use mangled names (C++, Rust approach)
>
> The mangled names are basically to enable function overload.  I think we
> don't want this functionality on the language.  I think we can ignore
> this one.

It is used to "enable function overload" but it isn't the only thing it
solves. By mangled names I mean any strategy that generates names to
avoid conflicts. It could even be a random string.

  Name mangling

  The names used in object file symbol tables and in linking are often not
  the same names used in the source programs from which the object files
  were compiled. There are three reasons for this: avoiding name
  collisions, name overloading, and type checking. The process of turning
  the source program names into the object file names is called name
  mangling. This section discusses mangling typically done to names in C,
  Fortran, and C++ programs.

  https://archive.ph/20130126113557/http://www.iecc.com/linker/linker05.html#selection-481.0-487.90

>
> > 3. Use modules (Zig approach (I think))
> > 
> > Pros:
> > - Keep the code out of naming conflicts
> > - Deterministic assembly symbols permit integrate with C without any
> >   magic, you just need to follow the convention ns__fn.
> > 
> > Cons:
> > - If you really need to have a specific name for your function you gonna
> >   need a no-mangle approach.
>
> Why? I mean you can still have the namespace and the freedom to use
> namespace where you want.  Are you assuming the namespace is mandatory
> here?

As I answered earlier in this thread, a module is a namespaced file. If
you need a function called *zucchini* and this function lives inside a
file *veggies* declaring *mod veggies*, the *zucchini* symbol inside the
object file will be called *veggies__zucchini*.
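
The symbol rule (dots become underscores, the double colon becomes a
double underscore) can be sketched in a few lines of C. The helper name
*symbol_from_qualified_name* is hypothetical, and this is only one
possible way a compiler could implement it.

```c
#include <stddef.h>

/*
 * Hypothetical helper (an assumption, not olang code): converts a
 * qualified name such as "olang.core.math::sum" into the symbol
 * "olang_core_math__sum".
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int
symbol_from_qualified_name(const char *name, char *out, size_t out_size)
{
    size_t j = 0;

    for (size_t i = 0; name[i] != '\0'; i++) {
        if (name[i] == ':' && name[i + 1] == ':') {
            /* "::" separates module and function: emit "__". */
            if (j + 2 >= out_size) return -1;
            out[j++] = '_';
            out[j++] = '_';
            i++; /* skip the second ':' */
        } else {
            /* Dots inside the module name become single underscores. */
            if (j + 1 >= out_size) return -1;
            out[j++] = (name[i] == '.') ? '_' : name[i];
        }
    }

    if (j >= out_size) return -1;
    out[j] = '\0';
    return 0;
}
```

Given *veggies::zucchini* this produces *veggies__zucchini*, which is
the name a C program would declare in order to call the function.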


* Re: [RFC] Namespaces in OLANG
  2024-04-07 20:49       ` Carlos Maniero
@ 2024-04-07 20:58         ` Carlos Maniero
  2024-04-08  2:45           ` Carlos Maniero
  0 siblings, 1 reply; 7+ messages in thread
From: Carlos Maniero @ 2024-04-07 20:58 UTC (permalink / raw)
  To: Carlos Maniero, Johnny Richard; +Cc: ~johnnyrichard/olang-devel

> So if you don't get it I believe we could just
> drop it.
Let me try to fix my tone here! I was rude.

I mean, if we cannot agree, I think we could drop it and try to
discuss it again in the future.

I confess I started to feel offended when you called the solution
complex, when IMO it is actually pretty simple.


* Re: [RFC] Namespaces in OLANG
  2024-04-07 20:58         ` Carlos Maniero
@ 2024-04-08  2:45           ` Carlos Maniero
  0 siblings, 0 replies; 7+ messages in thread
From: Carlos Maniero @ 2024-04-08  2:45 UTC (permalink / raw)
  To: Carlos Maniero, Johnny Richard; +Cc: ~johnnyrichard/olang-devel

Johnny and I have been discussing this subject outside of this thread and it
seems we weren’t able to make much progress due to certain communication
anti-patterns.

Initially, I was attempting to address multiple issues simultaneously:

- Name collision
- Import resolution
- Namespaces
- External linking
- Assembly symbol generation

Even though these subjects are interconnected, it might be more effective to
discuss each one individually.

Additionally, I realized that I was somewhat attached to the solution I had
designed during my first attempt at creating a programming language (Mars).
This may not necessarily be the most suitable solution for our current
situation.

We are still trying to learn how to properly communicate over email. Given
these considerations, I believe it would be beneficial to start a new thread
where we can address these issues individually.


Code repositories for project(s) associated with this public inbox

	https://git.johnnyrichard.com/olang.git
