* [RFC SPEC] Primitive data types and arrays
From: Carlos Maniero @ 2024-04-08  3:29 UTC
To: ~johnnyrichard/olang-devel

This thread tries to specify the basic data types of olang.

Primitive data types:
=====================

Primitive types are types that can be held in a general-purpose register.

  s8  :  8-bit signed integer type.
  s16 : 16-bit signed integer type.
  s32 : 32-bit signed integer type.
  s64 : 64-bit signed integer type.
  u8  :  8-bit unsigned integer type.
  u16 : 16-bit unsigned integer type.
  u32 : 32-bit unsigned integer type.
  u64 : 64-bit unsigned integer type.
  f32 : 32-bit floating point type.
  f64 : 64-bit floating point type.

Translation to C:
-----------------

  s8  : int8_t
  s16 : int16_t
  s32 : int32_t
  s64 : int64_t
  u8  : uint8_t
  u16 : uint16_t
  u32 : uint32_t
  u64 : uint64_t
  f32 : float
  f64 : double

C also permits the use of type specifiers, such as signed int or short int.
However, this specification recommends omitting them for simplicity.
In my opinion, this approach is more intuitive. While the meaning of long int
can be ambiguous, there's no ambiguity with int32_t.

Example:
--------

  const x: u32 = 1

Grammar:
--------

  <primitive-type> ::= 's8' | 's16' | 's32' | 's64'
                     | 'u8' | 'u16' | 'u32' | 'u64'
                     | 'f32' | 'f64'

Arrays:
=======

An array is a fixed-size collection of similar data items stored in contiguous
memory locations. It can be used to store a collection of primitive data
types such as int, char, float, etc., and also derived and user-defined data
types, structures, etc.

Example:
--------

  const x: u32[] = [1]
  const y: u32[2] = [1, 2]

Grammar:
--------

  <array-type>   ::= <type> <ows> '[' <ows> <number>* <ows> ']'
  <array-assign> ::= '['
                       <ows>
                       (<expression>
                         (<ows> ',' <ows> <expression>)*
                       )?
                       <ows>
                     ']'

Open question:
--------------

I have no idea how to initialize an array with a value. In C I know that this
is allowed:

  int arr[20] = {0};

But I think this is ambiguous, since if I remove the number 20 from the
statement above it will give me a one-sized array.

Translation to C:
-----------------

An olang array is just like a C array, no need for translation. Although it
differs from C by using square brackets rather than curly brackets. That way we
could easily tell arrays from structs.
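A minimal C sketch of the proposed mapping and of the open question about `{0}`. The typedef names simply reuse the olang type names; mapping f32 and f64 to `float` and `double` is an assumption of this sketch (it holds on platforms that use IEEE-754 binary32/binary64), and the two helper functions are hypothetical, added only to make the array behaviour observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aliases: olang primitive names mapped to C types. */
typedef int8_t   s8;
typedef int16_t  s16;
typedef int32_t  s32;
typedef int64_t  s64;
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;
typedef float    f32;   /* assumption: float is IEEE-754 binary32 */
typedef double   f64;   /* assumption: double is IEEE-754 binary64 */

/* Fail at compile time if the platform does not match the assumed widths. */
static_assert(sizeof(s8)  == 1 && sizeof(u8)  == 1, "8-bit types");
static_assert(sizeof(s16) == 2 && sizeof(u16) == 2, "16-bit types");
static_assert(sizeof(s32) == 4 && sizeof(u32) == 4, "32-bit types");
static_assert(sizeof(s64) == 8 && sizeof(u64) == 8, "64-bit types");
static_assert(sizeof(f32) == 4, "f32 assumes a 32-bit float");
static_assert(sizeof(f64) == 8, "f64 assumes a 64-bit double");

/* With an explicit size, {0} zero-fills all 20 elements. */
size_t zero_count_explicit(void) {
    u32 arr[20] = {0};
    size_t zeros = 0;
    for (size_t i = 0; i < sizeof arr / sizeof arr[0]; i++) {
        if (arr[i] == 0) {
            zeros++;
        }
    }
    return zeros;
}

/* Without a size, the same initializer yields a one-element array. */
size_t inferred_length(void) {
    u32 arr[] = {0};
    return sizeof arr / sizeof arr[0];
}
```

The two functions show the ambiguity raised in the open question: the initializer is identical, and only the declared size decides whether it means "fill with zeros" or "one element equal to zero".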
* Re: [RFC SPEC] Primitive data types and arrays
From: Johnny Richard @ 2024-04-12  7:32 UTC
To: Carlos Maniero; +Cc: ~johnnyrichard/olang-devel

On Mon, Apr 08, 2024 at 12:29:11AM -0300, Carlos Maniero wrote:
> This thread tries to specify the basic data types of olang.
>
> Primitive data types:
> =====================
>
> Primitive types are types that can be held in a general-purpose register.
>
>   s8  :  8-bit signed integer type.
>   s16 : 16-bit signed integer type.
>   s32 : 32-bit signed integer type.
>   s64 : 64-bit signed integer type.
>   u8  :  8-bit unsigned integer type.
>   u16 : 16-bit unsigned integer type.
>   u32 : 32-bit unsigned integer type.
>   u64 : 64-bit unsigned integer type.
>   f32 : 32-bit floating point type.
>   f64 : 64-bit floating point type.
>
> Translation to C:
> -----------------
>
>   s8  : int8_t
>   s16 : int16_t
>   s32 : int32_t
>   s64 : int64_t
>   u8  : uint8_t
>   u16 : uint16_t
>   u32 : uint32_t
>   u64 : uint64_t
>   f32 : float
>   f64 : double

I loved it. Out of curiosity, we are going to have _boolean_ and _char_,
I believe. Shouldn't they also be included in this primitive spec?

> C also permits the use of type specifiers, such as signed int or short int.
> However, this specification recommends omitting them for simplicity.
> In my opinion, this approach is more intuitive. While the meaning of long int
> can be ambiguous, there's no ambiguity with int32_t.

I agree.

> Arrays:
> =======
>
> An array is a fixed-size collection of similar data items stored in contiguous
> memory locations. It can be used to store a collection of primitive data
> types such as int, char, float, etc., and also derived and user-defined data
> types, structures, etc.
>
> Example:
> --------
>
>   const x: u32[] = [1]
>   const y: u32[2] = [1, 2]
>
> Grammar:
> --------
>
>   <array-type>   ::= <type> <ows> '[' <ows> <number>* <ows> ']'
>   <array-assign> ::= '['
>                        <ows>
>                        (<expression>
>                          (<ows> ',' <ows> <expression>)*
>                        )?
>                        <ows>
>                      ']'
>
> Open question:
> --------------
>
> I have no idea how to initialize an array with a value. In C I know that this
> is allowed:
>
>   int arr[20] = {0};
>
> But I think this is ambiguous, since if I remove the number 20 from the
> statement above it will give me a one-sized array.

Yeah, I see... With the syntax you proposed, I suggest the following
syntax to initialize all elements to zero:

  const arr: u8[2] = [...0]

It should only work for arrays with an explicitly declared size.

> Translation to C:
> -----------------
>
> An olang array is just like a C array, no need for translation. Although it
> differs from C by using square brackets rather than curly brackets. That way we
> could easily tell arrays from structs.

Sure. I like it. I think we can split this RFC up into different RFCs:
one for discussing primitive types and another one for discussing arrays
in general. What do you think?

There are other topics we can also discuss about arrays, for example:

Array access by index
---------------------

The brackets in C have weird behaviour on arrays when accessed by index,
as listed below:

  int xs[] = { 1, 2 };

  xs[0] = 1;
  0[xs] = 1;

  xs[1] = 2;
  1[xs] = 2;

The weird array access with the number as the first element works
because C does pointer arithmetic in the end.

  xs[i] == *(xs + i)
  i[xs] == *(i + xs)

I like the C simplicity of translating it to pointer arithmetic, but I
think the access using the number first is too weird. We should avoid it,
I believe.
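The index equivalence described above can be checked directly in C. This is a small sketch; the function names are hypothetical, and the second function shows only one possible C lowering of the suggested `[...0]` syntax, which is an assumption and not something the thread has decided.

```c
#include <assert.h>
#include <stddef.h>

/* Reads element i of a two-element array in three equivalent ways. */
int demo_index(size_t i) {
    int xs[] = { 1, 2 };
    int a = xs[i];        /* the usual spelling */
    int b = *(xs + i);    /* what the subscript means in C */
    int c = i[xs];        /* legal C: the same as *(i + xs) */
    assert(i < 2);
    assert(a == b && b == c);
    return a;
}

/* Hypothetical lowering of `const arr: u8[2] = [...0]`:
 * with an explicit size, C's {0} zero-fills every element. */
unsigned fill_sum(void) {
    unsigned char arr[2] = {0};
    return (unsigned)arr[0] + (unsigned)arr[1];
}
```

Because addition is commutative, `*(xs + i)` and `*(i + xs)` name the same element, which is why C accepts `i[xs]`. A language that wants to reject the number-first form has to do so in its grammar or type checker rather than rely on the pointer translation.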
* Re: [RFC SPEC] Primitive data types and arrays
From: Carlos Maniero @ 2024-04-13  2:51 UTC
To: Johnny Richard; +Cc: ~johnnyrichard/olang-devel

> I loved it. Out of curiosity, we are going to have _boolean_ and _char_,
> I believe. Shouldn't they also be included in this primitive spec?

I like it! We could discuss in the near future whether or not they are
just type aliases for u8. But I also agree they must be built-in, without
the need for any include.

  <primitive-type> ::= 's8' | 's16' | 's32' | 's64' | 'u8'
                     | 'u16' | 'u32' | 'u64' | 'f32' | 'f64'
                     | 'char' | 'bool'
  <expression>     ::= <integer> | <identifier> | <boolean> | <char>
  <boolean>        ::= "true" | "false"
  <char>           ::= "'" #'.' "'"

Are you comfortable with the above grammar?

> Sure. I like it. I think we can split this RFC up into different RFCs:
> one for discussing primitive types and another one for discussing arrays
> in general. What do you think?

LGTM! Let's conclude this one first and then I'll start a new thread.
* Re: [RFC SPEC] Primitive data types and arrays
From: Johnny Richard @ 2024-04-13 23:31 UTC
To: Carlos Maniero; +Cc: ~johnnyrichard/olang-devel

On Fri, Apr 12, 2024 at 11:51:51PM -0300, Carlos Maniero wrote:
> > I loved it. Out of curiosity, we are going to have _boolean_ and _char_,
> > I believe. Shouldn't they also be included in this primitive spec?
>
> I like it! We could discuss in the near future whether or not they are
> just type aliases for u8. But I also agree they must be built-in, without
> the need for any include.
>
>   <primitive-type> ::= 's8' | 's16' | 's32' | 's64' | 'u8'
>                      | 'u16' | 'u32' | 'u64' | 'f32' | 'f64'
>                      | 'char' | 'bool'
>   <expression>     ::= <integer> | <identifier> | <boolean> | <char>
>   <boolean>        ::= "true" | "false"
>   <char>           ::= "'" #'.' "'"

Perhaps _char_ SHOULD have support for escaped chars like \r (carriage
return), \n (line feed)... Whenever you create the patch, don't forget it.

> Are you comfortable with the above grammar?

I am wondering if we should also define _void_ as a primitive.
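One way a lexer could handle the escaped chars mentioned above is a small lookup from the letter after the backslash to the byte it stands for. This is only a sketch: `decode_escape` is a hypothetical helper, and the set of supported escapes (\n, \r, \t, \0, \\ and \') is an assumption, since the thread has not fixed a list.

```c
#include <assert.h>

/* Hypothetical helper: given the character that follows a backslash in a
 * char literal, return the byte it denotes, or -1 if it is not supported. */
int decode_escape(char c) {
    switch (c) {
    case 'n':  return '\n';  /* line feed */
    case 'r':  return '\r';  /* carriage return */
    case 't':  return '\t';  /* horizontal tab */
    case '0':  return '\0';  /* NUL */
    case '\\': return '\\';  /* literal backslash */
    case '\'': return '\'';  /* literal single quote */
    default:   return -1;    /* unknown escape: let the caller report it */
    }
}

/* Small self-check of the table above. */
void decode_escape_selfcheck(void) {
    assert(decode_escape('n') == 10);
    assert(decode_escape('r') == 13);
    assert(decode_escape('x') == -1);
}
```

Returning -1 for unknown escapes keeps the decision about error reporting in the caller, so the same table could serve both char and (later) string literals.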
* Re: [RFC SPEC] Primitive data types and arrays
From: Carlos Maniero @ 2024-04-16  3:40 UTC
To: Johnny Richard; +Cc: ~johnnyrichard/olang-devel

> Perhaps _char_ SHOULD have support for escaped chars like \r (carriage
> return), \n (line feed)... Whenever you create the patch, don't forget it.

Sure! I will cover this. Thanks!

> > Are you comfortable with the above grammar?
>
> I am wondering if we should also define _void_ as a primitive.

I think so. Do you like the name *void*? I don't like it that much, but I
can't think of any alternative.
* Re: [RFC SPEC] Primitive data types and arrays
From: Johnny Richard @ 2024-04-16 18:34 UTC
To: Carlos Maniero; +Cc: ~johnnyrichard/olang-devel

On Tue, Apr 16, 2024 at 12:40:50AM -0300, Carlos Maniero wrote:
> > I am wondering if we should also define _void_ as a primitive.
>
> I think so. Do you like the name *void*? I don't like it that much, but I
> can't think of any alternative.

Let's go with _void_. We are at a very early development stage,
everything can change anytime. And _void_ is a very well-known
keyword.
* Re: [RFC SPEC] Primitive data types and arrays
From: ricardo_kagawa @ 2024-04-17  1:30 UTC
To: ~johnnyrichard/olang-devel

> An olang array is just like a C array, no need for translation. Although it

Are you sure about this? I mean, as a contiguous, properly sized chunk
of memory with indexed access, it looks fine. But in C, an array
variable decays to a pointer to that chunk of memory in most
expressions, and therefore pointer arithmetic would be required to
match C arrays.

I'm not sure I'd like to deal with pointers. But it's not like I can't,
it's just that I know it opens a nasty can of worms that I'm not sure
you'd want to deal with as a language designer.

> > I loved it. Out of curiosity, we are going to have _boolean_ and _char_,
> > I believe. Shouldn't they also be included in this primitive spec?
>
> I like it! We could discuss in the near future whether or not they are
> just type aliases for u8. But I also agree they must be built-in, without
> the need for any include.

I like the idea of treating `boolean` and `char` as primitives, but do
be careful about what they mean.

Obviously, `boolean` can be either `true` or `false`, but what should
that mean? If `boolean` is mapped to `u8`, then zero and non-zero?

But the real question is what would `char` be? If the language should
support Unicode properly, then `char` would represent a _code unit_
rather than a "character", which could be considered a misnomer. Since
Unicode uses variable-length characters, a Unicode character might be
difficult to represent as just `char`.

If no Unicode support is planned, then `char` as `u8` is good enough to
represent characters in 7-bit ASCII encoding.

> > Perhaps _char_ SHOULD have support for escaped chars like \r (carriage
> > return), \n (line feed)... Whenever you create the patch, don't forget it.
>
> Sure! I will cover this. Thanks!

If you have plans to support Unicode, then I'd also suggest including
hexadecimal and Unicode escapes, like `\x20` and `\uffef`.

> > > I am wondering if we should also define _void_ as a primitive.
> >
> > I think so. Do you like the name *void*? I don't like it that much, but I
> > can't think of any alternative.
>
> Let's go with _void_. We are at a very early development stage,
> everything can change anytime. And _void_ is a very well-known
> keyword.

Note that in most languages where there is a `void` type, the `void`
type is not actually valid in variable declarations. It is valid only
in function return types. In C, it is also valid as a pointer type
(that is, `void* x;` is valid), but IIRC, not as a variable type
(`void x;` is not valid).

In the current version of the spec, it would be included in
<return-type>, rather than <type>, to allow it only as a function
return type.

Also, there are three other types that might be interesting, if I may
suggest: `never` (from TypeScript [1]), `unit` (from functional-like
languages [2]) and `null` (from ECMAScript specs [3]).

[1]: https://www.typescriptlang.org/docs/handbook/2/functions.html#never
[2]: https://en.wikipedia.org/wiki/Unit_type
[3]: https://tc39.es/ecma262/multipage/overview.html#sec-null-value

- `never` would not be that useful without an exception system.

- `unit` and `null` would not make much sense at the same time, so it is
  either one or the other.

- `null` would also be more interesting with union types (TypeScript),
  to define nullable types as the union of a non-nullable type and the
  `null` type. (C has union types, but they are not related to this.)

- I don't really know why an empty tuple would be interesting as the
  value for the `unit` type, but several languages use this convention.
  In ECMAScript specs, there is a `null` type that uses the `null`
  value as its unit value.
* Re: [RFC SPEC] Primitive data types and arrays
From: Carlos Maniero @ 2024-04-18 21:53 UTC
To: ricardo_kagawa, ~johnnyrichard/olang-devel

About arrays, Johnny has suggested talking about arrays in a different
thread. I'm just waiting for us to conclude this discussion and I'll start
another thread to define Olang's array specification. But we brought up
excellent points, and maybe we should define pointers before arrays.

> Obviously, `boolean` can be either `true` or `false`, but what should
> that mean? If `boolean` is mapped to `u8`, then zero and non-zero?

IMO, true should be 1 and false 0, in a way that *1 == true* is true and
*2 == true* is false. Control flow structures may accept anything, not
just booleans, and may apply the non-zero approach you described, but we
can discuss this in their own RFC (which does not exist yet).

> But the real question is what would `char` be? If the language should
> support Unicode properly, then `char` would represent a _code unit_
> rather than a "character", which could be considered a misnomer. Since
> Unicode uses variable-length characters, a Unicode character might be
> difficult to represent as just `char`.
>
> If no Unicode support is planned, then `char` as `u8` is good enough to
> represent characters in 7-bit ASCII encoding.

I'll be honest with you: everything you said makes a lot of sense, and
making a char a u8 seems to enforce a Western-Eurocentrism in Olang. But I
confess that I never stopped to learn more about Unicode.

At the same time, while I think we should support a 32-bit Unicode char,
I don't want to make every char a u32; I'd like to keep support for ASCII
encoding.

IMO, we should either postpone specifying a char right now or assume
that a char at this point represents an ASCII char and start a new RFC
about Unicode where we may define something like a Unicode char.

BTW, you seem well versed in Unicode theory, would you like to
propose a mechanism to deal with Unicode?

> Note that in most languages where there is a `void` type, the `void`
> type is not actually valid in variable declarations. [...]
>
> In the current version of the spec, it would be included in
> <return-type>, rather than <type>, to allow it only as a function
> return type.

Agree!

> Also, there are three other types that might be interesting, if I may
> suggest: `never` (from TypeScript [1]), `unit` (from functional-like
> languages [2]) and `null` (from ECMAScript specs [3]).
>
> [1]: https://www.typescriptlang.org/docs/handbook/2/functions.html#never
> [2]: https://en.wikipedia.org/wiki/Unit_type
> [3]: https://tc39.es/ecma262/multipage/overview.html#sec-null-value

They seem to be very specific; we may want to wait until we find a
use for them.
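The boolean rule proposed above (equality compares against 1, while control flow treats any non-zero value as truthy) happens to be how C behaves with <stdbool.h>, so it can be illustrated there. The function names are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Equality compares values: `true` is 1, so only 1 equals it. */
int equals_true(int v) {
    return v == true;
}

/* Control flow uses the zero / non-zero rule instead. */
int is_truthy(int v) {
    if (v) {
        return 1;
    }
    return 0;
}

/* Small self-check: 2 is truthy, yet it is not equal to true. */
void bool_selfcheck(void) {
    assert(is_truthy(2) == 1);
    assert(equals_true(2) == 0);
}
```

The gap between the two functions for any value other than 0 and 1 is the part a future control-flow RFC would need to settle, for instance by forbidding implicit conversion from integers to booleans.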
* Re: [RFC SPEC] Primitive data types and arrays
From: ricardo_kagawa @ 2024-04-24 16:23 UTC
To: ~johnnyrichard/olang-devel; +Cc: carlos

> About arrays, Johnny has suggested talking about arrays in a different
> thread. I'm just waiting for us to conclude this discussion and I'll start
> another thread to define Olang's array specification. But we brought up
> excellent points, and maybe we should define pointers before arrays.

My point was not that you should include pointers in the language, but
that you should not match C arrays as you have specified. Of course,
you can include pointers if that was your plan all along, but I would
rather you did not.

> > Obviously, `boolean` can be either `true` or `false`, but what should
> > that mean? If `boolean` is mapped to `u8`, then zero and non-zero?
>
> IMO, true should be 1 and false 0, in a way that *1 == true* is true and
> *2 == true* is false. Control flow structures may accept anything, not
> just booleans, and may apply the non-zero approach you described, but we
> can discuss this in their own RFC (which does not exist yet).

I have my issues regarding that, but let's wait for that new thread.

> > But the real question is what would `char` be? If the language should
> > support Unicode properly, then `char` would represent a _code unit_
> > rather than a "character", which could be considered a misnomer. Since
> > Unicode uses variable-length characters, a Unicode character might be
> > difficult to represent as just `char`.
> >
> > If no Unicode support is planned, then `char` as `u8` is good enough to
> > represent characters in 7-bit ASCII encoding.
>
> I'll be honest with you: everything you said makes a lot of sense, and
> making a char a u8 seems to enforce a Western-Eurocentrism in Olang. But I
> confess that I never stopped to learn more about Unicode.
>
> At the same time, while I think we should support a 32-bit Unicode char,
> I don't want to make every char a u32; I'd like to keep support for ASCII
> encoding.

This is exactly how I feel, except I would stick to UTF-16 (this is what
JS uses).

Unicode would be a lot more complex to deal with, and totally overkill
if you don't have plans to support non-ASCII characters as primitives.
But if you do have plans to support it, it might be better to at least
avoid making assumptions that could make it difficult to transition to
it later.

> IMO, we should either postpone specifying a char right now or assume
> that a char at this point represents an ASCII char and start a new RFC
> about Unicode where we may define something like a Unicode char.

My intent was actually to make you postpone the definition of the `char`
type until you have considered this carefully enough. You don't have to
decide that right now, but you also don't have to define the `char` type
right now either.

But if you do intend to support Unicode as `char`, then I would not make
it something separate from ASCII, as Unicode is a superset of ASCII. Not
a problem if you intend to support Unicode as a separate library (as in
C), but I feel it would be weird to have both ASCII and Unicode as
primitives if you already have ASCII included in Unicode.

> BTW, you seem well versed in Unicode theory, would you like to
> propose a mechanism to deal with Unicode?

I am not that well versed, I just have a user-level knowledge of
Unicode. What I would propose, however, is to look at languages that
natively support Unicode, like JS. More precisely, not just copy what
they do, but also look at what they did wrong and try to do better.

In C, `char` is assumed ASCII (it is not actually, but sort of can be)
and Unicode seems to be supported through a standard library (I have
never used Unicode in C, but I suspect it is related to "wide chars",
at least).

> > Also, there are three other types that might be interesting, if I may
> > suggest: `never` (from TypeScript [1]), `unit` (from functional-like
> > languages [2]) and `null` (from ECMAScript specs [3]).
>
> They seem to be very specific; we may want to wait until we find a
> use for them.

Yeah, I am not suggesting that you include these right now (or at all),
just that you take them into consideration. I don't know where you are
planning to go with your language's design, as details are still lacking
at this point.
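To make the UTF-16 option mentioned above concrete, here is a sketch of how one code point becomes one or two 16-bit code units. `utf16_encode` is a hypothetical name; the arithmetic follows the standard surrogate-pair scheme, and nothing here is an olang decision.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes code point cp into out[0..1].
 * Returns the number of 16-bit code units written (1 or 2),
 * or 0 if cp is not a valid Unicode scalar value. */
int utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFF) {
        return 0;                       /* beyond the Unicode range */
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                       /* reserved for surrogates */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* fits in a single code unit */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Small self-check: 'A' needs one unit, U+1F600 needs two. */
void utf16_selfcheck(void) {
    uint16_t u[2];
    assert(utf16_encode(0x41, u) == 1);
    assert(utf16_encode(0x1F600, u) == 2);
}
```

The two-unit case is why a 16-bit `char` would still be a code unit and not a whole character, which is the misnomer raised earlier in the thread.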
* Re: [RFC SPEC] Primitive data types and arrays
From: Johnny Richard @ 2024-04-20 11:45 UTC
To: ricardo_kagawa; +Cc: ~johnnyrichard/olang-devel

On Tue, Apr 16, 2024 at 10:30:03PM -0300, ricardo_kagawa@disroot.org wrote:
> > An olang array is just like a C array, no need for translation. Although it
>
> Are you sure about this? I mean, as a contiguous, properly sized chunk
> of memory with indexed access, it looks fine. But in C, an array
> variable decays to a pointer to that chunk of memory in most
> expressions, and therefore pointer arithmetic would be required to
> match C arrays.
>
> I'm not sure I'd like to deal with pointers. But it's not like I can't,
> it's just that I know it opens a nasty can of worms that I'm not sure
> you'd want to deal with as a language designer.

I really would like to know what you see as nasty. I mean, don't you
want to deal with pointers in general? Or do you want to segregate the
concepts of arrays and pointers?

> > > I loved it. Out of curiosity, we are going to have _boolean_ and _char_,
> > > I believe. Shouldn't they also be included in this primitive spec?
> >
> > I like it! We could discuss in the near future whether or not they are
> > just type aliases for u8. But I also agree they must be built-in, without
> > the need for any include.
>
> I like the idea of treating `boolean` and `char` as primitives, but do
> be careful about what they mean.
>
> Obviously, `boolean` can be either `true` or `false`, but what should
> that mean? If `boolean` is mapped to `u8`, then zero and non-zero?

That's exactly what I had in mind.

Which problems do you see with this approach?

> But the real question is what would `char` be? If the language should
> support Unicode properly, then `char` would represent a _code unit_
> rather than a "character", which could be considered a misnomer. Since
> Unicode uses variable-length characters, a Unicode character might be
> difficult to represent as just `char`.
>
> If no Unicode support is planned, then `char` as `u8` is good enough to
> represent characters in 7-bit ASCII encoding.

Could you please enlighten me on the implications of starting with `char`
as a `u8` alias (7-bit ASCII)? What are the problems we could have if we
don't support Unicode properly?

> > Let's go with _void_. We are at a very early development stage,
> > everything can change anytime. And _void_ is a very well-known
> > keyword.
>
> Note that in most languages where there is a `void` type, the `void`
> type is not actually valid in variable declarations. It is valid only
> in function return types. In C, it is also valid as a pointer type
> (that is, `void* x;` is valid), but IIRC, not as a variable type
> (`void x;` is not valid).

I'm okay with not using void pointers as long as we have a replacement
for them. I still want to have support for defining a raw (untyped)
pointer.

> In the current version of the spec, it would be included in
> <return-type>, rather than <type>, to allow it only as a function
> return type.

Yeah, I like it.

> Also, there are three other types that might be interesting, if I may
> suggest: `never` (from TypeScript [1]), `unit` (from functional-like
> languages [2]) and `null` (from ECMAScript specs [3]).
>
> [1]: https://www.typescriptlang.org/docs/handbook/2/functions.html#never
> [2]: https://en.wikipedia.org/wiki/Unit_type
> [3]: https://tc39.es/ecma262/multipage/overview.html#sec-null-value
>
> - `never` would not be that useful without an exception system.

The language won't have exceptions.

> - `null` would also be more interesting with union types (TypeScript),
>   to define nullable types as the union of a non-nullable type and the
>   `null` type. (C has union types, but they are not related to this.)
>
> - I don't really know why an empty tuple would be interesting as the
>   value for the `unit` type, but several languages use this convention.
>   In ECMAScript specs, there is a `null` type that uses the `null`
>   value as its unit value.

I think this approach leads us to design a complex type system. I
understand the value of this, but the cost is high when you want to
design a simple language.

Regarding `null`, I would like to have `null` as an alias for 0 (zero).
And we could also have semantic analysis on it. In this case `null`
won't be a proper type.
* Re: [RFC SPEC] Primitive data types and arrays
From: ricardo_kagawa @ 2024-04-24 18:45 UTC
To: ~johnnyrichard/olang-devel

> > > An olang array is just like a C array, no need for translation. Although it
> >
> > Are you sure about this? I mean, as a contiguous, properly sized chunk
> > of memory with indexed access, it looks fine. But in C, an array
> > variable decays to a pointer to that chunk of memory in most
> > expressions, and therefore pointer arithmetic would be required to
> > match C arrays.
> >
> > I'm not sure I'd like to deal with pointers. But it's not like I can't,
> > it's just that I know it opens a nasty can of worms that I'm not sure
> > you'd want to deal with as a language designer.
>
> I really would like to know what you see as nasty. I mean, don't you
> want to deal with pointers in general? Or do you want to segregate the
> concepts of arrays and pointers?

Both, in fact. As soon as you include pointers in the language, all
memory positions in the process become fair game, even for mutation.
Which also means that your compiler will not be able to enforce that
`const` variables are in fact immutable. It can at most check that known
bindings are not assigned to, but they could still be mutated through
pointers. Even literal values (such as those coming from preprocessor
macros) could be mutated at runtime.

Any visibility modifier would become advisory rather than compulsory
from this point. You just have to locate the correct memory address and
write new values or instructions to it.

But an array does not need to be handled through pointers as in C,
unless you specifically say that arrays match the C implementation.
Languages without pointers still handle arrays just fine, they just
don't match C arrays.

> > Obviously, `boolean` can be either `true` or `false`, but what should
> > that mean? If `boolean` is mapped to `u8`, then zero and non-zero?
>
> That's exactly what I had in mind.
>
> Which problems do you see with this approach?

My issue only lies in the lack of a proper definition of what a
`boolean` is, not (yet) in how it is implemented (as that was not yet
mentioned).

Processors don't handle single bits very well, so an implementation of
the `boolean` type will likely use at least `u8` (and possibly a full
word). But that would have to map at least 256 values to a type that
only has 2, and obviously there are many ways to do just that.

AFAIK, processors should have a "branch if not zero" instruction to
handle conditionals based on zero/non-zero booleans, which would make
this approach one of the better options (and probably why C uses this
definition, even though C had no boolean type until C99 added `_Bool`).

On the other hand, I'm not sure that you should allow implicit
coercions between `boolean` and `u8` (or between almost any types),
even if internally they would be equivalent.

> > But the real question is what would `char` be? If the language should
> > support Unicode properly, then `char` would represent a _code unit_
> > rather than a "character", which could be considered a misnomer. Since
> > Unicode uses variable-length characters, a Unicode character might be
> > difficult to represent as just `char`.
> >
> > If no Unicode support is planned, then `char` as `u8` is good enough to
> > represent characters in 7-bit ASCII encoding.
>
> Could you please enlighten me on the implications of starting with `char`
> as a `u8` alias (7-bit ASCII)? What are the problems we could have if we
> don't support Unicode properly?

You don't have to support Unicode at all at any time, but transitioning
from ASCII to Unicode may not be exactly trivial, since Unicode
characters have varying width, unlike ASCII, which is fixed to 7 bits.

And I'm not saying "varying width" as in "there exist the UTF-8/16/32
variants". Even UTF-8 can represent all Unicode characters, despite
using 8-bit code units that look basically like ASCII.

That is possible because the first bits of each character are used to
represent its width. If the first bit is zero, then the remaining 7 bits
are in the ASCII range. Otherwise, the number of consecutive 1-bits
before the first 0-bit in the first byte is the number of bytes in the
character, so an encoded character takes between one and four bytes
(the original design allowed up to six). Each of the following bytes
starts with the bits `10`. The remaining bits, after stripping those
markers, represent the character's code point, which may be larger than
what a single 16-bit unit can hold.

UTF-16 uses a different mechanism (surrogate pairs), but the overall
idea is the same, and it can also represent every Unicode code point.

And besides having varying width, Unicode characters also have the
issues of normalization and validity. Validation seems somewhat simple,
but possibly expensive, since AFAIK you just need to check that
multi-byte characters are not truncated.

Unicode strings are not required to be normalized, but you would need to
implement normalization for users to be able to compare them. Some
glyphs may have multiple machine representations, and normalization
converts any representation to one in particular, so that the more
efficient byte-by-byte comparisons yield correct results.

For example, there is a "latin small letter a with tilde" character (ã),
which is equivalent to the character "latin small letter a" followed by
a "combining tilde" character. They represent the same glyph, but have
different binary representations, even of different sizes, and would
not be considered equal when testing string equality.

All that is to say that depending on what assumptions you make while
implementing `char` as ASCII, it might be relatively easy or very
demanding to transition from ASCII to Unicode. Despite Unicode being a
superset of ASCII, the way they work is quite different, especially at
low level.

> > > Let's go with _void_. We are at a very early development stage,
> > > everything can change anytime. And _void_ is a very well-known
> > > keyword.
> >
> > Note that in most languages where there is a `void` type, the `void`
> > type is not actually valid in variable declarations. It is valid only
> > in function return types. In C, it is also valid as a pointer type
> > (that is, `void* x;` is valid), but IIRC, not as a variable type
> > (`void x;` is not valid).
>
> I'm okay with not using void pointers as long as we have a replacement
> for them. I still want to have support for defining a raw (untyped)
> pointer.

Or you could also add `void` to a future <pointer-type>. Just don't add
it to <type>. But I'd rather not have pointers.

> > In ECMAScript specs, there is a `null` type that uses the `null`
> > value as its unit value.
>
> I think this approach leads us to design a complex type system. I
> understand the value of this, but the cost is high when you want to
> design a simple language.
>
> Regarding `null`, I would like to have `null` as an alias for 0 (zero).
> And we could also have semantic analysis on it. In this case `null`
> won't be a proper type.

No, I think if you define `null` as an alias for 0 as in C, you won't
have the ability to perform semantic analysis on it. In C, the
preprocessor will replace the identifier `NULL` with a literal such as
`0` _before_ semantic analysis.

AFAIK, NULL only works in C as an invalid memory position to
intentionally cause segmentation faults when access is attempted. And
that is not even handled by the C compiler, it is an error from the
operating system. C will just let you access the zero address, and if
the operating system says it's okay, then it's okay.

In Java there is no preprocessor (unless you count the annotation
processor as a preprocessor), so `null` is a proper value, despite not
having a proper type. As such, it is available for semantic analysis by
the Java compiler.

All classes in Java are assumed "nullable", which works relatively well
since all variables of non-primitive types are reference types, where
a "null pointer" makes sense, even with (or especially with) C-like
semantics. In this case, there is a JVM that can deny access to the
`null` address and throw a `NullPointerException`.

As I usually say, the complexity doesn't really disappear, it is just
moved somewhere else. Even if you don't include this check in the type
system, it will be included somewhere else (perhaps manually, in code),
as people will still need to perform this check all the time.
Especially if you decide to incorporate pointers.

For example, in TS, types are not nullable by default, so as long as
the type definitions are sound, testing for `x !== null` is usually not
needed. But in JS, where the interpreter does not check types
statically, you always have to check for `x !== null` at runtime before
using `x`. JS is simpler, but this complexity does not really go away.

On the other hand, if you do incorporate pointers, the non-nullability
of types becomes advisory rather than compulsory, so perhaps not that
useful in this case.
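Two of the Unicode points in the message above can be sketched in C: reading the width of a UTF-8 sequence from its first byte, and the two byte representations of ã that normalization is meant to reconcile. The names are hypothetical, and the length function is deliberately incomplete: it only inspects the prefix bits, so it does not reject overlong or out-of-range lead bytes such as 0xC0 or 0xF5. Current UTF-8 (RFC 3629) stops at four bytes.

```c
#include <assert.h>
#include <string.h>

/* Returns the sequence length announced by a UTF-8 lead byte (1 to 4),
 * or 0 if the byte is a continuation byte or has an unsupported prefix.
 * Only the prefix bits are inspected; this is not a full validator. */
int utf8_seq_len(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;  /* 0xxxxxxx: ASCII range */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* 10xxxxxx or 11111xxx */
}

/* Two UTF-8 spellings of the same glyph, "ã". */
const char a_tilde_precomposed[] = "\xC3\xA3";    /* U+00E3 */
const char a_tilde_decomposed[]  = "a\xCC\x83";   /* U+0061 U+0303 */

/* Byte-wise comparison sees two different strings (returns 0). */
int a_tilde_forms_equal(void) {
    return strcmp(a_tilde_precomposed, a_tilde_decomposed) == 0;
}

/* Small self-check of the sketch above. */
void utf8_selfcheck(void) {
    assert(utf8_seq_len(0xC3) == 2);
    assert(a_tilde_forms_equal() == 0);
}
```

The precomposed form is two bytes and the decomposed form is three, so even their lengths differ. A string type that promises meaningful equality would have to normalize before comparing, or document that it compares bytes.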
Code repository for the project associated with this public inbox:
https://git.johnnyrichard.com/olang.git