Hi,
Schol-R-LEA wrote:
Brendan wrote:
For this case; the compiler would see "x = x + 1;" and evaluate the range of the result of the expression on the right hand side (if x ranges from 0 to 999 then x+1 must have a range from 1 to 1000). Then (for assignment) the compiler checks that the left hand side is able to store that range of values, and generates a compile time error because x can only store a value from 0 to 999.
The programmer would have to fix the error. This might mean doing "x = (x + 1) % (x.max + 1)" if they want wrapping, or doing "x = min(x+1, x.max);" if they want saturation, or adding a check immediately before it, or adding a check somewhere else entirely, or increasing the range of values that x can hold, or changing it to "x2 = x + 1;", or ....
If the programmer does happen to do something like "if(x >= x.max) { return FAILED; }" then this is no different to any branch in any language. It's not a run-time check inserted by the compiler itself.
OK, this gives me some idea of your intentions. From this, I gather your intent is that the compiler flag an error (or at least a warning) on invalid possibles. This seems reasonable to me, and I had much the same thing in mind (though only for builds with a version status of 'release-candidate' and above). It also tells me that you are differentiating from the compiler inserting runtime checks automatically, and the compiler requiring the programmer to include manually inserted runtime checks.
This last part is of interest to me, as I am not sure I see them as being as clear-cut as you seem to be asserting. For example, it would seem (to me) that there is no reason this
couldn't be automated, at least partially. For example, let us consider the possibility of allowing the client-programmer to define range types (either as named types or as one-off subtypes of the integer type) in which a specific behavior is defaulted:
Code:
(def Foo-Range (constrainted-type Integer
(cond (((< _ -250) (set! _ -250) _)
((>= _ 750) (raise-overflow-exception! _))
(else _)))))
(This is just off the top of my head, but sort of what I have in mind; I would probably have a specific ranged-type c'tor that would cover this common case:
Code:
(def Foo-Range
(ranged-type Integer
:underflow #saturate
:overflow (raise-overflow-exception! _)))
or something like it.) Similar things could be done with ranged types in (for example) Ada, though IIRC in that case the default (and only) behavior is to raise a standard exception. Furthermore, using an explicit range could allow the compiler (or library macros, in my case) to automatically optimize the variable's memory footprint (by using a 16-bit value instead of 64-bit one, for example). Of course, range checking is just one example, but the point is, there are ways in which this can be automated which would still give the client-programmer fine control over the handling of edge cases.
I don't just have ranged types, I only have ranged types.
For creating integers it's like "range 0 to 255 myInteger", but (for convenience) the compiler also lets you specific a range by the number of bits (e.g. "s7" is a synonym for "range -64 to 63", and "u12" is a synonym for "range 0 to 4095").
For creating floating point variables it's similar but different because there's both precision and range. The precision is specified in bits and the range is like it is for integers - e.g. "f24 range 0 to 1 myFloaty". It's also possible to set the range by specifying the exponent size in bits, so "f24e8" is equivalent to a single precision (32-bit) float.
In general this has nothing to do with how much space variables consume and programmers shouldn't know or care about storage. For example if you have an integer "range 100000000 to 100000255 foo" then the compiler can store it in 8-bits or anything else it feels like.
For structures, the compiler is free to do anything it wants. For example, if you have this:
Code:
struct {
range 100000000 to 100000255 foo
u32 bar
range 1 to 7 dayOfWeek
range 1 to 31 day
range 1 to 12 month
}
Then the compiler might use 8 bits for "foo"; then decide to pack "dayOfWeek", "day" and "month" into a bitfield; and then (for alignment purposes) re-order the fields so you end up with this:
Code:
struct {
u32 bar
u16 dayOfMonth : 3
u16 day : 5
u16 month : 4
u8 foo
}
Of course the language doesn't support bitfields, as there's no real reason to bother with them.
For cases where the exact layout in memory matters (e.g. for file formats and messaging protocols in normal process, and for things like page tables in kernels and memory mapped IO in device drivers) there's "rigid structures". For these the compiler has to follow a set of strict rules - no padding for alignment (other than rounding to the nearest whole byte), no field re-ordering, everything little-endian (even on big-endian machines), no tricks to reduce sizes (that "range 100000000 to 100000255 foo" variable would be 32 bits), etc.
Schol-R-LEA wrote:
Also, the premise of pre-analyzing all possible paths has the potential of running into a combinatorial explosion. At some point, the compiler would have to simply give up and reject the code, agreed? That may make certain kinds of code impossible to write with such restrictions in place. This is not a very likely scenario, so I don't know if it is worth giving too much weight to it, but it would have to addressed at least in the documentation and error reporting.
No
A function's signature is a contract. If you write "range 1 to 9 myFunction(range -100 to 300 y)" then the compiler checks that the function is capable of handling all possible values of y from -100 to 300 correctly. If the function is called, the compiler only has to check that the caller complies with the contract (the caller provides a value from -100 to 300). This means that individual functions can be checked in isolation, and checked in any order (and checked in parallel, possibly by multiple computers on a LAN).
Also note that for local variables the overflow checking can work in reverse. Normally for assignments the compiler ensures that the left hand side variable can handle the range of results from the right hand expression and complains if it doesn't. However, if you say that a local variable has the "auto" type then the compiler initially assumes the variable's range is from 0 to 0, and instead of complaining if there's an overflow it just increases the variable's range.
Schol-R-LEA wrote:
There's another case which would be problematic (well, in most languages, anyway - there are languages where it is possible to programmatically generate types at runtime, but the overhead is quite steep), which is where the range itself is set at runtime. There really would be no clear-cut way to check that ahead of time in a statically-typed language, so the compiler would in your scenario require every access to the value to be explicitly tested. Since this is precisely the scenario we are looking to avoid, it is hard to see how not testing automatically would be of benefit. Again, an unlikely scenario, but something to give consideration to.
I gather, you would consider explicit checks preferable in any of these scenarios, is this correct?
You'd have to choose a "max. size" type and the compiler will only ensure that all your code handles the range of the max. size type correctly. Anything beyond that (e.g. if you want to limit values to a sub-range at run-time, or only want to store prime numbers, or only odd numbers, or whatever else) is "domain logic" (your problem) and not "correctness" (compiler's problem).
Schol-R-LEA wrote:
Brendan wrote:
Schol-R-LEA wrote:
Note that I have not given the context of this use case - neither whether it is part of a loop condition or simply some arbitrary part of the program logic, nor whether the increment or the range are explicitly defined in the program or not. Context may be relevant to your solution, I will grant, but I want to first consider the general case before moving on to specific cases.
The context only really effects what the compiler thinks the previous range of values in x could be.
I would say it can affect a good deal more than that. For example, if the value is the index of a definite loop, and the compiler detects that the body of the code had no effect, then the loop itself (including the increment) can be optimized away.
OK, that's not really a fair example, but consider (as an example) a counting loop where the index is not explicitly used for any other purpose; depending on the processor, it may be possible to reverse the increment to a decrement in order to use a simpler exit condition, or replace the loop with with non-conditional repetition (e.g., REPZ MOV RSI, RDI). Whether it would be possible to find such potential optimizations (and know when they would make a difference) might not be feasible, but that's not the point: the point is that context can affect how you compile a particular part of a program.
Things like syntax checks, grammar/semantic checks, type checks, and overflow and precision checks happen first. Optimisations only happen after checks are done.
Schol-R-LEA wrote:
Brendan wrote:
Basically it comes down to a design choice. A compiler may:
- guarantee there are no false positives (e.g. overflows) at compile time; which makes it impossible to avoid false negatives (e.g. "nuisance" errors) at compile time, or
- guarantee there are no false negatives (e.g. "nuisance" errors) at compile time; which makes it impossible to avoid false positives (e.g. overflows) at compile time
The first option is what I'm planning. It's harder to write the compiler and makes things a little more annoying for programmers when they write code.
Excellent, this gives us all a lot better understanding of your intentions and motivations, I think.
For this aspect of it, yes.
Cheers,
Brendan