OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Re: [language design] My working notes on Thelema
PostPosted: Tue Nov 28, 2017 9:30 am 

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1057
Location: Athens, GA, USA
OK, so now for the digression about hypertext, and specifically about Project Xanadu and xanalogical storage.

Project Xanadu, for those unfamiliar with it, was one of the first explorations (if not the first) of the concepts of hypertext and hypermedia, both terms coined by the originator of the project, Ted Nelson. The project informally began in 1960, taking an ever more solid shape throughout the 1960s and 1970s despite the skepticism, indifference, and even obstructionism of others, and after a half dozen or more iterations, it is still ongoing today - you can see the latest project here. Ted claims that it is finally a working system, after years of his critics declaring it the longest-running vaporware ever and of it being the butt of many industry jokes. For all that there is now a light at the end of the tunnel, though, it has so far fallen far short of its intentions, due to forces often out of his control.

It was a very different idea from the modern World Wide Web, even though one of Ted's books, Literary Machines, was a primary influence on Tim Berners-Lee and his design (though contrary to what Nelson has said, not the prime inspiration - Berners-Lee had been working on both SGML document formats, and other forms of hypertext, for at least three years before the book was published, and was originally only intending HTML and HTTP as a means to share research papers among scientists with something approximating citations).

Actually, much of the confusion about it is that it is a combination of many ideas, most of which would seem unrelated to most people. This is where both the strength, and the weakness, of Nelson's vision lies, in that people rarely see the connections he does - and connections are at the heart of his ideas.

And yes, Nelson (like myself) has ADHD. Quite severely, in fact. He sees it as a strength, rather than a problem, but in terms of getting support for the project, and seeing it through to the end, it has been crippling, which is unfortunate, because it is also what led him to it in the first place. Perhaps more than anything, Xanadu was Ted's attempt to find a prosthesis with which to grapple with the breadth of his interests and ideas, a breadth born out of his 'butterfly mind'.

Anyway, all of this is prologue. The point is that while Project Xanadu encompasses a wide range of ideas, many of which have since spread out into the computer field in separate pieces and in distorted forms, one piece that hasn't caught on is the idea that Files Are Evil.

OK, a bald statement like that, so typical of Ted, is going to take some explaining. I know, I know, I promise I will get to the point eventually, but digressions are a big part of all of this, and this probably won't be the last one here.

What Ted means when he talks about 'the tyranny of the file' is that the conventional, hierarchical model of files as separate entities, which need to be kept track of both by the file system and the user, is a poor fit for how the human mind actually works with information, and in particular, that it obscures the relationships between ideas. This applies to both conventional file handling, and to file-oriented hypertext/hypermedia systems like the World Wide Web.

It is here that Ted loses most people, because to most people, he is mixing up different levels of things - and Ted would even agree, but his view of what those levels are is quite different from the one most people are familiar with. Basically, where most people see separate documents, which might refer to each other through citations or hyperlinks but are fundamentally separate, he sees swarms of ideas which can be organized in endless ways and viewed through many lenses, of which the 'documents' are just one possible view, and not an especially fundamental one at that.

Now, this will seem somewhat familiar to those of you who have some experience with relational databases, and in fact Ted took a look at RDBMS ideas in the late 1990s, concluding that they were on the right track, but still blinkered by their assumptions about what data 'really is'.

To his eyes, there is no 'really is'. He views information as a continuum - what he calls a docuverse - and his primary frustration is in the fact that everyone else is (by his estimation) trying to impose their ideas of what the pieces of that continuum are, rather than letting them float free for anyone to view as they choose. He sees Xanadu as an attempt to approximate that free-floating continuum - he is trying to reduce the amount of inherent structure in order to make variant structures easier to find.

Getting these ideas across is really, really difficult, especially since (again, like myself) he often leaves the best parts in his own mind, making it look like he's jumping all over the place and skipping steps.

He does that, too, but most of that impression comes from things he has so well-set in his own mind that he forgets that other people haven't heard them yet. This is a trap that is far too easy for a visionary to fall into, and while he is aware of the problem and does strive to avoid it, it is one which is hard for anyone to notice until it is brought to their attention - and sadly, few have had the patience to do so.

Moving on to the next post, which discusses the back-end and front-end parts of Xanadu; I need to gloss those a bit before explaining how this all ties into my language ideas.

_________________
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
μή εἶναι βασιλικήν ἀτραπόν ἐπί γεωμετρίαν
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.


Last edited by Schol-R-LEA on Tue Nov 28, 2017 2:05 pm, edited 8 times in total.

 Post subject: Re: [language design] My working notes on Thelema
PostPosted: Tue Nov 28, 2017 10:54 am 

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1057
Location: Athens, GA, USA
OK, let's go over how Xanadu is intended to work, and how I intend to apply the ideas, if not always the methods, thereof.

As I already said, the point of Xanadu is to replace files. Much of what Nelson & Co. are doing can be done with existing file systems, and in fact several of the iterations of the project ran on top of existing file systems or RDBMSes, but doing it with existing files involves a lot of ad-hoc work. Xanadu, at least in the file-system based forms, is essentially a library for doing just that, though part of how it does that is to create what amounts to a separate database system on top of the existing ones - which is why the real eventual goal was to implement it in a more stand-alone form, working on the media directly rather than through the file system.

In the 1988 and 1993 designs of Xanadu (and presumably in OpenXanadu, though despite the name not all the code is exposed yet AFAICT), there is supposed to be a Back End and a Front End, with the Front-End/Back-End (FEBE) protocol between them. The BE manages the storage and retrieval of data fragments, both locally and remotely, keeps track of what is where and which things are stored or mirrored locally, and maintains coherence across distributed storage and caching. It is a hairy piece of work, and while its operations are secondary to the goals of Xanadu, it is at the heart of the implementation of those goals.

My understanding is that the FE handles the decisions about which fragments to request at a given time. Note that the FE is a front end to the applications, not some equivalent of a browser - it is primarily an API for the BE, though it does do some management of the data views as well.

Presentation to the user is up to the applications themselves, and to the display manager, which is supposed to permit various ways of displaying connections between applications to the user. At this point, you can probably see part of what most people get confused by, as there is no single 'browser' anywhere in all of this.

Basically, when a new datum is created - whether it is a 'text document', a 'spreadsheet document', a 'saved game record', an 'image', or what have you - the FE passes it to the BE, which encrypts it in some way and then writes it to some storage medium - possibly as part of a journal that contains data from several other applications and users.

Along with the data, the FE passes the BE information about the datum and its source. If it is part of a larger 'document' - which is usually going to be the case - it includes information about the document, including a link to the address of any related data, and how they are related. For example, for a 'word processing' application, it might pass a link to the datum which was, at the time of editing, the immediate predecessor of the datum being stored, and note that the datum was (again, when created) the successive item in a larger document.

The BE catalogs each of the recently written data according to the format the datum is in, its size, the user who created it, local date and time of creation, the application it originated in, the encryption type - all things that a conventional file system may or may not record - but also the publication status (and later, publication history), the current (and later, previous) owners/maintainers of the datum, the current location it is stored in, and whether to mirror it elsewhere (which is the default for most things).

Up to this point, it looks normal. Here is where it changes course a bit.

The BE generates a permanent address link for the datum, one which is independent of its current location in storage. This is a key point, because the storage location itself is only ephemeral to the system - while the datum is meant to be treated as immutable, and the parent copy should never be overwritten, the actual physical image of the datum in the storage medium isn't the datum itself. This is also why, for networked systems, automatic mirroring is the default (and why its being encrypted - and the fact that the encryption methods can vary from datum to datum or even copy to copy of the same datum - is important).
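To make the 'permanent address versus storage location' split a bit more concrete, here is a minimal sketch in Python. It is purely illustrative and not taken from any Xanadu code: the class name `BackEnd`, its methods, and especially the choice to derive the address from a SHA-256 hash of the content are all assumptions made for the example (encryption is left out entirely).

```python
import hashlib


class BackEnd:
    """Toy back end: permanent addresses map to movable storage locations."""

    def __init__(self):
        self._blobs = {}        # slot number -> bytes (stands in for the media)
        self._resolution = {}   # permanent address -> set of slot numbers
        self._next_slot = 0

    def _write(self, data: bytes) -> int:
        slot = self._next_slot
        self._next_slot += 1
        self._blobs[slot] = data
        return slot

    def store(self, data: bytes) -> str:
        # Assumption: the permanent address is a hash of the content.
        address = hashlib.sha256(data).hexdigest()
        if address not in self._resolution:
            self._resolution[address] = {self._write(data)}
        return address

    def fetch(self, address: str) -> bytes:
        # Any physical copy will do; the caller never sees which one.
        slot = next(iter(self._resolution[address]))
        return self._blobs[slot]

    def mirror(self, address: str) -> None:
        # Add another physical copy; the address does not change.
        self._resolution[address].add(self._write(self.fetch(address)))

    def relocate(self, address: str) -> None:
        # Move every copy to a fresh slot, as a cache or compactor might.
        data = self.fetch(address)
        old_slots = self._resolution[address]
        self._resolution[address] = {self._write(data) for _ in old_slots}
        for slot in old_slots:
            del self._blobs[slot]


be = BackEnd()
addr = be.store(b"In Xanadu did Kubla Khan")
be.mirror(addr)
be.relocate(addr)
print(be.fetch(addr) == b"In Xanadu did Kubla Khan")  # True
```

The only thing anything outside the BE ever holds is `addr`; the slot numbers can change freely underneath it.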

A large part of this is to abstract away, from the perspective of the FE, the applications, and the user, the process of storing, transmitting, mirroring, and caching the data. As far as everything outside of the BE is concerned, the datum is (or should be) immutable and eternal, approximating a Platonic essence of the idea it encodes. The reality is obviously more complicated, but the system is meant to bend over backwards to maintain that illusion, across the entire 'docuverse' straddling the network.

(So far, it hasn't quite managed this, and perhaps never will, but in terms of its goals, it goes further than any other system that I know of.)

Now, you may have noted that I haven't talked about links, hyper or otherwise, yet. This is where things go even further out of the norm, because the Xanadu idea of a 'hyperlink' has nothing much at all to do with the hyperlinks of things like the WWW.

In Xanadu, there are several types of links, most of which are not directly related to how the datum is presented to the user. The particular kind of links in consideration right now might be called 'resolution links', which describe the physical location(s) of the data; and 'association links', which store how two or more data relate to each other (these aren't the terms used by the Project, but explaining their terms would take hundreds of pages, and I only know a fraction of the terminology myself). The former are ephemeral, relating to the specific physical storage, and are stored as the equivalent of a FAT or an i-node structure, while the latter are permanent, and have their own resolution links when they are themselves stored.

Some of the types of association links are:

  • 'span links', which refer to a slice or section out of the datum, allowing just the relevant sections to be referenced in documents or transferred across a network, without having to serve the whole document - the 'whole document' is itself just a series of different kinds of association links.
  • 'permutation-of-order links', which are used to manipulate the structure of the document, creating a view - or collection of views - which can themselves be stored and manipulated. This relates to the immutability of data - rather than changing the data when updating, the FE permutes the order of the links that make up the 'document' or view, and passes that permutation to the BE to record it. This, among other things, serves as both persistent undo/redo, and as version control.
  • 'structuring links', which describe the layout of the view independent of the data and the ordering thereof. This acts as out-of-band markup, among other things - the markup is not part of the datum itself.
  • 'citation links', which represent a place where a user wanted to record a connection between two ideas. This link associates bi-directionally, and has its own separate publication and visibility which is partially dependent on that of the data - an application, and hence a user, can view any citation link if and only if they have permission to view both the citation and all of the data it refers to. There may also be 'meta-citations' which aggregate several related citations, but I don't know if that was something actually planned or just something discussed - since citations are themselves data, and all data are first-class citizens, such a meta-citation would just be a specific case of a view.
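As a rough illustration of the first two link types, here is a small Python sketch. Everything in it (the `Span` and `Document` names, the `DATA` table, the short string addresses) is hypothetical and only meant to show the shape of the idea: the data never change, a document is an ordered series of span links, and an edit records a new ordering rather than overwriting anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A span link: the slice [start, end) of one immutable datum."""
    address: str
    start: int
    end: int


# Immutable data, keyed by (made-up) permanent addresses.
DATA = {
    "d1": "Files are evil. ",
    "d2": "Documents are only views. ",
}


def render(view):
    """Resolve an ordered tuple of span links into text."""
    return "".join(DATA[s.address][s.start:s.end] for s in view)


class Document:
    """A document is an append-only history of views over the same data."""

    def __init__(self, view):
        self.history = [tuple(view)]

    @property
    def current(self):
        return self.history[-1]

    def permute(self, order):
        """Record a new view that reorders (or drops) the current spans."""
        self.history.append(tuple(self.current[i] for i in order))

    def at(self, version):
        """Any earlier view is still there: undo and version control in one."""
        return self.history[version]


doc = Document([Span("d1", 0, len(DATA["d1"])), Span("d2", 0, len(DATA["d2"]))])
doc.permute([1, 0])
print(render(doc.current))  # Documents are only views. Files are evil.
print(render(doc.at(0)))    # Files are evil. Documents are only views.
print(render([Span("d1", 0, 5)]))  # Files
```

Structuring and citation links would be further records pointing at the same addresses; they are left out here to keep the sketch short.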

It is important to recall that the 'views' in question are to the applications and the display manager. They can then organize the actual user display based on those views into the data as needed. The same data - or even the same views - may be shown as part of a 'text document' by one application, as a set of spreadsheet cells by another, or composed with some image in yet another. This is why markup is out-of-band, and why structuring links applied to a given set of data are stored for later use by the applications.

There are still other links for recording the history of the datum's ownership and publication status, connecting a data format to one or more means of interpreting the format or transposing it into another format, indirection (to allow for updating of views - since most links available to the FE are immutable, these allow for the equivalent of a VCS repo's 'HEAD' branch, allowing the applications to fetch whatever the latest version of a document is and separating 'currently published' from 'previously published'), tracking where copies of a given datum can be found for the purposes of caching and Torrent-like network distribution, and so forth, but most of those are only for use internally by the BE.

When the new datum is created as part of some new document, a new association link is created to connect it to that document, which is then passed back to the FE for use by the application. The FE then creates a permutation link for the document, incorporating the datum into the document link traces, which is then passed back to the BE for storage.

Moving on...



Last edited by Schol-R-LEA on Tue Nov 28, 2017 2:26 pm, edited 2 times in total.

 Post subject: Re: [language design] My working notes on Thelema
PostPosted: Tue Nov 28, 2017 11:27 am 

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1057
Location: Athens, GA, USA
OK, so I've covered all this stuff about how xanalogical storage is meant to work, so I can pop that off the stack and talk about the plans for my languages, or more particularly, for my compiler and toolchain.

Basically, my plan is to have an editor that performs the lexical analysis on the fly, and saves the programs not as text, but as a link trace of meta-tokens. The lexical analyzer would still be available as a separate tool, which the editor would be calling as a library, so the compiler could potentially use source code from other editors, but that's going in a different direction than I have in mind.

What is a meta-token, you ask? Well, in part it is a token in the lexical analysis sense - a lexeme which the syntax analyzer can operate on. The 'meta' part comes from the fact that the datum it references does not need to be a specific text string - it can, in principle at least, be any kind of data at all, provided that the syntax analyzer can interpret it in a meaningful way.

Also, a meta-token may be associated with more than one value, allowing for alternate representations of the syntactic structure - provided that the syntax analyzer agrees that the different representations have the same meaning. So what the meta-token really is is a way of associating a representation with a syntactic meaning that was set by the syntax analyzer when the meta-token was created.

I expect that you can see why I am talking of a 'syntax analyzer' rather than 'parser'.

Now, this does complicate the editing a bit - there has to be a way to differentiate between 'change the name/representation of this particular variable globally' and 'change from this variable to a new one or a different one, just for this particular part of the program', among other things. But it also opens up a lot of possibilities that would be a lot less feasible with the conventional 'plain text' model of source code.
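Here is a minimal Python sketch of the distinction between those two kinds of edit. The `MetaToken` class and the two helper functions are invented for the illustration and say nothing about how the real editor would store things; the point is only that the program holds references to meta-tokens, not strings.

```python
from dataclasses import dataclass, field


@dataclass
class MetaToken:
    """One syntactic entity with any number of surface representations."""
    kind: str                                             # e.g. "variable"
    representations: dict = field(default_factory=dict)   # presentation -> text


def show(tokens, presentation="default"):
    """Display a token sequence in one presentation."""
    return " ".join(t.representations[presentation] for t in tokens)


def rename_globally(token, new_name, presentation="default"):
    """Change how one entity is displayed; every use follows automatically."""
    token.representations[presentation] = new_name


def rebind_locally(tokens, index, replacement):
    """Point a single use at a different entity; returns a new sequence."""
    return tokens[:index] + [replacement] + tokens[index + 1:]


total = MetaToken("variable", {"default": "total"})
count = MetaToken("variable", {"default": "count"})
program = [total, count, total]

rename_globally(total, "sum")
print(show(program))            # sum count sum

edited = rebind_locally(program, 2, count)
print(show(edited))             # sum count count
print(show(program))            # sum count sum
```

The global rename touches one object and no program text at all, while the local change produces a new sequence and leaves the old one intact, which fits the immutable-data model described earlier.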

For example, if the editor treats some parts of the program structure as 'markup' rather than 'syntax' - for example, indentation, newlines, delimiters for things like string literals or the beginning and ending of lexical blocks - then the same code could be edited in multiple 'programming languages' without needing an explicit translator - the program is stored as a syntax tree of meta-tokens anyway, so the representation of the program is separate from the 'Platonic essence' of the program the code describes. The source code itself is just a specific presentation of the program.
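A toy version of that idea, sketched in Python: one stored tree, two renderers that differ only in the 'markup' they wrap around it. The tuple-based tree and both renderer functions are hypothetical stand-ins, not a proposed format.

```python
# A tiny stored tree: ("if", condition, body), with ("call", name) leaves.
TREE = ("if", ("call", "ready"), ("call", "launch"))


def render_braces(node):
    """Curly-brace presentation of the tree."""
    kind = node[0]
    if kind == "call":
        return f"{node[1]}()"
    if kind == "if":
        return f"if ({render_braces(node[1])}) {{ {render_braces(node[2])}; }}"
    raise ValueError(f"unknown node kind: {kind}")


def render_sexpr(node):
    """S-expression presentation of the same tree."""
    kind = node[0]
    if kind == "call":
        return f"({node[1]})"
    if kind == "if":
        return f"(if {render_sexpr(node[1])} {render_sexpr(node[2])})"
    raise ValueError(f"unknown node kind: {kind}")


print(render_braces(TREE))  # if (ready()) { launch(); }
print(render_sexpr(TREE))   # (if (ready) (launch))
```

Neither rendering is 'the source'; both are views of the same tree, and an editor would need the inverse mapping as well, which this sketch does not attempt.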

Mind you, it would still be in 'the same' language in the sense that the actual syntax would be the same, just shown in different ways, so it wouldn't quite be all things to all programmers, but it would make things a lot more flexible. And if two analyzers for different language syntaxes had some or all of the possible meta-tokens in common, it drastically changes certain code-level interop issues.

It also solves some of the dichotomy between conventional programming languages and 'visual' ones, though the problem of the Deutsch Limit would still exist for any given visual presentation.

On a side note, and as a preview of where I am going with this, the lexer could also add citation links to make it an annotated AST, adding to the ability to pass information about the program to the semantic analyzer and code generator. This can allow for additional analysis of things like, say, whether certain optimizations could be applied in the generated executable.

Oh, and because the final executable is also stored xanalogically, there is no reason the compiler has to produce a single executable image - it can create multiple whole executable images for different architectures, branched executables for variants of the same architecture and system hardware which the loader could select from, even 'templates' whose gaps the loader could fill in at load time - the loader would only need to fetch those parts it needed, possibly along with additional information it could use to further tweak the executable by means of runtime code synthesis (Surprise! You knew I was going to bring that up somewhere in all of this, didn't you?).

Oh, and the executables would be cached on systems other than the origin, and only permanently stored if the user or an application chooses to mirror them, so updating and backtracking isn't especially difficult (which is also a reason why everything transferred between systems is encrypted). Any node currently mirroring or caching something can be used by the other nodes as the equivalent of a Torrent site for published programs if the administrators choose to allow it, according to the limits they choose - but only for users who have rights to use them. I am sure that there would be a way around that, and it raises some hairy issues about licensing, regulation, and compliance, but that would have to be dealt with after the experimental stage of all this.

Just one more post, I promise. I am finally ready to explain how all of this ties into types and dispatch.



Last edited by Schol-R-LEA on Tue Nov 28, 2017 2:44 pm, edited 17 times in total.

 Post subject: Re: [language design] My working notes on Thelema
PostPosted: Tue Nov 28, 2017 11:54 am 

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1057
Location: Athens, GA, USA
OK, some of you can probably see where this is going, but some of you are probably completely lost (some may be both, in varying ways). Don't worry, this is the payoff for all of that.

Aside from the whole 'abstract syntax tree of meta-tokens' form for programs, the xanalogical approach opens up another possible avenue for programming language design. Remember how I said that the use of permutation links rather than changing the stored values allowed for nearly-unlimited undo/redo, and (here's the key part) acted as a form of version control? And remember what I said about indirection links allowing for updatable views? Here's the important part: you can have multiple indirection links to different parts of the development history.

This gives you things like branching, forking, and staging, practically for free, once you have xanalogical storage.
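A minimal sketch of why that falls out so cheaply, in Python. The `History` class and its method names are made up for the example; the assumption is simply that versions are append-only and that an indirection link is a named, movable pointer into them.

```python
class History:
    """Append-only versions plus named, movable indirection links."""

    def __init__(self, first_version):
        self.versions = [first_version]   # never modified once appended
        self.parents = [None]             # index of each version's parent
        self.links = {"HEAD": 0}          # indirection link -> version index

    def commit(self, link, new_version):
        """Append a version after whatever `link` points at, then move it."""
        self.versions.append(new_version)
        self.parents.append(self.links[link])
        self.links[link] = len(self.versions) - 1

    def branch(self, new_link, from_link):
        """A branch is just another indirection link to the same version."""
        self.links[new_link] = self.links[from_link]

    def resolve(self, link):
        return self.versions[self.links[link]]


h = History("v0")
h.branch("experimental", "HEAD")
h.commit("experimental", "v1-experimental")
h.commit("HEAD", "v1")

print(h.resolve("HEAD"))          # v1
print(h.resolve("experimental"))  # v1-experimental
print(h.parents)                  # [None, 0, 0]
```

Staging would be one more link per stage, moved forward when a version is promoted; forking is the same thing with a different owner.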

OK, so getting to there is anything but free, but bear with me here.

If the compiler is working from an indirection link to the stored AST, and the AST itself is mostly just a tree of links to the meta-tokens, then the compiler can keep a separate record of which warnings, constraints, and optimizations to apply when compiling the program, and link that to the indirection handle.

Back to the compiler annotations. Did you notice that these can - once again - be anything that the compiler might have a use for? It can serve to link to code documentation, design documents, UML diagrams, whatever. And if it has some hooks that allow it to, say, apply a constraint based on the documentation - a reminder to update the documentation, say, or some kind of constraint based on a class declaration matching the structure defined in UML - then it could use that to change the errors, executable output, or other results.

Or, just perhaps, it could be used to apply type constraints on code which doesn't explicitly declare types.

Now, I would still want to be able to add explicit typing to the program source code, especially for things like procedural dispatch (where you need it in order to have the program call the right procedure), but, if we can have it represented as a form of compiler constraint, well, there's no reason that the code editor can't hoist them out and save them as annotations, right?

That would let you, say, write most of the code without worrying about typing when you are first working out things, then progressively add more stringent constraints as you stage from (for example) 'development-experimental', to 'development', to 'unit testing', to 'integration', and so forth up to 'release'.

And the editor and compiler together could be configured to enforce that you can only edit the program code in either 'development' or 'development-experimental', while still permitting you to add type predicates later on. Oh, it couldn't stop you from creating a different permutation in some other application, but it could simply refuse to work with that alternate permutation.
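Something like the following Python sketch is what I have in mind, though every name in it is hypothetical and the 'program' is reduced to a table of names and values. The stage names are the ones listed above; which constraint first applies at which stage is an arbitrary choice for the example.

```python
STAGES = ["development-experimental", "development", "unit testing",
          "integration", "release"]
EDITABLE = {"development-experimental", "development"}

# Stand-in for the stored program: names bound to values.
PROGRAM = {"count": 3, "label": "total", "ratio": "0.5"}

# Annotations live apart from the code: (first stage enforced, name, type).
ANNOTATIONS = [
    ("development", "count", int),
    ("unit testing", "label", str),
    ("integration", "ratio", float),
]


def active_constraints(stage):
    """Constraints whose first enforced stage is at or before `stage`."""
    level = STAGES.index(stage)
    return [(name, typ) for first, name, typ in ANNOTATIONS
            if STAGES.index(first) <= level]


def check(program, stage):
    """Return the names that violate a constraint active at this stage."""
    return [name for name, typ in active_constraints(stage)
            if not isinstance(program[name], typ)]


def can_edit_code(stage):
    """Code edits are only allowed in the two development stages."""
    return stage in EDITABLE


print(check(PROGRAM, "development"))   # []
print(check(PROGRAM, "integration"))   # ['ratio']
print(can_edit_code("release"))        # False
```

The same code passes in 'development' and fails in 'integration' without a single change to the code itself; only the set of annotations being enforced differs.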

So, now you know what I have in mind. Will it work? I have no idea; probably not, if I am really honest about it. But I should learn a lot about what does and doesn't work along the way, right?



 Post subject: Re: [language design] My working notes on Thelema
PostPosted: Wed Nov 29, 2017 2:27 am 

Joined: Thu Nov 16, 2017 3:01 pm
Posts: 35
Quote:
Some of the types of association links are:

'span links', which refer to a slice or section out of the datum, allowing just the relevant sections to be referenced in documents or transferred across a network, without having to serve the whole document - the 'whole document' is itself just a series of different kinds of association links.
'permutation-of-order links', which are used to manipulate the structure of the document, creating a view - or collection of views - which can themselves be stored and manipulated. This relates to the immutability of data - rather than changing the data when updating, the FE permutes the order of the links that make up the 'document' or view, and passes that permutation to the BE to record it. This, among other things, serves as both persistent undo/redo, and as version control.
'structuring links', which describe the layout of the view independent of the data and the ordering thereof. This acts as out-of-band markup, among other things - the markup is not part of the datum itself.
'citation links', which represent a place where a user wanted to record a connection between two ideas. This link associates bi-directionally, and has its own separate publication and visibility which is partially dependent on that of the data - an application, and hence a user, can view any citation link if and only if they have permission to view both the citation and all of the data it refers to. There may also be 'meta-citations' which aggregate several related citations, but I don't know if that was something actually planned or just something discussed - since citations are themselves data, and all data are first-class citizens, such a meta-citation would just be a specific case of a view.


The first 3 things here sound like filter / map / reduce / sort operations.

I do agree that information is more of a continuum than anything. You could probably even build a traditional filesystem on top of this sort of thing using citation links.


Quote:
Basically, my plan is to have an editor that performs the lexical analysis on the fly, and saves the programs not as text, but as a link trace of meta-tokens. The lexical analyzer would still be available as a separate tool, which the editor would be calling as a library, so the compiler could potentially use source code from other editors, but that's going in a different direction than I have in mind.

What is a meta-token, you ask? Well, in part it is a token in the lexical analysis sense - a lexeme which the syntax analyzer can operate on.


This actually reminds me of when I programmed in TI-BASIC. The editor doesn't operate on characters like a normal text editor. Things like "Asm(", "Input", "Disp", etc. were tokens selectable from a menu. My guess is that they did this so that they could skip lexical analysis.

Quote:
Remember how I said that the use of permutation links rather than changing the stored values allowed for nearly-unlimited undo/redo, and (here's the key part) acted as a form of version control? And remember what I said about indirection links allowing for updatable views? Here's the important part: you can have multiple indirection links to different parts of the development history.


"Permutation Links" sounds like something I did in a configuration format I designed a while back for a game engine:
Code:
Class Instance-Variant {
    Class Field: Value;
}

Where you can use an asterisk for the classname to reference an instance (rather than define a new one), classnames are optional, and the hyphen / variant part is optional. A practical use would be something like:
Code:
*MyWindow {
    Text: "My Window";
    Size: 320px, 240px;
}

*MyWindow-Linux {
    Text: "My Window For Linux";
}

*MyWindow-Linux-GNU {
    Text: "My Window For Pedantic People";
}

I basically designed it because I wanted to have a flexible configuration and object definition format for the resource compiler that supported fallback chains.
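For anyone who wants to see the fallback behaviour spelled out, here is a small Python sketch of one way such a lookup could work. It is an assumption about the semantics, not the actual resource compiler: the `resolve` function and the dictionary layout are made up, and the rule it implements is "drop one '-variant' suffix at a time until the field is found".

```python
def resolve(definitions, name, field):
    """Look up `field` on `name`, falling back by dropping '-variant' suffixes."""
    parts = name.split("-")
    while parts:
        candidate = "-".join(parts)
        value = definitions.get(candidate, {}).get(field)
        if value is not None:
            return value
        parts.pop()   # e.g. MyWindow-Linux-GNU -> MyWindow-Linux -> MyWindow
    raise KeyError(f"{field} is not defined for {name}")


# The example definitions from above, transcribed as plain dictionaries.
DEFINITIONS = {
    "MyWindow": {"Text": "My Window", "Size": ("320px", "240px")},
    "MyWindow-Linux": {"Text": "My Window For Linux"},
    "MyWindow-Linux-GNU": {"Text": "My Window For Pedantic People"},
}

print(resolve(DEFINITIONS, "MyWindow-Linux-GNU", "Text"))  # My Window For Pedantic People
print(resolve(DEFINITIONS, "MyWindow-Linux-GNU", "Size"))  # ('320px', '240px')
print(resolve(DEFINITIONS, "MyWindow-BSD", "Text"))        # My Window
```

A variant that is not defined at all (the made-up `MyWindow-BSD` here) simply falls through to the base instance.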

