Stream Oriented Programming?

jvff · **Joined:** Sun Oct 24, 2004 11:00 pm **Posts:** 46

Hi,

I just wanted to ask for some opinions about the viability of an idea I had some time ago. It's based on the same principle as Unix pipes, however, implemented on a lower level.

All source-code objects (programs, libraries, etc.) are implemented and designed on the pipe metaphor. So every "object" (for a lack of better term, but could be module, component, function, etc.) runs and has a pointer to it's entry stream(s) and a pointer to it's exit stream(s) (and their respective lengths). This way all the code does is process streams.

IMHO it's an oversimplification of how computer systems are today (ie. Input/Output machines), and perhaps more flexible. I think cdecl calling convention in IA32 (ie. the default C calling convention in x86), has indirect stream processing built-in, however only for input streams. The input pointer is saved in esp (ie. the stack), and the caller is responsible for it's cleanup, hence printf(char *str, ...) is possible. However cdecl doesn't have support for output streams as it only allows single-return (the return could be a string pointer, but that still limits where the stream size is returned, possibly as first item in stream).

Anyway, stream oriented programming (SOP, please mind this term is a place holder for the idea here described, and if there exists real SOP outside this thread, I have no knowledge about it), allows some neat features, such as module linking to create for example programs or libraries, and perhaps aspect oriented programming (AOP) by replacing a specified module in a registry or central server by another module with pre-formats data and links to the original module, and also can perform post-formating.

Drawback #1: who will handle the output stream? The OS? The parent module? The child module? This could result in inefficiencies. For example a module when "called" doesn't really know the size of the result. So it can't preallocate some are and use it without guessing the output size. It could use the stack, but that could result in stack overflows (which could possibly be solved by buffers, but who handles these?).
Drawback #2: Micro-streams could pose inefficiencies. For example consider an extremist SOP model which implements an itoa function as a module. The stream is so small it fit's in the CPU's registers. What's the point of allocating memory for the stream data? Of course stream data could be useful if you're streaming integers to the module, but for most cases, not. Someway, someone must catch micro-streams and optimize them (again in extreme SOP there could be a stream to handle stream creation, but that sounds stupid).
Drawback #3: Streams are limited by their static nature. Some modules are more flexible if they are implemented like state machines. They become more like object instances in object oriented programming (OOP). However, this can be acheived simply by streaming the state with the stream data. But this still would require "buffer modules" to store the state for future use.
Drawback #4: Type safety; there is none. Stream is just data. Interpretation of this data is strictly a responsability of the module. If the stream isn't properly formated it could result in the best case bad data, in medium case system crash and in worst case, uncontrolled system reconfiguration. This could be at least protected by AOP, where verifiers are placed before critical modules. But still AOP could be used to corrupt outcoming data.
Drawback #5: Multi-threading safety. Obviously independant streams can be issued in parallel. However if modules become state machines (ala OOP), the states could be compomised. The most probable solution would be to have buffers save states and restrict buffers to single-threads. Synchronization would be simply controlled streaming of thread buffers.
Drawback #6: Parallel stream processing. Stream processing allows the removal of some iterations (by simply streaming all the data). Some iterations can be executed in parallel. The coder could transform an iteration into n-streams, allowing them to be executed in n-threads. However, in a higher level, an iteration stream can be transformed into a single stream, and threads are created by the system, where each thread handles a part of the stream. This is more dynamic, but again it's hard to define who will control thread generation (a higher level central stream system? A module? The user?).
Drawback #7: Stream funneling. The problem here is about modules with multiple input streams. This could be dangerous if one of the streams isn't ready. Another problem arises from funneling standards. What if we only feed a single stream to a multi-stream module? When is it best to have a module allow multi-inputs if it can be desgined for a single input?

SOP seems to me as a neat idea. However reality is a bit different. I can't really weigh the advantages and disadvantages, so I'm asking for opinions. I kill time brainstorming, and I would like to see how far I can go with the ideas and what I can lear from them. Thanks for reading, sorry for long post (as apparently I have the habbit of doing), and forgive bad english,

JVFF

Alboin · **Posted:** Fri Feb 02, 2007 4:06 pm

SOP (As you call it.) is a wonderful idea that actually works quite well. (Although, I have only used it on an application level.) The JACK audio system uses something like this, where I can have multiple programs, that are in no way related to each other, talking to each other through the JACK library. This is somewhat like OOP, only on a broader scale.

For example, with JACK, I can have one program to play a MIDI file (eg. Rosegarden). Another to synthesize the MIDI file that is playing, and yet another to record the final output. Then, once they are all started, I open the JACK control center and connect Rosegarden's output to the synthesizer's input, and the synthesizer's output to the recorder's input. It works wonderfully. I suggest you look deeper into the possibilities of it. Also, if I recall correctly, JACK supports multiple streams per application, which is very useful.

jvff · **Joined:** Sun Oct 24, 2004 11:00 pm **Posts:** 46

Yep. Seems very similiar =) However, from a first-look, it is specific for audio. Nice to see it works in at least one domain. I wonder if it's possible to integrate SOP over the horizon, and if it's worth it.

My biggest concern is buffer size for output generation. If it's too large, it's wasted space. If it's too small, there's too many stalls. Ideally, each module should have it's own buffer size, but that could result in bloat on untrusted modules. The ideal solution is a dynamic list. Perhaps assigning a page for each stream as a buffer is a quik-and-dirty solution. Keeping a buffer-size predictor (similiar to a CPU's branch predictor) we can have more accurate guesses for buffer allocation. But winds up to the end the other drawbacks has: who will control it. I'm guessing the who must be some kind of supervisor, that is called to create buffers, and control streams, threads etc. I also can suppose this supervisor could be a kernel =)

SOP seems to be one step closer to the some vision I read in the internet about programs becoming simpling a structure of previously defined component, without needing to code anything. Thanks for your reply =)

JVFF

Candy · **Posted:** Sat Feb 03, 2007 6:30 am

jvff wrote:

Hi,

I just wanted to ask for some opinions about the viability of an idea I had some time ago. It's based on the same principle as Unix pipes, however, implemented on a lower level.

I think it'd be quite viable, am working on a similar idea myself.

Quote:

All source-code objects (programs, libraries, etc.) are implemented and designed on the pipe metaphor. So every "object" (for a lack of better term, but could be module, component, function, etc.) runs and has a pointer to it's entry stream(s) and a pointer to it's exit stream(s) (and their respective lengths). This way all the code does is process streams.

Identical to my design, except that I view a stream as a continuous stream, so you can't request the length (normally), but you can test for the end of the stream.

I also intend to allow code to convert things to streams of other things & streams of things to one (1) other thing, so you can view opening a file as convert(file, stream<byte>) and archiving files as convert(stream<file>, file). The latter isn't supported yet. Also, there's a bit of a performance problem with indirection as nothing is buffered and you can only process a single item each time.

Quote:

IMHO it's an oversimplification of how computer systems are today (ie. Input/Output machines), and perhaps more flexible. I think cdecl calling convention in IA32 (ie. the default C calling convention in x86), has indirect stream processing built-in, however only for input streams. The input pointer is saved in esp (ie. the stack), and the caller is responsible for it's cleanup, hence printf(char *str, ...) is possible. However cdecl doesn't have support for output streams as it only allows single-return (the return could be a string pointer, but that still limits where the stream size is returned, possibly as first item in stream).

I think it's a simplification, not more than that. There are so many things that you can model with this concept that it could be a next greatest thing in programming.

Quote:

Anyway, stream oriented programming (SOP, please mind this term is a place holder for the idea here described, and if there exists real SOP outside this thread, I have no knowledge about it), allows some neat features, such as module linking to create for example programs or libraries, and perhaps aspect oriented programming (AOP) by replacing a specified module in a registry or central server by another module with pre-formats data and links to the original module, and also can perform post-formating.

Still sounds the same

Quote:

Drawback #1: who will handle the output stream? The OS? The parent module? The child module? This could result in inefficiencies. For example a module when "called" doesn't really know the size of the result. So it can't preallocate some are and use it without guessing the output size. It could use the stack, but that could result in stack overflows (which could possibly be solved by buffers, but who handles these?).

I intend to solve this with buffers, handled by the compiler (as template code that's semi-automatically inserted in between). The buffers also allow you to specify that you do or do not want to immediately jump to the next sequence for the next step of processing and if done correctly, you could run each minuscule bit of processing on a separate core. That scales really well (although the system I've tested it on only had HT - not much improvement).

Quote:

Drawback #2: Micro-streams could pose inefficiencies. For example consider an extremist SOP model which implements an itoa function as a module. The stream is so small it fit's in the CPU's registers. What's the point of allocating memory for the stream data? Of course stream data could be useful if you're streaming integers to the module, but for most cases, not. Someway, someone must catch micro-streams and optimize them (again in extreme SOP there could be a stream to handle stream creation, but that sounds stupid).

This is the case where I consider the model breaks down because it becomes too much of a stand in the way, and should either be replaced with more efficient modules that do more or acceptance of the speed. Using buffers & mass processing, you can get this pretty quick.

Quote:

Drawback #3: Streams are limited by their static nature. Some modules are more flexible if they are implemented like state machines. They become more like object instances in object oriented programming (OOP). However, this can be acheived simply by streaming the state with the stream data. But this still would require "buffer modules" to store the state for future use.

Instantiate modules explicitly with their own state so they have a state & known in/output connections (which they immediately can use). OO implicitness for me.

Quote:

Drawback #4: Type safety; there is none. Stream is just data. Interpretation of this data is strictly a responsability of the module. If the stream isn't properly formated it could result in the best case bad data, in medium case system crash and in worst case, uncontrolled system reconfiguration. This could be at least protected by AOP, where verifiers are placed before critical modules. But still AOP could be used to corrupt outcoming data.

Convert the type as a part of the name and only allow (at a choice level) the connections that work. Disallow all others. If you know something works, ilnk the module in the type system twice and tell it that it has two names.

Quote:

Drawback #5: Multi-threading safety. Obviously independant streams can be issued in parallel. However if modules become state machines (ala OOP), the states could be compomised. The most probable solution would be to have buffers save states and restrict buffers to single-threads. Synchronization would be simply controlled streaming of thread buffers.

Lock-free buffers are very usable and you can use a fairly large buffer & use multiple threads that keep tabs on what they should & shouldn't be doing. They should probably be implemented somewhat like coroutines.

Quote:

Drawback #6: Parallel stream processing. Stream processing allows the removal of some iterations (by simply streaming all the data). Some iterations can be executed in parallel. The coder could transform an iteration into n-streams, allowing them to be executed in n-threads. However, in a higher level, an iteration stream can be transformed into a single stream, and threads are created by the system, where each thread handles a part of the stream. This is more dynamic, but again it's hard to define who will control thread generation (a higher level central stream system? A module? The user?).

Thread creation is implicit and (imo) system defined. The modules indicate whether they carry state from one entry to the next (and should of course be made so that they don't). That way you can predetect which modules are being executed most of all, which threads are overloaded and add more threads to those bits or split up that thread into two threads each covering half the modules.

Quote:

Drawback #7: Stream funneling. The problem here is about modules with multiple input streams. This could be dangerous if one of the streams isn't ready. Another problem arises from funneling standards. What if we only feed a single stream to a multi-stream module? When is it best to have a module allow multi-inputs if it can be desgined for a single input?

I've defined a basic_filter that implicitly assumes there's one input and one output. That just requires you to test that the modules all keep working with full outputs & inputs, and you must ensure that for a module with >1 input the lowest throughput comes from the entropy-input. I've noticed that nearly all those things with more than one input use the others as auxiliary input, for random numbers, encryption streams and so on. They all have a primary stream as well, which is the actual data (thinking Vernam cipher here). If your encryption key stream is as quick as the data stream, that's no problem at all.

Quote:

SOP seems to me as a neat idea. However reality is a bit different. I can't really weigh the advantages and disadvantages, so I'm asking for opinions. I kill time brainstorming, and I would like to see how far I can go with the ideas and what I can lear from them. Thanks for reading, sorry for long post (as apparently I have the habbit of doing), and forgive bad english,

You can try my code at http://atlantisos.svn.sourceforge.net/v ... clude/aos/, in particular input, output, module and filter.

My personal implementation atm only does unicode (two up, lib/libaos/modules/unicode) which is tested (in unix) and works, and should cleanly do Vernam ciphers and so forth. I have (at my work place) an implementation that was implemented to process network traffic and to interpret numerous levels of data, and it does that parallel (not implicitly yet) with buffers, supports cycles in data chains, physical interfacing and distributed processing. That program has 14k lines that I wrote for that, which includes 33 filters.

You also need an item called settings, which are variables within the modules that you can adjust at runtime for finetuning or changing the processing. In that idea I invented dynamic variables (short while back) and I've published them in the same directory as "dynamic", header file.

I'd love to work on it again but I'm pretty cramped for time.

jvff · **Joined:** Sun Oct 24, 2004 11:00 pm **Posts:** 46

I guess you are the author of the Atlantis project thread post =) I was going to reply to that thread with some ideas, but my computer crashed and lost all the data. Oh well...

Very nice to see you already have some results. Do you plan on sharing such library? Also, could you provide a little info about it if possible? Thanks for the reply,

JVFF

Candy · **Posted:** Sun Feb 04, 2007 5:05 am

jvff wrote:

I guess you are the author of the Atlantis project thread post =) I was going to reply to that thread with some ideas, but my computer crashed and lost all the data. Oh well...

Very nice to see you already have some results. Do you plan on sharing such library?

It's public domain as far as I can release it under that (haven't researched the Dutch law too deeply but I think they do have "public domain" in there somewhere). I only hope that people improving on it release their changes into PD as well, so that we might finally get a decent and fully free code base for everybody (and that means EVERYBODY) to use. That way, we might even get cheaper products that run better and more stable.

Quote:

Also, could you provide a little info about it if possible? Thanks for the reply,

At the moment, each type of change is embedded into a filter object which has inputs, outputs and connect methods. They can be connected together using the | character as objects, after which the first output of the first filter is permanently connected to the first input of the second filter, and a new wrapper filter object is returned with their inputs & outputs (think atm it's just their first in&output) as its in&outputs. So you can view any filter operation as a transitive operation, effectively allowing you to sort of combine operations arbitrarily to achieve a given effect that can't be done with a single module.

Each filter has 5 parameters that indicate what type it is (this is pretty old code but it holds mostly). The first one indicates the first input type, the second indicates the first output type. The third indicates the kind of modification it does, the fourth and fifth allow diversity so that if two people make an <image, image, blur> type function they can be put alongside whilst allowing a meaningful choice between them.

The inputs & outputs are classes that can and should be subclassed with a few types of connection, at least a buffered connection should be in there sometime soon. I also intend to change the filter/module abstraction using c++0x features (especially variadic templates) so you can make a plain function and auto-wrap that into a filter object, which is then auto-wrapped into a module. Effectively, you'd be writing a single function and the "magic" in the headers takes care of all the rest (allowing a few other hooks for startup & shutdown behaviour, as required by most other libraries too). They also need to be integrated with the "dynamic" header so that their properties can be set properly (which is the reason I didn't make many of the other filters because I pretty much can't).

I've set my framework up around the concept of precalculation as much as possible - given all inputs, it's best to calculate the new values as soon as possible, instead of as late as possible. The dynamic header isn't conforming to this yet leading to it not being able to do self-referencing converging calculations (which it otherwise would be able to do, although slightly limited in case of integer logic (converging stops when the changes drop below epsilon, or 1 in case of integers)). To get this working properly in a distributed context you need to allow streaming arbitrary data type-safe across an arbitrary network, which is a problem that I haven't tackled. You also need to be able to control each segment of the program using dynamic variables over a network, which is something I'm working on at the moment (albeit slowly). The last problem that would need solving is remote execution, which can be "emulated" with telnet controlled by a program, but is best solved with a form of explicit server that accepts random programs & such in a given format & sets up the communication for it.

Small note on the dynamic variables: The idea behind them is that you can assign an expression possibly involving input variables to a given location and that the location is notified when the input changes, in sofar that you can make modules with dynamic variables that receive a notification when their value changes and that can interactively respond to it etc., without requiring anybody to type boring loads of value passing logic. You make a dynamic variable and you make it public, no accessors, no complex things, no boring code, just a public number. You tie them together from inputs to outputs and you never touch them again and they just keep working.

Candy · **Posted:** Tue Feb 06, 2007 12:37 pm

Hello?

Combuster · **Posted:** Wed Feb 07, 2007 2:56 am

Candy wrote:

Hello?

I think your wisdom is so über that it scared the rest away

Seriously, i dont see anything left to discuss here rather than questioning your design, which is basically advocating one's taste since it seems technically correct. (For me, I'd like have a copy of the dynamic variables that are implemented lazily)

jvff · **Joined:** Sun Oct 24, 2004 11:00 pm **Posts:** 46

Sorry for delay. Not scaring off =) I am very interested in your implementation. It seems very clean and I can see you have a good design. The only thing that diverges from what I had in mind is the minimalistic view. While you're implementing everything in C++ classes so the user can use it as a library, I had in mind setting up a specification or a calling convention in binary-compatibility level. Something like for example COM.

However, this comes to me together with a programming crisis. Currently I'm stuck at reimplementing the wheel and trying to keep everything ideal, simple, flexible, minimalistic and innovative, and I can't get anything useful out of myself. Every time I try to code some project, I look the way I have to implement it and I just hate it. There's always something wrong with it and I keep coding foundations for future projects in the hope I can reuse them, but I keep going deeper and deeper on the foundations, at the same time I keep abstracting more and more concepts, resulting in a cycle.

That's why I was experimenting with a COM like programming model. Apparently it looks more flexible (I have multiple interfaces for various modules, and I can choose them to communicate with them). However SOP strikes me in it's simplicity. There is only one interface: the stream.

Now my mind wonders about multi-in/outputs in modules. Having multiple outputs means I have to synchronize them (ie. sometimes one can get read more quickly then the other). Having multiple inputs breaks the single interface concept. But without it, things are just impossible. Also, if I have multiple modules reading from the same stream, some stalls may occur, depending on how fast they read the buffered data.

Perhaps streams could be buffered in memory pages? This also allows virtual memory translation so the stream becomes one linear memory block. But this also screws up small streams. I can't think of a way to manage this. Perhaps connections should be subclassed as you said. But I don't know what subclasses there might be, and how they interface to each other. Could you possibly cite some example sublcasses?

Also, since you're working on the Atlantis project, I might just as well post some ideas you could find useful. I plan to implement (if I continue to reinvent the wheel and code foundations) a tree framework to make it easier to program. The tree would be consisted in a similiar way to Plan 9's vision everything is a file (and I mean everything), so everything would be a module on that tree. The tree should be standard (ie. no more 100 of GUI toolkit libraries, just a single GUI interface and many implementations, which are compatible between them). It should be flexible (so we can have for example a module network.sites.utube, which has as it's childs all the online videos, and when you access one of them it is auto-downloaded and cahced).

SOP just seems perfect for such a wild project like that tree. COM would work, but there would be many interfaces and many implementations. I guess to standarize interfaces, every module should handle moments where there are less available streams than required.

I thank you for help on thoughts on "SOP". I also thank you for for sharing with me parts of your code. My problem right now isn't technical, it is more about rethinking it's purpose.

Remote execution could be implemented as stream forwarding. Suppose you have a network module that streams stuff to and from the network. What you would do is open a stream connection (probably TCP), and stream the functions. If I were to handle it in my "tree", I would simple merge trees, having "glue" nodes that represent remote modules. These glue modules would simply stream in and out to the remote modules. This raises another drawback: how would you implement a connection procedure in SOP? Connection is just "connect(addr, port)", there is no stream. Perhaps append to the start of the data stream the connection to make?

Sorry for in post rants. I'm just trying to explain why it interests me. Thanks alot for your uber answers ;-)

,

JVFF

Candy · **Posted:** Thu Feb 08, 2007 10:08 am

Combuster wrote:

I think your wisdom is so über that it scared the rest away

I was hoping for a load of comments... I can't figure out what won't work with it and if it does work, why hasn't anybody else figured it out yet?

Quote:

Seriously, i dont see anything left to discuss here rather than questioning your design, which is basically advocating one's taste since it seems technically correct. (For me, I'd like have a copy of the dynamic variables that are implemented lazily)

Lazy variant

Attached file is non-lazy updating dynamic variables, link to server is lazy updating dynamic variables. Main difference is that the lazy one only recalculates when/if you use the output, the non-lazy one recalculates always. If you add an "if" construction to it the lazy variant may be faster in convoluted examples (since you don't re-evaluate the non-changing side).

Code:

int main() {
        dyn<int> w, x, y(29568);
        dyn<int> a = w * w,
                 b = y / x,
                 z = a + b,
                 c = z - 2;
        dyn<double> d = w;
        dyn<double> e = x;
        dyn<double> f = y;
        dyn<double> g = d * d,
                    h = f / e,
                    i = g + h,
                    j = i - 3.141592653589793;
        dyn<double> k = dyn<double>(a) * j + (f - 410.0);
        dyn<double> conv_t;
        conv_t = 0.5 * conv_t + e;
      printf("%d %d %d %d %d %d %d\n", a.value(), b.value(), z.value(), c.value(), w.value(), x.value(), y.value());
      printf("%f %f %f %f %f %f %f %f\n", g.value(), h.value(), i.value(), j.value(), k.value(), d.value(), e.value(), f.value());
      w = 5;
        printf("%d %d %d %d %d %d %d\n", a.value(), b.value(), z.value(), c.value(), w.value(), x.value(), y.value());
        printf("%f %f %f %f %f %f %f %f\n", g.value(), h.value(), i.value(), j.value(), k.value(), d.value(), e.value(), f.value());
        cout << conv_t << endl;
        for (int l = 0; l<12; l++) {
                x = l;
              printf("%d %d %d %d %d %d %d\n", a.value(), b.value(), z.value(), c.value(), w.value(), x.value(), y.value());
              printf("%f %f %f %f %f %f %f %f\n", g.value(), h.value(), i.value(), j.value(), k.value(), d.value(), e.value(), f.value());
                cout << conv_t << endl;
        }
}

To make this work on the lazy variant, remove the conv_t lines. Since its value depends on itself, the lazy variant can't retrieve the value (since it keeps calling back infinitely) but the non-lazy variant can forward the current value & let it converge. Of course, if you don't have a converging expression (constant not between -1 and 1, both exclusive), you'll get infinite or an infinite loop. You could use this to use dyn<bool>'s to emulate a flipflop built out of NAND gates, for example. If you, instead of sending out new values instantly, schedule an update somewhere you can convert this code fairly easily to a VHDL emulator.

jvff · **Joined:** Sun Oct 24, 2004 11:00 pm **Posts:** 46

Complemting the thread after some research:

http://en.wikipedia.org/wiki/Flow-based_programming
http://en.wikipedia.org/wiki/Hartmann_pipeline

Reply2:

Looking at Flow Based Programming (FBP), what I liked about it is how processes can be scheduled through threads, CPU time, CPU cores or even computes (aka. distributed computing). And possibly a network compiler could optimize a FBP program, knowing the limitations of a system and the program's components (later to be spawned into processes during run time), you could organize the way processes are scheduled. As an example, consider a process that has an input stream and an output stream. It processes information packets (IPs) independantly and forwards them to the output. If the compiler knows that the streams are processed independantly (and no internal state is altered), it could spawn two processes and split the work between them.

However the compiler could also help module generation. Consider an "add" module that given two inputs, it outputs the sum of them. The compiler could define a module "add" with 3 or more inputs, consisting on cascaded normal "add" modules. One way to do this is through templates. You could define templates with module "slots" which are replaced by the compiler when generating final program.

Another thing I think could be useful is information packets containing modules. This way you could create self-modifying programs simply by re-routing itself. The obvious drawback is possible predictability problems when passing corrupt data. But it could have advantages, for example a module could replace itself with a buffer after calculating a result, resulting in lazy evaluation. Still I don't know how modules in IPs could fit in FBP model, since there would have to be an "exit" module to reconfigure the network. Feedback?

JVFF

Edit: Added another link
Edit2: Since there were no replies, I added what would anther reply to this reply. (Ie. fused them to save posts in pages)

OSDev.org

Stream Oriented Programming?

Who is online