OS Support For Transactional Memory


Post by Kevin McGuire »

I forgot to stomp down on the fact that you call something an average. An average is always going to be between a range of values. In your case the so-called average goes out of bounds, so to speak.

T = Pa * Ta + Pb * Tb

Let's simplify this.

T = Pa * Ta

Ta should be 3 cycles if the instruction is in cache, or (making a number up) 7 cycles if it is not in cache. Pa represents the probability that it is in cache.

T = Pa * Ta
2.7 = .9 * 3 (wrong)
6.3 = .9 * 7 (wrong)


Wrong, because the proper average would be (((7 - 3) * .9) + 3) = 6.6; saying 6.3 = .9 * 7 is wrong.

Let me do another..

0.7 = .1 * 7 (wrong) (Out Of Bounds)

Wrong. Wrong. Wrong. Wrong. Wrong. Wrong

Is that enough? Are your eyes working today?

Get a grip, man. I mean, what in the world did you think you were doing? It is almost hysterical to look at. You might make someone think they are going crazy. I would love to see you insert that into a paper somewhere and explain it like you did. That would be the laughing stock for years.

Also keep in mind that I hold a shovel for most of the day. I shovel dirt, gravel, and concrete into holes. The hot sun is always shining on me. I make $8.84 an hour. I am at the bottom of the totem pole. Why should I be correcting you? How about we exchange jobs?

Also, note that I really would be far less critical if you had not kept going with it as if it were correct, when I know very well you know it is not, since you changed the equation when making that post with the really complicated equation in it.

I also know the equation was not supposed to be accurate but:
1) You never stated that in the beginning, and posted a unit of cycles as a result.
2) It is nowhere near close, except if you give it certain values!

Post by jnc100 »

Brendan, if I understand him correctly, is saying that the best case scenario is when the lock is free and we try to acquire it. He estimates that this will take 55.5 cycles and will occur 90% of the time. The other 10% of the time the worst case scenario applies, that the lock is not free and we have to wait for another processor to finish with it and then acquire it. In that case, estimating again, he says that the wait-and-lock process will take 388 cycles.

From elementary statistics,

E(x) = sum over i of ( x_i * p(x_i) )

where E(x) is the expected value of x, x_i is a particular value of x, and p(x_i) is the probability of that value occurring.

This gives E(x) = 55.5 * 0.9 + 388 * 0.1 = 88.75 cycles

which is indeed between the range of best case (55.5 cycles) and worst case (388 cycles).
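
For what it's worth, here is the same arithmetic written out as a few lines of C so it can be checked mechanically (purely illustrative; the cycle counts and probabilities are just the estimates above):

Code:

#include <stdio.h>

/* Expected lock-acquisition cost, using the estimates quoted in this thread:
 * ~90% of the time the lock is free (~55.5 cycles),
 * ~10% of the time we wait for another CPU to release it first (~388 cycles). */
int main(void)
{
    const double cycles[] = { 55.5, 388.0 };   /* cost of each outcome */
    const double prob[]   = { 0.9,  0.1   };   /* probability of each outcome */
    double expected = 0.0;

    for (int i = 0; i < 2; i++)
        expected += cycles[i] * prob[i];       /* E(x) = sum of x_i * p(x_i) */

    printf("expected cost: %.2f cycles\n", expected);   /* prints 88.75 */
    return 0;
}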

Regards,
John.

Post by Kevin McGuire »

I just got obliterated. I do dig ditches for a reason.
((55.5 * 90) + (388 * 10))/100 = 55.5 * .9 + 388 * .1

Losing an illusion makes you wiser than finding a truth. -Ludwig Borne

Post by Colonel Kernel »

@jnc100:
Thank you for properly explaining expected value. I knew I did a half-assed job of it above. :P

Kevin McGuire wrote:Also keep in mind that I hold a shovel for most of the day. I shovel dirt, gravel, and concrete into holes. The hot sun is always shining on me. I make $8.84 an hour. I am at the bottom of the totem pole. Why should I be correcting you? How about we exchange jobs?


Why is it that Brendan and others can keep a cool head during these conversations and you can't? Food for thought.

Kevin McGuire wrote:I just got obliterated. I do dig ditches for a reason.
((55.5 * 90) + (388 * 10))/100 = 55.5 * .9 + 388 * .1


I think you owe us an apology.

@bewing:
I think I understand what you mean now. I've never heard that technique called "funnels" before, but yes, it does sound like the actor model, sort of. The actor model comes from OO design. An "actor" is an object that has its own dedicated thread of execution. All messages sent to and from that object have to be marshaled from other threads, usually through some kind of queue. This model is the core principle of Erlang, for example.
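
Purely as an illustration (none of this is anyone's actual design here; the names, the fixed queue size, and the use of pthreads are my own assumptions), a minimal sketch of such an actor -- one dedicated thread draining a queue that all other threads must post their messages into:

Code:

#include <pthread.h>
#include <stdio.h>

#define QUEUE_SIZE 16                         /* made-up capacity */

typedef struct {
    int             msgs[QUEUE_SIZE];
    int             head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
} actor_t;

/* Called from any thread: marshal a message into the actor's queue. */
static void actor_send(actor_t *a, int msg)
{
    pthread_mutex_lock(&a->lock);
    if (a->count < QUEUE_SIZE) {              /* silently drop on overflow, for brevity */
        a->msgs[a->tail] = msg;
        a->tail = (a->tail + 1) % QUEUE_SIZE;
        a->count++;
        pthread_cond_signal(&a->not_empty);
    }
    pthread_mutex_unlock(&a->lock);
}

/* The actor's own dedicated thread: handle queued messages one at a time. */
static void *actor_main(void *arg)
{
    actor_t *a = arg;
    for (;;) {
        pthread_mutex_lock(&a->lock);
        while (a->count == 0)
            pthread_cond_wait(&a->not_empty, &a->lock);
        int msg = a->msgs[a->head];
        a->head = (a->head + 1) % QUEUE_SIZE;
        a->count--;
        pthread_mutex_unlock(&a->lock);

        if (msg < 0)                          /* negative message = shut down */
            return NULL;
        printf("actor handled message %d\n", msg);
    }
}

int main(void)
{
    actor_t a = { .head = 0, .tail = 0, .count = 0 };
    pthread_t tid;

    pthread_mutex_init(&a.lock, NULL);
    pthread_cond_init(&a.not_empty, NULL);
    pthread_create(&tid, NULL, actor_main, &a);

    actor_send(&a, 42);                       /* any thread may send */
    actor_send(&a, -1);                       /* tell the actor to stop */
    pthread_join(tid, NULL);
    return 0;
}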

I take issue with the way you present your model, however. Firstly, yes, copying messages between threads via shared memory is IPC. I'm not sure there would be a need for an "IPC service thread", since the shared memory mechanism is already in place. Secondly, you mentioned that each processor might have its own scheduler running. If this is the case, then the "kernel" is not really dedicated to only one processor.

What you've described sounds to me like it is conceptually similar to the architecture of Mac OS X, but with kernel_task set to run on only one processor. First some background -- the OS X kernel is a hybrid of Mach and some higher-level components such as the BSD layer. Way back when Mach was still a proper microkernel, those other components executed in separate processes. In OS X, they still sort of do, but they run in kernel mode so that device drivers can be called without the overhead of IPC. However, unlike most monolithic kernels, the kernel_task portion of Mac OS X actually has its own address space. So in a sense, the "kernel" is separated completely from other processes and can in theory be dedicated to only one processor. In reality, the Mach portions still have to be mapped in all address spaces, because they control low-level thread scheduling, interrupt handling, and IPC. To me, this sounds a lot like your system (except your IPC sounds more primitive than Mach's, and maybe more amenable to being implemented without spinlocks). But IMO it's incorrect to say that the kernel only runs on one processor in this kind of system.
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!

Post by Kevin McGuire »

I am sorry. I apologize.

Post by bewing »

Colonel Kernel wrote: Firstly, yes, copying messages between threads via shared memory is IPC.


There are many, many different forms of IPC. Yes, shared memory can be used for that purpose -- however, I consider the concept of shared memory to be a much broader one than simply an interprocess communication technique. In my methodology, every single method of IPC, from semaphores, to signals, to formal messages, is individually owned by a different "manager" thread -- with shared memory being the one and only primitive that is not managed in that way, because it is different in nature from the other types of IPC.


Colonel Kernel wrote:Secondly, you mentioned that each processor might have its own scheduler running. If this is the case, then the "kernel" is not really dedicated to only one processor.


I have a much more restrictive definition of the word "kernel" than you do, apparently. A scheduler is just a simple executive. A manager. The operation of the scheduler is almost incidental to the hardware resources that the kernel provides to the running tasks. The only resource the scheduler provides is timeslices -- and really, it could be completely replaced by a simple round-robin loop of code in the IRQ0 handler. In my implementation, the kernel on CPU#0 "owns" all the hardware on the machine. It is the funnel for hardware requests from all the other CPUs. IPC and mutexes are not hardware requests.
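
Purely as a sketch of what I mean by that (the task structure, context_switch(), and NTASKS below are made-up placeholders, not my actual code), the bare round-robin loop could look something like this:

Code:

#define NTASKS 8                              /* made-up table size */

struct task {
    int runnable;                             /* real code would also hold saved registers, a stack pointer, ... */
};

static struct task tasks[NTASKS];
static int current;

/* Called from the IRQ0 (timer) interrupt: pick the next runnable task. */
void irq0_handler(void)
{
    int next = current;

    do {
        next = (next + 1) % NTASKS;           /* walk the task table in a circle */
    } while (!tasks[next].runnable && next != current);

    if (next != current) {
        /* a hypothetical context_switch(&tasks[current], &tasks[next])
           would save the old task's state and restore the new one's here */
        current = next;
    }

    /* acknowledge the interrupt (EOI to the PIC) and return */
}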

So we have some differences of opinion. Eek! :)