It sounds like you don't have enough stacks. The typical kernel design has one kernel stack per thread. Any thread that runs in ring 3 also has a separate stack in ring 3, but that's irrelevant to your multitasking code: in this design the switch happens while the thread is running in the kernel, with its ring 3 state already saved on its kernel stack, so all you need to do to switch tasks is switch kernel stacks.
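To make the "one stack per thread, switch by swapping stacks" idea concrete, here is a rough user-space analogy, not kernel code. It assumes a Linux/glibc system and uses `getcontext`/`makecontext`/`swapcontext` to stand in for the few lines of assembly a real kernel would use to save the old stack pointer and load the new one. The names `struct thread`, `switch_to`, `thread_create`, and `run_demo` are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical per-thread structure: each thread owns exactly one stack. */
struct thread {
    ucontext_t ctx;   /* saved registers, including the stack pointer */
    void *stack;      /* this thread's private stack */
};

static struct thread thread_a, thread_b;
static ucontext_t main_ctx;
static int trace[8];
static int trace_len;

static void record(int id)
{
    if (trace_len < 8)
        trace[trace_len++] = id;
}

/* Save the current thread's state and resume the other thread on its own
 * stack. In a kernel this would be a short assembly routine instead. */
static void switch_to(struct thread *from, struct thread *to)
{
    swapcontext(&from->ctx, &to->ctx);
}

static void entry_a(void)
{
    record(1);
    switch_to(&thread_a, &thread_b);
    record(1);
    switch_to(&thread_a, &thread_b);
}

static void entry_b(void)
{
    record(2);
    switch_to(&thread_b, &thread_a);
    record(2);
    /* Returning here resumes main_ctx through uc_link. */
}

static void thread_create(struct thread *t, void (*entry)(void))
{
    t->stack = malloc(STACK_SIZE);
    assert(t->stack != NULL);
    getcontext(&t->ctx);
    t->ctx.uc_stack.ss_sp = t->stack;
    t->ctx.uc_stack.ss_size = STACK_SIZE;
    t->ctx.uc_link = &main_ctx;
    makecontext(&t->ctx, entry, 0);
}

/* Runs both threads and copies the order in which they ran into out. */
int run_demo(int *out, int cap)
{
    trace_len = 0;
    thread_create(&thread_a, entry_a);
    thread_create(&thread_b, entry_b);
    swapcontext(&main_ctx, &thread_a.ctx);   /* start thread A */
    free(thread_a.stack);
    free(thread_b.stack);
    int n = trace_len < cap ? trace_len : cap;
    for (int i = 0; i < n; i++)
        out[i] = trace[i];
    return n;
}
```

Each thread keeps its local state on its own stack across switches, which is why two threads cannot share one. In a real x86 kernel you would also need to point the TSS's ring 0 stack field at the new thread's kernel stack if that thread can run in ring 3; that part has no equivalent in this sketch.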
This wiki article does a pretty good job of explaining things, though keep in mind the example code is only an example - you shouldn't blindly trust it to be correct or ideal for your OS. (And, of course, you might not want the typical kernel design. But in that case, you should still learn how the usual design works before you try to come up with something else.)