OSwhatever wrote:
Hypothetically, when you make a TLS access (through __tls_get_addr, which it must be since this is a shared library), it will discover that the generation number is out of date. The size of the DTV might be the same, so there is no reason to resize. It can then go through the DTV to check whether any modules have been unloaded and set those entries to NULL. However, if a new module is in the same spot as an old module that was previously unloaded, there is no way to tell from the pointer alone whether the local TLS area belongs to the old or the new module. You need extra information in the DTV entry to determine that, like a generation number. If you have that, you can detect that a new module is in the same spot and run the initialization code for that TLS area.
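For reference, the scheme in the quote could look roughly like the sketch below. This is only an illustration under assumptions: the fixed-size table, the struct layouts and the names module_load(), module_unload() and tls_get_addr() are all made up here and are not taken from any real libc. Each DTV entry carries the generation of the module it was built for, so a slot that has been reused by a newer module can be told apart from the old one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MODULES 8   /* hypothetical fixed table size, to keep the sketch short */

struct tls_module {
    size_t gen;         /* generation of the current load; 0 means unloaded */
    size_t size;        /* size of the TLS block in bytes */
    const void *image;  /* initial contents (size bytes), or NULL for all zeros */
};

struct dtv_entry {
    void *block;        /* this thread's TLS block, or NULL if not allocated */
    size_t gen;         /* generation of the module the block was built for */
};

struct dtv {
    size_t gen;         /* global generation this DTV was last synced with */
    struct dtv_entry slot[MAX_MODULES];
};

static struct tls_module modules[MAX_MODULES];
static size_t global_gen;

void module_load(size_t idx, size_t size, const void *image)
{
    assert(idx < MAX_MODULES);
    modules[idx].gen = ++global_gen;
    modules[idx].size = size;
    modules[idx].image = image;
}

void module_unload(size_t idx)
{
    assert(idx < MAX_MODULES);
    modules[idx].gen = 0;
    ++global_gen;
}

void *tls_get_addr(struct dtv *dtv, size_t idx, size_t offset)
{
    assert(idx < MAX_MODULES);
    if (dtv->gen != global_gen) {
        /* Something was loaded or unloaded: drop every block whose
           generation no longer matches the module in that slot. */
        for (size_t i = 0; i < MAX_MODULES; i++) {
            struct dtv_entry *e = &dtv->slot[i];
            if (e->block && e->gen != modules[i].gen) {
                free(e->block);
                e->block = NULL;
            }
        }
        dtv->gen = global_gen;
    }
    struct dtv_entry *e = &dtv->slot[idx];
    if (!e->block) {
        const struct tls_module *m = &modules[idx];
        if (m->gen == 0)
            return NULL;            /* no module loaded in this slot */
        e->block = malloc(m->size);
        if (!e->block)
            abort();                /* no way to report failure from here */
        if (m->image)
            memcpy(e->block, m->image, m->size);
        else
            memset(e->block, 0, m->size);
        e->gen = m->gen;
    }
    return (char *)e->block + offset;
}
```

Note that the allocation happens on first access, so running out of memory can only end in abort().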
Why would you unload the TLS lazily? After unloading a module with TLS, there is no reason to presume that any other module with TLS even remains in the process, or accesses the TLS soon after. No, I was thinking the thread calling dlclose() could just iterate over all other threads and set the DTV pointer for that module to NULL. This of course requires having a good thread list implementation. Then, next time dlopen() is called (on a module with TLS, natch), it can just check if it has a NULL pointer in the existing DTV and reuse the DTV number instead of increasing the size.
Having dlopen() allocate the TLS memory would allow it to fail cleanly on memory exhaustion. __tls_get_addr() cannot fail; it can only crash (well, abort, but there is no real difference).
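A minimal sketch of what I mean, with everything hypothetical: the thread list, the slot_used[] table (standing in for "look for a NULL pointer in the existing DTV"), and the names tls_module_open() and tls_module_close(). Locking against concurrent thread creation is left out entirely, and the DTV is a fixed array so there is nothing to resize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SLOTS 4   /* hypothetical fixed DTV size */

struct thread {
    struct thread *next;      /* global list of all live threads */
    void *dtv[MAX_SLOTS];     /* one TLS block pointer per module slot */
};

static struct thread *thread_list;
static bool slot_used[MAX_SLOTS];

void thread_register(struct thread *t)
{
    t->next = thread_list;
    thread_list = t;
}

/* dlopen() side: pick a free slot (reusing one that an earlier close
   released) and allocate the block for every thread up front.
   Returns the slot index, or -1 on failure. */
int tls_module_open(size_t size)
{
    assert(size > 0);
    int slot = -1;
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (!slot_used[i]) {
            slot = i;
            break;
        }
    }
    if (slot < 0)
        return -1;

    for (struct thread *t = thread_list; t; t = t->next) {
        t->dtv[slot] = calloc(1, size);
        if (!t->dtv[slot]) {
            /* Out of memory: undo the partial work and report failure. */
            for (struct thread *u = thread_list; u != t; u = u->next) {
                free(u->dtv[slot]);
                u->dtv[slot] = NULL;
            }
            return -1;
        }
    }
    slot_used[slot] = true;
    return slot;
}

/* dlclose() side: eagerly free the block in every thread and clear
   the pointer, so the slot can be handed out again. */
void tls_module_close(int slot)
{
    assert(slot >= 0 && slot < MAX_SLOTS);
    for (struct thread *t = thread_list; t; t = t->next) {
        free(t->dtv[slot]);
        t->dtv[slot] = NULL;
    }
    slot_used[slot] = false;
}
```

The point is that the only place that allocates is tls_module_open(), which has a return value to report failure with.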
OSwhatever wrote:
Correct, the keys have nothing to do with DTV entries, but implementation-wise it could be possible. The infrastructure is already there and can be reused rather than having yet another vector for the thread keys.
As I tried to say, it unfortunately does not match the specification. You have to iterate over all threads and set the new TSD pointer to NULL, either in pthread_key_create() or in pthread_key_delete(). That is easy if you have the TSD vector as part of the thread descriptor, similar to the DTV vector, but next to impossible if you have the TSD vector as some thread-local array in libc's TLS memory.
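To illustrate the easy case, here is a rough sketch with the TSD vector inside the thread descriptor. The names (struct thread_desc, thread_add(), key_create(), key_delete()) are invented for this post, and destructors and locking are omitted. The slot is reset in every thread when the key is created, so a value left over from a previously deleted key never shows through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KEYS_MAX 4   /* hypothetical; stands in for PTHREAD_KEYS_MAX */

struct thread_desc {
    struct thread_desc *next;   /* global list of all live threads */
    void *tsd[KEYS_MAX];        /* thread-specific data, one slot per key */
};

static struct thread_desc *all_threads;
static bool key_in_use[KEYS_MAX];

void thread_add(struct thread_desc *t)
{
    t->next = all_threads;
    all_threads = t;
}

/* Returns 0 and stores the new key, or -1 if no key is free. */
int key_create(unsigned *key)
{
    for (unsigned k = 0; k < KEYS_MAX; k++) {
        if (key_in_use[k])
            continue;
        /* The new key must read as NULL in every existing thread,
           even if an earlier key used the same slot. */
        for (struct thread_desc *t = all_threads; t; t = t->next)
            t->tsd[k] = NULL;
        key_in_use[k] = true;
        *key = k;
        return 0;
    }
    return -1;
}

int key_delete(unsigned key)
{
    assert(key < KEYS_MAX);
    key_in_use[key] = false;   /* stale values are wiped on the next create */
    return 0;
}
```

With the vector in libc's own TLS block instead, key_create() has no way to reach the other threads' copies, which is the problem.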
OSwhatever wrote:
This paper is from 2006 and this model is still not used everywhere. Some things are introduced slowly.
The TLS paper itself is only from a few years prior to that. Bear in mind that through most of the '90s, threading was this weird research project some people were apparently excited about, but the Unix buffs didn't get the hype at the time. And the now-ubiquitous NPTL implementation of POSIX threads on Linux also took some time to develop (before that you had this weird system with the thread server, where the threads were actually different processes).
No, I don't think its young age is the reason for the lack of adoption of this extension. The reason is that it does not solve a pressing need. Several CPU extensions have been rolled out since then and have seen greater adoption, partly because they actually do solve a problem.