Wednesday, March 23, 2005

 

Some notes on Thread - 2

This is a rewritten version of an earlier blog post.

1.
Thread Local Storage - The C++ Way
By Roland Schwarz
from http://www.codeproject.com/useritems/tls.asp

Global data, while usually considered poor design, is nevertheless often a useful means of preserving state between related function calls. With threads, the issue is unfortunately complicated by the need for access synchronization, so that no more than one thread modifies the data at a time.

There are times when you will want to have a globally visible object, while still having the data content accessible only to the calling thread, without holding off other threads that contend for the "same" global object. This is where thread local storage (TLS) comes in. TLS is something the operating system / threading subsystem provides, and by its very nature is rather low level.
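As a point of comparison, standard C++ (from C++11 on, which postdates the article quoted here) expresses the same idea with the `thread_local` keyword. The following is a minimal sketch, not the article's code; the names `call_count`, `next_call` and `count_on_new_thread` are made up for illustration.

```cpp
#include <cassert>
#include <thread>

// One globally visible name, but every thread gets its own instance.
thread_local int call_count = 0;

// Related calls preserve state through the "global", with no locking needed.
int next_call() { return ++call_count; }

// Calls next_call() `times` times on a fresh thread and returns the last value seen there.
int count_on_new_thread(int times) {
    assert(times >= 0);
    int last = 0;
    std::thread worker([&] {
        for (int i = 0; i < times; ++i) last = next_call();
    });
    worker.join();
    return last;
}
```

Each new thread starts counting from zero again, and the caller's own count is not disturbed by what the worker threads do.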

From a globally visible object (in C++) you expect that its constructors are called before you enter "main", and that it is disposed of properly after you exit from "main". Consequently one would expect a thread local "global" object to be constructed when a thread starts up and destroyed when the thread exits. But this is not the case! Using the native API one can only have TLS that needs neither code to construct nor code to destruct.

While at first glance this is somewhat disappointing, there are reasons not to instantiate all these objects automatically on every thread creation. A clean solution to this problem is presented e.g. in the "boost" library. The standard "pthread" C library also addresses this problem properly. But when you need to use the native Windows threading API, or need to write a library that makes use of TLS but has no control over the threading API the client code is using, you are apparently lost.

Fortunately this is not true, and this is the topic of this article. The Windows Portable Executable (PE) format provides support for TLS callbacks. Although the documentation is hard to read, it can be done with current compilers, i.e. MSVC 6.0, 7.1, ... Since seemingly no one else has used this feature before, and not even the C runtime library (CRT) makes use of it, you should be a little careful and watch out for undesired behaviour. That the CRT does not use it does not mean that it does not implement it. Unfortunately there is a small bug in the MSVC 6.0 implementation, which my code also works around. (Note: this refers to the .tls thread local storage section of the PE file; see the blog entries of 7/20-21/2004.)

If it turns out, that the concepts, presented in this article, prove to be workable in "real life", I would be glad if this article has helped to remove some dust from this topic and make it usable for a broader range of applications. I could e.g. think of a generalized atexit_thread function that makes use of the concepts presented here.

Before going to explain the gory details, I want to mention Aaron W. LaFramboise who made me aware of the existence of the TLS-Callback mechanism.

Using the code
If you are using the precompiled binaries, you simply need to copy the *.lib files to a convenient directory where your compiler will find libraries. Likewise, copy the files from the include directory to a directory where your compiler searches for includes. Alternatively you may simply copy the files to your project directory.

The following is a simple demonstration of usage, to get you started.

#include <process.h>   // _beginthread
// first include the header file
// (the header name was lost in this copy; tls.h is assumed here)
#include <tls.h>
#include <windows.h>   // Sleep

// this is your class
struct A {
    A() : n(42) {
    }
    ~A() {
    }
    int the_answer_is() {
        int m = n;
        n = 0;
        return m;
    }
    int n;
};

// now define a tls wrapper of class A
tls_ptr< A > pA;

// this is the threaded procedure
void run(void*)
{
    // instantiate a new "A"
    pA.reset(new A);

    // access the tls-object
    int ans = pA->the_answer_is();

    // note, that we do not need to deallocate
    // the object. This is getting done automagically
    // when the thread exits.
}

int main(int argc, char* argv[])
{
    // the main thread also gets a local copy of the tls.
    pA.reset(new A);

    // start the thread
    _beginthread(&run, 0, 0);

    // call into the main threads version
    pA->the_answer_is();

    // the "run" thread should have ended when we
    // are exiting.
    Sleep(10000);

    // again we do not need to free our tls object.
    // this is comparable in behaviour to objects
    // at global scope.
    return 0;
}
At first glance it might seem more natural not to wrap the tls-objects as pointers, but in fact it is not. While the objects are globally visible, they are still "delegates" that forward to a thread local copy. The natural way in C++ to express delegation is a pointer object. (The technical reason, of course, is that you cannot overload the "." operator, but "->" can be overloaded.)
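To make the delegation idea concrete, here is a small sketch of a pointer-like wrapper that forwards to a per-thread object. It is not the article's tls_ptr: it relies on C++11 `thread_local` instead of the PE TLS callback, the name `per_thread_ptr` is hypothetical, and as a simplification it keeps one slot per type T rather than one per wrapper object.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical stand-in for tls_ptr, built on thread_local (an assumption,
// not the mechanism the article uses).
template <typename T>
class per_thread_ptr {
public:
    // Replace the calling thread's object; any previous one is deleted.
    void reset(T* p = nullptr) { slot().reset(p); }
    // "->" can be overloaded while "." cannot, hence the pointer-like interface.
    T* operator->() const { return slot().get(); }
    T& operator*() const { return *slot(); }
private:
    // One owner per thread (and per type T); it is destroyed when the thread exits.
    static std::unique_ptr<T>& slot() {
        thread_local std::unique_ptr<T> p;
        return p;
    }
};

struct A {
    int n = 42;
    int the_answer_is() { int m = n; n = 0; return m; }
};

per_thread_ptr<A> pA;

// Creates and queries the worker thread's own A, then lets the thread clean it up.
int answer_on_new_thread() {
    int ans = -1;
    std::thread t([&] {
        pA.reset(new A);
        ans = pA->the_answer_is();
    });
    t.join();
    return ans;
}
```

The worker thread zeroes only its own copy of `n`; the object the calling thread installed through the same global `pA` is untouched.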

You can of course use this mechanism when building a "*.exe" file, but you can also use it when building a "*.dll" image. However, when you plan to load your DLL with LoadLibrary(), you should define the macro TLS_ALLOC when building your DLL. This is not necessary when using your DLL by means of an import library. A similar restriction applies when delay-loading your DLL. Please consult your compiler documentation if you are interested in the reasons for this. (Defining TLS_ALLOC forces the use of the TlsAlloc() family of functions from the Win32 API.) (Note: see the excellent Under the Hood article on this; see the blog entries of 10/30-31/2003.)

The complete API is kept very simple:

tls_ptr< A > pA; // declare an object of class A
pA.reset(new A); // create a tls of class A when needed
pA.reset(new A(45)); // create a tls of class A with a custom constructor
// note, that this also deletes any prior objects
// that might have been allocated to pA
pA.release(); // same as pA.reset(0), releases the thread local object
A& refA = *pA; // get a temporary reference to the contained object for faster access
pA->the_answer_is(); // access the object
Please note again that it is not necessary to explicitly call the destructors of your class (or release()). This is very handy when you are writing a piece of code that has no control over the calling threads but must still be multithread safe. One caveat, however: the destructors of your class are called _after_ the CRT code has ended the thread. Consequently, when you do something fancy in your destructors that causes the CRT to reallocate its internal thread local storage pointers, you will be left with a small memory leak in the CRT. This is comparable in effect to using the native Win32 API functions to create a thread instead of _beginthread().

In principle that is all you need. But wait! I mentioned a small bug in the version 6 of the compiler. Luckily it is easy to work around. I provided an include file tlsfix.h which you will need to include into your program. You need to make sure it is getting included before windows.h. To be more precise: the TLS library must be searched before the default CRT library. So you alternatively may specify the library on the command line on the first place, and omit the inclusion of tlsfix.h.

Background
I will not discuss the user interface here. Suffice it to say that it is essentially the same as in the boost library. However, I omitted the ability to specify arbitrary deleter functions, since this would have required including the boost library in my code. I wanted to keep it small and just demonstrate the principles. My implementation also deviates from boost in that it uses native compiler support for TLS variables, thus gaining an almost fourfold speed improvement. Needless to say, my implementation is Windows specific.

When thinking about TLS for C++, the main question is how to run the constructors and destructors. A careful study of the PE format (e.g. in the MSDN library) reveals that it has provided for TLS support almost from the start. (Thanks again to Aaron W. LaFramboise, who read it carefully enough.) Of special interest is the section about TLS callbacks:

The program can provide one or more TLS callback functions (though Microsoft
compilers do not currently use this feature) to support additional
initialization and termination for TLS data objects. A typical reason to use
such a callback function would be to call constructors and destructors for
objects.
Well, it is true that the compilers do not use the feature, but nothing prevents user code from using it. One must somehow convince the compiler (to be honest, it is the linker) to place your callback so that the operating system will call it. It turns out that this is surprisingly simple (omitting the details for a moment).

// declare your callback
void NTAPI on_tls_callback(PVOID h, DWORD dwReason, PVOID pv)
{
if( DLL_THREAD_DETACH == dwReason )
basic_tls::thread_term();
}

// put a pointer in a special segment
#pragma data_seg(".CRT$XLB")
PIMAGE_TLS_CALLBACK p_thread_callback = on_tls_callback;
#pragma data_seg()
You can even add more callbacks, by appending pointers to the ".CRT$XLB" segment. The fancy definitions are available from the windows.h and winnt.h include files in turn.

Now about the details: you will find at times that your callbacks are not getting called. The reason is that the linker does not always wire up your segments correctly. It turns out that this happens exactly when you are not using any __declspec(thread) in your code. A further study of the PE format description reveals:

The Microsoft run-time library facilitates this process by defining a memory
image of the TLS Directory and giving it the special name “__tls_used” (Intel
x86 platforms) or “_tls_used” (other platforms). The linker looks for this
memory image and uses the data there to create the TLS Directory. Other
compilers that support TLS and work with the Microsoft linker must use this same
technique.
Consequently, when the linker does not find the _tls_used symbol, it won't wire in your callbacks. Luckily this is easy to circumvent:

#pragma comment(linker, "/INCLUDE:__tls_used")
This will pull in the code from the CRT that manages TLS. When using a version 7 compiler, that is all you need. (Actually I tried this with 7.1.) It turns out, however, that this does not work with a version 6 compiler. But the operating system cannot be the culprit, since code compiled by version 7 does work properly. After a little guesswork you will find that the CRT code from version 6 is slightly broken, because it inserts a wrong offset to the callback table. It is then easy to replace the erroneous code and convince the linker to wire in the workaround before the broken version from the CRT. You can study the tlsfix.c file from my submission if you are interested in the details.

Points of Interest
Which is the first function of your program that gets called by the operating system? Of course it is not main(). That was easy. Then mainCRTStartup, specified as the entry point in the linker, comes to mind. Wrong again. Interestingly, the first function being called is the TLS callback with Reason == DLL_PROCESS_ATTACH. But wait. Don't rely on this. This is not true on WinXP. I observed this on Win2000 only.

I have not yet tried the code on Win95/98, WinXP Home Edition and Win2003. I would be interested in feedback about using this code on these platforms. In principle it should work, because it is a feature of PE and not of the operating system, but ...

(Note: this author's CodeProject article is rather embellished; on the boost list he is more candid. The bug he mentions is actually the following.)

from http://lists.boost.org/MailArchives/boost/msg68792.php

On Sun, 01 Aug 2004 16:41:18 -0500 "Aaron W. LaFramboise" wrote:

> Just as a FYI, I now have a copy of MSVC6, and am working on this.
>
> MSVC6 does, in fact, have the necessary support, but there is a bug (I
> had noticed this before, and this was one of the reasons I wasn't able
> to offer more information a few months ago, and I had entirely forgotten
> about it. Oops.). Fortunately, the bug is in the runtime library, not
> in the linker or anything else.

Yes the bug is, that the TLS handlers must be in a contiguous area
between the __xl_a and __xl_z symbols. I fixed this by running a small
piece of code during the startup (in __xi_a .. __xi_z area).

Finally I wrapped everything up into a small C file that either can be bound
to boost or be linked with the user application. Despite now having everything
in a single file, I think boost still should not give away the possibility of
letting the user code call the process/thread startup/termination hooks
directly. There always might be some code that needs this.

Thanks to Aaron we now have a TLS solution that can handle any thread
creation mechanism, while still residing in a statically bound library.

The tsstls.c file follows: To test it compile your application with BOOST_THREAD_USE_LIB
/*
Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license (the "Software") to use, reproduce, display, distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:

The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

This piece of code is a result of the work of:
Aaron W.LaFramboise, who showed how to implement TLS-callback
Michael Glassford, who factored out the startup code
Bronek Kozicki, who showed me, that it is not harmful
to access the CRT after thread end
Roland Schwarz, who did the writing, runtime initialization
(.CRTXxx), correct dtor behaviour and broken MSVC 6 fix
08.02.2004

*/
#include <stdlib.h>
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

typedef void (__cdecl *_PVFV)(void);
typedef void (NTAPI* _TLSCB)(HINSTANCE, DWORD, PVOID);

/* some symbols for connection to the runtime environment */
extern IMAGE_TLS_DIRECTORY _tls_used; /* the tls directory (located in .rdata segment) */
extern _TLSCB __xl_a[], __xl_z[]; /* tls initializers */

/* the boost tss startup interface */
extern void on_process_enter(void);
extern void on_process_exit(void);
extern void on_thread_exit(void);

/* some forward declarations */
static void on_tls_prepare(void);
static void on_process_init(void);
static void NTAPI on_thread_callback(HINSTANCE, DWORD, PVOID);

/* The .CRT$Xxx information is taken from Codeguru: */
/* http://www.codeguru.com/Cpp/misc/misc/threadsprocesses/article.php/c6945__2/ */

/* The tls glue code is to be run first */
/* I don't think it is necessary to run it */
/* at .CRT$XIB level, since we are only */
/* interested in thread detachement. But */
/* this could be changed easily if required. */
#pragma data_seg(".CRT$XIU")
static _PVFV p_tls_prepare = on_tls_prepare;
#pragma data_seg()

/* we need to get control after all global ctors */
#pragma data_seg(".CRT$XCU")
static _PVFV p_process_init = on_process_init;
#pragma data_seg()

/* this is the TLS callback */
#pragma data_seg(".CRT$XLB")
_TLSCB p_thread_callback = on_thread_callback;
#pragma data_seg()

/* we will run the termination late */
#pragma data_seg(".CRT$XTU")
static _PVFV p_process_exit = on_process_exit;
#pragma data_seg()

static void on_tls_prepare(void)
{
_TLSCB* pfbegin;
_TLSCB* pfend;
_TLSCB* pfdst;
pfbegin = __xl_a;
pfend = __xl_z;
/* the following line has an important side effect: */
/* if the TLS directory is not already there, it will */
/* be created by the linker. (_tls_used) */
pfdst = (_TLSCB*)_tls_used.AddressOfCallBacks;
/* the following loop will merge the address pointers */
/* into a contiguous area, since the tlssup code seems */
/* to require this (at least on MSVC 6) */
while (pfbegin < pfend) {
if (*pfbegin != 0) {
*pfdst = *pfbegin;
++pfdst;
}
++pfbegin;
}
}

static void on_process_init(void)
{
/* This hooks the main thread exit. It will run the */
/* termination before global dtors, but will not be run */
/* when 'quick' exiting the library! However, this is the */
/* standard behaviour for all global dtors anyways. */
atexit(on_thread_exit);

/* hand over to boost */
on_process_enter();
}

void NTAPI on_thread_callback(HINSTANCE h, DWORD dwReason, PVOID pv)
{
if(dwReason == DLL_THREAD_DETACH)
on_thread_exit();
}

void tss_cleanup_implemented(void) {};
----EOF---

Roland

(Note: here we can see the power of callbacks.)
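The core of the MSVC 6 workaround is the loop in on_tls_prepare, which squeezes the non-null callback pointers into one contiguous run. Below is a portable sketch of that compaction step only, with no linker sections involved. The names `callback`, `compact_callbacks`, `hook_a` and `hook_b` are hypothetical, and unlike the original loop this version also clears the tail so the run stays null-terminated.

```cpp
#include <cassert>
#include <cstddef>

using callback = void (*)();

// Two dummy callbacks for illustration only.
void hook_a() {}
void hook_b() {}

// Moves the non-null entries of [begin, end) to the front, keeping their order,
// clears the remaining entries, and returns the number of callbacks kept.
std::size_t compact_callbacks(callback* begin, callback* end) {
    callback* dst = begin;
    for (callback* src = begin; src < end; ++src) {
        if (*src != nullptr) *dst++ = *src;
    }
    std::size_t kept = static_cast<std::size_t>(dst - begin);
    for (; dst < end; ++dst) *dst = nullptr;
    return kept;
}
```

Compacting in place is safe because the write position never runs ahead of the read position.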

2.
On GNU/Linux systems, most kernels currently do not support TLS. One comment I came across reads as follows:

“I don't think any POSIX standard covers TLS as yet. However, it's covered by an amendment to C99 and C++98, so it looks like it's going to be standard across all compliant implementations. From the GCC manual:

5.48.1 ISO/IEC 9899:1999 Edits for Thread-Local Storage
-------------------------------------------------------

The following are a set of changes to ISO/IEC 9899:1999 (aka C99) that
document the exact semantics of the language extension.

* `5.1.2 Execution environments'

Add new text after paragraph 1

Within either execution environment, a "thread" is a flow of
control within a program. It is implementation defined
whether or not there may be more than one thread associated
with a program. It is implementation defined how threads
beyond the first are created, the name and type of the
function called at thread startup, and how threads may be
terminated. However, objects with thread storage duration
shall be initialized before thread startup.

* `6.2.4 Storage durations of objects'

Add new text before paragraph 3

An object whose identifier is declared with the storage-class
specifier `__thread' has "thread storage duration". Its
lifetime is the entire execution of the thread, and its
stored value is initialized only once, prior to thread
startup.

* `6.4.1 Keywords'

Add `__thread'.

* `6.7.1 Storage-class specifiers'

Add `__thread' to the list of storage class specifiers in
paragraph 1.

Change paragraph 2 to

With the exception of `__thread', at most one storage-class
specifier may be given [...]. The `__thread' specifier may
be used alone, or immediately following `extern' or `static'.

Add new text after paragraph 6

The declaration of an identifier for a variable that has
block scope that specifies `__thread' shall also specify
either `extern' or `static'.

The `__thread' specifier shall be used only with variables.


5.48.2 ISO/IEC 14882:1998 Edits for Thread-Local Storage
--------------------------------------------------------

The following are a set of changes to ISO/IEC 14882:1998 (aka C++98)
that document the exact semantics of the language extension.

* [intro.execution]

New text after paragraph 4

A "thread" is a flow of control within the abstract machine.
It is implementation defined whether or not there may be more
than one thread.

New text after paragraph 7

It is unspecified whether additional action must be taken to
ensure when and whether side effects are visible to other
threads.

* [lex.key]

Add `__thread'.

* [basic.start.main]

Add after paragraph 5

The thread that begins execution at the `main' function is
called the "main thread". It is implementation defined how
functions beginning threads other than the main thread are
designated or typed. A function so designated, as well as
the `main' function, is called a "thread startup function".
It is implementation defined what happens if a thread startup
function returns. It is implementation defined what happens
to other threads when any thread calls `exit'.

* [basic.start.init]

Add after paragraph 4

The storage for an object of thread storage duration shall be
statically initialized before the first statement of the
thread startup function. An object of thread storage
duration shall not require dynamic initialization.

* [basic.start.term]

Add after paragraph 3

The type of an object with thread storage duration shall not
have a non-trivial destructor, nor shall it be an array type
whose elements (directly or indirectly) have non-trivial
destructors.

* [basic.stc]

Add "thread storage duration" to the list in paragraph 1.

Change paragraph 2

Thread, static, and automatic storage durations are
associated with objects introduced by declarations [...].

Add `__thread' to the list of specifiers in paragraph 3.

* [basic.stc.thread]

New section before [basic.stc.static]

The keyword `__thread' applied to a non-local object gives the
object thread storage duration.

A local variable or class data member declared both `static'
and `__thread' gives the variable or member thread storage
duration.

* [basic.stc.static]

Change paragraph 1

All objects which have neither thread storage duration,
dynamic storage duration nor are local [...].

* [dcl.stc]

Add `__thread' to the list in paragraph 1.

Change paragraph 1

With the exception of `__thread', at most one
STORAGE-CLASS-SPECIFIER shall appear in a given
DECL-SPECIFIER-SEQ. The `__thread' specifier may be used
alone, or immediately following the `extern' or `static'
specifiers. [...]

Add after paragraph 5

The `__thread' specifier can be applied only to the names of
objects and to anonymous unions.

* [class.mem]

Add after paragraph 6

Non-`static' members shall not be `__thread'.”
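A short sketch of what these edits permit, assuming a compiler that implements the `__thread` extension (GCC or Clang) on a platform with TLS support. The variable and function names are made up for illustration; the threads are started with C++11 `std::thread`, which the quoted text predates.

```cpp
#include <cassert>
#include <thread>

// File scope: one copy per thread, statically initialized from a constant.
static __thread int hits = 0;

int record_hit() { return ++hits; }

int next_id() {
    // Block scope: per the quoted rule, __thread must be combined with static (or extern).
    static __thread int id = 100;
    return id++;
}

// Runs both functions on a fresh thread and returns the first id handed out there.
int first_id_on_new_thread() {
    int id = -1;
    std::thread t([&] { record_hit(); id = next_id(); });
    t.join();
    return id;
}
```

Both variables have trivial types and constant initializers, which is what the [basic.start.init] and [basic.start.term] edits above require.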

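For example, FreeBSD's support for Thread Local Storage (TLS) can be seen clearly in the following table:
http://people.freebsd.org/~linimon/architectures.html
There is also a short note on this:
http://people.freebsd.org/~marcel/tls.html
So FreeBSD's TLS support is still under development, although GCC is already partly prepared for it.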

3.
Peering Inside the PE: A Tour of the Win32 Portable Executable File Format
by Matt Pietrek
from http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndebug/html/msdn_peeringpe.asp

This article describes compiler support for TLS. He writes:

When you use the compiler directive __declspec(thread), the data that you define doesn't go into either the .data or .bss sections. It ends up in the .tls section, which refers to "thread local storage," and is related to the TlsAlloc family of Win32 functions. When dealing with a .tls section, the memory manager sets up the page tables so that whenever a process switches threads, a new set of physical memory pages is mapped to the .tls section's address space. This permits per-thread global variables. In most cases, it is much easier to use this mechanism than to allocate memory on a per-thread basis and store its pointer in a TlsAlloc'ed slot.

There's one unfortunate note that must be added about the .tls section and __declspec(thread) variables. In Windows NT and Windows 95, this thread local storage mechanism won't work in a DLL if the DLL is loaded dynamically by LoadLibrary. In an EXE or an implicitly loaded DLL, everything works fine. If you can't implicitly link to the DLL, but need per-thread data, you'll have to fall back to using TlsAlloc and TlsGetValue with dynamically allocated memory.

Although the .rdata section usually falls between the .data and .bss sections, your program generally doesn't see or use the data in this section. The .rdata section is used for at least two things. First, in Microsoft linker-produced EXEs, the .rdata section holds the debug directory, which is only present in EXE files. (In TLINK32 EXEs, the debug directory is in a section named .debug.) The debug directory is an array of IMAGE_DEBUG_DIRECTORY structures. These structures hold information about the type, size, and location of the various types of debug information stored in the file. Three main types of debug information appear: CodeView, COFF, and FPO.

For a more detailed treatment, see the well-known book Programming Applications for Microsoft Windows, published by Microsoft Press. Its Chapter 21 is devoted to Thread Local Storage. NOTE: This book was formerly titled Advanced Windows.

Let's look at a few applications.

4.
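Handle Maps
By jiangsheng
from http://www.csdn.net/develop/article/23/23171.shtm

MFC spares no effort in wrapping handles as objects. To keep the object<->handle mapping one-to-one within a thread, it creates handle maps of all kinds: for windows, GDI objects, menus and so on. To wrap the temporary handles returned by APIs such as GetDlgItem and SelectObject, MFC also creates temporary object<->handle mappings. Handle maps are what make functions like GetParentFrame possible.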

CFrameWnd* CWnd::GetParentFrame() const
{
if (GetSafeHwnd() == NULL) // no Window attached
return NULL;

ASSERT_VALID(this);

CWnd* pParentWnd = GetParent(); // start with one parent up
while (pParentWnd != NULL)
{
if (pParentWnd->IsFrameWnd())
return (CFrameWnd*)pParentWnd;
pParentWnd = pParentWnd->GetParent();
}
return NULL;
}

_AFXWIN_INLINE CWnd* CWnd::GetParent() const
{ ASSERT(::IsWindow(m_hWnd)); return CWnd::FromHandle(::GetParent(m_hWnd)); }

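See? It first calls the GetParent API, then looks up the object pointer in the current thread's window<->handle map, and then calls CWnd::IsFrameWnd to decide whether the object is a frame. (Thankfully this is a virtual function rather than CObject::IsKindOf; otherwise the run-time class information would have to be walked yet again.)

The map is also used in some frequently called functions: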

LRESULT CALLBACK
AfxWndProc(HWND hWnd, UINT nMsg, WPARAM wParam, LPARAM lParam)
{
// special message which identifies the window as using AfxWndProc
if (nMsg == WM_QUERYAFXWNDPROC)
return 1;

// all other messages route through message map
CWnd* pWnd = CWnd::FromHandlePermanent(hWnd);
ASSERT(pWnd != NULL);
ASSERT(pWnd->m_hWnd == hWnd);
if (pWnd == NULL || pWnd->m_hWnd != hWnd)
return ::DefWindowProc(hWnd, nMsg, wParam, lParam);
return AfxCallWndProc(pWnd, hWnd, nMsg, wParam, lParam);
}
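In other words, it has to search the permanent handle map in the object returned by afxMapHWND(), and this function is called for every message that arrives. This is one source of performance loss in MFC applications.

Likewise, because these objects are owned by threads, MFC keeps the handle maps in thread-local storage (TLS). That is, for the same handle, the corresponding object in the handle map can differ from thread to thread. This causes some problems in multithreaded programs; see Microsoft Knowledge Base article Q147578, CWnd Derived MFC Objects and Multi-threaded Applications:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;147578
I remember a classmate once got stuck on exactly this.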

5.
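Microsoft has compiled a list of all possible error codes and assigned each one a 32-bit number. The WinError.h header file (more than 20,000 lines long) contains the list of Microsoft-defined error codes. When a Windows function detects an error, it uses the thread-local storage mechanism to associate the corresponding error code with the calling thread. This lets threads run independently of one another without affecting each other's error codes.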
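GetLastError is Windows-only, but `errno` in the C++ standard library follows the same per-thread design (guaranteed since C++11), so it can serve as a portable analog. The sketch below is an illustration under that assumption and does not call any Win32 function; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <thread>

// Stores `code` in errno on a new thread and returns what that thread reads back.
int set_errno_on_new_thread(int code) {
    int seen = 0;
    std::thread t([&] { errno = code; seen = errno; });
    t.join();
    return seen;
}

// True if a new thread sees errno at a different address than the calling thread.
bool errno_is_per_thread() {
    int* callers_errno = &errno;
    bool distinct = false;
    std::thread t([&] { distinct = (&errno != callers_errno); });
    t.join();
    return distinct;
}
```

Because every thread has its own errno object, an error recorded on one thread cannot overwrite the code another thread is about to inspect.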

6.
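Threads and Windows
By lostall
from http://comcamp.myrice.com/techarticles/debug/0003.htm

I. Project structure

This is a multithreaded MFC program. Window A is created in thread A, window B is created in thread B, and window B is set as a child of window A.


II. Symptom

Because window A is the parent of window B, thread A can easily obtain a pointer to window B, which was created in the other thread B. But calling IsKindOf on this pointer always fails.


III. Debugging

This bug is not hard to solve, but what it reveals is rather interesting.
I used GetWindow(GW_CHILD) to obtain window B's handle. At first I spent some time checking whether this handle was valid, mainly because windows A and B are not in the same thread and I worried that multithreading might do something mysterious in this situation. But nothing was wrong.
I used CWnd::FromHandle to get the CWnd pointer for this handle. I found that VC's DataTip window showed this pointer as a CTempWnd pointer! That is definitely abnormal. Why is it not of window B's type? I stepped into CWnd::FromHandle; the relevant code follows:

CWnd::FromHandle(HWND hWnd)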
--> CHandleMap* pMap = afxMapHWND(TRUE); // create map if not exist
--> CWnd* pWnd = (CWnd*)pMap->FromHandle(hWnd);

CHandleMap* PASCAL afxMapHWND(BOOL bCreate)
{
AFX_MODULE_THREAD_STATE* pState = AfxGetModuleThreadState();
if (pState->m_pmapHWND == NULL && bCreate)
{
// create a new CHandleMap
}
return pState->m_pmapHWND; // CHandleMap* m_pmapHWND
}

AfxGetModuleThreadState()
--> return AfxGetModuleState()->m_thread.GetData();
--> _AFX_THREAD_STATE* pState = _afxThreadState;// AFX_DATADEF CThreadLocal<_AFX_THREAD_STATE> _afxThreadState
--> return pState->m_pModuleState;

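From the ThreadState we get the ModuleState, and from the ModuleState the ModuleThreadState.
As the name suggests, _afxThreadState is thread local storage.
m_thread is defined as AFX_DATADEF CThreadLocal<AFX_MODULE_THREAD_STATE> m_thread, so it is thread local storage as well. Tracing further into CThreadLocal reveals the hallmarks of thread local storage: TlsAlloc and TlsFree.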

CHandleMap::FromHandle(HANDLE h)
--> CObject* pObject = LookupPermanent(h); // look up the real window pointer for this handle in the permanent map
--> return (CObject*)m_permanentMap.GetValueAt((LPVOID)h);
--> if pObject is not null, return it; otherwise
pObject = LookupTemporary(h); // look up the temporary window pointer for this handle in the temporary map
--> if pObject is not null, return it; otherwise
// This handle wasn't created by us, so we must create a temporary C++ object to wrap it.
create a temporary window object of type CTempWnd, an empty class derived from CWnd with no members.

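The truth is now out. The mapping from window B's handle to its CWnd* pointer is stored in thread B's HandleMap. When FromHandle is called in thread A, thread A's HandleMap holds no such mapping, so pObject comes back NULL and a temporary window object, a CTempWnd*, is created. It would be strange if IsKindOf did not fail on a CTempWnd*.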
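The behaviour just traced can be reproduced in miniature without MFC. The sketch below uses hypothetical stand-ins (`Handle`, `Wnd`, `TempWnd`, `attach`, `from_handle`) and C++11 `thread_local` maps instead of MFC's CHandleMap and TlsAlloc-based storage, so it only models the lookup order, not MFC's actual implementation.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <unordered_map>

using Handle = int;  // hypothetical stand-in for HWND

struct Wnd {
    virtual ~Wnd() = default;
    virtual bool is_temp() const { return false; }
};
struct TempWnd : Wnd {
    bool is_temp() const override { return true; }
};

// Every thread owns its own permanent and temporary map.
thread_local std::unordered_map<Handle, Wnd*> permanent_map;
thread_local std::unordered_map<Handle, std::unique_ptr<Wnd>> temporary_map;

// Registers a wrapper in the calling thread's permanent map only.
void attach(Handle h, Wnd* w) { permanent_map[h] = w; }

// Permanent lookup first, then temporary; otherwise wrap the handle in a new temporary.
Wnd* from_handle(Handle h) {
    auto perm = permanent_map.find(h);
    if (perm != permanent_map.end()) return perm->second;
    std::unique_ptr<Wnd>& temp = temporary_map[h];
    if (!temp) temp.reset(new TempWnd);
    return temp.get();
}

// Looks the handle up from a different thread and reports whether a temporary came back.
bool lookup_is_temp_on_new_thread(Handle h) {
    bool temp = false;
    std::thread t([&] { temp = from_handle(h)->is_temp(); });
    t.join();
    return temp;
}
```

The thread that attached the wrapper gets the real object back, while any other thread gets a freshly made temporary, which is exactly the CTempWnd surprise described above.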

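Why does the HandleMap have to be thread local storage? The reason seems obvious: each thread has its own message queue, and it only dispatches messages to the windows that belong to it.

One last thing remains to be verified: whether the handle-to-CWnd* mapping is added to the HandleMap when the window is created. Tracing CreateWindowEx confirms it: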

::CreateWindowEx
--> _AfxCbtFilterHook
--> CWnd::Attach
--> CHandleMap* pMap = afxMapHWND(TRUE);
pMap->SetPermanent(m_hWnd = hWndNew, this);

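IV. Lessons learned

This case shows once again that MFC, while offering great convenience, also hides far too many details; studying its source code in depth is the only shortcut to mastering MFC.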

7.
by duyanning
from http://dev.csdn.net/article/29/29430.shtm

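For a local variable, each call of the function by any thread gets a fresh copy of the variable on the stack, whereas global and static variables exist only once. MFC provides a mechanism for defining thread-local data in the same way as a global variable. Thread-local data means that every thread that accesses it has a copy belonging to that thread alone.

Thread-local data can be defined with the macro THREAD_LOCAL(class_name, ident_name), which is defined as follows:

#define THREAD_LOCAL(class_name, ident_name) AFX_DATADEF CThreadLocal<class_name> ident_name;

Here AFX_DATADEF is a placeholder, class_name is the type of the thread-local data and ident_name is its name;
class_name must derive from CNoTrackObject.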

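Example:
struct CMyThreadData : public CNoTrackObject
{
    CString strThread;
};

THREAD_LOCAL(CMyThreadData, threadData)
The line above is equivalent to
CThreadLocal<CMyThreadData> threadData;
This defines the thread-local data threadData. threadData itself is in fact still global and not truly thread-local, but all of its members are genuine thread-local data: however many threads access threadData->strThread, each gets a copy of its own. These copies are actually allocated on the heap, and the allocation happens on each thread's first access.

CThreadLocal overloads operator->. When a thread first accesses a member, it allocates a new CMyThreadData on the heap. How, then, does CThreadLocal remember which CMyThreadData on the heap belongs to which thread? The answer is TLS.

CThreadLocal derives from CThreadLocalObject, which has a member m_nSlot that holds a unique value for each CThreadLocalObject subobject produced by a THREAD_LOCAL definition.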
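The slot scheme described here can be sketched in portable C++. This is not MFC's code: `NoTrackObject`, `ThreadLocal`, `allocate_slot` and `thread_slots` are hypothetical names, and a `thread_local` vector stands in for the TlsAlloc-backed per-thread table that MFC uses.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Plays the role of CNoTrackObject: a common base with a virtual destructor.
struct NoTrackObject { virtual ~NoTrackObject() = default; };

// Hands out a process-wide unique slot number (the analog of m_nSlot).
inline std::size_t allocate_slot() {
    static std::atomic<std::size_t> next{0};
    return next.fetch_add(1);
}

// Per-thread table indexed by slot number; entries are freed when the thread exits.
inline std::vector<std::unique_ptr<NoTrackObject>>& thread_slots() {
    thread_local std::vector<std::unique_ptr<NoTrackObject>> slots;
    return slots;
}

template <typename T>  // T must derive from NoTrackObject
class ThreadLocal {
public:
    ThreadLocal() : slot_(allocate_slot()) {}
    // The first access on a thread allocates that thread's T on the heap.
    T* operator->() {
        auto& slots = thread_slots();
        if (slots.size() <= slot_) slots.resize(slot_ + 1);
        if (!slots[slot_]) slots[slot_].reset(new T);
        return static_cast<T*>(slots[slot_].get());
    }
private:
    const std::size_t slot_;
};

struct MyThreadData : NoTrackObject { std::string strThread; };
ThreadLocal<MyThreadData> threadData;  // global object, per-thread contents

// Reads the member on a fresh thread, which triggers that thread's own allocation.
std::string read_on_new_thread() {
    std::string seen;
    std::thread t([&] { seen = threadData->strThread; });
    t.join();
    return seen;
}
```

The global `threadData` object only stores a slot number; the actual data lives in whichever thread's table is consulted at the time of the access.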

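Now let's jump over to the .NET Framework and continue the discussion of thread local storage.

First, note that in the Rotor code the CLR loader does not check for TLS callbacks that might be present. In
http://dotnet.di.unipi.it/Content/sscli/docs/doxygen/clr/vm/peverifier_8cpp-source.html
the PEVerifier::CheckDirectories() method has only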

static DWORD s_dwAllowedBitmap =
((1 << (IMAGE_DIRECTORY_ENTRY_IMPORT )) |
(1 << (IMAGE_DIRECTORY_ENTRY_RESOURCE )) |
(1 << (IMAGE_DIRECTORY_ENTRY_SECURITY )) |
(1 << (IMAGE_DIRECTORY_ENTRY_BASERELOC)) |
(1 << (IMAGE_DIRECTORY_ENTRY_DEBUG )) |
(1 << (IMAGE_DIRECTORY_ENTRY_IAT )) |
(1 << (IMAGE_DIRECTORY_ENTRY_COMHEADER)));

The CLR does this for safety, of course. So how do we get thread-local data in .NET? The answer is the ThreadStaticAttribute class. It belongs to the System namespace, and its inheritance chain is

System.Object
System.Attribute
System.ThreadStaticAttribute

In the Rotor code, all we can see is the following:

namespace System {

using System;

///
[AttributeUsage(AttributeTargets.Field, Inherited = false),Serializable()]
public class ThreadStaticAttribute : Attribute
{
///
public ThreadStaticAttribute()
{
}
}
}

No code at all, which is disappointing. CB's explanation is as follows:
http://blogs.msdn.com/cbrumme/archive/2003/04/15/51317.aspx

Static Fields
By default, static fields are scoped to AppDomains. In other words, each AppDomain gets its own copy of all the static fields for the types that are loaded into that AppDomain. This is independent of whether the code was loaded as domain-neutral or not. Loading code as domain neutral affects whether we can share the code and certain other runtime structures. It is not supposed to have any effect other than performance.

Although per-AppDomain is the default for static fields, there are 3 other possibilities:

RVA-based static fields are process-global. These are restricted to scalars and value types, because we do not want to allow objects to bleed across AppDomain boundaries. That would cause all sorts of problems, especially during AppDomain unloads. Some languages like ILASM and MC++ make it convenient to define RVA-based static fields. Most languages do not.

Static fields marked with System.ThreadStaticAttribute are scoped per-thread per-AppDomain. You get convenient declarative thread-local storage over and above the normal per-AppDomain cloning of static fields.

Static fields marked with System.ContextStaticAttribute are scoped per-context per-AppDomain. If you are using managed contexts and ContextBoundObject, this is a convenient way to get storage cloned in each managed context.

We considered (briefly) building thread-relative and context-relative versions of the existing .cctor class constructor. But that’s a lot of machinery to ensure that all static fields are initialized via a constructor that is coordinated by the system.

Instead, our docs recommend against initializing your thread-relative and context-relative static fields in a .cctor. The reason is that a .cctor executes only once per AppDomain. The static fields will get initialized in whatever thread and context the .cctor happens to run in. But all subsequent threads and contexts will have uninitialized data.

So the model you have today is that you should be prepared to initialize your thread-relative and context-relative statics on first use. This is fairly easy to do since we guarantee these statics are first initialized to 0. So you can use a thread-relative or context-relative static Boolean field (inited to false) or static Object reference (inited to null) to indicate that initialization hasn’t occurred yet.
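
(Note: the "initialize on first use" advice above can be sketched roughly as follows. This is only a minimal illustration; PerThreadCache, Items and LazyInitDemo are hypothetical names made up for this note, not anything from CB's post or the framework.)

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical example class; the names are illustrative only.
public static class PerThreadCache
{
    // No initializer on purpose: every thread starts out with null here.
    [ThreadStatic] private static List<string> items;

    // Initialize on first use, once per thread.
    public static List<string> Items
    {
        get
        {
            if (items == null)
            {
                items = new List<string>();
            }
            return items;
        }
    }
}

public static class LazyInitDemo
{
    // Adds one entry on the calling thread and one on a second thread, then
    // returns { count seen by the caller, count seen by the second thread }.
    public static int[] Run()
    {
        int workerCount = -1;
        PerThreadCache.Items.Add("caller");

        Thread t = new Thread(() =>
        {
            PerThreadCache.Items.Add("worker");
            workerCount = PerThreadCache.Items.Count;
        });
        t.Start();
        t.Join();

        return new int[] { PerThreadCache.Items.Count, workerCount };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine("caller: {0}, worker: {1}", counts[0], counts[1]);
    }
}
```

Each thread gets its own list the first time it touches Items, so no thread ever sees the uninitialized null and no locking is needed around the list itself.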

JGTM'2004 wrote at http://blog.joycode.com/jgtm2000/posts/11379.aspx :

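  TLS (thread local storage) is a useful thing in class-library and multithreaded application development, and many languages and tools support it well (for example __declspec(thread) in Visual C++, threadvar in Delphi, and the corresponding Tls family of functions in the Win32 API). Some people new to .NET have started complaining that TLS is gone in the managed environment and that they have to write it themselves. Not so: although languages such as C#/VB.NET have no dedicated keyword or statement for declaring TLS, the CLR supports this feature even more directly through a custom attribute, ThreadStaticAttribute.

  If you want a static member (static in C#, Shared in VB.NET) to hold a different value for each thread (more precisely, for each combination of app-domain and thread), which is exactly TLS behavior, you only need to mark it with the ThreadStatic attribute; no further programming is required. (That is the declarative approach. There is also a programmatic one: see the Thread.AllocateDataSlot and Thread.AllocateNamedDataSlot methods, or look up the TLS entry in the .NET SDK Documentation Index.)

  The documentation points out one thing to watch for: any code that accesses a thread-local static member, unless it runs on the first thread to access the class that declares the member, should expect the member to be a null reference (for reference types) or the default initial value (for value types). In other words, do not rely on the class constructor to initialize ThreadStatic members; the reason is obvious.

  Also, be careful with TLS members in multithreaded environments such as ASP.NET, because you do not control the lifetime of those threads: they are reused from the thread pool managed by HttpRuntime, so a TLS member used during one request may be read or modified during another, completely unrelated request (unless that is the effect you want). If you need request-scoped storage, consider the Items collection of the HttpContext.Current instance (that collection can be used to pass and share state between different pages of the same request, for example with Server.Transfer).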

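lostinet also discussed ThreadStaticAttribute at
http://blog.joycode.com/lostinet/posts/22026.aspx

ThreadStaticAttribute tells the CLR that access to the static field it marks depends on the current thread and is independent of other threads.

For example: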

class MyClass{
[ThreadStatic] static public string threadvalue;
}

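threadvalue in MyClass is a thread-static field. If several threads in a program access this field at the same time, each thread accesses its own independent threadvalue. For example, if thread 1 sets it to "hello" and thread 2 then sets it to "world", thread 1 still gets "hello" when it finally reads the field.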
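
(Note: the hello/world scenario above can be played out in a small program. This is only a sketch; ThreadStaticDemo and the two events are hypothetical helpers added here to force the order "thread 1 writes, thread 2 writes, thread 1 reads".)

```csharp
using System;
using System.Threading;

class MyClass
{
    [ThreadStatic] public static string threadvalue;
}

public static class ThreadStaticDemo
{
    // Returns what thread 1 reads back after thread 2 has written "world".
    public static string Run()
    {
        string seenByThread1 = null;
        ManualResetEvent thread1HasWritten = new ManualResetEvent(false);
        ManualResetEvent thread2HasWritten = new ManualResetEvent(false);

        Thread t1 = new Thread(() =>
        {
            MyClass.threadvalue = "hello";
            thread1HasWritten.Set();
            thread2HasWritten.WaitOne();     // wait until thread 2 has written
            seenByThread1 = MyClass.threadvalue;
        });
        Thread t2 = new Thread(() =>
        {
            thread1HasWritten.WaitOne();     // write only after thread 1 has written
            MyClass.threadvalue = "world";   // goes into thread 2's own copy
            thread2HasWritten.Set();
        });

        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        return seenByThread1;
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // prints "hello"
    }
}
```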

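Given this, thread-static fields have the following characteristics:

It is a static field, so no MyClass instance is needed; just access it as MyClass.threadvalue.
Its storage location is resolved per thread on each access, so access is comparatively slow.
Accessing a thread-static field cannot cause thread synchronization problems: although different threads semantically access the same field, they actually access different blocks of memory.
One thread cannot access another thread's thread-static field, even if you hold a reference to the other thread's System.Threading.Thread object.
However, when using thread-static fields, note the following: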

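An initializer on the field executes in the class's static constructor, so it runs only once. When other threads later access the field, they see the default value of the field's type.
If the fields on different threads all reference the same object, that does not make the object thread-safe. [ThreadStatic] isolates the field; the object the field references is not governed by [ThreadStatic].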
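
(Note: the first caveat above is easy to observe. A minimal sketch, with the hypothetical names Counter and InitializerDemo; it assumes the thread calling Run is the first one to touch Counter.)

```csharp
using System;
using System.Threading;

public static class Counter
{
    // The "= 42" initializer runs only once, on whichever thread
    // triggers the type initialization of Counter.
    [ThreadStatic] public static int value = 42;
}

public static class InitializerDemo
{
    // Returns { value seen by the calling thread, value seen by a new thread }.
    public static int[] Run()
    {
        int onCaller = Counter.value;   // the type initializer has run by this point
        int onWorker = -1;

        Thread t = new Thread(() => { onWorker = Counter.value; });
        t.Start();
        t.Join();

        return new int[] { onCaller, onWorker };
    }

    public static void Main()
    {
        int[] r = Run();
        // The initializing thread sees the initializer's value; the worker
        // thread sees default(int), because the initializer never ran for it.
        Console.WriteLine("initializing thread: {0}, worker thread: {1}", r[0], r[1]);
    }
}
```
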
If you know System.Runtime.Remoting.Messaging.CallContext (abbreviated MContext below),
how do MContext and thread-static fields differ?

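Both can be said to be tied to the current thread, but they are not the same thing.
MContext is name-based: it stores different data under different names. ThreadStatic has no name conflicts.
ThreadStatic is implemented inside the CLR, whereas MContext is a dictionary attached to the System.Threading.Thread object.
Thread-static fields are faster to access than MContext.
MContext is called logical thread context data. Its data is copied to the other thread during an asynchronous call, whereas thread-static fields are not copied. (For example, on eventHandlerInst.BeginInvoke(...), the new thread has the MContext data of the original thread. When eventHandlerInst.EndInvoke executes, the MContext data on the new thread is restored to the thread that calls EndInvoke. This will be covered in detail later, when we get to Remoting.)
System.Web.HttpContext.Current is implemented with MContext.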

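Some example uses of [ThreadStatic]?

You may need to imitate HttpContext and build a MyClass.Current, or build a ThreadSingleton along the lines of the Singleton pattern. You may have worker threads. Until now you shared data by passing an object along on the thread, which makes all the objects and methods look awkward. Now you can store and read that data directly as [ThreadStatic] fields.
You may need a class like this:

Program code follows: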

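// Untested
using System;
using System.Data;
using System.Data.SqlClient;

public class ConnHolder : IDisposable
{
    // [ThreadStatic] only has an effect on static fields, so the field must be static.
    [ThreadStatic] static SqlConnection threadconn;

    // True if this holder created the thread's connection and therefore owns it.
    bool createbyme=false;

    public ConnHolder()
    {
        // Only the first (outermost) holder on a thread creates the connection.
        if(threadconn==null)
        {
            // Config.ConnectionString is the application's own setting (not defined here).
            threadconn=new SqlConnection(Config.ConnectionString);
            createbyme=true;
        }
    }

    public SqlConnection Connection
    {
        get
        {
            // Open lazily, on first use.
            SqlConnection conn=threadconn;
            if(conn.State!=ConnectionState.Open)
            {
                conn.Open();
            }
            return conn;
        }
    }

    public void Dispose()
    {
        // Nested holders just borrow the connection; only the creator disposes it.
        if(!createbyme)return;
        try
        {
            threadconn.Dispose();
        }
        finally
        {
            threadconn=null;
            createbyme=false;
        }
    }
}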

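Usage:
using(ConnHolder ch=new ConnHolder())
{
    // SqlCommand has no constructor that takes only a connection,
    // so create the command from the connection instead.
    using(SqlCommand cmd=ch.Connection.CreateCommand())
    {
        //...
    }
}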

(Note: an interesting approach.)

Finally, a prediction from Brian Grunkemeyer (indeed, we can only call it a prediction):

Brian Grunkemeyer
.NET Framework Base Class Library team

In V1 and V1.1, each logical thread corresponded with exactly one physical
thread in the OS. We also provide a Threadpool used for async IO
operations, async delegate invocations, and to process any requests that go through QueueUserWorkItem. The choice of GC doesn't limit the number of threads in an application to one processor or anything like that. It may affect your scalability and performance though.

Note that in version 2, in some hosted environments like SQL Server,
multiple logical threads may be multi-tasked on the same physical thread,
using fibers. We might possibly add multiple finalizer threads in a future
version as well. Rest assured that we don't limit all your threads to one
processor (unless you set the thread's affinity), and don't make many
assumptions in your code about exactly which OS thread is running your
code. You can safely use Thread's AllocateDataSlot and
AllocateNamedDataSlot if you need a managed equivalent to thread-local
storage.

If you have any questions about the GC implementation, please read chapter 19 in Jeffrey Richter's "Applied Microsoft .NET Framework Programming". Understanding when to use finalizers & the dispose pattern and being able to use a managed memory profiler on your code will almost certainly give you a better return for your time than trying to figure out whether to use the server or workstation GC.
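
(Note: the Thread.AllocateDataSlot route that Brian mentions above looks roughly like this. A minimal sketch only; DataSlotDemo and the stored string are hypothetical, while AllocateDataSlot, SetData and GetData are the static methods on System.Threading.Thread.)

```csharp
using System;
using System.Threading;

public static class DataSlotDemo
{
    // One slot object is shared by all threads; each thread keeps its own value in it.
    private static readonly LocalDataStoreSlot slot = Thread.AllocateDataSlot();

    // Returns { value seen by the calling thread, value seen by a new thread }.
    public static object[] Run()
    {
        object seenByWorker = "unset";
        Thread.SetData(slot, "caller data");

        Thread t = new Thread(() =>
        {
            // This thread never called SetData, so it reads null from the slot.
            seenByWorker = Thread.GetData(slot);
        });
        t.Start();
        t.Join();

        return new object[] { Thread.GetData(slot), seenByWorker };
    }

    public static void Main()
    {
        object[] r = Run();
        Console.WriteLine("caller: {0}, worker is null: {1}", r[0], r[1] == null);
    }
}
```

Compared with a [ThreadStatic] field, the slot is allocated at run time, which helps when the number of per-thread values is not known when the code is written.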

On this point CB also has a long and excellent discussion that is a must-read; it is too long to repost here:
Hosting
http://blogs.msdn.com/cbrumme/archive/2004/02/21/77595.aspx

It looks like knowing this much is enough for the time being. Brian Grunkemeyer's blog says the new Threading features Microsoft is about to develop are:

Semaphore class which was a missing functionality in the framework.
Named Events to enhance the cross-process communication.
Abandoned mutex detection.

User demand comes first, after all. For example, at
http://blogs.msdn.com/bclteam/archive/2004/01/19/60368.aspx
the blog says:

Gheorghe Marius asked this question:

Question: Are there any plans to add support for memory mapped files in Whidbey ? If no...why ?

The answer is no Gheorghe. The why comes down to priorities, there simply wasn't enough interest in the feature at the point we were deciding what makes it into Whidbey, and what doesn't. We have become aware recently of the demand for this item, and we will be exploring it for the near future after Whidbey.

This reminds me of last year's discussion among hbifts, flier and others, and I can't help smiling.


