Monday, August 30, 2004

 

Some notes on IOCP Thread Pooling in C# - 2

1.
Managed I/O Completion Ports (IOCP)
By P.Adityanand

A fully managed .NET implementation of Win32 IOCP's waitable event queuing mechanism.

http://www.codeproject.com/csharp/managediocp.asp

Re: Another article with similar topic... P.Adityanand 8:43 4 May '05

Hi,

The link you provided is to an article (he means the article I cited yesterday: http://www.devarticles.com/c/a/C-Sharp/IOCP-Thread-Pooling-in-C-sharp-Part-I/) that shows how to build thread pools using native Win32 IOCP. The _difference_ between my article and the one you mentioned is that I implemented the waitable queuing mechanism of Win32 IOCP itself, so that you can create thread pools in .Net without PInvoke.

Also, the ManagedIOCP class that I posted is _not_ a thread pool class. It allows you to create thread pools, like Win32 IOCP, but without PInvoke. You can do much more with my ManagedIOCP class. Read the complete article.

Hope this helps...

Thanks,
Aditya.P

Re: Threadpool P.Adityanand 14:38 5 May '05

Hi,

Firstly, it would be more comfortable addressing people with some identity. So, next time you post, please provide your name (at least a nickname, if not your real one) so that I can address you properly. This is only a request; if you want to remain anonymous, that is your wish.

Coming to your point about the .Net ThreadPool: this is a good and interesting question. I'll try my best to clarify it for the benefit of other readers too. If anyone feels that I'm not correct on any of the points discussed below, please feel free to point it out. It would be a good learning experience for me too.

The current implementation of the .Net ThreadPool class uses Win32 I/O Completion Ports, but only for certain types of operations. If you queue a work item that does not perform an async I/O operation, the .Net ThreadPool uses one of its worker threads to handle it. This would be fine if we could control the concurrency factor of the worker threads used by the ThreadPool. But unfortunately the .Net ThreadPool does not allow you to specify how many concurrent threads (either worker or IOCP) should be running in parallel.

So if you queue 30 work items, up to 25 worker threads will be operating on them, which degrades performance due to heavy context switching among those 25 threads. If they also share common data structures and objects, you add lock contention, lock convoys, and thread priority inversions, which are common causes of performance problems in multi-threaded applications. An IOCP-based approach like ManagedIOCP, with control over factors such as concurrency and queued object count, is a far better choice than the .Net ThreadPool in this scenario. With ManagedIOCP you can easily create your own thread pool with controlled concurrency.

Also, a thread pool (either the .Net ThreadPool or one custom-built using my ManagedIOCP) is good when you have multiple consumers, as in a socket server application where you have multiple client connections. But when you have multiple producers and a single consumer, a thread pool is not a good design choice. For instance, if you want to log messages to a file in a multi-threaded application, there may be several known threads logging messages.

But for the log messages to make sense, you would queue them up and use a single thread to write them in the order they arrive. This is a typical multiple-producer, single-consumer scenario. A thread pool will not help you here; you need a waitable queue mechanism. This can be achieved easily using IOCP's waitable event queue feature, and ManagedIOCP provides this feature in a fully managed .Net environment.
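
A minimal sketch of that multiple-producer, single-consumer waitable queue, written here with plain Monitor wait/pulse so it is self-contained (the class and member names are illustrative, not ManagedIOCP's actual API):

using System;
using System.Collections;
using System.IO;
using System.Threading;

class LogQueue
{
    Queue queue = new Queue();

    // Called by any number of producer threads.
    public void Enqueue(string message)
    {
        lock (queue)
        {
            queue.Enqueue(message);
            Monitor.Pulse(queue);   // wake the single consumer
        }
    }

    // Run by exactly one consumer thread; writes messages in arrival order.
    public void ConsumerLoop(TextWriter log)
    {
        while (true)
        {
            string message;
            lock (queue)
            {
                while (queue.Count == 0)
                    Monitor.Wait(queue);   // waitable queue: sleep until work arrives
                message = (string)queue.Dequeue();
            }
            log.WriteLine(message);   // single writer, so ordering is preserved
        }
    }
}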

Even in a multiple-producer, multiple-consumer scenario like a socket server, there would still be problems using the .Net ThreadPool, with its dedicated IOCP threads, to build scalable socket server applications. I believe so because a socket server is composed of two main tasks:
1. Read & write to client socket connections.
2. Process client messages.
The first can be achieved perfectly using the .Net ThreadPool, as it uses IOCP threads when dealing with async I/O like socket reads and writes.
But if you try to do the second, processing a completely read client message, inside a delegate fired by one of the IOCP threads, you will end up blocking critical IOCP threads in the ThreadPool for message processing. This slows down your ability to process client requests, which results in time-outs for client connections, increased wait times for client applications, etc. Then, if you try to post the message-processing request back to the .Net ThreadPool, you risk running an uncontrolled number of concurrent worker threads processing your messages. This will bring down the performance of your application.

In this scenario, you could use the .Net ThreadPool for async socket I/O and use my ManagedIOCP for message processing with a controlled number of concurrent threads, as sketched below.
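
A sketch of that split (my illustration, not code from the article): the .Net ThreadPool's IOCP thread runs the socket callback, but only long enough to enqueue the message into a separate, concurrency-controlled dispatcher. IDispatcher stands in for ManagedIOCP or any similar queue; its single Dispatch method is an assumed shape, not a specific library's API.

using System;
using System.Net.Sockets;

interface IDispatcher { void Dispatch(object item); }

class ConnectionHandler
{
    Socket client;
    byte[] buffer = new byte[4096];
    IDispatcher dispatcher;   // processes messages on its own throttled threads

    public ConnectionHandler(Socket client, IDispatcher dispatcher)
    {
        this.client = client;
        this.dispatcher = dispatcher;
        client.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None,
            new AsyncCallback(ReceiveCallback), null);
    }

    void ReceiveCallback(IAsyncResult ar)
    {
        int read = client.EndReceive(ar);   // runs on a .Net ThreadPool IOCP thread
        if (read <= 0) { client.Close(); return; }

        byte[] message = new byte[read];
        Array.Copy(buffer, 0, message, 0, read);

        // Cheap enqueue only; processing happens on the dispatcher's threads,
        // so this IOCP thread is immediately free for the next completion.
        dispatcher.Dispatch(message);

        client.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None,
            new AsyncCallback(ReceiveCallback), null);
    }
}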

So, finally, as you can see from the above discussion, you can achieve much more using my ManagedIOCP, whether as an independent library for your multi-threaded server-side applications or in conjunction with the .Net ThreadPool for async I/O-based server applications like socket servers.

My point is not to prove the superiority of my ManagedIOCP over the .Net ThreadPool. I wanted to share my knowledge so that people can benefit from it, and to show where to use the .Net ThreadPool, where to use ManagedIOCP, and where to use both.

Hope this clarifies your question, "Anonymous".

If anyone is interested, below are links to two articles that discuss the .Net ThreadPool and its relation to IOCP in depth.

1. http://www.dotnet247.com/247reference/msgs/26/132978.aspx
2. http://www.dotnet247.com/247reference/msgs/51/257134.aspx

Regards,
Aditya.P

Re: Use for Asynchronous Web Services Craig Neuwirt 12:56 1 Jul '05

Hello P.Adityanand,

Thanks for the speedy response. I think it would be a great fit, but I am still a little confused about how to apply your recommendation. Let me summarize in detail:

- Write an IHttpAsyncHandler to accept Http Requests in the BeginProcessRequest method
- Create a custom IAsyncResult object that encapsulates the request.
- Return the custom IAsyncResult from the BeginProcessRequest method
- Pass the custom IAsyncResult (containing the request) to the ManagedIOCP wrapper for dispatch to a worker thread
- A worker thread picks it up and makes the long-running Web Service call.
? Since this is a long-running call, won't all my worker threads end up waiting on I/O and starve all remaining requests?
- When the Web Service response is received, the custom IAsyncResult is triggered which will notify ASP.NET that the request is complete.

My main question is: what happens to the threads waiting on the long Web Service response? If I have 5 worker threads, won't they all be consumed quickly (even though they are doing almost no CPU work), leaving no remaining capacity for incoming requests? Must I do async sends to the second Web Service to prevent this? This is a general question concerning what can be done in the worker threads. It seems that this model won't work if they are I/O bound. Is this true, or am I confused?

Thanks again,
craig

Hi Craig,

Your Quote>
"My main question is what happens to the threads waiting on the long Web Service response? If I have 5 worker threads, won't they all be consumed quickly (even though they are doing almost no CPU) which leaves no remaining support for incoming requests."
/Your Quote>

I understand your concern on the above point. Let me explain how ManagedIOCP handles this situation. Say you have 50 threads waiting on a ManagedIOCP instance. No harm, because they are all in the waiting state. You specified 5 as the number of concurrent threads that this ManagedIOCP instance should wake up to process dispatched objects in parallel. Now you have 5 requests from ASP.Net to process. ManagedIOCP wakes up 5 of the 50 waiting threads to process the five requests. Now assume all five of these threads call into another Web Service and go into a suspended state waiting on its response. Then another 5 requests are dispatched to ManagedIOCP by your IHttpAsyncHandler. ManagedIOCP sees that the first 5 threads are suspended and that fewer threads than are allowed to handle dispatches concurrently are active (zero, in fact), so it wakes up 5 more threads to handle the new set of 5 requests.

This way, ManagedIOCP guarantees that _at least_ the desired number of active threads is processing the objects dispatched to it. This is perfectly fine, because the initial 5 threads are suspended and not running. It also will not inadvertently overshoot the number of threads, because you control how many threads wait on the ManagedIOCP to process objects; in our case it is 50. The way ManagedIOCP works, at least 5 threads will always be processing your objects. In some cases, say our 5 initial threads are suspended on a second level of Web Service calls and ManagedIOCP wakes up 5 more threads to process pending requests; the initial 5 threads may then return from the second Web Service call and start doing some processing while the second set of 5 threads is also processing. So for a while there would be 10 threads running in parallel, but only for a short time. After that, only 5 threads would be processing the requests.

Also note that any object dispatched to ManagedIOCP goes into its queue and will be serviced by the waiting threads at one point or another.
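
A crude sketch of the throttling idea described above (my approximation, not ManagedIOCP's implementation). Win32 IOCP learns from the kernel scheduler when an active thread blocks; a purely managed dispatcher has no such hook, so in this sketch workers announce long waits themselves via EnterWait/LeaveWait. All names are illustrative.

using System;
using System.Collections;
using System.Threading;

class ThrottledDispatcher
{
    Queue queue = new Queue();
    object gate = new object();
    int maxActive;   // desired concurrency, e.g. 5
    int active;      // threads currently running work

    public ThrottledDispatcher(int maxActive) { this.maxActive = maxActive; }

    public void Dispatch(object item)
    {
        lock (gate) { queue.Enqueue(item); Monitor.PulseAll(gate); }
    }

    // Each of the (say, 50) waiting threads runs this loop; at most
    // maxActive of them are running work at any moment.
    public void WorkerLoop(WaitCallback handler)
    {
        while (true)
        {
            object item;
            lock (gate)
            {
                while (queue.Count == 0 || active >= maxActive)
                    Monitor.Wait(gate);
                item = queue.Dequeue();
                active++;
            }
            try { handler(item); }
            finally { lock (gate) { active--; Monitor.PulseAll(gate); } }
        }
    }

    // A worker calls EnterWait just before a long blocking call (the web
    // service request) so another waiting thread may become active, and
    // LeaveWait when it resumes. LeaveWait can push the active count above
    // maxActive for a while, the temporary "10 threads running" situation
    // described above.
    public void EnterWait() { lock (gate) { active--; Monitor.PulseAll(gate); } }
    public void LeaveWait() { lock (gate) { active++; } }
}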

Your Quote>
"Must I do Async sends to the seconds Web Service to prevent this?"
/Your Quote>

You should not do that: calling a second Web Service asynchronously will draw a thread from the same process-wide thread pool used by ASP.Net requests, which would defeat the purpose of using ManagedIOCP or any other custom threading altogether.

Hope this clarifies your doubts. Let me know if not.

Thanks,
Aditya.P

2.
From http://www.cnblogs.com/asilas/archive/2006/01/05/311309.html

# re: Rough performance of several socket models on the .NET platform (for reference) 2006-01-05 15:05 Sumtec
If I remember correctly, a lot of the asynchronous I/O in .NET is backed by IOCP. In other words, using BeginXXX/EndXXX should actually be using IOCP. I don't remember exactly; I'll go check. Reply

# re: Rough performance of several socket models on the .NET platform (for reference) 2006-01-05 15:16 Sumtec
I just checked, and I'm now confident that the mechanism behind .NET is IOCP (my personal conclusion). Looking at the users of System.Threading.IOCompletionCallback with Reflector, you can see the following call chain:

System.Threading.IOCompletionCallback
  used by System.Net.Sockets.OverlappedCache..ctor(Overlapped, Object, IOCompletionCallback, Boolean)
  used by System.Net.Sockets.BaseOverlappedAsyncResult.SetUnmanagedStructures(Object) : Void
  used by System.Net.Sockets.OverlappedAsyncResult.SetUnmanagedStructures(Byte[], Int32, Int32, SocketAddress, Boolean) : Void
  used by System.Net.Sockets.OverlappedAsyncResult.SetUnmanagedStructures(Byte[], Int32, Int32, SocketAddress, Boolean, OverlappedCache&) : Void
  used by System.Net.Sockets.Socket.DoBeginReceive(Byte[], Int32, Int32, SocketFlags, OverlappedAsyncResult) : SocketError
          System.Net.Sockets.Socket.DoBeginReceiveFrom(Byte[], Int32, Int32, SocketFlags, EndPoint, SocketAddress, OverlappedAsyncResult) : Void
          System.Net.Sockets.Socket.DoBeginSend(Byte[], Int32, Int32, SocketFlags, OverlappedAsyncResult) : SocketError
          System.Net.Sockets.Socket.DoBeginSendTo(Byte[], Int32, Int32, SocketFlags, EndPoint, SocketAddress, OverlappedAsyncResult) : Void

If I've got something wrong, please don't throw bricks at me; it's the New Year season. Reply

# re: Rough performance of several socket models on the .NET platform (for reference) 2006-01-05 17:22 ccBoy
Inside I/O completion ports
http://www.sysinternals.com/Information/IoCompletionPorts.html

Actually, the counterpart of IOCP on the Linux platform is Linux AIO, which is fully supported only in Linux 2.5 and later, and it seems it still does not fully support sockets. In effect, the IOCP model amounts to using one system thread per client request together with asynchronous I/O. Under this model, thread-switching time is very small; there is almost no switching.

Reply

# re: Rough performance of several socket models on the .NET platform (for reference) 2006-01-05 17:28 Sumtec

@everyone:

No need to keep digging; my memory holds up after all. I just found an article, and Microsoft is authoritative enough: it really is IoCompletionPort underneath (unless Microsoft is lying), provided the operating system supports it:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/progthrepool.asp

Please note the following passage:

CompletionPortThreads
This kind of thread is used for I/O operations whenever possible. Windows NT, Windows 2000, and Windows XP offer an object specialized in asynchronous operations, called the IOCompletionPort. With the API associated with this object we can launch asynchronous I/O operations, managed by the system with a thread pool, in an efficient way and with few resources. However, Windows 95, Windows 98, and Windows Me have some limitations with asynchronous I/O operations. For example, IOCompletionPort functionality is not offered, and asynchronous operations on some devices, such as disks and mail slots, cannot be performed. Here you can see one of the greatest features of the .NET Framework: compile once and execute on multiple systems. Depending on the target platform, the .NET Framework will decide whether to use the IOCompletionPorts API, maximizing performance and minimizing resources.



If you run this program on Microsoft Windows NT, Windows 2000, or Windows XP, you will see the following output:

Connected to localhost:80
WorkerThreads: 24, CompletionPortThreads: 25
Request sent to localhost:80
WorkerThreads: 25, CompletionPortThreads: 24

As you can see, connecting with a socket uses a worker thread, while sending the data uses a CompletionPort thread. The following sequence is followed:

1. We get the local IP address and connect to it asynchronously.
2. Socket performs the asynchronous connection on a worker thread, since Windows IOCompletionPorts cannot be used to establish connections on sockets.
3. Once the connection is established, the Socket class calls the specified function, ConnectCallback. This callback shows the number of available threads on the pool; this way we can see that it is being executed on a worker thread.
4. An asynchronous request is sent from the same function, ConnectCallback. We use the BeginSend method for this, after encoding the "GET /" request in ASCII.
5. Send/receive operations on a socket can be performed asynchronously with an IOCompletionPort, so when our request is done, the callback function SendCallback is executed on a CompletionPort thread. We can check this because the function itself shows the number of available threads, and we can see that only those corresponding to CompletionPorts have decreased.

If we run the same code on a Windows 95, Windows 98, or Windows Me platform, the result will be the same for the connection, but the request will be sent on a worker thread instead of a CompletionPort thread. The important thing you should learn from this is that the Socket class always uses the best available mechanism, so you can develop your application without taking the target platform into account.


So, everyone, can we consider this settled?

Reply

# re: Rough performance of several socket models on the .NET platform (for reference) 2006-01-05 18:46 ccBoy
With this approach you cannot control the number of threads in the pool based on the CPU, nor can you control the isolation level of the concurrency.

The key point is that everyone now knows the principles and the trade-offs; don't get too attached to any one conclusion :) That said, the earlier four-way classification and the rough conclusion that performance requires xxx will have to be adjusted or rethought.

I still need to study this more and digest it carefully; there really is no end to learning. I had read the article Sumtec listed before, but had no recollection at all of the CompletionPort it mentions.

Ps: Sumtec, you have a really good memory, and the confidence to stand by your own view. Kudos.

# re: Rough performance of several socket models on the .NET platform (for reference) 2006-01-29 15:26 飞刀.Net
Over the past few days I have sat down and read through the material on completion ports.

I found that the model is quite similar to Java's nio; presumably the Windows implementation of nio uses completion ports.

After reading the material, I found that neither ThreadPool nor Select is missing from the completion port implementation; there doesn't seem to be much that is really new (then again, it appeared at the end of the last century), so the boundaries in the original poster's classification seem off.

The only real difference is that the user-mode threads are moved into kernel mode, which reduces possible thread switches and lets the operating system make better use of its CPU time slices.

As far as the model alone goes, it can also be implemented in user mode, just as Sonic.Net does, and as the thread pool in Java's concurrent package does.

Actually, the debate above doesn't seem very meaningful. Threads in .Net appear to be native threads (unlike Java, where threads are implemented by the JVM itself), to say nothing of its built-in ThreadPool; moreover, Windows threads can themselves be switched into kernel mode. In theory, an IOCP implemented in .Net really can obtain the same performance as Win32, and in practice it is probably not far off (the author of Sonic.Net says so himself).

The question I am thinking about now is: on a time-sharing system like Linux, how would one build something like completion ports, and how would one reduce thread (process) switching?
ccBoy mentioned aio; I don't know much about it, so I'll look it up in a while.

I wonder if, over the New Year holiday, anyone is still interested in discussing this with me? Reply

3.
From http://www.sysinternals.com/Information/IoCompletionPorts.html

Inside I/O Completion Ports

Copyright © 1998 Mark Russinovich
Last Updated: July 30, 1998
Introduction
Writing a high-performance server application requires implementing an efficient threading model. Having either too few or too many server threads to process client requests can lead to performance problems. For example, if a server creates a single thread to handle all requests, clients can become starved since the server will be tied up processing one request at a time. Of course, a single thread could simultaneously process multiple requests, switching from one to another as I/O operations are started, but this architecture introduces significant complexity and cannot take advantage of multiprocessor systems. At the other extreme, a server could create a big pool of threads so that virtually every client request is processed by a dedicated thread. This scenario usually leads to thread-thrashing, where lots of threads wake up, perform some CPU processing, block waiting for I/O, and then, after request processing is completed, block again waiting for a new request. If nothing else, context switches are caused by the scheduler having to divide processor time among multiple active threads.

The goal of a server is to incur as few context switches as possible by having its threads avoid unnecessary blocking, while at the same time maximizing parallelism by using multiple threads. The ideal is for there to be a thread actively servicing a client request on every processor and for those threads not to block if there are additional requests waiting when they complete a request. For this to work correctly however, there must be a way for the application to activate another thread when one processing a client request blocks on I/O (like when it reads from a file as part of the processing).

Windows NT 3.5 introduced a set of APIs that make this goal relatively easy to achieve. The APIs are centered on an object called a completion port. In this article I'm going to provide an overview of how completion ports are used and then go inside them to show you how Windows NT implements them.

Using I/O Completion Ports
Applications use completion ports as the focal point for the completion of I/O associated with multiple file handles. Once a file is associated with a completion port, any asynchronous I/O operations that complete on the file result in a completion packet being queued to the port. A thread can wait for any outstanding I/Os to complete on multiple files simply by waiting for a completion packet to be queued on the completion port. The Win32 API provides similar functionality with the WaitForMultipleObjects API, but the advantage that completion ports have is that concurrency, or the number of threads that an application has actively servicing client requests, is controlled with the aid of the system.

When an application creates a completion port it specifies a concurrency value. This value indicates the maximum number of threads associated with the port that should be running at any given point in time. As I stated earlier, the ideal is to have one thread active at any given point in time for every processor in the system. The concurrency value associated with a port is used by NT to control how many threads an application has active - if the number of active threads associated with a port equals the concurrency value then a thread that is waiting on the completion port will not be allowed to run. Instead, it is expected that one of the active threads will finish processing its current request and check to see if there's another packet waiting at the port - if there is then it simply grabs it and goes off to process it. When this happens there is no context switch, and the CPUs are utilized to near their full capacity.

Figure 1 below shows a high-level picture of completion port operation. Incoming client requests cause completion packets to be queued at the port. A number of threads, up to the concurrency limit for the port, are allowed by NT to process client requests. Any additional threads associated with the port are blocked until the number of active threads drops, as can happen when an active thread blocks on file I/O. I'll discuss this further a little later.


A completion port is created with a call to the Win32 API CreateIoCompletionPort:

HANDLE CreateIoCompletionPort(
    HANDLE FileHandle,
    HANDLE ExistingCompletionPort,
    DWORD CompletionKey,
    DWORD NumberOfConcurrentThreads
);

To create the port, an application passes in NULL for the ExistingCompletionPort parameter and indicates the concurrency value with the NumberOfConcurrentThreads parameter. If a FileHandle parameter is specified, then the file handle becomes associated with the port. When an I/O request that has been issued on the file handle completes, a completion packet is queued to the completion port. To retrieve a completion packet, and possibly block waiting for one to arrive, a thread calls the GetQueuedCompletionStatus API:

BOOL GetQueuedCompletionStatus(
    HANDLE CompletionPort,
    LPDWORD lpNumberOfBytesTransferred,
    LPDWORD CompletionKey,
    LPOVERLAPPED *lpOverlapped,
    DWORD dwMillisecondTimeout
);

Threads that block on a completion port become associated with the port and are woken in LIFO order, so that the thread that blocked most recently is the one that is given the next packet. Threads that block for long periods of time can have their stacks swapped out to disk, so if there are more threads associated with a port than there is work to process, the in-memory footprints of the threads blocked the longest are minimized.

A server application will usually receive client requests via network endpoints that are represented as file handles. Examples include Winsock2 sockets or named pipes. As the server creates its communications endpoints, it associates them with a completion port, and its threads wait for incoming requests by calling GetQueuedCompletionStatus on the port. When a thread is given a packet from the completion port, it goes off and starts processing the request, becoming an active thread. Many times a thread will block during its processing, such as when it needs to read or write data to a file on disk, or when it synchronizes with other threads. Windows NT is clever enough to detect this and recognize that the completion port has one less active thread. Therefore, when a thread becomes inactive because it blocks, a thread waiting on the completion port will be woken if there is a packet in the queue.

Microsoft's guidelines are to set the concurrency value roughly equal to the number of processors in a system. Note that it is possible for the number of active threads for a completion port to exceed the concurrency limit. Consider a case where the limit is specified as 1. A client request comes in and a thread is dispatched to process the request, becoming active. A second request comes in, but a second thread waiting on the port is not allowed to proceed because the concurrency limit has been reached. Then the first thread blocks waiting for file I/O, so it becomes inactive. The second thread is released, and while it is still active the first thread's file I/O completes, making it active again. At that point in time, and until one of the threads blocks, the concurrency value is 2, which is higher than the limit of 1. Most of the time the active count remains at or just above the concurrency limit.

The completion port API also makes it possible for a server application to queue privately defined completion packets to a completion port using PostQueuedCompletionStatus. Servers typically use this function to inform their threads of external events, such as the need to shut down gracefully.
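
For reference, a sketch (mine, not the article's) of driving these three calls from C# via PInvoke, the same approach the devarticles piece cited earlier builds on. With no file handle, the port acts as a pure waitable work queue; the declarations below are the standard kernel32 signatures.

using System;
using System.Runtime.InteropServices;
using System.Threading;

class Win32IocpQueue
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr CreateIoCompletionPort(IntPtr fileHandle,
        IntPtr existingPort, UIntPtr completionKey, uint concurrentThreads);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetQueuedCompletionStatus(IntPtr port, out uint bytes,
        out UIntPtr key, out IntPtr overlapped, uint timeoutMillis);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool PostQueuedCompletionStatus(IntPtr port, uint bytes,
        UIntPtr key, IntPtr overlapped);

    static IntPtr port;

    static void Main()
    {
        // INVALID_HANDLE_VALUE and no existing port: create a standalone
        // port with a concurrency value of 2.
        port = CreateIoCompletionPort(new IntPtr(-1), IntPtr.Zero,
            UIntPtr.Zero, 2);

        for (int i = 0; i < 4; i++)
            new Thread(new ThreadStart(Worker)).Start();

        // Privately defined packets, as with PostQueuedCompletionStatus above.
        for (uint k = 1; k <= 10; k++)
            PostQueuedCompletionStatus(port, 0, new UIntPtr(k), IntPtr.Zero);

        Console.ReadLine();
    }

    static void Worker()
    {
        uint bytes; UIntPtr key; IntPtr overlapped;
        // Threads block here (in LIFO order) until a packet is queued.
        while (GetQueuedCompletionStatus(port, out bytes, out key,
            out overlapped, 0xFFFFFFFF))   // 0xFFFFFFFF = INFINITE
        {
            Console.WriteLine("Packet {0} on thread {1}", key,
                Thread.CurrentThread.GetHashCode());
        }
    }
}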

Completion Port Internals
A call to the Win32 API CreateIoCompletionPort with a NULL completion port handle results in the execution of the native API function NtCreateIoCompletion, which invokes the corresponding kernel-mode system service of the same name. Internally, completion ports are based on an undocumented executive synchronization object called a Queue. Thus, the system service creates a completion port object and initializes a queue object in the port's allocated memory (a pointer to the port also points to the queue object since the queue is at the start of the port memory). A queue object has (coincidentally) a concurrency value that is specified when a thread initializes one, and in this case the value that is used is the one that was passed to CreateIoCompletionPort. KeInitializeQueue is the function that NtCreateIoCompletion calls to initialize a port's queue object.

When an application calls CreateIoCompletionPort to associate a file handle with a port the Win32 API invokes the native function NtSetInformationFile with the file handle as the primary parameter. The information class that is set is FileCompletionInformation and the completion port's handle and the CompletionKey parameter from CreateIoCompletionPort are the data values. NtSetInformationFile dereferences the file handle to obtain the file object and allocates a completion context data structure, which is defined in NTDDK.H as:

typedef struct _IO_COMPLETION_CONTEXT {
    PVOID Port;
    ULONG Key;
} IO_COMPLETION_CONTEXT, *PIO_COMPLETION_CONTEXT;

Finally, NtSetInformationFile sets the CompletionContext field in the file object to point at the context structure. When an I/O operation completes on a file object, the internal I/O manager function IopCompleteRequest executes and, if the I/O was asynchronous, checks to see if the CompletionContext field in the file object is non-NULL. If it is non-NULL, the I/O Manager allocates a completion packet and queues it to the completion port by calling KeInsertQueue with the port as the queue on which to insert the packet (remember that the completion port object and queue object are synonymous).

When GetQueuedCompletionStatus is invoked by a server thread, it calls the native API function NtRemoveIoCompletion, which transfers control to the NtRemoveIoCompletion system service. After validating parameters and translating the completion port handle to a pointer to the port, NtRemoveIoCompletion calls KeRemoveQueue.

As you can see, KeRemoveQueue and KeInsertQueue are the engine behind completion ports and are the functions that determine whether a thread waiting for an I/O completion packet should be activated or not. Internally, a queue object maintains a count of the current number of active threads and the maximum active threads. If the current number equals or exceeds the maximum when a thread calls KeRemoveQueue, the thread will be put (in LIFO order) onto a list of threads waiting for a turn to process a completion packet. The list of threads hangs off the queue object. A thread's control block data structure has a pointer in it that references the queue object of a queue that it is associated with; if the pointer is NULL then the thread is not associated with a queue.

So how does NT keep track of threads that become inactive because they block on something other than the completion port? The answer lies in the queue pointer in a thread's control block. The scheduler routines that are executed in response to a thread blocking (KeWaitForSingleObject, KeDelayExecutionThread, etc.) check the thread's queue pointer, and if it is not NULL they call KiActivateWaiterQueue, a queue-related function. KiActivateWaiterQueue decrements the count of active threads associated with the queue; if the result is less than the maximum and there is at least one completion packet in the queue, the thread at the front of the queue's thread list is woken and given the oldest packet. Conversely, whenever a thread that is associated with a queue wakes up after blocking, the scheduler executes the function KiUnwaitThread, which increments the queue's active count.

Finally, the PostQueuedCompletionStatus Win32 API calls upon the native function NtSetIoCompletion. As with the other native APIs in the completion port group, this one invokes a system service bearing the same name, which simply inserts the packet onto the completion port's queue using KeInsertQueue.

Not Exported
Windows NT's completion port API provides an easy-to-use and efficient way to maximize a server's performance by minimizing context switches while obtaining high degrees of parallelism. The API is made possible with support in the I/O Manager, Kernel, and system services. While the Queue object is exported for use by device drivers (it is undocumented, but its interfaces are relatively easy to figure out), the completion port APIs are not. However, if the queue interfaces are derived, it is possible to mimic the completion port interfaces by simply using the queue routines and manually associating file objects with queues by setting the CompletionContext entry.

4.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/progthrepool.asp

Programming the Thread Pool in the .NET Framework

David Carmona
Premier Support for Developers
Microsoft Spain

June 2002

Summary: Provides an in-depth look at the thread pool support in the Microsoft .NET Framework, shows why you need a pool and the implementation provided in .NET, and includes a complete reference for its use in your applications. (25 printed pages)

Contents
Introduction
Thread Pool in .NET
Executing Functions on the Pool
Using Timers
Execution Based on Synchronization Objects
Asynchronous I/O Operations
Monitoring the Pool
Deadlocks
About Security
Conclusion
For More Information

Introduction
If you have experience with multithreaded programming in any programming language, you are already familiar with the typical examples of it. Usually, multithreaded programming is associated with user interface-based applications that need to perform a time-consuming operation without affecting the end user. Take any reference book and open it to the chapter dedicated to threads: can you find a multithreaded example that can perform a mathematical calculation running in parallel with your user interface?

It is not my intention that you throw away your books—don't do that! Multithreaded programming is just perfect for user interface-based applications. In fact, the Microsoft® .NET Framework makes multithreaded programming available to any language using Windows Forms, allowing the developer to design very rich interfaces with a better experience for the end user. However, multithreaded programming is not only for user interfaces; there are times that we need more than one execution stream without having any user interface in our application.

Let's use a hardware store client/server application as an example. The clients are the cash registers and the server is an application running on a separate machine in the warehouse. If you think about it, the server application can have no user interface at all. However, what would the implementation be without multithreading?

The server would receive requests from the clients via a channel (HTTP, sockets, files, etc.); it would process them and send a response back to the clients. Figure 1 shows how it would work.



Figure 1. Server application with one thread

The server application has to implement some kind of queue so that no requests are omitted. Figure 1 shows three requests arriving at the same time, but only one can be processed. While the server executes the request, "Decrease stock of monkey wrench," the other two must wait for their turns to be processed in the queue. When the execution of the first request is finished, the second one is next, and so on. This method is commonly used in many existing applications, but it poorly utilizes system resources. Imagine that decreasing the stock requires a modification of a file on disk. While this file is being written, the CPU will not be used even if in the meantime there are requests waiting to be processed. A general indication of these systems is a long response time with low CPU usage, even in stress conditions.

Another strategy used in current systems is to create different threads for each request. When a new request arrives, the server creates a new thread dedicated to the incoming request and it is destroyed when the execution finishes. The following diagram shows this case:



Figure 2. Multithreaded server application

As Figure 2 illustrates, we won't have low CPU usage now—just the opposite. Creating and destroying threads is not optimal, even if it is not slow. If the operations performed by the thread are not complex, the extra time it takes to create and destroy threads can severely affect the final response time. Another point is the huge impact of these threads under stress conditions. Having all the requests executing at once on different threads would cause the CPU to reach 100%, and most of the time would be wasted in context switching, even more than in processing the requests themselves. Typical behaviors in this kind of system are an exponential increase in response time with the number of requests, and a high usage of CPU privileged time (this time can be viewed with the Task Manager and is affected by context switches between threads).

An optimum implementation is based on a hybrid of the previous two illustrations and introduces the concept of thread pool. When a request arrives, the application adds it to an incoming queue. A group of threads retrieves requests from this queue and processes them; as each thread is freed up, another request is executed from the queue. This schema is shown in the following figure:



Figure 3. Server application using a thread pool

In this example, we use a thread pool of two threads. When three requests arrive, they are immediately placed in the queue waiting to be processed; because both threads are free, the first two requests begin to execute. Once the process of any of these requests is finished, the free thread takes the third request and executes it. In this scenario, there is no need to create or destroy threads for each request; the threads are recycled between them. If the implementation of this thread pool is efficient, it will be able to add or remove threads from the pool for best performance. For example, when the pool is executing two requests and the CPU doesn't reach 50% utilization, it means that the executed requests are waiting for events or doing some kind of I/O operation. The pool can detect this situation and increase the number of threads so that more requests can be processed at the same time. In the opposite case when the CPU reaches 100% utilization, the pool decreases the number of threads to get more real CPU time without wasting it in context switches.

Thread Pool in .NET
Based on the above example, it is crucial to have an efficient implementation of the thread pool in enterprise applications. Microsoft realized this in the development of the .NET Framework; included in the heart of the system is an optimum thread pool ready for use.

This pool is not only available to applications that want to use it, but it is also integrated with most of the classes included in the Framework. In addition, there is important functionality of .NET built on the same pool. For example, .NET Remoting uses it to process requests on remote objects.

When a managed application is executed, the runtime offers a pool of threads that will be created the first time the code accesses it. This pool is associated with the physical process where the application is running, an important detail when you are using the functionality available in the .NET infrastructure to run several applications (called application domains) within the same process. If this is the case, one bad application can affect the rest within the same process because they all use the same pool.

You can use the thread pool, or retrieve information about it, through the ThreadPool class in the System.Threading namespace. If you take a look at this class, you will see that all the members are static and there is no public constructor. This makes sense, because there is only one pool per process and we cannot create a new one. The purpose of this limitation is to centralize all the asynchronous programming in the same pool, so that we do not have a third-party component creating a parallel pool that we cannot manage and whose threads degrade our performance.

Executing Functions on the Pool
The ThreadPool.QueueUserWorkItem method allows us to launch the execution of a function on the system thread pool. Its declaration is as follows:

public static bool QueueUserWorkItem (WaitCallback callBack, object state)

The first parameter specifies the function that we want to execute on the pool. Its signature must match the delegate WaitCallback:

public delegate void WaitCallback (object state);

The state parameter allows any information to be passed to the method and it is specified in the call to QueueUserWorkItem. Let's see the implementation of our application for the hardware store with the new concepts:

using System;
using System.Threading;

namespace ThreadPoolTest
{
    class MainApp
    {
        static void Main()
        {
            WaitCallback callBack;

            callBack = new WaitCallback(PooledFunc);
            ThreadPool.QueueUserWorkItem(callBack,
                "Is there any screw left?");
            ThreadPool.QueueUserWorkItem(callBack,
                "How much is a 40W bulb?");
            ThreadPool.QueueUserWorkItem(callBack,
                "Decrease stock of monkey wrench");
            Console.ReadLine();
        }

        static void PooledFunc(object state)
        {
            Console.WriteLine("Processing request '{0}'", (string)state);
            // Simulation of processing time
            Thread.Sleep(2000);
            Console.WriteLine("Request processed");
        }
    }
}

In this case, just to simplify the example, we have created a static method inside the main class to process the requests. Because of the flexibility of delegates, we can specify any instance method to process requests, provided that it has the same signature as the delegate, as sketched below. In the example, the method implements a delay of two seconds with a call to Thread.Sleep, simulating the processing time.
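
For instance, a variation of mine on the sample above, moving the processing to an instance method; any method with the WaitCallback signature can be queued:

class RequestProcessor
{
    // Instance method with the WaitCallback signature.
    public void Process(object state)
    {
        Console.WriteLine("Processing request '{0}'", (string)state);
    }
}

// Inside Main:
// RequestProcessor processor = new RequestProcessor();
// ThreadPool.QueueUserWorkItem(new WaitCallback(processor.Process),
//     "Is there any screw left?");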

If you compile and execute the last application you will see the following output:

Processing request 'Is there any screw left?'
Processing request 'How much is a 40W bulb?'
Processing request 'Decrease stock of monkey wrench'
Request processed
Request processed
Request processed

Notice that all the requests have been processed on different threads in parallel. We can see it in more detail by adding the following code to both functions:

// Main method
Console.WriteLine("Main thread. Is pool thread: {0}, Hash: {1}",
Thread.CurrentThread.IsThreadPoolThread,
Thread.CurrentThread.GetHashCode());

// Pool method
Console.WriteLine("Processing request '{0}'." +
" Is pool thread: {1}, Hash: {2}",
(string)state, Thread.CurrentThread.IsThreadPoolThread,
Thread.CurrentThread.GetHashCode());

We have added a call to Thread.IsThreadPoolThread. This property returns true if the target thread belongs to the thread pool. In addition, we show the result of the GetHashCode method of the current thread; this gives us a unique value with which to identify the executing thread. Take a look at the output now:

Main thread. Is pool thread: False, Hash: 2
Processing request 'Is there any screw left?'. Is pool thread: True, Hash: 4
Processing request 'How much is a 40W bulb?'. Is pool thread: True, Hash: 8
Processing request 'Decrease stock of monkey wrench '. Is pool thread: True, Hash: 9
Request processed
Request processed
Request processed

You can see how all of our requests execute on different threads belonging to the system thread pool. Launch the example again and notice the CPU utilization of your system. If you don't have any other application running in the background, it should be at almost 0%, because the only processing being done is suspending execution for two seconds.

Let's modify the application. This time we won't suspend the thread that is processing the request; instead we will keep the system very busy. To do this, we can build a loop that runs for two seconds inside each request using Environment.TickCount. This property returns the elapsed time in milliseconds since the system was last rebooted. Replace the "Thread.Sleep(2000)" line with the following code:

int ticks = Environment.TickCount;
while(Environment.TickCount - ticks < 2000);

Now examine the CPU utilization from the Task Manager, and you will see that the application utilizes 100% of CPU time. Take a look at the output of our application:

Processing request 'Is there any screw left?'. Is pool thread: True, Hash: 7
Processing request 'How much is a 40W bulb?'. Is pool thread: True, Hash: 8
Request processed
Processing request 'Decrease stock of monkey wrench '. Is pool thread: True, Hash: 7
Request processed
Request processed

Notice that the third request is not processed until the first one finishes, and that it reuses thread number 7 for its execution. The reason is that the thread pool detects that the CPU is at 100% and decides to wait until a thread is free rather than creating a new one. This way there are fewer context switches and overall performance is better.

Using Timers
If you have developed Microsoft Win32® applications, you know the SetTimer function in its API. With this function you can specify a window that receives WM_TIMER messages sent by the system at a given period. The first problem with this implementation is that you need a window to receive the notifications, so you cannot use it in console applications. In addition, message-based implementations are not accurate, and the situation can be even worse if your application is busy processing other messages.

An important improvement over Win32-based timers in .NET is the creation of a different thread that sleeps for a specified time and then notifies a callback function. With this, our timer does not need the Microsoft Windows® messaging system, so it is more accurate and can be used in console-based applications. The following code shows a possible implementation of this technique:

class MainApp
{
    static void Main()
    {
        MyTimer myTimer = new MyTimer(2000);
        Console.ReadLine();
    }
}

class MyTimer
{
    int m_period;

    public MyTimer(int period)
    {
        Thread thread;

        m_period = period;
        thread = new Thread(new ThreadStart(TimerThread));
        thread.Start();
    }

    void TimerThread()
    {
        Thread.Sleep(m_period);
        OnTimer();
    }

    void OnTimer()
    {
        Console.WriteLine("OnTimer");
    }
}

This code is commonly used in Win32 applications. Each timer creates a separate thread that waits for the specified time and then calls a callback function. As you can see, the cost of this implementation is very high; if our application uses several timers, the number of threads grows with them.

Now that .NET provides a thread pool, we could change the waiting function into a request to the pool. Even though this is perfectly valid and would improve performance, we would encounter two problems:

If the pool is full (all its threads are being used), the request waits in the queue and the timer is no longer accurate.
If several timers are created, the thread pool is busy waiting for them to expire.
To avoid these problems, the .NET Framework thread pool offers the possibility of time-dependent requests. With this functionality, we can have hundreds of timers without using any thread—it's the pool itself that will process the request once the timer expires.

This feature is available in two different classes:

System.Threading.Timer
A simple version of a timer; it allows the developer to specify a delegate for its periodic execution on the pool.

System.Timers.Timer
A component version of System.Threading.Timer; it can be inserted in a form and allows the developer to specify the executed function in terms of events.

It's important to understand the differences between the aforementioned two classes and another one named System.Windows.Forms.Timer. This class wraps the counters we had in Win32 based on Windows messages. Use this class only if you do not plan to develop a multithreaded application.

For the next example, we will use the System.Threading.Timer class, the simplest implementation of a timer. We only need the constructor, defined as follows:

public Timer(TimerCallback callback,
object state,
int dueTime,
int period);

With the first parameter (callback) we specify the function that we want to execute periodically; the second parameter (state) is a generic object passed to the function; the third parameter (dueTime) is the delay before the timer starts; and the last parameter (period) is the number of milliseconds between executions.

The example below creates two timers, timer1 and timer2:

class MainApp
{
    static void Main()
    {
        Timer timer1 = new Timer(new TimerCallback(OnTimer), 1, 0, 2000);
        Timer timer2 = new Timer(new TimerCallback(OnTimer), 2, 0, 3000);

        Console.ReadLine();
    }

    static void OnTimer(object obj)
    {
        Console.WriteLine("Timer: {0} Thread: {1} Is pool thread: {2}",
            (int)obj,
            Thread.CurrentThread.GetHashCode(),
            Thread.CurrentThread.IsThreadPoolThread);
    }
}

The output will be the following:

Timer: 1 Thread: 2 Is pool thread: True
Timer: 2 Thread: 2 Is pool thread: True
Timer: 1 Thread: 2 Is pool thread: True
Timer: 2 Thread: 2 Is pool thread: True
Timer: 1 Thread: 2 Is pool thread: True
Timer: 1 Thread: 2 Is pool thread: True
Timer: 2 Thread: 2 Is pool thread: True

As you can see, all the functions associated with both timers are executed on the same thread (ID = 2), minimizing the resources used by the application.
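
For comparison, a minimal sketch of mine using the component version, System.Timers.Timer; the callback is wired up as an event, and it also fires on pool threads:

using System;
using System.Timers;

class TimersTimerDemo
{
    static void Main()
    {
        Timer timer = new Timer(2000);   // period in milliseconds
        timer.Elapsed += new ElapsedEventHandler(OnElapsed);
        timer.Start();
        Console.ReadLine();
    }

    static void OnElapsed(object sender, ElapsedEventArgs e)
    {
        Console.WriteLine("Elapsed on thread {0}, pool: {1}",
            System.Threading.Thread.CurrentThread.GetHashCode(),
            System.Threading.Thread.CurrentThread.IsThreadPoolThread);
    }
}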

Execution Based on Synchronization Objects
In addition to timers, the .NET thread pool allows the execution of functions based upon synchronization objects. To share resources among threads within the multithreaded environment, we need to use .NET synchronization objects.

If we did not have a pool, threads would block waiting for the event to be signaled. As I mentioned before, this increases the total number of threads in the application and, as a result, requires additional system resources and CPU.

The thread pool allows us to queue requests that will only be executed when a specified synchronization object is signaled. Until this happens, the function does not consume any thread, so optimization is assured. The ThreadPool class offers the following method:

public static RegisteredWaitHandle RegisterWaitForSingleObject(
WaitHandle waitObject,
WaitOrTimerCallback callBack,
object state,
int millisecondsTimeOutInterval,
bool executeOnlyOnce);

The first parameter, waitObject, can be any object derived from WaitHandle:

Mutex
ManualResetEvent
AutoResetEvent
As you can see, only system synchronization objects can be used—that is, objects derived from WaitHandle—and you cannot use any other synchronization mechanism, like monitors or reader-writer locks.

The rest of the parameters allow us to specify a function that will be executed once the object is signaled (callBack); a state that will be passed to this function (state); the maximum time that the pool waits for the object (millisecondsTimeOutInterval); and a flag indicating whether the function has to be executed once or every time the object is signaled (executeOnlyOnce). The delegate declaration used for the callback function is the following:

delegate void WaitOrTimerCallback(
object state,
bool timedOut);

This function is called with the timedOut parameter set to true if the maximum wait time expires without the synchronization object being signaled.
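
For instance (a sketch of mine, not part of the article's sample), registering a wait with a one-second timeout on an event that is never signaled causes the callback to run with timedOut set to true:

ManualResetEvent never = new ManualResetEvent(false);
ThreadPool.RegisterWaitForSingleObject(never,
    new WaitOrTimerCallback(OnSignalOrTimeout),
    null, 1000, true);

// Elsewhere in the class:
static void OnSignalOrTimeout(object state, bool timedOut)
{
    Console.WriteLine("timedOut: {0}", timedOut);   // prints True here
}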

The following example uses a manual event and a Mutex to signal the execution of a function on the thread pool:

class MainApp
{
    static void Main(string[] args)
    {
        ManualResetEvent evt = new ManualResetEvent(false);
        Mutex mtx = new Mutex(true);

        ThreadPool.RegisterWaitForSingleObject(evt,
            new WaitOrTimerCallback(PoolFunc),
            null, Timeout.Infinite, true);
        ThreadPool.RegisterWaitForSingleObject(mtx,
            new WaitOrTimerCallback(PoolFunc),
            null, Timeout.Infinite, true);

        for (int i = 1; i <= 5; i++)
        {
            Console.Write("{0}...", i);
            Thread.Sleep(1000);
        }
        Console.WriteLine();
        evt.Set();
        mtx.ReleaseMutex();

        Console.ReadLine();
    }

    static void PoolFunc(object obj, bool timedOut)
    {
        Console.WriteLine("Synchronization object signaled, Thread: {0} Is pool: {1}",
            Thread.CurrentThread.GetHashCode(),
            Thread.CurrentThread.IsThreadPoolThread);
    }
}

The output will show again that both functions will be executed on the pool and on the same thread:

1...2...3...4...5...
Synchronization object signaled, Thread: 6 Is pool: True
Synchronization object signaled, Thread: 6 Is pool: True

Asynchronous I/O Operations
The most common scenario for a thread pool is I/O (input/output) operations. Most applications need to wait for disk reads, data sent to sockets, Internet connections, and so on. All of these operations have something in common: they do not require CPU time while they are performed.

The .NET Framework offers, in all of its I/O classes, the possibility of performing operations asynchronously. When the operation is finished, the specified function is executed on the thread pool. The performance gain from this feature can be significant, especially if we have several threads performing I/O operations, as in most server-based applications.

In this first example, we will write a file asynchronously to the hard drive. Take a look at the FileStream constructor used:

public FileStream(
    string path,
    FileMode mode,
    FileAccess access,
    FileShare share,
    int bufferSize,
    bool useAsync);

The last parameter is the interesting one. We should set useAsync to true for our file to perform asynchronous operations. If we do not, we can still call the asynchronous methods, but they will execute on the calling thread, blocking its execution.

The following example illustrates a file write with the FileStream BeginWrite method, specifying a callback function that will be executed on the thread pool once the operation finishes. Notice that we can access the IAsyncResult interface at any time; it can be used to learn the current status of the operation. We use its CompletedSynchronously property to indicate whether the operation is performed asynchronously, and the IsCompleted property to set a flag when the operation finishes. IAsyncResult offers many other interesting properties, such as AsyncWaitHandle, a synchronization object that will be signaled once the operation is performed.
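
For example (an addition of mine, not part of the sample below), a caller that wants to block until the write completes can wait on that handle:

// Block for up to 5 seconds waiting for the asynchronous write to finish.
bool finished = ar.AsyncWaitHandle.WaitOne(5000, false);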

class MainApp
{
    static void Main()
    {
        const string fileName = "temp.dat";
        FileStream fs;
        byte[] data = new Byte[10000];
        IAsyncResult ar;

        fs = new FileStream(fileName,
            FileMode.Create,
            FileAccess.Write,
            FileShare.None,
            1,
            true);
        ar = fs.BeginWrite(data, 0, 10000,
            new AsyncCallback(UserCallback), null);
        Console.WriteLine("Main thread:{0}",
            Thread.CurrentThread.GetHashCode());
        Console.WriteLine("Synchronous operation: {0}",
            ar.CompletedSynchronously);
        Console.ReadLine();
    }

    static void UserCallback(IAsyncResult ar)
    {
        Console.Write("Operation finished: {0} on thread ID:{1}, is pool: {2}",
            ar.IsCompleted,
            Thread.CurrentThread.GetHashCode(),
            Thread.CurrentThread.IsThreadPoolThread);
    }
}

The output will show us that the operation is performed asynchronously; once the operation is finished, the user function is executed on the thread pool.

Main thread:9
Synchronous operation: False
Operation finished: True on thread ID:10, is pool: True

In the case of sockets, using the thread pool is even more important, because socket I/O operations are usually slower than disk operations. The procedure is the same as before: the Socket class offers methods to perform any operation asynchronously:

BeginReceive
BeginSend
BeginConnect
BeginAccept
If your server application uses sockets to communicate with its clients, always use these methods. This way, instead of needing a thread for each connected client, all the operations are performed asynchronously on the thread pool, as in the sketch below.
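
As a sketch (mine, not the article's) of what that looks like, here is a minimal asynchronous echo-style server built only on BeginAccept/BeginReceive/BeginSend; the port number is arbitrary:

using System;
using System.Net;
using System.Net.Sockets;

class AsyncSocketServer
{
    static Socket listener;

    static void Main()
    {
        listener = new Socket(AddressFamily.InterNetwork,
            SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Any, 9000));
        listener.Listen(100);
        listener.BeginAccept(new AsyncCallback(AcceptCallback), null);
        Console.ReadLine();
    }

    static void AcceptCallback(IAsyncResult ar)
    {
        Socket client = listener.EndAccept(ar);
        // Keep accepting further clients; no dedicated thread per client.
        listener.BeginAccept(new AsyncCallback(AcceptCallback), null);

        byte[] buffer = new byte[4096];
        client.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None,
            new AsyncCallback(ReceiveCallback), new object[] { client, buffer });
    }

    static void ReceiveCallback(IAsyncResult ar)
    {
        object[] state = (object[])ar.AsyncState;
        Socket client = (Socket)state[0];
        byte[] buffer = (byte[])state[1];

        int read = client.EndReceive(ar);
        if (read > 0)
            client.BeginSend(buffer, 0, read, SocketFlags.None,
                new AsyncCallback(SendCallback), client);
        else
            client.Close();
    }

    static void SendCallback(IAsyncResult ar)
    {
        Socket client = (Socket)ar.AsyncState;
        client.EndSend(ar);
        client.Close();   // echo once, then close, to keep the sketch short
    }
}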

The following example uses another class that supports asynchronous operations, HttpWebRequest. With this class, we can establish a connection with a Web server on the Internet. The method used now is called BeginGetResponse, but in this case there is an important difference. In the last example, we wrote a file to disk and didn't need any result from the operation. Now, however, we need the response from the Web server when the operation is finished. In order to retrieve this information, all the .NET Framework classes that offer I/O operations provide a pair of methods for each operation. In the case of HttpWebRequest, the pair is BeginGetResponse and EndGetResponse.

With the End version, we can retrieve the result of the operation; in our case, EndGetResponse will return the response from the Web server. Even though EndGetResponse can be called at any time, in our example we call it from the callback function, when we know that the asynchronous request is done. If we call EndGetResponse before that, the call will block until the operation is finished.

In the following example, we send a request to the Microsoft Web site and show the size of the received response:

class MainApp
{
    static void Main()
    {
        HttpWebRequest request;
        IAsyncResult ar;

        request = (HttpWebRequest)WebRequest.CreateDefault(
            new Uri("http://www.microsoft.com"));
        ar = request.BeginGetResponse(new AsyncCallback(PoolFunc), request);
        Console.WriteLine("Synchronous: {0}", ar.CompletedSynchronously);
        Console.ReadLine();
    }

    static void PoolFunc(IAsyncResult ar)
    {
        HttpWebRequest request;
        HttpWebResponse response;

        Console.WriteLine("Response received on pool: {0}",
            Thread.CurrentThread.IsThreadPoolThread);
        request = (HttpWebRequest)ar.AsyncState;
        response = (HttpWebResponse)request.EndGetResponse(ar);
        Console.WriteLine(" Response size: {0}",
            response.ContentLength);
    }
}

At the beginning, the output will show the following message, indicating that the operation is being performed asynchronously:

Synchronous: False

After a while, when the response is received, the following output will be shown:

Response received on pool: True
Response size: 27331

As you can see, once the response is received, the callback function is executed on the thread pool.

Monitoring the Pool
The ThreadPool class offers two methods used to query the status of the pool. With the first one we can retrieve the number of free threads:

public static void GetAvailableThreads(
out int workerThreads,
out int completionPortThreads);

As you can see there are two different kinds of threads:

WorkerThreads
The worker threads are part of the standard system pool. They are standard threads managed by the .NET Framework, and most functions are executed on them; specifically, user requests (the QueueUserWorkItem method), functions based on synchronization objects (the RegisterWaitForSingleObject method), and timers (the Timer classes).

CompletionPortThreads
This kind of thread is used for I/O operations whenever possible. Windows NT, Windows 2000, and Windows XP offer an object specialized in asynchronous operations, called the IOCompletionPort. With the API associated with this object we can launch asynchronous I/O operations, managed by the system with a thread pool, in an efficient way and with few resources. However, Windows 95, Windows 98, and Windows Me have some limitations with asynchronous I/O operations. For example, IOCompletionPort functionality is not offered, and asynchronous operations on some devices, such as disks and mail slots, cannot be performed. Here you can see one of the greatest features of the .NET Framework: compile once and execute on multiple systems. Depending on the target platform, the .NET Framework will decide whether to use the IOCompletionPorts API, maximizing performance and minimizing resources.

This section includes an example using the socket classes. In this case, we are going to establish a connection asynchronously with the local Web server and send a GET request. With this, we can easily identify both kinds of threads.

using System;
using System.Threading;
using System.Net;
using System.Net.Sockets;
using System.Text;

namespace ThreadPoolTest
{
    class MainApp
    {
        static void Main()
        {
            Socket s;
            IPHostEntry hostEntry;
            IPAddress ipAddress;
            IPEndPoint ipEndPoint;

            hostEntry = Dns.Resolve(Dns.GetHostName());
            ipAddress = hostEntry.AddressList[0];
            ipEndPoint = new IPEndPoint(ipAddress, 80);
            s = new Socket(ipAddress.AddressFamily,
                SocketType.Stream, ProtocolType.Tcp);
            s.BeginConnect(ipEndPoint, new AsyncCallback(ConnectCallback), s);

            Console.ReadLine();
        }

        static void ConnectCallback(IAsyncResult ar)
        {
            byte[] data;
            Socket s = (Socket)ar.AsyncState;
            data = Encoding.ASCII.GetBytes("GET /\n");

            Console.WriteLine("Connected to localhost:80");
            ShowAvailableThreads();
            s.BeginSend(data, 0, data.Length, SocketFlags.None,
                new AsyncCallback(SendCallback), null);
        }

        static void SendCallback(IAsyncResult ar)
        {
            Console.WriteLine("Request sent to localhost:80");
            ShowAvailableThreads();
        }

        static void ShowAvailableThreads()
        {
            int workerThreads, completionPortThreads;

            ThreadPool.GetAvailableThreads(out workerThreads,
                out completionPortThreads);
            Console.WriteLine("WorkerThreads: {0}," +
                " CompletionPortThreads: {1}",
                workerThreads, completionPortThreads);
        }
    }
}

If you run this program on Microsoft Windows NT, Windows 2000, or Windows XP, you will see the following output:

Connected to localhost:80
WorkerThreads: 24, CompletionPortThreads: 25
Request sent to localhost:80
WorkerThreads: 25, CompletionPortThreads: 24

As you can see, connecting with a socket uses a worker thread, while sending the data uses a CompletionPort thread. The following sequence takes place:

1. We get the local IP address and connect to it asynchronously.
2. The Socket performs the asynchronous connection on a worker thread, since Windows IOCompletionPorts cannot be used to establish connections on sockets.
3. Once the connection is established, the Socket class calls the specified function, ConnectCallback. This callback shows the number of available threads on the pool; this way, we can see that it is being executed on a worker thread.
4. An asynchronous request is sent from the same ConnectCallback function, using the BeginSend method after encoding the "GET /" request in ASCII.
5. Send/receive operations on a socket can be performed asynchronously with an IOCompletionPort, so when our request completes, the callback function SendCallback is executed on a CompletionPort thread. We can check this because the function itself shows the number of available threads, and only those corresponding to CompletionPorts have decreased.
If we run the same code on Windows 95, Windows 98, or Windows Me, the connection behaves the same way, but the request is sent on a worker thread instead of a CompletionPort thread. The important lesson here is that the Socket class always uses the best available mechanism, so you can develop your application without taking the target platform into account.

You have seen in the example that the maximum number of available threads is 25 for each type (exactly 25 times the number of processors in your machine). We can retrieve this number using the GetMaxThreads method:

public static void GetMaxThreads(
out int workerThreads,
out int completionPortThreads);

Once this maximum value is reached, no new threads are created and requests are queued. If you look at all the methods declared in the ThreadPool class, you will notice that none of them allows us to change this maximum value. As we mentioned before, the thread pool is a unique shared resource per process; that is why an application domain cannot change its configuration. Imagine the consequences if a third-party component changed the maximum number of threads on the pool to 1: the entire application could stop working, and even other application domains in the same process would be affected. For the same reason, systems that host the common language runtime do have the ability to change the configuration. For example, Microsoft ASP.NET allows the administrator to change the maximum number of threads available on the pool.

Deadlocks
Before starting to use the thread pool in your applications you should know one additional concept: deadlocks. A bad implementation of asynchronous functions executed on the pool can make your entire application hang.

Imagine a method in your code that needs to connect via a socket with a Web server. A possible implementation is opening the connection asynchronously with the Socket class's BeginConnect method and waiting for the connection to be established with the EndConnect method. The code would be as follows:

class ConnectionSocket
{
    public void Connect()
    {
        IPHostEntry ipHostEntry = Dns.Resolve(Dns.GetHostName());
        IPEndPoint ipEndPoint = new IPEndPoint(ipHostEntry.AddressList[0],
            80);
        Socket s = new Socket(ipEndPoint.AddressFamily, SocketType.Stream,
            ProtocolType.Tcp);
        IAsyncResult ar = s.BeginConnect(ipEndPoint, null, null);
        s.EndConnect(ar);
    }
}

So far, so good—calling BeginConnect makes the asynchronous operation execute on the thread pool and EndConnect blocks waiting for the connection to be established.

What happens if we use this class from a function executed on the thread pool? Imagine that the size of the pool is just two threads and we launch two asynchronous functions that use our connection class. With both functions executing on the pool, there is no room for additional requests until they finish. The problem is that these functions call our class's Connect method. This method launches another asynchronous operation on the thread pool, but since the pool is full, the request is queued, waiting for a thread to become free. Unfortunately, this will never happen, because the threads in the pool are themselves waiting for the queued functions to finish. The conclusion: our application is blocked.

We can extrapolate this behavior to a pool of 25 threads. If 25 functions are waiting for an asynchronous operation to finish, the situation is the same and the deadlock occurs again.

In the following code fragment, we call the class above to reproduce the problem:

class MainApp
{
    static void Main()
    {
        for (int i = 0; i < 30; i++)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(PoolFunc));
        }
        Console.ReadLine();
    }

    static void PoolFunc(object state)
    {
        int workerThreads, completionPortThreads;
        ThreadPool.GetAvailableThreads(out workerThreads,
            out completionPortThreads);
        Console.WriteLine("WorkerThreads: {0}, CompletionPortThreads: {1}",
            workerThreads, completionPortThreads);

        Thread.Sleep(15000);
        ConnectionSocket connection = new ConnectionSocket();
        connection.Connect();
    }
}

If you run the example, you will see the available threads on the pool decrease until they reach 0 and the application stops working. We have a deadlock.

In general, a deadlock can appear whenever a pool thread waits for an asynchronous function to finish. If we change the code so that we use the synchronous version of Connect, the problem will disappear:

class ConnectionSocket
{
    public void Connect()
    {
        IPHostEntry ipHostEntry = Dns.Resolve(Dns.GetHostName());
        IPEndPoint ipEndPoint = new IPEndPoint(ipHostEntry.AddressList[0], 80);
        Socket s = new Socket(ipEndPoint.AddressFamily, SocketType.Stream,
            ProtocolType.Tcp);
        s.Connect(ipEndPoint);
    }
}

If you want to avoid deadlocks in your applications, never block a thread running on the pool while it waits for another function on the pool. This seems easy, but keep in mind that this rule implies two more:

Do not create any class whose synchronous methods wait for asynchronous functions, since this class could be called from a thread on the pool.
Do not use any class inside an asynchronous function if the class blocks waiting for asynchronous functions.
If you want to detect a deadlock in your application, check the number of available threads on the thread pool when your system hangs. A lack of available threads combined with CPU utilization near 0% is a clear symptom of a deadlock. You should examine your code to identify where a function executed on the pool is waiting for an asynchronous operation, and remove that wait.
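
Another way to satisfy these rules, besides the synchronous Connect shown above, is to avoid blocking altogether and finish the work in a callback. The following is a minimal sketch of that approach; AsyncConnectionSocket and its callback are illustrative names, not part of the original example:

class AsyncConnectionSocket
{
    public void Connect()
    {
        IPHostEntry ipHostEntry = Dns.Resolve(Dns.GetHostName());
        IPEndPoint ipEndPoint = new IPEndPoint(ipHostEntry.AddressList[0], 80);
        Socket s = new Socket(ipEndPoint.AddressFamily, SocketType.Stream,
            ProtocolType.Tcp);
        // No blocking wait here: the pool thread that called Connect
        // returns immediately, so it can never sit waiting on the pool.
        s.BeginConnect(ipEndPoint, new AsyncCallback(ConnectCallback), s);
    }

    private void ConnectCallback(IAsyncResult ar)
    {
        Socket s = (Socket)ar.AsyncState;
        // EndConnect does not block at this point, because the callback
        // only runs once the connection has been established.
        s.EndConnect(ar);
        // ...continue working with the connected socket here...
    }
}

The price of this design is that the work following the connection has to move into the callback, but no pool thread ever blocks, so the deadlock described above cannot occur.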

About Security
If you take a look at the ThreadPool class, you will see that there are two methods we haven't covered yet: UnsafeQueueUserWorkItem and UnsafeRegisterWaitForSingleObject. To fully understand the purpose of these methods, we first have to recall how code security works in the .NET Framework.

Security in Windows is focused on resources. The operating system allows setting permissions on files, users, registry keys, or any other system resource. This approach is perfect for applications trusted by the user, but it has limitations when the user does not trust the applications he runs, for example those downloaded from the Internet. In that case, once the user installs the application, it can perform any operation allowed by his permissions. For example, if the user is able to delete all the shared files of his company, any application downloaded from the Internet could do so, too.

.NET offers security applied to the application, not to the user. This means that, within the limits of the user's permissions, we can restrict resources for any execution unit (assembly). With the MMC snap-in, we can define groups of assemblies by several conditions and set different security policies for each group. A typical example is restricting disk access for applications downloaded from the Internet.

For this to work, the .NET Framework must maintain a calling stack between different assemblies. Imagine an application that has no permission to access the disk but calls a class library with full access to the system. When the second assembly performs a disk operation, the set of permissions associated with it allows the action, but the permissions applied to the calling assembly do not. The .NET Framework must check not only the current assembly's permissions, but also the permissions applied to the entire calling stack. This stack walk is highly optimized, but it adds overhead to calls between functions of different assemblies.

UnsafeQueueUserWorkItem and UnsafeRegisterWaitForSingleObject are equivalent to QueueUserWorkItem and RegisterWaitForSingleObject, but the unsafe versions do not maintain the calling stack for the asynchronous functions they execute. As a result, the unsafe versions are faster, but the callback functions execute with only the current assembly's security policies, losing all the permissions applied to the calling stack of assemblies.

My recommendation is to use these unsafe functions with extreme caution and only in situations where performance is critical and security is controlled. For example, you can use the unsafe versions if you are building an application that cannot be called from another assembly, or one whose policies allow only another well-known assembly to use it. You should never use these methods if you are developing a class library that can be used by any third-party application, because such applications could use your library to gain access to restricted system resources.

In the following example, you can see the risk of using UnsafeQueueUserWorkItem. We will build two separate assemblies: in the first one, we use the thread pool to create a file and export a class so that this operation can be performed from another assembly:

using System;
using System.Threading;
using System.IO;

namespace ThreadSecurityTest
{
    public class PoolCheck
    {
        public void CheckIt()
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(UserItem), null);
        }

        private void UserItem(object obj)
        {
            FileStream fs = new FileStream("test.dat", FileMode.Create);
            fs.Close();
            Console.WriteLine("File created");
        }
    }
}

The second assembly references the first one and uses the CheckIt method to create the file:

using System;

namespace ThreadSecurityTest
{
    class MainApp
    {
        static void Main()
        {
            PoolCheck pc = new PoolCheck();
            pc.CheckIt();
            Console.ReadLine();
        }
    }
}

Compile both assemblies and run the main application. By default, your system is configured to allow disk operations, so the application works perfectly and the file is generated:

File created

Now open the Microsoft .NET Framework Configuration tool. In this case, to simplify the example, we create only a code group associated with the main application. To do this, expand the Runtime Security Policy/Machine/Code Groups/All_Code node and add a new group called ThreadSecurityTest. In the wizard, select the Hash condition and import the hash of our application. Now set an Internet permission level and enforce it with the "This policy level will only have the permissions from the permission set associated with this code group" option.

Run the application again and see what happens:

Unhandled Exception: System.Security.SecurityException: Request for the
permission of type System.Security.Permissions.FileIOPermission,
mscorlib, Version=1.0.3300.0, Culture=neutral,
PublicKeyToken=b77a5c561934e089 failed.
...

Our policy has worked and the application cannot create the file. Notice that this is possible because the .NET Framework maintains the calling stack for us, even though the library that created the file had full access to the system.

Now change the code of the library to use the UnsafeQueueUserWorkItem method instead of QueueUserWorkItem.
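
Only the queuing call changes. Here is a minimal sketch of the modified method; the rest of the PoolCheck class stays exactly the same:

public void CheckIt()
{
    // The unsafe version does not capture the calling stack, so UserItem
    // runs with only this assembly's permissions, ignoring the caller's.
    ThreadPool.UnsafeQueueUserWorkItem(new WaitCallback(UserItem), null);
}

Compile the assembly again and run the main application. The output will now be: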

File created

Even though our application does not have enough permissions to access the disk, we have created a library that exposes this functionality to the whole system because it does not maintain the calling stack. Remember the golden rule: use unsafe functions only when your code cannot be called from other applications or when you restrict access to well-known assemblies.

Conclusion
In this article, we have seen why we need a thread pool to optimize resource and CPU usage in our server applications. We have studied how a thread pool has to be implemented, taking into account factors such as the percentage of CPU used, queued requests, and the number of processors in the system.

The .NET Framework offers a fully functional thread pool implementation, ready to be used by our applications and tightly integrated with the rest of the .NET Framework classes. This pool is highly optimized, minimizing privileged CPU time with minimal resources, and it always adapts to the target operating system.

Because of this integration with the Framework, most of its classes use the thread pool internally, offering developers a centralized place to manage and monitor the pool in their applications. Third-party components are encouraged to use the thread pool as well; this way, their clients can use all the functionality it provides, including the execution of user functions, timers, I/O operations, and synchronization objects.

If you are developing server applications, use the thread pool in your request-processing system whenever possible. If instead you are developing a library that can be used by a server application, always offer the possibility of asynchronous processing on the system thread pool.


