23 Aug 2010

Note: This is part of a series; you can find the other parts in the series index.

PLINQ is Parallel LINQ: the ability to run LINQ queries using the parallel extensions in .NET 4. The idea is that you take a simple LINQ query, append .AsParallel() to its source, and it is magically parallel, as in my insane solution to Fizz Buzz below:

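var result = from i in Enumerable.Range(0, 1000).AsParallel()
             where (i % 3 == 0 || i % 5 == 0)
             // Multiples of both 3 and 5 are "Fizz Buzz", multiples of 3 are "Fizz", multiples of 5 are "Buzz".
             select new { value = i, answer = i % 3 == 0 ? (i % 5 == 0 ? "Fizz Buzz" : "Fizz") : "Buzz" };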

foreach (var item in result)
{
    Console.WriteLine("{0} gets a {1}", item.value, item.answer);
}

If you have been to one of my What’s New in .NET 4 talks, you will even have seen me demo it this way, and for that I am VERY VERY SORRY, because I was wrong wrong wrong.

In Pull I made exactly this mistake when trying to get the updating of podcasts to run in parallel, and it wasn’t until I implemented a status view that I noticed the updates weren’t actually running in parallel (it took two weeks and 46 check-ins before I realised this).

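The problem is that appending .AsParallel() only parallelises the query itself (the where and the select). The foreach loop that consumes the results still runs its body sequentially on the calling thread, so any real work done inside the loop is not parallel at all. To process the results in parallel you must use the .ForAll extension method, as in the example below (note that the difference is in the processing, not in the query; the change is the call to result.ForAll):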

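var result = from i in Enumerable.Range(0, 30).AsParallel()
             where (i % 3 == 0 || i % 5 == 0)
             // Multiples of both 3 and 5 are "Fizz Buzz", multiples of 3 are "Fizz", multiples of 5 are "Buzz".
             select new { value = i, answer = i % 3 == 0 ? (i % 5 == 0 ? "Fizz Buzz" : "Fizz") : "Buzz" };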

result.ForAll(item =>
{
    Console.WriteLine("{0} gets a {1}", item.value, item.answer);
});
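One way to see the difference for yourself is to record which threads actually run the loop body. The sketch below is not code from Pull; the class and method names (ConsumerThreads, CountForeachThreads, CountForAllThreads) are made up for illustration. It runs the same PLINQ query twice, once consumed with foreach and once with ForAll, and counts the distinct managed thread IDs seen in the body.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

// Hypothetical helper class, for illustration only.
public static class ConsumerThreads
{
    private static ParallelQuery<int> BuildQuery()
    {
        return Enumerable.Range(0, 1000).AsParallel()
                         .Where(i => i % 3 == 0 || i % 5 == 0);
    }

    // Consumes the query with foreach and counts the threads that ran the body.
    public static int CountForeachThreads()
    {
        var threadIds = new ConcurrentDictionary<int, bool>();
        foreach (var i in BuildQuery())
        {
            // The body always runs on the thread that is enumerating the results.
            threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
        }
        return threadIds.Count;
    }

    // Consumes the query with ForAll and counts the threads that ran the body.
    public static int CountForAllThreads()
    {
        var threadIds = new ConcurrentDictionary<int, bool>();
        // The body may run on several worker threads at once,
        // so a thread-safe collection is used to record the IDs.
        BuildQuery().ForAll(i => threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true));
        return threadIds.Count;
    }

    public static void Main()
    {
        Console.WriteLine("foreach body ran on {0} thread(s)", CountForeachThreads());
        Console.WriteLine("ForAll body ran on {0} thread(s)", CountForAllThreads());
    }
}
```

The foreach count is always one. The ForAll count depends on the machine and on what PLINQ decides to do, so on a multi-core box it is usually more than one, but that is not guaranteed.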

Now Pull does all its updating in parallel and I am happy to move on, right? WRONG again. In my research I found a white paper written by Pamela Vagata from the Parallel Computing Platform Group at Microsoft which covers when to use PLINQ and when to use Parallel.ForEach. This paper is fantastic and highlights that the two are not equivalent and that you should use the right tool for the job. Here is my quick reference table based on that white paper (the smiley indicates what you should use):

| Action                                                  | PLINQ | Parallel.ForEach |
|---------------------------------------------------------|-------|------------------|
| Simple Data-Parallel Operation with Independent Actions |       | :)               |
| Ordered Data-Parallel Operation                         | :)    |                  |
| Streaming Data-Parallel Operation                       | :)    |                  |
| Operating over Two Collections                          | :)    |                  |
| Thread-Local State                                      |       | :)               |
| Exiting from Operations                                 |       | :)               |
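To make a few of those rows a little more concrete, here is a rough sketch of four of them (the streaming row is left out). This is my own illustration, not code from the white paper, and the class and method names (TableRows, OrderedSquares, PairwiseSums, SumWithThreadLocalState, ContainsNegative) are hypothetical. The API calls themselves (AsOrdered, Zip, the Parallel.ForEach overload with localInit and localFinally, and ParallelLoopState.Stop) are standard .NET 4.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical examples, one per table row.
public static class TableRows
{
    // Ordered data-parallel operation (PLINQ):
    // AsOrdered keeps the results in source order.
    public static int[] OrderedSquares(int[] source)
    {
        return source.AsParallel().AsOrdered().Select(x => x * x).ToArray();
    }

    // Operating over two collections (PLINQ):
    // Zip pairs elements by position, so both sides are marked as ordered.
    public static int[] PairwiseSums(int[] left, int[] right)
    {
        return left.AsParallel().AsOrdered()
                   .Zip(right.AsParallel().AsOrdered(), (l, r) => l + r)
                   .ToArray();
    }

    // Thread-local state (Parallel.ForEach):
    // each worker keeps a private subtotal and adds it to the total once at the end.
    public static long SumWithThreadLocalState(int[] source)
    {
        long total = 0;
        Parallel.ForEach(
            source,
            () => 0L,                                         // initial value of each worker's subtotal
            (item, loopState, subtotal) => subtotal + item,   // loop body, no locking needed
            subtotal => Interlocked.Add(ref total, subtotal)  // runs once per worker when it finishes
        );
        return total;
    }

    // Exiting from operations (Parallel.ForEach):
    // Stop asks the loop to end as soon as possible.
    public static bool ContainsNegative(int[] source)
    {
        ParallelLoopResult result = Parallel.ForEach(source, (item, loopState) =>
        {
            if (item < 0)
            {
                loopState.Stop();
            }
        });
        // IsCompleted is false when the loop was asked to end early.
        return !result.IsCompleted;
    }
}
```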

If you do not know what those actions mean then you must grab the white paper. As you can see, in my case I should never have used PLINQ, because I am doing a Simple Data-Parallel Operation with Independent Actions. Why is PLINQ the wrong choice here? The white paper explains:
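For comparison, this is roughly what the Fizz Buzz loop looks like when written with Parallel.ForEach instead of PLINQ. It is only a sketch: the ParallelFizzBuzz class and its Describe helper are names I made up for this example, and it uses the conventional mapping (Fizz for multiples of 3, Buzz for multiples of 5).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical example of a simple data-parallel operation with independent actions.
public static class ParallelFizzBuzz
{
    // Returns the Fizz Buzz answer for a single number.
    public static string Describe(int i)
    {
        if (i % 15 == 0) return "Fizz Buzz";
        if (i % 3 == 0) return "Fizz";
        if (i % 5 == 0) return "Buzz";
        return i.ToString();
    }

    public static void Main()
    {
        // An ordinary (sequential) LINQ query picks the numbers to process.
        IEnumerable<int> numbers = Enumerable.Range(0, 30)
                                             .Where(i => i % 3 == 0 || i % 5 == 0);

        // Each iteration is independent, so the body can run on any worker thread.
        // The order of the output lines is therefore not guaranteed.
        Parallel.ForEach(numbers, i =>
        {
            Console.WriteLine("{0} gets a {1}", i, Describe(i));
        });
    }
}
```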

While PLINQ has the ForAll operator, it may be easier to think in terms of parallel loops rather than parallel queries for this type of scenario. Furthermore, PLINQ may be too heavyweight for a simple independent action. With Parallel.ForEach, you can specify ParallelOptions.MaxDegreeOfParallelism, which specifies that at most N threads are required. Thus, if the ThreadPool’s resources are scarce, even if the number of available threads is less than N, those threads will begin assisting the execution of Parallel.ForEach. As more threads become available, those resources will then be used for execution of the loop’s body delegate as well. However, PLINQ requires exactly N threads, which is specified by using the WithDegreeOfParallelism() extension method. In other words, for PLINQ N represents the number of threads which are actively involved in the PLINQ query.
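The two settings mentioned in that quote look like this in code. Again this is just a sketch with made-up names (DegreeOfParallelism, SumWithParallelForEach, SumWithPlinq); the thread-count behaviour described in the comments is what the white paper says about .NET 4, not something this example measures.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example showing where each setting goes.
public static class DegreeOfParallelism
{
    // Parallel.ForEach: MaxDegreeOfParallelism is an upper bound.
    // The loop can start with fewer threads and pick up more as they become available.
    public static long SumWithParallelForEach(int[] source, int maxThreads)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        long total = 0;
        Parallel.ForEach(source, options, item => Interlocked.Add(ref total, item));
        return total;
    }

    // PLINQ: WithDegreeOfParallelism sets the degree for the whole query.
    // According to the white paper, in .NET 4 this is the number of threads actively involved.
    public static long SumWithPlinq(int[] source, int threads)
    {
        return source.AsParallel()
                     .WithDegreeOfParallelism(threads)
                     .Sum(item => (long)item);
    }

    public static void Main()
    {
        int[] numbers = Enumerable.Range(1, 100).ToArray();
        Console.WriteLine(SumWithParallelForEach(numbers, 4));
        Console.WriteLine(SumWithPlinq(numbers, 4));
    }
}
```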

Final Thoughts

.NET 4 has made parallel programming very easy; in fact, it is too easy to do the wrong thing and still have it work. Spending time researching the right method is vital in software development. Don’t just assume.

Comments

The link to the whitepaper by Pamela Vagata is dead. Happen to have a copy?
