tsJensen

A quest for software excellence...

Making Distributed Computing Relevant and Accessible

First, let us assume that distributed computing is generally that area of developing and running software designed to process large numbers of long running tasks on servers that are optimally proximal to the data being processed.

Second, let us agree, if for this discussion only, that distributed computing is NOT your collection of services on back end servers that support your service oriented architecture (SOA) for your web and mobile apps.

Third, let us presume that you are NOT already blessed with a job where you write distributed computing software.

How then can distributed computing be relevant to you? And how can you take advantage of distributed computing without becoming an expert in one of the several well known distributed computing platforms on the market today?

Both are excellent questions. Thank you for asking. Let’s try a practical approach.

Imagine you are at your desk and your boss comes to you and ask how fast your web servers respond to the customer. Of course, your first instinct is to write this program to find out:

private static void DoTenUrlsInParallel()
{
   Console.WriteLine("Do 10 urls in parallel");
   var sw = Stopwatch.StartNew();
   ISpeedTest test = new SpeedTest();
   Parallel.ForEach(TestUrls, (url) =>
   {
      var result = test.GetSpeed(url);
      Console.WriteLine("r:{0}, s:{1}, b:{2}, u:{3}",
         result.ResponseTimeMs, result.ReadStreamTimeMs, 
         result.ResponseLength, result.Url);
   });
   sw.Stop();
   Console.WriteLine("Total elapsed time: {0}", 
      sw.ElapsedMilliseconds);
   Console.WriteLine(string.Empty);
}

You take him the results and he says, “But isn’t this from your desk? I want to know what these numbers look like from all around the world. East and west U.S. North and west Europe. And east and south east Asia. And I want a regular stream of these numbers fed into a spreadsheet for me every day.”

Do you say, “No problem.” You do if you have a Windows Azure account and you know about the distributed task parallel library from DuoVia called DuoVia.Net.Distributed. You go back to your desk and modify the code to look like this:

private static void DoTenUrlsThreeTimesEachAroundTheWorldInParallel(bool runLocal = false)
{
   var serverEndpoints = new IPEndPoint[0];
   if (runLocal)
   {
      serverEndpoints = new IPEndPoint[] { new IPEndPoint(IPAddress.Parse("127.0.0.1"), 9096) };
   }
   else
   {
      //these server names are temporary - to run this test use your own
      var servers = new string[]
      {
         "myaz-westus.cloudapp.net",
         "myaz-eastus.cloudapp.net",
         "myaz-northeu.cloudapp.net",
         "myaz-westeu.cloudapp.net",
         "myaz-soeastasia.cloudapp.net",
         "myaz-eastasia.cloudapp.net"
      };

      serverEndpoints = new IPEndPoint[servers.Length];
      for (int i = 0; i < servers.Length; i++)
      {
         var host = Dns.GetHostAddresses(servers[i]);
         var ip = (from n in host 
                   where n.AddressFamily == AddressFamily.InterNetwork 
                   select n).First();
         serverEndpoints[i] = new IPEndPoint(ip, 9096);
      }
   }

   float subscriptionRate = 2.0f; //oversubscribed 
   int logPollingIntervalSeconds = 2;
   using (DistributedClient<ISpeedTest> client = 
          Distributor.Connect<ISpeedTest>(typeof(SpeedTest),
          subscriptionRate,
          logPollingIntervalSeconds,
          LogLevel.Debug,
          serverEndpoints))
   {
      for (int i = 0; i < 3; i++)
      {
         var sw = Stopwatch.StartNew();
         Console.WriteLine(@"round:{0}", i + 1);
         var loopResult = client.ForEach(TestUrls, (url, proxy) => proxy.GetSpeed(url));
         foreach (var result in loopResult.Results)
         {
            Console.WriteLine(@"r:{0}, s:{1}, b:{2}, on: {3}, u:{4}",
               result.ResponseTimeMs, result.ReadStreamTimeMs, 
			   result.ResponseLength, result.MachineName, result.Url);
         }
         sw.Stop();
         Console.WriteLine("Total elapsed time: {0}", sw.ElapsedMilliseconds);
         Console.WriteLine(string.Empty);
      }
   }
}

And you and your boss are happy.

Sometimes distributed computing is more about location and proximity to data or infrastructure than it is to getting massive amounts of data processed in as little time as possible.

You can find the full demo source code here.

Diversions in the D Programming Language

I am not a systems programmer, meaning I do not write operating system device drivers or file systems or operating system modules, etc, all written in a language that will compile down to raw machine code. I write in C# primarily which is arguably an applications programming language, running in the much loved .NET Common Language Runtime.

The vast majority of systems programming is done in C and C++. And for some reason, C++ has always been a daunting mess of libraries, odd syntax and pointer and memory allocation madness to me. Even setting up an environment to get the right build libraries, the right compiler and linker, etc., have always led me to fits of impatience. And for that reason and many others, I have stuck to C# and applications development.

But every once in a while, I look in on systems programming to see if anyone has really solved the problems I love to hate with respect to C and C++. And for a few years I’ve read a little about the D programming language here and there. A week ago, over the weekend, I decided to give it a try and really see what I could learn.

I have to say, I have been impressed. The D programming language offers a few things that I would dearly love to see in C#.

1. Exception Safety – the scope keyword

void abc() 
{ 
  auto resource = getresource();  // acquire some resource 
  scope(exit) resource.close();   // close the resource 
  doSomeProcessing();             // do processing
}

As C# programmers, we’re used to the try..catch..finally blocks. And we clean up a resource in the finally block. The trouble with that is many lines of code can end up separating your resource acquisition code from your resource cleanup code. Yes, with vigilance and well written tests, this is okay. But wouldn’t it be cool to be able to tell the compiler, “Hey, when I’m done with this thing I just now created, clean it up for me, no matter what code comes after this in this method.” I would love to see the scope keyword added to C#.

2. Concurrency approach

int perThread;
shared int PerProcess;

In C#, when you declare a class level variable, it is automatically shared between threads. You can use the [ThreadStatic] attribute to get a per thread instance of a given value or object. But it then has to be static. With the D programming language, you get thread safety in class variables. To override that safety, you have to explicitly tell the compiler you want the value shared. While I’m not advocating a change to C# in this regard, I would love to have a way to assure that a variable cannot be modified across thread boundaries.

3. Message based threads

import std.concurrency, std.stdio;
void main() {
   auto low = 0, high = 100;
   auto tid = spawn(&writer);
   foreach (i; low .. high) {
      writeln("Main thread: ", i);
      tid.send(thisTid, i);
      enforce(receiveOnly!Tid() == tid);
   }
}

void writer() {
   for (;;) {
      auto msg = receiveOnly!(Tid, int)();
      writeln("Secondary thread: ", msg[1]);
      msg[0].send(thisTid);
   }
}

For me, this is perhaps the coolest part of the D programming language’s base class library which they call Phobos. Note that main spawns a thread calling writer. The loop in main then sends a message to writer and the loop in writer receives the messages and operates on them and then sends a message back to the original thread.

You can learn a lot about D on www.dlang.org and read more about D concurrency on Informit. And if you want to play with D in Visual Studio, hop on over to see VisualD on dsource.org.