tsJensen

A quest for software excellence...

Very Large Pseudo Random Number Generator in C#

Thanks to a friend of mine, over the past few weeks the idea of zero knowledge authentication, which has been around for a long time but to which I’d never paid any attention, has been percolating in my head. Finally on Thursday (Thanksgiving holiday here in the U.S.), I dove in and added it to ServiceWire.

Now you can secure your ServiceWire RPC calls without PKI or exchanging keys or exposing a password across the wire. It validates on the server that the client knows the secret key (password) and on the client that the server knows the secret key. It creates a shared encryption key that is used for that one connection session and is never passed across the wire, allowing all subsequent communications between client and server to be encrypted with strong and fast symmetric encryption, denying any man in the middle from access to the data or even the service definition.

But this post is not about the entire zero knowledge addition to ServiceWire. Instead I want to share one part of that addition which you may use to create a very large random number. In this case, 512 bytes (that’s 4096 bits for those of you who can only count to 1). If you want a larger random number, you’ll need a larger safe prime from a Sophie Germain prime. In ServiceWire, I only include the 512 byte safe prime. There are a number of sources for larger safe primes freely available on the web.

This is not an algorithm entirely of my invention. It is based on several different algorithms I studied while investigating zero knowledge secure remote password protocols. I did take some liberties by adding a randomly selected smaller safe prime as the exponent in the algorithm.

And here’s the code:

using System.Numerics;

public class ZkProtocol
{
   private readonly SHA256 _sha;
   private readonly Random _random;
   private readonly BigInteger _n;

   public ZkProtocol()
   {
      _sha = SHA256.Create(); //used in other protocol methods
      _random = new Random(DateTime.Now.Millisecond);
      _n = new BigInteger(ZkSafePrimes.N4);
   }
   
   //...other protocol methods excluded for post

   /// <summary>
   /// Generate crypto safe, pseudo random number.
   /// </summary>
   /// <param name="bits">max value supported is 4096</param>
   /// <returns></returns>
   public byte[] CryptRand(int bits = 4096)
   {
      var rb = new byte[256];
      _random.NextBytes(rb);
      var bigrand = new BigInteger(rb);
      var crand = BigInteger.ModPow(bigrand, 
         ZkSafePrimes.GetSafePrime(_random.Next(0, 2047)), _n);
      var bytes = crand.ToByteArray();
      if (bits >= 4096) return bytes;
      var count = bits / 8;
      var skip = _random.Next(0, bytes.Length - count);
      return bytes.Skip(skip).Take(count).ToArray();
   }
}

internal static class ZkSafePrimes
{
   internal static byte[] N4
   {
      get { return Safe4096BitPrime; }
   }

   private static byte[] Safe4096BitPrime = new byte[]
   {
      //data not shown here for post (see on GitHub)
   };
   
   public static int GetSafePrime(int index)
   {
      if (index < 1) index = 1;
      else if (index > 2047) index = 2047;
      else if (index % 2 == 0) index++;
      return _safePrimes[index];
   }

   //2048 values, every second value is a safe primes 
   //the previous value is the Sophie Germain prime
   //as in q is the Sophie Germain and N = 2q+1 is safe
   private static int[] _safePrimes = new[]
   {
      5903,11807,8741,17483 //first four value only - see GitHub
   };
}

See the code on GitHub. Look specifically at the ZkProtocol and ZkSafePrimes classes.

I should note that in ServiceWire a new instance of ZkProtocol is used for each connection, so the Random object should provide a healthy random seed to the algoritm given the clock's milliseconds seed on construction. If you find this useful, I would love to hear from you.

PriorityQueue<T> in C# Using Heap

Today I had occasion to use in a work project the PriorityQueue<T> and Heap code that I had written and blogged about recently. In doing so, I discovered a couple of bugs and fixed them and added tests to cover the issue that was uncovered.

Here’s what changed in the PriorityQueue<T>. You can follow the link above to see the change to Heap.

// before
public void Enqueue(T item, T minItem, T maxItem)
{
   if (_order == PriorityOrder.Max)
      Heap.MaxInsert(_data, item, maxItem, _heapSize++);
   else
      Heap.MinInsert(_data, item, minItem, _heapSize++);
}

// after
public void Enqueue(T item, T minItem, T maxItem)
{
   if (_order == PriorityOrder.Max)
      Heap.MaxInsert(_data, item, minItem, _heapSize++);
   else
      Heap.MinInsert(_data, item, maxItem, _heapSize++);
}

And here is the full code for PriorityQueue<T>:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace AlgoLib.Structures
{
   public enum PriorityOrder
   {
      Min,
      Max
   }

   public class PriorityQueue<T> : IEnumerable<T>, 
      ICollection, IEnumerable where T : IComparable
   {
      private readonly PriorityOrder _order;
      private readonly IList<T> _data;
      private int _heapSize = 0;

      public PriorityQueue(PriorityOrder order)
      {
         _data = new List<T>();
         _order = order;
      }

      public PriorityQueue(IEnumerable<T> data, PriorityOrder order)
      {
         _data = data as IList<T>;
         if (_data == null) _data = new List<T>(data);
         _order = order;
         _heapSize = _data.Count;
         if (_order == PriorityOrder.Max)
            Heap.BuildMaxHeap(_data);
         else
            Heap.BuildMinHeap(_data);
      }

      public PriorityQueue(int initialCapacity, PriorityOrder order)
      {
         _data = new List<T>(initialCapacity);
         _order = order;
      }

      public void Clear()
      {
         _data.Clear();
      }

      public bool Contains(T item)
      {
         if (_order == PriorityOrder.Max)
            return Heap.MaxContains(_data, item, 0, _heapSize);
         else
            return Heap.MinContains(_data, item, 0, _heapSize);
      }

      public T Dequeue()
      {
         if (_heapSize == 0) throw new InvalidOperationException();
         if (_order == PriorityOrder.Max)
            return Heap.ExtractMax(_data, _heapSize--);
         else
            return Heap.ExtractMin(_data, _heapSize--);
      }

      public void Enqueue(T item, T minItem, T maxItem)
      {
         if (_order == PriorityOrder.Max)
            Heap.MaxInsert(_data, item, minItem, _heapSize++);
         else
            Heap.MinInsert(_data, item, maxItem, _heapSize++);
      }

      public T Peek()
      {
         return _data[0];
      }

      public void TrimExcess()
      {
         // trim remove items in _data beyond _heapSize
         while (_heapSize < _data.Count)
         {
            _data.RemoveAt(_data.Count - 1);
         }
      }

      public T[] ToArray()
      {
         return _data.ToArray();
      }

      public IEnumerator<T> GetEnumerator()
      {
         return _data.GetEnumerator();
      }

      IEnumerator IEnumerable.GetEnumerator()
      {
         return _data.GetEnumerator();
      }

      public void CopyTo(Array array, int index)
      {
         _data.CopyTo((T[])array, index);
      }

      public int Count
      {
         get { return _heapSize; }
      }

      public int Size
      {
         get { return _data.Count; }
      }

      public bool IsSynchronized
      {
         get { return false; }
      }

      public object SyncRoot
      {
         get { return _data; }
      }
   }
}

If you like it, let me know.

Priority Queue with Heap Algorithms in C#

Continuing with my C# Algorithms series, I’ve just completed a rather lengthy effort to implement and test min and max heap algorithms, including heap sort, along with a priority queue, something not provided by the BCL. While the heap sort is not always the most efficient, the algorithms required to accomplish the sort, specifically the heap data structure functions, supply the requisite functionality to make the implementation of a priority based queue quite easy to implement.

This work is a part of my continuing effort to work through all of the common algorithms found in Introduction to Algorithms 3rd Edition, I highly recommend this work to anyone who wishes to study classic computer science algorithms. For me, the exercise of implementing them in C# is a great learning experience.

First, let’s look at how it works. You have a job scheduler that needs to execute jobs in order of priority but jobs are enqueued with different priorities but need to be dequeued according to that priority. Here’s the Job class. Note the implementation of the MinValue and MaxValue which is needed for the heap based ordering required after a dequeue.

public class Job : IComparable
{
   public string JobId { get; set; }
   public double Priority { get; set; }

   public int CompareTo(object obj)
   {
      var other = obj as Job;
      if (null == other) return 1; //null is always less
      return this.Priority.CompareTo(other.Priority);
   }

   private static Job _min;
   private static Job _max;

   public static Job MinValue
   {
      get
      {
         if (_min == null)
         {
            _min = new Job { JobId = null, Priority = double.MinValue };
         }
         return _min;
      }
   }

   public static Job MaxValue
   {
      get
      {
         if (_max == null)
         {
            _max = new Job { JobId = null, Priority = double.MaxValue };
         }
         return _max;
      }
   }
}

And now here’s the test code that demonstrates the use of PriorityQueue<T> with the Job class.

[TestMethod]
public void JobTest()
{
   IList<Job> jobs = new List<Job> 
   {
      new Job { JobId = "test1", Priority = 45.0 },
      new Job { JobId = "test2", Priority = 25.0 },
      new Job { JobId = "test3", Priority = 4.0 },
      new Job { JobId = "test4", Priority = 88.0 },
      new Job { JobId = "test5", Priority = 96.0 },
      new Job { JobId = "test6", Priority = 18.0 },
      new Job { JobId = "test7", Priority = 101.0 },
      new Job { JobId = "test8", Priority = 7.0 }
   };
   var jobQueue = new PriorityQueue<Job>(jobs, PriorityOrder.Max);
   jobQueue.Enqueue(new Job 
                   { 
                      JobId = "test8", 
                      Priority = 232.0 
                   },
                   // min and max needed for MaxInsert or MinInsert
                   Job.MinValue, Job.MaxValue);
   Assert.IsTrue(jobQueue.Count == 9);
   var val = jobQueue.Dequeue();
   Assert.IsTrue(val.Priority == 232.0);
   Assert.IsTrue(jobQueue.Count == 8);
   Assert.IsTrue(jobQueue.Size == 9); //heapSize not same
   val = jobQueue.Peek();
   Assert.IsTrue(val.Priority == 101.0);
   jobQueue.TrimExcess();
   Assert.IsTrue(jobQueue.Count == jobQueue.Size);
}

I’m posting the PriorityQueue<T> class here along with the static Heap class that provides the underlying heap structure algorithms. I hope you get some use out of them.

public class PriorityQueue<T> : IEnumerable<T>, 
   ICollection, IEnumerable where T : IComparable
{
   private readonly PriorityOrder _order;
   private readonly IList<T> _data;
   private int _heapSize = 0;

   public PriorityQueue(PriorityOrder order)
   {
      _data = new List<T>();
      _order = order;
   }

   public PriorityQueue(IEnumerable<T> data, PriorityOrder order)
   {
      _data = data as IList<T>;
      if (_data == null) _data = new List<T>(data);
      _order = order;
      _heapSize = _data.Count;
      if (_order == PriorityOrder.Max)
         Heap.BuildMaxHeap(_data);
      else
         Heap.BuildMinHeap(_data);
   }

   public PriorityQueue(int initialCapacity, PriorityOrder order)
   {
      _data = new List<T>(initialCapacity);
      _order = order;
   }

   public void Clear()
   {
      _data.Clear();
   }

   public bool Contains(T item)
   {
      if (_order == PriorityOrder.Max)
         return Heap.MaxContains(_data, item, 0, _heapSize);
      else
         return Heap.MinContains(_data, item, 0, _heapSize);
   }

   public T Dequeue()
   {
      if (_heapSize == 0) throw new InvalidOperationException();
      if (_order == PriorityOrder.Max)
         return Heap.ExtractMax(_data, _heapSize--);
      else
         return Heap.ExtractMin(_data, _heapSize--);
   }

   public void Enqueue(T item, T minItem, T maxItem)
   {
      if (_order == PriorityOrder.Max)
         Heap.MaxInsert(_data, item, maxItem, _heapSize++);
      else
         Heap.MinInsert(_data, item, minItem, _heapSize++);
   }

   public T Peek()
   {
      return _data[0];
   }

   public void TrimExcess()
   {
      // trim remove items in _data beyond _heapSize
      while (_heapSize < _data.Count)
      {
         _data.RemoveAt(_data.Count - 1);
      }
   }

   public T[] ToArray()
   {
      return _data.ToArray();
   }

   public IEnumerator<T> GetEnumerator()
   {
      return _data.GetEnumerator();
   }

   IEnumerator IEnumerable.GetEnumerator()
   {
      return _data.GetEnumerator();
   }

   public void CopyTo(Array array, int index)
   {
      _data.CopyTo((T[])array, index);
   }

   public int Count
   {
      get { return _heapSize; }
   }

   public int Size
   {
      get { return _data.Count; }
   }

   public bool IsSynchronized
   {
      get { return false; }
   }

   public object SyncRoot
   {
      get { return _data; }
   }
}

And here’s the Heap code. Not light reading. For help in walking through the code, review the unit tests on GitHub.

public static class Heap
{
   /* What a max heap looks like from 45, 25, 4, 88, 96, 18, 101, 7:
    * 
    *                    0
    *                  /   \
    *                 1     2                       
    *                / \   / \
    *               3   4 5   6
    *              /
    *             7
    *                   101
    *            96             45
    *       88        25    18       4
    *    7
    */

   /// <summary>
   /// Convert IList to a max-heap from bottom up such that each node maintains the
   /// max-heap property (data[Parent[index]] >= data[index] where Parent = index / 2).
   /// </summary>
   /// <typeparam name="T"></typeparam>
   /// <param name="data"></param>
   public static void BuildMaxHeap<T>(IList<T> data) where T : IComparable
   {
      var heapSize = data.Count;
      for (int index = (heapSize / 2) - 1; index > -1; index--)
      {
         MaxHeapify(data, index, heapSize);
      }
   }

   /* What a min heap looks like from 45, 25, 4, 88, 96, 18, 101, 7:
    * 
    *                 0
    *               /   \
    *              1     2                       
    *             / \   / \
    *            3   4 5   6
    *           /
    *          7
    *             
    *                 4
    *           7          18
    *       25    96    45    101
    *    88       
    */

   /// <summary>
   /// Convert IList to a min-heap from bottom up such that each node maintains the
   /// min-heap property (data[Parent[index]] <= data[index] where Parent = index / 2).
   /// </summary>
   /// <typeparam name="T"></typeparam>
   /// <param name="data"></param>
   public static void BuildMinHeap<T>(IList<T> data) where T : IComparable
   {
      var heapSize = data.Count;
      for (int index = (heapSize / 2) - 1; index > -1; index--)
      {
         MinHeapify(data, index, heapSize);
      }
   }

   /// <summary>
   /// Maintain max-heap property for data at index location for specified heap size 
   /// such that data[Parent[index]] >= data[index] where Parent = index / 2.
   /// </summary>
   /// <typeparam name="T"></typeparam>
   /// <param name="data"></param>
   /// <param name="index"></param>
   /// <param name="heapSize"></param>
   public static void MaxHeapify<T>(IList<T> data, int index, int heapSize) where T : IComparable
   {
      var largest = index;
      var left = HeapLeft(index);
      var right = HeapRight(index);
      if (left < heapSize
         && (data[left] != null
            && data[left].CompareTo(data[index]) > 0))
      {
         largest = left;
      }
      if (right < heapSize
         && (data[right] != null
            && data[right].CompareTo(data[largest]) > 0))
      {
         largest = right;
      }
      if (largest != index)
      {
         //exchange data[index] with data[largest}
         var tempRef = data[index];
         data[index] = data[largest];
         data[largest] = tempRef;
         //recurse
         MaxHeapify(data, largest, heapSize);
      }
   }

   /// <summary>
   /// Maintain min-heap property for data at index location for specified heap size
   /// such that data[Parent[index]] <= data[index]
   /// </summary>
   /// <typeparam name="T"></typeparam>
   /// <param name="data"></param>
   /// <param name="index"></param>
   /// <param name="heapSize"></param>
   public static void MinHeapify<T>(IList<T> data, int index, int heapSize) where T : IComparable
   {
      var smallest = index;
      var left = HeapLeft(index);
      var right = HeapRight(index);
      if (left < heapSize
         && (data[left] == null
            || data[left].CompareTo(data[index]) < 0))
      {
         smallest = left;
      }
      if (right < heapSize
         && (data[right] == null
            || data[right].CompareTo(data[smallest]) < 0))
      {
         smallest = right;
      }
      if (smallest != index)
      {
         //exchange data[index] with data[largest}
         var tempRef = data[index];
         data[index] = data[smallest];
         data[smallest] = tempRef;
         //recurse
         MinHeapify(data, smallest, heapSize);
      }
   }

   /// <summary>
   /// Extrax max and re-heapify with decremented heapSize. 
   /// Caller must remember to decrement local heap size.
   /// </summary>
   /// <typeparam name="T"></typeparam>
   /// <param name="data"></param>
   /// <param name="heapSize"></param>
   /// <returns></returns>
   public static T ExtractMax<T>(IList<T> data, int heapSize) where T : IComparable
   {
      heapSize--;
      if (heapSize < 0) throw new IndexOutOfRangeException();
      T max = data[0];
      data[0] = data[heapSize];
      if (heapSize > 0) MaxHeapify(data, 0, heapSize);
      return max;
   }

   /// <summary>
   /// Extrax min and re-heapify with decremented heapSize. 
   /// Caller must remember to decrement local heap size.
   /// </summary>
   /// <typeparam name="T"></typeparam>
   /// <param name="data"></param>
   /// <param name="heapSize"></param>
   /// <returns></returns>
   public static T ExtractMin<T>(IList<T> data, int heapSize) where T : IComparable
   {
      heapSize--;
      if (heapSize < 0) throw new IndexOutOfRangeException();
      T max = data[0];
      data[0] = data[heapSize];
      if (heapSize > 0) MinHeapify(data, 0, heapSize);
      return max;
   }

   public static void MaxIncrease<T>(IList<T> data, int index, T item) where T : IComparable
   {
      if (null == item || item.CompareTo(data[index]) > 0) 
         throw new ArgumentException("new item is smaller than the current item", "item");

      data[index] = item;
      var parent = HeapParent(index);
      while (index > 0 
         && (data[parent] == null
            || data[parent].CompareTo(data[index]) < 0))
      {
         //exchange data[index] with data[parent}
         var tempRef = data[index];
         data[index] = data[parent];
         data[parent] = tempRef;
         index = parent;
         parent = HeapParent(index);
      }
   }

   public static void MinDecrease<T>(IList<T> data, int index, T item) where T : IComparable
   {
      if (null == item || item.CompareTo(data[index]) < 0)
         throw new ArgumentException("new item is greater than the current item", "item");

      data[index] = item;
      var parent = HeapParent(index);
      while (index > 0
         && (data[index] == null
            || data[index].CompareTo(data[parent]) < 0))
      {
         //exchange data[index] with data[parent}
         var tempRef = data[index];
         data[index] = data[parent];
         data[parent] = tempRef;
         index = parent;
         parent = HeapParent(index);
      }
   }

   /// <summary>
   /// Insert item into max heap. Caller must remember to increment heapSize locally.
   /// </summary>
   /// <typeparam name="T"></typeparam>
   /// <param name="data"></param>
   /// <param name="item"></param>
   /// <param name="minOfT"></param>
   /// <param name="heapSize"></param>
   public static void MaxInsert<T>(IList<T> data, T item, T minOfT, int heapSize) 
      where T : IComparable
   {
      heapSize++;
      if (heapSize < data.Count)
         data[heapSize] = minOfT;
      else
         data.Add(minOfT);
      MaxIncrease(data, heapSize - 1, item);
   }

   /// <summary>
   /// Insert item into min heap. Caller must remember to increment heapSize locally.
   /// </summary>
   /// <typeparam name="T"></typeparam>
   /// <param name="data"></param>
   /// <param name="item"></param>
   /// <param name="maxOfT"></param>
   /// <param name="heapSize"></param>
   public static void MinInsert<T>(IList<T> data, T item, T maxOfT, int heapSize) 
      where T : IComparable
   {
      heapSize++;
      if (heapSize < data.Count)
         data[heapSize] = maxOfT;
      else
         data.Add(maxOfT);
      MinDecrease(data, heapSize - 1, item);
   }

   public static bool MaxContains<T>(IList<T> data, T item, int index, int heapSize) 
      where T : IComparable
   {
      if (index >= heapSize) return false;
      if (index == 0)
      {
         if (data[index] == null)
         {
            if (item == null) return true;
         }
         else
         {
            var rootComp = data[index].CompareTo(item);
            if (rootComp == 0) return true;
            if (rootComp < 0) return false;
         }
      }
      var left = HeapLeft(index);
      var leftComp = 0;
      if (left < heapSize)
      {
         if (data[left] == null)
         {
            if (item == null) return true;
         }
         else
         {
            leftComp = data[left].CompareTo(item);
            if (leftComp == 0) return true;
         }
      }

      var right = HeapRight(index);
      var rightComp = 0;
      if (right < heapSize)
      {
         if (data[right] == null)
         {
            if (item == null) return true;
         }
         else
         {
            rightComp = data[right].CompareTo(item);
            if (rightComp == 0) return true;
         }
      }

      if (leftComp < 0 && rightComp < 0) return false;

      var leftResult = false;
      if (leftComp > 0)
      {
         leftResult = MaxContains(data, item, left, heapSize);
      }
      if (leftResult) return true;

      var rightResult = false;
      if (rightComp > 0)
      {
         rightResult = MaxContains(data, item, right, heapSize);
      }
      return rightResult;
   }


   public static bool MinContains<T>(IList<T> data, T item, int index, int heapSize)
      where T : IComparable
   {
      if (index >= heapSize) return false;
      if (index == 0)
      {
         if (data[index] == null)
         {
            if (item == null) return true;
         }
         else
         {
            var rootComp = data[index].CompareTo(item);
            if (rootComp == 0) return true;
            if (rootComp > 0) return false;
         }
      }
      var left = HeapLeft(index);
      var leftComp = 0;
      if (left < heapSize)
      {
         if (data[left] == null)
         {
            if (item == null) return true;
         }
         else
         {
            leftComp = data[left].CompareTo(item);
            if (leftComp == 0) return true;
         }
      }

      var right = HeapRight(index);
      var rightComp = 0;
      if (right < heapSize)
      {
         if (data[right] == null)
         {
            if (item == null) return true;
         }
         else
         {
            rightComp = data[right].CompareTo(item);
            if (rightComp == 0) return true;
         }
      }

      if (leftComp > 0 && rightComp > 0) return false;

      var leftResult = false;
      if (leftComp < 0)
      {
         leftResult = MinContains(data, item, left, heapSize);
      }
      if (leftResult) return true;

      var rightResult = false;
      if (rightComp < 0)
      {
         rightResult = MinContains(data, item, right, heapSize);
      }
      return rightResult;
   }


   private static int HeapParent(int i)
   {
      return i >> 1; // i / 2
   }

   private static int HeapLeft(int i)
   {
      return (i << 1) + 1; //i * 2 + 1
   }

   private static int HeapRight(int i)
   {
      return (i << 1) + 2; //i * 2 + 2
   }
}

If you find any flaws, please do let me know. I will be working on improving this library over time, so look for updates and check GitHub for latest code. Enjoy.

Insertion Sort and Sorted Merge in C#

Update (9/6/2014) - Updated code on GitHub with performance enhancement to sorted merge algorithm.

As promised a few days ago, here’s the first installment in the algorithm series. A simple insertion sort and the previous sorted merge combine to provide you with a quick way to sort and merge multiple arrays without copying data from one array to another. You can find complete code and tests on GitHub.

Here’s the insertion sort code:

/// <summary>
/// Simple insertion sort of IList or array in place.
/// </summary>
/// <typeparam name="T">Type to be sorted.</typeparam>
/// <param name="data">IList or array to be sorted.</param>
public static void InsertionSort<T>(IList<T> data) 
   where T : IComparable
{
   if (data == null || data.Count < 2) return;
   for (int keyIndex = 1; keyIndex < data.Count; keyIndex++)
   {
      var key = data[keyIndex];
      var priorIndex = keyIndex - 1;
      while (priorIndex > -1 
         && data[priorIndex] != null 
         && data[priorIndex].CompareTo(key) > 0)
      {
         data[priorIndex + 1] = data[priorIndex];
         priorIndex -= 1;
      }
      data[priorIndex + 1] = key;
   }
}

And here’s one test example:

[TestMethod]
public void CombinedMergeSortTest()
{
   IList<MergeSortTestData> a = new List<MergeSortTestData> 
   { 
      new MergeSortTestData { Name = "Robert", Age = 43.5 },
      null,
   };
   IList<MergeSortTestData> b = new List<MergeSortTestData> 
   { 
      new MergeSortTestData { Name = "Robert", Age = 23.5 },
      null,
   };

   Sorter.InsertionSort(a);
   Sorter.InsertionSort(b);

   MergeSortTestData prev = null;
   int count = 0;
   foreach (var val in Merger.SortedMerge<MergeSortTestData>(a, b))
   {
      if (null != val) Assert.IsTrue(val.CompareTo(prev) > 0);
      prev = val;
      count++;
   }
   Assert.AreEqual(count, 4);
}


public class MergeSortTestData : IComparable
{
   public string Name { get; set; }
   public double Age { get; set; }

   public int CompareTo(object obj)
   {
      var other = obj as MergeSortTestData;
      if (null == other) return 1; //null is always less
      if (this.Name == other.Name)
      {
         return this.Age.CompareTo(other.Age);
      }
      return this.Name.CompareTo(other.Name);
   }
}

Merge Algorithm for Multiple Sorted IEnumerable<T> Sources

This evening I was asked to write a merge algorithm to efficiently merge multiple iterator sources, yielding a merged iterator that would not require the algorithm to read all of the data into memory should the sources be very large. I’ve never written such an algorithm nor can I recall seeing one, so I didn’t have a very good good answer. Of course that left a simmering thread of though on the back burner of my brain.

After letting it rattle around a bit and without resorting to old fashioned Googling, I sat down and banged out the following code. It was fun to write and works but it took me much too long to write from scratch—about 90 minutes. It may be time to refresh and reload, perhaps by writing a series of posts that implement C# versions of selected algorithms found in a book I recently purchased but have since spent no time reading: Introduction to Algorithms 3rd Edition.

Updated Code (9/6/2014)The original code gets a big performance boost with this refactoring:

public static IEnumerable<T> SortedMerge<T>
  (params IEnumerable<T>[] sortedSources)
  where T : IComparable
{
  if (sortedSources == null || sortedSources.Length == 0)
    throw new ArgumentNullException("sortedSources");

  //1. fetch enumerators for each sourc
  var enums = (from n in sortedSources
         select n.GetEnumerator()).ToArray();

  //2. create index list indicating what MoveNext returned for each enumerator
  var enumHasValue = new List<bool>(enums.Length);
  // MoveNext on all and initialize enumHasValue
  for (int i = 0; i < enums.Length; i++)
  {
    enumHasValue.Add(enums[i].MoveNext());
  }

  // if all false, nothing to iterate over
  if (enumHasValue.All(x => !x)) yield break;

  //3. loop through
  while (true)
  {
    //find index with lowest value
    var lowIdx = -1;
    T lowVal = default(T);
    for (int i = 0; i < enums.Length; i++)
    {
      if (enumHasValue[i])
      {
        // must get first before doing any compares
        if (lowIdx < 0 
		    || null == enums[i].Current //null sorts lowest
		    || enums[i].Current.CompareTo(lowVal) < 0)
        {
          lowIdx = i;
          lowVal = enums[i].Current;
        }
      }
    }

    //if none found, we're done
    if (lowIdx < 0) break;

    //get next value for enumerator chosen
    enumHasValue[lowIdx] = enums[lowIdx].MoveNext();

    //yield up the lowest value
    yield return lowVal;
  }
}

Here’s the original code. I hope you enjoy it. And if you see ways to improve on it, please let me know.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Merger
{
  class Program
  {
    static void Main(string[] args)
    {
      int[] a = { 1, 3, 6, 102, 105, 230 };
      int[] b = { 101, 103, 112, 155, 231 };

      var mm = new MergeMania();

      foreach(var val in mm.Merge<int>(a, b))
      {
        Console.WriteLine(val);
      }
      Console.ReadLine();
    }
  }

  public class MergeMania
  {
    public IEnumerable<T> Merge<T>(params IEnumerable<T>[] sortedSources) 
      where T : IComparable
    {
      if (sortedSources == null || sortedSources.Length == 0) 
        throw new ArgumentNullException("sortedSources");
      
      //1. fetch enumerators for each sourc
      var enums = (from n in sortedSources 
             select n.GetEnumerator()).ToArray();
      
      //2. fetch enumerators that have at least one value
      var enumsWithValues = (from n in enums 
                   where n.MoveNext() 
                   select n).ToArray();
      if (enumsWithValues.Length == 0) yield break; //nothing to iterate over
       
      //3. sort by current value in List<IEnumerator<T>>
      var enumsByCurrent = (from n in enumsWithValues 
                  orderby n.Current 
                  select n).ToList();
      //4. loop through
      while (true)
      {
        //yield up the lowest value
        yield return enumsByCurrent[0].Current;

        //move the pointer on the enumerator with that lowest value
        if (!enumsByCurrent[0].MoveNext())
        {
          //remove the first item in the list
          enumsByCurrent.RemoveAt(0);

          //check for empty
          if (enumsByCurrent.Count == 0) break; //we're done
        }
        enumsByCurrent = enumsByCurrent.OrderBy(x => x.Current).ToList();
      }
    }
  }
}

And if this answers any questions for you, please do drop me a line to let me know.