C# Linked List using a Dll by edlukens

Last Updated: 11 January 2005

One of the challenges faced by an application designer is reading in large sequential files and having to also load the data in processing arrays as it is read. For example, let’s say we have a large data file and we have groupings within that file bounded by data indicators that signal a switching of the group. We read and process the file by group and when we are finished with the group, the data buffers must be refreshed. This data structure at first sounds like an array. However, here is the real issue: How big do we build the array? This is a question not easily answered when one cannot determine the “largest case” scenario for an array. And the issue grows even larger when we have a requirement to have what is known as a “jagged array” where dimensions are not always equal or uniform.

In this article, I will refer to an actual project where a large input file was opened up for sequential read and the data within that file was logically divided into groups with varying amounts of related data per group. For the data within the group, we might have to access any one of the rows to determine if it is there in order to make processing decisions. In the old days, we would try to estimate the maximum number of rows in a group and build the array that size. This can cause a couple of issues. First, we do not really know the maximum number of rows that will be in any given group — we can only estimate. If we estimate wrong, then this leads to an error where the array is not large enough. Second, even if we size it right, there is tremendous memory waste because the array area must be allocated in a fixed length. Figure 1 below shows this situation.


 Figure 1

Figure 1 shows a diagram of a group structure if fixed length arrays are used. GP1 is the group header record from the sequential file. We need to store the record in the array because there are items of data needed within it and therefore we have to allocate a full array row for it. SGA, SGB, and SGC are all different segments types that are stored in the array within group GP1. The segment type is also like a sub-grouping within GP1 and their individual segments are suffixed with a number 1, 2, 3, etc. Every cell in the diagram marked “Unused” is just that. They are “Unused” because the number of elements in that particular segment are assigned a cell but the array is fixed length and we therefore have to allocate up to the maximum number of cells. Every cell marked “Unused” is a wasted memory area.

There are two data structures that were used to build a jagged array and solve these problems. The first structure was a linked-list implementation which had to be built through code and the second is a built-in C# structure known as an ArrayList. This article will discuss how that using a combination of the two structures will yield an overall structure as is shown in Figure 2:


     Figure 2

The Linked List and ArrayList Structures

The linked list structure is basically a series of nodes with a “next” pointer that binds them like a chain. The advantages of this type of structure are:

  • Very fast

  • Can be quickly discarded and declared new (as opposed to clearing each element like in an array)

  • Uses only the memory needed — there is no memory waste

The disadvantages are:

  • Can be very cryptic and hard to understand

In our implementation, we use a C# structure known as an ArrayList to store the linked lists that will be the offshoot branches along the chain. Why do we use an ArrayList and not another linked list? Simple. It is for simplicity sake only. If we were to use a series of branches of linked lists, then we would have to have more complex implementations of the GoToNode method in the dll. We would have to traverse the tree to find a specific node on a sub branch. With an ArrayList, we simply push the entire sublist on the ArrayList and can pop it off with one index (as it will only hold one list in our implementation here).

Implementation: Building the DLL

The basic building block of a linked list is a structure known as the tree node. We started this application by building a .dll named LinkList.dll. Initially, the class declaration to define a node within the linked list is shown in figure 3. The squares in figure 2 above would be referred to as nodes.

public class LlNode // This is the node object within a linked list
public string status; // You can put a status indicator in here
public ArrayList minornode = new ArrayList(); // This is the offshoot branch
public string data; // This is the data buffer -- put your data here
public LlNode next; // Will point to the next node in the list

}; // *** LlNode declaration ***

Figure 3

The field “status” is a string area that you can use for anything to indicate some kind of state information or anything you want for that matter. The data within this field exists for the life of the node. The next field within the class is of type ArrayList and is named minornode. Here, we will actually place another linked list as a branch node. The segments SGA, SGB, and SGC displayed in figure 2 actually start at minornode. The string area “data” is the actual data buffer and will hold the input record area. And finally, but very important, is the field next which is actually a pointer or references another node associated with the list to build.

Once we have the node defined, we want to create a class for the linked list data structure and methods. Our class will have the following methods:

LinkedList() This is the constructor method
GoToNode(int pos) This method is invoked to position to an absolute location (indicated by pos) on the linked list
PushNode(LlNode newnode, int pos) We use this method to push a new node onto the linked list (which in our implementation is somewhat like a stack in that nodes are placed on the list in a last-in-first-out method)
ListLen() Returns an integer value of the linked list length.
PrintList() To print the contents of an entire linked list on the console.
Exists(L1Node pnode, string srch, ref int pindex) This method will return a boolean to indicate if a type of data segment (i.e., SGA, SGB, etc.) exists on the linked list tree. This was used in the implementation for validating that required segments were there.

Remember, what we want is a way to read in an associated grouping of records into a jagged array, manipulate or extract the information from the array, and then dispose of the grouping in memory when finished. This is useful for processing situations like reading in all personal information for an account holder, getting the information you need, and then disposing of the structure. When using fixed-length arrays, you have to initialize each cell. This can be time consuming and resource intensive. With a linked list, all you have to do is reset or declare “new” the list array object.

Creating a New List

To instantiate a new linked list you must declare a type of LinkedList and construct the new object. For example, to declare and instantiate the LinkedList object MyList we would have the following code:

public static LinkedList MyList = new LinkedList();

Nodes are part of the list therefore we must instantiate a new node with the following code:

public static LlNode MyNode = new LlNode();

Building the List

When building the list, we work at the node level first. We fill the node with the input record data and then push it onto the list much the same way we would treat a stack structure. In following code example, we add the data from the input flat file record to the instance of node and then use the PushNode method to place it on the list:

LinkedList templist = new LinkedList();    // templist is a new linked list structure
 LlNode node = new LlNode      // Llnode is the node that will get placed on the list
 node.data = sInRec // the data buffer of node now has the input record data
 templist.PushNode(node,0)  // Push the node on the list at the entry point

Processing the Records

Our sample file has the following records in it:

HDR 0100

This file, as listed above, has a header record (HDR), three group records (GRP), and various segments (SGT) per group.

Now, let’s trace the life cycle of a record from flat file to linked list to disposal. Looking at the program listing LinkedListMain.cs in the class MainClass, we see that the program has one argument on command line: the input text filename (above) which we named datafile.txt. The file is opened for input via the StreamReader class and read sequentially. For each record read, control is passed to the “Process” method:

StreamReader oInfile = File.OpenText(sInfilename);
 sInRec = oInfile.ReadLine();
 iState = HEADER;
 while (sInRec != null) {
 sInRec = oInfile.ReadLine();

The method “Process” starts by passing each record through a simple state machine. Basically, if the first three bytes of the record is “HDR” then this indicates an informational header record.The “state” becomes the constant HEADER. The code next looks for a record type (first 3 bytes in the record) of GRP. The state then becomes the constant GRP.

if (sType == "HDR") {
 iState = HEADER;
 if (sType == "GRP" && iState == HEADER) {
 MyNode.status = "Initialized";
 iState = GRP; }
 if (sType == "GRP" && iState == GRP) {
 MyNode.data = "MyNode";
 MyList.PushNode(MyNode, 0);
 Console.WriteLine("Show list");
 bExists = MyList.Exists(MyNode,"GRP",ref pindex);
 PrintMyList(0); // Print the grouping node
 PrintMyList(1); // Print all segments in the group (branch)

At the occurrence of the GRP record, a new list node or LlNode is created. This will store the linked list (or really a pointer to the linklist stored in minornode) of all segments within the grouping. A temporary list named templist is created and a new instance of the list node area MyNode is declared. The data from the flat file record is placed in the MyNode.data buffer and the node is pushed on at position 0 in the linked list templist.

switch (sType) {
case "GRP":
// We can put some checks here to make sure
// that we do not add more than one GRP per grouping

// This is the first segment in this loop and thus a
// new grouping List node is declared;
MyNode = new LlNode();

// node is the ListNode object where the segment's data
// is stored.
LlNode node = new LlNode();

    // templist will ultimately end up on the minornode ArrayList
templist = new LinkedList();
node.data = sInRec;

    // add the data for this segment to templist

// push templist onto the ArrayList minornode
MyNode.status = "nothing";

Figure 4 shows the structure of the list at the occurrence of a new grouping event:

Figure 4

After the GRP segment, the program expects the data segments within the group or SGT records. The program will read as many as there are before the next GRP record. Here is where the branch list gets built off of the minor node of MyNode for each SGT record read.

case "SGT":
// Declare new instance of node -- it will store the input rec
node = new LlNode();
node.data = sInRec;

// Declare a new working list
templist = new LinkedList();

// Insert node into working list

// Check to see if there is already a "SGT" list in this loop node
// The Exists() method will return the index of the segment if true
bexists = MyList.Exists(MyNode,"SGT",ref pindex);

Figure 5 shows the resulting structure after reading and processing the SGT segments:

Figure 5

The linked list just keeps building until the end of the flat file or another GRP record is encountered to indicate a new grouping. Note that in order to “push” the templist onto the ArrayList indicated in minornode, we must remove the old list first then insert the new list with the RemoveAt and Insert methods of the ArrayList class.

Upon encountering a new GRP record, we can extract, process, and manipulate the data in the linked list before discarding it. Before placing a new GRP segment in the new grouping, we initialize or create a new list node by doing: MyNode = new LlNode();. At the point where MyNode is declared new, the old memory allocated for the linked list branching from MyNode is gone and the .NET garbage collector comes along to reuse it. A new node with the same name, MyNode, is now prepared to point to a new list.

By using the method PrintMyList(node index), we can dump the contents of the entire list to the console. This method travels down the chain from the end to the start in order to print the list in first in first out order. Why is the list built in reverse? It is because we are always pushing a new node on at position 0 with the PushNode method. This is much easier than trying to figure out where the end of list is and append to the end. In the PrintMyList method, the order is reversed so that you have the impression that the records are on the list in original order.

A Recommendation

Use the Visual Studio or DBGCLR debuggers to correct any bugs found in your linked list implementation. Otherwise you will get thoroughly confused when trying to isolate problems in the linked list processing. These debuggers do an excellent job in “exploding” the linked list and letting you see the entire tree. Figure 6 shows a Visual Studio .NET debugging session for this program with the linked list exploded:

Figure 6

Some Final Thoughts

This is a simple example of a combination linked list and ArrayList. The intent is for you to be able to take this code and adapt and modify to fit your particular data processing needs. Without structures like this, we are forced to table grouped information in fixed-types of structures such as random access flat files, database tables, and fixed-length arrays. All of these structures are either memory-intensive or input/output intensive. The linked list is very efficient on memory and very fast. When you have to do large volume table processing of sequential data give this structure consideration. It just might do the trick for you.


Demo Project Files: LinkedListMain.zip

Dll Project Files: LinkedListLib