> Towards encapsulation in C

 > Notes

Home

Metadata:

How to emulate object classes and methods in C to work towards encapsulation.
- Initially published on 11-06-2024.


Towards implementing encapsulation in C

Recap of Abstract Data Types

Encapsulation in object oriented design is the notion of abstracting data such that access to information and behavior of an object is relegated to a set of interfaces. These interfaces are methods in which another object or environment can ask to perform some action. This provides the advantage of abstracting away the detail of implementation to allow a developer to be not so concerned with the details of an object.

The intuitive approach to implementing something like a stack within C will involve having some collection and a set of functions which operate on the collection. These entities will exist in whatever scope which is inherited from the file that contains them. Let's take a moment to recal exactly what these functions are and the pieces of data are required for them to operate.

A stack that is implemented within a collection like an integer array will need to know the quantity of items that exist in the collection. This value is necessary to perform many of the operations expected of a stack. These operations include a peek operation, which returns the value that exists at the top of the stack. There will also need to be a function that places a value onto the stack - push. And then there will be need of a pop operation which removes and returns the value of said item. Lastly, it helps to have a utility function which simply prints the stack.

All of these functions operate with respect to the quantity of the stack such that its value is indicative of the index where the top item of the stack resides. That is, the top is quantity - 1 in a zero-based indexing scheme.

Thus, within the C program, the following are defined:

  • An integer array labeled collection.

  • An integer labeled quantity.

  • A function labeled peek which takes an integer array as an input and returns an integer.

    • peek(integer array) -> integer

  • A function labeled pop which takes an integer array as an input and returns an integer.

    • pop(integer array) -> integer

  • A function labeled push which takes an integer array and an integer as an input and returns nothing.

    • push(integer array, integer) -> void

  • A function labeled print which takes an integer array as an input and returns a string.

    • print(integer array) -> string

Notion of an interface

Naively, having these functions is fine and dandy with respect to an immediate problem whose solution requires a stack. But things can get a bit messy while considering the ambiguity of the function names, (specifically peek and print). Print is a pretty vague function name, and can apply to any range of subroutines and datatypes from the perspective of some agent who is simply glancing over function names; one who is not too concerned with the implementation. Does the function called print that is being recognized by an IDE's linter work for any integer array regardless of abstraction?

What if print is given an integer array that represents a queue abstract datatype? Will it process the arrangement of values correctly? What about peek? What if we ask peek to take a look at a queue as well? Knowing that a program may expect a different schema of data representation within the integer array makes this frivolous usage of peek and print a dangerous decision.

To compensate, the functions can be renamed:

  • stack_peek(integer array) -> integer

  • stack_pop(integer array) -> integer

  • stack_push(integer array, integer) -> void

  • stack_print(integer array) -> string

Using this order of functions also brings forth the issue of differentiating integer stack operations from string stack operations. The solution to this question is out of the scope of this discussion, as it pertains to implementing generics - a feature that is not explicit in C which will be the topic of a later discussion. To compensate for this, a loose notion of the interface can be introduced. That is, constructs representing something like a stack or a queue such that an instance contains a method called peek or print.

There can exist an arbitrary amount of functions which represent something like the peek operation, such as:

  • integer_array_stack_peek(integer array) -> integer

  • integer_node_stack_peek(integer node) -> integer

  • character_array_stack_peek(character node) -> character

  • character_node_stack_peek(character node) -> character

  • string_array_stack_peek(string array) -> string

  • string_node_stack_peek(string node) -> string

  • etc.

Here, knowledge of the associated function operator is required. To alleviate potential messiness and confusion that this can cause, hese functions can instead be coupled into some construct such that a method can be expected:

  • instance_name_u.peek(integer array) -> integer

  • instance_name_v.peek(integer node) -> integer

  • instance_name_w.peek(character array) -> character

  • instance_name_x.peek(character node) -> character

  • instance_name_y.peek(string array) -> string

  • instance_name_z.peek(string node) -> string

  • etc.

Structure of an object

C doesn't natively support object oriented design. There are no class definitions where properties and methods can be declared. Many will reach to an extension of C, C++, to gain access to these programming constructs.

C does have a feature that allows the assignment of a grouping of values to a single namespace - the struct. Thinking back to the example of an integer stack, recall the value associated with the quantity variable. Let's also recall that there is the collection variable which points to an integer array. With these two pieces we can make a simple struct:

Accessing elements of a struct

The struct above has two different values. The value associated with the namespace quantity represents an integer in memory. The value associated with the namespace collection is a pointer which represents the position in memory in which an integer array exists.

When creating an instance of this struct, memory needs to be allocated such that space is reserved for both the quantity and the collection. The size of the collection is arbitrary at this point. Recall that creating an array in C requires telling the compiler the size of the array. That is, the amount of elements it contains. In this case, these elements are integers. The shorthand for creating an integer array may look like this:

Where myArray can contain a total of 8 integers. This shorthand is a syntactic abstraction for the following:

Here, calloc is used to allocate space within memory to house up to 8 integers. Both means can start assigning values using to the array using the following syntax: myArray[0] = <int>;

Initialization of of this 8-integer array will reserve space in memory for 8 integers that can be accessed through a zero-based indexing scheme using the namespace of 'myArray'. When initialized, the values that exist in this area should essentially be treated as garbage. If malloc was used instead of calloc, it should be assumed that values exist there that may have been dereferenced from some other process. This is why one may see an initialization procedure that runs a loop bounded by the size of the array where a zero is placed into each index. calloc zeros out these values for us.

Determining whether or not garbage exists in a certain index is made trivial once we consider the quantity value of the struct. The various operations on the stack will use the value of quantity to differentiate valid values from the invalid. If quantity is zero, for example, then it can be assumed that all values that exist in collection have no meaning.

How can the values of a struct be accessed? First, an instance of a struct needs to be initialized. As is the case with an array, memory needs to be reserved. In this case, space needs to be reserved for both the struct and the array:

This initialization creates a pointer to the space of reserved memory which represents the structure. newStack is a pointer to this space in memory.

There are two approaches to referencing data within a struct. The dot (.) operator accesses values to a struct directly. The arrow (->) operator is used to access values associated with a pointer.

Using pointers

A pointer referring to a region in memory will be used so that we can keep in mind what's happening within memory. This is will be advantageous as a pedagogic device. What's important to remember is that accessing a member of a struct from its pointer can be done as a->b where a is the name given to the struct and b is one of its properties. This is a syntactic abstraction of (*a).b. To strictly adhere to using dot-notation would require usage of the typedef keyword when working with structs, which will be circled back to in a later post.

Using the example of the IntegerStack called newStack, values can be assigned to its properties using the arrow operator:

Looking through these statements, one may also ask why a pointer is used with respect to collection. Instead of using the declaration int *collection within the definition of the struct, it is valid to instead int collection[8] and not have to include the statement of newStack -> collection = calloc(8,sizeof(int)); in the block of code above. Using calloc instead allows an easy transition into a implementing a dynamically sized stack. Such a stack isn't bounded by a programmatic value upon declaration.

This can be seen in implementing the push operation for an IntegerStack. How should the stack behave when a 9th item is pushed onto the stack? One approach is to simply provide an error message. A more interesting approach is to resize the stack. To do this, a new value needs to be considered that acts a limiting value which determines when the stack should be resized; a value that tracks the current size of collection. Consider the new definition of IntegerStack:

Where initializing an IntegerStack involves the following set of statements:

Functions to be used as methods

With this, a function which pushes a value onto the integer stack can be described. This function takes as input a struct which represents the integer stack. Here, it will be seen that once the value of quantity is equivalent to the value of limit, new_array is created such that its memory footprint is 1.5 times the size of the old array. Once this new array is created, the old values are placed into it by means of the loop. This new array's pointer is then assigned to the stack's collection value with the statement this -> collection = new_array;.

It should be noted that the first input to push_IntegerStack is referred to as this. Anyone who has experience with object oriented design will recognize the keyword as a means for an object to refer to itself. Within Python, the keyword is self is used instead. In the context of Python, a class definition likely has a set of methods defined as well. The first argument within these definitions is indeed self, not dissimilar to what's happening in the above function definition.

Let's develop the remaining stack operators using this general pattern:

Integrating Methods

Here we have a set of functions which take an IntegerStack struct as input and exhibits expected behavior. This inches us closer to complete data encapsulation. The task at hand is to wrap these functions up into the struct itself. This gets a bit tricky because there is no means to simply place a function definition within the struct. Instead, a pointer to the function can be used! The struct can be defined with these pointers in mind:

Here, the syntax of void (*print)(struct IntegerStack*); can be seen as <return-type> (<method-name>)(<parameters>). Once this struct is defined, a function can be created that acts as a constructor:

Where a an instance of this structure can be initialized by a function call to the constructor:

And its operations can be performed using method calls via the arrow operator:

Towards interoperability

It may seem redundant having to supply the stack as the first argument to each method call. It's likely that other programming languages and extensions perform some form of processing while constructing an abstract syntax tree to eliminate this need. This is why Python was brought up earlier in this writing - to provide an example where complete obfuscation of this self reference doesn't exist. As far as I know, C isn't at a place where it can do this without creating some sort of extension that needs to be compiled on top of what gcc already does.

What has been accomplished so far allows a developer to be unconcerned with the specific naming convention associated with a given datatype. What's been implemented above is an integer stack using an integer array as its mode of collecting values. It could be the case that there exists another type of integer stack which uses a collection of linked nodes to represent a stack. If a struct isn't used to collate their valid operations, then a developer may need to recall that there exists two different functions called peek_NodeIntegerStack and peek_ArrayIntegerStack in order to perform a peek operation.

Instead of needing to recall an arbitrary amount of function names, using a struct definition which captures encapsulation allows a developer need only be concerned with a more simple method name. I.e., stack_instance_1 -> peek and stack_instance_2 -> peek where stack_instance_1 is an instance of the array implementation and stack_instance_2 is an instance of the node implementation.


Continuing the discussion

Future posts will explore how to implement differing data structures and how to implement generics such that a collection may hold any type of implementation of a the same abstract datatype. This allows us to show that it can be iterated upon and expect the existence of a given method regardless of implementation of its contents. These will be linked on this document once they've been published.