Metadata:
How to use typedef to create objects using structs which contain methods.
- Initially published on 11-11-2024.
- Parent note: Towards implementing encapsulation in C
Crafting structs containing methods using typedef
Previous implementation of a stack of integers
In a previous note, the notion of implementing encapsulation within C was explored. This was done by scaffolding properties and methods using the struct keyword. Here, the struct is referred to as an object class, its properties are either literal values or pointers to some collection of values, and its methods are pointers to functions that take the instance of said object as an argument.
What was ultimately implemented was a stack abstract data type which stored integers in an integer array. The entire implementation is as follows:
It can be instantiated with the following function, which acts as a constructor:
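As a rough sketch of that earlier pattern, assuming field names collection, top, and limit and a policy of returning 0 from an empty stack (none of which is confirmed by this note), the struct of function pointers and its malloc-based constructor might look like:

```c
#include <stdlib.h>

/* Earlier style: a heap-allocated struct whose "methods" are function
   pointers that take the instance as their first argument. Field names
   (collection, top, limit) are assumptions for this sketch. */
struct IntegerStack {
    int *collection;   /* backing array of capacity limit */
    int top;           /* number of values currently stored */
    int limit;         /* maximum number of values */
    void (*push)(struct IntegerStack *, int);
    int (*pop)(struct IntegerStack *);
    int (*peek)(struct IntegerStack *);
};

/* Ignores the value when the stack is full. */
static void push_IntegerStack(struct IntegerStack *this, int value) {
    if (this->top < this->limit) {
        this->collection[this->top++] = value;
    }
}

/* Returns 0 when empty; a real implementation needs a proper error signal. */
static int pop_IntegerStack(struct IntegerStack *this) {
    return this->top > 0 ? this->collection[--this->top] : 0;
}

static int peek_IntegerStack(struct IntegerStack *this) {
    return this->top > 0 ? this->collection[this->top - 1] : 0;
}

/* Constructor: allocates the struct itself with malloc, wires up the
   method pointers, and hands back a pointer (hence arrow notation). */
static struct IntegerStack *create_IntegerStack(int new_limit) {
    struct IntegerStack *stack = malloc(sizeof *stack);
    if (stack == NULL) {
        return NULL;
    }
    stack->collection = calloc((size_t)new_limit, sizeof(int));
    if (stack->collection == NULL) {
        free(stack);
        return NULL;
    }
    stack->top = 0;
    stack->limit = new_limit;
    stack->push = push_IntegerStack;
    stack->pop = pop_IntegerStack;
    stack->peek = peek_IntegerStack;
    return stack;
}
```

Every call then goes through the arrow operator, as in s->push(s, 3) or s->peek(s), and the caller eventually releases both allocations with free(s->collection) and free(s).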
This provides an interface such that a developer need not remember the particular name of the function which processes a specific stack type. For example, a developer need not know that the function which peeks at an integer stack is called either integer_array_stack_peek or integer_node_stack_peek (depending on its implementation). A developer simply needs to call a method named peek, i.e., instance_name->peek. More discussion and examples are given in the parent note: Towards implementing encapsulation in C.
Shortcomings of prior implementation
Creating an instance of an IntegerStack in this context would involve a call to create_IntegerStack:
From here, the methods can be called using the arrow operator, i.e.,
On the subject of an interface freeing a developer from the contrivances of function names, a concerning observation is that the name create_IntegerStack is not descriptive enough for the object type being created. It should instead be named something like create_IntegerStack_array, where the _array suffix indicates the collection model of the abstract data type. An alternative suffix here could be _node.
With the current schema of using function pointers within the struct, it's not clear how to fold the logic of a constructor into the original struct such that a method call to the constructor can be made. Ideally, something like instance_name = new ClassType could be called to create an instance of the stack with its values initialized.
Another observation concerns the use of the arrow operator outside the scope of the object's implementation. If the goal is to introduce encapsulation that represents object-oriented design in a way that is intuitive for a developer, then why should the syntax diverge from the norm? Using the arrow operator in the prior example keeps visible the fact that pointers are being used in the background. This helps in a pedagogic context: an inexperienced C developer is reminded that memory is being allocated and that the arrow operator accesses the data these pointers point to. Recall that:
- (*a).b is equivalent to a->b, or
- (*object).property is equivalent to object->property, where (*object) dereferences the pointer to yield the object itself.
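As a small illustration with a hypothetical struct Point (unrelated to the stack), both accessors below read the same member. The parentheses in (*a).x are required because the . operator binds more tightly than unary *, so *a.x would parse as *(a.x):

```c
/* A tiny struct used only to illustrate member access through a pointer. */
struct Point {
    int x;
    int y;
};

/* Dereference first, then select the member. */
static int x_via_deref(struct Point *a) {
    return (*a).x;
}

/* The arrow operator performs the same dereference-then-select in one step. */
static int x_via_arrow(struct Point *a) {
    return a->x;
}
```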
Pedagogic advantages aside, an interface should be welcoming to an inexperienced C developer, specifically one who has programmed in a higher-level language and is used to the conventions of other languages that support object-oriented programming.
Using typedef for dot notation
The general pattern, from implementing the IntegerStack struct, to implementing its methods, to implementing its constructor, can be described as follows:
- Creating the struct and declaring each of its methods as a pointer to a function.
- Defining the logic of said functions, which relies on the definition of the above struct.
- Declaratively attaching the names of the defined functions to the names of the methods by pointer association.
The last step relies on knowing that the function named push_IntegerStack exists. It also relies on knowing that there exists a struct type named IntegerStack, which is evident from the memory allocation procedure. It is also implicit in the fact that push_IntegerStack requires an argument of type struct IntegerStack *. It follows that the second step also depends on the first step having been carried out.
These three steps must be executed in this order. This entails that it is impossible to integrate a constructor method into the struct itself. It also entails that arrow notation is required to access data through the method pointers, by reason of the aforementioned pointer association: the struct holds only a pointer with the signature int (*peek)(struct IntegerStack*);, and there is no function of that type that can be referred to directly within the IntegerStack struct.
These hold true until we consider the typedef keyword.
Forward declarations
"The typedef declaration provides a way to declare an identifier as a type alias, to be used to replace a possibly complex type name." - cppreference.com
Although this definition comes from cppreference.com, a site best known for its C++ documentation, it applies equally to C, the language C++ grew out of. It allows a type to be declared under a name that can then be used elsewhere, even before the type itself is fully defined.
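To make forward declaration concrete with something smaller than the stack, here is a hypothetical linked-list Node: the alias is declared first, used in a prototype while the struct is still incomplete, and only then is the struct completed. Node and list_length are illustrative names, not part of the note's design:

```c
#include <stddef.h>

/* Forward declaration: Node names an incomplete struct type for now. */
typedef struct Node Node;

/* The alias is already usable wherever only a pointer is needed. */
int list_length(const Node *head);

/* Completing the type; the member can use the alias declared above. */
struct Node {
    int value;
    Node *next;
};

/* Counts nodes by following next pointers until NULL. */
int list_length(const Node *head) {
    int count = 0;
    for (; head != NULL; head = head->next) {
        count++;
    }
    return count;
}
```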
Forward declaration allows the order of operations discussed in the previous section to be changed. The sequence will now be:
- Create the namespace for the stack.
- Create the namespaces for the method signatures within the struct that represents the stack. Here, the namespace for the stack will be used as the parameter type of these signatures.
- Create the struct that represents the stack. Associate the names of the method signatures with the names of the methods. In conjunction with creating the struct using the typedef keyword, this association will allow the use of dot notation.
  - This is partly due to the fact that memory is no longer dynamically allocated for the struct proper: returns from malloc and calloc are out of the picture at this scope.
- Define the logic of the functions to be associated with the method signatures of the stack.
- Build a constructor to instantiate the stack; declaratively attach the names of the defined functions to the names of the methods within the constructor.
Creating the namespace for the stack
This is a simple one-line declaration:
Creating the method signatures
The above statement allows the usage of the IntegerStack datatype in creating the method signatures to be used for the stack:
The general syntax here can be described as typedef <return-type> (*<signature-name>)(<argument-types>).
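The same pattern, applied to a hypothetical BinaryOp alias unrelated to the stack, shows how such a signature name can then be used as an ordinary parameter type:

```c
/* typedef <return-type> (*<signature-name>)(<argument-types>); */
typedef int (*BinaryOp)(int, int);

static int add(int a, int b) { return a + b; }
static int multiply(int a, int b) { return a * b; }

/* The alias reads like any other type in a parameter list. */
static int apply(BinaryOp op, int a, int b) {
    return op(a, b);
}
```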
Creating the struct that represents the stack
Rather than using malloc or calloc to allocate a range of memory for the struct and assigning its address to a pointer, the struct will be defined with the typedef keyword and handled as a plain value. This will allow dot notation while working with any property of the struct:
Note that the last line, which reads } IntegerStack;, is the name given to this type definition. It will be used when creating structs of this type. The name given here is arbitrary.
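A sketch of the first three steps together, assuming hypothetical field names (collection, top, limit) and hypothetical alias names for the signatures. The second typedef of IntegerStack repeats an identical alias, which standard C permits from C11 onward:

```c
/* Step 1: forward declaration. IntegerStack names an incomplete type here. */
typedef struct IntegerStack IntegerStack;

/* Step 2: method signatures written against the forward-declared name.
   The alias names PushMethod, PopMethod, and PeekMethod are hypothetical. */
typedef void (*PushMethod)(IntegerStack *, int);
typedef int (*PopMethod)(IntegerStack *);
typedef int (*PeekMethod)(IntegerStack *);

/* Step 3: complete the struct. Repeating the identical typedef name in
   "} IntegerStack;" is permitted since C11. */
typedef struct IntegerStack {
    int *collection;
    int top;
    int limit;
    PushMethod push;
    PopMethod pop;
    PeekMethod peek;
} IntegerStack;
```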
Method definitions; Passing by reference
If you thought that the first step of this new set of operations was the easiest to implement, then you would be wrong: the definitions of the methods to be used for this new struct are unchanged. The definition of push_IntegerStack, for example, is the same in both versions of the struct that we've discussed. Why is this?
The obvious observation to be made whilst looking at the above method definition is that the arrow operator is still being used! What would this method look like should the dot operator be used in its stead?
The issue becomes apparent when the return type of the altered function is considered. A push is a destructive operation on a stack; it should make a permanent change to the stack. C is interesting because it allows the developer to decide whether a value is passed to a function or a pointer to that value is passed (C's means of passing by reference). When a value is passed, the function operates on it in its own scope, so it should be considered a copy of the data in the context of the function. This means that the stack referred to as this within the above version of the push function is a copy, distinct from the stack the caller supplied in the function call.
What this means is that the caller must overwrite its own stack with the function's return value:
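A small sketch of both behaviours, using a pared-down, hypothetical PlainStack with a caller-owned buffer. Note a subtlety when the collection is a pointer: the pushed value does reach the shared buffer, but the change to top is made only on the copy:

```c
/* A pared-down stack held by value; PlainStack is a hypothetical name. */
typedef struct {
    int *collection;   /* buffer owned by the caller */
    int top;
    int limit;
} PlainStack;

/* Pass by value: `this` is a copy. The value is written into the shared
   buffer, but the increment of top happens only on the copy, so the
   caller's stack still reports the old top. */
static void push_by_value(PlainStack this, int value) {
    if (this.top < this.limit) {
        this.collection[this.top++] = value;
    }
}

/* Workaround: return the modified copy and let the caller overwrite its
   own stack with it, as in s = push_and_return(s, 5). */
static PlainStack push_and_return(PlainStack this, int value) {
    if (this.top < this.limit) {
        this.collection[this.top++] = value;
    }
    return this;
}
```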
This presents a problem when the other destructive operation is considered. The pop operation should return an integer, yet the above pattern would also require it to return the struct. A new data structure would then need to be introduced as the return value, one that is assigned to the caller and can be unpacked within the caller's scope.
This implementation won't be explored here. A reader who attempts such an implementation will see that it goes against the philosophy of the interface and the advantages it affords.
Instead, the set of methods will operate on a pass-by-reference scheme, where a pointer to the stack is passed as an argument. This means that a call to one of its methods will follow the pattern of:
Here, the ampersand (&) is the address-of operator, which yields a pointer to the given object.
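A sketch of the pass-by-reference version, assuming hypothetical field names (collection, top, limit) and hypothetical signature aliases (PushMethod, PopMethod, PeekMethod). The methods still use the arrow operator internally, since this is a pointer, while the instance itself is a plain struct accessed with the dot operator:

```c
/* Forward declaration and method signatures (alias names are hypothetical). */
typedef struct IntegerStack IntegerStack;
typedef void (*PushMethod)(IntegerStack *, int);
typedef int (*PopMethod)(IntegerStack *);
typedef int (*PeekMethod)(IntegerStack *);

/* Repeating the identical typedef here is valid C11. */
typedef struct IntegerStack {
    int *collection;
    int top;
    int limit;
    PushMethod push;
    PopMethod pop;
    PeekMethod peek;
} IntegerStack;

/* Methods receive a pointer, so changes to top persist in the caller's struct. */
static void push_IntegerStack(IntegerStack *this, int value) {
    if (this->top < this->limit) {
        this->collection[this->top++] = value;
    }
}

/* Returns 0 on an empty stack; real code would need a proper error signal. */
static int pop_IntegerStack(IntegerStack *this) {
    return this->top > 0 ? this->collection[--this->top] : 0;
}

static int peek_IntegerStack(IntegerStack *this) {
    return this->top > 0 ? this->collection[this->top - 1] : 0;
}
```

A call site then reads s.push(&s, 3): dot notation on the instance, with the address of that same instance supplied as the first argument.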
The constructor
Contrary to the method definitions, the constructor should be something that creates a struct. The prior constructor uses malloc to allocate a region of memory for the struct, associates the stack functions with its method signatures within that region, and returns a pointer to the stack. Because a pointer to the struct is returned, the arrow operator is needed when calling its methods, no matter the scope.
Thus, the new constructor will be straightforward, creating a new struct of type IntegerStack and returning it:
It's worth noting that calloc is still used to dynamically allocate memory for the collection. This allows dynamic allocation to continue to occur in the context of the push operation. Consider instead the case where the struct defines collection as int collection; and the constructor includes the statement int collection_array[new_limit];, placed above the instantiation of the new_array struct, where new_array includes the line .collection = *collection_array. Here, the memory for collection_array is allocated automatically and its size is fixed at new_limit. This implies that should push need to dynamically resize the collection, it would also need to recreate the entire struct. This would be inefficient, so a pointer is created in conjunction with calloc instead.
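A sketch of such a constructor, assuming a doubling growth policy and the hypothetical names shown (pop is omitted for brevity). The struct is returned by value and only the collection lives on the heap, so push can grow it with realloc without rebuilding the struct:

```c
#include <stdlib.h>

typedef struct IntegerStack IntegerStack;
typedef void (*PushMethod)(IntegerStack *, int);
typedef int (*PeekMethod)(IntegerStack *);

/* Pared down to push and peek; pop would follow the same pattern. */
typedef struct IntegerStack {
    int *collection;   /* heap buffer: can grow without rebuilding the struct */
    int top;
    int limit;
    PushMethod push;
    PeekMethod peek;
} IntegerStack;

/* Doubles the buffer when full; if realloc fails, the value is dropped
   and the existing contents are left intact. */
static void push_IntegerStack(IntegerStack *this, int value) {
    if (this->top == this->limit) {
        int new_limit = this->limit > 0 ? this->limit * 2 : 1;
        int *grown = realloc(this->collection, (size_t)new_limit * sizeof *grown);
        if (grown == NULL) {
            return;
        }
        this->collection = grown;
        this->limit = new_limit;
    }
    this->collection[this->top++] = value;
}

static int peek_IntegerStack(IntegerStack *this) {
    return this->top > 0 ? this->collection[this->top - 1] : 0;
}

/* Returns the struct itself (not a pointer), so callers use dot notation. */
static IntegerStack create_IntegerStack(int new_limit) {
    IntegerStack new_array = {
        .collection = calloc((size_t)new_limit, sizeof(int)),
        .top = 0,
        .limit = new_limit,
        .push = push_IntegerStack,
        .peek = peek_IntegerStack,
    };
    if (new_array.collection == NULL) {
        new_array.limit = 0;   /* the first push will allocate via realloc */
    }
    return new_array;
}
```

One design consequence worth keeping in mind: because the struct is copied on assignment, a copy such as IntegerStack t = s; shares s.collection, and if either copy later grows the buffer through realloc, the other is left holding a stale pointer. Passing &s to the methods, rather than copies of s, avoids this.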
Instantiation of an IntegerStack is still dependent on recalling the name of its constructor function. Forward declaration does not afford the ability to call a constructor as a method.
It's easy for an organization to establish a naming scheme to address the issue of convention, e.g., any constructor function shall be named create_<object-class>. Adopting a convention like this would solve the problem of knowing which function name to reach for, and it would be trivial to implement. It's not an interesting solution, though.
Object-oriented languages that I have personal experience with typically use the name of the object class itself as the function call for the constructor. Consider instantiation in the following languages:
- Python: stack_instance = IntegerStack(val)
- Java: IntegerStack stack_instance = new IntegerStack(val);
- Ruby: stack_instance = IntegerStack.new(val)
- C++: IntegerStack *stack_instance = new IntegerStack(val);
The pattern that will be easiest to emulate is Python's: a call to IntegerStack(val) produces an object, which is assigned to stack_instance.
Using C's function definition syntax, the compiler will not allow the declaration of a function called IntegerStack. This is because the typedef already declares IntegerStack as a type name, and type names and function names share the same name space for ordinary identifiers. To get around this, a macro will be used instead:
What's leveraged here is a function-like macro that accepts one argument. Through the #define directive, whenever the pre-processor encounters IntegerStack(<val>), it replaces that invocation with the macro's replacement text (whatever follows IntegerStack(new_limit) in the directive), substituting <val> for each occurrence of new_limit.
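A minimal sketch, assuming the macro's replacement text simply forwards to a constructor function named create_IntegerStack (the note's own replacement text may differ, for instance by building the struct inline). The struct is pared down to its data members to keep the focus on the macro:

```c
#include <stdlib.h>

typedef struct IntegerStack {
    int *collection;
    int top;
    int limit;
} IntegerStack;

/* Plain constructor function; its name is what the macro hides. */
static IntegerStack create_IntegerStack(int new_limit) {
    IntegerStack new_array = { calloc((size_t)new_limit, sizeof(int)), 0, new_limit };
    if (new_array.collection == NULL) {
        new_array.limit = 0;
    }
    return new_array;
}

/* Function-like macro: only IntegerStack followed by "(" is rewritten,
   so the bare type name IntegerStack is left untouched. */
#define IntegerStack(new_limit) create_IntegerStack(new_limit)
```

With this in place, IntegerStack stack_instance = IntegerStack(5); expands to IntegerStack stack_instance = create_IntegerStack(5);. The first IntegerStack is not followed by an opening parenthesis, so the pre-processor leaves it alone and the compiler sees the type name.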
The line which instantiates an IntegerStack will be the following:
The pre-processor converts this to:
The compiler then translates the result into an executable binary.
With these pieces in place, we have an IntegerStack object which operates on dot notation:
Moving Forward; providing self as an argument
The above statements are now valid C. What a developer needs to mind, as they continue to use objects defined this way, is how to properly handle any function which may operate on them. As they build new subroutines that take a stack, they will need to decide whether the stack's value or the stack's reference is passed. This decision point explains why &test_stack is passed to each of the methods above.
With this, we can recall the discussion point in this note's parent about having to supply a reference to the stack for each method call. To alleviate this, another pre-processor would need to be developed: a scanner for the program that captures each instance of IntegerStack, captures each method call on it, and rewrites the resulting string into valid C. For example, if the program contains stack_instance.peek(), then the new pre-processor finds each occurrence of this string and replaces it with stack_instance.peek(&stack_instance).
This new pre-processor would be necessary because a C macro name must be a single identifier, so the C pre-processor cannot capture a sequence such as stack_instance.peek() that contains a period.
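The scanning step can be sketched in C itself. The function below, rewrite_method_calls, is hypothetical and deliberately naive: it handles one known instance name, treats the source as plain text, and ignores comments, string literals, and nested member access. It only illustrates the substitution described above:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* True for characters that may appear in a C identifier. */
static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical, purely textual rewriter for one instance name:
     name.method()     ->  name.method(&name)
     name.method(args) ->  name.method(&name, args)
   Returns 0 on success, -1 if out (of size out_size) is too small. */
static int rewrite_method_calls(const char *src, const char *name,
                                char *out, size_t out_size) {
    size_t name_len = strlen(name);
    size_t used = 0;
    const char *p = src;

    if (out_size == 0) {
        return -1;
    }
    while (*p != '\0') {
        /* Match the instance name only at the start of an identifier. */
        int at_start = (p == src) || !is_ident_char(p[-1]);
        if (at_start && strncmp(p, name, name_len) == 0 && p[name_len] == '.') {
            const char *q = p + name_len + 1;   /* first char of method name */
            while (is_ident_char(*q)) {
                q++;
            }
            if (q > p + name_len + 1 && *q == '(') {
                const char *r = q + 1;          /* skip blanks after "(" */
                while (*r == ' ' || *r == '\t') {
                    r++;
                }
                int has_args = (*r != ')');
                /* Emit "name.method(" followed by "&name" and, if needed, ", ". */
                int n = snprintf(out + used, out_size - used, "%.*s&%s%s",
                                 (int)(q - p + 1), p, name, has_args ? ", " : "");
                if (n < 0 || (size_t)n >= out_size - used) {
                    return -1;
                }
                used += (size_t)n;
                p = q + 1;
                continue;
            }
        }
        if (used + 1 >= out_size) {
            return -1;
        }
        out[used++] = *p++;
    }
    out[used] = '\0';
    return 0;
}
```

A real pre-processor would also need to discover the instance names itself, for example by recognizing declarations of type IntegerStack, rather than being handed a single name.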
Future discussion on this topic will include how to develop such a pre-processor, how to simulate generics such that stacks of differing types can exist in the same collection, and alternative advanced data types that can be implemented using this means of encapsulation.