Adding Set Types to the C Programming Language

Well, it’s been over a month since my last post, so it’s high time I got back into it. I kinda got distracted by other stuff on other sites, most notably DeviantArt and Discord. But I’m back now, and hopefully I’ll be able to post more consistently in the future.

This is my third article about creating user-defined data types in C. The other two were mainly concerned with implementing numeric types to get better precision. In this one we will be implementing a completely new (non-numeric) data type based on the set concept from set theory. Set types are a feature in some programming languages, most notably the Wirth languages (Pascal, Modula-2, Modula-3, Oberon). They are implemented as simple integers where each bit of the integer represents an element from a universal set. Because the size of this integer is fixed, the number of elements in the universe is restricted to a relatively small quantity.

In our implementation of set types, there will be no restriction on the size or cardinality of the set. We will use an array of unsigned bytes for the elements of the set, and we will have an extra integer field to keep track of the size of this array. Each bit of the set is either a 1 or a 0, indicating whether the corresponding element is in that set. The elements are essentially just integer values, ranging from 0 to the number of bits in the set.

As before, there are two main steps in defining a new C data type: first define a struct encapsulating all the information about the object, and then implement all the operations for that type. This is what our set type definition looks like:


typedef struct {
        int size;       // Size of the set's universe in bytes, or the number of elements in the universe divided by 8
        uint8_t *bits;  // Each bit corresponds to a member of the universeset_t;

As you can see, we have fields to keep track of the size of the set as well as all the data in the set. The data is encapsulated in a uint8_t pointer, so we can have an array of unlimited size to store all the set elements. This is in contrast to Pascal’s method of implementing sets, where a large unsigned integer is used and all sets are limited to a cardinality of 256. Note that the actual size of the set in bits is always a multiple of 8, so if the cardinality is not a multiple of 8, we will be rounding up to the nearest multiple when we do our calculations.

The next thing to do is to implement the operations from set theory in our set type. In set theory there are a number of operations, including operations on a set and an element, operations on one set, and operations on multiple sets. First we should have a function for testing set membership, as well as two functions for adding and removing elements, respectively:


// Adds a member to a set
void set_addset_t *set, int member ){
        int major_index, minor_index;
        major_index = member / 8;
        minor_index = member % 8;
        set->bits[major_index] |= (1 << minor_index);
}

// Deletes a member from a set
void set_delset_t *set, int member ){
        int major_index, minor_index;
        major_index = member / 8;
        minor_index = member % 8;
        set->bits[major_index] &= ~(1 << minor_index);
}

// Determines whether an element is a member of a set
bool set_memberint element, set_t *set ){
        int major_index, minor_index;
        major_index = element / 8;
        minor_index = element % 8;
        return set->bits[major_index] & (1 << minor_index);
}

We can use the set_add() and set_del() functions to build sets to our exacting specifications. To do this, it helps to have a starting point – functions for generating the empty set and set universe of a given size:


// Returns the empty set with the given size
set_t *empty_setint size ){
        set_t *empty_set;
        int i;
        empty_set = (set_t *) mallocsizeofset_t ) );
        empty_set->size = size / 8 + (size % 8 ? 1 : 0);
        // Don't add 1 byte if an exact multiple of 8
        empty_set->bits = (uint8_t *) malloc( size );
        for( i = 0; i < empty_set->size; i++ )
                empty_set->bits[i] = 0x00;
        return empty_set;
}

// Returns the universe with the given size
set_t *set_universeint size ){
        set_t *universe;
        int i;
        universe = (set_t *) mallocsizeofset_t ) );
        universe->size = size / 8 + (size % 8 ? 1 : 0);
        // Don't add 1 byte if an exact multiple of 8
        universe->bits = (uint8_t *) malloc( size );
        for( i = 0; i < universe->size; i++ )
                universe->bits[i] = 0xff;
        return universe;
}

Note that the sets are dynamically allocated on the heap, which means we will have to free them later. As you will see, I have created free() versions of many of the set operations for this purpose, much like I did with libdfloat.

For operations on one set or on multiple sets, we have the set complement, set difference, symmetric set difference, union, intersection, and Cartesian product. I have not implemented symmetric set difference or Cartesian product because they were not relevant to the particular application I was writing this set library for. However, I have implemented the other four. It is worth noting that union, intersection, and complement are equivalent to the AND, OR, and NOT operations from propositional logic. This makes them easily amenable to the Boolean operations provided by C.


// Calculates the set difference between src and dst
// and stores the result in dst
void set_differenceset_t *dst, set_t *src ){
        int i;
        for( i = 0; i < dst->size; i++ )
                dst->bits[i] &= ~(src->bits[i]);
}

// Calculates the set complement of dst and stores it
// in dst
void set_complementset_t *dst ){
        int i;
        for( i = 0; i < dst->size; i++ )
                dst->bits[i] = ~(dst->bits[i]);
}

// Calculates the set union of src and dst and stores
// the result in dst
void set_unionset_t *dst, set_t *src ){
        int i;
        for( i = 0; i < dst->size; i++ )
                dst->bits[i] |= src->bits[i];
}

// Calculates the set intersection of src and dst and
// stores the result in dst
void set_intersectionset_t *dst, set_t *src ){
        int i;
        for( i = 0; i < dst->size; i++ )
                dst->bits[i] &= src->bits[i];
}

Note that these functions all have void as their return type. They operate on one or two sets and store the result in the first operand. This is similar to how I implemented the arithmetic operations for my decimal floating point library, and the reason for this is to discourage haphazard use of set operations that may result in objects being created without any symbols corresponding to their pointers, which would result in these objects becoming inaccessible and thus unable to be freed when necessary. This is a classical problem in low-level programming known as the lost object problem. Of course often we want to build complex expressions with these operations, so to accomplish this, I have created versions of the above functions that implicitly free their source operands when used:


// Returns the set difference between the two operands,
// freeing the second operand in the process
set_t *set_difference_fset_t *dst, set_t *src ){
        int i;
        for( i = 0; i < dst->size; i++ )
                dst->bits[i] &= ~(src->bits[i]);
        free( src );    
        return dst;
}

// Set complement function that returns the result
set_t *set_complement_fset_t *dst ){
        int i;
        for( i = 0; i < dst->size; i++ )
                dst->bits[i] = ~(dst->bits[i]);
        return dst;
}

// Returns the set union of the two operands,
// freeing the second operand in the process
set_t *set_union_fset_t *dst, set_t *src ){
        int i;
        for( i = 0; i < dst->size; i++ )
                dst->bits[i] |= src->bits[i];
        free( src );
        return dst;
}

// Calculates the set intersection of the two operands,
// freeing the second operand in the process
set_t *set_intersection_fset_t *dst, set_t *src ){
        int i;
        for( i = 0; i < dst->size; i++ )
                dst->bits[i] &= src->bits[i];
        free( src );
        return dst;
}

These functions make it possible to build complex expressions like set_difference_f( set_union_f( set1, set2 ), set_intersection_f( set1_cpy, set2_cpy ) ) for the symmetric set difference. Note that it is necessary to operate on copies of any named arguments when we do this unless we’re not planning to use them again. To create a copy of a set, you can create a new set_t type with empty_set() or set_universe() and copy over the data from the original set with memcpy().

In addition to all of this I have implemented a couple of I/O functions for reading and writing set types directly from/to hexadecimal strings. These functions were written primarily for debugging purposes, so you can check the value of a set at any time as the program is running. They have limited practical use in programming outside of that.


// Reads a set in hexadecimal from a string
set_t *read_setchar *src ){
        set_t *dst;
        int len;
        int i;
        char buf[3];
        buf[2] = '\0'// Makes sure sscanf() doesn't run off the end
        len = strlen( src );
        dst = (set_t *) mallocsizeofset_t ) );
        dst->size = len / 2;
        dst->bits = (uint8_t *) malloc( len / 2 );
        for( i = 0; i < len; i += 2 ){
                strncpy( buf, src+i, 2 );
                dst->bits[(len-i)/2-1] = (uint8_tstrtol( buf, NULL16 );
        }
        return dst;
}

// Writes a set in hexadecimal to a string
char *write_setset_t *src ){
        char *dst;
        int i;
        int len;
        int hex;
        len = src->size * 2;
        dst = (char *) calloc( len + 11 );
        for( i = 0; i < len; i += 2 ){
                hex = src->bits[(len-i)/2-1];
                snprintf( dst+i, 3"%s%x", (hex<0x10)?"0":"", hex );
        }
        return dst;
}

// Free version of write_set_f in case
// src is an immediate operand
char *write_set_fset_t *src ){
        char *dst;
        dst = write_set( src );
        free( src );
        return dst;
}

So yeah, that’s basically my entire set library. I’ve actually used it as a C module in my larger CSV library called libcsv, albeit with different names for the set type and set functions. Still, it serves its purpose as a standalone set theoretical library as well.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s