I’ve been tuning FusionCapital Summit systems for some 15 years now, and in my experience the second most popular cause of performance problems is lists in Summit: sENTITYLIST and its friends (1). To understand why we get problems, we need to understand the implementation of the Summit list structure:
The traditional structure comes with two fields:
```cpp
struct sSOMELIST {
    sLIST  List;
    void  *data;
};
```
This structure comes from the days when Summit was written in C, not C++, and the Standard Library did not exist. You might assume from the name of the structure (Summit List), and from that (void*) data pointer to the first element, that we are looking at a singly linked list. That assumption is wrong. This is not a list (2), it’s an array. Like any other array, it is a fixed number of elements of a certain size. Exactly how many elements are in the array, and how big each element is, are stored in the sLIST structure:
sLIST.ItemsAlloc -> the number of elements in the array
sLIST.StructSize -> the size of an array element
The underlying array is created using the sInitList() function, which will perform an initial array allocation of 20 elements for us. We can insert into the Array with sInsertListItem(), and remove items with sDeleteListItem(). Since our array has a set size, we will need to do something if we want to store more than List.ItemsAlloc records in the array. You don’t have this issue with ‘proper’ lists – you allocate a new element from the heap, and single or double link it into the existing data.
The sManageList() function exists to manage the size of our data Array, and is a multi-mode function that allows us to extend or shrink our underlying array. The most common mode is sLIST_MANAGE, and here’s where the problems arrive.
sLIST_MANAGE makes use of three more fields in the sLIST structure:
sLIST.ItemsUsed -> How many elements we are using
sLIST.AllocUnits -> How many new items to allocate at a time
sLIST.MinSpare -> How many items need to be available after our call.
sLIST_MANAGE works out how many free elements we have ( List.ItemsAlloc - List.ItemsUsed ). If that is less than the number of spare items we need to keep around ( List.MinSpare ), we need to allocate one or more List.AllocUnits to grow the array. Arrays need contiguous memory, though, unlike lists, and since the array was created on the heap, you can’t just extend it in place: there’s no guarantee that the memory immediately after the existing data is free. Instead, we have to allocate a whole new array of the correct size from the heap, copy the data over from the existing array, repoint our (void*) data link, and free the old array.
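The mechanics are easy to see in a stripped-down model. This is not the Summit source, just a sketch of the behaviour described above, using the same field names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// A minimal model of the behaviour described above -- not the real Summit
// code, just an illustration of the allocate-copy-repoint-free cycle.
struct ModelList {
    size_t ItemsAlloc;  // elements currently allocated
    size_t ItemsUsed;   // elements currently in use
    size_t AllocUnits;  // how many new items to allocate at a time
    size_t MinSpare;    // free elements required after the call
    size_t StructSize;  // size of one element
    void  *data;        // the underlying array
};

// Mimics sLIST_MANAGE: grow the array, one or more AllocUnits at a time,
// whenever fewer than MinSpare elements are free.
void model_manage(ModelList &l) {
    if (l.ItemsAlloc - l.ItemsUsed >= l.MinSpare)
        return;                           // enough spare room, nothing to do
    size_t newAlloc = l.ItemsAlloc;
    while (newAlloc - l.ItemsUsed < l.MinSpare)
        newAlloc += l.AllocUnits;         // grow one AllocUnits block at a time
    void *fresh = malloc(newAlloc * l.StructSize);       // allocate
    memcpy(fresh, l.data, l.ItemsUsed * l.StructSize);   // copy everything used
    free(l.data);                                        // drop the old array
    l.data = fresh;                                      // repoint
    l.ItemsAlloc = newAlloc;
}
```

The expensive line is the memcpy(): the amount it moves grows with ItemsUsed, and it runs on every single grow.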
If your list is small, this allocate-copy-repoint-drop operation is trivial. If your list is big, however, this is going to get ugly.
Let’s take an example, where I have created a Summit list called cRESULTLIST which holds some cRESULT_STRUCT elements:
AllocUnits = 10
MinSpare = 5
StructSize = 50
ItemsAlloc = 20
In my code, I have a loop with:

```cpp
cRESULTLIST l;
// Initialise list...
while ( /* more calculations */ ) {
    cRESULT_STRUCT r; // 50 byte structure
    // Do some calculations and store them in r
    sInsertListItem( &l, l.List.ItemsUsed, &r );
    sManageList( &l, sLIST_MANAGE, 0 );
}
```
With the list settings I have, I’m going to reallocate and copy my data array when l.List.ItemsUsed hits 16, 26, 36 and every 10 items after that – the first grow happens as soon as fewer than MinSpare = 5 of the original 20 slots are free. Allocating memory should be pretty quick. Copying 800 bytes should also be pretty quick, but each subsequent copy grows by 500 bytes. Assuming our CPU can memcpy() at 10 GB/s ( which is a reasonable ballpark figure ), we have:
| Number of Items | Final Array Size | Total Copied Data | memcpy() Time |
|---|---|---|---|
| 10 | 500 bytes | 0 | 0 |
| 100 | 4.9 KB | 18.5 KB | 1.8 µs |
| 1,000 | 48 KB | 2.3 MB | 232 µs |
| 10,000 | 488 KB | 237 MB | 23 ms |
| 100,000 | 4.76 MB | 23 GB | 2.38 s |
| 1,000,000 | 47 MB | 2.27 TB | 3m 58s |
| 10,000,000 | 476 MB | 227 TB | 6h 37m |
| 100,000,000 | 4.65 GB | 22 PB | 27.5 days |
| 1,000,000,000 | 46 GB | 2,220 PB | > 7 years |
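The table values are ballpark, but the shape of the curve is easy to check with a short, self-contained model of the growth policy described above ( the function and parameter names are mine, not Summit’s ). Like sLIST_MANAGE, it copies the whole used region on every grow:

```cpp
#include <cassert>
#include <cstdint>

// Count the bytes memcpy()'d while filling a list one item at a time,
// with the worked example's settings: StructSize = 50, AllocUnits = 10,
// MinSpare = 5, and an initial allocation of 20 elements.
uint64_t total_copied_bytes(uint64_t items) {
    const uint64_t structSize = 50, allocUnits = 10, minSpare = 5;
    uint64_t alloc = 20, copied = 0;
    for (uint64_t used = 1; used <= items; ++used) {
        if (alloc - used < minSpare) {        // the sLIST_MANAGE condition
            copied += used * structSize;      // copy everything stored so far
            while (alloc - used < minSpare)
                alloc += allocUnits;          // grow by whole AllocUnits
        }
    }
    return copied;
}
```

With these settings the model copies about 25 KB by 100 items and about 2.5 MB by 1,000 items – the same order of magnitude as the table – and doubling the item count roughly quadruples the copy traffic.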
The big issue is that these timings grow quadratically, not linearly: double the number of items and you quadruple the copying. Realistically, you aren’t going to run into problems even with tens of thousands of items, but somewhere around the point where you are trying to process a few million items in one go, this is going to bite. If you don’t run the 64-bit version of Summit, you are probably going to run out of memory somewhere in the 20-40 million item range, but you’ll still have to wait a few hours for it to blow up. If the structure in your list data is larger than 50 bytes, things get worse much faster: a 2 KB record will be 40 times slower than the times shown above, and you’ll notice the effect at 100,000 items.
I’ve seen this issue quite a few times in the wild, and it’s often not picked up in testing because the test pack doesn’t include the huge data set that triggers the problem in production.
The solution to this problem is to anticipate the size of the list you will need and allocate it up front, using sManageList() in sLIST_ABSOLUTE or sLIST_RELATIVE mode, rather than trying to manage it as you go along. Allocating memory is quick, especially if you only have to do it once. If you can’t get an exact number for your array size, make a best guess and add some padding.
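Assuming the third argument to sManageList() carries the target item count in sLIST_ABSOLUTE mode ( the source only shows the three-argument call shape for sLIST_MANAGE, so treat the exact parameters as hypothetical ), the pre-sized version of the earlier loop looks something like:

```cpp
cRESULTLIST l;
// Initialise list...
// Hypothetical call shape: one up-front allocation for the expected
// number of results, plus some padding.
sManageList( &l, sLIST_ABSOLUTE, 1000000 + 1000 );
while ( /* more calculations */ ) {
    cRESULT_STRUCT r;
    // Do some calculations and store them in r
    sInsertListItem( &l, l.List.ItemsUsed, &r );
    // No sLIST_MANAGE call inside the loop: no reallocation, no copying,
    // as long as we stay within the pre-allocated size.
}
```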
You can also consider setting the List.AllocUnits field to a higher value before you start – maybe 1,000 or 10,000 or some larger number – so that each reallocate-copy-free moves the list a lot further and happens far less often. This will reduce the number of copies and save some time.
One final option: consider using a different data structure. All C++ compilers now come with the Standard Library containers – maybe a std::list or std::map is a better option for what you want to do.
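For a loop like the one above, a std::vector with a reserve() call up front is the closest Standard Library analogue of the pre-sized Summit array, and std::list gives genuine linked-list insertion. A minimal sketch, where Result is a stand-in for the 50-byte record:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

struct Result { char payload[50]; };   // stand-in for the 50-byte cRESULT_STRUCT

// std::vector: contiguous storage like the Summit array, but the growth
// policy is handled for us, and reserve() does the one-off up-front
// allocation so the fill loop never reallocates or copies.
std::vector<Result> fill_vector(std::size_t n) {
    std::vector<Result> results;
    results.reserve(n);                // one allocation, no copies in the loop
    for (std::size_t i = 0; i < n; ++i)
        results.push_back(Result{});
    return results;
}

// std::list: a true doubly linked list, so every insert is O(1) with no
// copying on growth, at the cost of non-contiguous memory.
std::list<Result> fill_list(std::size_t n) {
    std::list<Result> results;
    for (std::size_t i = 0; i < n; ++i)
        results.push_back(Result{});
    return results;
}
```

The vector keeps the cache-friendly contiguous layout; the list trades that away in exchange for never copying on growth.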
Lastly, do remember to sManageList( .., sLIST_FREE, 0 ) any lists that you allocate. That (void*) data isn’t going to free itself when the list structure falls off the stack. Most of the time the list will die at the end of the program and it probably won’t matter, but one day you’ll be allocating a list inside some STP server code or entity validation routine, and if you don’t free the data, you’ve got a memory leak!
(1) For those that don’t know already, the number one cause of problems is database queries, but we will look at those another time.
(2) If we are going to be pedantic, use the formal Computer Science definition, and squint a bit, it could be a list. In practice, everyone writing a non-formal description ( e.g. authors of data structure textbooks, C++ standard library programmers ) assumes that a list is a singly or doubly linked list structure, or perhaps some sort of tree if you want to be able to nest things.