It should be as quick as possible to loop over the whole array in a random order, and this is very heavy on RAM bandwidth, so when more than a few threads do that at the same time for different arrays, the whole RAM bandwidth is quickly saturated.

I'm asking since it feels very inefficient to have such a big array (10 MB) when it's actually known that almost all values, apart from 5%, will be either 0 or 1.

In most cases the appropriate way to store data is to store the most natural representation. In your case, this is the one you've gone for: a byte for a number between 0 and 255.

Also, you'd have to keep a table of "starting points", so you can work your way to the relevant place reasonably quickly.

I know from a long time back that some big databases are just a large table in RAM (telephone exchange subscriber data in this example), and one of the problems there is that caches and page-table optimisations in the processor are pretty useless.

So when 95% of all values in the array would only actually need 1 bit instead of 8 bits, this would reduce memory usage by almost an order of magnitude. It feels like there has to be a more memory-efficient solution that would greatly reduce the RAM bandwidth required for this, and as a result also be significantly quicker for random access.
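To make the "1 bit instead of 8" idea a bit more concrete, here is a rough sketch of one possible layout, not necessarily the best one. It spends 2 bits per element: codes 0 and 1 hold the value directly, and a third code means "this is one of the rare 5%, look it up in a side table". The class name `PackedArray`, the escape code, and the use of `std::unordered_map` for the side table are all my own assumptions for illustration. Whether it is actually faster depends on the access pattern and would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: 2 bits per element instead of 8.
// Code 0 or 1 is the value itself; code 2 (kEscape) means the real
// value is stored in a side table keyed by index.
class PackedArray {
public:
    explicit PackedArray(const std::vector<std::uint8_t>& src)
        : size_(src.size()), codes_((src.size() + 3) / 4, 0) {
        for (std::size_t i = 0; i < src.size(); ++i) {
            std::uint8_t code = src[i];
            if (src[i] > 1) {          // rare value: remember it separately
                code = kEscape;
                rare_[i] = src[i];
            }
            // Four 2-bit codes are packed into each byte.
            codes_[i / 4] |= static_cast<std::uint8_t>(code << ((i % 4) * 2));
        }
    }

    std::uint8_t get(std::size_t i) const {
        assert(i < size_);
        std::uint8_t code = (codes_[i / 4] >> ((i % 4) * 2)) & 0x3;
        if (code != kEscape) return code;  // common case: no second lookup
        return rare_.at(i);                // rare case: side-table lookup
    }

    std::size_t size() const { return size_; }
    std::size_t packed_bytes() const { return codes_.size(); }

private:
    static constexpr std::uint8_t kEscape = 2;  // assumed escape code
    std::size_t size_;
    std::vector<std::uint8_t> codes_;
    std::unordered_map<std::size_t, std::uint8_t> rare_;
};
```

With this layout the packed part is a quarter of the original size, so a 10 MB array would shrink to roughly 2.5 MB plus the side table. The side-table lookup is the expensive path, but by the stated 95% figure it should only be hit for about 1 in 20 accesses.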

The problem with most common compression algorithms is that they are based on unpacking sequences, so you can't random access them.

Is the table data completely random, or are there sequences of 0 then sequences of 1, with a scattering of other values?

Run length encoding would work well if you have reasonably long sequences of 0 and 1, but won't work if you have "checkerboard of 0/1".
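For what it's worth, a random-access form of run length encoding could look roughly like the sketch below. It keeps a sorted table of the index where each run starts and binary-searches it on lookup. The name `RleArray` and its layout are assumptions of mine for illustration, not something from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: run length encoding with a table of run starts,
// so an element can be found by binary search instead of unpacking.
struct RleArray {
    std::vector<std::size_t> starts;   // index where each run begins (sorted)
    std::vector<std::uint8_t> values;  // value shared by the whole run
    std::size_t size = 0;

    explicit RleArray(const std::vector<std::uint8_t>& src) : size(src.size()) {
        for (std::size_t i = 0; i < src.size(); ++i) {
            // Start a new run whenever the value changes.
            if (values.empty() || values.back() != src[i]) {
                starts.push_back(i);
                values.push_back(src[i]);
            }
        }
    }

    std::uint8_t get(std::size_t i) const {
        assert(i < size);
        // First run that starts after i; the run before it contains i.
        auto it = std::upper_bound(starts.begin(), starts.end(), i);
        return values[static_cast<std::size_t>(it - starts.begin()) - 1];
    }
};
```

Each lookup costs a binary search over the runs, so this only pays off when runs are long. In the checkerboard case every element becomes its own run, and the table ends up far larger than the original byte array.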

I compiled the code above with g++ 5.4.0 (plus some warnings) on Ubuntu 16.04, and ran it on some machines; most of them are running Ubuntu 16.04, some an older Linux, some a newer Linux.

I don't think the OS should be relevant at all in this case.

