Bob's page of mildly useful but still pretty neat code snippets

Work In Progress


Table of Contents

  1. Introduction
  2. Credit and Disclaimer
  3. Absolute value of an integer
  4. Next Power of 2
  5. Most Significant bit
  6. Round to next power-of-2 if not a power-of-2
  7. Checking that a number is a power-of-2
  8. Swapping two values
  9. Adding with saturation
  10. Subtracting with saturation
  11. Faster rounding
  12. Clamping integers
  13. Maximum and minimum
  14. Division by constant powers-of-2
  15. Average of two integers
  16. Links
  17. Changelog

Introduction

Credit and Disclaimer

I do not pretend to have invented any of these tricks. Some of them are well known in the industry, others are less known. All of them are considered "general knowledge". There are other sites that contain similar code snippets (see the Links section).

These code snippets should work on the x86. No guaranty is made on them working on other platforms. The C standard does not guaranty shifts to be signed or unsigned, so most of these code snippets may not work at all depending on the compiler and platform.

As always, keep a straight (and portable) C version of the algorithms, just in case you need to port your software

Most of these tricks have the feature of eliminating conditional branches, which tend to be slow on modern CPUs. They also leave room for SIMD-type optimizations, where operations on multiple quantities can be performed at the same time.

A lot of there tricks can also be almost directly converted to electronic circuits. However, they are not optimal for electronics.

Absolute value of an integer

To negate a 2's compliment number, toggle all the bits (that is, turn all 0 into 1 and all 1 into 0), then add 1 to the value you get. So how do we turn this into a function that can compute the absolute value? Well, if we can make the toggle and add ops dependent on the sign, then we'll be able to negate negative values, but keep positive values. The algorithm for this is:

int x;  // Value to obtain the absolute value of

int mask = x >> 31;
x = (x ^ mask) - mask;

The MSB contains the sign, so when you do the arithmetic (signed) shift to the right, you'll either get 0xFFFFFFFF if the number was negative, or 0 if the number was positive. XORing this mask with the original number will either toggle all the bits if the number was originally negative, or not do anything if the number was positive. The last step is to add 1 if the number was negative, or zero otherwise. We could use (mask & 1), however, we notice that if mask is 0xFFFFFFFF, then mask is also -1 in 2's complement notation. So to add 1, we just subtract -1!

Note that this code assumes signed right-shifts, which aren't guaranteed in C.

The assembly version is just as simple:

 //Assuming %eax has the input,
 // 2.5 cycles
 cdq               // Expand sign of %eax into all of %edx
 xorl %edx, %eax;
 subl %edx, %eax;

If you don't want to/can't use the cdq instruction, the code can be rewritten as:

 movl %eax, %ebx;
 shrl $31, %eax;
 xorl %eax, %ebx;
 subl %eax, %ebx;
 movl %ebx, %eax;

This code snippet uses 3.5 cycles to execute. If you don't care about the destination register being %eax, you can always remove that last movl instruction.

As long as we're using assembly, we can always use conditional instructions to perform the operation:

 movl %eax, %ebx;
 negl %eax;
 cmpl $0, %ebx;
 cmovge %eax, %ebx;

This version will only run on the Pentium Pro and up, and is also faster than the above versions. However, it is unusable with MMX or SSE registers.

Next Power of 2

It's often necessary to obtain the next power-of-2 of some integer, for example to scale non-power-of-2 textures. The most obvious way of doing this is to go through the C library's math functions.

int y = 1 << (int)(ceil(log(x) / log(2)));

This code is slow for many reasons. It uses the natural logarithm (base e), which is anything but natural to a binary computers. This also means that it needs to be rescaled to base-2 (this involves a multiplication at best). Next, we need to take the ceiling of the result, and then convert the floating point result to an integer. The CPU isn't very good at these.

We'll propose a much better algorithm. Notice that power-of-2 numbers are of the form: 10000..., that is, they start with a 1 and are followed by some 0s. Non-power-of-2 will have other 1s right of left-most 1.

So, how do we turn a number like 100101 into 1000000?
First, consider the numbers of the form (2^n) - 1. They're all of the form 111111... If we add 1 to them, they turn into powers-of-2. So, if we can turn our number into the form 111111.... we'd just need to add 1 to it to turn it into a power-of-2. Well, notice that if we shift-right and then OR the result with the previous number, we get more (or equal) 1s to the right of the first 1, but the left of the first 1 stays at 0. Doing this repeatedly will end up turning our number to 11111..., which is what we wanted!

Here's the algorithm:

int x;

x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x++;

Now x is a power-of-2 greater then the input. You can check yourself that it also works with numbers that are already powers-of-2; for example, 2 will become 4 after passing through this algorithm. You can adapt it to work with 64-bit numbers by adding an extra step before the increment operation:

x |= (x >> 32);

This code also has the property of turning all negative numbers to 0, and 0 to 1.

Most Significant bit

A small modification of the last algorithm allows us to obtain the mask for the most significant bit of an integer: we just need to shift the result right by one!

Powers-of-2 are increased to the next power-of-two, then brought back to their original number by the right-shift. Since the number is a power-of-2, then its value is already the MSB mask.

For non-powers-of-2 however, the number is increased to the next power-of-two, thus shifting the most significant bit left by one. Since the number becomes a power-of-2 by the algorithm, shifting it to the right by one will yield the MSB mask.

Negative numbers are brought down to zero.

Round to next power-of-2 if not a power-of-2

Sometimes, you only want to move up to the next power-of-2 only if the original number was NOT a power-of-2 to start with. That is, 2 will become 2, and 3 will become 4.

Effectively, we want to get pow(2, ceil(log2(x)))

Well, consider that if we subtract 1 from the input before running it through the algorithm to get the next power-of-two, then we turn powers-of-2 into (2^n) - 1, which will then be turned back into 2^n (try it!). If the number isn't a power-of-2 however, subtracting 1 will never give you 2^n - 1, so that number will be turned into the next power-of-2 just as before.

The algorithm becomes:

int x;

x--;
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x++;

There is a small catch however: 0 will remain 0. Negative numbers will still map to 0. INT_MIN (0x80000000) will retain its value.

Checking that a number is a power-of-2

Powers-of-2 have the property of being in the form 10000..., that is, a single 1 followed by only 0s. Notice that if we subtract 1 from the number, then we end up with 1111..., or a 1 followed by only 1s.

If we AND those two numbers, we get 0 since there are no 1s that are in common position.

   10000000
 & 01111111
 ----------
   00000000

However, if the number is not a power-of-2, then subtracting 1 from it will NOT move the left-most 1! This means that if you AND the original number with its decrement, you'll get a non-0 number since at the left-most 1s are in common.

Notice that 0 will be taken as being a power-of-2, since (0 & (-1) == 0)

So there you go: You can now use the following code in your program:

if (x & (x - 1)) {
  // Not a power-of-2
}

The assembly version is quite simple too:

movl %ebx, %eax;
decl %ebx;
andl %eax, %ebx;
jnz label;
An alterative is to AND the value with it's negative. Because of the properties of 2's complement numbers, the result of this operation will be the absolute value of x if and only if x is a power-of-2. The code then becomes:
if ((x & -x) == x) {
   // positive power-of-2
}

Notive that 0 is still considered as a power-of-2 by this code

For negative powers-of-2, you can use this code instead:

if (((-x) >> 1) & x) {
  // Not a negative power-of-2
}

Or this code:

if ((x & -x) == -x) {
	// Negative power-of-2
}

It works on a similar principle as the one for positive integers.

A combination of both tests can be made if you first compute the absolute value of x, then use the test for positive integers.

Swapping two values

The traditional algorithm for doing a swap is to create a third object to hold a temporary value, then put one of the numbers we'd like to swap in that temporary space, put the other number in the first one's, then move the number from the temporary space to the second number.

int x, y;

int z = x;
x = y;
y = z;

The assembly version takes an extra register, but executes in 1.5 cycles.

movl %eax, %ecx;
movl %ebx, %eax;
movl %ecx, %ebx;

If the speed of this algorithm can be sacrificed, then we can save one register by using the XOR operation. In fact, saving that extra register might prove to increase the speed of the code around it.

int x, y;

x = x ^ y;
y = x ^ y;
x = x ^ y;

And there we have it, now x and y are swapped, regardless of their values!

The assembly code becomes:

xorl %eax, %ebx;
xorl %ebx, %eax;
xorl %eax, %ebx;

This takes 2.5 cycles, but can be interleaved with other operations and reduced to an effective 1.5 cycles.

Please note that you should NOT implement this as a macro, as the macro will have flaw: if you try to swap a variable with itself, the XOR code will zero out everything.

Some people have recommended using stack push/pops to achieve the same thing:

push %eax;
push %ebx;
pop %eax;
pop %ebx;

Notice we pop them in the reverse order. On the PPlain and PMMX, this code takes only two cycles. However, this method takes 4 cycles to complete on the PPro and later CPUs, with an additional 4 cycle latency. If you lack the additional register, the XOR method is superior.

nother method, which is only available through an assembler (or a really good optimizing compiler is the xchg opcode on the x86. It can swap two registers in just two cycles (i686). However, you should not typically use xchg on memory, because of the implicit lock prefix, which will greatly slow down the code.

Adding with saturation

Have you ever seen this piece of code in your program?

z = x + y;
if (z > 255)
   z = 255;

This is an add-with-saturation; we add two numbers, then cap the result at a certain value. This code is going to be really slow in inner loops, simply because of the branching involved. Well, we'll show you how to achieve this without branching at all! However, we'll only be able to cap on values that are powers-of-2 minus one.

First thing we do is add the numbers; We then look at the overflow bit, and base all our computations on that. If there is no overflow, then we mustn't do any operation.

z = x + y;
int overflow = z & 256;
int mask = overflow - (overflow >> 7);
z ^= overflow;
z |= mask;

This works only if we have a free bit for the overflow detection. If not, then we'll need to be more creative.

Let's assume that we have 4 bytes packed into one unsigned int. We'd like to add those of those packed byte vectors together, with saturation at 255 for each value independently.

First, we'll need to do the addition on all bits except for the most significant ones. Then, we'll manually compute the carries, and determine overflow ourselves.

/* TODO */

Subtract with saturation

/* TODO */

Faster rounding

Matthew Leverton proposed this trick to speed up rounding of integers to some multiple of another (arbitrary) integer.

This piece of code rounds an integer x to the nearest lower multiple of y.

x = (int)(x / y) * y

Surpisingly, this code is faster than computing a modulo and subtracting it.

It can be slightly modified to round to the nearest multiple of y (for positive y):

x = (int)((x + y/2) / y) * y;

Or even to the nearest larger multiple of y (for positive y):

x = (int)((x + y - 1) / y) * y;

Clamping integers

Ben Davis brought up to my attention this neat way of clamping integer values around zero. That is, if a value is lesser than zero, then set it to zero, and otherwise leave it unchanged. He also had an algorithm to the reverse: if a number was greater than zero then set that number to zero.

char x;

x &= x >> 7;    // clamp to [-128..0]
x &= (~x) >> 7; // clamp to [0..127]

This can easily be extended to work with 32-bit integers instead of 8-bit ones simply by using 31 instead of 7 as the shift value. This code assumes signed right-shifts, which aren't guaranteed in C.

The assembly version is simply:

// Assume %al has the value to clip
movb %al, %ah;
srab  $7, %al;
andb %ah, %al;

Or, for 32-bit integers:

// Assume %eax has the value to clip
cdq;                 // Copy sign of %eax into all bits of %edx
andl %edx, %eax;

This trick can be extended to clamp on an (almost) arbitrary range. This piece of code, for example, will clamp the variable x to the range [a..b]. That is, if x is smaller than a, then it gets set to a. Similarly, if it's larger than b, then it is set to b. However, if x happens to be in between a and b, then it remains unchanged.

Of course, if you set b to be smaller or equal to a, then x gets clipped to b

int x, a, b;

x -= a;
x &= (~x) >> 31;
x += a;
x -= b;
x &= x >> 31;
x += b;

Which translates into:

// Assume %eax has the value to clamp, and %ecx, %ebx contain a and b respectively
// 8 cycles
subl %ecx, %eax; // Subtract a
subl %ecx, %ebx; // Compute b - a
cdq;             // Expand the sign of %eax into %edx
not %edx;        // Compute first mask
andl %edx, %eax; // Apply first mask
subl %ebx, %eax; // Subtract b - a (or add a - b)
addl %ecx, %ebx; // Recover b
cdq;             // Expand the sign of %eax into %edx
andl %edx, %eax; // Apply second mask
addl %ebx, %eax; // Add b

strong warning however: You must make sure that your values are not too far appart to cause an overflow. Consider the case where you'd like to bound INT_MAX to the range [INT_MIN..0]. This can be handled by making all values require 1-bit less than that of the data type used to store them.

If you know (or are not absolutely sure) that your values will result in an overflow, then using conditional ops would be a better alternative

Maximum and Minimum

Similarly, we can use the above trick to compute MAX(a, b) and MIN(a, b):

int a, b;

max:      // Will put MAX(a,b) into a
  a -= b;
  a &= (~a) >> 31;
  a += b;

And:

int a, b;

min:     // Will put MIN(a,b) into a
  a -= b;
  a &= a >> 31;
  a += b;

Again, the C code depends on having right-shifts be signed, which aren't guaranteed in C, and again, you must take care that your values do not overflow.

If you know (or are not absolutely sure) that your values will result in an overflow, then using conditional ops would be a better alternative

Division by constant powers-of-2

You may think that by shifting right you are always dividing by powers of 2. This, however, does not hold for negative numbers. Division and shifting round differently. In fact, if you shift right, you'll get one less than if you divided by the corresponding amount, provided the dividend was not a negative power-of-2. This is why the compiler will, unlike for multiplication, insert division instructions even though you are dividing by some power-of-2.

We can however compensate for the shift error by adding the missing 1 if the original number was negative. Note that we need to add first, and shift after.

int x;

if (x < 0)
  x++;
x >>= k;     // k is some integer constant.

Becomes;

int x, y;

y = ((unsigned int)x) >> 31;
x += y;
x >>= k;

We simply grab the sign bit, shift it to the least significant position, then add it to the dividend before the shift. That's all there is to it!

//Assume %eax contains the value to divide
movl %eax, %ebx;     // Make a copy of the dividend
slrl $31, %eax;      // Get the sign into the LSB
addl %ebx, %eax;     // Add it to the dividend
sarl $k, %eax;       // Do the shift by a constant

Average of two integers

You might think that computing the average of two integers is as simple as adding them, and dividing the sum in half. It's not that simple, unfortunately. There are problems when dealing with large values, such that the sum will overflow. If the sum overflows then the result of the average will be incorrect, even though the result will be in the normal integer range.

Take, for example, the average of 2000000000 and 1000000000. We want the result to be 1500000000. However, on a 32-bit signed machine (such as the Pentium family of CPUs), the result will be -852516352. Not quite what you'd expect.

We need to find a way to average the numbers without overflowing. To acheive that, let's apply a little boolean logic:

Consider that a + b is the same as (a & b) + (a | b), which is the same as (a & b) + ((a ^ b) + (a & b)), which in turn becomes 2 * (a & b) + (a ^ b).

So, by dividing by two, we get: (a & b) + ((a ^ b) >> 1)

Links

Changelog