Boyer Moore Voting Algorithm — In plain English

Satya Deep Maheshwari
7 min readJul 31, 2021
Photo by Jennifer Griffin on Unsplash

We’d be looking at ways of solving the problem of finding the majority item in a list in the most efficient manner. Formally, the problem is as follows — Given a list of ‘n’ items of ‘k’ different types, find out if there’s an item type in majority i.e. it occurs more than n/2 times in the list.

In order to solve this, we can take the naive approach and count the number of occurrences of each item type. Once we know that, we can tell whether there’s a majority of a particular type of item. Let’s look at a concrete example for this. Consider a battlefield with multiple armies fighting each other. Each army has a fixed number of soldiers on the battleground. A soldier can kill at most one soldier of a different army and in the process, also gets killed himself. Let’s visualize this. Below we have a battleground with 4 armies standing — blue, green, yellow and red.

Now our task is to predict which army can win this battle. The army with the last standing soldiers remaining is the winner. Quite instinctively we can say that if an army has more than half the total number of soldiers, they are going to win.

Let’s take the naive approach first to deduce the winner. What we could do is simply count the number of soldiers of each color. If a particular army has more than half of the total number of soldiers, we can declare it the winner.

Naive approach — Tabulating Data

Here’s a tabular list of the number of soldiers in each army —

As we can see, we have 22 soldiers in total. And the ‘yellow’ army has more than half the soldiers. So we can declare that the ‘yellow’ army is going to be the winner in all circumstances.

The naive approach for finding the winner works. But requires extra space for tabulating the number of soldiers per army. Algorithmically speaking, if there are ‘K’ different armies on the battleground, you need to maintain a table with at-most ‘K’ entries and hence it has a space requirement proportional to the value of ‘K’ i.e. O(K). And you’d need to go over the complete list of all the soldiers once to tabulate this data. So if there are ’n’, soldiers, this has a computational time requirement proportional to the value of ‘n’ i.e. O(n)

Another Naive approach — Sorting Data

Let’s see if we can overcome this tabulation work. What if we arranged all the soldiers according to their army color before doing the counting?

Now we can start from the beginning of this list and maintain a counter of how many soldiers of an army we have seen so far. So this counter would proceed as follows:

count=1 //1st Green soldier
count=2 //2nd Green soldier
count=3 //3d Green soldier
count=1 //Reset count. 1st Yellow soldier
count=2 //2nd Yellow soldier
...
...
count=12 //12th Yellow soldier. Count indicates majority. Stop here.

In the above process, we go on incrementing the counter till we keep getting soldiers of the same army. And as soon as we get one of a different army, we reset the counter and start counting again. Since all the soldiers of the army are placed together (as the list is pre-sorted), it is guaranteed that if there’s a majority (count > n/2), we are going to find that and as soon as that happens, we can stop further counting.

For determining the computational time complexity of this approach, we need to factor in the time taken to sort the list of soldiers. If there are ‘n’ soldiers, we can use an efficient sorting algorithm and do it in O(n.log(n)) time. And then going through the list itself is O(n) in time complexity which gives us an overall upper bound of O(n.log(n)). This is worse than the first approach (tabulating data) that we saw earlier. But in terms of space requirements, it is better as we just need a single counter here to keep track of the count. Hence the space complexity is a constant O(1).

Up to now, we saw a trade-off between the space and time requirements for solving this problem. Is there a way of getting the best of both (space and time) worlds? And the answer is ‘yes’. This can be done using the clever ‘Boyer Moore’s Voting Algorithm’ to find the majority element. Let’s see how we can use it.

Clever Approach — Boyer Moore’s Voting Algorithm

Our metaphor of a battleground and soldiers belonging to different armies to understand this approach is going to turn handy here. As we know that a soldier of a particular army can kill the soldier of another army and in the process, also gets killed himself. So let’s say the goal of each army is to get control of a post on the battlefield.

Let’s call this process as the Voting Phase and see it in action below.

Soldiers of various armies arrive one-by-one on the battleground with the intent of capturing the post. If there’s a soldier of another army already there on the post, the incoming soldier kills it and dies in the process. And if there’s a soldier of the same army there, it just joins him to strengthen their control on the post.

At the end of this process, the army controlling the post is the majority army if a majority existed in the first place. As seen in the illustration above, it's the yellow army that has control of the post in the end which indicates that it may be the majority army if there’s one. We’ll now see how we can confirm this observation.

Verification Phase — When no one’s in majority

The example that we saw above had an army (yellow) in majority i.e. more than half of all the soldiers belonged to this army. In such cases, Boyer Moore’s Voting algorithm correctly finds out the majority army. But what if there’s no army in majority. Something as below:

Red=3 Green=3 Yellow=3

As can be seen, now we have 3 soldiers of each army and there’s no army in complete majority i.e. has more than half the total number of soldiers. Let’s see what happens now when the soldiers fight to gain control of the post.

The fight rules are the same as we saw before i.e. if there’s a soldier of another army already there on the post, the incoming soldier kills it and dies in the process. And if there’s a soldier of the same army there, it just joins him to strengthen their control on the post.

As can be seen at the end of the process, we have a green soldier controlling the post. But was the green army in majority — No! So to rule out such situations, the voting algorithm requires a verification phase to check whether the result is indeed in majority. And for this, we need to go through our list of soldiers and count the number of the soldiers belonging to the green army which was indicated to be the one in majority. When we do this, we see that their count is just 3 out of the total strength of 9 soldiers, which isn’t a majority. At this point we can confidently say that this list of soldiers has no particular army in majority because if there were one, we would have found it.

Space and Time requirements of Boyer Moore’s Voting Algorithm

As we saw, we have to go through the list of soldiers twice. First during the voting phase and then again during the verifications phase. So if there are ‘n’, soldiers, this has a computational time requirement proportional to the value of ’n’ i.e. O(n) * 2 which is the same as O(n).

In terms of space, we do not require much. In the voting phase, just a counter of how many soldiers of an army are manning the post suffices. And during the validation phase, just a counter for counting the number of soldiers of the army indicated by the voting phase is enough. Thus we just need a single location to store this information which translates to a constant space requirement of O(1).

Comparative analysis of the approaches

To recap, let’s do a comparative analysis of the various approaches we discussed.

k=Number of distinct items n=Total number of items

So as we see above, Boyer Moore’s Voting algorithm is a clear winner overall as it has optimal space and time requirements.

If algorithms pique your interest, you may also want to read some of my other articles on Kahn’s algorithm for topological sorting and Kadane’s Algorithm. Thanks!

--

--