My *nix world

Finding the lexicographic order of permutations

Given a finite alphabet (e.g. {A,B,C,...,Z}) how many different words can you create from these letters knowing that each letter can be used only once?

This is just a typical problem of combinatorics and the answer it's quite simple: n! where n=26 for the Latin alphabet but might be different for your language.

Ok, fair enough. Let's suppose now that we sort these words lexicographically (i.e. exactly as they are in your dictionary) such that the first word is ABCDE...Z, the second is ACBDE...Z, and so forth.

Now, what is the order of the following word YIZNWBVAXTLSCJUGKHODMEPFQR ? That is the real question. This is what we call it lexicographic order of permutations.

I really don't know if there is a formal mathematical solution for this problem but if you understand the concept of recursion and algorithm then perhaps you will find even a not-so-formal mathematical solution.

Example

Let's assume that our alphabet contains the following letters {A,F,K,N,O,T}. So it contains n=6 elements.

  • How many distinct words can we create from these letters knowing that each letter can be used only once? n!=6!=720 distinct words.
  • What is the order of the word ANKFOT that was created by using these letters? What word exists at the index 711 ? These are the real questions!

The simplified model

I couldn't see directly the solution so I've tried to come up with a simple model that will also work generally. To find a solution to this problem I've started with a simple alphabet like {A,B,C,D}. So my ideal alphabet would contain only n=4 letters which means that we can create 4! different words, i.e. 24.

Note: I choose only four letters because for simplicity (5 or more letters complicate the problem). Why not two or three letters? - because I wanted to avoid those situations when the problem becomes to simple such that you cannot see anymore the general solution. So four letters sounded just ideal to me.

Let's generate these 24 distinct words:

starting with A ABCD ABDC ACBD ACDB ADBC ADCB
starting with B BACD BADC BCAD BCDA BDAC BDCA
starting with C CABD CADB CBAD CBDA CDAB CDBA
starting with D DABC DACB DBAC DBCA DCAB DCBA

Let's suppose we want to determine the position of the word BADC.

We can see that we have 6 words starting with A, the same number of words starting with B, and so forth. In general we would have k=\frac{n!}{n}={(n-1)!} words on each row where n is the number of letters in alphabet.

So if the word would start with B its position would between 7 and 12. If the word would start with C its position would be between 13 and 18, and so on. In general the word's position p(word) would be at the position given by the order of the first letter multiplied by the k defined above + the position in the current row (E1).

The first thing we note is that we need to assign an index to each letter in our alphabet. To make our algorithm simpler we would assume that the alphabet is 0-indexed, i.e. the first letter has the index 0. So, For our simplified model the index of each letter are : A=0, B=1, C=2 and D=3. (E2)

The second thing we note is that at this step we still need to compute the position at the current row. But wait a minute! We could imagine that we have a new problem with a new alphabet and a new word, namely a problem where the word just become one letter short likewise its alphabet. So we have to compute another p(new word) and to add this result to the previous one. We do this systematically(recursively) while the word's length is greater than 0.

Step 1: According with the statement (E1) above the position of the word BADC would be at the position of B (which is 1) multiplied by the k=(4-1)!=6, i.e. the position should be p(BADC)=1\times 6=6.

Step 2: Let's imagine now that the new word is BADC and so is the new alphabet {A,C,D} where A=0, C=1, D=2 (note that we have to re-index the orders according to their 0-index position in the alphabet). So the position of this word ADC would be p(ADC)=0\times (3-1)!=0

Step 3: Let's imagine now that the new word is BADC and so is the new alphabet {C,D} where C=0, D=1 (note that we have to re-index the orders according to their 0-index position in the alphabet). So the position of this word DC would be p(DC)=1\times (2-1)!=1

Step 4: Let's imagine now that the new word is BADC and so is the new alphabet {C} where C=0 (note that we have to re-index the orders according to their 0-index position in the alphabet). So the position of this word C would be p(C)=0\times (1-1)!=0

Step 5: Let's imagine now that the new word is BADC which is the word of length zero so we stop here!

Now let's add up all these sums from all steps: 6+0+1+0=7. Let's check if the word is at position 7. We count from top to bottom and left to right (where the first word has by convention order zero) and we see that it works!

Another related question was "What word exists at the index k?". Well, we should use the same algorithm but backwards. (E3)

Given the alphabet {A,B,C,D} and its order: A=0,B=1,C=2,D=3 what is the word at the index k=7?

Step 1: divide the index k=7 at the size of the alphabet minus one, i.e.\frac{7}{(4-1)!} which gives the quotient q=1 and the rest r=1. Because the q=1 it means that the first letter of the searched word has the index 1, i.e. it is B. The other letters are {A,B,C,D} which is our newly alphabet. The new index became k=r=1.

Step 2: divide the index k=1 at the size of the new alphabet minus one, i.e.\frac{1}{(3-1)!} which gives the quotient q=0 and the rest r=1. Because the q=0 it means that the first letter of the searched word has the index 0 (in the new alphabet), i.e. it is A. The other letters are {A,B,C,D} which is our newly alphabet. The new index become k=r=1.

Step 3: divide the index k=1 at the size of the new alphabet minus one, i.e.\frac{1}{(2-1)!} which gives the quotient q=1 and the rest r=0. Because the q=1 it means that the first letter of the searched word has the index 1 (in the new alphabet), i.e. it is D. The other letters are {A,B,C,D} which is our newly alphabet. The new index become k=r=0.

Step 4: divide the index k=0 at the size of the new alphabet minus one, i.e.\frac{0}{(1-1)!} which gives the quotient q=0 and the rest r=0. Because the q=0 it means that the first letter of the searched word has the index 0 (in the new alphabet), i.e. it is C. The other letters are {} and thus our algorithm stops here!

So the searched word is BADC.

Back to our example
Let’s now try our original example, that one with the word ANKFOT.Our alphabet is {A,F,K,N,O,T} with the following indexes: A=0, F=1, K=2, N=3, O=4, T=5.

Note that I'm not going to generate a 6x5! matrix to prove you that its position will be the one I compute but you can do it if you like.

Step 1: According with the statement (E1) above the position of the word ANKFOT would be at the position of A (which is 0) multiplied by the k=(6-1)!=120, i.e. the position should be p(ANKFOT)=0\times 120=0.

Step 2: Let's imagine now that the new word is ANKFOT and so is the new alphabet {F,K,N,O,T} where F=0, K=1, N=2, O=3, T=4. So the position of this word NKFOT would be p(NKFOT)=2\times (5-1)!=2 \times 24=48

Step 3: Let's imagine now that the new word is ANKFOT and so is the new alphabet {F,K,O,T} where F=0, K=1, O=2, T=3. So the position of this word KFOT would be p(KFOT)=1\times (4-1)!=6

Step 4: Let's imagine now that the new word is ANKFOT and so is the new alphabet {F,O,T} where F=0,O=1, T=2. So the position of this word FOT would be p(FOT)=0\times (3-1)!=0

Step 5: Let's imagine now that the new word is ANKFOT and so is the new alphabet {O,T} where O=0, T=1. So the position of this word OT would be p(OT)=0\times (2-1)!=0

Step 6: Let's imagine now that the new word is ANKFOT and so is the new alphabet {T} where T=0. So the position of this word T would be p(T)=0\times (1-1)!=0

Step 7: Let's imagine now that the new word is ANKFOT i.e. the empty word. So here we stop!

Now, if we add up all the sum from all steps we'll get 0+48+6+0+0+0=54 and this is our final answer: the word ANKFOT would be the 54th word in the lexicographically sorted list of all 720 words that we can create with the letters from the alphabet {A,F,K,N,O,T}.

Remember that the other question was "What word exists at the index 711". We'll use the backward algorithm described at (E3).

Our alphabet is {A,F,K,N,O,T} with the following indexes: A=0, F=1, K=2, N=3, O=4, T=5. What is the word at the index k=711? Note that the first word would have index=0.

Step 1: divide the index k=711 at the size of the alphabet minus one, i.e.\frac{711}{(6-1)!} which gives the quotient q=5 and the rest r=111. Because the q=5 it means that the first letter of the searched word has the index 5, i.e. it is T. The other letters are {A,F,K,N,O,T} which is our newly alphabet. The new index became k=r=111.

Step 2: divide the index k=111 at the size of the new alphabet minus one, i.e.\frac{111}{(5-1)!} which gives the quotient q=4 and the rest r=15. Because the q=4 it means that the first letter of the searched word has the index 4 (in the new alphabet), i.e. it is O. The other letters are {A,F,K,N,O,T} which is our newly alphabet. The new index become k=r=15.

Step 3: divide the index k=15 at the size of the new alphabet minus one, i.e.\frac{15}{(4-1)!} which gives the quotient q=2 and the rest r=3. Because the q=2 it means that the first letter of the searched word has the index 2 (in the new alphabet), i.e. it is K. The other letters are {A,F,K,N,O,T} which is our newly alphabet. The new index become k=r=3.

Step 4: divide the index k=3 at the size of the new alphabet minus one, i.e.\frac{3}{(3-1)!} which gives the quotient q=1 and the rest r=1. Because the q=1 it means that the first letter of the searched word has the index 1 (in the new alphabet), i.e. it is F. The other letters are {A,F,K,N,O,T} which is our newly alphabet. The new index become k=r=1.

Step 5: divide the index k=1 at the size of the new alphabet minus one, i.e.\frac{1}{(2-1)!} which gives the quotient q=1 and the rest r=0. Because the q=1 it means that the first letter of the searched word has the index 1 (in the new alphabet), i.e. it is N. The other letters are {A,F,K,N,O,T} which is our newly alphabet. The new index become k=r=0.

Step 6: divide the index k=0 at the size of the new alphabet minus one, i.e.\frac{0}{(1-1)!} which gives the quotient q=0 and the rest r=0. Because the q=0 it means that the first letter of the searched word has the index 0 (in the new alphabet), i.e. it is A. The other letters are {A,F,K,N,O,T} which is our newly alphabet. The new index become k=r=0. But hey, our new alphabet is empty so we can stop here.

So the word at the index 711 would be TOKFNA. If you would searched the index 710 it would be TOKFAN 🙂

Now, if you think that this article was interesting don't forget to rate it. It shows me that you care and thus I will continue write about these things.

The following two tabs change content below.
Finding the lexicographic order of permutations

Eugen Mihailescu

Founder/programmer/one-man-show at Cubique Software
Always looking to learn more about *nix world, about the fundamental concepts of math, physics, electronics. I am also passionate about programming, database and systems administration. 16+ yrs experience in software development, designing enterprise systems, IT support and troubleshooting.
Finding the lexicographic order of permutations

Latest posts by Eugen Mihailescu (see all)

Tagged on: , , ,

Leave a Reply

Your email address will not be published. Required fields are marked *