### Friday, July 09, 2004

## Some interview questions from "Programming Pearls"

1. Vector Rotation

The Problem

Rotate vectorx[n] left bydpositions.

For n=8 andd=3,

changeabcdefghtodefghabc.

Constraints:O(n) time,O( 1 ) extra space.

Pricey Solutions

Storedin intermediate vector, shift the rest, store back. [O(n) extra space.]

Rotate by 1dtimes. [O(n) time.]

A Juggling Algorithm

The Code

for i = [0, gcd(d, n))

/* move i-th values of blocks */

t = x[i]

j = i

loop

k = j + d

if k >= n

k -= n

if k == i

break

x[j] = x[k]

j = k

x[j] = t

The Block-Swap Algorithm

The Idea: Change abtoba

If a is shorter, divide b into bl and br.

Swap a and br to change a bl br into br bl a.

Recur on pieces of b.

The Code

if d == 0 || d == n

return

i = p = d

j = n - p

while i != j

if i > j

swap(p-i, p, j)

i -= j

else

swap(p-i, p+j-i, i)

j -= i

swap(p-i, p, i)

The Reversal Algorithm

The Code /* rotateabcdefghleft three */

reverse(0, d-1) /* cbadefgh */

reverse(d, n-1) /* cbahgfed */

reverse(0, n-1) /* defghabc */

Best is the Block-Swap Algorithm, second is The Reversal Algorithm.

2. Anagrams

Definition.Two words are anagrams if one can be formed by permuting the letters in the other.

A 6-element anagram set:

opts pots post stop spot tops

The Problem.How would you compute all anagram sets in a dictionary of 230,000 English words? (Related problem: how to find all anagrams of an input word?)

Two Slow Algorithms

Examine All Permutations. ~~1.1*10^9 seconds

Examine All Pairs of Words. ~~14.7 hours

A Fast Algorithm

The Idea.Sign words so that those in the same anagram class have the same signature, and then collect equal signatures.

The Signature.Sort letters within a word. The signature of deposit is deiopst, which is also the signature of dopiest.

Collecting the Classes.Sort the words by their signatures.

signin C

int charcomp(char *x, char *y)

{ return(*x - *y); }

#define WORDMAX 100

int main(void)

{ char word[WORDMAX], sig[WORDMAX];

while (scanf("%s", word) != EOF) {

strcpy(sig, word);

qsort(sig, strlen(sig),

sizeof(char), charcomp);

printf("%s %s\n", sig, word);

}

return 0;

}

squashin C

int main(void)

{ char word[MAX], sig[MAX], oldsig[MAX];

int linenum = 0;

strcpy(oldsig, "");

while (scanf("%s %s", sig, word) != EOF)

if (strcmp(oldsig, sig) != 0

&& linenum > 0)

printf("\n");

strcpy(oldsig, sig);

linenum++;

printf("%s ", word);

}

printf("\n");

return 0;

}

The Complete Program

Executed by

signgrams

where dictc ontains 230,000 words.

Total run time is 18 seconds: 4 insign, 11 insort and 3 insquash.

Some Output:

subessential suitableness

canter creant cretan nectar recant tanrec trance

caret carte cater crate creat creta react recta trace

destain instead sainted satined

adroitly dilatory idolatry

least setal slate stale steal stela tales

reins resin rinse risen serin siren

constitutionalism misconstitutional

3.

The Problem Definition. Given the real vectorx[n], compute the maximum sum found in any contiguous subvector.

A Cubic Algorithm

Idea.For all pairs of integersiandjsatisfying 0<=i<=jCode.

maxsofar = 0

for i = [0, n)

for j = [i, n)

sum = 0

for k = [i, j]

sum += x[k]

/* sum is sum of x[i..j] */

maxsofar = max(maxsofar, sum)

Run Time. O(n).

A Quadratic Algorithm

Idea.The sum ofx[i..j] is close to the previous sum,x[i..j-1 ].

Code.

maxsofar = 0

for i = [0, n)

sum = 0

for j = [i, n)

sum += x[j]

/* sum is sum of x[i..j] */

maxsofar = max(maxsofar, sum)

Run Time. O(n).

Another Quadratic Algorithm

Idea.A cumulative array allows sums to be computed quickly. Ifytd[i] contains year-to-date sales through monthi, then sales from March through September are given byytd[sep]-ytd[feb].

Implementation.Use the cumulative arraycumarr.

. . .

Initializecumarr[i]=x[ 0 ]++x[i]. The sum of the values inx[i..j] iscumarr[j]-cumarr[i-1 ].

Code for Algorithm 2b.

cumarr[-1] = 0

for i = [0, n)

cumarr[i] = cumarr[i-1] + x[i]

maxsofar = 0

for i = [0, n)

for j = [i, n)

sum = cumarr[j] - cumarr[i-1]

/* sum is sum of x[i..j] */

maxsofar = max(maxsofar, sum)

Run Time. O(n).

AnO(nlogn) Algorithm

The Divide-and-Conquer Schema.To solve a problem of sizen, recursively solve two subproblems of sizen /2 and combine their solutions.

The Idea.Divide into two subproblems.

Recursively find maximum in subvectors.

Find maximum crossing subvector.

Return max ofma,mbandmc.

Run Time. O(nlogn).

Code for theO(NlogN) Algorithm

float maxsum3(l, u)

if (l > u) /* zero elements */

return 0

if (l == u) /* one element */

return max(0, x[l])

m = (l + u) / 2

/* find max crossing to left */

lmax = sum = 0

for (i = m; i >= l; i--)

sum += x[i]

lmax = max(lmax, sum)

/* find max crossing to right */

rmax = sum = 0

for i = (m, u]

sum += x[i]

rmax = max(rmax, sum)

return max(lmax+rmax,

maxsum3(l, m),

maxsum3(m+1, u))

A Linear Algorithm

Idea.How can we extend a solution forx[ 0..i-1 ]

into a solution forx[ 0..i]? Key variables:

Code.

maxsofar = 0

maxhere = 0

for i = [0, n)

/* invariant: maxhere and maxsofar

are accurate for x[0..i-1] */

maxhere = max(maxhere + x[i], 0)

maxsofar = max(maxsofar, maxhere)

Run Time. O(n).

4.

Longest Duplicated String

The Problem

Input: Ask not what your country can do for you, but what you can do for your country.

Output: can do for you (15 chars)

A Simple Algorithm (at least quadratic)

maxlen = -1

for i = [0, n)

for j = (i, n)

if (l=comlen(&c[i], &c[j])) > maxlen

{

maxlen = l

maxi = i

maxj = j

}

An Important Function

int comlen(char *p, char *q)

i = 0

while *p && (*p++ == *q++)

i++

return i

A Fast Algorithm

Key Data Structures

#define MAXN 5000000

char c[MAXN], *a[MAXN];

Input Suffix Array for banana

a[0]: banana

a[1]: anana

a[2]: nana

a[3]: ana

a[4]: na

a[5]: a

Scan through to find longest duplicated string (ana).

The Suffix Array Code

The Code (usually aboutO(nlogn))

while (ch = getchar()) != EOF

{

a[n] = &c[n]

c[n++] = ch

}

c[n] = 0

qsort(a, n, sizeof(char *), pstrcmp)

for i = [0, n)

if comlen(a[i], a[i+1]) > maxlen

{

maxlen = comlen(a[i], a[i+1])

maxi = i

}

printf("%.*s\n", maxlen, a[maxi])

4.8 secs on 807,503 characters of Homer's Iliad:

whose sake so many of the Achaeans have died at Troy, far from their homes? Go about at once among the host, and speak fairly to them, man by man, that they draw not their ships into the sea.

The Complete Code

#include

#include

#include

char inputchars[4300000];

char *word[800000];

int nword = 0;

int k = 2;

int wordncmp(char *p, char* q)

{ int n = k;

for ( ; *p == *q; p++, q++)

if (*p == 0 && --n == 0)

return 0;

return *p - *q;

}

int sortcmp(char **p, char **q)

{ return wordncmp(*p, *q);

}

char *skip(char *p, int n)

{ for ( ; n > 0; p++)

if (*p == 0)

n--;

return p;

}

int main()

{ int i, wordsleft = 10000, l, m, u;

char *phrase, *p;

word[0] = inputchars;

while (scanf("%s", word[nword]) != EOF) {

word[nword+1] = word[nword] + strlen(word[nword]) + 1;

nword++;

}

for (i = 0; i < k; i++)

word[nword][i] = 0;

for (i = 0; i < k; i++)

printf("%s0, word[i]);

qsort(word, nword, sizeof(word[0]), sortcmp);

phrase = inputchars;

for ( ; wordsleft > 0; wordsleft--) {

l = -1;

u = nword;

while (l+1 != u) {

m = (l + u) / 2;

if (wordncmp(word[m], phrase) < 0)

l = m;

else

u = m;

}

for (i = 0; wordncmp(phrase, word[u+i]) == 0; i++)

if (rand() % (i+1) == 0)

p = word[u+i];

phrase = skip(p, 1);

if (strlen(skip(phrase, k-1)) == 0)

break;

printf("%s0, skip(phrase, k-1));

}

return 0;

}

The Problem

Rotate vectorx[n] left bydpositions.

For n=8 andd=3,

changeabcdefghtodefghabc.

Constraints:O(n) time,O( 1 ) extra space.

Pricey Solutions

Storedin intermediate vector, shift the rest, store back. [O(n) extra space.]

Rotate by 1dtimes. [O(n) time.]

A Juggling Algorithm

The Code

for i = [0, gcd(d, n))

/* move i-th values of blocks */

t = x[i]

j = i

loop

k = j + d

if k >= n

k -= n

if k == i

break

x[j] = x[k]

j = k

x[j] = t

The Block-Swap Algorithm

The Idea: Change abtoba

If a is shorter, divide b into bl and br.

Swap a and br to change a bl br into br bl a.

Recur on pieces of b.

The Code

if d == 0 || d == n

return

i = p = d

j = n - p

while i != j

if i > j

swap(p-i, p, j)

i -= j

else

swap(p-i, p+j-i, i)

j -= i

swap(p-i, p, i)

The Reversal Algorithm

The Code /* rotateabcdefghleft three */

reverse(0, d-1) /* cbadefgh */

reverse(d, n-1) /* cbahgfed */

reverse(0, n-1) /* defghabc */

Best is the Block-Swap Algorithm, second is The Reversal Algorithm.

2. Anagrams

Definition.Two words are anagrams if one can be formed by permuting the letters in the other.

A 6-element anagram set:

opts pots post stop spot tops

The Problem.How would you compute all anagram sets in a dictionary of 230,000 English words? (Related problem: how to find all anagrams of an input word?)

Two Slow Algorithms

Examine All Permutations. ~~1.1*10^9 seconds

Examine All Pairs of Words. ~~14.7 hours

A Fast Algorithm

The Idea.Sign words so that those in the same anagram class have the same signature, and then collect equal signatures.

The Signature.Sort letters within a word. The signature of deposit is deiopst, which is also the signature of dopiest.

Collecting the Classes.Sort the words by their signatures.

signin C

int charcomp(char *x, char *y)

{ return(*x - *y); }

#define WORDMAX 100

int main(void)

{ char word[WORDMAX], sig[WORDMAX];

while (scanf("%s", word) != EOF) {

strcpy(sig, word);

qsort(sig, strlen(sig),

sizeof(char), charcomp);

printf("%s %s\n", sig, word);

}

return 0;

}

squashin C

int main(void)

{ char word[MAX], sig[MAX], oldsig[MAX];

int linenum = 0;

strcpy(oldsig, "");

while (scanf("%s %s", sig, word) != EOF)

if (strcmp(oldsig, sig) != 0

&& linenum > 0)

printf("\n");

strcpy(oldsig, sig);

linenum++;

printf("%s ", word);

}

printf("\n");

return 0;

}

The Complete Program

Executed by

sign

where dictc ontains 230,000 words.

Total run time is 18 seconds: 4 insign, 11 insort and 3 insquash.

Some Output:

subessential suitableness

canter creant cretan nectar recant tanrec trance

caret carte cater crate creat creta react recta trace

destain instead sainted satined

adroitly dilatory idolatry

least setal slate stale steal stela tales

reins resin rinse risen serin siren

constitutionalism misconstitutional

3.

The Problem Definition. Given the real vectorx[n], compute the maximum sum found in any contiguous subvector.

A Cubic Algorithm

Idea.For all pairs of integersiandjsatisfying 0<=i<=j

maxsofar = 0

for i = [0, n)

for j = [i, n)

sum = 0

for k = [i, j]

sum += x[k]

/* sum is sum of x[i..j] */

maxsofar = max(maxsofar, sum)

Run Time. O(n).

A Quadratic Algorithm

Idea.The sum ofx[i..j] is close to the previous sum,x[i..j-1 ].

Code.

maxsofar = 0

for i = [0, n)

sum = 0

for j = [i, n)

sum += x[j]

/* sum is sum of x[i..j] */

maxsofar = max(maxsofar, sum)

Run Time. O(n).

Another Quadratic Algorithm

Idea.A cumulative array allows sums to be computed quickly. Ifytd[i] contains year-to-date sales through monthi, then sales from March through September are given byytd[sep]-ytd[feb].

Implementation.Use the cumulative arraycumarr.

. . .

Initializecumarr[i]=x[ 0 ]++x[i]. The sum of the values inx[i..j] iscumarr[j]-cumarr[i-1 ].

Code for Algorithm 2b.

cumarr[-1] = 0

for i = [0, n)

cumarr[i] = cumarr[i-1] + x[i]

maxsofar = 0

for i = [0, n)

for j = [i, n)

sum = cumarr[j] - cumarr[i-1]

/* sum is sum of x[i..j] */

maxsofar = max(maxsofar, sum)

Run Time. O(n).

AnO(nlogn) Algorithm

The Divide-and-Conquer Schema.To solve a problem of sizen, recursively solve two subproblems of sizen /2 and combine their solutions.

The Idea.Divide into two subproblems.

Recursively find maximum in subvectors.

Find maximum crossing subvector.

Return max ofma,mbandmc.

Run Time. O(nlogn).

Code for theO(NlogN) Algorithm

float maxsum3(l, u)

if (l > u) /* zero elements */

return 0

if (l == u) /* one element */

return max(0, x[l])

m = (l + u) / 2

/* find max crossing to left */

lmax = sum = 0

for (i = m; i >= l; i--)

sum += x[i]

lmax = max(lmax, sum)

/* find max crossing to right */

rmax = sum = 0

for i = (m, u]

sum += x[i]

rmax = max(rmax, sum)

return max(lmax+rmax,

maxsum3(l, m),

maxsum3(m+1, u))

A Linear Algorithm

Idea.How can we extend a solution forx[ 0..i-1 ]

into a solution forx[ 0..i]? Key variables:

Code.

maxsofar = 0

maxhere = 0

for i = [0, n)

/* invariant: maxhere and maxsofar

are accurate for x[0..i-1] */

maxhere = max(maxhere + x[i], 0)

maxsofar = max(maxsofar, maxhere)

Run Time. O(n).

4.

Longest Duplicated String

The Problem

Input: Ask not what your country can do for you, but what you can do for your country.

Output: can do for you (15 chars)

A Simple Algorithm (at least quadratic)

maxlen = -1

for i = [0, n)

for j = (i, n)

if (l=comlen(&c[i], &c[j])) > maxlen

{

maxlen = l

maxi = i

maxj = j

}

An Important Function

int comlen(char *p, char *q)

i = 0

while *p && (*p++ == *q++)

i++

return i

A Fast Algorithm

Key Data Structures

#define MAXN 5000000

char c[MAXN], *a[MAXN];

Input Suffix Array for banana

a[0]: banana

a[1]: anana

a[2]: nana

a[3]: ana

a[4]: na

a[5]: a

Scan through to find longest duplicated string (ana).

The Suffix Array Code

The Code (usually aboutO(nlogn))

while (ch = getchar()) != EOF

{

a[n] = &c[n]

c[n++] = ch

}

c[n] = 0

qsort(a, n, sizeof(char *), pstrcmp)

for i = [0, n)

if comlen(a[i], a[i+1]) > maxlen

{

maxlen = comlen(a[i], a[i+1])

maxi = i

}

printf("%.*s\n", maxlen, a[maxi])

4.8 secs on 807,503 characters of Homer's Iliad:

whose sake so many of the Achaeans have died at Troy, far from their homes? Go about at once among the host, and speak fairly to them, man by man, that they draw not their ships into the sea.

The Complete Code

#include

#include

#include

char inputchars[4300000];

char *word[800000];

int nword = 0;

int k = 2;

int wordncmp(char *p, char* q)

{ int n = k;

for ( ; *p == *q; p++, q++)

if (*p == 0 && --n == 0)

return 0;

return *p - *q;

}

int sortcmp(char **p, char **q)

{ return wordncmp(*p, *q);

}

char *skip(char *p, int n)

{ for ( ; n > 0; p++)

if (*p == 0)

n--;

return p;

}

int main()

{ int i, wordsleft = 10000, l, m, u;

char *phrase, *p;

word[0] = inputchars;

while (scanf("%s", word[nword]) != EOF) {

word[nword+1] = word[nword] + strlen(word[nword]) + 1;

nword++;

}

for (i = 0; i < k; i++)

word[nword][i] = 0;

for (i = 0; i < k; i++)

printf("%s0, word[i]);

qsort(word, nword, sizeof(word[0]), sortcmp);

phrase = inputchars;

for ( ; wordsleft > 0; wordsleft--) {

l = -1;

u = nword;

while (l+1 != u) {

m = (l + u) / 2;

if (wordncmp(word[m], phrase) < 0)

l = m;

else

u = m;

}

for (i = 0; wordncmp(phrase, word[u+i]) == 0; i++)

if (rand() % (i+1) == 0)

p = word[u+i];

phrase = skip(p, 1);

if (strlen(skip(phrase, k-1)) == 0)

break;

printf("%s0, skip(phrase, k-1));

}

return 0;

}