### Wednesday, May 11, 2005

## Some notes on Algorithm Design Problems - 1

1.

Implement Shuffle given an array containing a deck of cards and the number of cards. Now make it O(n).

An answer in javascript

script language="javascript">

var sequence = new Array(100);

for ( var i=0 ; i < 100 ; i++ )

{

sequence[i] = i+1;

}

alert(Shuffle(sequence));

function Shuffle(ary)

{

for ( var i=ary.length-1 ; i >= 0 ; i-- )

{

var v = parseInt(Math.random()*(i+1));

var tmp = ary[v];

ary[v] = ary[i];

ary[i] = tmp;

}

return ary;

}

/script>

2.

如何遍历一次一个单向链表,均等概率的选出一个node?

其实这就是Reservoir Sampling with a Reservoir的一个实例

J. S. Vitter. Random Sampling with a Reservoir, ACM Transactions on Mathematical Software, 11(1), March 1985, 37-57.

This paper presents an improved and optimized verWe introduce fast algorithms for selecting a random sample of nrecords without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in O(n(1+log(N/n))) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.

The complete algorithm is given below. The current set of n candidates is

stored in the array C, in locations C[O], C[l], . . . , C[n - 1]. Built-in Boolean

function eof returns true if the end of file has been reached. The random number

generator RANDOM returns a real number in the unit interval. The procedure

call READ-NEXT-RECORD(R) reads the next record from the file and stores

it in the record R. The procedure call SKIP-RECORDS(k) reads past (i.e., skips

over) the next k records in the tile.

{Make the first n records candidates for the sample)

for j := 0 to n - 1 do READ-NEXT-RECORD(C[j]);

t := n; {t is the number of records processed so far}

while not eof do (Process the rest of the records1

begin

t := t + 1;

M := TRUNC (t x RANDOM( )); {M is random in the range 0 <= M <= t - 1}

if M < n then {Make the next record a candidate, replacing one at random)

READ-NEXT-RECORD(C[d])

else

SKIP-RECORDS(l)

(Skip over the next record)

end;

When the end of file has been reached, the n candidates stored in the array C

form a true random sample of the N records in the file.

If the internal memory is not large enough to store the n candidates, the

algorithm can be modified as follows: The reservoir is stored sequentially on

secondary storage; pointers to the current candidates are stored in internal

memory in an array, which we call I. (We assume that there is enough space to

store n pointers.) Suppose during the algorithm that record R’ is chosen as a

candidate to replace record R, which is pointed to by I[k]. Record R is written

sequentially ont,o secondary storage, and I[k] is set to point to R. The above code

can be modified by replacing the initial for loop with the following code:

for j := 0 to n - 1 do

begin

Copy the jth record onto secondary storage;

Set I[j] to point to the jth record;

end,

Program statement “READ-NEXT-RECORD(C[M])” should be replaced by

begin

Copy the next record onto secondary storage;

Set I[M] to point to that record

end

Retrieval of the final sample from secondary storage can be sped up by retrieving

the records in sequential order. This can be done by sorting the pointers I[1],

I[2], ... , I[n]. The sort should be very fast because it can be done in internal

memory.

3.

Dynamic Generation of Discrete Random Variates

Y. Matias, J. S. Vitter and W.-C. Ni, ``Dynamic Generation of Discrete Random Variates,'' Theory of Computing Systems, to appear. A shortened earlier version appears in Proceedings of the 4th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA '93), Austin, TX, January 1993, 361-370.

http://www.cs.duke.edu/~jsv/Papers/catalog/node112.html

Implement Shuffle given an array containing a deck of cards and the number of cards. Now make it O(n).

An answer in javascript

script language="javascript">

var sequence = new Array(100);

for ( var i=0 ; i < 100 ; i++ )

{

sequence[i] = i+1;

}

alert(Shuffle(sequence));

function Shuffle(ary)

{

for ( var i=ary.length-1 ; i >= 0 ; i-- )

{

var v = parseInt(Math.random()*(i+1));

var tmp = ary[v];

ary[v] = ary[i];

ary[i] = tmp;

}

return ary;

}

/script>

2.

如何遍历一次一个单向链表,均等概率的选出一个node?

其实这就是Reservoir Sampling with a Reservoir的一个实例

J. S. Vitter. Random Sampling with a Reservoir, ACM Transactions on Mathematical Software, 11(1), March 1985, 37-57.

This paper presents an improved and optimized verWe introduce fast algorithms for selecting a random sample of nrecords without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in O(n(1+log(N/n))) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.

The complete algorithm is given below. The current set of n candidates is

stored in the array C, in locations C[O], C[l], . . . , C[n - 1]. Built-in Boolean

function eof returns true if the end of file has been reached. The random number

generator RANDOM returns a real number in the unit interval. The procedure

call READ-NEXT-RECORD(R) reads the next record from the file and stores

it in the record R. The procedure call SKIP-RECORDS(k) reads past (i.e., skips

over) the next k records in the tile.

{Make the first n records candidates for the sample)

for j := 0 to n - 1 do READ-NEXT-RECORD(C[j]);

t := n; {t is the number of records processed so far}

while not eof do (Process the rest of the records1

begin

t := t + 1;

M := TRUNC (t x RANDOM( )); {M is random in the range 0 <= M <= t - 1}

if M < n then {Make the next record a candidate, replacing one at random)

READ-NEXT-RECORD(C[d])

else

SKIP-RECORDS(l)

(Skip over the next record)

end;

When the end of file has been reached, the n candidates stored in the array C

form a true random sample of the N records in the file.

If the internal memory is not large enough to store the n candidates, the

algorithm can be modified as follows: The reservoir is stored sequentially on

secondary storage; pointers to the current candidates are stored in internal

memory in an array, which we call I. (We assume that there is enough space to

store n pointers.) Suppose during the algorithm that record R’ is chosen as a

candidate to replace record R, which is pointed to by I[k]. Record R is written

sequentially ont,o secondary storage, and I[k] is set to point to R. The above code

can be modified by replacing the initial for loop with the following code:

for j := 0 to n - 1 do

begin

Copy the jth record onto secondary storage;

Set I[j] to point to the jth record;

end,

Program statement “READ-NEXT-RECORD(C[M])” should be replaced by

begin

Copy the next record onto secondary storage;

Set I[M] to point to that record

end

Retrieval of the final sample from secondary storage can be sped up by retrieving

the records in sequential order. This can be done by sorting the pointers I[1],

I[2], ... , I[n]. The sort should be very fast because it can be done in internal

memory.

3.

Dynamic Generation of Discrete Random Variates

Y. Matias, J. S. Vitter and W.-C. Ni, ``Dynamic Generation of Discrete Random Variates,'' Theory of Computing Systems, to appear. A shortened earlier version appears in Proceedings of the 4th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA '93), Austin, TX, January 1993, 361-370.

http://www.cs.duke.edu/~jsv/Papers/catalog/node112.html