Wednesday, May 11, 2005


Some notes on Algorithm Design Problems - 1

Implement Shuffle given an array containing a deck of cards and the number of cards. Now make it O(n).

An answer in javascript

script language="javascript">
var sequence = new Array(100);
for ( var i=0 ; i < 100 ; i++ )
sequence[i] = i+1;

function Shuffle(ary)
for ( var i=ary.length-1 ; i >= 0 ; i-- )
var v = parseInt(Math.random()*(i+1));
var tmp = ary[v];
ary[v] = ary[i];
ary[i] = tmp;
return ary;


其实这就是Reservoir Sampling with a Reservoir的一个实例

J. S. Vitter. Random Sampling with a Reservoir, ACM Transactions on Mathematical Software, 11(1), March 1985, 37-57.

This paper presents an improved and optimized verWe introduce fast algorithms for selecting a random sample of nrecords without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in O(n(1+log(N/n))) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.

The complete algorithm is given below. The current set of n candidates is
stored in the array C, in locations C[O], C[l], . . . , C[n - 1]. Built-in Boolean
function eof returns true if the end of file has been reached. The random number
generator RANDOM returns a real number in the unit interval. The procedure
call READ-NEXT-RECORD(R) reads the next record from the file and stores
it in the record R. The procedure call SKIP-RECORDS(k) reads past (i.e., skips
over) the next k records in the tile.

{Make the first n records candidates for the sample)
for j := 0 to n - 1 do READ-NEXT-RECORD(C[j]);
t := n; {t is the number of records processed so far}
while not eof do (Process the rest of the records1
t := t + 1;
M := TRUNC (t x RANDOM( )); {M is random in the range 0 <= M <= t - 1}
if M < n then {Make the next record a candidate, replacing one at random)
(Skip over the next record)
When the end of file has been reached, the n candidates stored in the array C
form a true random sample of the N records in the file.

If the internal memory is not large enough to store the n candidates, the
algorithm can be modified as follows: The reservoir is stored sequentially on
secondary storage; pointers to the current candidates are stored in internal
memory in an array, which we call I. (We assume that there is enough space to
store n pointers.) Suppose during the algorithm that record R’ is chosen as a
candidate to replace record R, which is pointed to by I[k]. Record R is written
sequentially ont,o secondary storage, and I[k] is set to point to R. The above code
can be modified by replacing the initial for loop with the following code:

for j := 0 to n - 1 do
Copy the jth record onto secondary storage;
Set I[j] to point to the jth record;
Program statement “READ-NEXT-RECORD(C[M])” should be replaced by
Copy the next record onto secondary storage;
Set I[M] to point to that record

Retrieval of the final sample from secondary storage can be sped up by retrieving
the records in sequential order. This can be done by sorting the pointers I[1],
I[2], ... , I[n]. The sort should be very fast because it can be done in internal

Dynamic Generation of Discrete Random Variates

Y. Matias, J. S. Vitter and W.-C. Ni, ``Dynamic Generation of Discrete Random Variates,'' Theory of Computing Systems, to appear. A shortened earlier version appears in Proceedings of the 4th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA '93), Austin, TX, January 1993, 361-370.

<< Home

This page is powered by Blogger. Isn't yours?