February 14, 2022 09:58 pm GMT

How good is the shuffleArray from 7 killer one liners?

Many of us will have seen this post about 7-killer-one-liners, and we all know that its shuffle does not look very promising compared with the "correct" way, Fisher-Yates and its variants.

const shuffleArray = (arr) => arr.sort(() => Math.random() - 0.5)

But how bad can it be? Basically, it depends on the sorting algorithm. It is usually some kind of introsort, which in turn is usually a mixture of quicksort, insertion sort and heapsort. The randomness makes the result hard to predict, so let's do some experiments instead.
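Before measuring anything, it is worth noting why this comparator is suspect in the first place. A quick sketch (the names here are mine, not from the one-liner post): it gives inconsistent answers for the same pair, which violates the consistent-ordering contract that Array.prototype.sort expects from its comparator.

```typescript
// The one-liner's comparator ignores its arguments completely:
const randomComparator = (_a: number, _b: number) => Math.random() - 0.5

// Calling it many times on the same pair yields inconsistent answers,
// which already breaks the contract Array.prototype.sort expects
// (a consistent total ordering of the elements).
const signs = new Set<number>()
for (let i = 0; i < 1000; i++) {
  signs.add(Math.sign(randomComparator(1, 2)))
}
console.log(signs) // almost surely contains both 1 and -1
```

With an inconsistent comparator, the spec leaves the sort order implementation-defined, which is exactly why the outcome depends on the engine's sorting algorithm.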

First, the shuffle function:

declare global {
  interface Array<T> {
    shuffle: () => T[]
  }
}

Array.prototype.shuffle = function <T>(this: T[]) {
  return this.sort(() => Math.random() - 0.5)
}

export {}

And now we can:

const experiment = (N: number, times?: number) => {
  times = times ?? N ** 2
  const original = [...Array(N).keys()]
  const samples = Array.from(Array(times), () => [...original].shuffle())
}

We now have plenty of shuffled samples, but how can we assess them?

Here we are going to calculate the frequency with which each number appears at each position.

const NumberPosition = (numbers: number[], samples: number[][]) => {
  return numbers.map(
    n => samples.map(sample => [n, sample.indexOf(n)] as const)
    // (n, k) => samples.map(sample => [sample[k], k] as const)
  ).flat(1)
}

const experiment = (N: number, times?: number) => {
  times = times ?? N ** 2
  const original = [...Array(N).keys()]
  const samples = Array.from(Array(times), () => [...original].shuffle())
  const pairs = NumberPosition(original, samples)
}

Both methods work. The former seems more "understandable", and we don't care about performance at all.

Now we are going to count the pairs. We need a Map<[number, number], number> for that, but there is a problem:

const m = new Map<[number, number], number>()
m.set([0, 0], 1)
m.set([0, 0], 2)
console.log(m)

> Map(2) { [ 0, 0 ] => 1, [ 0, 0 ] => 2 }

Map compares keys by reference, so every array literal is a new key. To make this work, we use a pool, which is a [number, number][][], to keep each key reference unique.

  const map = new Map<readonly [number, number], number>()
  const pool = original.map(
    n => original.map((_, k) => [n, k] as const)
  )
  const keyOf = (pair: readonly [number, number]) =>
    pool[pair[0]][pair[1]]
  for (const pair of pairs) {
    const key = keyOf(pair)
    map.set(key, (map.get(key) ?? 0) + 1)
  }

All these as const are there for a reason.
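As a quick aside, here is a minimal, self-contained sketch (with N = 3 purely for illustration) of why the pool fixes the Map problem shown above: every structurally equal pair resolves to one canonical reference, so counting works.

```typescript
const N = 3
const original = [...Array(N).keys()]

// One canonical tuple per (number, position) pair.
const pool = original.map(n => original.map((_, k) => [n, k] as const))
const keyOf = (pair: readonly [number, number]) => pool[pair[0]][pair[1]]

// Structurally equal pairs now share one reference...
console.log(keyOf([0, 0]) === keyOf([0, 0])) // true

// ...so the Map no longer grows a new entry per insertion.
const map = new Map<readonly [number, number], number>()
map.set(keyOf([0, 0]), 1)
map.set(keyOf([0, 0]), 2)
console.log(map.size) // 1, not 2 as in the Map(2) example above
```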

Now we have the stats, so let's sort them by count.

  return Array.from(map.entries())
    .sort(([, a], [, b]) => b - a)

Now the whole script looks like:

declare global {
  interface Array<T> {
    shuffle: () => T[]
  }
}

Array.prototype.shuffle = function <T>(this: T[]) {
  return this.sort(() => Math.random() - 0.5)
}

const experiment = (N: number, times?: number) => {
  times = times ?? N ** 2
  const original = [...Array(N).keys()]
  const samples = Array.from(Array(times), () => [...original].shuffle())
  const pairs = original.map(
    n => samples.map(sample => [n, sample.indexOf(n)] as const)
    // (n, k) => samples.map(sample => [sample[k], k] as const)
  ).flat(1)
  const map = new Map<readonly [number, number], number>()
  const pool = original.map(n => original.map((_, k) => [n, k] as const))
  const keyOf = (pair: readonly [number, number]) => pool[pair[0]][pair[1]]
  for (const pair of pairs) {
    const key = keyOf(pair)
    map.set(key, (map.get(key) ?? 0) + 1)
  }
  return Array.from(map.entries()).sort(([, a], [, b]) => b - a)
}

export {}

So now let's try something easy:

console.table(experiment(3, 65536))

and the result:

(index)  0         1
0        [ 1, 1 ]  45117
1        [ 2, 2 ]  32746
2        [ 0, 0 ]  28609
3        [ 0, 2 ]  24666
4        [ 2, 0 ]  24632
5        [ 1, 0 ]  12295
6        [ 0, 1 ]  12261
7        [ 2, 1 ]  8158
8        [ 1, 2 ]  8124

[1, 1] at 45117 and [2, 2] at 32746, versus [1, 2] at 8124 and [2, 1] at 8158: some elements are far more likely to stay where they originally were. The worst case is 45117 out of 65536 samples, which is not a good shuffle at all.
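To put a number on it: under a uniform shuffle, each (number, position) pair for N = 3 should show up in about one ninth of the 65536 samples. A quick back-of-the-envelope check (counts copied from the table above):

```typescript
const samples = 65536
const expected = samples / 9 // ≈ 7281.8 per pair under a uniform shuffle

const stay = 45117 // observed count for [1, 1]
const move = 8124  // observed count for [1, 2]

console.log((stay / expected).toFixed(1)) // "6.2": 1 stays put ~6x too often
console.log((move / expected).toFixed(1)) // "1.1"
```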

Let's try a larger array. For larger ones we only care about the first few and last few records, so let's apply a filter:

const endN = 4
console.table(
  experiment(40, 100000)
    .filter(
      (_, k, a) => k < endN || a.length - k < endN
    )
)

(index)  0           1
0        [ 0, 0 ]    7031
1        [ 0, 1 ]    6308
2        [ 30, 39 ]  4650
3        [ 3, 0 ]    4624
4        [ 1, 37 ]   772
5        [ 1, 38 ]   579
6        [ 1, 39 ]   378

The spread between top and bottom counts is still around 10 times, but the top frequency is now only 0.07, which seems better. It means there is a probability of about 0.07 that 0 stays at position 0.
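The same back-of-the-envelope comparison works here: for N = 40, a uniform shuffle gives each (number, position) pair a probability of 1/40 = 0.025, so 0.07 for [0, 0] is still almost three times too likely (count copied from the table above):

```typescript
const trials = 100000
const uniform = 1 / 40 // 0.025 per (number, position) pair if uniform

const observed = 7031 / trials // [0, 0] from the table, ≈ 0.07
console.log((observed / uniform).toFixed(1)) // "2.8": almost 3x too likely
```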

Things stay near where they originally were, which is typical of insertion sort. This is what introsort looks like when N is low.

And a larger one, 1000. I have to run fewer iterations (down to 10000), or there will not be enough address space for Node.js to use.

(index)  0             1
0        [ 441, 0 ]    55
1        [ 0, 4 ]      53
2        [ 315, 1 ]    52
3        [ 0, 3 ]      52
4        [ 252, 2 ]    49
5        [ 0, 10 ]     48
6        [ 0, 13 ]     48
7        [ 63, 4 ]     47
8        [ 0, 9 ]      47
9        [ 189, 3 ]    46
10       [ 190, 999 ]  1
11       [ 134, 999 ]  1
12       [ 887, 999 ]  1
13       [ 946, 999 ]  1
14       [ 63, 999 ]   1
15       [ 632, 999 ]  1
16       [ 883, 999 ]  1
17       [ 71, 999 ]   1
18       [ 889, 999 ]  1

Not much data, but a stable pattern. 55/10000 is not a huge issue, but 55:1 is still bad.

Finally, let's try a real Fisher-Yates and see how good it is:

Array.prototype.shuffle = function <T>(this: T[]) {
  for (let i = this.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [this[i], this[j]] = [this[j], this[i]]
  }
  return this
}

You can tell from the above that I don't like semicolons, but I have to keep this one :-).

And the result for N = 3:

(index)  0         1
0        [ 2, 0 ]  3370
1        [ 1, 2 ]  3369
2        [ 0, 2 ]  3360
3        [ 2, 1 ]  3359
4        [ 0, 1 ]  3344
5        [ 1, 0 ]  3334
6        [ 1, 1 ]  3297
7        [ 0, 0 ]  3296
8        [ 2, 2 ]  3271

Looks good.
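As a side check, here is the same Fisher-Yates as a standalone function (no prototype patching; this rewrite is mine, for illustration), with a sanity check that it always produces a permutation of its input:

```typescript
const fisherYates = <T>(arr: T[]): T[] => {
  // Walk from the end, swapping each element with a random earlier one
  // (or itself). Every permutation is equally likely.
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]]
  }
  return arr
}

const shuffled = fisherYates([...Array(10).keys()])
// Same multiset of elements, only the order changes:
console.log([...shuffled].sort((a, b) => a - b).join(',')) // "0,1,2,3,4,5,6,7,8,9"
```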

And for 40:

(index)  0           1
0        [ 39, 11 ]  2638
1        [ 11, 11 ]  2636
2        [ 38, 34 ]  2634
3        [ 4, 36 ]   2633
4        [ 20, 21 ]  2348
5        [ 27, 25 ]  2348
6        [ 32, 20 ]  2345

And for 100:

(index)  0           1
0        [ 74, 70 ]  2168
1        [ 55, 2 ]   2167
2        [ 68, 74 ]  2164
3        [ 50, 20 ]  2157
4        [ 35, 54 ]  1830
5        [ 3, 92 ]   1823
6        [ 27, 69 ]  1794

The GC becomes unhappy when I increase the size further, due to the address space limitation, and I am not willing to make the code GC-friendly :), but this is enough.


Original Link: https://dev.to/yw662/how-good-is-the-shufflearray-from-7-killer-one-liners--5h46
