Statistical Question

Blarghy · September 2013

Hey folks,

I'm currently working on a custom random loot generator for my DnD group. I want my loot drops to work on a bell-shaped curve, so I'm using an RNG to generate a number from 0 to 50 twice and a number from 1 to 50 twice, then adding all four numbers together and dividing by 2 (to normalize the distribution). From this I know I can calculate the standard deviation (about 15) and use the 68-95-99.7 percent rule to spit-ball about how often a range of numbers will come up.

However, I'd like to get a little bit better idea about how the probabilities are likely to break down more finely (say, how often a 10 will come up). Anyone know how to do this (or point me in the direction of a utility that can help me do this)? It doesn't have to be anything super accurate, a rough estimate is good enough for my purposes.

tarnok · September 2013

To get more precise you're going to want the error function. Check the Wikipedia article for the normal distribution. You can find the error function under the heading "Cumulative distribution." Together with the mean and standard deviation, the error function gives you the probability of a normally distributed random variable falling between two given values. Integral calculus will be required.

There are programs that can work this out for you but the only ones I am aware of are too expensive to consider for this unless you're dealing with heavy mathematical lifting every day. If you have a friend who's good with math you might be able to convince them to section your bell curve up in 10% increments in exchange for lunch.

edit: It might actually be easier to use the CDF. Depends on how you look at it I guess.
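For reference, here is how the error function and the CDF of a normal distribution fit together. This is the standard textbook relationship, not anything specific to the loot generator; the symbols \mu (mean), \sigma (standard deviation), a and b (the two endpoints of the range) are just notation chosen for this sketch.

```latex
\Phi_{\mu,\sigma}(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right],
\qquad
P(a < X \le b) = \Phi_{\mu,\sigma}(b) - \Phi_{\mu,\sigma}(a)
```

So whichever of the two you start from, the probability of landing in a range is a difference of two CDF values.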

Rend · September 2013

The probability of a given outcome is the height of the curve at that point. So, if you actually have your curve plotted out, you can just check your graph and away you go.

tarnok · September 2013

Rend wrote: »

The probability of a given outcome is the height of the curve at that point. So, if you actually have your curve plotted out, you can just check your graph and away you go.

I think I see how this works, but that only works if your random variable is actually or effectively a whole number, yes?

Rend · September 2013

tarnok wrote: »

Rend wrote: »

The probability of a given outcome is the height of the curve at that point. So, if you actually have your curve plotted out, you can just check your graph and away you go.

I think I see how this works, but that only works if your random variable is actually or effectively a whole number, yes?

Well, no.

You can measure the height of a graph at any point, whether that point is an integer or not. The X-axis is a number line, and the Y-axis is the probability of each of those possibilities. The only difference between whole numbers and a continuous spectrum on the x-axis is that you need calculus to confirm that the area under the curve is equal to 1. Measuring the height of the graph is a simple operation either way.

GenlyAi · September 2013

I've got two points for you:

First, if you just want a bell-shaped curve, and don't specifically need it to behave like the sum of four uniformly distributed integers, you can use the NORMDIST function in Excel to estimate the probabilities. You need to set the "cumulative" argument to TRUE. Then, to find out what fraction of the time you will get, say, between 10 and 11, you can do '=NORMDIST(11,mean,stdev,TRUE)-NORMDIST(10,mean,stdev,TRUE)'. This is in Excel 2003.

Second, if you do want it to behave like integers, as you specified, just simulate it. You can use Excel to approximate this by putting a formula on each row to reflect your sum (i.e. '=(INT(RAND()*51)+INT(RAND()*51)+INT(RAND()*50)+1+INT(RAND()*50)+1)/2'), filling it down for 10,000 rows, then just counting what fraction of the time you have a 10 or 10.5. This kind of thoughtless simulation is generally my solution to all statistical problems.
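If a spreadsheet isn't handy, the same brute-force simulation can be sketched in a few lines of Python. This is only a rough equivalent of the Excel formula above, assuming two draws from 0 to 50 and two from 1 to 50; the function names `roll` and `simulate`, the seed, and the 10,000-trial default are made up for the example.

```python
import random
from collections import Counter


def roll(rng):
    """One draw: two integers in 0..50 plus two in 1..50, summed and halved."""
    total = (rng.randint(0, 50) + rng.randint(0, 50)
             + rng.randint(1, 50) + rng.randint(1, 50))
    return total / 2


def simulate(trials=10_000, seed=2013):
    """Return {outcome: observed proportion} over the given number of trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(roll(rng) for _ in range(trials))
    return {value: n / trials for value, n in counts.items()}


if __name__ == "__main__":
    freq = simulate()
    # Fraction of the time the result was 10 or 10.5
    print(freq.get(10.0, 0) + freq.get(10.5, 0))
```

With only 10,000 trials the rare outcomes near the ends will be noisy, so bump `trials` up if you care about the tails.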

tarnok · September 2013

I meant the height of the graph corresponding to the probability of that outcome. In a continuous probability space the probability of any given outcome is zero. Only the probabilities of ranges of outcomes can be calculated and those are equal to the area under the curve in that range. Which is equal to the height of the curve if your range covers a distance of one, ie, one particular whole number outcome in a discrete probability space.

Rend · September 2013

tarnok wrote: »

I meant the height of the graph corresponding to the probability of that outcome. In a continuous probability space the probability of any given outcome is zero. Only the probabilities of ranges of outcomes can be calculated and those are equal to the area under the curve in that range. Which is equal to the height of the curve if your range covers a distance of one, ie, one particular whole number outcome in a discrete probability space.

Though mathematically true, you can still simply measure the height of the graph to get the information the OP wants.

For one thing, the OP is not asking about calculus, just how to get the probability of a given event occurring, and measuring the height of the graph at the point he's looking for will work regardless of whether he's dealing with a discrete or continuous variable.

Secondly, he's dealing with a discrete variable, so even if he wanted to know literally everything there was to know about this particular problem, continuous variables would be irrelevant anyway.

Zombie Hero · September 2013

How often are you giving out loot?

In the long term, a normal distribution will be relatively predictable, but if you only hand out loot less than a dozen times you still might get wild results. If you hand out loot at least 25 to 30 times it should be fine, though.

Zombie Hero · September 2013

Ran a simulation for you. 10,000 trials, and I just used a round function to make it discrete.

Number Frequency Proportion
4 1 0.0001
5 6 0.0006
6 18 0.0018
7 19 0.0019
8 35 0.0035
9 40 0.004
10 68 0.0068
11 81 0.0081
12 93 0.0093
13 139 0.0139
14 158 0.0158
15 224 0.0224
16 254 0.0254
17 282 0.0282
18 355 0.0355
19 391 0.0391
20 438 0.0438
21 478 0.0478
22 469 0.0469
23 485 0.0485
24 519 0.0519
25 560 0.056
26 535 0.0535
27 506 0.0506
28 508 0.0508
29 469 0.0469
30 439 0.0439
31 401 0.0401
32 388 0.0388
33 306 0.0306
34 279 0.0279
35 249 0.0249
36 193 0.0193
37 161 0.0161
38 120 0.012
39 96 0.0096
40 85 0.0085
41 54 0.0054
42 33 0.0033
43 27 0.0027
44 15 0.0015
45 15 0.0015
46 6 0.0006
47 2 0.0002

SUM 10000 1

tarnok · September 2013

Rend wrote: »

tarnok wrote: »

I meant the height of the graph corresponding to the probability of that outcome. In a continuous probability space the probability of any given outcome is zero. Only the probabilities of ranges of outcomes can be calculated and those are equal to the area under the curve in that range. Which is equal to the height of the curve if your range covers a distance of one, ie, one particular whole number outcome in a discrete probability space.

Though mathematically true, you can still simply measure the height of the graph to get the information the OP wants.

For one thing, the OP is not asking about calculus, just how to get the probability of a given event occurring, and measuring the height of the graph at the point he's looking for will work regardless of whether he's dealing with a discrete or continuous variable.

Secondly, he's dealing with a discrete variable, so even if he wanted to know literally everything there was to know about this particular problem, continuous variables would be irrelevant anyway.

So the answer to the question I was asking is yes.

Rend · September 2013

tarnok wrote: »

So the answer to the question I was asking is yes.

No. Measuring the area under the curve is how you gain the probability of an outcome, regardless of whether it's continuous or discrete. The big difference is a discrete variable has a pre-set width, but a continuous variable has a variable width. Either way, you are still measuring the area under the curve though.

tarnok · September 2013

Then I have done a very poor job of asking the question because what you just said is exactly what I meant in my original question.

MrTLicious · September 2013

So I'm a bit late to the party but I thought I'd stop by with a solution to the exact probabilities of the given question because it's a pretty interesting combinatorics problem that I did because of this thread and gosh darn it you will all see my results! Blarghy, to get rough estimates, you should probably just simulate it, but this will give you an exact answer for getting a certain sum if you care to be that precise. Solution details are spoilered, and the general outline/formulas are unspoilered:

Important note: I'm going to give you the number of ways to get sums between 4 and 202 (i.e., the dice go from 1-50 and 1-51 instead of 1-50 and 0-50, and I'm not dividing by 2). This is just to make the formulas cleaner. To get a probability, translate my sum to yours (subtract 2 then divide by 2), then divide by the total number of outcomes, (50^2)(51^2).

For x between 4 and 53, the number of outcomes is (x-1)(x-2)(x-3)/6

This is actually a pretty well-known problem. The question is equivalent to the following: how many ways are there for 4 positive integers to sum to x, with order mattering? For more details, look up Stars and Bars. These problems are the same because with x no more than 53, we don't run into the situation where one of the dice will be capped at the die max, so every outcome of the restated problem translates to exactly one roll of the dice with the same sum.

For x = 54, the number of outcomes is (x-1)(x-2)(x-3)/6 - 2 = 23424

Here we have the same formula except for a strange -2 at the end. The reason for this is that the original formula assumes there are 4 ways to get one die with a 51 and 3 with a 1: {51,1,1,1}, {1,51,1,1}, {1,1,51,1}, and {1,1,1,51}. In reality, there are only 2, because 2 of your dice only go up to 50.

For x = 55 to 103, the number of outcomes is (x-1)(x-2)(x-3)/6 - (x-52)(x-53) - 4*(x-52)*(x-53)*(x-54)/6

Now we have the same formula again, but we need to remove a bunch of the outcomes. First, we need to remove any outcomes where one of the numbers is larger than 51, second we need to remove half of the outcomes where one of the dice is exactly 51 (because we only have 2 dice that go up to 51, instead of 4 which the formula assumes).

To figure out how many outcomes have a 51, just note that if one of the dice is a 51, then the rest of the dice have to sum to x-51. Thus, the question becomes: how many ways are there for 3 dice to sum to x-51? Again, using the stars and bars method, we see that this is (x-52)(x-53)/2. Normally, we would multiply this number by 4 to get the number of outcomes that have a 51 (because there are 4 dice that could get a 51), and that number is included in the positive part of the formula. However, since 2 dice do not have a 51, we need to subtract out that number times 2, or (x-52)(x-53). This is a generalization of the step where we subtracted 2 in the formula where x = 54.

To figure out how many outcomes have a die with a number higher than 51, we again translate the problem. Instead of asking, for example, how many ways there are for 4 positive integers to sum to 58 with exactly one being higher than 51, instead ask how many ways there are for 4 positive integers to sum to 7 (58-51), and multiply this number by 4. These turn out to be the same because for every outcome that sums to 7 (for example, {1,3,2,1}), there are 4 ways to turn it into an outcome that sums to 58 by adding 51 to one of the dice ({52,3,2,1}, {1,54,2,1}, {1,3,53,1}, and {1,3,2,52}). As before, the number of ways for 4 positive integers to sum to 7 is (7-1)(7-2)(7-3)/6, so the number of ways to get 58 with at least 1 being higher than 51 is 4(58-52)(58-53)(58-54)/6.

For x > 103, find the value for (206-x) and it will be the same. For example, the number of ways to get 202 is the same as the number of ways to get (206-202) = 4, or exactly 1 way.

This arises from the symmetry of the problem. You can think of the original solutions as the number of ways you can increase the values from the minimum possible roll to some higher roll. From the top, the number of ways to decrease your dice to get the symmetric value has to be the same. So, for example, the number of ways you can increase from 4 to 8 has to be the same as the number of ways to decrease from 202 to 198.
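For anyone who wants to check the formulas above by machine, here is a small Python sketch that codes them up and compares them against a brute-force count over two dice numbered 1-50 and two numbered 1-51. The function names (`ways_formula`, `ways_brute_force`, `probability`) are invented for this example, and `probability` assumes the translation from the important note (sum = 2 * value + 2).

```python
from math import comb


def ways_formula(x):
    """Ordered rolls summing to x with two dice 1..50 and two dice 1..51."""
    if x > 103:                       # symmetry: same count as 206 - x
        x = 206 - x
    if x < 4:                         # below the minimum possible sum
        return 0
    total = comb(x - 1, 3)            # (x-1)(x-2)(x-3)/6, stars and bars
    if x >= 54:
        total -= 2 * comb(x - 52, 2)  # (x-52)(x-53): a 1..50 die showing 51
        total -= 4 * comb(x - 52, 3)  # 4(x-52)(x-53)(x-54)/6: a die above 51
    return total


def ways_brute_force():
    """Count every sum by building up the distribution one die at a time."""
    counts = {0: 1}
    for sides in (50, 50, 51, 51):
        nxt = {}
        for s, n in counts.items():
            for face in range(1, sides + 1):
                nxt[s + face] = nxt.get(s + face, 0) + n
        counts = nxt
    return counts


def probability(value):
    """Chance of the generator giving `value` (1, 1.5, 2, ... 100)."""
    x = int(round(2 * value + 2))     # translate back to the 4..202 scale
    return ways_formula(x) / (50**2 * 51**2)


if __name__ == "__main__":
    brute = ways_brute_force()
    print(all(ways_formula(x) == brute[x] for x in range(4, 203)))
    print(probability(10))
```

The brute-force version is the safer one to trust if the dice ever change, since the closed form only holds for this particular set of ranges.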

MrBlarney · September 2013

Hmm, the thread title caught my eye, so consider me interested. Blarghy, if you can read a table of standard normal probabilities (or can use Excel to obtain them) and you can calculate a standard score, you can estimate the probability of outcomes.

Based on what your OP has, I'm going to guess that you have 100 items in your table that you want to draw from, and are approximating the normal distribution from the sum of four discrete uniform draws (see: central limit theorem). The distribution of (U[1,50]+U[1,50]+U[0,50]+U[0,50])/2 (where U[a,b] is a uniform draw of an integer between a and b inclusive) has a mean of 50.5 and a standard deviation of 14.58. To estimate the probability of an outcome x (the set of outcomes being the integers from 1 to 100 inclusive and all half values in between - note that this is 199 values), you want to calculate the z-scores for x-.25 and x+.25, evaluate the cumulative distribution function (CDF) on a standard normal distribution or look them up on the table, then take the difference between them.

Example: What is the approximate probability that I roll a 40? My z-scores are (39.75-50.5)/14.58 = -0.737 and (40.25-50.5)/14.58 = -0.703. The CDF values are about .2306 and .2410, and the difference, about .0104, is the probability of rolling a 40. This value is going to be a little inaccurate since there's about .0006 in the tails that aren't covered, but it's a fair approximation if you want something quick and dirty.
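The z-score recipe can also be done without a table. Below is a minimal Python sketch, assuming the mean of 50.5 and standard deviation of 14.58 quoted above and using `math.erf` from the standard library for the normal CDF; `normal_cdf` and `approx_probability` are names picked for this example only.

```python
from math import erf, sqrt

MEAN = 50.5   # mean of the four-draw generator
SD = 14.58    # its standard deviation


def normal_cdf(x, mean=MEAN, sd=SD):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def approx_probability(x, half_width=0.25):
    """Normal approximation to P(outcome == x) for outcomes spaced 0.5 apart."""
    return normal_cdf(x + half_width) - normal_cdf(x - half_width)


if __name__ == "__main__":
    print(approx_probability(40))
```

The `half_width` of 0.25 is there because the outcomes are half a unit apart; if you only count whole numbers after dropping fractions, the matching range would be x to x+1 instead.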

I'll be happy to elaborate a bit more if necessary.

Savant · September 2013

I'm not sure exactly what the OP wants to do, but if you want to generate random samples from a standard normal distribution you can use a regular RNG that gives numbers between 0 and 1 and plug them into the Box-Muller transform.

Since it sounds like he is trying to do something discrete I'm not sure that's what he would want to do, and maybe he doesn't even need to deal with the normal distribution. A binomial distribution might suffice. Perhaps someone else has a better idea what he is using this for to figure out exactly what sort of setup he would want.
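For completeness, the Box-Muller transform itself is short. Here is a rough Python sketch of the basic (trigonometric) form, which turns two uniform numbers into two independent standard normal samples; the function name `box_muller` and the seed are just for illustration, and the `1.0 - rng.random()` trick is only there to avoid taking the log of zero.

```python
import math
import random


def box_muller(rng):
    """Turn two uniform draws into two independent standard normal samples."""
    u1 = 1.0 - rng.random()   # in (0, 1], so log(u1) is always defined
    u2 = rng.random()         # in [0, 1)
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


if __name__ == "__main__":
    rng = random.Random(42)
    z1, z2 = box_muller(rng)
    # Scale and shift to get a bell curve centred on the loot table
    print(50.5 + 14.58 * z1, 50.5 + 14.58 * z2)
```

Note that a true normal sample can land outside 1-100, so for a loot table you would have to clamp or re-roll those, which the four-draw sum never needs.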

Blarghy · September 2013

Thanks guys!

What I'm doing is that I have an array of 100 items (labelled 1-100), and my formula spits out a whole number between 1 and 100 (any fractions are just dropped), and then reads that number from the array. I want the numbers around the center of the table (50) to occur much more frequently than the numbers around the outsides of the table (1,100), so I can position common loot in the center and more exotic and rare stuff along the outsides.
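Given that setup, the exact chance of each slot in the 100-item array can be worked out directly, no simulation needed. The sketch below is one way to do it in Python, assuming "fractions are just dropped" means integer division of the four-draw sum by 2; `index_distribution` is a made-up name and this is not necessarily how the actual generator is written.

```python
from fractions import Fraction


def index_distribution():
    """Exact probability of each array index 1..100 for the four-draw generator."""
    # Build the distribution of the raw sum one draw at a time.
    counts = {0: 1}
    for low, high in ((0, 50), (0, 50), (1, 50), (1, 50)):
        nxt = {}
        for s, n in counts.items():
            for face in range(low, high + 1):
                nxt[s + face] = nxt.get(s + face, 0) + n
        counts = nxt

    total = sum(counts.values())      # 51 * 51 * 50 * 50 equally likely rolls
    dist = {}
    for s, n in counts.items():
        index = s // 2                # halve the sum and drop the fraction
        dist[index] = dist.get(index, Fraction(0)) + Fraction(n, total)
    return dist


if __name__ == "__main__":
    dist = index_distribution()
    for index in (1, 10, 50, 100):
        print(index, float(dist[index]))
```

One thing this makes obvious: slot 100 can only come up when all four draws hit their maximum, so anything placed there is far rarer than what sits in slot 1.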

I do have access to higher-level statistical software (Intercooled Stata), but the suggestion to just run 10k simulations and use the outcome to guesstimate probabilities for each discrete number worked out well enough for my purposes. Good stuff.

MrBlarney · September 2013

If you want draws to be as normal as possible, then using the binomial approximation as Savant suggests will be the best option. If you want to do something else where you're summing a smaller number of rolls (which will usually provide a flatter distribution), simulations are an easy way to approximate probabilities if you don't want to go into the detailed combinatorics like MrTLicious did.

tinwhiskers · September 2013

Just remember, people don't actually like random loot. Go post in the WoW thread: "Has anyone ever had their raid group go through an entire tier of content and not see a single X--2 handed sword, tanking shield, healer weapon etc--drop?". And you'll be able to feel the shockwave all the bursting arteries will generate.
