
Statistical Question

Blarghy Registered User regular
Hey folks,

I'm currently working on a custom random loot generator for my DnD group. I want my loot drops to follow a bell-shaped curve, so I'm using an RNG to generate a number from 0 to 50 twice and a number from 1 to 50 twice, adding all four numbers together, and dividing by 2 (to normalize the distribution). From this I know I can calculate the standard deviation (about 15) and use the 68-95-99.7 percent rule to spit-ball about how often a range of numbers will come up.

However, I'd like to get a slightly better idea of how the probabilities are likely to break down more finely (say, how often a 10 will come up). Does anyone know how to do this (or can anyone point me in the direction of a utility that can help)? It doesn't have to be super accurate; a rough estimate is good enough for my purposes.

Posts

  • tarnok Registered User regular
    edited September 2013
    To get more precise you're going to want the error function. Check the Wikipedia article for the normal distribution; you can find the error function under the heading "Cumulative distribution." The error function gives you the probability of a random variable falling between two given values. Integral calculus will be required.

    There are programs that can work this out for you but the only ones I am aware of are too expensive to consider for this unless you're dealing with heavy mathematical lifting every day. If you have a friend who's good with math you might be able to convince them to section your bell curve up in 10% increments in exchange for lunch.

    edit: It might actually be easier to use the CDF. Depends on how you look at it I guess.

    Wii Code:
    0431-6094-6446-7088
  • Rend Registered User regular
    The probability of a given outcome is the height of the curve at that point. So, if you actually have your curve plotted out, you can just check your graph and away you go.

  • tarnok Registered User regular
    Rend wrote: »
    The probability of a given outcome is the height of the curve at that point. So, if you actually have your curve plotted out, you can just check your graph and away you go.

    I think I see how this works, but that only works if your random variable is actually or effectively a whole number, yes?

  • Rend Registered User regular
    tarnok wrote: »
    Rend wrote: »
    The probability of a given outcome is the height of the curve at that point. So, if you actually have your curve plotted out, you can just check your graph and away you go.

    I think I see how this works, but that only works if your random variable is actually or effectively a whole number, yes?

    Well, no.

    You can measure the height of a graph at any point, whether that point is an integer or not. The X-axis is a number line, and the Y-axis gives the probability of each of those possibilities. The only difference between whole numbers and a continuous spectrum on the x-axis is that you need calculus to confirm that the area under the curve is equal to 1. Measuring the height of the graph is a simple operation either way.

  • GenlyAi Registered User regular
    edited September 2013
    I've got two points for you:

    First, if you just want a bell-shaped curve, and don't specifically need it to behave like the sum of four uniformly distributed integers, you can use the NORMDIST function in Excel to estimate the probabilities. You need to set the "cumulative" argument to TRUE. Then, to find out what fraction of the time you will get, say, between 10 and 11, you can do '=NORMDIST(11,mean,stdev,TRUE)-NORMDIST(10,mean,stdev,TRUE)'. This is in Excel 2003.

    Second, if you do want it to behave like integers, as you specified, just simulate it. You can use Excel to approximate this by putting a formula on each row to reflect your sum (i.e., '=(INT(RAND()*51)+INT(RAND()*51)+INT(RAND()*50)+1+INT(RAND()*50)+1)/2'), filling it down for 10,000 rows, then just counting what fraction of the time you get a 10 or 10.5. This kind of thoughtless simulation is generally my solution to all statistical problems.
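
    For anyone without Excel handy, the same brute-force simulation can be sketched in Python. The function names, the fixed seed, and the 10,000-trial default are illustrative choices, not anything specified in the thread.

```python
import random
from collections import Counter


def roll(rng):
    """One draw: two 0-50 rolls plus two 1-50 rolls, with the sum halved."""
    total = (rng.randint(0, 50) + rng.randint(0, 50)
             + rng.randint(1, 50) + rng.randint(1, 50))
    return total / 2


def simulate(trials=10_000, seed=2013):
    """Return {outcome: fraction of trials}, sorted by outcome."""
    rng = random.Random(seed)  # seeded only so that reruns are repeatable
    counts = Counter(roll(rng) for _ in range(trials))
    return {value: n / trials for value, n in sorted(counts.items())}


if __name__ == "__main__":
    freqs = simulate()
    # Fraction of trials that came up 10 or 10.5
    print(freqs.get(10.0, 0) + freqs.get(10.5, 0))
```

    Adding the entries for 10.0 and 10.5 gives the same "10 or 10.5" fraction as counting rows in the spreadsheet.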

  • tarnok Registered User regular
    I meant the height of the graph corresponding to the probability of that outcome. In a continuous probability space the probability of any given outcome is zero. Only the probabilities of ranges of outcomes can be calculated and those are equal to the area under the curve in that range. Which is equal to the height of the curve if your range covers a distance of one, ie, one particular whole number outcome in a discrete probability space.

  • Rend Registered User regular
    tarnok wrote: »
    I meant the height of the graph corresponding to the probability of that outcome. In a continuous probability space the probability of any given outcome is zero. Only the probabilities of ranges of outcomes can be calculated and those are equal to the area under the curve in that range. Which is equal to the height of the curve if your range covers a distance of one, ie, one particular whole number outcome in a discrete probability space.

    Though mathematically true, you can still simply measure the height of the graph to get the information the OP wants.

    For one thing, the OP is not asking about calculus, just how to get the probability of a given event occurring, and measuring the height of the graph at the point he's looking for will work regardless of whether he's dealing with a discrete or continuous variable.

    Secondly, he's dealing with a discrete variable, so even if he wanted to know literally everything there was to know about this particular problem, continuous variables would be irrelevant anyway.

  • Zombie Hero Registered User regular
    edited September 2013
    How often are you giving out loot?

    In the long term, a normal distribution will be relatively predictable, but if you hand out loot fewer than a dozen times you still might get wild results. If you hand out loot at least 25 to 30 times it should be fine, though.

    Steam
    Nintendo ID: Pastalonius
    Smite\LoL:Gremlidin \ WoW & Overwatch & Hots: Gremlidin#1734
    3ds: 3282-2248-0453
  • Zombie Hero Registered User regular
    edited September 2013
    Ran a simulation for you: 10,000 trials, and I just used a round function to make it discrete.

    Number Frequency Proportion
    4 1 0.0001
    5 6 0.0006
    6 18 0.0018
    7 19 0.0019
    8 35 0.0035
    9 40 0.004
    10 68 0.0068
    11 81 0.0081
    12 93 0.0093
    13 139 0.0139
    14 158 0.0158
    15 224 0.0224
    16 254 0.0254
    17 282 0.0282
    18 355 0.0355
    19 391 0.0391
    20 438 0.0438
    21 478 0.0478
    22 469 0.0469
    23 485 0.0485
    24 519 0.0519
    25 560 0.056
    26 535 0.0535
    27 506 0.0506
    28 508 0.0508
    29 469 0.0469
    30 439 0.0439
    31 401 0.0401
    32 388 0.0388
    33 306 0.0306
    34 279 0.0279
    35 249 0.0249
    36 193 0.0193
    37 161 0.0161
    38 120 0.012
    39 96 0.0096
    40 85 0.0085
    41 54 0.0054
    42 33 0.0033
    43 27 0.0027
    44 15 0.0015
    45 15 0.0015
    46 6 0.0006
    47 2 0.0002

    SUM 10000 1

  • tarnok Registered User regular
    Rend wrote: »
    tarnok wrote: »
    I meant the height of the graph corresponding to the probability of that outcome. In a continuous probability space the probability of any given outcome is zero. Only the probabilities of ranges of outcomes can be calculated and those are equal to the area under the curve in that range. Which is equal to the height of the curve if your range covers a distance of one, ie, one particular whole number outcome in a discrete probability space.

    Though mathematically true, you can still simply measure the height of the graph to get the information the OP wants.

    For one thing, the OP is not asking about calculus, just how to get the probability of a given event occurring, and measuring the height of the graph at the point he's looking for will work regardless of whether he's dealing with a discrete or continuous variable.

    Secondly, he's dealing with a discrete variable, so even if he wanted to know literally everything there was to know about this particular problem, continuous variables would be irrelevant anyway.

    So the answer to the question I was asking is yes.

  • Rend Registered User regular
    edited September 2013
    tarnok wrote: »
    So the answer to the question I was asking is yes.

    No. Measuring the area under the curve is how you get the probability of an outcome, regardless of whether it's continuous or discrete. The big difference is that a discrete variable has a pre-set width, but a continuous variable has a variable width. Either way, you are still measuring the area under the curve.

  • tarnok Registered User regular
    Then I have done a very poor job of asking the question because what you just said is exactly what I meant in my original question.

  • MrTLicious Registered User regular
    So I'm a bit late to the party but I thought I'd stop by with a solution to the exact probabilities of the given question because it's a pretty interesting combinatorics problem that I did because of this thread and gosh darn it you will all see my results! Blarghy, to get rough estimates, you should probably just simulate it, but this will give you an exact answer for getting a certain sum if you care to be that precise. Solution details are spoilered, and the general outline/formulas are unspoilered:

    Important note: I'm going to give you the number of ways to get sums between 4 and 202 (i.e., the dice go from 1-50 and 1-51 instead of 1-50 and 0-50, and I'm not dividing by 2). This is just to make the formulas cleaner. To get a probability, translate my sum to yours (subtract 2 then divide by 2), then divide by the total number of outcomes, (50^2)(51^2) = 6,502,500.

    For x between 4 and 53, the number of outcomes is (x-1)(x-2)(x-3)/6
    This is actually a pretty well-known problem. The question is equivalent to the following: how many ways are there for 4 positive integers to sum to x, with order mattering? For more details, look up Stars and Bars. These problems are the same because with x no more than 53, we don't run into the situation where one of the dice will be capped at the die max, so every outcome of the restated problem translates to exactly one roll of the dice with the same sum.

    For x = 54, the number of outcomes is (x-1)(x-2)(x-3)/6 - 2 = 23424
    Here we have the same formula except for a strange -2 at the end. The reason for this is that the original formula assumes there are 4 ways to get one die with a 51 and 3 with a 1: {51,1,1,1}, {1,51,1,1}, {1,1,51,1}, and {1,1,1,51}. In reality, there are only 2, because 2 of your dice only go up to 50.

    For x = 55 to 103, the number of outcomes is (x-1)(x-2)(x-3)/6 - (x-52)(x-53) - 4*(x-52)*(x-53)*(x-54)/6
    Now we have the same formula again, but we need to remove a bunch of the outcomes. First, we need to remove any outcomes where one of the numbers is larger than 51; second, we need to remove half of the outcomes where one of the dice is exactly 51 (because we only have 2 dice that go up to 51, instead of the 4 that the formula assumes).

    To figure out how many outcomes have a 51, just note that if one of the dice is a 51, then the rest of the dice have to sum to x-51. Thus, the question becomes: how many ways are there for 3 dice to sum to x-51? Again, using the stars and bars method, we see that this is (x-52)(x-53)/2. Normally, we would multiply this number by 4 to get the number of outcomes that have a 51 (because there are 4 dice that could get a 51), and that number is included in the positive part of the formula. However, since 2 dice do not have a 51, we need to subtract out that number times 2, or (x-52)(x-53). This is a generalization of the step where we subtracted 2 in the formula where x = 54.

    To figure out how many outcomes have a die with a number higher than 51, we again translate the problem. Instead of asking, for example, how many ways there are for 4 positive integers to sum to 58 with exactly one being higher than 51, instead ask how many ways there are for 4 positive integers to sum to 7 (58-51), and multiply this number by 4. These turn out to be the same because for every outcome that sums to 7 (for example, {1, 3, 2, 1}), there are 4 ways to turn it into an outcome that sums to 58 by adding 51 to one of the dice ({52,3,2,1}, {1,54,2,1}, {1,3,53,1}, and {1,3,2,52}). As before, the number of ways for 4 positive integers to sum to 7 is (7-1)(7-2)(7-3)/6, so the number of ways to get 58 with at least 1 being higher than 51 is 4(58-52)(58-53)(58-54)/6.

    For x > 103, find the value for (206-x) and it will be the same. For example, the number of ways to get 202 is the same as the number of ways to get (206-202) = 4, or exactly 1 way.
    This arises from the symmetry of the problem. You can think of the original solutions as the number of ways you can increase the values from the minimum possible roll to some higher roll. From the top, the number of ways to decrease your dice to get the symmetric value has to be the same. So, for example, the number of ways you can increase from 4 to 8 has to be the same as the number of ways to decrease from 202 to 198.
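
    The formulas above can be checked against a brute-force count. The following Python sketch is not part of the original solution: `brute_force_counts`, `closed_form`, and `probability` are hypothetical helper names, and the separate cases are folded into one expression by writing (x-52)(x-53) as 2*C(x-52,2) and 4(x-52)(x-53)(x-54)/6 as 4*C(x-52,3), where C is the binomial coefficient.

```python
from math import comb

TOTAL_OUTCOMES = 50**2 * 51**2  # two dice numbered 1-50, two dice numbered 1-51


def brute_force_counts():
    """Count the ways to reach each shifted sum x by adding one die at a time."""
    counts = {0: 1}
    for sides in (50, 50, 51, 51):
        nxt = {}
        for total, ways in counts.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, 0) + ways
        counts = nxt
    return counts


def closed_form(x):
    """Number of ways to roll the shifted sum x (4 to 202)."""
    if x > 103:
        x = 206 - x  # symmetry around the middle of the range
    k = max(x - 52, 0)  # both correction terms are zero while x <= 53
    return comb(x - 1, 3) - 2 * comb(k, 2) - 4 * comb(k, 3)


def probability(value):
    """Probability of the halved result `value` (1 to 100 in steps of 0.5)."""
    x = int(round(2 * value)) + 2  # undo "subtract 2 then divide by 2"
    return closed_form(x) / TOTAL_OUTCOMES


if __name__ == "__main__":
    counts = brute_force_counts()
    print(all(closed_form(x) == counts[x] for x in range(4, 203)))
    print(probability(10))
```

    The brute-force pass walks through every face of every die, so it makes no use of the stars and bars argument; agreement between the two functions across 4 to 202 is an independent check on the case analysis.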

  • MrBlarney Registered User regular
    edited September 2013
    Hmm, the thread title caught my eye, so consider me interested. Blarghy, if you can read a table of standard normal probabilities (or can use Excel to obtain them) and you can calculate a standard score, you can estimate the probability of outcomes.

    Based on what your OP has, I'm going to guess that you have 100 items in your table that you want to draw from, and are approximating the normal distribution from the sum of four discrete uniform draws (see: central limit theorem). The distribution of (U[1,50]+U[1,50]+U[0,50]+U[0,50])/2 (where U[a,b] is a uniform draw of an integer between a and b inclusive) has a mean of 50.5 and a standard deviation of 14.58. To estimate the probability of an outcome x (the set of outcomes being the integers from 1 to 100 inclusive and all half values in between - note that this is 199 values), you want to calculate the z-scores for x-.25 and x+.25, evaluate the cumulative distribution function (CDF) of a standard normal distribution or look them up in a table, then take the difference between them.

    Example: What is the approximate probability that I roll a 40? My z-scores are (39.75-50.5)/14.58 = -0.737 and (40.25-50.5)/14.58 = -0.703. The CDF values are about .2305 and .2410, and the difference, about .0105, is the probability of rolling a 40. This value is going to be a little inaccurate since there's about .0006 in the tails that isn't covered, but it's a fair approximation if you want something quick and dirty.

    I'll be happy to elaborate a bit more if necessary.
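
    The z-score recipe above can be sketched in Python without a lookup table, since the standard library's `math.erf` gives the normal CDF directly. The names `normal_cdf` and `approx_probability` are made up for this sketch, and the 14.58 figure from the post is used here as the standard deviation.

```python
from math import erf, sqrt

MEAN = 50.5
SD = 14.58  # spread quoted in the post, treated as a standard deviation


def normal_cdf(x, mean=MEAN, sd=SD):
    """Normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def approx_probability(x, half_width=0.25):
    """Approximate chance of the outcome x, where outcomes are 0.5 apart."""
    return normal_cdf(x + half_width) - normal_cdf(x - half_width)


if __name__ == "__main__":
    print(approx_probability(40))
```

    The half-width of 0.25 matches the x-.25 and x+.25 bounds above; for a table that only keeps whole numbers, a half-width of 0.5 would be the corresponding choice.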

  • Savant Simply Barbaric Registered User regular
    edited September 2013
    I'm not sure exactly what the OP wants to do, but if you want to generate random samples from a standard normal distribution you can use a regular RNG that gives numbers between 0 and 1 and plug them into the Box-Muller transform.

    Since it sounds like he is trying to do something discrete, I'm not sure that's what he would want to do, and maybe he doesn't even need to deal with the normal distribution. A binomial distribution might suffice. Perhaps someone else has a better idea of what he is using this for and can figure out exactly what sort of setup he would want.
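
    A rough Python sketch of the Box-Muller transform mentioned above. Rescaling to a mean of 50.5 and a standard deviation of 14.58, rounding, and clamping to 1-100 are assumptions made here to fit a 100-slot table; the transform itself only produces standard normal samples. (Python's standard library also provides `random.gauss` for the same job.)

```python
import math
import random


def box_muller(rng):
    """Turn two uniform draws into two independent standard normal draws."""
    u1 = 1.0 - rng.random()  # shifts [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def loot_index(rng, mean=50.5, sd=14.58, low=1, high=100):
    """Scale one normal draw to the table and clamp it (assumed behaviour)."""
    z, _ = box_muller(rng)
    value = int(round(mean + sd * z))
    return min(max(value, low), high)


if __name__ == "__main__":
    rng = random.Random()
    print([loot_index(rng) for _ in range(10)])
```

    Clamping piles the rare out-of-range draws onto slots 1 and 100; rerolling instead would be another reasonable choice.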

  • Blarghy Registered User regular
    edited September 2013
    Thanks guys!

    What I'm doing is that I have an array of 100 items (labelled 1-100), and my formula spits out a whole number between 1 and 100 (any fractions are just dropped), and then reads that number from the array. I want the numbers around the center of the table (50) to occur much more frequently than the numbers around the outsides of the table (1,100), so I can position common loot in the center and more exotic and rare stuff along the outsides.

    I do have access to higher-level statistical software (Intercooled Stata), but the suggestion to just run 10k simulations and use the outcome to guesstimate probabilities for each discrete number worked out well enough for my purposes. Good stuff.
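
    The lookup described above might look something like this in Python. It is a sketch only: `loot_slot`, `pick_loot`, and the placeholder item names are invented for illustration, and the real table contents are not given in the thread.

```python
import random


def loot_slot(rng):
    """Return a table position from 1 to 100 using the four-roll formula."""
    total = (rng.randint(0, 50) + rng.randint(0, 50)
             + rng.randint(1, 50) + rng.randint(1, 50))
    return total // 2  # integer division drops any .5


def pick_loot(table, rng):
    """Look up the rolled position in a 100-entry table labelled 1-100."""
    if len(table) != 100:
        raise ValueError("expected exactly 100 loot entries")
    return table[loot_slot(rng) - 1]  # Python lists are 0-indexed


if __name__ == "__main__":
    rng = random.Random()
    table = [f"item {n}" for n in range(1, 101)]  # placeholder loot names
    print(pick_loot(table, rng))
```

    Because fractions are dropped, slot k collects both k and k+0.5, which means slot 100 is only reached when all four rolls come up at their maximum.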

  • MrBlarney Registered User regular
    edited September 2013
    If you want draws to be as normal as possible, then using the binomial approximation as Savant suggests will be the best option. If you want to do something else where you're summing a smaller number of rolls (which will usually provide a flatter distribution), simulations are an easy way to approximate probabilities if you don't want to go into the detailed combinatorics like MrTLicious did.

  • tinwhiskers Registered User regular
    Just remember, people don't actually like random loot. Go post in the WoW thread: "Has anyone ever had their raid group go through an entire tier of content and not see a single X--2 handed sword, tanking shield, healer weapon etc--drop?". And you'll be able to feel the shockwave all the bursting arteries will generate.
