
Statistics and sample size

Alistair Hutton Dr Edinburgh Registered User regular
I am a computer programmer

I am writing a new service to replace an old service.

I want to be confident that my new service works like the old service.

The old service gets called 30,000,000 times a day.

If I were to randomly sample 1,000 of those requests from the last day and hit my new service with them, and they came back 100% matching, how confident could I be that my new service matches the old service for that day's load of queries?

And how confident could I be that it was 50% fucked if 50% of the results came back not-matching?

I feel this is close to a survey margin-of-error calculation, but I know my intuitive grasp of statistics is often completely wrong, so I would appreciate guidance.

(Addendum: We are planning on doing a full traffic replication as well (replaying all thirty million requests), but we want a quick comparison we can be confident in for doing iterative changes during development.)
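A quick way to sanity-check the two numbers in the question: with zero mismatches in n samples, the exact one-sided upper confidence bound on the true mismatch rate comes from solving (1 − p)^n = 1 − confidence, which at 95% confidence reduces to the well-known "rule of three" (≈ 3/n). A minimal Python sketch (the function names are illustrative, not from the thread):

```python
import math

def mismatch_upper_bound(n, confidence=0.95):
    """Exact one-sided upper bound on the true mismatch rate after
    observing n matching results and 0 mismatches: solves
    (1 - p)^n = 1 - confidence for p."""
    return 1 - (1 - confidence) ** (1 / n)

def margin_of_error_95(p_hat, n):
    """Normal-approximation 95% margin of error for an observed proportion."""
    return 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

# 1,000 samples, all matching: true mismatch rate below ~0.3% (95% conf.)
print(mismatch_upper_bound(1000))      # ~0.003, i.e. the "rule of three" 3/n
# 1,000 samples, 50% mismatching: true rate is 50% +/- ~3.1% (95% conf.)
print(margin_of_error_95(0.5, 1000))   # ~0.031
```

So a clean 1,000-sample run says a lot about gross breakage (mismatch rate below roughly 0.3%), while an observed 50% failure rate pins the true rate to within a few percent of 50%.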

I have a thoughtful and infrequently updated blog about games http://whatithinkaboutwhenithinkaboutgames.wordpress.com/

I made a game, it has penguins in it. It's pay what you like on Gumroad.

Currently Ebaying Nothing at all but I might do in the future.

Posts

  • Enc A Fool with Compassion (Pronouns: He, Him, His) Registered User regular
    You probably want a sample size closer to ~2500

  • Dis' Registered User regular
    Is the check a binary match/not-match?

    Without sampling everything you'll never be 100% confident. You need to decide on a confidence level you're happy with (95% is the usual shorthand, but you can choose 99%, 99.9%, etc.).

    You'll then need to determine your margin of error, i.e. how close you need to be to the true value (say you want to be 99% confident you're within a 1% margin of error).

    From the sounds of it you want a very tight margin of error and high confidence, so I plugged 99% confidence with a 0.1% margin of error into a quick online calculator https://www.surveymonkey.co.uk/mp/sample-size-calculator/ and got a sample size of about 1.5 million.
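That roughly 1.5 million figure can be reproduced by hand with Cochran's sample-size formula plus a finite-population correction, assuming worst-case variance (p = 0.5). A sketch, not the calculator's actual internals:

```python
import math

def sample_size(z, margin_of_error, population):
    """Cochran's sample-size formula with finite-population correction,
    assuming worst-case variance p = 0.5."""
    n0 = z ** 2 * 0.25 / margin_of_error ** 2       # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# 99% confidence (z ~= 2.576), 0.1% margin of error, 30M daily calls
print(sample_size(2.576, 0.001, 30_000_000))  # ~1.57 million
```

Note how fast this grows: halving the margin of error roughly quadruples the required sample, which is why 1,000 samples only buys a margin of a few percent.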

  • Cauld Registered User regular
    Make sure your sample is representative of the population, i.e. all types of calls are covered.
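One standard way to guarantee that coverage is stratified sampling: bucket the day's requests by call type, then sample each bucket in proportion to its size, with a floor of one per type so rare call types still appear. A hypothetical sketch (the `key` function and toy log are invented for illustration):

```python
import random
from collections import defaultdict

def stratified_sample(requests, key, k):
    """Draw ~k requests, proportionally per call type (stratum),
    keeping at least one request from every type."""
    strata = defaultdict(list)
    for r in requests:
        strata[key(r)].append(r)
    sample = []
    for items in strata.values():
        n = max(1, round(k * len(items) / len(requests)))
        sample.extend(random.sample(items, min(n, len(items))))
    return sample

# toy log: 800 "search" calls, 150 "update", 50 "delete"
log = ([("search", i) for i in range(800)]
       + [("update", i) for i in range(150)]
       + [("delete", i) for i in range(50)])
picked = stratified_sample(log, key=lambda r: r[0], k=100)
print(len(picked))  # 100: 80 search, 15 update, 5 delete
```

A plain uniform sample of 100 could easily miss the 5%-frequency call type entirely; the stratified version cannot.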

  • Inquisitor77 2 x Penny Arcade Fight Club Champion A fixed point in space and time Registered User regular
    If you are performing a full replication test at the end, then your iterative tests are really a matter of testing efficiency vs. risk. Rather than focusing on a number for a random sample, I'd focus more on actually writing out the scenarios you intend to cover and explicitly testing them. In particular, what you need to pay attention to are not just happy-path calls but the trickier or more esoteric ones that may have caused problems in the past but absolutely need to be covered. For example, if the vast majority of calculations involve integers but you also have to support decimals, check how the new service handles non-terminating values or divide-by-zero errors.

    Otherwise all you are doing is fishing, so you might as well run as many sample comparisons as you can afford as often as possible. Which actually isn't a bad idea...

  • schuss Registered User regular
    Do you have a full matrix of possible transactions? You should be testing the conditions, with a random sample as a final just-in-case test. If you haven't already researched and mapped the conditions to tests, do that now. A random sample will most likely just get 90% of the same sort of transaction and a few edge cases.

  • Alistair Hutton Dr Edinburgh Registered User regular
    schuss wrote: »
    Do you have a full matrix of possible transactions? You should be testing the conditions, with a random sample as a final just-in-case test. If you haven't already researched and mapped the conditions to tests, do that now. A random sample will most likely just get 90% of the same sort of transaction and a few edge cases.

    The full matrix of possible queries is large: 16,000,000 × 250 × 125 × 400.

    In practice we are most concerned about the queries that happen most often, and we can afford a few failures around the edge cases as long as the common use cases are covered.

  • schuss Registered User regular
    Ok. One thing to look up is the allpairs (pairwise) method. It's an algorithm that pairs up conditions, since the most likely failures come from a simple pair of conditions interacting. It's a great tool to help narrow down conditions like that. Seriously, 16 million variables with different handling for each value, or is it just numeric?
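For reference, pairwise generation can be sketched with a simple greedy algorithm: enumerate every pair of values that must appear together in some test, then repeatedly pick the candidate case that covers the most still-uncovered pairs. This is illustrative only; real tools (e.g. Microsoft's PICT, or the `allpairspy` Python package) do this far more efficiently:

```python
from itertools import combinations, product

def allpairs(params):
    """Greedy pairwise test-case generation.
    params: one list of possible values per parameter.
    Brute-force inner loop, so only suitable for small matrices."""
    uncovered = set()  # every (param_i, param_j, value_a, value_b) to cover
    for (i, vi), (j, vj) in combinations(enumerate(params), 2):
        uncovered.update((i, j, a, b) for a, b in product(vi, vj))
    cases = []
    while uncovered:
        # pick the candidate case covering the most uncovered pairs
        best = max(product(*params), key=lambda c: sum(
            (i, j, c[i], c[j]) in uncovered
            for i, j in combinations(range(len(params)), 2)))
        cases.append(best)
        for i, j in combinations(range(len(params)), 2):
            uncovered.discard((i, j, best[i], best[j]))
    return cases

suite = allpairs([["GET", "POST"], ["json", "xml"], ["v1", "v2"]])
print(len(suite))  # covers all 12 value pairs in fewer than the 8 full-product cases
```

The payoff grows with the matrix: all pairs for three parameters of 2 values each need only a handful of cases instead of the full cross product, and the gap widens dramatically for larger matrices.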

  • Alistair Hutton Dr Edinburgh Registered User regular
    schuss wrote: »
    Ok. One thing to look up is the allpairs (pairwise) method. It's an algorithm that pairs up conditions, since the most likely failures come from a simple pair of conditions interacting. It's a great tool to help narrow down conditions like that. Seriously, 16 million variables with different handling for each value, or is it just numeric?

    2 variables with 4,000 categorical inputs each.

  • Inquisitor77 2 x Penny Arcade Fight Club Champion A fixed point in space and time Registered User regular
    Are these meaningful categories (e.g., reflective of different logic flows in the code) or are they just labels?
