# Statistics and sample size

DrEdinburgh Registered User regular
I am a computer programmer.

I am writing a new service to replace an old service.

I want to be confident that my new service works like the old service.

The old service gets called 30,000,000 times a day.

If I were to randomly sample 1,000 of those requests from the last day and hit my new service with them, how confident could I be that my new service matches the old service for that day's load of queries if the results came back 100% matching?

How confident could I be that it was 50% fucked if 50% of the results came back not matching?

I feel this is close to a survey margin-of-error calculation, but I know that my intuitive grasp of statistics is often completely wrong, so I would appreciate guidance.

(Addendum: We are also planning to do a full traffic replication, replaying all thirty million requests, but we want a quick comparison we can be confident in for iterative changes during development.)
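For the two cases asked about above, here is a minimal sketch of the usual back-of-envelope numbers. It assumes the 1,000 requests are a truly random sample and that each comparison is a simple match/no-match; the function names are made up for illustration. With zero mismatches, the exact bound solves (1 - p)^n = 1 - confidence, which is where the "rule of three" (roughly 3/n at 95%) comes from. For the 50% case it uses the plain normal approximation.

```python
import math


def zero_failure_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Largest true mismatch rate still plausible after 0 mismatches in n samples.

    Solves (1 - p)^n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def normal_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval around an observed rate.

    z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# 0 mismatches in 1,000 samples: true mismatch rate is below about 0.3% (95%).
clean_bound = zero_failure_upper_bound(1000)

# 50% mismatches in 1,000 samples: true rate is about 50% plus or minus 3.1% (95%).
half_margin = normal_margin(0.5, 1000)
```

So a clean run of 1,000 says the mismatch rate over the day's traffic is probably under about 0.3%, which at 30,000,000 calls a day could still be tens of thousands of mismatching requests. It says nothing about which kinds of request those are.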

I made a game, it has penguins in it. It's pay what you like on Gumroad.

Currently Ebaying Nothing at all but I might do in the future.

## Posts

• A Fool with Compassion Pronouns: He, Him, His Registered User regular
You probably want a sample size closer to 2,500.

• Registered User regular
Is the check a binary match/not-match?

Without sampling everything, you'll never be 100% confident. You need to decide on a confidence level you're happy with (95% is the most common shorthand, but you can choose 99%, 99.9%, etc.).

You'll then need to determine your margin of error, i.e. how close you need to be to the true value (say you want to be 99% confident you're within a 1% margin of error).

From the sounds of it, you want a very tight margin of error and high confidence, so I plugged 99% confidence and a margin of error of 0.1% or less into a quick online calculator (https://www.surveymonkey.co.uk/mp/sample-size-calculator/) and got a sample size of 1.5 million.
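The calculator figure can be reproduced, roughly, with the textbook sample-size formula for a proportion plus a finite population correction. This is a sketch under assumptions: I am guessing the calculator does something like this, `sample_size` is a name made up here, and p = 0.5 is the usual worst-case choice.

```python
import math
from statistics import NormalDist


def sample_size(margin: float, confidence: float, population: int, p: float = 0.5) -> int:
    """Sample size needed to estimate a proportion within +/- margin.

    Uses n0 = z^2 * p * (1 - p) / margin^2, then shrinks it with the
    finite population correction n = n0 / (1 + (n0 - 1) / population).
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n0 = z * z * p * (1.0 - p) / (margin * margin)
    n = n0 / (1.0 + (n0 - 1.0) / population)
    return math.ceil(n)


# 99% confidence, 0.1% margin, 30,000,000 calls a day: a bit under 1.6 million.
tight = sample_size(margin=0.001, confidence=0.99, population=30_000_000)

# 95% confidence, 2% margin: about 2,400 requests.
loose = sample_size(margin=0.02, confidence=0.95, population=30_000_000)
```

The required size scales with 1/margin², so relaxing the margin from 0.1% to 1% cuts the sample by a factor of roughly 100.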

• Registered User regular
Make sure your sample is representative of the population, i.e. all types of calls are covered.
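One way to guarantee every type of call shows up is to sample per call type (stratified sampling) instead of sampling the whole log at once. A minimal sketch, assuming each logged request can be mapped to a call type; the `key` function and the `"type"` field in the usage example are hypothetical, not anything from the actual service.

```python
import random
from collections import defaultdict


def stratified_sample(requests, key, per_stratum, seed=0):
    """Pick up to per_stratum random requests from each call type.

    requests    -- iterable of logged requests
    key         -- function mapping a request to its call type (hypothetical)
    per_stratum -- maximum number of requests to draw per call type
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    groups = defaultdict(list)
    for req in requests:
        groups[key(req)].append(req)

    sample = []
    # Sort by the string form of the key so the output order is stable.
    for _, members in sorted(groups.items(), key=lambda kv: str(kv[0])):
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Usage: 990 "common" calls and 10 "rare" calls, 5 drawn from each type.
log = [{"id": i, "type": "common"} for i in range(990)]
log += [{"id": i, "type": "rare"} for i in range(990, 1000)]
picked = stratified_sample(log, key=lambda r: r["type"], per_stratum=5)
```

Note that a sample drawn this way over-represents rare call types, so the raw match percentage is no longer an estimate of the match rate over real traffic unless each type's result is weighted by its share of the traffic.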

• 2 x Penny Arcade Fight Club Champion A fixed point in space and time Registered User regular
If you are performing a full replication test at the end, then your iterative tests are really a matter of testing efficiency vs. risk. Rather than focusing on a number for a random sample, I'd focus more on actually writing out the scenarios you intend to cover and explicitly testing them. In particular, what you need to pay attention to are not just happy-path calls but the trickier or more esoteric ones that may have caused problems in the past but absolutely need to be covered. For example, if the vast majority of calculations involve integers but you have to support decimals, test how it handles non-terminating values or DIV0 errors.

Otherwise all you are doing is fishing, so you might as well run as many sample comparisons as you can afford as often as possible. Which actually isn't a bad idea...

• Registered User regular
Do you have a full matrix of possible transactions? You should be testing the conditions, with a random sample as a final just-in-case test. If you haven't already researched and mapped the conditions to tests, do that now. A random sample will most likely just get 90% of the same sort of transaction and a few edge cases.

• Dr Edinburgh Registered User regular
schuss wrote: »
Do you have a full matrix of possible transactions? You should be testing the conditions, with a random sample as a final just-in-case test. If you haven't already researched and mapped the conditions to tests, do that now. A random sample will most likely just get 90% of the same sort of transaction and a few edge cases.

The full matrix of possible queries is large: 16,000,000 * 250 * 125 * 400, or roughly 2 × 10^14 combinations.

In practice, we are most concerned about the queries that happen most often, and we can afford a few failures around the edge cases to ensure the common use cases are covered.

• Registered User regular
Ok. One thing to look up is the all-pairs (pairwise) testing method. It's an algorithm that picks test cases so that every pair of condition values is covered, on the premise that the most likely failures come from a simple pair of conditions. It's a great tool to help narrow down conditions like that. Seriously, 16 million variables with different handling for each value, or is it just a numeric?
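To make the all-pairs idea concrete, here is a small greedy sketch. It is not any particular tool's algorithm, just one simple way to do it: list every pair of values that needs covering, then keep picking the candidate test case that covers the most still-uncovered pairs. It enumerates the full product of values as candidates, so it is only practical for small parameter sets. The parameter names in the usage example are invented.

```python
from itertools import combinations, product


def all_pairs(parameters):
    """Greedy pairwise test selection.

    parameters -- dict mapping parameter name to a list of hashable values.
    Returns a list of test cases (dicts) such that every pair of values
    from two different parameters appears together in at least one case.
    """
    names = list(parameters)

    # Every (parameter, value) pair combination that must be covered.
    uncovered = set()
    for a, b in combinations(names, 2):
        for va, vb in product(parameters[a], parameters[b]):
            uncovered.add(((a, va), (b, vb)))

    # Candidate pool: the full cartesian product (small inputs only).
    candidates = [
        dict(zip(names, values))
        for values in product(*(parameters[n] for n in names))
    ]

    def pairs_of(case):
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    tests = []
    while uncovered:
        # Take the candidate that covers the most pairs not yet covered.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        tests.append(best)
        uncovered -= pairs_of(best)
    return tests


# Usage with made-up parameters: 2 * 3 * 2 = 12 full combinations.
params = {
    "method": ["GET", "POST"],
    "format": ["json", "xml", "csv"],
    "auth": ["none", "token"],
}
cases = all_pairs(params)
```

One caveat: a pairwise set can never be smaller than the product of the two largest value counts, so it only helps when there are more than two parameters. Two inputs with 4,000 values each would still need at least 16,000,000 cases on their own.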

• Dr Edinburgh Registered User regular
schuss wrote: »
Ok. One thing to look up is the all-pairs (pairwise) testing method. It's an algorithm that picks test cases so that every pair of condition values is covered, on the premise that the most likely failures come from a simple pair of conditions. It's a great tool to help narrow down conditions like that. Seriously, 16 million variables with different handling for each value, or is it just a numeric?

2 variables with 4000 categorical inputs for each.