# Excel help: calculating variance of conditionally selected cells

Registered User regular
edited April 2008
I have a massive array of data points that can have both positive and negative values, so a snapshot of the data may look like this:
0.043 0.121 -0.213 -0.331 0.227
0.121 0.242 0.121 -0.003 -0.057
-0.211 -0.243 -0.001 -0.002 0.278


The entire dataset consists of a bit over 100 columns and 30k rows. Each column represents a case in the sample set, and the rows are measurements. For each row, I need to calculate the mean and the variance of all negative values, and the mean and the variance of all positive values. Separately. In motherfucking Excel.

Mean is easy, because I can use the SUMIF() and COUNTIF() functions. But there is no conditional variance function in Excel. So basically I need to write a formula to calculate the variance of all cells that fulfill given criteria (either "<0" or ">0"). Any advice on how to do this would be greatly appreciated, especially if I get the advice fast.

edit: I realize a workaround to this is to split the table into two separate tables, one where all positive values are replaced with empty cells, and one where all negative values are similarly removed. I don't want to resort to this, however, because it's really not a feasible way to do what I'm doing in the long run.

Bliss 101 on

## Posts

• Registered User regular
edited April 2008
Use the DVAR function. It calculates variance based on criteria.

http://techonthenet.com/excel/formulas/dvar.php

Edit: though on second thought that may not be very feasible with your data either since you have so many columns. As far as I can tell from quick testing it only works on one column at a time.

Daenris on
• Registered User regular
edited April 2008
Yeah, to use DVAR I would have to transpose the entire table so that I could use each measurement as the field parameter for DVAR, and that is impossible because Excel can't handle 30k columns.

Bliss 101 on
• Registered User regular
edited April 2008
Well... I just found something interesting:
http://www.excelbanter.com/archive/index.php/t-119575.html

Apparently if you enter something like:
=SUMPRODUCT(D2:D47,D2:D47,--(D2:D47>0))

It will give you the sum of the products of the two columns where the condition is true. So you can get the sum of squares on a column/range only where they're positive, get the sumif only where they're positive, get the number with countif, and calculate variance. Possibly... haven't actually tested it out yet but the conditional bit at least definitely works, which I've never seen before.

Edit: And I just did a quick test using this method and calculating the variance with all values in one range, and just using Excel's variance function and calculating it based on two separate columns of positive and negative, and the result is identical.

So
sum (s) =sumif(range,">0")
sum of squares (ss) =sumproduct(range,range,--(range>0))
sample size (n) =countif(range,">0")
true variance = (ss-(s*s/n))/n
estimated variance (same as Excel's var function) = (ss-(s*s/n))/(n-1)


Daenris on
• Registered User regular
edited April 2008
if this is still a problem, i can maybe write a function for you tonight some time. I spend oodles of time in vba

Zonkytonkman on
• Registered User regular
edited April 2008
Thanks! I'm gonna try Daenris's suggestion first thing in the morning. I'll whine and moan here if I can't get it to work.

Bliss 101 on
• Registered User new member
sum (s) =sumif(range,">0")