The new forums will be named Coin Return (based on the most recent vote)! You can check on the status and timeline of the transition to the new forums here.
The Guiding Principles and New Rules document is now in effect.

Excel help: calculating variance of conditionally selected cells

Bliss 101Bliss 101 Registered User regular
edited April 2008 in Help / Advice Forum
I have a massive array of data points that can have both positive and negative values, so a snapshot of the data may look like this:
0.043 0.121 -0.213 -0.331 0.227
0.121 0.242 0.121 -0.003 -0.057
-0.211 -0.243 -0.001 -0.002 0.278

The entire dataset consists of a bit over 100 columns and 30k rows. Each column represents a case in the sample set, and the rows are measurements. For each row, I need to calculate the mean and the variance of all negative values, and the mean and the variance of all positive values. Separately. In motherfucking Excel.

Mean is easy, because I can use the SUMIF() and COUNTIF() functions. But there is no conditional variance function in Excel. So basically I need to write a formula to calculate the variance of all cells that fulfill given criteria (either "<0" or ">0"). Any advice on how to do this would be greatly appreciated, especially if I get the advice fast.

edit: I realize a workaround to this is to split the table into two separate tables, one where all positive values are replaced with empty cells, and one where all negative values are similarly removed. I don't want to resort to this, however, because it's really not a feasible way to do what I'm doing in the long run.

MSL59.jpg
Bliss 101 on

Posts

  • DaenrisDaenris Registered User regular
    edited April 2008
    Use the DVAR function. It calculates variance based on criteria.

    http://techonthenet.com/excel/formulas/dvar.php

    Edit: though on second thought that may not be very feasible with your data either since you have so many columns. As far as I can tell from quick testing it only works on one column at a time.

    Daenris on
  • Bliss 101Bliss 101 Registered User regular
    edited April 2008
    Yeah, to use DVAR I would have to transpose the entire table so that I could use each measurement as the field parameter for DVAR, and that is impossible because Excel can't handle 30k columns.

    Bliss 101 on
    MSL59.jpg
  • DaenrisDaenris Registered User regular
    edited April 2008
    Well... I just found something interesting:
    http://www.excelbanter.com/archive/index.php/t-119575.html

    Apparently if you enter something like:
    =SUMPRODUCT(D2:D47,D2:D47,--(D2:D47>0)) 
    
    It will give you the sum of the products of the two columns where the condition is true. So you can get the sum of squares on a column/range only where they're positive, get the sumif only where they're positive, get the number with countif, and calculate variance. Possibly... haven't actually tested it out yet but the conditional bit at least definitely works, which I've never seen before.

    Edit: And I just did a quick test using this method and calculating the variance with all values in one range, and just using Excel's variance function and calculating it based on two separate columns of positive and negative, and the result is identical.

    So
    sum (s) =sumif(range,">0")
    sum of squares (ss) =sumproduct(range,range,--(range>0))
    sample size (n) =countif(range,">0")
    true variance = (ss-(s*s/n))/n
    estimated variance (same as Excel's var function) = (ss-(s*s/n))/(n-1)
    

    Daenris on
  • ZonkytonkmanZonkytonkman Registered User regular
    edited April 2008
    if this is still a problem, i can maybe write a function for you tonight some time. I spend oodles of time in vba

    Zonkytonkman on
  • Bliss 101Bliss 101 Registered User regular
    edited April 2008
    Thanks! I'm gonna try Daenris's suggestion first thing in the morning. I'll whine and moan here if I can't get it to work.

    Bliss 101 on
    MSL59.jpg
  • wtpdwtpd Registered User new member
    This site has a great download that shows Daenris's step 2 http://www.sumproduct.com/thought/multiple-criteria OR http://www.sumproduct.com/wp-content/uploads/2010/03/SP-Multiple-Criteria-Examples.xls
    Daenris wrote: »
    sum (s) =sumif(range,">0")
    sum of squares (ss) =sumproduct(range,range,--(range>0))
    [/b]sample size (n) =countif(range,">0")
    true variance = (ss-(s*s/n))/n
    estimated variance (same as Excel's var function) = (ss-(s*s/n))/(n-1)
    

This discussion has been closed.