Bliss 101
Registered User regular

I have a massive array of data points that can have both positive and negative values, so a snapshot of the data may look like this:

The entire dataset consists of a bit over 100 columns and 30k rows. Each column represents a case in the sample set, and the rows are measurements. For each row, I need to calculate the mean and the variance of all negative values, and the mean and the variance of all positive values. Separately. In motherfucking Excel.

Mean is easy, because I can use the SUMIF() and COUNTIF() functions. But there is no conditional variance function in Excel. So basically I need to write a formula to calculate the variance of all cells that fulfill given criteria (either "<0" or ">0"). Any advice on how to do this would be greatly appreciated, especially if I get the advice fast.

edit: I realize a workaround to this is to split the table into two separate tables, one where all positive values are replaced with empty cells, and one where all negative values are similarly removed. I don't want to resort to this, however, because it's really not a feasible way to do what I'm doing in the long run.

0.043 0.121 -0.213 -0.331 0.227 0.121 0.242 0.121 -0.003 -0.057 -0.211 -0.243 -0.001 -0.002 0.278

The entire dataset consists of a bit over 100 columns and 30k rows. Each column represents a case in the sample set, and the rows are measurements. For each row, I need to calculate the mean and the variance of all negative values, and the mean and the variance of all positive values. Separately. In motherfucking Excel.

Mean is easy, because I can use the SUMIF() and COUNTIF() functions. But there is no conditional variance function in Excel. So basically I need to write a formula to calculate the variance of all cells that fulfill given criteria (either "<0" or ">0"). Any advice on how to do this would be greatly appreciated, especially if I get the advice fast.

edit: I realize a workaround to this is to split the table into two separate tables, one where all positive values are replaced with empty cells, and one where all negative values are similarly removed. I don't want to resort to this, however, because it's really not a feasible way to do what I'm doing in the long run.

0

This discussion has been closed.

## Posts

http://techonthenet.com/excel/formulas/dvar.php

Edit: though on second thought that may not be very feasible with your data either since you have so many columns. As far as I can tell from quick testing it only works on one column at a time.

Daenrisonfieldparameter for DVAR, and that is impossible because Excel can't handle 30k columns.Bliss 101onhttp://www.excelbanter.com/archive/index.php/t-119575.html

Apparently if you enter something like: It will give you the sum of the products of the two columns where the condition is true. So you can get the sum of squares on a column/range only where they're positive, get the sumif only where they're positive, get the number with countif, and calculate variance. Possibly... haven't actually tested it out yet but the conditional bit at least definitely works, which I've never seen before.

Edit: And I just did a quick test using this method and calculating the variance with all values in one range, and just using Excel's variance function and calculating it based on two separate columns of positive and negative, and the result is identical.

So

DaenrisonZonkytonkmanonBliss 101on