Feb

23

There's lots of discussion of correlations on this site. In many ways, it's the big data of the financial world.

Yesterday, I was working on a pharmacoepidemiology syllabus and realized that in epidemiology, we now have the technological power and data access (in theory, at least) to follow entire populations of tens of millions of people. It's the age of big data, and of genomics and genomic markers. In such circumstances, do the precepts of what constitutes a cause change? Is biological plausibility still the test when the correlation of a genomic marker with a disease is moderately strong but there is no known mechanism (and there may never be one, since the marker may be a regulatory gene, or may merely lie close to the gene that has the effect)? Or are the means of making a causal inference unchanged?

The correlations reported here have provided a path to profits for some. But if the precepts of causality haven't changed, then correlations in the absence of other data don't provide much of a path; those sustaining losses from using the same correlations (assuming the events themselves are random) would simply be silent. Our perception would then be that the correlations have some meaning, and we would build upon them.
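To make the silent-losers point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not data from this site: the number of traders, the ten coin-flip trades each, and the rule that only traders with a net gain report their results.

```python
import random


def simulate_reporting_bias(num_traders=10_000, trades_per_trader=10, seed=0):
    """Hypothetical setup: every trader acts on the same 'correlation',
    but each trade is really a fair coin flip worth +1 or -1."""
    rng = random.Random(seed)
    results = [
        sum(rng.choice((1, -1)) for _ in range(trades_per_trader))
        for _ in range(num_traders)
    ]
    # Assumption: only traders who ended with a net gain speak up.
    reported = [r for r in results if r > 0]
    overall_mean = sum(results) / len(results)
    reported_mean = sum(reported) / len(reported)
    return overall_mean, reported_mean


overall_mean, reported_mean = simulate_reporting_bias()
print(f"mean result, all traders:       {overall_mean:.2f}")
print(f"mean result, reporting traders: {reported_mean:.2f}")
```

The average across all traders sits near zero, while the average among those who report is clearly positive, even though no trade had any edge.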

I'm sure I'm missing something here, but I'm also confident that those on the list can provide some needed direction.

Steve Ellison writes: 

We have sometimes discussed here the problem of multiple comparisons. If one looks for enough things in the same set of data, the odds of finding something that appears to have p < 0.05, but actually occurred only by chance, increase dramatically.
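A rough sketch of the arithmetic, assuming the tests are independent and that no real effect exists in any of them (real comparisons on the same data are usually correlated, so treat this as an illustration only). The function name is hypothetical.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent tests
    shows p < alpha purely by chance, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** num_tests


for num_tests in (1, 10, 20, 100):
    prob = chance_of_false_positive(num_tests)
    print(f"{num_tests:>3} tests: {prob:.1%} chance of at least one spurious result")
```

Under these assumptions, a single test has a 5% chance of a spurious result, but the chance passes 50% somewhere between 10 and 20 tests.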

