Medicine by The Numbers

Aaron F Struck, MD

10 September 2013

University of Wisconsin Department of Neurology

 My father, the accountant, once told me, “I don’t get this obsession with graphs in the boardroom—just give me the numbers.” This typifies one way of understanding data. Whether the data is generated in a man-made numerical paradigm for keeping track of resources (accounting) or as a set of mostly self-consistent equations derived from theory and experiment (engineering), this understanding of data is at the forefront. If the balance sheet doesn’t add up to zero, people lose their jobs. If lift is less than mass multiplied by gravitational acceleration, airplanes fall out of the sky.

 Newton laid out the basic laws of classical mechanics in 1687 in the seminal Principia, and Maxwell published his formulation of electromagnetism in 1862. With these powerful tools in hand the world appeared tractable and numeric. The industrial revolution and later the electronic revolution subjugated the natural world to the whim of man. These quasi-divine equations reflect nature so accurately that engineering no longer had to rely on empiricism and rules of thumb. With a few measurements and calculations, a person can determine if a bridge will stand. The success of these equations gives numbers an intellectual authority that persists to this day. The danger with numbers comes when they are used outside these contexts.

 The fortuitous finding, in the nascent years of science, that certain elements of the natural world can be summarized beautifully by mathematical equations led to the hubris that the same would hold true for other aspects of our world. To some extent this has been successful: using self-contained mathematical systems to keep track of large enterprises and populations allowed greater numbers of people to work together and gave rise to the nation state (money). At other times it continues to falter, e.g. economics (predicting how people use money).

 If one takes a quiet summer afternoon to reflect on why so much of the world is so easily understood with a few simple equations, it seems like evidence of a benevolent creator. But alas, no God is that kind, and in the late 1800s and early 1900s we find that even classical mechanics is not entirely correct. These findings of incompleteness/incorrectness led to the advent of the post-modern physics of quantum mechanics and relativity. These theories lost the elegance of Newton and Maxwell. They are full of inherent paradoxes and asymmetries and just plain ugliness and intractability. These are the sciences that birthed ideas like dark matter/energy (stuff that is there but is not), duality (everything is a wave and a particle depending on how you squint your eyes), and uncertainty (you can know where you are going or where you are, but not both).[1] Then, to really crash the party, Gödel comes along and shows that mathematics cannot even prove its own consistency. Post-modern physics and mathematics have led to exciting frontiers in epistemology, with paradoxes abounding, but still they provide a set of principles and equations that let the initiated build lasers and transistors. Even the properties of unobserved particles can be predicted with great accuracy and then confirmed by experiment. For example, quantum electrodynamics can predict the magnetic moment of an electron so accurately that the error compared to experiment is akin to a hair’s width in the distance between NYC and LA. This success bred competitors in the biologic, medical, and social sciences. But as of yet the great successes in these fields are not in the form of equations and numbers. Evolution and natural selection do not make direct numerical predictions. Even DNA, with its beautiful four-base data storage system, does not function in any easily predictable manner. Changing a single codon in one patient could result in an intractable disease, while in another be asymptomatic.

 There are so many factors between DNA and phenotype in biology that even a single step like post-translational protein folding is a complex problem requiring advanced computing to model accurately. There is another way of understanding data that is not as strictly numerical or algorithmic. This is the way the biologic and medical sciences have progressed. Numerical data is used, but with the understanding that it is limited, because there is no fundamental mathematical heuristic underpinning the numbers. Numbers in medicine and biology need to be understood for what they are: an oblique reflection off a wavy pond, not a machined mirror.

 Data and numbers in medicine need to be viewed not only in the prosaic linear manner of the accountant and engineer; we do not have the luxury of the Laws of Motion or Maxwell’s equations. We must see data in a synthetic way. This way of looking at data is not unique to the biologic fields; it is the way of understanding that led to the initial discoveries in classical and modern physics. Einstein did not just see a small difference between the calculated and observed orbits of Mercury; he saw relativity. Planck did not just see that by classical mechanics the world should be a cold lifeless place; he saw that energy could only exist in finite quantities. This second way of viewing data is not to see data as a set of numbers; it is to look past the numbers and see meaning. Example: two chairs facing each other in the center of the room. The type A Joe Friday person sees two chairs facing each other in the center of the room. The type B absent-minded professor sees the summit at Yalta.

 In medicine both methodologies, the just-the-facts-ma’am approach and the jump-to-meaning approach, have utility and need to be mastered by the astute clinician. But first one needs to understand the numbers we have at our disposal. In medicine quantified data abounds. We have lab values, numbers from studies (RCTs, case series, etc.), semi-quantitative numbers (pain scales), vital signs; the list of numbers is impressive. But not all numbers are created equal.

 The first category is numbers derived from studies. In this category, randomized controlled trials are given supremacy (as they should be). Every resident has experienced the flush of uncertainty as a faculty member (who probably was part of the study) asks what percentage of patients benefited from something or other in some trial done 20 years ago. In neurology the study most often asked about is the NINDS rt-PA stroke trial. The resident is expected to know the following: a modified Rankin Score of 0–1 at 3 months post-stroke was achieved by 26% of the placebo group and 39% of the experimental group.[2] The idea behind knowing this specific piece of data is to have a quotable figure for your patients in the heat of a new-onset stroke.

 A quotable number can have great advantages in talking with a patient. It is a tangible piece of evidence that the physician has knowledge of the disease. The use of definite numbers presumes an exactness that fills the physician with authority. Numbers give the physician and patient a sense of confidence and control over an event that is by its very nature stochastic and at best partially remediable. The number is often presented as a little nugget of truth in the swirling chaos of the emergency room, such as: if you get this clot-busting drug, you have a 13% better chance of having a good outcome.

 Unfortunately, that is not a true statement. The true statement would read more like: we are 95% sure that between 1.3% and 26% of people in a theoretic population that does not really exist, but shares some characteristics with you that we think are important, had a better functional outcome at 3 months.
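To make the shape of that statement concrete, here is a minimal sketch of where a 95% confidence interval for a difference in proportions comes from. The arm sizes below are assumptions for illustration (a roughly equal split of the NINDS subjects), and the simple Wald interval is not the analysis the trial actually published, so the endpoints will not match the quoted 1.3%–26% exactly.

```python
import math

# Hypothetical arm sizes, assumed for illustration only (not from the paper).
n_placebo, n_tpa = 312, 312
p_placebo, p_tpa = 0.26, 0.39          # proportion with mRS 0-1 at 3 months

diff = p_tpa - p_placebo               # observed absolute benefit, ~0.13

# Standard error of the difference of two independent proportions,
# then a simple 95% Wald interval (diff +/- 1.96 * SE).
se = math.sqrt(p_tpa * (1 - p_tpa) / n_tpa
               + p_placebo * (1 - p_placebo) / n_placebo)
lower = diff - 1.96 * se
upper = diff + 1.96 * se

print(f"{diff:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The point the interval makes is the essay's point: the single quotable "13%" is the midpoint of a wide band of plausible effects, and the band widens further as the trial's sample shrinks.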

 People have a difficult time quantifying risk in the best of circumstances, and explaining the data in a meaningful way is complex and dependent on the level of sophistication of both physician and patient. In some ways it is more honest to use more general terms: we have found that some patients benefit from this treatment, this has been studied, and most experts believe it to be true. Giving an exact percentage is misleading because it suggests a precision of knowledge that does not exist. Just as on the nightly news we all cringe when the broadcaster gives the one-liner “Science shows…” We know that “Science” does not exist; what exist are various methodologies to test hypotheses, all with their own inherent flaws and with incredibly varying levels of credibility. A survey showing a correlation between poverty and domestic violence is not science in the way electromagnetism is science.

 Electromagnetism has an underlying set of heuristics that has sustained every experimental challenge thrust its way. We in medicine do not have the unbending girders of first principles. Medicine has simple hypotheses with limited generalizability and the gift of hyperbolically time-consuming and expensive experiments to test even these anemic hypotheses. The reality is that prior to RCTs medicine was based on well-meaning conjecture from astute observations and animal experiments. It had its successes (phenobarbital) and some setbacks (post-menopausal estrogen replacement).

 The RCT brought experimentally validated treatments to clinical medicine and has moved medicine into a scientific age, but ultimately it is limited. Without a system of underlying principles, every hypothesis (clinical situation) needs to be tested, and there are simply not enough patients or time in the world to test every clinical situation to statistical satisfaction. It is also unlikely that medicine will ever have elegant underlying principles that negate the need for overwhelming empiricism. The solution will take the form of computer models of the human body and of populations. These models will allow the majority of experiments to run in a theoretical space, with physical experimentation used for validation, much the same way we narrow the hypotheses of human disease with animal models. Despite all their weaknesses, RCTs provide the most reliable numbers we quote to our patients. Numbers derived from case and surgical series, without the benefits of blinding and randomization, are even more tenuous.

 The other large set of numbers that physicians encounter is the day-to-day data inherent to patient management. These lab values, vital signs, intracranial pressures, visual acuities, pain scales, and MRC strength numbers come with their own set of problems. The errors in these types of values break down into two general categories: systematic error and random error. Systematic error is an error of method or equipment. When the arterial line pressure differs from the cuff pressure by some fixed amount, clearly something in the process of measuring the blood pressure by the two methods is different, and this results in a fixed difference. One of them is probably closer to the true blood pressure and the other has some fixed systematic error, such as a poorly calibrated transducer. Another example would be drawing blood from an IV line diluted with normal saline. This will give inappropriate electrolyte values and is an error of process. These errors are usually apparent to the attentive physician, but can be disastrous to the distracted doctor working reflexively off of numbers.

 The other type of error is more insidious: the inherent random error in the test. This is the difference in sodium levels that can be obtained by running the test repeatedly on the same sample of serum. These kinds of errors are quantified by the coefficient of variation (CV), defined as the standard deviation divided by the mean, usually expressed as a percentage. For most laboratory tests this level is <10%, and may be <5% for many well standardized tests.[3] While this represents a high level of precision, it still means that small, or even somewhat large, variations in the reported value may be within the margin of random error. This is especially important when trending data such as sodium levels or hemoglobin, or when a result is borderline. For example, in a large study the intraperson CV for hematocrit was 2.49%.[4] This means that to obtain a hematocrit three standard deviations from the prior test, the repeat test has to differ by 3.74 from a baseline of 50; drops of less than 3.74 are within the random error of the test. For a test with a CV of 10%, a 30% change from baseline would be needed to have the same level of confidence in an actual change. The point being that even standardized lab values can have substantial random error, not to mention systematic errors (like drawing a drug level not at trough), and need to be interpreted within that context.
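The 3.74 arithmetic above can be sketched in a few lines. The function name and the three-standard-deviation convention are just this essay's worked example, not a laboratory standard:

```python
def change_threshold(baseline, cv, n_sd=3):
    """Smallest change from baseline exceeding n_sd within-person SDs.

    cv is the coefficient of variation (SD / mean) as a fraction,
    so one SD at this baseline is cv * baseline.
    """
    return n_sd * cv * baseline

# Hematocrit: CV 2.49%, baseline 50 -> a change of about 3.74 is needed
print(change_threshold(50, 0.0249))

# A test with CV 10% needs a change of about 30% of baseline
# to clear the same three-SD bar
print(change_threshold(100, 0.10))
```

Anything smaller than the returned threshold is plausibly just the assay talking, which is the caution the paragraph above urges when trending values.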

 Numbers provide a structure to daily life that is inculcated from a young age. From before grade school we become attached to numbers, like when Sesame Street starts. The concept of attaching a numerical value to objects and services is so deeply ingrained in our psyches that money seems as fundamental and physically real as trees or rivers. The success of equations in the physical sciences reinforced the power of numbers in the modern mind, but in medicine we do not deal with these types of numbers. We do not have the luxury of exactness that comes from a man-made valuation system like money, nor do we have the gift of robust quasi-universal equations that can provide answers in the absence of empiricism. In medicine we have a lot of uncertainty, from the empirical basis of our knowledge to the daily measures we take of our patients. This uncertainty does not have to be paralyzing, but it is important to recognize. We cannot make decisions on numbers alone. We must not fail to recognize that things that resist quantification can still have value (take evolution, for example, or the clinical impression of a “sick child”). It also means that concepts that are not naturally quantifiable, like quality and pain, should perhaps be spared false quantification. It also means that knowing numbers should not be mistaken for knowledge or expertise. I would not pick my surgeon based only on her quotable knowledge of studies or even her surgical outcomes. It means that memorizing a patient’s last 8 hemoglobins and last 4 sets of vitals still won’t tell you if the patient is bleeding.

[1] Also transistors, lasers, MRIs, and atomic energy/bombs

[2] N Engl J Med. 1995 Dec 14;333(24):1581-7

[3] Wians FH. Clinical Laboratory Tests: Which, Why, and What Do the Results Mean? Lab Medicine. 2009;40:105-133.

[4] Lacher DA, Barletta J, Hughes JP. Biological Variation of Hematology Tests Based on the 1999-2002 National Health and Nutrition Examination Survey. National Health Statistics Reports. 2012;54:1-12.

This entry was posted in Medical Knowledge, Patient Care, Practice-Based Learning and Improvement, Systems-Based Practice.

3 Responses to Medicine by The Numbers

  1. Justin A. Sattin says:


    First, thanks for writing! I really was hoping that I wouldn’t be the only one contributing here.

    This is a substantial piece and there are many different aspects to potentially comment on. I’ll make just a few that pertain to tPA data and the consent process.

    First, there are two references that I’ll be using in my forthcoming Continuum piece regarding the ethics of acute stroke care that I think are pertinent here. One is an examination of various decision support tools that can ostensibly be used to decide whether to administer the drug or not and to help explain the pros and cons to patients and families. Per the authors, these tools’ development is lacking in rigor:

    Flynn D, Ford GA, Stobbart L, Rodgers H, Murtagh MJ, Thomson RG. A review of decision support, risk communication and patient information tools for thrombolytic treatment in acute stroke: lessons for tool developers. BMC Health Serv Res. 2013; 13: 225.

    The other is a really interesting qualitative study of how patients and doctors perceive the consent process and what their expectations are. Face to face communication, shaping of the decisions by the physicians (as opposed to “here’s the data–you decide”), incremental provision of information, and communication tailored to the individual patient and circumstances emerged as important values:

    Murtagh MJ, Burges Watson DL, Jenkings KN, et al. Situationally-sensitive knowledge translation and relational decision making in hyperacute stroke: a qualitative study. PLoS One. 2012;7:e37066.

    Your point that “We are 95% sure that between 1.3% and 26% of people in a theoretic population that does not really exist, but shares some similar characteristics to you, that we think are important, had a better functional outcome at 3 months.” is well-taken. I’ll just quibble that the population (sample, actually) from which these results were derived was, in fact, real and not theoretical. It consisted of the ~ 660 subjects in the NINDS tPA trials. I think the larger point you’re making is that the data are derived from a sample of the acute stroke population and so the results apply best to that population; how an individual patient will fare is of course impossible to state with certainty.

  2. Khalid says:

    Aaron, amazing writing, although I had to use the dictionary!! But I totally agree that we have to move and look beyond the numbers. If you just estimate how many decisions you make in your daily practice that are not dependent on an RCT or any data per se, I think I would be right to say more than 80%. Which I think is what gives the field of medicine its “Art”: that humans and life can never be judged or calculated in numbers, adding to that spirituality and values, which essentially cannot be measured!

  3. Aaron Struck says:

    I agree, maybe theoretic is not the best term. An RCT is meant to randomly select subjects from a larger population; the results can then be generalized to that larger population. Why I say theoretic is that patients in RCTs, even well designed ones, are often not the exact same patients to whom the results are applied. For example, most RCTs are carried out at one or two, or maybe even several, academic medical centers, generally in urban areas. These patients are also willing to consent to be subjects in an experimental study. While these may be trivial differences to the generalizability of some studies, in others they may not be. We for the most part just assume that they are insignificant, but maybe sometimes they are not. Maybe a certain group of patients, like African Americans, who for historic reasons are less likely to participate in experiments, or Native Americans, who are typically not near urban centers, respond differently to certain drugs (we know it is true for certain heart failure medications). My point is that a study population is not just defined by inclusion/exclusion criteria; it is also defined by where, on whom, and even when (prior to the advent of a certain standard treatment or not) the study is performed. And these second parameters may have an effect on the generalizability of the study.
