Thursday, September 23, 2010

1.14 The Scientific Method - What? Why a Need? The Science Civilization

Physician's Notebooks 1
Scientific Method, You, and the Science-Civilization
Update 01 Feb. 2019
The two main sections of this chapter are the example of the scientific method in action, just below, and the Relevance of the Scientific Method to Your Life.
Readers should know the Scientific Method of establishing facts. A scientist is anyone who uses the method. A non-scientist believes what seems reasonable when received in writing or in being told by those in authority. Whether it is a belief in religion or ideology or technology or any so-called fact, the approach that separates the scientist from the non-scientist is that the scientist has to be shown and have it proven, not told. The scientist is a doubter of everything until it has been put to test by the Scientific Method.

I put the Scientific Method into 4 steps:
1. Observation and recording.
2. Within particular observation, seeing a pattern and expressing that pattern as tentative fact or rule pending testing. This is called “hypothesis.”
3. Testing the hypothesis by experiment: making predictions that follow from assuming the hypothesis to be correct, comparing the experimental result with what you would expect if the hypothesis is not correct (null hypothesis comparison), and using statistical method to ascertain that the difference between the two is a real difference and not one due to chance.
4. Either discarding the hypothesis because it does not test out or confirming it by repeat testing, which may cause the hypothesis to be modified or further added to.

 Once a hypothesis has been firmly proven by testing, it becomes a scientific theory, and after many years of no exception to its rule, a scientific law. Take gravity. We observe repeatedly that objects fall from heights. One of us, Sir Isaac Newton, makes a hypothesis, to explain objects falling, that Earth, a massive body, and smaller objects exert a mutual attraction based on their respective mass and distance from each other. This hypothesis gets tested over years by taking various objects of various mass and seeing whether or not they drop to Earth and also by astronomical observations. There are almost no exceptions. Thus the theory of gravity develops. In the early 1900s Albert Einstein modifies that theory to explain small exceptions in the data, and his modification, the theory of general relativity, proves out and is the one we believe today. I leave out much detail because I wish to make the point and show the Scientific Method, not to prove the theory of gravity within general relativity. Today, our successful technology depends on the theory of gravity according to general relativity and every day its usefulness is proven as our rockets probe deeper into outer space.
(To the reader: It is not necessary to read the following smaller print section unless you need to know how scientists conduct and interpret experiments)
   Now allow me to go into the nitty-gritty of scientific method so a reader can understand how to use the tools. I take an experiment a reader can perform. The question is: Will drinking a cup of caffeine coffee increase one's heartbeat rate per minute or not? The non-scientist may say “Of course it will! Everyone knows that!” But the scientist is a doubting Thomas who has to be shown by proof.
   Let us take an easily determined measure: heart rate (HR), the number of heartbeats in 60 seconds. The question is sharpened to: “What is the effect on HR of one cup of caffeine coffee?”
   Before plunging into the experiment we want to consider our method of obtaining data so as to really answer the question posed by the experiment. First, we do not want to depend on one or two HR measurements to draw a conclusion. (Also not to depend on only one experiment as I do here, but I am just using this as one example) This is basic to every experiment because errors may occur in each series of the same measurement. In measuring HR the examiner may lose accurate count over 60 seconds, a heart may skip beats, an HR may vary. If we rely on just a few HR measurements, the probability of error is high. So we do multiple same-measurements – here 5 repeated HR at one go. 
   To get the average, or mean value, of the set of measurements (here 66, 67, 70, 66 and 65), sum them, 66 + 67 + 70 + 66 + 65 = 334, and divide by the number of measurements (5). The mean = 334/5 = 66.8.  Since each heartbeat count is a whole number, we round off the mean to 67 for convenience and use it as the baseline mean of HR under the resting condition against which all other HR will be compared.
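For a reader with a computer at hand, the arithmetic above can be checked in a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and rounding the mean to a whole number simply follows the convention used in this chapter.

```python
# Five baseline heart-rate counts (beats per 60 seconds) taken from the text.
baseline_hr = [66, 67, 70, 66, 65]

total = sum(baseline_hr)                  # sum of the five counts
mean_hr = total / len(baseline_hr)        # exact mean of the counts
rounded_mean_hr = round(mean_hr)          # whole-number mean used as the baseline

print(total, mean_hr, rounded_mean_hr)
```

The exact mean is kept alongside the rounded one so that the size of the rounding step stays visible.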

The Control: In setting up experimental conditions, we want to be sure the difference between the average HR measurement after drinking caffeine coffee compared to after not drinking caffeine coffee is due to the caffeine in the coffee and not to other factors. To mention a few: the act of drinking any fluid, especially a hot one; the effect of other substances in the coffee; the psychological effect of just knowing one is drinking a brown fluid that may be coffee with caffeine even though it contains little or no caffeine (decaf). We could be very rigorous and set up a series of observations to control for the effects of all factors, but to keep it simple I describe 3 experimental conditions that will answer the experimental question and include a control for the most important confounding factors.

Baseline Measurement of HR at Rest (its data, already shown above, gave an average HR of 67 bpm): Our baseline measurement is most important because it will serve for comparison. So we want to set the baseline where the experimental subject will not be affected by the many factors other than caffeine in coffee that may influence HR. We want the subject to be at rest without worry or tension. And we want no other food or drink or drug in his body. So we choose the early part of the day, shortly after awakening and while still in bed, when his mind will be at ease and his stomach empty for several hours.

Setting proper conditions for measuring baseline HR is the experimental protocol. (It is the rule of conditions for measurement, how often you shall measure HR and the interval between each measurement) The idea is to reproduce the same conditions with each measurement so that if you have a confounding effect it will be canceled out because it is always the same. As part of the protocol, the set of HR is taken twice: at 2 minutes and again 30 minutes after drinking 1 cup of whatever the hot fluid. The 2-minute measurement controls for the effect purely of the act of drinking of hot fluid while its contents are mostly in the stomach and not in the blood circulation, and after 30 minutes measurement should show the effect from full absorption of caffeine into the blood circulation.
   Assuming there is little or no difference at 2 minutes, the 30-minutes-after is used as data for the experimental conclusion. If a significant effect on HR is observed at 2 minutes, the experiment needs to be re-thought. Assuming this is not the case, follow the protocol to do 5 consecutive 60-second counts of heartbeat heard by stethoscope over left front of the chest and spoken into a tape recorder. (Assuming you do it on yourself, speaking into tape removes the potential effect due to the exercise of writing) The protocol calls for your 5 consecutive HR counts to be initiated after at least 5 minutes lying flat and thinking calming thoughts (eg, “I am in the best, safest place I could be and doing a good thing”).
   Adhering to this protocol of obtaining data, next set up the 3 measurement situations:

1) Baseline Heart Rate after drinking 1 cup pleasantly hot water. (Shows effect of the act of drinking and also of hot water on an empty stomach)  For simplicity, we shall use the above control data for the hot water drinking part of the experiment.

2) Heart Rate after drinking 1 cup of hot decaf coffee. (Shows the effect of the non-caffeine substances in coffee and the psychology of the effect)

3) Heart Rate after caffeine coffee.

Note that the experimental situations cannot in practice immediately follow one another but must be done on different days (if you wish complete control for time of day). To also control for the potential effect of each different day of the experiment, we should repeat the experiment several times over days and average the results; we will assume this has been done.

Also to comment here, for simplicity I am describing this experiment as being done on 1 person. But, in an actual experiment, one should have a test population of 10 or more persons to control for individual response to caffeine.

Before dealing with the experimental result and interpretation let us consider what the range of measurement tells us. Recall we counted 66, 67, 70, 66 and 65 heartbeats for baseline HR and calculated the average rounded off to 67. It is always useful to look over a series of measurements at a glance to get an idea of the stability of the data. A glance at the five measurements shows a range of 65 to 70. The range tells more about the consistency of measurement than the mean alone. Note that a similar mean – 67 – could have been obtained by either of two extremes of range (eg, very similar single measurements 66, 67, 66, 67 and 67 with range 66 to 67, or very dissimilar single measurements 60, 75, 64, 73 and 63 with range 60 to 75). The wider the range, the less useful the data, because a wide range in a series of experimental measurements suggests error or a confounding effect from an unknown factor. A narrow range suggests a good set of baseline measurements. Most scientists, presented with a wide range of baseline measurements, will stop the experiment and rethink the protocol, assuming that somewhere an error factor has entered.
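The point that the same mean can hide very different ranges can be illustrated with a short Python sketch. The three sets of counts are the ones quoted above; the labels and the helper function summarize are illustrative names only.

```python
# Three sets of five HR counts from the text; each has a mean that rounds to 67.
hr_sets = {
    "observed baseline": [66, 67, 70, 66, 65],
    "narrow example": [66, 67, 66, 67, 67],
    "wide example": [60, 75, 64, 73, 63],
}

def summarize(values):
    """Return (rounded mean, lowest value, highest value, spread)."""
    low, high = min(values), max(values)
    return round(sum(values) / len(values)), low, high, high - low

for label, values in hr_sets.items():
    mean, low, high, spread = summarize(values)
    print(f"{label}: mean {mean}, range {low} to {high} (spread {spread})")
```

All three sets report a mean of 67, while the spread between lowest and highest count runs from 1 beat to 15 beats.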

In dealing with sets of measurements when an experiment is relatively simple and informal and the result is not meant to be published, the experimenter may choose to discard an extreme value in a series of measurements, assuming it involved an error. Thus, with 6 HR measurements 66, 67, 70, 66, 65 and 92, you might be justified in assuming that the 92 was due to some error you could not control.
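To see how much a single stray count can pull the average, the following sketch compares the mean of the six counts above with and without the 92. It does not decide whether discarding the value is justified; it only shows the arithmetic, and the variable names are illustrative.

```python
# Six HR counts from the text, including the suspect value of 92.
with_outlier = [66, 67, 70, 66, 65, 92]

# The same counts with the suspect value left out.
without_outlier = [count for count in with_outlier if count != 92]

mean_with = sum(with_outlier) / len(with_outlier)
mean_without = sum(without_outlier) / len(without_outlier)

print(f"mean with 92:    {mean_with:.1f}")
print(f"mean without 92: {mean_without:.1f}")
```

One extreme count out of six moves the mean by about 4 beats per minute, which is more than the difference between the hot water and decaf means reported later in this chapter.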

Now let us go to the experiment, compare data, and draw conclusions.

1)   Warm water only HR: 66, 67, 70, 66 and 65. Range 65 to 70.  Mean 67.

2)   Decaf coffee HR: 69, 70, 72, 65, and 74. Range 65 to 74. Mean 70.

3)   Caffeine coffee HR: 78, 79, 78, 80 & 81. Range 78 to 81. Mean 79.

The preliminary conclusion seems to show that drinking decaf coffee ups HR from a mean 67 to a mean 70, and then adding caffeine further raises the mean to 79. But a scientist would not accept this conclusion without statistical testing. Her argument would be: How do we know that these increases in HR are not due to chance variation in heartbeat rate due to fluctuation in rhythm within the heart muscle, not due to the drinking? Actually, were the experiment to be repeated many times, each time would give a different set, range and mean, and once in a rare case, even without drinking or without the caffeine, the mean HR might go as high as 79. A test of statistical significance will answer the question: What are the odds against the difference between the baseline control mean and the experimental mean being due to chance alone? The simplest tool for this is the Standard Deviation (SD).

Computation of the SD of a set of measurements will now be demonstrated using our HR baseline data, and its use in drawing a conclusion on cause and effect shown. How to obtain the SD from the set of measurements?
   a) Obtain the measurements by experiment, record the numbers in a row or column, sum them and obtain the rounded-off mean.
   b) Referring to the same set of measurements, subtract the mean from each measurement, square each difference, sum the squares, and obtain the mean of the squares by dividing the sum by the number of measurements. (Dividing by the number of measurements gives what statisticians call the population SD; many calculators divide by one less than that number, which gives a slightly larger value.)
   c) Obtain the square root of the mean of the squares. This number is the Standard Deviation of the set of measurements, and it is applied both above and below the mean.
   d) Finally, write the mean of the measurements with a + and – (Usually printed ±) followed by the SD. The range is then written by adding the SD to the mean and subtracting it from the mean, after rounding the SD to the same number of digits as the mean.

Let us now run the numbers. In condition 1, we do the computation step by step so that it may serve as the model to learn its general application to all experimental numbers. In 2 and 3, we give only the numerical answers.

   a)   Measurements summed to the rounded mean: 66 + 67 + 70 + 66 + 65 = 334 and 334/5 = 66.8, rounded to 67, the mean of the five HR.
   b)   Mean subtracted from each measurement, each difference squared, the squares summed, and the mean of the squares obtained: (66 – 67)² + (67 – 67)² + (70 – 67)² + (66 – 67)² + (65 – 67)² = 1 + 0 + 9 + 1 + 4 = 15, and 15/5 = 3 for the mean of the squares.
   c) The square root of the mean of the squares (use an electronic calculator or computer): the square root of 3 is 1.732, which rounds off to 2 and is written ±2 for 1 SD of the given set of measurements.
   d)  Now you may write the mean and SD of the given set of measurements and, by extension, the ranges of 1 SD, 2 SD, and 3 SD. The mean and 1 SD is 67±2, or a 1-SD range of 65 to 69; the mean and 2 SD is 67±4, or a 2-SD range of 63 to 71; and the mean and 3 SD is 67±6, or a 3-SD range of 61 to 73.
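Steps a) to d) for the hot water data can be followed line by line in Python. This is a minimal sketch that mirrors the hand calculation, including the rounding of the mean and of the SD to whole beats; the variable names are illustrative.

```python
import math

# Hot water (condition 1) HR counts from the text.
measurements = [66, 67, 70, 66, 65]

# Step a: sum the counts and take the rounded mean.
mean = round(sum(measurements) / len(measurements))

# Step b: square each difference from the mean, then average the squares.
squared_diffs = [(x - mean) ** 2 for x in measurements]
mean_of_squares = sum(squared_diffs) / len(measurements)

# Step c: the square root of that average is the SD; round it to whole beats.
sd = math.sqrt(mean_of_squares)
sd_rounded = round(sd)

# Step d: the 1-SD, 2-SD and 3-SD ranges around the mean.
ranges = {k: (mean - k * sd_rounded, mean + k * sd_rounded) for k in (1, 2, 3)}

print(squared_diffs, mean_of_squares, round(sd, 3))
print(f"{mean}±{sd_rounded}", ranges)
```

Keeping the unrounded SD in its own variable makes it easy to see how much the rounding from about 1.7 to 2 widens the ranges.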

2.  DECAF COFFEE: Mean and 1 SD is 70±3, range 67 to 73; for 2 SD, range 64 to 76; and for 3 SD, 61 to 79.

3.  CAFFEINE COFFEE: Mean and 1 SD is 79±1, range 78 to 80; for 2 SD, range 77 to 81; and for 3 SD, 76 to 82.
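The figures for all three conditions can be cross-checked with Python's standard statistics module. Two assumptions in this sketch should be kept in mind: statistics.pstdev divides by the number of measurements, as the hand method in this chapter does, but works from the unrounded mean; statistics.stdev divides by one less than that number, which is the form many calculators and textbooks use for small samples. The dictionary names are illustrative.

```python
import statistics

# HR counts for the three conditions, as listed in the text.
conditions = {
    "hot water": [66, 67, 70, 66, 65],
    "decaf coffee": [69, 70, 72, 65, 74],
    "caffeine coffee": [78, 79, 78, 80, 81],
}

summary = {}
for name, hr in conditions.items():
    mean = statistics.mean(hr)
    population_sd = statistics.pstdev(hr)   # divides by n
    sample_sd = statistics.stdev(hr)        # divides by n - 1
    # Store whole-beat values for comparison with the hand calculation.
    summary[name] = (round(mean), round(population_sd), round(sample_sd))
    print(f"{name}: mean {mean:.1f}, SD (divide by n) {population_sd:.2f}, "
          f"SD (divide by n - 1) {sample_sd:.2f}")
```

For these particular data both forms of the SD round to the same whole numbers, so the ranges quoted above do not depend on which form is chosen.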

How to Use SD for Making a Scientific Conclusion? In this case, how to decide whether or not the data show a significant difference between the results obtained under different conditions? Here we want to know: Do the means of 70 heartbeats a minute after drinking decaf coffee and 79 after caffeine coffee, compared to the mean of 67 after only hot water, allow us to draw the conclusion that drinking decaf coffee is the cause of the increased HR obtained and that the caffeine in coffee has an additional effect to further increase the HR?  The means – 67 for the warm water only; 70 for the decaf coffee; and 79 for the caffeine coffee – would seem to suggest such an additive effect. But is it a true effect due to substances in decaf coffee and caffeine coffee, or is it just a chance difference that happened to occur during this particular experiment, just as you may occasionally throw a very unlikely 6 and 6 on a pair of dice without its being a result of any particular cause other than the luck of the throw?
   In trying to answer this question, the scientist always starts off with a “null hypothesis” and his experiment is an attempt to disprove it, and by doing so to prove the opposite. “Null hypothesis” means we assume that the differences in the average measurements between the various experimental conditions in our protocol (Here warm water vs. decaf coffee vs. caffeine coffee) are not due to an effect of any of the factors as a cause of the differences in the numerical results, and that these differences can be most probably explained as being due to chance. Stated this way the question is reduced to a probability comparison. (What are the odds, based on the numerical differences, against the null hypothesis?)
   Recalling that the mean for the control HR (warm water only) was 67, it is obvious that had we obtained a mean of 67 under the experimental conditions as well, there would be no difference to explain and nothing in the data to contradict the null hypothesis, so the experiment would give no support to an effect of caffeine on HR. At the opposite extreme, if the experimental conditions resulted in a mean HR 2X the control (134 per minute), the odds against the null hypothesis would be so huge as to make almost certain the conclusion that the caffeine caused the HR increase. In fact, at such extremes, we do not really need to waste effort on statistical testing because the numbers speak for themselves. (In an actual experiment statistical tests are always done, even when results appear obvious, in order to assure a high certainty for the conclusion.) But in the usual case we are presented with numbers between the extremes that may impress an unsophisticated, unscientific mind but to a scientific mind need statistical confirmation. Here is where statistical analysis is best used and the question “What are the odds against the null hypothesis?” must be answered by the statistical method.
   By general agreement, if statistical method shows that a difference as large as the one observed would arise by chance alone no more than 1 time in 20, the experimental difference is considered statistically significant in favor of the experimental effect. This is usually expressed as a Probability or P value of 0.05 or less, meaning that if the null hypothesis were true, a difference this large would turn up by chance no more than 5 times in 100.
   Here is where the Standard Deviation is so useful. Probability mathematicians have shown that when measurements vary by chance around a mean in the common bell-shaped (normal) pattern, each SD from the mean includes a particular fraction of the values: 1 SD includes about 2/3rds or 68% of the values, 2 SD includes about 19/20 or 95% (leaving outside it the 0.05 of statistical significance) and 3 SD includes 997/1000 or 99.7%. Thus the importance of calculating the SD! Since statistical significance starts at 95%, a result 2 SD or more from the control mean is taken as strong evidence of a causative factor. (And the greater the number of SD above that, the stronger the evidence.) With only 5 measurements per condition, as here, this is a rough rule of thumb rather than a formal test.
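The percentages quoted for 1, 2 and 3 SD correspond to the normal (bell-shaped) distribution. Assuming that distribution, the fraction of values lying within k SD of the mean is erf(k/√2), which Python's math module can evaluate directly. The function name fraction_within is an illustrative choice.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SD of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = fraction_within(k)
    print(f"{k} SD: {inside:.4f} inside, {1 - inside:.4f} outside")

# The conventional P = 0.05 cutoff corresponds to about 1.96 SD rather than exactly 2.
print(f"1.96 SD: {fraction_within(1.96):.4f} inside")
```

The exact 95% boundary falls at about 1.96 SD, so using 2 SD, as this chapter does, is a slightly stricter and more convenient round number.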

CONCLUSION OF OUR HR EXPERIMENT: Now with the knowledge of the tools to obtain SD from measurements and knowledge of its statistical meaning we can draw scientific conclusions.

DRINKING PLEASANTLY HOT WATER: For 2 SD, 67±4, or a range of 63 to 71 heartbeats per minute, which includes about 95% of the chance variation in HR.

DECAF COFFEE: For 2 SD, 70±6, or a range of 64 to 76.  Conclusion: Since there is obviously a wide overlapping of the ranges for hot water and decaf coffee at 2 SD, we can say the null hypothesis is not contradicted by the experimental result: the difference between decaf coffee and hot water is no greater than could occur by chance variation alone. Therefore we conclude we have not shown an effect of decaf coffee on HR.

CAFFEINE COFFEE: for 2 SD, 79±2, a range of 77 to 81, which falls completely outside the 2-SD range of both the hot water and the decaf coffee result on HR. This means a P value less than 0.05 and we may conclude that the null hypothesis has been successfully contradicted and we have shown that caffeine coffee significantly speeds the heart rate and that the effect is almost certainly due to the caffeine and not coffee taste or color or heat.
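The comparison of 2-SD ranges used for these conclusions can be written as a simple overlap check. This sketch implements only the rule of thumb applied in this chapter (non-overlapping 2-SD ranges taken as a significant difference); a formal analysis would more usually apply a named test such as a t-test. The function name overlaps is illustrative.

```python
# 2-SD ranges (low, high) in beats per minute, as worked out in the text.
two_sd_ranges = {
    "hot water": (63, 71),
    "decaf coffee": (64, 76),
    "caffeine coffee": (77, 81),
}

def overlaps(a, b):
    """True when two closed ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

pairs = [
    ("hot water", "decaf coffee"),
    ("hot water", "caffeine coffee"),
    ("decaf coffee", "caffeine coffee"),
]
for first, second in pairs:
    verdict = "overlap" if overlaps(two_sd_ranges[first], two_sd_ranges[second]) else "no overlap"
    print(f"{first} vs {second}: {verdict}")
```

Only the hot water and decaf ranges overlap; the caffeine coffee range lies above both, which matches the conclusions stated above.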

The Relevance of the Scientific Method to Your Life: A knowledge of and facility in the Scientific Method can improve your life. I do not mean you must personally test every question. My advice is to start thinking scientifically. Instead of believing what you read on the Internet, in a newspaper, magazine or book, or what an authority figure says on radio, TV or in conversation, start off with a null hypothesis (ie, you do not believe that such and such is the cause of so and so until the null hypothesis has been shown to be highly improbable). Sometimes the showing is easy and does not need statistical analysis (eg, Gambling in Las Vegas is going to cause you unhappiness unless you are very lucky) but sometimes there will be an important question of cause (eg, Is human fossil fuel-burning causing global warming or climate change?) whose answer may impact on the quality and length of your life or of our civilization and that lends itself to testing. Many of these questions, of course, have been answered scientifically by other experimenters and published. Here is where the scientific literature comes in. The literature may be on the Internet, it may be a scientifically oriented, popular magazine like Scientific American or National Geographic, or, increasingly often for you as you become more knowledgeable and sophisticated, it will be a scientific journal you can find in a university library. In using experiments done by another, still keep a seed of doubt: An experimenter can be wrong and, occasionally, a faker. The key here is: no belief should be based on one experiment or one experimenter. If a truth is a truth, and if it is important, the experiments should have been repeated and the result confirmed by others repeatedly. The principle also goes for a negative experiment that seems to disprove an old tested belief. That is Science, which is to say Doubt, and you should be the doubting Thomas.

Let me end with the Science-Civilization. This would be an advanced society not so different from our present one, except there would be no dogmatic or ideological limits to our thinking. For example, it would have a government run purely on scientific principle. The starting basis of the laws and regulations and ideas of good behavior would be, as today, our historical experience, picking out the best and discarding the worst. Some of the Ten Commandments would be a useful basis, not on religious grounds or as unchangeable rules but, rather, on practical grounds. (Does a particular law or rule work for the good of people in our advanced society, or not work?)
   Take capital punishment – the death penalty? It would be possible and practicable, in a Science-Civilization, to carry out an experiment in the USA in which the law for the punishment of what is now a capital crime could be alternately changed at 10-year intervals in a control population. The data, in the prevention of the crime, rehabilitation of the criminal, and the overall good or harm that such change of the law would do to the persons living in the society, would then be published with a recommendation for change in the law. Then the recommendation would be voted on by the electorate and, if at least 60% approved, the changes would be expressed as a new law.

END OF CHAPTER.