INTRODUCTION TO STATISTICS
Presented by Dave Burhoe
January 25, 2001
Statistics and Social Fact (Truth)
The British statesman Benjamin Disraeli is often quoted as having said, more than a hundred years ago, that there are three kinds of lies: lies, damned lies, and statistics. How many of you have heard the axioms “Figures don’t lie, liars figure!” and “If you torture the data long enough, they’ll admit to anything!”?
One reason for this reputation is that statistics are gathered to answer a particular question posed by a researcher. He or she chooses certain characteristics of that question to ask a “sample” of people about. Because these characteristics, or variables, are selective (they look at certain aspects of the problem and ignore others), statistical results on the same topic can differ. This will be elaborated on below.
Thus, you can find statistics that validate or “prove” opposing points of view: that the average U.S. family is becoming richer and that it is becoming poorer; that incidents of crime are increasing in society, or in a selection of schools, and that they are decreasing; that the traditional family is both disappearing and returning; et cetera (Maier, 1999. 3rd ed.). None of these surveys has considered all the variables. Each has looked only at particular issues, a particular period of time, or a particular region.
Here are a few statistics on recent disasters and their results. What message do you get from these statistics?
|Date|Hazard|No. of deaths|$ damage|
|1992|Hurricane Andrew hits USA|74|$30B|
|1991|Floods in S. China|3,074|$15B|
|1991|Cyclone, Bangladesh|140,000|only $3B|
|1991|Civil uprising in Rwanda|500,000|what $ value?|
Source: Frances & Hengeveld 1998
The social science student must always be critical and ask himself/herself questions like: What do these statistics reveal, to whom, and why? To an engineer, what matters from the wind storm damage is that structures were not well built. He/she will look at the characteristics of structure and of building codes. The economist will look at the economic damage and characteristics of where money was spent and how to avoid such major costs. Strangely, disaster relief is economically considered a healthy contribution to the GDP (Gross Domestic Product).
To a sociologist, perhaps, the above statistics reveal that 22% of the loss of life shown results from natural disasters (which is horrendous), but that, even worse, 78% of the loss of life results from civil strife.
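The two percentages can be checked directly from the death tolls in the table above. The short Python sketch below only redoes that arithmetic; the grouping into "natural disasters" and "civil strife" follows the paragraph, and the variable names are chosen here for illustration.

```python
# Death tolls copied from the table above (Frances & Hengeveld 1998).
natural_disasters = {
    "Hurricane Andrew hits USA (1992)": 74,
    "Floods in S. China (1991)": 3_074,
    "Cyclone, Bangladesh (1991)": 140_000,
}
civil_strife = {
    "Civil uprising in Rwanda (1991)": 500_000,
}


def share_of_total(part, total):
    """Return part as a percentage of total."""
    return 100 * part / total


natural_deaths = sum(natural_disasters.values())
strife_deaths = sum(civil_strife.values())
total_deaths = natural_deaths + strife_deaths

natural_pct = share_of_total(natural_deaths, total_deaths)
strife_pct = share_of_total(strife_deaths, total_deaths)

print(f"Natural disasters: {natural_pct:.0f}% of the deaths shown")
print(f"Civil strife:      {strife_pct:.0f}% of the deaths shown")
```

Note that the percentages describe only the four events listed. A different selection of events would give a different split, which is exactly the kind of selectivity discussed earlier.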
Another important message one can draw from these figures ($ and deaths) is that the MDCs (More Developed Countries) have both the institutional and financial assets to manage growth, construction and death rates better than the LDCs (Less Developed Countries). A look at some population statistics and demographic characteristics illustrates this same discrepancy.
Show Population Age Structure Pyramid
One important use of statistics is to give a visual reminder of the issues we, as creatures that share this planet, are facing. The statistics of the MDCs demonstrate clearly that something works to control the problems, and if these secrets were shared equitably in the world, we could have the same results around the globe. It doesn’t matter if that sounds utopian. What is important is that this is a critical consideration for research in the social sciences. The statistics are valuable for illustrating the problem, but they are even more valuable as the starting point for finding or identifying the solutions.
Statistics are NOT as useful in themselves, however, for effecting social change. For example, showing youth the effects of smoking will not necessarily reduce youth smoking. From a purely statistical point of view, we could conclude: why waste the time and money? Just let nature take its course. So, statistics are just that: figures that give us a picture of the intensity of the problem so that action is taken (or that hide it so that action is not taken).
Historian Andrew Lang quipped that people use statistics like a “drunken man uses lampposts: for support rather than illumination.” And about 100 years ago, statesman Benjamin Disraeli said, “There are three kinds of lies: lies, damned lies, and statistics!”
For instance, the advertising claim that “Fallout toothpaste is recommended by 7 out of 10 dentists” is useless if only 10 dentists were asked out of the tens of thousands of dentists in North America.
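To see why ten dentists tell us so little, one can compute a confidence interval around the “7 out of 10” result. The Python sketch below uses the Wilson score interval, a standard textbook formula that behaves reasonably for small samples. It is offered as an illustration, not as the method used by any toothpaste survey, and the comparison sample of 1000 dentists is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)
    ) / denom
    return centre - half_width, centre + half_width


# The claim as stated: 7 of 10 dentists.
small_low, small_high = wilson_interval(7, 10)

# Hypothetical comparison: the same 70% share from 1000 dentists.
large_low, large_high = wilson_interval(700, 1000)

print(f"n = 10:   {small_low:.0%} to {small_high:.0%}")
print(f"n = 1000: {large_low:.0%} to {large_high:.0%}")
```

With only 10 dentists, the plausible range runs from roughly 40% to 89%, so the claim is compatible with a minority of dentists recommending the product. With 1000 dentists the range narrows to a few percentage points on either side of 70%.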
Alberta Transportation found that drivers aged 18-19 had an accident rate of 27.3/1000, compared with a rate of 10.3/1000 for seniors. This should be a better comparison because the survey considers 1000 cases. However, a U of A psychologist, Dobbs, said the statistics are still misleading because seniors generally drive far fewer kilometres and avoid rush-hour traffic, winter snowstorms and longer trips. If one includes these factors (variables) in the equation, seniors and 18-19 year-olds have the same accident rate per kilometre driven, and the rate actually rises as one gets older. With the demographic shift to greater and greater numbers of aging “baby-boomers”, Dobbs actually recommends stricter testing of seniors.
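The effect of the “kilometres driven” variable can be sketched in a few lines of Python. Only the two accident rates come from the passage. The sketch assumes the rates are per 1000 drivers per year, and the average distances driven (16,000 km and 6,000 km) are invented purely for illustration; they are not Alberta Transportation or Dobbs figures.

```python
# Accident rates from the passage, assumed here to be per 1000 drivers.
rate_young = 27.3    # drivers aged 18-19
rate_senior = 10.3   # seniors

# How many times farther would the younger group have to drive for the
# two groups to have the same accident rate per kilometre?
distance_ratio = rate_young / rate_senior
print(f"Required distance ratio: {distance_ratio:.2f}")


def accidents_per_million_km(rate_per_1000_drivers, avg_km_per_driver):
    """Convert a per-1000-drivers rate into a per-million-km rate."""
    accidents_per_driver = rate_per_1000_drivers / 1000
    return accidents_per_driver / avg_km_per_driver * 1_000_000


# HYPOTHETICAL average annual distances, for illustration only.
km_young = 16_000
km_senior = 6_000

per_km_young = accidents_per_million_km(rate_young, km_young)
per_km_senior = accidents_per_million_km(rate_senior, km_senior)

print(f"18-19 year-olds: {per_km_young:.2f} accidents per million km")
print(f"Seniors:         {per_km_senior:.2f} accidents per million km")
```

Under these assumed distances the large gap in the per-driver rates almost disappears once exposure is taken into account, which is the point Dobbs makes about choosing the right denominator.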
So, what is the conclusion? Statistics DO have visual and educational impact. It might be useful to determine whether the general public tends to accept statistical results as valid simply because they are produced by reputable firms such as Gallup, Angus Reid, ECOS, and others.
Census Statistics and Demographic (Population) Characteristics
The largest sample used for the study of population characteristics comes from the federal census. In both Canada and the U.S., the Constitution makes the federal government responsible for collecting census data on the population. The reasons are obvious:
- for purposes of taxation – so that taxation can be made equitably based on income and number of persons per family
- to keep records for health, school and infrastructure services
- to monitor the population for various reasons
The federal governments of the U.S. and Canada conduct a full decennial census of the entire population. In each country, there are two questionnaires used for this purpose:
- the Basic Questionnaire is distributed to 4 out of 5 (or 5 out of 6) of the homes in the country. It asks questions such as: how many children, wives, husbands, etc. live in the household; the age of each person; income sources; mother tongue; languages spoken; etc.
- the Detailed Questionnaire is distributed to every fifth (Canada) or sixth (U.S.) household in the country and thus represents a “sample” of the population, in this case 1/5 (20%) or 1/6 (16.7%) of the entire population. The “detailed” study goes more in-depth into population and social characteristics and issues.
The emphasis on social issues changes with time, however, and the census questionnaire evolves in order to keep abreast of evolving issues. Many government policies (e.g. immigration and economic policy) are linked to, and determined by, census results. Because questions may appear or disappear from one census to the next, it is somewhat difficult for social scientists to study trends. Usually detail is added, such as new countries as sources or new religious affiliations, or countries are grouped together. Future censuses must deal with multi-racial “ethnicity” or “ethnic origin” questions. Former questions such as “race” have been shown to have no scientific value, but they do have social value to people.
Research Results Statistics
The following statistics on selected North Country Counties were gathered by a research group at State University of Plattsburgh. If you are interested, for example, in the number of Native Americans with private businesses and how many employees they have, you would be frustrated by this grouping. This happens a lot in research and often results in the researcher having to conduct his/her own survey.
Add Table 1 here
Because social issues appear to be changing at a much faster rate than in the past, Canada and the U.S. began (1985) conducting a mini-Census to update the census on a five-year basis. This process involves selecting a “random sample.” Statistical results from a randomly generated sample can be stated with a “Confidence Level” of 95% or even 99% for the entire population, depending on the “sample size.” What this means is that, if the sample is selected properly and is representative of the distribution of the target population, both governments and private companies can conduct “surveys” on most important issues, population characteristics, and even voting intentions, and use the “random sample” to predict results for the whole population within a Margin of Error of 0.03 (or 3%) (Triola et al 1999). This has meant savings of millions of dollars for hundreds of private companies, government departments, charitable and other organizations, and it has also provided researchers with accurate data with which to justify and validate both Quantitative and Qualitative research.
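The link between sample size, confidence level and margin of error can be made concrete with the usual textbook formula for a proportion from a simple random sample, n = z² p(1 − p) / E². The Python sketch below is a minimal illustration under that assumption. It takes p = 0.5 (the most cautious choice) and ignores the complications of real survey designs, so it should not be read as the procedure any census agency actually follows.

```python
import math
from statistics import NormalDist


def z_score(confidence):
    """Two-sided critical value of the normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest simple random sample giving the requested margin of error."""
    z = z_score(confidence)
    return math.ceil(z**2 * p * (1 - p) / margin**2)


def margin_of_error(n, confidence=0.95, p=0.5):
    """Margin of error achieved by a simple random sample of size n."""
    z = z_score(confidence)
    return z * math.sqrt(p * (1 - p) / n)


n_95 = required_sample_size(0.03, confidence=0.95)
n_99 = required_sample_size(0.03, confidence=0.99)

print(f"3% margin at 95% confidence: n = {n_95}")
print(f"3% margin at 99% confidence: n = {n_99}")
print(f"Margin with n = {n_95}: {margin_of_error(n_95):.3f}")
```

Under these assumptions, a little over a thousand respondents is enough for a 3% margin of error at 95% confidence, regardless of how large the target population is. This is why a well-drawn sample is so much cheaper than a full count.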
There are a number of formats available to illustrate statistical comparison. A number of computer programs can be used: Excel, Quattro Pro, Statistica, SPSS, SAS, NSDStat. Sociologists should be familiar with some of these names. SAS (Statistical Analysis System) and SPSS (Statistical Package for the Social Sciences) have been around for a number of years and are the most frequently used. Census data is voluminous, and only recently have PCs had the drive space and RAM to process it. Many still choose SAS or SPSS in UNIX to process statistical data. Universities in Canada and the U.S.A. cooperate on data-sharing initiatives.
Types of Surveys
We can now go on the Internet and see some of the sources and surveys, a typical Survey Questionnaire that the researcher could design and how involved a survey questionnaire really is. Typical problem areas that can invalidate statistical research are explored in a number of books on the subject:
- Self-selected surveys – in which respondents themselves decide whether to be included (i.e. it is no longer a “random sample” and is, therefore, not representative of all the target population).
- Push-polling – the order in which questions are asked leads the respondent to answer in a favourable/unfavourable way. The poll attempts to “push” voters away from a particular candidate by asking loaded questions designed to discredit them, e.g. “Would your opinion change if you were told that no major improvements have been made in education for over 20 years?”
- Survey fatigue – the increase in researchers and pollsters using phone surveys and questionnaires has resulted in increasing refusal rates, which makes it more difficult to obtain truly “random samples.”
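The first of these problems, self-selection, can be illustrated with simple arithmetic. In the Python sketch below, all of the numbers are hypothetical: 30% of the target population hold a given opinion, but those who hold it are three times as likely to choose to respond. The function name and parameters are made up for this example.

```python
def self_selected_share(true_share, respond_if_yes, respond_if_no):
    """Share of 'yes' answers among those who choose to respond.

    true_share      -- proportion of the population that would answer yes
    respond_if_yes  -- probability that a 'yes' person decides to respond
    respond_if_no   -- probability that a 'no' person decides to respond
    """
    yes_responses = true_share * respond_if_yes
    no_responses = (1 - true_share) * respond_if_no
    return yes_responses / (yes_responses + no_responses)


# HYPOTHETICAL values: 30% truly agree, but they respond far more often.
biased = self_selected_share(0.30, respond_if_yes=0.60, respond_if_no=0.20)

# If everyone is equally likely to respond, the bias disappears.
unbiased = self_selected_share(0.30, respond_if_yes=0.40, respond_if_no=0.40)

print(f"Self-selected survey reports: {biased:.1%}")
print(f"Equal response rates report:  {unbiased:.1%}")
```

In this made-up case the self-selected survey reports a majority for an opinion actually held by fewer than a third of the population. Rising refusal rates can distort results in the same way whenever willingness to respond is related to the answer.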
University Libraries are excellent sources of data and are interlinked. A link to the University of Ottawa then links to the Carleton University Data Centre, the U.S. Census Bureau, Statistics Canada (with a much-improved search capability), ICPSR (a large consortium of universities and their U.S. and international data), and instructions on how to get and use data:
- Go to Website http://www.uottawa.ca/library
- Choose: Services: “Data Services”
- and then “CD-ROM and UNIX”,
- then “CD-ROM Products”
- and click on the yellow “access” face to the right of “Dimensions” and later,
- do the same for the “Profiles” product.
These links ask you if you want to “Open the file” or “Save it to disk”. They are usually very large files and take quite a bit of time to download. You can read them without downloading by choosing “Open the file” if you have already downloaded the B2020 browser as specified at the top of this Webpage. This browser program is free for Carleton and University of Ottawa students, but is about 5 megabytes in size and can take an hour to download. Save it to your own computer hard drive in your personal or “temp” directory and launch it either with “Run” or by double-clicking on the file in Windows Explorer after you have downloaded it. Now you can choose any of these data files and simply choose “Open the file.”
Go back to Website http://www.uottawa.ca/library
Under Services: “Data Services”, this time click on “Links to Data”, and choose “Carleton University.” From this site, choose “Surveys of special interest” and scroll down to “surveys.”
Next, a look at the Survey of Violence Against Women illustrates the sort of survey that Social Workers design and that is useful to many different groups of students and researchers. Scrolling way down toward the bottom shows the “variables” that the researchers were looking at. The WGTH or “weight” variable is a sampling weight, not a person’s body weight: it indicates how many people in the target population each respondent in the “random sample” represents. The sample results are multiplied by it to estimate how many of the entire target population were victims or perpetrators of violence.
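How a weight variable turns sample answers into a population estimate can be sketched as follows. The five respondents, their weights and the field name `reported_violence` are all invented for illustration; only the variable name WGTH comes from the survey documentation described above, and the real file layout may differ.

```python
# HYPOTHETICAL respondents. Each WGTH value is the number of people in the
# target population that the respondent is taken to represent.
respondents = [
    {"WGTH": 850.0, "reported_violence": True},
    {"WGTH": 1200.0, "reported_violence": False},
    {"WGTH": 640.0, "reported_violence": True},
    {"WGTH": 910.0, "reported_violence": False},
    {"WGTH": 1000.0, "reported_violence": False},
]

# Estimated number of people in the population who would answer "yes":
# add up the weights of the respondents who answered "yes".
estimated_victims = sum(
    r["WGTH"] for r in respondents if r["reported_violence"]
)

# The weights of all respondents add up to the estimated population size.
estimated_population = sum(r["WGTH"] for r in respondents)

weighted_share = estimated_victims / estimated_population
unweighted_share = (
    sum(1 for r in respondents if r["reported_violence"]) / len(respondents)
)

print(f"Estimated victims in population: {estimated_victims:.0f}")
print(f"Weighted share:   {weighted_share:.1%}")
print(f"Unweighted share: {unweighted_share:.1%}")
```

The weighted and unweighted shares differ because the respondents do not all represent the same number of people, which is why analyses of survey files normally apply the weight variable rather than simply counting rows.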
Then, go back to the list of special interest surveys and look at the 1995 Survey of Work Arrangements. Click on Questionnaire and one can see the detail involved in questionnaire design. This gives an example of how you might design your own research questions.
Next, go to Website http://www.uottawa.ca/library
- Choose: “Dataservices”
- and then “Links to Data Sources”
- and explore the “Statscan”, “U.S. Census Bureau”, and “ICPSR” links.
Content Analysis
Content analysis is a social science method for analysing the content of text, in which the researcher looks for the statistical occurrence of particular words or themes in books, reports, or other documents. If the text is available from a website or CD-ROM, the content analysis can be done with your word processor. If the text exists only as written documents, however, it has to be done manually by reading. There is also a website that provides a program called VBPro, created and shared by its author, that will do a content analysis of a text or ASCII file. Many searches are available on-line at this site.
VBPro is a set of computer programs written to do content analysis of ASCII text. They run under DOS on personal computers. The programs are menu driven and come with user guides. All output is in ASCII format compatible with most word processors and statistical packages.
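The basic counting step behind content analysis can be sketched in a few lines of Python. This is not VBPro and makes no claim about how VBPro works; it is a minimal illustration using the standard library, and the sample text and the list of theme words are made up for the example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def theme_count(frequencies, theme_words):
    """Total occurrences of any word belonging to a theme."""
    return sum(frequencies[word] for word in theme_words)


# Made-up sample text; in practice this would be read from an ASCII file.
sample_text = (
    "Statistics can illuminate a problem. Statistics can also hide a "
    "problem, and the same statistics may support opposite conclusions."
)

freqs = word_frequencies(sample_text)
theme_total = theme_count(freqs, {"problem", "conclusions"})

print("Occurrences of 'statistics':", freqs["statistics"])
print("Occurrences of theme words:", theme_total)
print("Most common words:", freqs.most_common(3))
```

For a real document, the text would be loaded from a file and the theme word lists would come from the research question. The counts can then be exported to a statistical package for further analysis.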