Social Data: Getting the Most out of Your Polls, Questionnaires, and Surveys


Self-Report Studies:

Self-report studies: you see them all the time online as surveys, polls, and questionnaires. You even see them on your receipts after you order a coffee. Your opinion counts. People want your social data. Some of these studies are used for marketing, and others are just done for the fun of it.

The reason they are called Self-Report Studies is that they rely on the group being studied to give its own interpretation or information on the subject. The information that the group provides (or self-reports) is called data. Data, especially social data, is important to a lot of people because it gives a voice to a target audience. Many social media networks (Facebook, Twitter, etc.) now give their users the ability to run their own self-report studies with only a few clicks of a button. Here are some tips to make sure yours are built on a scientific understanding of the principles!

Sample!

A sample is the portion of a population that you select for your study (poll, questionnaire, etc.). These are the respondents to your self-report study. They are the ones providing your data. Since they make up the entirety of the feedback you will be receiving in your study, you probably want to get the most out of your sample, right?

There are two main ways you can do this: the scope of your study, and the size/diversity of your sample.

Scope: The scope of your study, or poll, is the breadth of information you are looking for. It is the net you are casting to catch the information on your topic. [1,2]

You can use a narrow scope, with a topic that is relevant only to the select population you are targeting: “Hello, Members of the Elephant Watching Club! Which is your favorite type of elephant?”

Or, you can use a broad scope, with a topic that has relevance well beyond that group: “Hello, Members of the Elephant Watching Club! What do you think about Politician A?”

You might have noticed something in those examples. The survey is only able to reach the Members of the Elephant Watching Club. The Elephant Watching Club is the full extent of the sample. If the person wanted an answer that applied specifically to members of that club, then they would be fine with either scope, just so long as they did not interpret the results as representing a broader population. This leads us to the next one: size/diversity. [1,2]

Sample Size/Diversity: This refers to the size of your sample, and the diversity of people/opinions within it. If you want to, say, get a representative sample for the United States of America, would you only sample the Elephant Club? Probably not! They most likely do not have the diversity or size to be of use. You would want a sample that is representative of the diversity in opinion of the target audience you are drawing inferences about.

This may sound difficult for polling. How would you do it? Many researchers use what is called a “Random Sample”, which is a sampling method that gives every member of the population being studied an equal chance of being selected for that sample. It gives broader reach and reduces hand-picking by the researcher, which could lead to bias. If this is something that your current self-report study medium does not do, try to adjust your topics to account for Scope, and Sample Size/Diversity! [1,2]
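As a rough illustration of the “equal chance” idea, here is a minimal Python sketch that draws a simple random sample from a list. The population of member IDs, the sample size of 50, and the function name draw_random_sample are all made up for this example; they are not from any real study.

```python
import random

# Hypothetical population: 1,000 made-up member IDs (illustrative only).
population = [f"member_{i}" for i in range(1000)]

def draw_random_sample(population, sample_size, seed=None):
    """Pick sample_size members without replacement.

    random.Random.sample gives every member the same chance of being
    chosen, which is the core idea of a simple random sample.
    """
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, sample_size)

sample = draw_random_sample(population, 50, seed=42)
print(len(sample))  # number of respondents drawn
```

Note that this only helps if the list you draw from really covers the population you care about. Drawing randomly from the Elephant Watching Club still only tells you about the Elephant Watching Club.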

 

Tip! Target your self-report study to fit closely to your sample! Know the population you are studying and gear what you are looking for to fit within that group.


Types of Questions:

There are two types of questions used in self-report studies: Open Questions, and Closed Questions.

Closed questions are questions which provide a limited choice, typically with the answer taken from a predetermined list. They provide what is called quantitative (numerical) data, but do not allow the participant to give in-depth insights. They are “closed” because they give the respondents a pre-selected set of options to choose from. Polls often use Closed Questions for numerical data. [1,2]

  • Examples: “Pick your favorite ice cream from the following list: Vanilla, Chocolate, Strawberry” or “Do you like apples? Yes or No”.

Open questions are those questions which invite the respondent to provide answers in their own words, and they provide what is called qualitative data. These questions give you more in-depth answers, but do not allow you to quantify them as easily or compare them to others. They are “open” because the answer may be anything the respondent writes down or replies with. Questionnaires and Surveys include Open Questions for respondent-detailed replies. [1,2]

  • Examples: “Tell us what you think about our service?” or  “What about the apple grove did you like best?”

 

Social Media Example: A Poll would be Closed Question Data. The replies written by users would be Open Question Data.

Each type of data has its own benefits and drawbacks. You would want Closed Questions to provide you data that you could numerically analyze quickly. Everyone responding picks from the same set of answers, so it is like comparing apples to apples (so many apple examples…). But, if you wanted more nuanced feedback and did not need the same comparability, you could use Open Questions to get more detail from your sample. [1,2]
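To show why Closed Questions are quick to analyze, here is a small Python sketch that counts the answers to the ice cream question and turns them into percentages. The six responses and the function name tally_percentages are invented for illustration.

```python
from collections import Counter

# Hypothetical closed-question responses (made up for illustration).
responses = ["Vanilla", "Chocolate", "Chocolate",
             "Strawberry", "Chocolate", "Vanilla"]

def tally_percentages(responses):
    """Return each option's share of all responses, as a percentage."""
    counts = Counter(responses)  # how many times each option was picked
    total = len(responses)
    return {option: round(100 * count / total, 1)
            for option, count in counts.items()}

percentages = tally_percentages(responses)
print(percentages)
```

Open Question replies do not reduce to a tally like this without someone first reading and categorizing them, which is the trade-off described above.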

 

Tip! Use the data that fits the type of output you want most. Want descriptive feedback (which you do not need to represent in a graph)? Use Open Questions. Want to make a graph and get numerical data? Use Closed Questions.

 

That’s Mean! (Median, and Mode):

Now let’s talk about some data analysis we can use for quantitative (numerical) data. Say you have tailored your scope and your sample size/diversity for maximum accuracy, and the responses are rolling in. What now? Now that you have the data, it’s time to interpret it. Sometimes the medium or software you are using will do this for you. If not, take notice of these three terms: Mean, Median, and Mode. These are what are called “Measures of Central Tendency”, and are used in statistics. If you want to know how most people in your sample are responding, while avoiding the fringes, these might be useful to you. [1,2]

Mean is the average of the group of scores you get back. All numbers/responses being equal, this is taking all of them and finding the average response. You do this by adding up all the scores, and then dividing by the number of scores. [1,2] It looks like this:

  • Scores: 1, 3, 5, 3, 8. Mean (Average) = (1+3+5+3+8) / 5 (the number of scores) = 20 / 5 = 4.

Median is taking the middle value or score when the responses are arranged from lowest to highest.  [1,2] This gets you a representation of the “middle guy” in the group, and looks something like this:

  • Scores: 1, 3, 3, 5, 8. Median= 3.

Mode is the score that occurs the most within your responses. When you want to see which exact response was chosen over the others, you can look at mode.  [1,2] It looks like this:

  • Scores: 1, 3, 5, 3, 8. Mode = 3. 3 was chosen twice.
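If your poll software does not compute these for you, the three measures can be checked with Python's built-in statistics module. This sketch uses the same five scores as the examples above; the variable names are just for illustration.

```python
import statistics

# The same scores used in the examples above.
scores = [1, 3, 5, 3, 8]

mean_score = statistics.mean(scores)      # (1+3+5+3+8) / 5
median_score = statistics.median(scores)  # middle value once sorted: 1, 3, 3, 5, 8
mode_score = statistics.mode(scores)      # the most frequently chosen score

print(mean_score, median_score, mode_score)  # prints: 4 3 3
```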

 

Tip! Use the measure that gives you the most out of your self-report study! Mean (Average) is the most common you will see and is well liked for easy data output. Median is what you want when you are curious about the dead center respondents in your sample. Mode is what you want if you are curious about the popularity of a specific answer being chosen.


Oh no! Response Bias:

You’re doing great! You have your sample, you have your scope and representative size/diversity, and now you even have your quantitative measures of central tendency! What could go wrong?

Well, sometimes it is the people responding. Their bias, meaning the factors that influence how they pick selections in a self-report study, can give us skewed or inaccurate results. Sometimes we are able to adjust our self-report study ahead of time (by wording questions a certain way) to mitigate this, and other times it is simply a part of it. Keep in mind that whenever someone is responding to a survey or poll, it is their interpretation that makes up the data; it is not a direct observation of reality. [1,2] Here are some types of biases that you should be aware of:

Self-Serving Bias: This is when successes are attributed to internal factors (themselves) and failures are attributed to external factors (others). [1,2]

  • Example of a question susceptible to Self-Serving Bias: “Do you feel as though you have been passed over for a job for someone less qualified than you?”

Acquiescence Bias: This is when respondents say “yes” based not on the content of the question, but rather on how favorable that response will seem to the person running the study (even though the response may be anonymous). [1,2]

  • Example of a question susceptible to Acquiescence Bias: “Look at this pic! Am I pretty today? Yes or No!”

Extreme Responding Bias: This is where respondents prefer to pick the most extreme responses possible from a selection (i.e., something is “literally the best/worst!”). [1,2]

  • Example of a question susceptible to Extreme Responding Bias: “On a scale from 1 to 10, how good was that episode of TVSHOW?” 10! 10! 10! 1! 10!
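One rough way to spot this pattern in your own 1 to 10 ratings is to check what share of answers sit at the very ends of the scale. The Python sketch below is only a heuristic for illustration, not a standard test from the references; the function name extreme_share is made up, and the ratings are the ones from the example above.

```python
def extreme_share(ratings, low=1, high=10):
    """Fraction of ratings that sit at either end of the scale."""
    if not ratings:
        return 0.0
    extremes = sum(1 for r in ratings if r in (low, high))
    return extremes / len(ratings)

# The five ratings from the TVSHOW example above.
ratings = [10, 10, 10, 1, 10]
share = extreme_share(ratings)
print(share)  # prints: 1.0 (every rating is a 1 or a 10)
```

A high share does not prove bias on its own (maybe the episode really was that good), but it is a hint that the middle of your scale is going unused.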

Social Desirability Bias: In this one, people respond with the most socially appropriate (or inappropriate) answers that conform to the expected desirability of the group or studier. [1,2]

  • Examples of questions susceptible to Social Desirability Bias: “Do you give to charity? Yes or No”, “Do you ever have rebellious instincts? Yes or No!”

Do you see how these biases may affect the interpretation of data? Keep them in mind!

 

Tip! Know your audience, and know your questions. Even one favorable (or unfavorable) word in a question could get your respondents to reply according to these biases. Think about what type of biases may be expressed when answering your questions.


Get it Valid and Reliable:

There is one more important thing when you are studying a topic in social science: Validity and Reliability. These are factors that the person studying and presenting the self-report study should build in as best as possible before sending out their self-report materials to the world. These factors are what you use to make sure that you are studying something real, and that you are studying it accurately! [1,2]

Validity is the ability of a test/study to measure what it is intended to measure. For example, if you are trying to study people’s opinions on a specific topic, does your question cover it, and is that question worded to target the topic specifically? [1,2]

  • Example: We are studying whether people enjoy the taste of chocolate. We ask “Do you enjoy the taste of chocolate?” 76% say yes! So long as everything is defined and specific, we could call this valid. 76% of respondents enjoy chocolate.
  • Non-Example: We are studying whether people enjoy the taste of chocolate. We ask “Do you like sweet foods?” 84% say yes, and in our study we conclude that 84% of people enjoy chocolate. Wait a minute. Was that our question? Was our question tailored to fit the validity of the study? We used Sweet in our question, but we concluded on the factor of Chocolate. This is not valid.

Reliability is the ability of our test to yield (nearly) the same result each time we test with it. If we test a sample with these questions, and then provide an alternative test (on the same topic), we should get similar responses both times. The reason we need this is to be sure that it is not a fault or mistake in the test that is giving an inaccurate conclusion. Sometimes biased wording, text errors, or jargon can lead to responses being skewed or erratic. If two tests, or the same test twice, can get steady and similar responses from the same population, then we know that variability in responses is based on the respondents, and not our questions. [1,2]

  • Example: If you run the Chocolate Preference Test twice, and the first set of responses comes back at 80% while the next comes back at 81%, this is about as close to reliable as you can expect.
  • Non-Example: If you run the Chocolate Preference Test twice, and the first set of responses comes back at 17% while the next comes back at 54%, there is something wrong. Assuming this is the same sample or even the same population, you might want to look at your test as a factor which influenced the results incorrectly.
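The comparison in those two examples can be written as a tiny Python check. The 5 percentage point tolerance and the function name runs_agree are assumptions chosen for this sketch; the references do not give a cutoff, so pick one that makes sense for your own study.

```python
def runs_agree(first_pct, second_pct, tolerance=5.0):
    """True if two runs of the same test land within `tolerance`
    percentage points of each other (the tolerance is an assumed value)."""
    return abs(first_pct - second_pct) <= tolerance

print(runs_agree(80, 81))  # prints: True  (the Example above)
print(runs_agree(17, 54))  # prints: False (the Non-Example above)
```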

Both of these checks help ensure that you, the designer of the study, are not including factors that could affect the results you get. You want your results to reflect the respondents, not artifacts (unrelated data) embedded in your questions.

 

Tip! Test and re-test. If you have an audience, rephrasing questions on the same topic and presenting them again may get you a better picture when you keep validity and reliability in mind.

Questions? Comments?

 

References:

  1. Wood, S. E., Wood, E. R., & Wood, E. R. (1996). The world of psychology. Boston: Allyn and Bacon.
  2. Cooper, J. O., Heron, T. E., & Heward, W. L. (1987). Applied behavior analysis. Columbus: Merrill Pub. Co.

Photo Credits: https://stock.adobe.com, http://www.unsplash.com (Luis Llerena, Kate Serbin)
