Extra Life Case Study: Massed vs. Spaced Trials in the Acquisition of Skilled Motor (Video Game) Tasks

This article has a special purpose: to bring awareness to a fantastic non-profit organization called Extra Life, which raises donations for Children’s Miracle Network Hospitals, providing much-needed funding to families who need it. (Donation links are at the bottom of the page!)

Today, the topic is video games, the main focus of Extra Life’s audience. To bring some psychological expertise and an applied behavior analytic focus to this topic, we had two volunteers test their mettle on (arguably) one of the most difficult video games to master and beat: “I Wanna Be The Boshy”. On the surface it is a very simple-looking game: move a character with a keyboard or analog stick through treacherous environments without touching obstacles, enemies, or projectiles. That is, until you realize how fast the reaction times need to be in order to progress through the levels; upwards of 2-5 responses per second. Each mistake brings a punishing restart to the beginning of the level or section, so the player must not only learn the pattern of motor responses that completes each section, but also enter them reliably with perfect timing and order.

I WANNA BE THE BOSHY!

In many cases, this game requires months to beat (rare cases excluded). With this time frame, we were able to watch recordings of our two players (etanPSI & LonestarF1) via a streaming service named Twitch, which provided video of gameplay that could be reliably studied and analyzed for the target behaviors necessary to master and beat the game.

For this particular study, we chose the target behavior of successive correct responses, and used frequency data as our metric to gauge progress through the levels. For example, one correct response may navigate a particular jump, a second may require maneuvering for a landing, and a third another jump onto a moving obstacle, all within 1.5 seconds, totaling 3 successive correct responses for that particular challenge. On average, during our tracked trials, a particular level or challenge requires a minimum of 43 successive correct responses in one minute of play in order to continue.

Analyzing the Players’ Behavior

If we want to understand the game from a behavior analytic and psychological point of view, we need to discuss some terms:

Reinforcement: Think of reinforcement as a rewarding stimulus that increases the target behavior in the future. A reward that successfully makes a response (game playing, etc.) happen more often is called a reinforcer.

  • In this specific case, success following a trial serves as a conditioned reinforcer for the player, where beating a section or a boss is reported as the goal and achievement to be earned.

Responses: This is what the person does. Any behavior that follows a specific target stimulus is considered a response.

Punishment: This is the opposite of reinforcement: a consequence that decreases the likelihood of a behavior occurring in the future.

Frequency (and Rate): Frequency and rate refer to how often behavior occurs over a set amount of time. For example, if our general target is 43 correct responses in 1 minute, then we would want our rate of successive correct responses to approach that amount to give us the greatest chance of success.
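To make the metric concrete, here is a minimal Python sketch of turning a count of successive correct responses into a per-minute rate; the numbers are made up for illustration and are not from our data:

```python
# Minimal sketch: converting a count of successive correct responses into a
# per-minute rate. The counts below are illustrative only.

correct_responses = 38        # successive correct responses observed
observation_seconds = 60      # length of the observation window

rate_per_minute = correct_responses / (observation_seconds / 60)
print(rate_per_minute)        # 38.0 -- still short of the ~43/min target
```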

Discrete Trials: A discrete trial is often used in a clinical setting where a discriminative stimulus (SD) precedes a response, which is then reinforced when that response is the target behavior. The convenient thing about video games is that each level or screen can be considered a discrete trial: correct responses are reinforced by the game continuing, while failures (and punishing stimuli) cause the trial to be repeated.

Massed Trials: Massed trials refers to running discrete trials in close succession, so that no interrupting behavior occurs between them. In other words, repetition. For our gaming example, this means restarting immediately after each failure at the original starting point of the previously failed trial.

Spaced Trials: Spaced trials refer to a training condition where each discrete trial is separated by a pause, during which various behaviors and stimuli unrelated to the next trial may be engaged with. Think of this as a break condition: the player can take a breather, talk to the fans, or take a drink of water. All of these things occur between trials, creating a gap between them.
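As a rough illustration of the difference between the two conditions, here is a small Python sketch that labels trials as massed or spaced based on the pause before each one; the 10-second threshold and the pause values are assumptions for the example, not values fixed by our procedure:

```python
# Rough sketch: labeling trials as "massed" or "spaced" from the pause
# (in seconds) between the end of one trial and the start of the next.
# The threshold and pause values are illustrative assumptions.

inter_trial_pauses = [0, 1, 0, 22, 15, 0, 30]

def classify(pauses, pause_threshold=10):
    return ["spaced" if p >= pause_threshold else "massed" for p in pauses]

print(classify(inter_trial_pauses))
# ['massed', 'massed', 'massed', 'spaced', 'spaced', 'massed', 'spaced']
```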

The Experiment

Our friendly experiment required our players, etanPSI and LonestarF1, to attempt 30 trials in each of two conditions. The first condition was Massed Trials, which involved 30 complete repeats without any interruption between trials. Successes could continue on to the next section, but failures required the trial to begin again without any (controllable) pause or break. The second condition was Spaced Trials, where our players were required to take at least a few seconds between trials to chat, breathe, take a drink of water, or engage in any other free-operant behavior in that gap. We did not hold our players to a specific time limit on these pauses, but on average they ranged between 10 and 30 seconds. We then compared the two conditions to see which appeared to give the players the greater improvement.

Our players reported themselves to be motivated to beat the game, and the challenge of proceeding through it served as a conditioned reinforcer. This free-operant preference assessment appeared to have some validity, as these players put themselves through 60-270 trials per recorded play period, well above our 30-trial (60 with both conditions) requirement for the experiment. The players were free to agree to the conditions of the experiment or decline them as they felt appropriate. Tracked periods that did not meet the criterion for the experiment were discarded, and the next session that did was counted. We called it “Science Mode” when the players agreed to the experiment terms. Overall, 80% of tracked Massed Trials and 62% of tracked Spaced Trials fit the experimental criterion. This provided us with a breadth of data for getting a general idea of the factors that may contribute to their specific learning styles and abilities in completing the game itself. By the end of the tracked periods, both players had successfully completed the game and beaten the final boss.

During this period, both players went through high rates of failure, where successive failures within 10 responses were common when they hit enemy projectiles, environmental hazards, or incorrect landings. This was a common function of the game’s difficulty, which had a degree of punishing effect on responding. More often than not, these conditions did not cause either etanPSI or LonestarF1 to quit the game completely, but instead led to a naturally chosen pause between attempts to breathe, react with a verbalization, or take a moment to process. When Massed Trials were being tracked, these series of 30 responses were discarded, but when Spaced Trials were being tracked, the series were kept if they held to the same spaced pattern for the following responses.

Our target for this experiment was to see how closely the players stayed to the average number of successive correct responses per minute (43) that had previously been tracked from successful win conditions. Their responses were tracked within a range of 20 to 60 per minute on a Standard Celeration Chart. By averaging each set of 30 tracked trials (some as low as 1, others as high as 77 responses per minute), we were able to place the average within these intervals onto the chart and compare same-day, or close-proximity-day, responses from both conditions.

Previous research by Fadler et al., and others they referenced (Foos et al., 1974; Rea et al., 1987), suggests that spaced trials are the superior method of skill acquisition, but we noticed in etanPSI’s and LonestarF1’s play styles that Massed Trials were preferred. Cursory investigations of other players showed the same. Faster restarts appeared to give higher rates of reinforcement, which in turn led to success within a single day that might not have been possible if play had been delayed or discontinued. It did appear that during this period, higher rates of repetition of these pattern-based motor behaviors affected the end result of success.

In their article “The acquisition of skilled motor performance: Fast and slow experience-driven changes in primary motor cortex,” Karni et al. (1998) suggest that there are different stages of learning, and that experience-driven changes in the brain affect learners in different ways: “We propose that skilled motor performance is acquired in several stages: ‘fast’ learning, an initial, within-session improvement phase, followed by a period of consolidation of several hours duration, and then ‘slow’ learning, consisting of delayed, incremental gains in performance emerging after continued practice. This time course may reflect basic mechanisms of neuronal plasticity in the adult brain that subserve the acquisition and retention of many different skills.” We will not go too deeply into biological factors in this article (since we did no MRIs on our players), but if you are interested, the article is cited below. However, this “fast learning” does appear to coincide with our conceptual Massed Trial format of learning, and the within-session improvement phase may be a factor in what we are seeing in the results of etanPSI and LonestarF1.

The Results

The results from our experiment were astounding. We found a clear pattern in both the players’ preferred style of trial and the improvement of their skills with it. Both players showed similar rates of successful responses per minute in failure (0-1) and win (~43) conditions, and in the runs leading to victories against particularly difficult bosses, both exceeded these by going over 70 successive correct responses per minute!

With etanPSI we were also able to see some situations where spaced and massed trials, interspersed, produced a greater degree of success than when they were split into 30 consecutive trials each. When he was working through repetitive environment/platform-based challenges, Massed Trials were more successful, but when dealing with alternating projectile challenges from game bosses, Spaced Trials were useful for mitigating the punishing effects of failure conditions. Higher-volume vocalizations, high-intensity percussive maintenance of gaming instruments, and a broader vocabulary appeared to lend a restorative effect to attentiveness and response rates in the following massed trial conditions.


A Dpmin-11EC Standard Celeration Chart from our experiment.

In both conditions, we were able to see consistent acceleration of successive correct responses per minute from Massed Trials, which may have been due in part to the increase in difficulty as the players progressed, requiring higher outputs of responses. Nevertheless, the players rose to the occasion and appeared to hold their improvement in responding and pattern recognition over the course of 30+ trials per day. Where many had failed and given up, these two players not only succeeded but excelled at an incredibly difficult game.

The Fun!

Now that you know the story of our fun experiment, here’s where you can donate and thank our amazing players for their time and skill, as well as help the lives of countless children receiving medical services through a Children’s Miracle Network Hospital! 100% of all donations go directly to charity and are tax deductible! Help our players’ team exceed their goal and change lives!

Donate to our amazing experiment volunteers!

etanPSI’s Extra Life Page

LonestarF1’s Extra Life Page

Like the science? Donate to the behaviorist!

Chris S’s Extra Life Page

References:

  1. Karni, A., Meyer, G., Rey-Hipolito, C., Jezzard, P., Adams, M. M., Turner, R., & Ungerleider, L. G. (1998). The acquisition of skilled motor performance: Fast and slow experience-driven changes in primary motor cortex. Proceedings of the National Academy of Sciences, 95(3).
  2. Wimmer, G. E., & Poldrack, R. A. (2017). Reinforcement learning over time: Spaced versus massed training establishes stronger value associations.
  3. The Precision Teaching Learning Center. http://www.precisiontlc.com/ridiculus-lorem/

Photo Credits: etanPSI & Lonestar F1 http://www.twitch.tv

Behavioral Science In Video Games

In behavioral science we like to look at things that are concrete and observable. Why do people respond to specific scenarios and stimuli in different ways? How do they differ from one another? How can we adapt what we present in ways that either increase or decrease a person’s responding? These are questions we can apply to our area of interest, video games, in order to explore what game designers have put into their medium to get you hooked and keep you hooked. Video games require the audience to participate in ways that other art mediums do not. The direct responses of the consumer shape and define their progress through the game, and a hallmark trait of video games is the use of rewards as marks of progress that get people to play longer, increase their skill at the game, and master the objectives the designers put in place. Let’s discuss some of the behavioral principles that may be in play with the games you know and love. See if you can identify these concepts in your own experiences with video games.

Reinforcement vs. Rewards

In behavioral science, we use the word reinforcement to describe a consequence that strengthens a future behavior when the same setting/stimulus (antecedent) is presented. When a reinforcer is presented after a behavior, we expect the probability of that behavior to go up the next time the person is placed in that situation. It is the foundation of learning and operant behavior. Operant behavior is a large piece of this conceptual puzzle: it is behavior that has been shaped to serve a purpose in the environment and has been reinforced in the past. How does this differ from rewards? In gaming of all types, there are rewards. These are pre-set consequences or prizes that follow the completion of specific objectives laid out for the player. Some prizes/rewards are interesting to a player and keep them engaged with the game; others are not, leading to disinterest or a falloff in responding (playing). What makes a reinforcer different from a reward is that reinforcers are defined by the individual’s future responding. When we say reinforcer, we are saying with a degree of certainty that this “reward” has affected behavior before and is preferred by the individual, because it has been shown to work in the past. Let’s look at this scenario:

Player 1 must press the circle button when presented with a box in order to break the box and gain a prize (100 points).

If Player 1 presses the circle button and breaks the box, and gets the 100 points, they have been “rewarded”.

If Player 1 presses the circle button and breaks the box, gets the 100 points, and presses the circle button when presented with more boxes in the future, they have been reinforced.

It could be said that 100 points was enough to reinforce the behavior. This affects future playing behavior by pairing a preferred stimulus (the points) with an operant behavior (pressing the circle button) in the presence of the box (antecedent). This is also called the Three-Term Contingency.
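For readers who like to see the structure spelled out, here is a minimal Python sketch of logging the three-term contingency for the box scenario above; the class and field names are hypothetical, not part of any real game’s code:

```python
# Minimal sketch: recording antecedent -> behavior -> consequence for the
# box/circle-button example. Names are hypothetical.

from dataclasses import dataclass

@dataclass
class ContingencyRecord:
    antecedent: str    # the stimulus present, e.g. "box on screen"
    behavior: str      # what the player did, e.g. "pressed circle"
    consequence: str   # what followed, e.g. "+100 points"

log = [
    ContingencyRecord("box on screen", "pressed circle", "+100 points"),
    ContingencyRecord("box on screen", "pressed circle", "+100 points"),
]

# The 100 points count as a reinforcer only if pressing circle becomes more
# likely the next time a box appears -- that is, if records with the same
# antecedent-behavior pair keep accumulating.
print(len(log))
```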

If game designers want their players to learn certain skills specific to their game, or keep people playing it, they need to focus on casting the widest net of reinforcers, rather than just rewards. Anything can be a reward, but only when it functions as a reinforcer will we see players use those skills to progress again and again.

Schedules of Reinforcement:

In the example above, we have a single situation with a single reinforcer. Games are made up of varied scenarios and competing choices for the player, and sometimes we see two types of reinforcement used at the same time. How does that work? Sometimes a player is presented with an opportunity to complete two objectives at the same time. This brings a level of challenging complexity that most players enjoy more than a simplistic single system of reward, because it raises the stakes in terms of what they can receive. Let’s take a look at some simple schedules of reinforcement below:
Fixed Ratio Reinforcement:
In this schedule of reinforcement, we see a set rate of responding met with a set amount of reward. So if a player beats 1 adversary and receives 200 points, this is called an FR1 (fixed-ratio 1) schedule. If a player needs to beat 2 adversaries to receive 200 points, this is called an FR2 schedule, and so on. The benefit of this style of reinforcement schedule is that it is consistent and a player can depend on it. If they can predict the amount of points/rewards they receive for each action, they can match their responding to the amount of reinforcement that satisfies them.
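A minimal Python sketch of a fixed-ratio payout, assuming a simple point system; the numbers mirror the FR1/FR2 examples above, and the function name is hypothetical:

```python
# Minimal sketch of a fixed-ratio (FR) schedule: one reward per `ratio` responses.

def fixed_ratio_points(responses: int, ratio: int, points_per_reward: int) -> int:
    rewards_earned = responses // ratio
    return rewards_earned * points_per_reward

# FR1: every defeated adversary is worth 200 points.
print(fixed_ratio_points(responses=5, ratio=1, points_per_reward=200))  # 1000
# FR2: every second defeated adversary is worth 200 points.
print(fixed_ratio_points(responses=5, ratio=2, points_per_reward=200))  # 400
```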
Variable Ratio Reinforcement:
Some people know this schedule of reinforcement from RNGs (Random Number Generators), which are put in games to provide variability, and for some people, a very strong system of reinforcement. Gambling also runs on this principle. With variable ratio, there is a percentage chance that the response will be rewarded. Unlike the fixed ratio, prediction of the reinforcer does not follow a fixed series. The player must rely on chance, or repetition of responses (for more opportunities), in order to receive a reward. Sometimes this can come in the form of an increase in the magnitude of the reward (an adversary is sometimes worth 100 points, but may also be worth 500), or its frequency (some adversaries reward points, others do not). As we may expect, the chance to receive a large reward for a standard amount of effort can be a very reinforcing contingency.
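And a matching Python sketch of a variable-ratio payout, where each response pays off only with some probability; the 20% chance and the point value are assumptions for illustration:

```python
import random

# Minimal sketch of a variable-ratio (VR) style payout: each response is
# rewarded with probability `p`. Probability and point value are illustrative.

def variable_ratio_points(responses: int, p: float, points: int, seed: int = 0) -> int:
    rng = random.Random(seed)
    return sum(points for _ in range(responses) if rng.random() < p)

# On average, about 1 in 5 defeated adversaries pays out 500 points.
print(variable_ratio_points(responses=100, p=0.2, points=500))
```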

Looking at these two schedules, we can expect that both have their respective fans. Some players prefer predictability and something that can be planned for: a specific amount of successful responding equals an expected amount of reward, every time (fixed ratio). Others enjoy the variability: sometimes even a standard amount of responding could pay off in a huge reward (variable ratio). When we combine two or more simple schedules, we get the complex schedules:

If you give the player a choice between a fixed ratio and a variable ratio, we call this a concurrent schedule of reinforcement. It would look something like this:
If a player walks down path A to fight the goblins, they can expect 100 points for each goblin adversary beaten, but if the player goes down path B to fight the birds, there is a variable chance of getting 800 points for each bird beaten. Both of these options are available and do not necessarily reduce the option of pursuing the other. A player could fight the goblins for a little while, then choose to fight the birds. The options are both available, thus concurrent. You see these schedules of reinforcement commonly in games that allow free exploration, or multiple avenues to the same objective.
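A rough Python sketch of that concurrent arrangement; how the player splits the session between the two paths is an assumption for the example:

```python
import random

# Rough sketch of a concurrent schedule: an FR option (goblins, 100 points each)
# and a VR option (birds, a 25% chance of 800 points each) are both available,
# and the player chooses how to divide responding between them.

def goblin_points(beaten: int) -> int:
    return beaten * 100                                          # fixed ratio

def bird_points(beaten: int, p: float = 0.25, seed: int = 1) -> int:
    rng = random.Random(seed)
    return sum(800 for _ in range(beaten) if rng.random() < p)   # variable ratio

# The player fights goblins for a while, then switches to the birds.
print(goblin_points(beaten=10) + bird_points(beaten=10))
```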

If we give the player both a variable ratio and a fixed ratio at the same time, we call that a superimposed schedule of reinforcement. It would look something like this:
A player is set in a scenario where they have to face both goblin adversaries and bird adversaries at the same time. Each goblin adversary they beat rewards them 100 points (fixed ratio), and each bird adversary beaten gives a chance of earning 800 points (variable ratio). These two schedules are now running at the exact same time, and the player has the opportunity to pursue each simultaneously.
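Here is the same pair of contingencies sketched as a superimposed schedule, where a single encounter pays out from both at once; again, the probability and point values are illustrative:

```python
import random

# Rough sketch of a superimposed schedule: FR (goblins) and VR (birds)
# contingencies run within the same encounter, so one stretch of play
# earns from both simultaneously. Values are illustrative.

def encounter_points(goblins_beaten: int, birds_beaten: int, seed: int = 2) -> int:
    rng = random.Random(seed)
    points = goblins_beaten * 100                                           # FR component
    points += sum(800 for _ in range(birds_beaten) if rng.random() < 0.25)  # VR component
    return points

print(encounter_points(goblins_beaten=4, birds_beaten=6))
```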

These are just a few examples of the types of reinforcement schedules you may come across in games. There are no real limits to how many schedules of reinforcement may run concurrently or superimposed. You could run multiple fixed ratios at the same time (an orange is worth 100 points every time, an apple is worth 200 points every time), or multiple variable ratios (an orange is sometimes worth 100 points, an apple is sometimes worth 200 points). The possibilities are limitless. There even exist schedules of reinforcement that rely on intervals of time rather than responding (every 3 minutes you receive 100 points, or sometimes every 10 minutes you receive 100 points, regardless of what responding the player is engaged in).
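For the time-based variants mentioned above, here is a small Python sketch contrasting a fixed-interval and a variable-interval payout; the intervals and point values are assumptions for illustration:

```python
import random

# Rough sketch of interval schedules, where payout depends on elapsed time
# rather than the number of responses. Times and amounts are illustrative.

def fixed_interval_points(session_minutes: int, interval: int = 3, points: int = 100) -> int:
    # A reward becomes available every `interval` minutes.
    return (session_minutes // interval) * points

def variable_interval_points(session_minutes: int, mean_interval: int = 10,
                             points: int = 100, seed: int = 3) -> int:
    # The wait until the next reward varies around `mean_interval` minutes.
    rng = random.Random(seed)
    total, elapsed = 0, 0
    while True:
        elapsed += rng.randint(1, 2 * mean_interval - 1)
        if elapsed > session_minutes:
            return total
        total += points

print(fixed_interval_points(30))      # 1000 points over a 30-minute session
print(variable_interval_points(30))
```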

It stands to reason, however, that the more schedules that run at the same time, and the more complicated the contingencies of reinforcement, the greater the risk that the player will not understand which responses or choices are actually being reinforced. This may lead to misattribution, or superstitious responding (responding that has been reinforced by a contingency that did not actually exist). When reinforcement schedules are too complex or unclear, they can create confusion among players and result in a loss of responding or interest in the game.

Complications:

Human behavior is not always easily predicted. Even in video games, game designers can create vast systems of intertwined schedules of reinforcement that keep players enthralled for hours, but there may come a point where player responding does not match the predictive models. We have to be aware of some of the other factors in behavioral science and research that influence a decrease in responding (playing) or disinterest. Below are just a few that we commonly come across in video games.
Punishment: Punishment is a condition where a stimulus is either presented or removed in a way that decreases the probability a behavior will happen in the future. It serves the opposite purpose of reinforcement. It comes in two variations: positive and negative. These terms do not reflect anything “good” or “bad”, but rather an addition or subtraction of stimuli that has a marked effect on the decrease of future behavior in the same (or similar) scenarios. In video games, they look something like this:

• Positive Punishment: A player walks into a hole and receives damage. The damage is a presented (added) stimulus, and assuming it is aversive to this player’s style or goals, they will be much less likely to walk into the hole again.

• Negative Punishment: A player buys an overpriced item in an in-game shop. Assuming the player has lost a significant amount of something that was preferred in exchange for something non-preferred, they are not likely to repeat the buying behavior in the future.
S Δ (S-Delta): S-Delta shares a similarity with punishment in that it does not strengthen or reinforce a behavior or series of responses. An S-Delta is a stimulus in whose presence a particular behavior receives no reinforcement. An example might be a player who is used to running down a path to pick up items/points, holding down the “Run” button to increase their reinforcement. However, if this same behavior is attempted in the presence of a wall (the S-Delta), holding the “Run” button does not receive the same reinforcement. Running behavior is not necessarily punished overall, but it is less likely to produce reinforcement in the presence of the wall.
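A tiny Python sketch of the discrimination described above; the stimulus labels and point value are hypothetical:

```python
# Minimal sketch of an S-Delta: the same "run" response is reinforced in the
# presence of an open path (the SD) but earns nothing at a wall (the S-Delta).

def run_outcome(stimulus: str) -> int:
    if stimulus == "open path":   # SD: running leads to pickups
        return 50
    if stimulus == "wall":        # S-Delta: same response, no reinforcement
        return 0
    return 0

print(run_outcome("open path"), run_outcome("wall"))   # 50 0
```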
Ratio Strain: Ratio strain is a condition where an increase in responding is required, but the reinforcement is not enough to maintain it. For example, if a player is used to defeating goblins for 100 points but is then presented with Super Goblins that are much more difficult to defeat and still reward only 100 points, the amount of reward is no longer reinforcing enough to maintain the repetition of responding. This can often be solved by raising the amount of reinforcement to match the effort.
Satiation: Satiation is a common modifying condition for human behavior. There comes a point when a specific reinforcer has been acquired so much that it is no longer a reinforcer. For example, if a player is satisfied with having 10,000 points and achieves 10,000 points, any further accumulation of points will not reinforce the behavior to continue. The reward condition remains, but it is no longer reinforcing. This can often be solved by allowing some time to pass until the satiation condition is no longer present, or by changing reinforcers.
Response Effort: This is the amount of effort a person has to put forward to complete a target behavior. It is not a barrier to playing in itself, but it can denote a change in difficulty. So if we are reinforcing the behavior of defeating ghosts or eating dots, the effort may be how fast a person has to respond to obstacles, or the amount of fine motor skill necessary to navigate to the objective. If the amount of effort exceeds what a player can respond to, we can say the response effort has been set too high for the behavior to be reinforced.
The Social Factor

We would be remiss to ignore one of the strongest forms of reinforcement, one that may not be provided by the game itself but by the product of success, or even the pursuit of playing: social reinforcement. Some players enjoy the thrill of competition (competitive multiplayer), others enjoy jolly cooperation (cooperative multiplayer). Many find strong reinforcement in sharing their experiences (streaming), or in showing off completed objectives (trophies/completion). Bringing other people into the experience of interacting with video games is by no means a new prospect, but quantitatively measuring social reinforcement in video games is still very much an avenue of research worth pursuing. Some data game designers may be able to collect include how many times multiplayer aspects are used, the duration of multiplayer play in their game, viewership of streamed media, and of course, consumer demands for specific social aspects that would be feasible in a game. There may also be examples of games that rely too much on external social reinforcement without providing sufficient contingencies of their own within the game’s design.

Balancing it all

Video games are rich examples of how human behavior interacts with digital entertainment, and the concepts above are just the tip of the iceberg. Some games employ one or two of these concepts; others employ complex systems of intentional reinforcement and punishment. Across generations we have seen popular features rise and fall, but all seem to follow the basic principles: objectives, responses, and rewards. Reading this, you may have ideas about other phenomena that might affect the relation between video game and player. The list of concepts above is by no means exhaustive, but it is a topic we may be able to explore more deeply in the future. Leave comments below with your thoughts, theories, and opinions.

Photo Citations:

  1. “Dark Souls 3” – Ethan Russel
  2. “Mario” -Freeimages.com
  3. “Pacman”- Freeimages.com
  4. “Arcade”-Freeimages.com