Operant conditioning, also known as instrumental conditioning, is based on the consequences that follow an organism's behavior. Behaviors that are followed by a reward, or reinforcement, usually increase in frequency, while behaviors that are followed by punishment usually decrease in frequency. The context in which rewards or punishments are received affects how the association between a behavior and its consequence is learned. In addition, how often reinforcement follows a particular behavior affects how well the association is learned.
The Effect of Reward or Punishment on
American psychologist Edward Thorndike's Law of Effect states that, depending on the outcome, some responses are weakened while others are strengthened, and that this process eventually leads to learning. Thorndike noted that when an animal was rewarded for a certain behavior, that behavior became progressively more frequent, while behaviors that did not elicit a reward weakened and became sporadic, finally disappearing altogether. In other words, unlike in classical conditioning, what matters most is what follows a behavior or response.
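The strengthening and weakening that Thorndike described can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name `apply_law_of_effect`, the 0-to-1 strength scale, and the `step` size are hypothetical choices made for illustration and are not Thorndike's own formulation.

```python
def apply_law_of_effect(strengths, response, rewarded, step=0.2):
    """Return updated response strengths after one trial.

    Assumption: strength is a number between 0.0 and 1.0. A rewarded
    response moves part of the way toward 1.0; an unrewarded response
    decays part of the way toward 0.0.
    """
    updated = dict(strengths)  # copy so the caller's dict is not modified
    current = updated[response]
    if rewarded:
        updated[response] = current + step * (1.0 - current)
    else:
        updated[response] = current - step * current
    return updated


# Two hypothetical responses that start out equally likely.
strengths = {"press_lever": 0.5, "scratch_wall": 0.5}
for _ in range(20):
    strengths = apply_law_of_effect(strengths, "press_lever", rewarded=True)
    strengths = apply_law_of_effect(strengths, "scratch_wall", rewarded=False)
```

After repeated trials the rewarded response approaches full strength while the unrewarded one fades toward zero, mirroring the pattern in which unrewarded behaviors become sporadic and finally disappear.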
In his mid-twentieth-century experiments with rats and pigeons, American psychologist B. F. Skinner found that animals use their behaviors to shape their environment, acting on the environment in order to bring about a reward or to avoid a punishment. Skinner called this type of learning operant or instrumental conditioning. A reward or reinforcement is an outcome that increases the likelihood that an animal will repeat the behavior. There are two types of reinforcement: positive and negative. Positive reinforcement is something given that increases the chance that the animal or person will repeat the behavior; for example, smiling at or praising a student whenever she raises her hand is a form of positive reinforcement if it results in increased hand-raising. Negative reinforcement occurs when something unpleasant is taken away; stopping an electric shock when a rat performs a behavior is an example, because whatever behavior the rat exhibited to terminate the shock will increase.
A punishment, on the other hand, is an outcome that decreases the likelihood of a future behavior. Spanking or slapping a child is an example of punishment, as is grounding, because all three can be expected to reduce the occurrence of the behavior that preceded them.
There are a number of ways in which someone can manipulate an animal's or a person's behavior using operant or instrumental conditioning. One of these methods is called shaping and involves reinforcing behaviors as they come progressively closer to the desired goal. Suppose a person wants to train a dog to jump through a hoop. He would first reward the dog for turning toward the hoop, then perhaps for approaching the hoop. Eventually he might reward the dog only for walking through the hoop while it is held low to the ground. Finally, he would raise the hoop off the ground and reward the dog only for jumping through the hoop.
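The hoop example can be written out as a small procedure. The sketch below is illustrative only: the function `shape`, the step labels, and the `successes_needed` threshold are hypothetical, and real training would judge behavior far less rigidly than an exact string match.

```python
SHAPING_STEPS = [
    "turn toward hoop",
    "approach hoop",
    "walk through low hoop",
    "jump through raised hoop",
]


def shape(observed, steps=SHAPING_STEPS, successes_needed=2):
    """Reward only the behavior that matches the current criterion.

    Assumption: the criterion advances to the next step once the current
    step has been rewarded `successes_needed` times. Returns the list of
    reward decisions and the number of steps completed.
    """
    level = 0       # index of the step currently being reinforced
    successes = 0   # rewards earned at the current step
    rewards = []
    for behavior in observed:
        trained = level >= len(steps)
        target = steps[-1] if trained else steps[level]
        rewarded = behavior == target
        rewards.append(rewarded)
        if rewarded and not trained:
            successes += 1
            if successes == successes_needed:
                level += 1
                successes = 0
    return rewards, level


rewards, level = shape([
    "turn toward hoop",
    "turn toward hoop",
    "turn toward hoop",   # no longer rewarded: the criterion has moved on
    "approach hoop",
    "approach hoop",
])
```

The third response goes unrewarded because the trainer has already raised the criterion, which is the defining feature of shaping: earlier approximations stop paying off once a closer one is expected.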
Context is extremely important for operant conditioning to occur. Both animals and people must learn that certain behaviors are appropriate in some contexts but not in others. For instance, a young child might learn that it is acceptable to scribble with a crayon on paper but not on the wall. Similarly, Skinner found that animals can discriminate between different stimuli in order to receive a reward. A pigeon can discriminate between two different colored lights and thereby learn that if it pecks a lever when a green light is on it will receive food, but if it pecks when the red light is on it will not receive food.
What is more, animals can discriminate between different behaviors elicited by different contexts. For example, a rat can learn that turning around clockwise in its cage will result in getting food but that in a different cage turning counterclockwise will bring a reward. Animals will also generalize to other stimuli, performing the desired behavior when a slightly different stimulus occurs. For instance, a pigeon that knows that pecking a lever when a green light is on will bring food might also peck the lever when a different-colored light is on. Both generalization and discrimination help animals and people learn which behaviors are appropriate in which contexts.
The timing of reinforcement can also affect the frequency of the desired response. Delaying reinforcement slows learning, although research shows that humans can learn from delayed reinforcement. Even so, people often find it difficult to forgo immediate positive outcomes in order to avoid adverse ones later.
The schedule of reinforcement also plays a critical role in affecting response rates. There are two types of reinforcement schedules: interval schedules and ratio schedules. Interval schedules are reinforcement schedules in which rewards are given after a certain period of time. Ratio schedules are schedules in which rewards are given after a specific number of correct responses. As seen below, the time interval or response ratio can either be fixed or variable.
The schedule that elicits the highest rate of responding is the fixed ratio schedule. In this case, the animal knows it will receive a reward after a fixed number of responses, so it produces that number as quickly and frequently as possible. This phenomenon also occurs with people; if craftspeople are paid for each object they make, they will try to produce as many objects as possible in order to maximize their rewards.
Generating nearly as rapid a frequency of responses as the fixed ratio schedule is the variable ratio schedule. In this case, the number of responses needed to produce a reward varies so the animal or person will emit the desired behavior frequently on the chance that the next time might bring the reward. Lotteries and slot machines function on a variable ratio schedule, thus inducing people to want to play again.
Interval schedules tend to produce lower response rates. A fixed interval schedule will produce fewer responses early in the interval, with an increase as the time for the reward approaches. One example in human behavior is the passing of bills in Congress. As elections approach, the number of bills passed increases dramatically, with a swift decline after the election. A variable interval schedule, on the other hand, produces a slow but steady rate of response; for instance, a teacher giving "pop" quizzes at irregular intervals encourages her students to maintain a consistent level of studying throughout the semester.
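The four schedules discussed above differ only in the rule that decides whether a given response earns a reward, which can be sketched as follows. Several details here are assumptions rather than anything stated in the text: the function names are hypothetical, the interval schedules follow the common laboratory convention that the first response after the interval has elapsed is the one rewarded, and the variable schedules draw their requirement from a uniform range around the stated mean.

```python
import random


def fixed_ratio(n):
    """Reward every n-th response."""
    count = 0

    def on_response(_time):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return on_response


def variable_ratio(mean_n, rng):
    """Reward after a varying number of responses that averages mean_n."""
    count = 0
    required = rng.randint(1, 2 * mean_n - 1)

    def on_response(_time):
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return on_response


def fixed_interval(seconds):
    """Reward the first response made at least `seconds` after the last reward."""
    last_reward = 0.0

    def on_response(time):
        nonlocal last_reward
        if time - last_reward >= seconds:
            last_reward = time
            return True
        return False

    return on_response


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the required wait varies around mean_seconds."""
    last_reward = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)

    def on_response(time):
        nonlocal last_reward, wait
        if time - last_reward >= wait:
            last_reward = time
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return on_response


# Each schedule is called once per response with the time of that response.
fr = fixed_ratio(3)
fr_rewards = [fr(t) for t in range(1, 7)]
# fr_rewards == [False, False, True, False, False, True]

fi = fixed_interval(10)
fi_rewards = [fi(t) for t in (2, 9, 10, 12, 21)]
# fi_rewards == [False, False, True, False, True]
```

Under the ratio rules, more responses always mean more rewards, whereas under the interval rules extra responses made before the interval elapses earn nothing. That difference is one way to see why the text expects ratio schedules to produce faster responding than interval schedules.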
Although classical or respondent conditioning involves automatic responses to stimuli, operant or instrumental conditioning is a result of the decision to produce a certain behavior in order to receive a reward or avoid a punishment.