Variable Reinforcement is a LIE

What if everything you've been told about variable reinforcement schedules is only half the story?

Today, I want to give you a taste of the kinds of topics I go over with my students, and hopefully give you something to think about. The schedule of reinforcement is something that gets thrown around a lot in dog training and in psychology. I'm sure you've heard trainers say to switch to a variable reinforcement schedule because random rewards make behaviors stick, and then they give you the slot machine example. In fact, many of you are probably doing the same. And sure, there is truth to that. But if you stop there, you are probably missing the bigger picture.

The way I use reinforcement schedules in training goes a little beyond slot machine logic, so let me break this down quickly. But first, let's start with the basics. In behavioral psychology there are several reinforcement schedules. We have continuous reinforcement, in which every single correct response is rewarded. The advice that's always given, and it's true, is that this is the best option for teaching new behaviors. We also have a fixed ratio, where the reward is offered after a set number of responses.

For example, in the IGP bark and hold, if the trainer rewards every third bark, the dog will most likely bark three times and wait for the reward instead of barking continuously. That's your fixed ratio. Then we have a variable ratio, also known as a random schedule of reinforcement. This is where we reward after an unpredictable number of responses, and that's where the Las Vegas analogies come in. Sometimes we pay after two responses, sometimes after five, and so on. Next is the fixed interval: we reward the first response after a set period of time. For example, the dog only gets reinforced if it sits after 30 seconds have passed. Finally, with a variable interval, we reward the first response after an unpredictable amount of time. Sometimes that's after 10 seconds, sometimes after one minute.
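To make those definitions concrete, here is a minimal Python sketch that treats each schedule as a rule deciding whether a given correct response gets paid. The class names, the `respond` method, and the uniform ranges used for the two variable schedules are illustrative assumptions, not standard terminology or anyone's training protocol. Time is assumed to be measured in seconds from the start of the session or from the last reward.

```python
import random


class Continuous:
    """Continuous reinforcement: every correct response is rewarded."""

    def respond(self, t: float) -> bool:
        return True


class FixedRatio:
    """Fixed ratio: reward every n-th correct response (e.g. every third bark)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.count = 0

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.n:
            self.count = 0  # start counting toward the next reward
            return True
        return False


class VariableRatio:
    """Variable ratio: reward after an unpredictable number of responses.

    Assumption: the required count is drawn uniformly from [lo, hi].
    """

    def __init__(self, lo: int, hi: int, rng: random.Random) -> None:
        self.lo, self.hi, self.rng = lo, hi, rng
        self.count = 0
        self.target = rng.randint(lo, hi)

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.target:
            self.count = 0
            self.target = self.rng.randint(self.lo, self.hi)  # new, unpredictable requirement
            return True
        return False


class FixedInterval:
    """Fixed interval: reward the first response once `interval` seconds have passed."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.last = 0.0  # time of the last reward (session start = 0)

    def respond(self, t: float) -> bool:
        if t - self.last >= self.interval:
            self.last = t
            return True
        return False  # too early: this response goes unpaid


class VariableInterval:
    """Variable interval: like fixed interval, but the wait is unpredictable.

    Assumption: each wait is drawn uniformly from [lo, hi] seconds.
    """

    def __init__(self, lo: float, hi: float, rng: random.Random) -> None:
        self.lo, self.hi, self.rng = lo, hi, rng
        self.last = 0.0
        self.wait = rng.uniform(lo, hi)

    def respond(self, t: float) -> bool:
        if t - self.last >= self.wait:
            self.last = t
            self.wait = self.rng.uniform(self.lo, self.hi)  # next wait is unpredictable
            return True
        return False


if __name__ == "__main__":
    fr3 = FixedRatio(3)
    print([fr3.respond(t) for t in range(6)])  # [False, False, True, False, False, True]
    fi30 = FixedInterval(30)
    print([fi30.respond(t) for t in (10, 31, 40, 62)])  # [False, True, False, True]
```

With `FixedRatio(3)`, the third, sixth, ninth response and so on get paid, which is exactly the kind of regular pattern a dog in the bark and hold learns to anticipate. The variable schedules differ only in that the requirement is redrawn at random after every reward.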

That's pretty much the full menu of reinforcement schedules. Now, the reality is that in dog training we don't use most of these deliberately. Trainers typically stick with continuous reinforcement when teaching and variable reinforcement when trying to maintain the behavior. That's where the slot machine analogy always comes in, right? But there is something else to watch for. Sometimes trainers fall into fixed-ratio or fixed-interval patterns without ever realizing it. Maybe they always reward after exactly three steps of heeling, or always at the end of a one-minute down-stay. The dog notices those patterns, even if the trainer doesn't intend them. So yes, technically all these schedules exist. But in practice, dog trainers lean heavily on continuous and variable schedules, and that's where most of the conversation gets stuck: start on continuous reinforcement, then switch to variable once the dog knows the behavior. Sure, there is science behind this. You have Skinner, you have the research on resistance to extinction, and the whole thing. And yes, if you completely stop rewarding a behavior that was trained on a variable schedule, it will usually last longer than one trained only on continuous reinforcement. So in theory it makes total sense.

But let's look closer at that slot machine example. Here is something we tend to forget. Most people are not hooked on slot machines. Millions go to Vegas, play a little for fun, and walk away. I'm a prime example of this. Only a certain type of person gets trapped in the cycle of gambling addiction. Why? Because variable schedules don't magically hook everyone. Addiction happens when the schedule interacts with the individual: their personality, brain chemistry, susceptibility, and so on. And dogs are no different. Not every dog is motivated by unpredictability. For many dogs, a variable schedule of reinforcement doesn't create obsession. It's just a pattern. So the slot machine analogy is an oversimplification.

Persistence isn't just about randomness. What really matters is whether the behavior itself becomes enjoyable, meaningful, and habitual. That's what makes it bulletproof. But here is the real problem: dogs aren't pigeons in Skinner boxes, and they're definitely not gamblers pulling levers at a slot machine. So, to me, that kind of logic belongs in the laboratory. In training, behaviors don't survive just because of randomness. They survive because they become habits, and because the dog actually loves doing them. Once the behavior is a true habit, extinction isn't really a threat. And when a behavior is self-reinforcing, when the dog takes joy in the work itself, no slot machine can compete with that.

I'll give you two quick examples from my own training. Let's take heeling. When I start teaching heeling, of course, I use a continuous schedule of reinforcement, so every correct step gets paid. That clarity is essential, right? But over time, something changes. For my dogs, heeling becomes more than just a paycheck. It becomes a habit and a joy. They crave the challenge, the precision, the interaction with me. At that point, a variable schedule of reinforcement isn't what holds heeling together. It's actually the dog's own enjoyment. Now let's take the send-away. If I want more distance, more speed, and sharper precision, a random reinforcement schedule is not going to get me there. This is where the continuous schedule of reinforcement shines again. I reward every improvement, every sharper effort. That's how I raise the criteria for the behavior. So you can think of it this way: a variable schedule of reinforcement tends to freeze the behavior where it is, while a continuous schedule lets me sharpen it.

Now here is something that goes completely against the way reinforcement schedules are usually taught. Most trainers, and most textbooks if you want, will tell you: "Use a continuous schedule of reinforcement during the acquisition stage to teach the behavior, then quickly move to a variable schedule of reinforcement to make the behavior more persistent and resistant to extinction." And for the maintenance stage, of course, they advise staying with a variable schedule for the same reason. That's your conventional wisdom. But here is what I've found works better. I often continue to use a continuous schedule of reinforcement during the maintenance stage, and I do it for what I call high-maintenance behaviors. What do I mean?

Let's take the sit as an example. For most dogs, sitting is a natural response that's easy to learn. Once it's trained, I will get a decent sit every time. But in competition, a decent sit isn't enough. We want something sharper, faster, more precise; something better than the dog's natural "this is how I do it." So to get that extra level, I keep rewarding on a continuous schedule of reinforcement. Why? Because when I am asking for more than the dog's default behavior, I need constant reinforcement to maintain that sharpness.

There is a warning that goes with that: you can only ask for a little more. If we push too far, if we demand more and more and more, eventually we're going to hit a breaking point. Instead of improving, the behavior will start to crumble, because the dog literally cannot perform beyond its limit. And here is another point. Trainers will say, "Just reward the better attempts on a variable schedule and you will be able to raise the criteria," but it doesn't work that way. If the dog is already on a variable reinforcement schedule, a withheld reward is nothing unusual to it. The dog isn't going to be disappointed and then try harder or offer more. It just assumes, "Maybe next time I'll get paid, as I always do." That's why I stay with a continuous schedule of reinforcement for high-maintenance behaviors. Hope that makes sense. So again, random schedules keep things alive, but it is the continuous schedule of reinforcement that makes behaviors better.

Here is the part that not many people talk about, and for me, it's really the essence of advanced training. We usually hear about extinction bursts in a negative way. A dog barks at the door. We stop reinforcing it. Then, for a while, the barking gets worse before it fades out. So a trainer is going to say, "Yeah, that's a typical extinction burst, just be ready for it." However, extinction bursts aren't just an obstacle. They can be one of the most powerful tools for improving behavior. Here is how. When I'm using a continuous reinforcement schedule, my dog expects a reward every single time. So if I suddenly hold back, what is the dog going to do? Protest: "Hey, where is my money?" And in that protest, the dog exaggerates the behavior. That exaggeration is gold. That's how I sharpen the heel, the send-away speed, the sit, and so on. The extinction burst pushes the dog to offer more, and then I reward that. That's how I raise the criteria.

Picture trying this on a dog that's already on a variable schedule of reinforcement. It's not going to work. The dog doesn't protest. It just assumes, "Okay, maybe I'll get paid next time." There is no reason for a burst, no reason to make an extra effort. So yes, variable reinforcement helps keep the behavior alive, but it's the continuous reinforcement schedule that gives me the leverage to make behaviors better. And that's a huge difference most trainers completely miss.

Before I wrap up, let me touch on something related that shows up in training. Sometimes dogs will bark, whine, or even nip at the handler during performances. Many call this leaking. Some, of course, call it drive. Others confuse it with disobedience. But most of the time it comes down to two things: a lack of clarity about what's expected, or mishandled reinforcement schedules. If the dog knows there is a demand to do something but doesn't fully understand what, the frustration leaks out. If reinforcement has been mishandled, say by coming off a continuous schedule of reinforcement too quickly, the protest shows up as vocalizing or nipping. So here is the key: that protest isn't always bad. Just like extinction bursts, it's a double-edged sword. If you recognize it, you can harness it. The bark, the nip, the little explosion of frustration can be channeled into sharper effort, more speed, or more intensity, whatever you are after. But unfortunately, not everyone sees it this way.

There is a social media influencer who comes to mind who always recommends cutting off a dog's air supply with a dominant collar every time it vocalizes. And the sad part is that trainers hit the like button and think this is brilliant. It's not. It's horrible advice, because barking or whining in training isn't fixed by choking the dog out. It's fixed by clarity and skillful reinforcement handling. The dog isn't misbehaving; it's giving you feedback. And if you read that feedback properly, you can turn it into brilliance instead of conflict.

Variable schedules are, once again, very useful, but they are not a magic bullet. A continuous schedule of reinforcement is critical for raising criteria and improving precision. Extinction bursts aren't just something to fear; they are one of the best tools for sharpening behavior. Barking, whining, nipping, and so on are often protests born of reinforcement conflict, and they can be channeled productively if you know what you are doing. As for the Las Vegas comparison, as I said, it's only half the story. Persistence isn't just about randomness. Real durability comes from habit and the joy of the interaction itself. The goal isn't to keep behaviors alive by chance. It's to make them so clear, so strong, and so enjoyable that they become habits: self-reinforcing and therefore bulletproof.

If you're training your dog, don't obsess over making rewards random. Ask yourself: Have I made the behavior crystal clear during the continuous schedule of reinforcement? Am I using reinforcement to raise quality, not just to keep things going? Does my dog actually enjoy the work? And does it enjoy the interaction enough that the behavior becomes a habit? If you focus there, extinction isn't the problem trainers make it out to be. This is just one of the many interesting things I teach at my school for dog trainers, Training Without Conflict. Of course we cover reinforcement schedules, but we go beyond the textbook and beyond what I call obvious content. We teach how to use extinction bursts as a tool, how to raise criteria, how to interpret vocalization and conflict behaviors, and how to create behaviors that become self-reinforcing. Because real training isn't only about slot machines; it's about clarity, joy, precision, and connection.