Associative processes in addiction relapse models: A review of their Pavlovian and instrumental mechanisms, history, and terminology

Animal models of relapse to drug-seeking have borrowed heavily from associative learning approaches. In studies of relapse-like behaviour, animals learn to self-administer drugs and then receive a period of extinction during which they learn to inhibit the operant response. Several triggers can produce a recovery of responding, and these form the basis of a variety of models. These include the passage of time (spontaneous recovery), drug availability (rapid reacquisition), extinction of an alternative response (resurgence), context change (renewal), drug priming, stress, and cues (reinstatement). In most cases, the behavioural processes driving extinction and recovery in operant drug self-administration studies, such as context effects, are similar to those in the Pavlovian and behavioural literature. However, reinstatement in addiction studies differs from Pavlovian reinstatement in several ways that have emerged over several decades, including experimental procedures, associative mechanisms, and terminology. Interestingly, in cue-induced reinstatement, drug-paired cues that are present during acquisition are omitted during lever extinction. The unextinguished drug-paired cue may limit the model’s translational relevance to cue exposure therapy and renders its underlying associative mechanisms ambiguous. We review major behavioural theories that explain recovery phenomena, with a particular focus on cue-induced reinstatement because it is a widely used model in addiction. We argue that cue-induced reinstatement may be explained by a combination of behavioural processes, including reacquisition of conditioned reinforcement and Pavlovian to Instrumental Transfer. While there are important differences between addiction studies and the behavioural literature in terminology and procedures, it is clear that understanding associative learning processes is essential for studying relapse.

Behavioural paradigms in addiction neuroscience often rely on classic Pavlovian and operant conditioning processes. While theoretical associative learning mechanisms are based on both appetitive and aversive conditioning, in addiction neuroscience they are applied to model appetitive motivation for drugs. In Pavlovian conditioning, animals readily learn to expect the delivery of an appetitive or aversive outcome upon the presentation of a cue that has consistently been paired with outcome delivery. For example, a light (conditioned stimulus, CS) that reliably predicts the occurrence of a food reward or foot-shock (unconditioned stimulus, US) can evoke conditioned responding such as magazine approach or freezing, respectively, in the absence of that outcome. In instrumental or operant conditioning, animals learn to perform a response such as a lever press or nosepoke to obtain a desired outcome, which is often paired with the presentation of a visual and/or auditory cue. Addiction neuroscience takes advantage of both Pavlovian and operant conditioning through procedures such as conditioned place preference, which studies the development of Pavlovian associations between experimenter-administered drugs and specific contexts [1], and drug self-administration studies, where animals perform an operant response to obtain drug rewards [2]. These conditioning paradigms enable the study of how animals learn about rewards and their associated cues and are often complemented by studies of extinction and reinstatement or recovery-from-extinction, which are designed to study how changes in the relationship between the cue and the outcome can alter the behavioural response to that cue. Each type of recovery-from-extinction phenomenon has its own behavioural processes and theoretical explanations, with some of the most complex associative processes occurring during cue-induced reinstatement.
As recently reviewed by Konova and Goldstein, there are a number of parallels between extinction in experimental animal models and in humans [3]. Therefore, understanding the behavioural and psychological processes mediating these forms of learning can provide critical insight into the treatment and relapse of problem behaviours observed in patients suffering from psychological disorders that affect appetitive motivation, particularly substance use disorders.

Extinction and its Signature Characteristics
Extinction is the most basic and reliable method for reducing unwanted learned behaviours. In essence, extinction of the behavioural response occurs when the association between the cue and the outcome is weakened through the continual presentation of the conditioned cue alone without the delivery of the expected outcome [4]. Extinction is a fundamental process underlying psychotherapy. Extinction-based addiction therapies include cue exposure therapy which is effective for alcohol use disorder [5] and virtual exposure therapies which are employed for substance use disorders and behavioural addictions [6]. However, cue exposure therapies have also frequently been found to be ineffective in clinical populations [7][8][9] and cue exposure therapy is not often implemented [5]. These shortcomings have led many researchers to examine why cue exposure therapy is often ineffective and to propose various modifications to clinical approaches that might improve its efficacy [3,9,10]. A major clinical limitation of the efficacy of extinction-based treatments is that extinction is transient and the extinguished response can return under a variety of conditions such as spontaneous recovery, rapid reacquisition, resurgence, renewal, and reinstatement [11][12][13][14]. This return of the original behaviour is taken as evidence that extinction does not completely erase the original association between the cue and the outcome. Rather, it produces a new learning that competes with the original memory for behavioural control and is highly dependent on environmental cues for its retrieval.
Behavioural studies have identified several factors which can trigger a recovery of responding after extinction. For instance, a previously extinguished Pavlovian conditioned response can reemerge simply with the passage of time, known as spontaneous recovery [4,15,16]. In rapid reacquisition, re-establishing the cue-outcome association results in a faster rate of acquisition compared to initial conditioning [4, see also 17]. Resurgence occurs when an operant response is extinguished while an alternative behaviour is reinforced; if the alternative behaviour then undergoes extinction the former operant response can recover or resurge [18]. In the case of renewal, recovery of the extinguished response can be triggered by changing contexts after extinction, where a novel context is sufficient to renew responding to that extinguished cue and a return to the acquisition context produces the most robust recovery effects [19][20][21]. Finally, in reinstatement paradigms, unsignalled presentations of the outcome are sufficient to cause a return of responding to an extinguished cue [22]. Similar recovery-from-extinction phenomena have been documented in drug conditioning and self-administration studies [23][24][25][26]. Many relapse models in addiction neuroscience are based on these classic Pavlovian and operant models, with similar recovery mechanisms. However, there are some procedural differences that distinguish recovery phenomena between the two fields. We briefly review the associative basis of spontaneous recovery, reacquisition, and resurgence [for more detail, see 14,27,28] in Pavlovian and instrumental paradigms and argue that the associative mechanisms between them are largely comparable. However, most addiction neuroscience research is centred on the renewal and reinstatement paradigms and so these phenomena are the focus of the current paper. 
While renewal and the drug-primed and stress-induced variants of reinstatement are also readily comparable with Pavlovian recovery phenomena, a comparison between experimental approaches reveals that cue-induced reinstatement is driven by ambiguous associative processes. We therefore deconstruct cue-induced reinstatement at a behavioural level to further understand the associative mechanisms driving recovery and argue that it may be driven by a combination of (reacquisition of) conditioned reinforcement and Pavlovian to Instrumental Transfer.

Spontaneous Recovery
Spontaneous recovery is one of the most basic recovery-from-extinction phenomena that provides evidence that the original learning survives the extinction procedure. In Pavlov's original studies [4], an extinguished conditioned response spontaneously recovered to a level above the minimum achieved at the end of extinction after a period of time had elapsed [4,29–31]. This level of recovery in behaviour can vary depending on the length of time that intervenes between the extinction and test sessions, such that the longer the period between extinction and test, the higher the level of spontaneous recovery [32]. The spontaneous recovery effect has also been observed in instrumental studies by Skinner and others as early as the 1930s [29][30][31]. Spontaneous recovery has been shown occasionally in the addiction literature and can occur in drug self-administration studies [33][34][35][36][37][38][39] and in Pavlovian conditioning studies with drug reinforcers [26,40].
The current, dominant view of the spontaneous recovery effect in both Pavlovian and operant studies is that a shift in the temporal context triggers a recovery of responding [41]. Skinner has argued that unavoidable cues close to the beginning of the test session such as transport and handling help to promote spontaneous recovery [42]. Bouton has since incorporated Skinner's argument as evidence of contextual effects in spontaneous recovery [41]. In addition to the temporal context view, Rescorla has also reviewed several alternative explanations to describe why spontaneous recovery is observed following extinction of the response [32]. These include local performance effects, such as emotional states that build up during extinction [43] but which may have dissipated by the time of the test, and the effects of response fatigue which reduce responding during the extinction session but dissipate over time allowing for the restoration of responding in a subsequent session [32]. However, Rescorla notes that for several of these alternative explanations, empirical support has been mixed or lacking [32].
An alternative view considers spontaneous recovery as a result of differential rates of decay between the original association and the inhibitory extinction memory [27]. During acquisition, excitatory associations between the cue and the outcome are learned, while during extinction a separate inhibitory association is learned. Over time, the original association decays slowly, while the inhibitory association decays more quickly. A given period of time between the end of extinction and test will therefore involve much greater loss of the extinction memory's inhibitory association than decay in the acquisition memory's excitatory association. Differential decay explains observations that longer periods of time between extinction training and test tend to result in greater spontaneous recovery, while spontaneous recovery is reduced if a delay also occurs between the end of acquisition and beginning of extinction training [44]. In the case of an extended interval between extinction and test, there has been more time for the extinction memory to decay quickly, but the acquisition memory is relatively intact. In the case of delayed extinction and a delayed test, although the extinction memory decays prior to test, the extended period of time since acquisition gives the original memory time to decay as well [27].
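The qualitative logic of the differential-decay account can be sketched in a short simulation. This is a toy illustration only: the exponential form, the decay rates, and the function name are assumptions chosen to reproduce the qualitative pattern, not parameters taken from the literature.

```python
import math

def net_associative_strength(days, excit=1.0, inhib=1.0,
                             excit_decay=0.01, inhib_decay=0.2):
    """Net response strength: a slowly decaying excitatory (acquisition)
    trace minus a quickly decaying inhibitory (extinction) trace.
    All values here are hypothetical, for illustration only."""
    excitatory = excit * math.exp(-excit_decay * days)
    inhibitory = inhib * math.exp(-inhib_decay * days)
    return excitatory - inhibitory

# Immediately after extinction the two traces cancel, so responding is
# suppressed; at longer retention intervals the inhibitory trace has
# decayed more, so net strength -- spontaneous recovery -- grows.
for days in (0, 3, 7, 14):
    print(days, round(net_associative_strength(days), 3))
```

On these assumptions, net strength is zero at a test given immediately after extinction and rises monotonically with the retention interval, matching the observation that longer extinction-to-test delays produce greater spontaneous recovery.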
Another possible explanation for how the passage of time influences recovery appeals to changes in sensitization and habituation that occur over time [45]. Sensitization describes a response that is increased (becomes more sensitive) to a repeatedly presented stimulus, while habituation describes a response decrement that occurs to a stimulus across repetitions. Sensitization and habituation are separate processes that can occur at the same time in response to repeated stimulus presentations. For example, McSweeney and colleagues have argued that sensitization and habituation both occur within operant sessions, as responding is sensitized and increases early in the session but then declines later in the session as habituation becomes dominant [45]. Moreover, aversive Pavlovian conditioning studies have shown that there is substantial overlap in the behaviour and neuropharmacology of habituation and extinction [46,47], providing further support for the idea that habituation is an inhibitory process distinct from sensitization. According to this view, spontaneous recovery occurs because the habituation decays over time, even when animals remain in the conditioning chamber between sessions, leaving a sensitized response [48].
The idea that sensitization may drive increased responding with the passage of time finds empirical support in addiction models, such as incubation of craving and psychostimulant sensitization studies. The incubation of craving model was based on clinical addiction studies in which participants reported the subjective feeling of craving returning in phases or episodes [49,50]. In the laboratory, animals learn to self-administer drugs and then undergo a period of forced abstinence. At test, responding is elevated, with the magnitude depending on the length of the abstinence period. Thus, despite the fact that incubation of craving does not involve instrumental extinction prior to testing [49,51–53], this model draws parallels with spontaneous recovery by demonstrating that the passage of time can increase operant responding. Moreover, like spontaneous recovery, incubation of craving has been observed with a variety of drug and non-drug reinforcers, including cocaine, methamphetamine, and palatable food [49,51–53]. It is thought that sensitization may develop over the course of abstinence to drive an increase in responding [49,54], suggesting that sensitization lasts longer than habituation.
Psychostimulant sensitization studies also support the idea of long-lasting sensitization effects that become more apparent with the passage of time. In psychostimulant sensitization studies, animals receive experimenter-administered drugs (e.g. amphetamine) for several days. At test, they receive a challenge dose and sensitization is seen when animals have an elevated response to the drug compared to controls. A period of time between the end of the experimenter-administered drug exposure phase and test is essential to observe sensitization in these experiments [55]. It is notable that while spontaneous recovery can be observed in minutes with a food reinforcer [48], incubation of craving and psychostimulant sensitization studies require days to weeks to elapse before effects can be observed [49,55]. However, this timeline is consistent with the procedures used to test for spontaneous recovery after extinction in animals trained to self-administer cocaine [33,34], alcohol [35][36][37], and nicotine [38,39]. Taken together, evidence from incubation of craving and psychostimulant sensitization studies show that the increased responding observed after a period of time may be driven by sensitization that persists longer than habituation.

Rapid Reacquisition
Rapid reacquisition in a Pavlovian design involves the restoration of the cue-outcome association after extinction. Pavlov was the first to observe that the fresh application of the US restored conditioned responding after extinction [4]. Similarly, Skinner studied both the operant conditioning and the reconditioning of animals after extinction [31]. In both cases, it is understood that the availability of the outcome promotes the recovery effect and demonstrates that extinction has not completely erased the initial training memory [56]. Classical associative learning models can account for rapid reacquisition if the extinguished cue retains some latent associative strength that facilitates reacquisition, though some models such as Rescorla-Wagner have difficulty explaining the related phenomenon of slow reacquisition [27]. For Bouton, the conditions of rapid reacquisition return the animal to the original training context, facilitating retrieval of the original acquisition memory [14]. Consistent with the context retrieval account, Skinner observed that conducting multiple rounds of extinction and reconditioning produced successively more rapid extinction curves [31]. Animals may therefore become more adept at distinguishing between the acquisition or self-administration context and the extinction context. In either case, drug self-administration studies have shown that reacquisition after extinction is more rapid than the initial acquisition of the response [57].
However, if reacquisition is driven by a context effect, then it should be influenced by context manipulations, and the evidence for this influence is mixed. Bouton and Swartzentruber found that context had an effect on reacquisition of conditioned suppression, with slower reacquisition in a distinct extinction context [58]. In contrast, Willcocks and McNally found that overall performance during reacquisition of alcohol self-administration was not significantly different when animals were tested in the same context as acquisition and extinction, a novel context, or a distinct extinction context [59]. The only context effect that they observed was an altered latency to first response in the extinction context during reacquisition testing [59]. Moreover, Willcocks and McNally have also shown that distinct neurobiological substrates mediate these effects [60]. They found that inactivation of the prelimbic cortex reduced contextual renewal but increased responding during reacquisition [60]. These neurobiological differences were recently extended to the pattern of activation in the mesolimbic dopamine system, such as the medial ventral tegmental area, which was required for reacquisition but not renewal [61]. While rapid reacquisition is generally an underutilised paradigm in drug self-administration studies [62], these neurobiological findings indicate that contextual renewal and rapid reacquisition are dissociable, suggesting that mechanisms other than renewal contribute to rapid reacquisition.

Resurgence
Resurgence is a recovery phenomenon defined by the reappearance of an extinguished response during the extinction of an alternative response, and is unique to operant paradigms [14,63,64]. For example, during Phase 1, rats are trained to press a lever for food reward. In Phase 2, pressing on the original lever is no longer reinforced and pressing on an alternative lever is reinforced. In Phase 3, the alternative lever is extinguished, which causes responding on the original lever to increase. According to Epstein, the earliest reports of resurgence effects were by Hull in 1934 [65]. Hull trained rats to run down a 20-foot runway, performed extinction (then described as "frustration"), then retrained them to run down a 40-foot runway [66]. During extinction with the 40-foot runway, rats appeared to run faster in the lead-up to the 20-foot marker, reminiscent of the original response [66]. Another early report of resurgence was a 1951 conference presentation by Carey, where the effect was described in rats trained to lever press for food pellets and described as "reinstatement" or "regression" [67]. Resurgence was then rediscovered in 1970 by Leitenberg, Rawson, and Bath, who found rats would resume operant responding during the extinction of an alternative behaviour [68]. However, it would not be until Epstein and Skinner began studying the effect in the 1980s that it would take on the name of resurgence [18,65].
Multiple theories have been proposed to explain resurgence. For example, Leitenberg and colleagues originally approached resurgence from the perspective of response competition [68], a theory that Bouton and colleagues argue is contradicted by evidence that resurgence is unaffected by the schedule of reinforcement used for the alternative behaviour [14]. A second explanation is based on behavioural momentum theory, where reinforcement of the alternative response increases the disruption to the original response but simultaneously strengthens the original response by providing additional reinforcement in the same context [69]. However, Shahan, who had previously argued for the application of behavioural momentum theory to resurgence, has more recently argued that this theory has encountered difficulty explaining results such as the original response becoming more persistent during extinction even though alternative reinforcement was available [70]. Shahan and Craig therefore have proposed the "resurgence as choice" model, which argues that animals allocate responses based on the values of those responses over time [70]. As the alternative response is extinguished, the value of the original response becomes relatively higher and therefore receives an increased behavioural allocation [70].
In contrast with Shahan, Bouton views resurgence as a context effect [14]. Bouton and colleagues have previously suggested that the alternative response and its reinforcement represent a distinct extinction context for the original response. When the alternative response is extinguished, this creates a novel context which results in renewal [14]. In response to Shahan and Craig's adoption of the resurgence as choice model, Bouton and colleagues noted that contextual elements are now incorporated into resurgence as choice [28], suggesting that there is growing theoretical convergence in the behavioural understanding of resurgence.
Interest in resurgence as an addiction model arises from its potential relevance to contingency management and related approaches which incentivise abstinence from substance use by providing non-pharmacological rewards, in other words, by reinforcing alternative behaviours [71,72]. The resurgence effect suggests that there is potential for relapse when therapy concludes and incentives are withdrawn [14], which is known to occur [73]. Animal studies have shown that resurgence occurs when drugs such as alcohol or cocaine are used as reinforcers [74][75][76]. One potential point of contention is that Bouton and colleagues have conceptualised resurgence as renewal due to a novel context ('C') that is distinct from both the acquisition context ('A') and the extinction context ('B'), also known as ABC renewal [14]. However, attempts to demonstrate ABC renewal with alcohol have not been successful in either an operant design [77] or a Pavlovian design [78]. It is therefore possible that resurgence in drug self-administration studies may be driven by factors other than contexts, underscoring the importance of understanding the behavioural mechanisms which drive resurgence as an avenue for further research.

Renewal
Contextual theories of recovery phenomena have been extraordinarily influential. In 1978, Welker and McAuley first demonstrated that contextual stimuli could produce a recovery of operant responding for food [79]. In 1979, Bouton and Bolles published two papers on contextual effects on the recovery of extinguished conditioned suppression [21,80]. In one paper, they showed that rats that received conditioning in one context and extinction in a distinct context recovered or renewed conditioned suppression when they were returned to the original training context [21]. In another study, they showed that when the US (foot-shocks) was delivered in the conditioning chambers the day after extinction but before test, this only produced reinstatement when the foot-shocks were delivered in the same context as testing [80]. These early papers showed the importance of contexts in extinction learning and recovery phenomena and would form the basis of a decades-long research programme that has since extended the renewal model to include both Pavlovian and operant conditioning, multiple contextual manipulations, and an increasingly broad definition of contexts [14,17,19,21,28,41].
The definition of context has now expanded beyond environments that can be distinguished by various stimuli. As alluded to in sections on spontaneous recovery, reacquisition, and resurgence, Bouton also views the presence or absence of manipulanda and reinforcement as contextual factors, as well as the passage of time [14]. In addition to the spatial contexts defined by environmental stimuli, there are also temporal, interoceptive (e.g. hormonal or physiological states), cognitive, and social or cultural contexts [19,28]. Contextual renewal processes are therefore thought to be common to virtually all forms of relapse-like behaviour [14,28,41].
In classical renewal models, contexts are defined by environmental visual, olfactory, and tactile cues. These cues are used to present animals with distinct contexts during acquisition, extinction, and renewal sessions. In the classic ABA design, a Pavlovian or operant response is acquired in one context ('A'), followed by extinction in a second context ('B'), and then animals are returned to the original training context ('A') for renewal [21]. It has since been shown that renewal can also be triggered simply by removing animals from the extinction context, whether extinction occurs in the same context as acquisition (AAB renewal) or whether extinction occurs in its own distinct context and testing for renewal occurs in a novel context (ABC renewal) [14]. Experiments by Bouton and King showed that contexts tend not to accrue inhibitory associative value during extinction [81], in contrast with the predictions of the Rescorla-Wagner model [82]. Bouton and colleagues have therefore maintained, since the 1980s, that contexts have an occasion setting mechanism which retrieves specific associations [14,28,83]. In other words, acquisition and extinction create separate associations with a specific Pavlovian cue or operant response and context determines which of these associations will be retrieved, driving renewal in various contexts [14,28,83].
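The Rescorla-Wagner prediction at issue can be illustrated with a minimal simulation (the learning rate, trial counts, and function name below are illustrative assumptions, not values from any particular study). Because every cue present on a trial is updated by the same shared prediction error, a context that is present only during extinction should be driven below zero, i.e. should become a conditioned inhibitor:

```python
def extinction_in_novel_context(n_acq=30, n_ext=30, alpha=0.3, lam=1.0):
    """Toy Rescorla-Wagner run: condition a CS alone, then extinguish it
    in compound with a novel context cue. Parameters are illustrative."""
    v_cs = 0.0
    # Acquisition: CS reinforced (prediction error = lam - v_cs);
    # the extinction context is not yet present.
    for _ in range(n_acq):
        v_cs += alpha * (lam - v_cs)
    # Extinction in a novel context: CS + context presented, no US
    # (lambda = 0); both cues are updated by the shared error term.
    v_ctx = 0.0
    for _ in range(n_ext):
        error = 0.0 - (v_cs + v_ctx)
        v_cs += alpha * error
        v_ctx += alpha * error
    return v_cs, v_ctx

v_cs, v_ctx = extinction_in_novel_context()
print(round(v_cs, 2), round(v_ctx, 2))  # → 0.5 -0.5
```

The compound prediction is driven towards zero, but in this sketch it gets there by making the context strongly inhibitory while the CS retains positive strength. It is this predicted contextual inhibition that the data of Bouton and King failed to detect, motivating the occasion-setting account instead.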
In addiction models, interest centres primarily on the ABA renewal paradigm [28,62]. ABA renewal has been observed in self-administration studies with a variety of drugs, such as alcohol [59,60,77,84], cocaine [85,86], cocaine-heroin mixtures [87], heroin [88], and nicotine [89]. However, other forms of renewal such as AAB and ABC renewal are not so readily observed in studies using drug reinforcers. For example, Zironi and colleagues tested rats trained to self-administer alcohol in a novel context (ABC renewal) and did not observe significant recovery, while rats trained to self-administer sucrose did show ABC renewal [77]. Similar results have been seen in an appetitive Pavlovian design, where ABA renewal was observed with both alcohol and sucrose reinforcers [78]. However, neither AAB nor ABC renewal was observed with either reinforcer [78]. Similarly, Crombag and colleagues did not observe AAB renewal following self-administration of a cocaine-heroin mixture [87] and Fuchs and colleagues observed ABA but not AAB renewal of cocaine self-administration [90]. Bossert and colleagues also did not observe AAB renewal of heroin self-administration [88], nor was AAB renewal for nicotine observed by Diergaarde and colleagues [89]. Prior studies that have shown AAB or ABC renewal in operant designs have all used food pellets or sucrose as reinforcers [78,91–93]. In instrumental studies with food reinforcers, manipulations such as conducting acquisition training in multiple contexts have been shown to enhance ABC renewal [94], indicating the potential for improved generalisation of the acquisition memory to promote ABC renewal in addiction studies. However, this possibility has not yet been tested, so it remains unclear why these other forms of renewal are not easily observed using drug reinforcers.
Nonetheless, the contextual renewal effect appears to be sensitive to the identity of the reinforcer and this may be important for its applications in addiction neuroscience and clinical practice.
The addiction literature also refers to renewal designs as "context-induced reinstatement". While Bouton and colleagues have consistently used the term renewal to refer to recovery due to context change since 1979 [14,21,28,41,81], there is some historical precedent for this alternative nomenclature. In 1978, Welker and McAuley reported that when contextual and transport cues were "reinstated" after extinction, operant responding for food returned [79]. Although their work is cited as an example of renewal [41], they were actually interested in extinction and spontaneous recovery and did not use the term renewal [79]. In fact, Welker and McAuley refer to their contextual manipulations as "reinstating responding during the final session of extinction" [79]. By the early 2000s the term renewal had become well-defined and widespread in both the Pavlovian and operant literature [14,41,95,96], including studies involving drug self-administration [25,85,87,89]. However, addiction neuroscientists also began to refer to renewal designs as reinstatement around the same time [88,97–99], with some authors transitioning from renewal to reinstatement [87,100,101] and others applying both terms interchangeably [84,86,89,102,103]. As will be discussed further below, reinstatement in the addiction literature has become an umbrella term that covers multiple recovery phenomena beyond the classical design of Pavlovian reinstatement studies. These differing recovery phenomena have diverse associative mechanisms due to differences between Pavlovian and instrumental learning, as well as the various processes used to precipitate recovery-from-extinction.

Reinstatement
In Pavlovian experiments, reinstatement of responding to an extinguished cue is typically observed following re-exposure to the aversive or appetitive event, often unsignalled and usually prior to testing [22]. For instance, studies of extinction following Pavlovian fear conditioning show that presentations of the foot-shock can reinstate fear responding [17,21,22,41,56,104,105], as can other aversive triggers that induce a state of fear, such as exposure to stressors (e.g. a milder foot-shock than that used in conditioning; [105]) or to a conditioned context [106,107]. This phenomenon has also been demonstrated in animal studies of reward learning [108], as well as in human studies [109][110][111][112]. The term reinstatement is also used to describe recovery that occurs when the CS is tested with other stimuli that have been separately conditioned with the US [41], although definitions of reinstatement do not always recognise this usage [28]. For example, Halladay and colleagues found that presentations of an equally aversive, unextinguished CS after extinction training reinstated freezing to a different, extinguished CS on test [113]. Moreover, this reinstatement of responding induced in the absence of the US is evident in human research [114].
Reinstatement is thought to depend on the restoration or retrieval of the association between the CS and US. A typical Pavlovian design is shown in Figure 1a-c. During acquisition, an association between the CS and US is acquired (Figure 1a). This association is weakened throughout extinction (Figure 1b). The presentation of the US, which usually occurs the day prior to the reinstatement test [41], then produces a restoration of the CS-US association (Figure 1c). Since Pavlov, it has been thought that extinction involves an inhibitory response that suppresses the conditioned response [4,41]. Even researchers who have argued that extinction may involve (partial) erasure do not argue against the survival of the original association and new learning of an inhibitory response [56]. According to one view, the presentation of the US is thought to reactivate the original association with the CS and thus lead to a restoration of conditioned responding [41]. Alternatively, Bouton and colleagues argue that reinstatement depends on the context being associated with the US [28], because if it is presented in a different context, then reinstatement does not occur [80]. Presentations of the US can also strengthen the CS-US association, despite the absence of the CS, via mediated conditioning [104]. The prior presentations of the US may therefore restore its association with the context, which then enables retrieval of the CS-US association during test.

Historical Use of the Term "Reinstatement" in Addiction Studies
In operant drug self-administration studies, what is described as reinstatement differs from the stricter definition used in the Pavlovian and behavioural literature [28,41]. The use of the term "reinstatement" in a manner distinct from but related to the definition used in Pavlovian conditioning began to emerge in the addiction neuroscience literature in the 1970s and early 1980s [115]. These early studies all used the term reinstatement to describe the return of responding that was observed as a result of re-exposure to drugs or drug-associated cues. As early as 1971, Stretch and colleagues reported that instrumental responding for amphetamine could be reinstated by injections of amphetamine, which they theorised was caused by "reinstatement of the drug state" [116]. In this 1971 paper, and in two subsequent reports, they also used reinstatement to refer to the restoration of response rates that occurred due to drug injections [116-118]. In 1976, Davis and Smith described training a neutral stimulus as a conditioned reinforcer by pairing it with intravenously self-administered morphine [119]. After extinction of the instrumental response, the conditioned reinforcer was described as causing "reinstatement" or "restoration" of the instrumental response [119]. The experimental approach of Davis and Smith is now the basis of the cue-induced reinstatement model in widespread contemporary use.
In the early 1980s, de Wit and Stewart reported "reinstatement" of operant responding for cocaine and heroin following injection of various drugs or presentation of a cue that had previously been paired with drug delivery [120-123]. These seminal papers are credited with establishing the reinstatement model which has been used, in various forms, by addiction neuroscientists ever since [115]. Numerous other studies have now shown that a food or drug prime can also reinstate operant responding following extinction of the instrumental response [23,24,116,119,124-129]. Moreover, in addition to the drug-primed and cue-induced reinstatement models already developed, subsequent studies also showed that reinstatement could be precipitated in animals by stressors such as foot-shocks [130,131] or by combining multiple precipitating factors, for example, by using both cue presentation and drug priming [132]. As discussed above, renewal is now also described as context-induced reinstatement in the addiction neuroscience literature [100].
The term "reinstatement" has also been used to describe models of relapse after abstinence. Abstinence models may involve forced abstinence, where animals are not given access to drugs, such as in the incubation of craving model, or abstinence that is technically self-imposed due to punishment or the availability of a more desirable alternative [133-135]. For example, Panlilio and colleagues suppressed operant responding for opioids by punishment with foot-shocks and then "reinstated" responding using drug-priming [136]. Other studies using punishment-induced abstinence have also referred to relapse-like processes as "reinstatement" [137]. However, drug-seeking after abstinence is not a recovery-from-extinction phenomenon. Rather, other authors have tended to refer to such relapse-like processes as incubation of craving [138] or simply as relapse [133,139,140].
While an in-depth discussion of the associative and behavioural processes underlying relapse-after-abstinence models is beyond the scope of the present paper, it seems possible that punishment-based relapse models involve reacquisition, as suggested by Panlilio and colleagues [136], or contextual or occasion-setting effects that reactivate the acquisition memory. Marchant and colleagues have also argued that punishment effects are dominated by response-outcome associations that, like extinction, produce context-dependent suppression of responding [141]. Where abstinence is produced by the availability of a desirable alternative, response competition is the most obvious possibility, and there have even been studies showing, at least for cocaine, that drugs can suppress the response for the appetitive non-drug alternative [142].

Behavioural Processes in Drug-Primed Reinstatement
Drug-primed reinstatement is consistent with the classical Pavlovian definition of reinstatement, due to its precipitation by presentation of the drug outcome (Figure 1d-f). Associative learning models posit that during acquisition the operant response (R) becomes associated with both the drug-paired cues (S) and with the drug outcome (O), which has both a perceptual identity (O_i) and incentive value (O_v; Figure 1d) [143]. The result is associations between R, S, and O_i/O_v. In drug-primed reinstatement protocols, animals may be trained without discrete drug-paired cues [123] or extinction can involve presentation of the cue, omitting only drug delivery [130]. For example, extinction procedures for drug-primed reinstatement may be designed to merely withhold drug delivery by substituting saline for the drug or disconnecting the syringe pump [123,130], leaving any cues paired with infusions in place. Since extinction is otherwise identical to self-administration training, it is clear in these paradigms that the association between the operant response and drug delivery is extinguished (R-O_i/O_v; Figure 1e). Moreover, if cues are present, then their association with the drug outcome (S-O_i/O_v) is also extinguished. Drug-primed reinstatement must therefore rely on the response-drug outcome (R-O_i/O_v) association, which follows the classical definition of reinstatement as occurring in response to the US [17,22,41,56]. As with Pavlovian reinstatement, theoretical accounts largely differ with respect to whether the response-drug outcome association is reactivated by contextual associations or whether its associative value is restored.
Bouton and colleagues argue that reinstatement in Pavlovian and fear conditioning designs is a context effect [14,28] and their reasoning clearly applies to drug-primed reinstatement. Since reinstatement is context-dependent in Pavlovian fear conditioning [80], Bouton and colleagues argue that reinstatement relies on the animal expecting the US in that context [14,28]. In Pavlovian designs, this expectation is restored by presenting the US prior to test. In addiction studies, the drug-priming injection serves the same purpose. The subjective effects of the drug produce an interoceptive context and these can influence extinction. For example, alcohol has previously been shown to result in state-dependent learning [144]. Citing these and other findings showing that drug-induced interoceptive states can influence behaviour, Bouton and colleagues argue that drugs produce interoceptive contexts [28]. Drug-primed reinstatement is therefore a function of an interoceptive version of ABA renewal, where the priming injection returns the animal to the acquisition context, retrieving the response-drug outcome association, which produces recovery of responding.
Another explanation for drug-primed reinstatement can be drawn from the Rescorla-Wagner model, according to Delamater and Westbrook [56]. Delamater and Westbrook argue that the Rescorla-Wagner model [82] predicts reinstatement because the US presentations restore the associative strength of the previously extinguished stimulus. Since the design of drug-primed reinstatement is analogous to Pavlovian extinction and reinstatement, it could also be argued that drug-priming restores the associative strength between the response and the drug outcome (R-O i /O v ), driving a recovery of responding. In other words, both context theory and the Rescorla-Wagner model rely on the response-drug outcome association, but differ with respect to whether this association is reactivated by a drug-induced interoceptive context or restored by drug priming.
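The Rescorla-Wagner prediction can be made concrete with a short simulation. The sketch below is our own illustration, not a model taken from the studies reviewed: it simulates acquisition and extinction of a cue-context compound, then US-alone "priming" trials in which only the context is present. The learning-rate parameter, trial numbers, and the assumption that test responding reflects the summed associative strength of cue and context are all illustrative choices.

```python
# Minimal Rescorla-Wagner sketch: delta-V = alpha*beta * (lambda - sum of V
# for all stimuli present). Parameter values and stimulus names are ours.

ALPHA_BETA = 0.3  # combined salience/learning-rate parameter (illustrative)

def rw_trial(V, present, lam):
    """Run one trial, updating every present stimulus by the shared error."""
    error = lam - sum(V[s] for s in present)
    for s in present:
        V[s] += ALPHA_BETA * error
    return V

V = {"cue": 0.0, "context": 0.0}

for _ in range(30):                       # acquisition: cue + context -> US
    rw_trial(V, ["cue", "context"], lam=1.0)
for _ in range(30):                       # extinction: cue + context, no US
    rw_trial(V, ["cue", "context"], lam=0.0)
post_ext = V["cue"] + V["context"]        # summed strength after extinction

for _ in range(3):                        # US-alone priming: context -> US
    rw_trial(V, ["context"], lam=1.0)
post_prime = V["cue"] + V["context"]      # summed strength at test

print(round(post_ext, 3), round(post_prime, 3))  # priming elevates the sum
```

Note that this toy captures only the context-mediated route to recovery (the priming trials strengthen the context, inflating the compound at test); direct restoration of the extinguished association itself, for example via mediated conditioning, is not represented.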

Behavioural Processes in Stress-Induced Reinstatement
Similar associative mechanisms may be involved in stress-induced reinstatement. Stress-induced reinstatement experiments are conducted in a manner that is essentially identical to drug-primed reinstatement, except reinstatement is triggered by presentation of a stressor (Figure 1d-f). Stress-induced reinstatement paradigms can use a wide variety of stressors, such as acute food deprivation, foot-shock, and pharmacological stressors such as the anxiogenic drug yohimbine [131]. Stress-induced reinstatement has also been observed for a variety of drugs, including heroin, cocaine, methamphetamine, nicotine, and alcohol [131]. While most studies of food-seeking have found that stressors did not induce reinstatement of food-seeking [131], it is possible under certain conditions, such as when rats receive daily exposure to a calorie-dense cafeteria diet [145]. Just as for drug-primed reinstatement, extinction procedures for stress-induced reinstatement can merely withhold drug delivery by substituting saline for the drug or disconnecting the syringe pump [130,146]. Some stress-induced reinstatement designs present the cue during reinstatement [147], while others have omitted drug-paired cues during reinstatement testing, despite their presence during acquisition and extinction [146]. Stress-induced reinstatement can also be combined with early life stress, such as post-weaning social isolation, though this did not alter reinstatement [148]. This procedural flexibility with respect to cues demonstrates that cue-drug outcome associations (S-O_i/O_v) are not necessary for stress-induced reinstatement. Therefore, as with drug-primed reinstatement, the stressor must act to restore or reactivate the response-drug outcome association (R-O_i/O_v) to cause recovery of responding.
Bouton and colleagues have applied context theory specifically to stressors and argue that stressors are most likely to promote recovery of responding if they have also been paired with acquisition [28,149,150]. Schepers and Bouton have conducted a series of experiments using hunger and a chronic variable stress protocol to produce interoceptive contexts [149,150]. When hunger or stress was associated with acquisition, but not extinction, then renewal was observed when these conditions were restored prior to test [149,150]. Bouton and colleagues argue that these findings show that stress produces an interoceptive context [28]. Therefore, as with drug-primed reinstatement, stress-induced reinstatement removes animals from the extinction context which enables retrieval of the response-drug outcome association and recovery of responding.
However, there are some procedural differences between Schepers and Bouton's studies and the stress-induced reinstatement paradigm that suggest alternative explanations. In Schepers and Bouton's studies, hunger and stress were present during both acquisition and test [149,150]. In contrast, typical stress-induced reinstatement designs do not introduce the stressor until after extinction [131]. For example, the first stress-induced reinstatement papers used 10 min of intermittent foot-shock in the test context prior to the start of the session [146,147]. Therefore, the design of stress-induced reinstatement studies does not follow the ABA renewal design used by Schepers and Bouton, but more closely resembles an interoceptive ABC renewal design, because acquisition, extinction, and reinstatement are each associated with their own interoceptive states produced, respectively, by the drug's subjective effects, the absence of drug, and stress. Moreover, Schepers and Bouton used food as a reinforcer [149,150], while most studies of stress-induced reinstatement have shown that food-seeking is not reinstated by stress [131]. These procedural differences raise the possibility that stress-induced reinstatement is driven by factors other than interoceptive contexts.
One possibility is that stress-induced reinstatement functions through largely non-associative affective mechanisms as animals attempt to relieve their negative affective state via drug-seeking. Drug addiction has previously been theorised to involve processes of negative reinforcement, where drug use alleviates aversive states [151][152][153]. If this were true, then it would imply that the recovery of responding observed during stress-induced reinstatement is goal-directed. As Trask and colleagues have argued, Pavlovian and operant extinction and recovery phenomena share many features and common processes, but goal-directed vs habitual actions are unique to operant behaviours [154]. If stress-induced reinstatement is directed towards alleviating aversive states produced by the stressor, then the recovery of responding relies on the association between the operant response and the affective value of the drug outcome. Therefore, stress-induced reinstatement may be sensitive to outcome devaluation manipulations and this possibility invites empirical verification.

Behavioural Processes in Cue-Induced Reinstatement
Cue-induced reinstatement is driven by ambiguous behavioural and associative processes. It is a common relapse model in which recovery is driven by the presentation of an unextinguished drug-paired cue [98,155,156]. During acquisition, animals first learn an operant response (e.g. lever press) for food or drug reward paired with a light or tone cue (Figure 2a). The operant response is then extinguished such that lever presses no longer result in outcome delivery or presentations of the cue (Figure 2b) [103,157-159]. This extinguishes both the response-cue (R-S) and response-drug (R-O) associations, but leaves the cue-drug (S-O) associations intact (Figure 2b). Unlike other Pavlovian reinstatement paradigms, reinstatement of responding in cue-induced reinstatement is assessed using response-contingent presentations of the food- or drug-paired cue (Figure 2c). In other words, the animal makes a response for the non-extinguished reward-associated cue. Given that reward-paired cues can elicit conditioned responding in and of themselves (see [160]), dissociating the mechanism mediating the return of responding when the cue is presented as the outcome for the extinguished response becomes challenging. It has also been noted that this usage of the term reinstatement is not consistent with the definition used in Pavlovian conditioning and most of the behavioural literature [28].
The associative processes underlying cue-induced reinstatement are important from both theoretical and translational perspectives. Procedural differences between cue-induced reinstatement and the therapeutic approaches it is supposed to model reduce its translational potential. Specifically, because the S-O association is never extinguished, the cue-induced reinstatement model does not properly model cue-exposure therapy paradigms in humans [161]. In a clinical context, cue-exposure therapy involves repeated presentation of drug-associated cues, which would be more appropriately modelled by Pavlovian extinction of drug-paired cues rather than instrumental extinction. In the cue-induced reinstatement paradigm, it is only the instrumental response that is extinguished, so cue-induced reinstatement is not an ideal model for cue-exposure therapy and relapse.
The ambiguity regarding the associative processes which drive cue-induced reinstatement also limits its contribution to theory. If the instrumental response is extinguished alone, it is unclear what causes the restoration of responding observed during reinstatement. As discussed above, in Pavlovian conditioning it is the restoration or reactivation of the CS-US association that drives reinstatement. However, in operant cue-induced reinstatement models, the analogous S-O association was never extinguished. With respect to associative learning, this leaves three possibilities: the R-S, S-O_i, and S-O_v associations. Experimental evidence suggests that the R-S association alone is not the driver of cue-induced reinstatement, for reasons that will be discussed below, but there remains some ambiguity regarding the role of S-O_i or S-O_v associations.

Cue-Induced Reinstatement Relies on Cue-Outcome Associations
A small number of studies have demonstrated that cue-induced reinstatement relies on cue-drug outcome (S-O) associations because separate Pavlovian conditioning or extinction of drug-paired cues can alter later cue-induced reinstatement. One example is the Pavlovian cue-conditioned reinstatement approach, which demonstrates that a Pavlovian conditioned cue can promote reinstatement [162,163]. As shown in Figure 3, rats are trained to self-administer cocaine without cues and given a single Pavlovian conditioning session in the middle of self-administration training. These Pavlovian conditioned cues can later precipitate reinstatement after instrumental extinction when they are presented contingently [162,163]. Since the operant response and drug-paired cue were never combined, this design demonstrates that reinstatement relies on the Pavlovian associations between the cue and the drug outcome.
Studies that combined non-contingent Pavlovian extinction with instrumental extinction have further shown the importance of the S-O associations and have also demonstrated context effects. In these designs, rats are trained to self-administer in the presence of cues (Figure 4a), before receiving two separate kinds of extinction (Figure 4b). Instrumental extinction follows standard procedures, omitting both cue and drug. However, additional Pavlovian extinction sessions present the cue alone in a non-contingent manner, extinguishing the S-O associations. At test, reinstatement is diminished because all of the associations have been extinguished (Figure 4c). These designs demonstrate that the Pavlovian S-O association is important because, unlike in previous studies, the reinstatement test is no longer simply identical to the instrumental extinction sessions [164]. Buffalari and colleagues also conducted extinction with the cues present, which leaves R-S intact, or Pavlovian extinction of the S-O association alone, leaving R-O intact [164]. They found, unsurprisingly, that rats extinguished with cues present showed the least reinstatement, while rats that received Pavlovian extinction of the cue alone had the highest level of reinstatement [164]. These effects may also be context-dependent. Torregrossa and colleagues gave rats instrumental extinction and then a phase of non-contingent cue extinction in either the training context (A) or a distinct extinction context (B). They found that non-contingent extinction given in context A produced the lowest levels of cue-induced reinstatement [165]. Non-contingent cue extinction in context B was not effective unless combined with d-cycloserine treatment [165].
Separate studies using the same approach in a single-context paradigm have replicated these results. In a study by Perry and colleagues, rats were trained to self-administer cocaine followed by standard instrumental extinction [166]. Rats that subsequently received Pavlovian non-contingent cue extinction showed reduced cue-induced reinstatement relative to controls [166]. Follow-up studies from the same group have replicated these findings in adult, but not adolescent, rats undergoing cue-induced reinstatement [167], and shown that non-contingent cue extinction can effectively abolish incubation of craving [168]. Together, these findings seem to indicate that it is learned associations between the drug-paired cue and the drug outcome (S-O) that drive cue-induced reinstatement. However, while these studies clearly demonstrate that the S-O associations are important, they do not provide evidence about whether it is S-O_i or general affective S-O_v associations that drive reinstatement.
Unfortunately, there is no simple solution to this ambiguity because cue-induced reinstatement designs require discrete drug-paired cues to be present during self-administration training and omitted during extinction. Unlike other recovery procedures, such as contextual renewal studies where drug-paired cues can be present [102,169] or absent [103] during extinction, cue-induced reinstatement has no alternative trigger for recovery. If drug-paired cues are retained during extinction, then the response rate will decline and the cue-induced reinstatement test will simply be identical to another extinction session. Contemporary reinstatement designs do not reinforce responses during test [103,157-159], nor would reinforcing responses be a solution, because the designs would then become rapid reacquisition experiments [41,169].

In order to produce a return of responding, there must be some kind of precipitating factor. In Pavlovian designs, this is achieved with the US, but for cue-induced reinstatement, this has to be the never-extinguished drug-paired cues. Current views centre on the idea that it is a form of conditioned reinforcement [28]. However, we argue that cue-induced reinstatement is more ambiguous and complex than this. There are procedural and empirical reasons to believe that cue-induced reinstatement may be better conceptualised as reacquisition of conditioned reinforcement or, alternatively, that it may involve Pavlovian to Instrumental Transfer [28].

Reinstatement as Reacquisition of Conditioned Reinforcement
Conditioned reinforcers are previously neutral stimuli that have become reinforcers through repeated pairings with a primary reinforcer [170-172]. In some cases, the definition is operationalised by the requirement that they can support and maintain new operant responses [173,174]. The absence of extinction for the drug-paired cues' S-O association suggests that the effect observed during cue-induced reinstatement might be better classified as conditioned reinforcement. This is analogous to the idea proposed by Davis and Smith in 1976 that cues can promote reinstatement [119]. Moreover, some researchers have gone as far as referring to cue-induced reinstatement as an alias for conditioned reinforcement [175]. Bouton and colleagues have also recently considered cue-induced reinstatement and argue that it is driven by conditioned reinforcement, rendering it distinct from drug-primed or stress-induced reinstatement [28]. Conditioned reinforcement involves animals responding for the cue, and the classical Pavlovian view of conditioned reinforcement is that the cue itself acquires conditioned value [170,171]. Conditioned reinforcement is also one of the key phenomena cited in support of the incentive sensitization theory of addiction [176]. Consistent with the classical Pavlovian view, incentive sensitization theory posits that repeated pairings between the cue and the drug result in some of the drug's incentive motivational properties being transferred to the cue. This incentive motivational transfer is thought to be observable in sign-tracking behaviour, where animals approach and attempt to interact with appetitive cues [176].

Evidence for Conditioned Reinforcement in Cue-Induced Reinstatement
Several studies have shown that discrete cues paired with drug delivery acquire conditioned reinforcing properties during self-administration, as animals will respond for the presentation of these cues alone in later tests [174,177-180]. These effects may be particularly pronounced for nicotine, because nicotine-paired cues alone have been shown to maintain responding for several days after a prolonged 40-day self-administration phase [181]. Even if cue omission during extinction results in the extinction of the R-S association, reinstatement can be explained by the conditioned reinforcing properties of the drug-paired cues, which have acquired their own incentive value. It would therefore not be the R-O association that is reactivated during reinstatement, but an R-S-O association that drives responding. However, if cue-induced reinstatement really is driven by conditioned reinforcement, as both Kawa and colleagues and Bouton and colleagues have suggested [28,175], then it may actually be a form of reacquisition. In their study, Kawa and colleagues trained rats to nosepoke for cocaine. Each cocaine delivery was simultaneously paired with presentation of a cue light. Following standard protocols, nosepokes made during extinction had no programmed consequences, but during the reinstatement test nosepokes resulted in cue presentation but not drug delivery [175]. However, the R-S association should have been extinguished by cue omission during the extinction phase. When the CS and US are paired again after extinction in Pavlovian designs, this is referred to as reacquisition [41]. If the cue is a conditioned reinforcer and is paired with the response again after extinction, then this design more closely matches reacquisition of conditioned reinforcement than simple conditioned reinforcement.
Even if cue-induced reinstatement is driven by conditioned reinforcement or its reacquisition, this does not necessarily clarify the mechanism by which the cue elicits responding. For example, is the elevation in responding during cue-induced reinstatement because the cue is reinforcing in itself, or does cue presentation produce an excitatory signal that stimulates further responding? Shahan has argued that conditioned reinforcement occurs because the cue acts as a sign-post towards the physiologically-relevant reinforcer [182]. According to this view, animals respond for predictive stimuli because of their temporal relationship with the reinforcer [182-184]. Along with the classical Pavlovian account of conditioned reinforcement as acquiring conditioned value [170,171], this would imply that conditioned reinforcement is driven by the S-O association. Parkinson and colleagues found that a sucrose-paired Pavlovian conditioned reinforcer is not sensitive to outcome devaluation, which they suggest may be because conditioned reinforcers activate a central appetitive motivational state or can become a goal in their own right [185]. Further studies are required to assess whether these findings are relevant to cue-induced reinstatement for drugs, for example by conducting devaluation of the cue or outcome prior to reinstatement testing.

Conditioned Reinforcement Does Not Fully Explain Cue-Induced Reinstatement
Conditioned reinforcement provides a compelling explanation for cue-induced reinstatement, but it does not fully explain all aspects of the phenomenon. For example, non-contingent cue presentations at the start of the session have been used to precipitate cue-induced reinstatement in mice trained to nosepoke for nicotine [186]. Non-contingent presentation of the cue prior to extension of the lever can also promote reinstatement of cocaine-seeking and sucrose-seeking in rats [187,188]. Moreover, non-contingent cue presentation is not a redundant reinstatement trigger, because studies of cue-induced reinstatement of alcohol- and cocaine-seeking have used non-contingent cue presentations in cases where animals did not earn their own cue presentations by operant responding early in the session [157,167,189]. The time course of responding during cue-induced reinstatement also suggests that there are contributing factors other than conditioned reinforcement. Tunstall and Kearns have reported, for example, that approximately half of lever presses during cue-induced reinstatement for cocaine occurred during the 10 s cue presentation [190]. Indeed, if responses during cue presentation are excluded, it appears as if rats are barely increasing their responding above extinction levels (approximately 20 responses under extinction vs. 30 responses during reinstatement) [190]. Similar results have been found with cue-induced reinstatement of sucrose-seeking, where approximately half of responses occurred during cue presentation or time-out [191]. Assuming cue-induced reinstatement is driven by conditioned reinforcement, this pattern of responding suggests that rats are responding not only to obtain the cue, but because of the cue.
Furthermore, contingent cocaine-paired cues appear to have no effect on established instrumental responding, suggesting they provide little conditioned reinforcement in many self-administration studies. If animals are trained to self-administer cocaine in the presence of cues, the removal of these cues does not alter responding [192]. Moreover, if extinction is initiated without cocaine delivery but with the presentation of cues, rats will rapidly extinguish their responding [192], with no significant difference when compared with rats that receive extinction of the lever alone [164]. If the cocaine-paired cues were indeed acting as conditioned reinforcers, given that they have previously been shown to support the acquisition of a new response in sessions across multiple days [180], then an operant response that is still paired with the cue should be more resistant to extinction than responding on the lever alone. These results indicate that, at least for cocaine, the presence or absence of the cue during extinction is not sufficient to maintain responding.
In the case of nicotine, conditioned reinforcement may make a larger contribution to cue-induced reinstatement. Nicotine facilitates the acquisition of conditioned reinforcement [193] and cues are important for the acquisition of nicotine self-administration [194]. Once self-administration has become established, nicotine-paired cues can then maintain responding on their own (i.e. in the absence of nicotine) for months [195], demonstrating a powerful and persistent conditioned reinforcement effect. Similarly, cue-induced reinstatement for nicotine persists across multiple tests, although lower doses of nicotine may only support a single reinstatement test [196]. These results suggest that although conditioned reinforcement might not fully explain cue-induced reinstatement for several drugs of abuse, its relative contribution varies between drugs and may be greater for nicotine.

Cue-Induced Reinstatement as Pavlovian to Instrumental Transfer
Cue-induced reinstatement protocols also strongly resemble Pavlovian to Instrumental Transfer (PIT), and there is empirical evidence that supports a role for PIT rather than conditioned reinforcement alone. Indeed, conditioned reinforcement itself has been shown to be mediated by PIT mechanisms in some circumstances [173] and procedures which produce PIT can also produce conditioned reinforcement [197]. For example, acquisition of an operant response for a conditioned reinforcer can be insensitive to outcome devaluation [173,185], suggesting a general affective or excitatory effect analogous to general transfer in PIT (also called non-selective PIT). While there may be overlap in the processes involved in both conditioned reinforcement and PIT, they can be distinguished behaviourally, such as in circumstances where a procedure produces one but not the other, and neuropharmacological manipulations may also be specific for conditioned reinforcement or PIT [197]. In PIT paradigms, animals receive separate instrumental and Pavlovian conditioning for the same reinforcer (Figure 5a-b). At test, animals continue to perform instrumental responses, but presentations of the cue modulate the rate of instrumental responding [197-199]. PIT mechanisms would therefore explain the pattern of responding observed during cue-induced reinstatement, where a very high percentage of responses occur during cue presentation [190,191]. PIT is most commonly conducted using non-drug reinforcers such as food pellets [200], but studies using drug reinforcers have also been conducted [201-203]. In humans, a nicotine-paired cue potentiated instrumental responding more than a food-paired cue [204], demonstrating that PIT may vary depending on the reinforcer.
As shown in Figure 5c, specific PIT involves responding driven by the predictive value of the cue via an S-Oᵢ-R association, while general PIT (Figure 5d) involves responding driven by retrieval of the affective value via an [S-Oᵥ]-R association [143]. As discussed above, reinstatement is driven by S-O associations [162,164]. However, there remains some ambiguity over whether reinstatement is driven by S-Oᵢ or S-Oᵥ associations. PIT studies may therefore help to clarify this ambiguity.
The design of cue-induced reinstatement studies is more consistent with a general or non-selective PIT effect driven by an [S-Oᵥ]-R association because of its use of a single outcome. Different PIT procedures can preferentially evoke general and specific transfer, with the main variants called non-selective PIT and outcome-specific PIT [197,198]. In non-selective paradigms, the Pavlovian conditioning phase involves two stimuli of which only one is reinforced, and the instrumental phase involves a single lever paired with a single outcome. In outcome-specific transfer paradigms, the Pavlovian conditioning phase provides a different reinforcer for each stimulus, and the instrumental phase similarly trains two levers each paired with their own outcome [197,198,205,206]. As reviewed by Cartoni and colleagues, non-selective PIT usually produces general transfer rather than specific transfer [197], which is thought to be mediated by the general excitatory or motivational function of the cue [198,207,208]. In cue-induced reinstatement, the design of the study is most similar to non-selective PIT because, although there are usually two levers and only one cue, there is only one outcome. Following Holland [209], Cartoni and colleagues suggest that general transfer tends to be associated with non-selective PIT paradigms because they generate less detailed representations of the outcome. It would therefore be expected that cue-induced reinstatement is mediated by a general PIT effect because the outcome representation in these designs is singular. Holland also showed that extended instrumental training (20 sessions) was more likely to result in general transfer than minimal instrumental training (5 sessions) [209], which would also imply a role for general PIT in cue-induced reinstatement, since self-administration studies typically involve 10 or more days of self-administration [157,168,189,210-214].
Bouton and colleagues have recently suggested a role for general PIT in cue-induced reinstatement, noting many common neurobiological substrates between them [28].
There is also some experimental evidence that drug-paired cues can have outcome-specific effects. Rubio and colleagues trained rats to press one lever for cocaine and, on alternate days, to press a second lever for heroin [215]. Each drug delivery was paired with activation of a cue light above its respective lever. Rats were then subjected to standard lever extinction, where levers were inserted but had no programmed consequences. At test, rats received initial non-contingent presentations of either the cocaine cue or the heroin cue immediately prior to extension of all levers, with the lever corresponding to the drug presented at the start of the session triggering cue presentations [215]. They found that cues specifically reinstated responding on their corresponding lever but did not trigger reinstatement on the alternative drug lever [215]. Their findings are consistent with previous studies of drug-primed reinstatement of polydrug use, where animals were trained on both cocaine and heroin, but a priming injection only reinstated responding on the lever that matched the drug prime [216]. These findings do not rule out a role for general PIT because they involve much more complex outcome representations, but they do suggest that reinstatement may be goal-directed.
The procedural parallels between cue-induced reinstatement and PIT, combined with evidence that suggests a potential goal-directed component to reinstatement, suggest that PIT may contribute to cue-induced reinstatement. However, further studies that more precisely examine whether the specific S-Oᵢ or general affective S-Oᵥ associations drive cue-induced reinstatement are required. One approach is suggested by Clemens and colleagues, who combined outcome devaluation with extinction and drug-primed reinstatement [217]. In their study, rats received nicotine self-administration training followed by outcome devaluation by pairing nicotine with lithium injections. Nicotine-primed reinstatement was impaired in animals that had received 10, but not 47, days of self-administration training [217]. If this kind of design could be replicated for cue-induced reinstatement, it might provide evidence about whether cue-induced reinstatement is driven primarily by general PIT or whether it is a goal-directed behaviour.

Alternative Mechanisms in Cue-Induced Reinstatement
Alternative explanations for cue-induced reinstatement may arise from other associative and non-associative mechanisms. Recent work has shown that performance in a PIT paradigm does not differ significantly between rats characterised as having an addiction-like phenotype and those without, where the phenotype is based on motivation under a progressive ratio schedule, persistent responding during intermittent periods of drug unavailability, and punishment-resistant responding [218]. Although performance in the PIT paradigm did correlate with performance during cocaine self-administration, it is not clear whether this would translate to cue-induced reinstatement [218]. These studies did not address whether PIT mediated reinstatement directly, but because PIT did not correlate with other addiction-like behaviours, they suggest that these behaviours may not be driven entirely by the associative mechanisms discussed above.
One possible alternative mechanism is habit learning. Habit learning is thought to involve cue-elicited drug-seeking without retrieval of drug outcome (Oᵢ or Oᵥ) memories [143]. While some have disputed the importance of habits in drug addiction [219], habit formation is commonly thought to support drug addiction [217,220,221]. However, if habitual responding does not rely on retrieval of the drug outcome, this raises the question of whether it reactivates R-O associations in the way that Pavlovian reinstatement approaches are thought to do. Likewise, if no drug outcome memories are required, Pavlovian non-contingent cue extinction would not be expected to reduce habitual responding.
Another mechanism that could play a role in cue-induced reinstatement is incubation of craving. As discussed above, incubation of craving refers to the time-dependent increase in drug-seeking after cessation. Evidence from both human [222] and animal [133] studies has shown an increase in the degree of cue-induced craving or reinstatement after longer periods of abstinence [223]. For instance, humans experiencing incubation of craving report craving the drug more when exposed to drug-related cues after 35 days than after 7 days [222]. This increase in craving initially appears to be a non-associative mechanism that runs contrary to the classical associative learning view that associative strength can decay over time [27]. However, there are also plausible associative accounts of incubation of craving, such as a loss of reactive inhibition [224], weakening of an opponent process that was inhibiting craving [153], or the Kamin effect, a U-shaped memory retention curve [225-229]. Further, incubation of craving appears to modulate drug memory retrieval because it can be inhibited with further extinction training. This is effective whether animals are given instrumental extinction [230] or Pavlovian non-contingent cue extinction [168], indicating that retrieval of both the R-O and S-O associations may be important in reinstatement. Therefore, cue-induced reinstatement may, at least in part, rely on alternative mechanisms that modulate retrieval of the previously extinguished R-O association and the unextinguished S-O association.

Reinstatement Nomenclature
As noted above, the terminology of reinstatement differs between the addiction neuroscience literature and the behavioural literature [28]. This is not unusual in historical terms, as the term reinstatement has been variously used to refer to resurgence [67] and, with respect to cues, in a study now considered an antecedent of contextual renewal [79]. Reinstatement has also been used since the earliest operant relapse models in the addiction neuroscience literature emerged in the 1970s and 1980s [119,120,123]. However, the differing usage of the term reinstatement between drug self-administration studies and the generally Pavlovian behavioural literature does need to be recognised [28]. Despite both being described as reinstatement, Pavlovian reinstatement and cue-induced reinstatement in drug self-administration studies are clearly driven by distinct associative mechanisms. Indeed, the term reinstatement in the addiction literature has become more of an umbrella term, encompassing relapse-like models driven by drug priming, stress, cues, and context change.
The literature contains other examples of such pragmatic resolutions of differences in nomenclature. For example, the orexin/hypocretin system was simultaneously discovered by two research groups via different approaches and given two names, orexin and hypocretin [231,232]. Both terms have neuroanatomic or behavioural merit and are widely used, resulting in a compromise on nomenclature: hypocretin is the official gene name, while pharmacologists use orexin to describe the ligands and receptors [233]. Corticotropin releasing factor or corticotropin releasing hormone (CRF/CRH) also has a disputed nomenclature based on considerations of molecular structure and hormonal or extrahormonal functions [234,235]. Like the orexin/hypocretins, one term (CRH) became the official nomenclature for geneticists while the other (CRF) is used by pharmacologists to describe the protein products [234]. In each case it is now a practical necessity to recognise both terms because of their widespread usage. A similar compromise seems to be emerging for reinstatement, as it is now acknowledged that the term's usage differs between the addiction and behavioural literature [28].

Conclusions
Several recovery-from-extinction approaches are currently used in addiction neuroscience to model relapse. These include spontaneous recovery, rapid reacquisition, resurgence, renewal, and reinstatement. In each case, there are multiple associative learning approaches that can elucidate or provide insight into how the operant response recovers after extinction, with context theory being one of the most influential. In most cases, the associative processes in Pavlovian designs and operant drug self-administration studies are similar, with the exception of cue-induced reinstatement, where recovery of responding is driven by an ambiguous process associated with the unextinguished drug-paired cue. Since the instrumental response is extinguished with respect to the drug outcome, the reinstatement effect is described by some as a conditioned reinforcement effect, even though the cue is also omitted during extinction. However, examination of the experimental design suggests it is more akin to reacquisition of conditioned reinforcement. The pattern of responding during cue-induced reinstatement also implies that animals are responding because of the cue, in addition to responding for the cue, suggesting a potential role for Pavlovian to Instrumental Transfer. There are also alternative mechanisms, such as incubation of craving, that may modulate the retrieval of operant associations, including the response-drug outcome (R-O) association. While the associative processes that contribute to cue-induced reinstatement remain ambiguous, this ambiguity suggests several additional hypotheses related to conditioned reinforcement and PIT in cue-induced reinstatement that invite empirical validation. Although reinstatement terminology and experimental procedures differ between associative learning and addiction neuroscience, it is clear that associative learning mechanisms are highly relevant and informative to understanding the processes mediating relapse-like behaviours.
As scientists turn to associative learning models to develop improvements in extinction-based therapies for addiction [3,9,10], a better understanding of the associative learning that underpins relapse is likely to be essential for improving future clinical outcomes.

Declarations Editorial Checks
• Plagiarism: Plagiarism detection software found no evidence of plagiarism.
• References: Zotero did not identify any references in the RetractionWatch database.

Peer Review
The review process for this paper was conducted double-blind because at least one of the authors is a member of the committee of management of the publisher, Episteme Health Inc. During review, neither the authors nor the reviewers were aware of each other's identities.
For the benefit of readers, reviewers are asked to write a public summary of their review to highlight the key strengths and weaknesses of the paper. Signing of reviews is optional.

Reviewer 1 (Sarah Baracz, UNSW Sydney, Australia.)
This review article provides a comprehensive overview of associative learning theories and demonstrates how such theories provide insight into patterns of behaviour evident when modelling aspects of addiction, particularly reinstatement. The authors provide a compelling argument that cue-induced reinstatement is likely explained by two processes: reacquisition of conditioned reinforcement and Pavlovian to Instrumental Transfer.

Reviewer 2 (Anonymous)
This review provides a comprehensive and scholarly analysis of the studies that use, in particular, the extinction-reinstatement model to study relapse in substance use disorder. It covers both historical and seminal findings as well as recent findings, and is both engaging and informative. It deals almost exclusively with operant self-administration, not taking into account models such as conditioned place preference (mentioned but not really reviewed). Another notable omission is more recent variants of the extinction-reinstatement model (e.g. voluntary abstinence following punishment/social interaction), which may perhaps be seen as beyond the scope of the review. Nevertheless, the conclusion drawn, that an understanding of associative processes is important to understanding substance use and relapse, is worthwhile and well-justified by the arguments presented.

Reviewer 3 -References Review (Anonymous)
I have checked the paper's references and have found that the information on each reference is correct and complete, papers have been cited appropriately and the reference list contains only papers in legitimate peer-reviewed sources with no applicable editorial notices.