Conventional anti-jamming methods mostly rely on frequency hopping to hide or escape from jammers. These approaches use bandwidth inefficiently and can still leave a high probability of being jammed. Unlike existing work, this article proposes a novel anti-jamming strategy based on deceiving the jammer into attacking a victim channel while legitimate users maintain their communications on safe channels. Since the jammer's channel information is not known to the users, an optimal channel selection scheme and a sub-optimal power allocation algorithm are proposed using reinforcement learning (RL). The performance of the proposed anti-jamming technique is evaluated by deriving a statistical lower bound on the total received power (TRP). Analytical results show that, for a given access point, more than 50% of the highest achievable TRP, i.e., the TRP in the absence of jammers, is achieved for the case of a single user and three frequency channels. Moreover, this fraction increases with the number of users and available channels. The obtained results are compared with two existing RL-based anti-jamming techniques and with a random channel allocation strategy without any jamming attacks. Simulation results show that the proposed anti-jamming method outperforms both the compared RL-based anti-jamming methods and the random search method, and yields a near-optimal achievable TRP.
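To make the deception idea concrete, the following is a minimal toy sketch and not the algorithm of this article. It assumes a hypothetical jammer that attacks whichever channel it senses the most power on, a fixed equal power split between one victim (decoy) channel and one safe channel, and a stateless epsilon-greedy value update in place of the proposed RL scheme. The names `jammer_target`, `reward`, `learn`, `user_gain`, and `jam_gain` are illustrative assumptions. The learner never reads `jam_gain` directly; it only observes the received power that results from each choice.

```python
import random


def jammer_target(power, jam_gain):
    """Assumed toy jammer: attacks the channel where it senses the most power."""
    sensed = [p * h for p, h in zip(power, jam_gain)]
    return max(range(len(sensed)), key=sensed.__getitem__)


def reward(action, user_gain, jam_gain, decoy_share=0.5):
    """Received power on the safe channel; zero if the jammer hits that channel."""
    victim, safe = action
    power = [0.0] * len(user_gain)
    power[victim] = decoy_share        # power spent to lure the jammer
    power[safe] = 1.0 - decoy_share    # power spent on actual data
    jammed = jammer_target(power, jam_gain)
    return 0.0 if jammed == safe else user_gain[safe] * power[safe]


def learn(user_gain, jam_gain, episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Stateless epsilon-greedy value learning over (victim, safe) channel pairs."""
    rng = random.Random(seed)
    n = len(user_gain)
    actions = [(v, s) for v in range(n) for s in range(n) if v != s]
    q = {a: 0.0 for a in actions}
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.choice(actions)          # explore
        else:
            a = max(actions, key=q.get)      # exploit current estimate
        r = reward(a, user_gain, jam_gain)   # jam_gain is hidden from the learner
        q[a] += alpha * (r - q[a])
    return max(actions, key=q.get), q


if __name__ == "__main__":
    best, q = learn(user_gain=[0.9, 0.6, 0.3], jam_gain=[0.2, 0.5, 0.8])
    print("victim channel:", best[0], "safe channel:", best[1])
```

In this toy setting with a single user and three channels, the learner should settle on channel 0 as the safe channel, since it has the highest user gain and the assumed jammer senses either of the other two channels more strongly. Power allocation is held fixed here for brevity, whereas the article treats it with a separate sub-optimal algorithm.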