The Utility of Voluntary and Involuntary Behavior in Detecting Deception’

ultiplied by its probability. With each action’s value known, the algorithm can calculate the regret of each action. To do so, it first multiplies each action’s value by that action’s probability in every strategy the agent has previously used, which yields a value for each of those past strategies. The regret of an action is then built from the difference between the action’s value and each past strategy’s value, weighted by the probability of reaching the current state of the game under that past strategy; averaging these weighted differences over all past strategies gives the regret for that action. A new strategy is then produced from the regret of each action relative to the total regret of all actions. Once the new strategy is defined, the agent chooses to play the action that produces the least regret.

This process is used only when a decision involves an action, challenge, or counteraction. When the decision involves losing a card or changing cards, the algorithm takes a different approach in which the calculations consider only the value of the outcomes of losing or changing particular cards. After calculating the outcome value of each possible choice, the algorithm selects the one that provides the most value for the agent and is most likely to lead to a favourable end-of-game state.

The environment chosen to test their agent was the same as ours: the game Coup, a domain in which the agent and its opponents can, and probably will, bluff many times to outsmart one another.
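
The regret calculation described above can be illustrated with a short sketch. It is not the original authors' implementation: the function names, the dictionary representation of a strategy, and the example action labels are all hypothetical. It also makes two assumptions the text does not state explicitly: a past strategy's value is taken to be the sum of its probability-weighted action values, and negative regrets are clipped to zero before normalising, as in standard regret matching.

```python
from typing import Dict, List, Tuple

Strategy = Dict[str, float]  # action name -> probability of playing it


def strategy_value(strategy: Strategy, action_values: Dict[str, float]) -> float:
    """Value of a past strategy: each action's value weighted by its
    probability in that strategy (summation is an assumption here)."""
    return sum(strategy.get(a, 0.0) * v for a, v in action_values.items())


def action_regrets(
    action_values: Dict[str, float],
    past: List[Tuple[Strategy, float]],  # (past strategy, reach probability)
) -> Dict[str, float]:
    """Average, over past strategies, of the reach-weighted difference
    between an action's value and the past strategy's value."""
    if not past:
        return {a: 0.0 for a in action_values}
    regrets = {}
    for action, value in action_values.items():
        total = sum(
            reach * (value - strategy_value(strategy, action_values))
            for strategy, reach in past
        )
        regrets[action] = total / len(past)
    return regrets


def next_strategy(regrets: Dict[str, float]) -> Strategy:
    """New strategy proportional to each action's share of total regret.
    Clipping negatives to zero is an assumption (regret matching)."""
    positive = {a: max(r, 0.0) for a, r in regrets.items()}
    total = sum(positive.values())
    if total <= 0.0:
        # No action has positive regret: fall back to a uniform strategy.
        return {a: 1.0 / len(regrets) for a in regrets}
    return {a: r / total for a, r in positive.items()}
```

Calling `action_regrets` with the current action values and a list of `(strategy, reach_probability)` pairs, then passing the result to `next_strategy`, reproduces the update step; the final choice of which action to play is left out, since it depends on the selection rule the agent applies to the new strategy.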