Poker Bot 'Pluribus' Beats Top Humans In Six-Max No-Limit Hold'em

When computer scientists first started making headway towards creating bots capable of beating top human professional players at poker, there was at least some solace in the fact that the bots excelled only at limit hold’em. The complexities of bet sizing kept the more commonly spread variant, no-limit hold’em, safe for the time being.

Then in 2017, a bot named Libratus beat a selection of top human pros in heads-up no-limit hold’em, and the poker community had to resort to comforting themselves that the bots could only win at heads-up poker.

In June of 2019, however, the researchers behind Libratus revealed that their new bot, Pluribus, was able to win against top human professionals while playing six-max no-limit hold’em.

Noam Brown and Tuomas Sandholm began working on poker over a decade ago at Carnegie Mellon University, treating it as a stepping stone to other complex incomplete-information problems. Their most recent creation was made in conjunction with Facebook AI Research.

“No other popular recreational game captures the challenges of hidden information as effectively and as elegantly as poker. Although poker has been useful as a benchmark for new AI and game-theoretic techniques, the challenge of hidden information in strategic settings is not limited to recreational games. The equilibrium concepts of von Neumann and Nash have been applied to many real-world challenges such as auctions, cybersecurity, and pricing,” Brown and Sandholm wrote in their research article released in Science Magazine. “The past two decades have witnessed rapid progress in the ability of AI systems to play increasingly complex forms of poker. However, all prior breakthroughs have been limited to settings involving only two players. Developing a superhuman AI for multiplayer poker was the widely-recognized main remaining milestone.”

The Pluribus bot was evaluated against human poker professionals in two experiments. In one, five human players sat with one copy of Pluribus. The human players involved in this experiment were Jimmy Chou, Seth Davies, Michael Gagliano, Anthony Gregg, Dong Kim, Jason Les, Linus Loeliger, Daniel McAulay, 2012 WSOP main event winner Greg Merson, two-time bracelet winner Nick Petrangelo, Sean Ruane, Trevor Savage, and Jacob Toole.

Over the course of 12 days, a total of 10,000 hands were played, with five volunteers from that pool selected each day based on their availability. Players were assigned aliases, essentially screen names, so that they could track their opponents’ tendencies, but they were not told the identities of their opponents while playing. To incentivize the humans to play their best, $50,000 was divided among the human players based on their performance. Pluribus won an average of 48 milli-big-blinds per game or, in the more commonly used metric, 4.8 big blinds per 100 hands.
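For readers who want to check the conversion, here is a minimal sketch of the arithmetic. It assumes that a "game" in the researchers' metric means a single hand and that one milli-big-blind is a thousandth of a big blind. The function names are illustrative and do not come from the research article.

```python
def mbb_per_game_to_bb_per_100(mbb_per_game: float) -> float:
    """Convert milli-big-blinds per hand into big blinds per 100 hands.

    Assumption: one "game" is one hand, and 1 mbb = 1/1000 of a big blind.
    Multiplying by 100 hands and dividing by 1000 is the same as dividing by 10.
    """
    return mbb_per_game / 10


def total_big_blinds(mbb_per_game: float, hands: int) -> float:
    """Total big blinds won over a sample of hands at a given average rate."""
    return mbb_per_game * hands / 1000


if __name__ == "__main__":
    # Figures reported for the first experiment: 48 mbb/game over 10,000 hands.
    print(mbb_per_game_to_bb_per_100(48))
    print(total_big_blinds(48, 10_000))
```

Under those assumptions, 48 mbb/game works out to 4.8 big blinds per 100 hands, or roughly 480 big blinds over the 10,000-hand sample.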

“This is considered a very high win rate in six-player no-limit Texas hold’em poker, especially against a collection of elite professionals,” the article continued. “[It] implies that Pluribus is stronger than the human opponents.”

The second experiment saw six-time WSOP bracelet winner Chris Ferguson and four-time World Poker Tour main event winner Darren Elias each square off against a table of five copies of the Pluribus bot. Each was paid $2,000 for participating, with an additional $2,000 going to whichever of the two performed better against the bot. Neither knew which human he was trying to outperform, so they couldn’t adapt their play based on any prior experience with that player.

Again, 10,000 hands were played. Pluribus beat Elias for 40 mbb/game, and Ferguson for 25 mbb/game, which meant that Ferguson secured the $2,000 performance bonus.

Brown and Sandholm offered some thoughts on what Pluribus’ approach to playing six-max no-limit hold’em can tell human players about strategy.

“Pluribus confirms the conventional human wisdom that limping (calling the big blind rather than folding or raising) is suboptimal for any player except the small blind player who already has half the big blind in the pot by the rules, and thus has to invest only half as much as the other players to call,” they said. “While Pluribus initially experimented with limping when computing its blueprint strategy offline through self play, it gradually discarded this action from its strategy as self play continued. However, Pluribus disagrees with the folk wisdom that ‘donk betting’ (leading out when you were not the preflop aggressor) is a mistake; Pluribus does this far more often than professional humans do.”

With one of the final major milestones for poker bots now surpassed, researchers like Brown and Sandholm may start devoting more time to other applications of AI. In January of 2019 it was announced that Sandholm was working on applying what he had learned through poker to new incomplete-information problems, such as war simulations, military strategy analysis, and the commercial marketplace. But who knows? Perhaps he will be back and working on a bot that crushes full-ring Badugi in the near future.
