This article was produced by NetEase Smart Studio (public account: smartman163). Focusing on AI, reading the next big era!
[NetEase Technology News, October 24] In 2016 Lee Sedol, one of the world's best Go players, lost a five-game match to AlphaGo 4-1 in Seoul. It was a milestone both in the history of Go and in the history of artificial intelligence (AI). Go occupies a place in Chinese, Korean and Japanese culture much like that of chess in the West.
After defeating Lee Sedol, AlphaGo went on to beat dozens of renowned human players in a series of anonymous online games, then resurfaced in May to take on Ke Jie, China's top Go player. Mr Ke fared no better than Mr Lee: he eventually lost to the computer 3-0.
Among artificial-intelligence researchers, too, Go commands respect. Chess fell to the machines in 1997, when Garry Kasparov lost a match to an IBM computer called Deep Blue. Until Lee Sedol's defeat, however, Go's complexity had kept it beyond machines' reach. AlphaGo's victory was therefore compelling: it was a striking demonstration of an approach to AI called "machine learning", whose goal is to have computers teach themselves complex tasks.
AlphaGo learned the game by studying thousands of contests between expert human players, extracting rules and tactics from those games, and then kept improving over millions of further games. That was enough to make it stronger than any human being. But researchers at DeepMind, the company behind AlphaGo, believed they could improve on the technique. In a paper just published in Nature, they unveiled the latest version, AlphaGo Zero. It plays better, learns faster and needs less computing hardware to do well. Most important of all, unlike the original, AlphaGo Zero mastered the game without any help from human experts.
That immediately attracted a great deal of attention. Like many games, Go is easy to learn but hard to play well. Two players, one with black stones and one with white, take turns placing pieces on a board of 19 vertical and 19 horizontal lines. The goal is to control more territory than the opponent; stones that are surrounded by the opponent's are removed from the board. Play continues until neither side wishes to go on. Each player then adds the number of empty intersections his stones surround to the number of stones he has on the board, and the larger total wins.
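To make those mechanics concrete, here is a minimal sketch (my own illustration, not anything from DeepMind) of a board representation and the flood fill used to find a group's "liberties", the empty points adjacent to it; a group with no liberties has been surrounded and is removed:

```python
# A minimal sketch of the board mechanics described above. Groups of
# stones are found by flood fill; a group whose set of adjacent empty
# points ("liberties") is empty is captured and taken off the board.
EMPTY, BLACK, WHITE = 0, 1, 2
SIZE = 19

def neighbours(r, c):
    """Orthogonally adjacent points that lie on the board."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < SIZE and 0 <= nc < SIZE:
            yield nr, nc

def group_and_liberties(board, r, c):
    """Return the connected group containing (r, c) and its liberties."""
    colour = board[r][c]
    group, liberties, stack = {(r, c)}, set(), [(r, c)]
    while stack:
        cr, cc = stack.pop()
        for nr, nc in neighbours(cr, cc):
            if board[nr][nc] == EMPTY:
                liberties.add((nr, nc))
            elif board[nr][nc] == colour and (nr, nc) not in group:
                group.add((nr, nc))
                stack.append((nr, nc))
    return group, liberties

# A white stone in the corner, hemmed in by two black stones:
board = [[EMPTY] * SIZE for _ in range(SIZE)]
board[0][0], board[0][1], board[1][0] = WHITE, BLACK, BLACK
group, libs = group_and_liberties(board, 0, 0)
if not libs:                      # no liberties left...
    for gr, gc in group:          # ...so the group is captured
        board[gr][gc] = EMPTY
```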
The difficulty comes from the sheer number of possible moves. There are 361 points on the 19x19 board on which Black, who moves first, can place a stone; White then has 360 possible replies, and so on. The total number of possible board configurations is about 10^170, a number too large to have any physical analogy (the observable universe contains only about 10^80 atoms).
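A back-of-the-envelope check of those figures: each of the 361 intersections can be empty, black or white, so 3^361 bounds the number of configurations from above, and only about 1% of those are legal positions, which is where the ~10^170 figure comes from. A few lines of Python make the scale vivid:

```python
# Each intersection is empty, black or white, so 3**361 is an upper
# bound on board configurations; roughly 1% of those are legal.
from math import log10

upper_bound = 3 ** 361
print(f"3^361 is about 10^{log10(upper_bound):.0f}")          # ~10^172
print(f"positions per atom in the universe: ~10^{170 - 80}")  # ~10^90
```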
Human experts, meanwhile, have come to understand the game at a higher level. Go's rules are simple, but they give rise to enormous variety. Players talk about features of the board such as "eyes" and "ladders", and concepts such as "threats" and "life and death". But although human players understand such ideas, codifying that understanding into an explicit computer program is much harder. Instead, the original AlphaGo studied thousands of examples of human games, a process known as "supervised learning". Since human play reflects human understanding of such concepts, a computer exposed to enough games can come to grasp them too. Once AlphaGo had absorbed tactics and strategy with the help of its human teachers, it moved on to playing a million unsupervised training games against itself, each of which improved its skills.
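That supervised step can be sketched in a few lines. The toy below is an assumption for illustration, written with PyTorch; DeepMind's real policy network was a deep convolutional network over many 19x19 feature planes. The objective, though, is the one described: predict the move a human expert chose in each recorded position.

```python
# A toy stand-in for AlphaGo's supervised-learning step, assuming
# PyTorch: train a small network to match the expert's chosen move.
import torch
import torch.nn as nn

BOARD_POINTS = 19 * 19  # 361 possible moves

policy = nn.Sequential(               # toy policy network
    nn.Linear(BOARD_POINTS, 256),
    nn.ReLU(),
    nn.Linear(256, BOARD_POINTS),     # one logit per board point
)
optimiser = torch.optim.SGD(policy.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()       # penalise disagreeing with the expert

def supervised_step(positions, expert_moves):
    """positions: (N, 361) features; expert_moves: (N,) move indices
    taken from records of human games."""
    optimiser.zero_grad()
    loss = loss_fn(policy(positions), expert_moves)
    loss.backward()
    optimiser.step()
    return loss.item()

# Random tensors standing in for positions from human game records.
batch = torch.randn(32, BOARD_POINTS)
moves = torch.randint(0, BOARD_POINTS, (32,))
print(supervised_step(batch, moves))
```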
Supervised learning is useful for much more than Go. It is the basic idea behind many recent advances in artificial intelligence, helping computers learn to recognise faces in photos, reliably transcribe human speech and filter spam out of e-mail. But, as Demis Hassabis, DeepMind's boss, points out, supervised learning has limits. It depends on the availability of training data, the examples fed to the computer to show the machine what it should do, and those data must be curated by human experts. Training data for face recognition, for instance, consists of thousands of pictures, some with faces and some without, each labelled by hand. That makes such data expensive, assuming they are available at all. And, as the paper points out, there is a subtler problem: relying on the guidance of human experts may impose human limitations on a computer's capabilities.
The "AlphaGo Zero" was originally designed to avoid all of these problems and to completely skip the "train wheel" stage. The development of this project makes use of the rules of the game and the "reward function", that is, when it wins the game, it rewards a little, and if it loses, it deducts a little. And then continue to experiment, repeatedly through the game against other versions of yourself, and subject to the reward mechanism, that is, must win as many rewards as possible, so as to maximize rewards.
The program began by placing stones at random, with no idea of what it was doing. But it improved rapidly. After a day, its play had risen to the level of a strong professional. After two, it was outperforming the version that defeated Lee Sedol in 2016.
DeepMind's researchers could watch it improve itself, rediscovering in days knowledge of Go that humans have accumulated over thousands of years. Some of what it did looked strikingly human: after about three hours it became fixated on capturing the opponent's stones, a phase most human beginners also pass through. Other behaviour looked alien. "Ladders", for example, are arrangements in which a player trying to capture a group of opposing stones chases them diagonally across the board. They are a common feature of games. Because a ladder consists of a simple repeating pattern, human novices quickly learn to extrapolate it and judge whether it will succeed before playing it out. But AlphaGo Zero, which does not extrapolate and instead tries new moves semi-randomly, took longer than expected to master them.
Learning by itself rather than leaning on human hints nonetheless proved, on the whole, a big step forward. Take "joseki", standard sequences of moves that occur near the edge of the board. (Their scripted nature makes them somewhat like openings in chess.) AlphaGo Zero discovered the joseki that are taught to human players. But it also found sequences entirely its own, which it eventually came to prefer. David Silver, who leads the AlphaGo project, says the machine seems to have a distinctly non-human style of play.
The result is a program that is not merely superhuman but overwhelmingly so. Go (like chess and many other games) can be quantified with an "Elo rating", which gives the probability that one player will beat another based on past performance. Two equally rated players each have a 50:50 chance of winning; a player rated 200 points below an opponent has only about a 25% chance. Mr Ke's rating is 3,661; Mr Lee's is 3,526. After 40 days of training, AlphaGo Zero's rating exceeded 5,000, a number so far ahead of Ke Jie's that it implies no human player, Mr Ke included, could beat it. When it played the version of AlphaGo that first defeated Lee Sedol, it won 100 games to 0.
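The Elo arithmetic behind those numbers can be worked through with the standard formula (an assumption here; the article does not spell it out):

```python
# Standard Elo expected-score formula: 1 / (1 + 10^((Rb - Ra) / 400)).
def elo_expected(rating_a, rating_b):
    """Probability-like expected score of player A against player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

print(f"equal ratings:          {elo_expected(3526, 3526):.2f}")   # 0.50
print(f"200 points weaker:      {elo_expected(3461, 3661):.2f}")   # ~0.24
print(f"Ke Jie vs AlphaGo Zero: {elo_expected(3661, 5000):.4f}")   # ~0.0004
```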
There is, of course, more to life than Go. AlphaGo's creators hope that the algorithms powering its various incarnations can be applied to similar tasks. (DeepMind has already used the technology behind AlphaGo to help Google sharply cut energy consumption in its data centres.) More to the point, an algorithm that learns without human guidance can be set loose on problems that people do not themselves know how to solve. Mr Hassabis says that anything that boils down to an intelligent search through a vast space of possibilities could benefit from AlphaGo's approach. He cites classically thorny problems such as working out how proteins fold into their final, functional shapes, predicting which molecules might make good drugs, and accurately simulating chemical reactions.
Advances in artificial intelligence often stir fears of humans being made obsolete. DeepMind hopes that machines of this sort will end up as assistants to biological brains rather than replacements for them, much as everything from paper to search engines already is. After all, a machine that invents new ways of solving problems can point people down new, productive roads. Mr Silver says one benefit of AlphaGo is that, in a game steeped in history and tradition, it has encouraged human players to question ancient wisdom and experiment. After losing to AlphaGo, Ke Jie studied the computer's games for inspiration. He then went on a 22-game winning streak against human opponents, an impressive feat even for a player of his calibre. It seems supervised learning works in both directions.
(Source: The Economist. Translation: NetEase translation robot. Reviewer: Qin Hao)