Please use this identifier to cite or link to this item: https://hdl.handle.net/1946/44332
One of the main appeals of AlphaZero-style game-playing agents, which combine deep learning and Monte Carlo Tree Search, is that they can be trained autonomously without external expert-level domain knowledge. However, training such agents is generally computationally expensive, with the most time-consuming step being the generation of training data via self-play. Here we propose an improved strategy for generating self-play training data, resulting in higher-quality samples, especially in earlier training phases. The new strategy initially emphasizes the latter game phases and gradually extends those phases to entire games as the training progresses. In our test domains, the games Connect4 and Breakthrough, we show that game-playing agents using the improved training approach learn significantly faster than counterpart agents using a standard approach. Furthermore, we empirically show that the proposed strategy is (in our test domains) superior to several recently proposed strategies for expediting self-play learning in game playing.
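The abstract only sketches the strategy at a high level. As a rough illustration (not the paper's actual implementation), the sketch below shows one way to "emphasize the latter game phases first and gradually extend to entire games": keep only the final fraction of each self-play game's positions as training samples, with that fraction growing as training progresses. The linear schedule and all function names here are hypothetical.

```python
# Illustrative sketch only; the paper's exact schedule and code are not given here.

def phase_fraction(iteration, total_iterations, start=0.25):
    """Fraction of each game (counted from the end) whose positions are kept.

    Hypothetical linear schedule: grows from `start` to 1.0 over training.
    """
    progress = min(iteration / max(total_iterations, 1), 1.0)
    return start + (1.0 - start) * progress


def select_training_positions(game_positions, iteration, total_iterations):
    """Return the suffix of a self-play game used as training data."""
    frac = phase_fraction(iteration, total_iterations)
    keep = max(1, int(round(frac * len(game_positions))))
    return game_positions[-keep:]  # latter game phases only, early in training


# Example: early in training only the last quarter of each game is used;
# by the end of training, whole games are used.
game = list(range(40))  # stand-in for 40 recorded positions of one self-play game
print(len(select_training_positions(game, iteration=0, total_iterations=100)))    # 10
print(len(select_training_positions(game, iteration=100, total_iterations=100)))  # 40
```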
Filename | Size | Access | Description | File type
---|---|---|---|---
Expediting_Self_Play_Learning_in_AlphaZero_Style_Game_Playing_Agents.pdf | 969.3 kB | Open | Full text | PDF