Machine Learning in Action: Getting started with a small game like Tic Tac Toe

Nov 23, 2019 by Spaggiari in Articles

I had an idea to try building a simple game that can learn, or improve itself, from the experience it has gathered. Several techniques are in common use for this, and the two I found most interesting are Q-Learning and Minimax. Both can be described in terms of State–Action–Reward–State: start from the current State, choose an Action, and evaluate the result as a Reward from the next State. That framing suits a game system very well.

The game I chose for the experiment is a simple one, Tic Tac Toe. It has relatively few possible positions, the rules are uncomplicated, and it is a contest between two players, so the system can easily learn, or imitate, the opponent's way of playing using nothing more than the States it has recorded earlier.

The drawback of relying on recorded States like this is that the machine's play tends to be limited to what it has already experienced. It may pick the best move, but only the best among the moves it knows. If its experience is full of poor or ineffective play, its own behaviour will be poor as well.

Starting the design

We divide the system into three main parts:

The game system: displaying the board, managing the players' Turns, reading Input from the keyboard, and deciding who plays first.
The game judge: determines when the game has ended and decides its result.
The Machine: the part that can learn and decide how to play against a human player.

Game system

We start by defining how the game is played and used.
The game runs mainly in a console terminal, to avoid spending effort on graphics, which is not the main goal of this project. The game environment therefore looks roughly like this:

The playing area is a 3×3 grid, nine cells in total: the values 1 to 9 identify the position of each cell.
Each cell has three possible states, X, O, and empty: X is stored as 1, O as -1, and empty as 0, which makes the arithmetic easier.
X always plays first, which makes it easier to build States for the bot.
The game judge is evaluated at the end of every Turn.
The current board is displayed at the end of every Turn.
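The encoding in the list above can be sketched as follows. This is only an illustration under the stated assumptions: the names `EMPTY`, `X`, `O`, `new_board`, and `position_to_index` are hypothetical and do not come from the original code.

```python
# Assumed encoding, following the list above: X = 1, O = -1, empty = 0.
EMPTY, X, O = 0, 1, -1


def new_board():
    """Return an empty 3x3 board stored as a flat list of nine cells."""
    return [EMPTY] * 9


def position_to_index(position):
    """Convert a player-facing position (1-9) into a list index (0-8)."""
    if not 1 <= position <= 9:
        raise ValueError("position must be between 1 and 9")
    return position - 1


board = new_board()
board[position_to_index(5)] = X   # X takes the centre cell
board[position_to_index(1)] = O   # O takes the top-left cell
print(board)  # [-1, 0, 0, 0, 1, 0, 0, 0, 0]
```

With this encoding, a line filled by one player sums to 3 or -3, which is what the judge relies on.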

Judging the game

Take the absolute value of the sum of the cells along each row, column, and diagonal; if it equals 3, there is a winner.
If there is no winner and no empty cell remains, the game is a draw.
In any other case the game is not over yet.

The result is computed from the current board.
It reports who won: 1 when X wins, -1 when O wins, 0 for a draw, and None when the game is not over, so that the game system keeps running.

Machine system

The Machine uses the Minimax method to evaluate the situation and decide how it should respond.

Record the board in each state.
Build the relationships between states.
Compute a value for each relationship between states.
Choose an action from the values relating the current state to the possible next states.
When no suitable state is known, play a random move.
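For orientation, here is a minimal sketch of textbook Minimax on the board encoding used in this article. It is not the Machine described above: it searches the full game tree instead of using recorded states, and the names `WIN_LINES`, `winner`, and `minimax` are assumptions made for this example.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 1 or -1 for a win, 0 for a draw, None if the game is still running."""
    for a, b, c in WIN_LINES:
        if abs(board[a] + board[b] + board[c]) == 3:
            return board[a]
    return 0 if 0 not in board else None


def minimax(board, player):
    """Return (score, move) for the side to move; the score is from X's point of view.

    player is 1 for X (maximising) or -1 for O (minimising);
    move is a 0-based index, or None when the game is already over.
    """
    result = winner(board)
    if result is not None:
        return result, None
    best_score, best_move = None, None
    for idx in range(9):
        if board[idx] != 0:
            continue
        board[idx] = player                  # try the move
        score, _ = minimax(board, -player)   # let the opponent answer
        board[idx] = 0                       # undo the move
        better = (best_score is None
                  or (player == 1 and score > best_score)
                  or (player == -1 and score < best_score))
        if better:
            best_score, best_move = score, idx
    return best_score, best_move


# X to move with two in a row on top: the winning move is index 2.
print(minimax([1, 1, 0, -1, -1, 0, 0, 0, 0], 1))  # (1, 2)
```

Searching from an empty board visits the whole game tree, which can take a noticeable moment in plain Python, so the example starts from a mid-game position.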

The game judge

This is a simple function that takes the current board as input.
It first checks whether any empty cell is left on the board.

    for i in range(9):
        if game_state[i] == 0:
            draw_flag = 1

Next, it checks every row, column, and diagonal for three marks of the same player in a line, which makes the absolute value of the sum equal to 3. If such a line exists, it returns the value of the winner.

    if (abs(game_state[0] + game_state[1] + game_state[2]) == 3):
        return game_state[0], "Done"
    if (abs(game_state[3] + game_state[4] + game_state[5]) == 3):
        return game_state[3], "Done"
    if (abs(game_state[6] + game_state[7] + game_state[8]) == 3):
        return game_state[6], "Done"

    if (abs(game_state[0] + game_state[3] + game_state[6]) == 3):
        return game_state[0], "Done"
    if (abs(game_state[1] + game_state[4] + game_state[7]) == 3):
        return game_state[1], "Done"
    if (abs(game_state[2] + game_state[5] + game_state[8]) == 3):
        return game_state[2], "Done"
    
    if (abs(game_state[0] + game_state[4] + game_state[8]) == 3):
        return game_state[4], "Done"
    if (abs(game_state[2] + game_state[4] + game_state[6]) == 3):
        return game_state[4], "Done"

If there is no winner but no empty cell is left on the board, the game ends in a draw and the value returned is 0.

    if draw_flag == 0:
        return 0, "Done"

If none of the conditions above holds, the game is not over yet and the function returns None.

    return None, "Not Done"

This is the function we will use to check and judge our Tic-Tac-Toe game.

def check_current_state(game_state):
    # draw_flag stays 0 only when no empty cell is left on the board
    draw_flag = 0
    for i in range(9):
        if game_state[i] == 0:
            draw_flag = 1

    # rows
    if (abs(game_state[0] + game_state[1] + game_state[2]) == 3):
        return game_state[0], "Done"
    if (abs(game_state[3] + game_state[4] + game_state[5]) == 3):
        return game_state[3], "Done"
    if (abs(game_state[6] + game_state[7] + game_state[8]) == 3):
        return game_state[6], "Done"

    # columns
    if (abs(game_state[0] + game_state[3] + game_state[6]) == 3):
        return game_state[0], "Done"
    if (abs(game_state[1] + game_state[4] + game_state[7]) == 3):
        return game_state[1], "Done"
    if (abs(game_state[2] + game_state[5] + game_state[8]) == 3):
        return game_state[2], "Done"

    # diagonals
    if (abs(game_state[0] + game_state[4] + game_state[8]) == 3):
        return game_state[4], "Done"
    if (abs(game_state[2] + game_state[4] + game_state[6]) == 3):
        return game_state[4], "Done"

    # no winner and no empty cell: draw (compare with ==, not "is")
    if draw_flag == 0:
        return 0, "Done"

    return None, "Not Done"
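The same checks can also be written as a loop over a table of winning lines. The sketch below is an alternative phrasing, not the article's code; `WIN_LINES` and `check_state_compact` are hypothetical names, and the return values follow the convention described above.

```python
WIN_LINES = (
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
)


def check_state_compact(game_state):
    """Return (winner, status) using the same convention as check_current_state."""
    for a, b, c in WIN_LINES:
        if abs(game_state[a] + game_state[b] + game_state[c]) == 3:
            return game_state[a], "Done"
    if 0 not in game_state:            # board is full and nobody won
        return 0, "Done"
    return None, "Not Done"


print(check_state_compact([1, 1, 1, -1, -1, 0, 0, 0, 0]))     # (1, 'Done')
print(check_state_compact([1, -1, 1, 1, -1, -1, -1, 1, 1]))   # (0, 'Done')
print(check_state_compact([0] * 9))                           # (None, 'Not Done')
```

Keeping the lines in one table makes it harder to mistype an index in one of the eight checks.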

Game system

Board display function

Because we want the system to compute results easily, and to record boards in a form the Machine can learn from, the board is stored as a List of nine cells,
with numbers representing the state of each cell.
Showing that raw list to a player would be awkward, though, so we build a separate display that is easier to use: a 3×3 grid showing X, O, or a blank in each cell.

For convenience this is wrapped in a simple function that displays the board.
It replaces each number in the list with the character X, O, or a space at the right position, then prints the result.

A List variable holds the values to display

    show_state =    [   ' ',' ',' ',
                        ' ',' ',' ',
                        ' ',' ',' ' ]

Loop over the internal board and convert each value into its display character

    for pos in range(len(game_state)):
        show_state[pos] = dict_state[game_state[pos]]

Printing to the Terminal

    print('----------------')
    print('| {} || {} || {} |'.format(show_state[0],show_state[1],show_state[2]))
    print('----------------')
    print('| {} || {} || {} |'.format(show_state[3],show_state[4],show_state[5]))
    print('----------------')
    print('| {} || {} || {} |'.format(show_state[6],show_state[7],show_state[8]))
    print('----------------')
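Put together, the three fragments above might look like the sketch below. The definition of `dict_state` is not shown in this part of the article, so the mapping here is an assumption, and `render_board` is a hypothetical helper that returns the text instead of printing it directly.

```python
# Assumed mapping from stored values to display characters.
dict_state = {1: 'X', -1: 'O', 0: ' '}


def render_board(game_state):
    """Return the board as a multi-line string in the layout printed above."""
    show_state = [dict_state[cell] for cell in game_state]
    separator = '----------------'
    lines = [separator]
    for row in range(3):
        a, b, c = show_state[row * 3: row * 3 + 3]
        lines.append('| {} || {} || {} |'.format(a, b, c))
        lines.append(separator)
    return '\n'.join(lines)


print(render_board([1, 0, -1, 0, 1, 0, 0, 0, -1]))
```

Returning a string keeps the formatting easy to check separately from the terminal output.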

Keyboard input function

This function reads a position from 1 to 9 from the keyboard and checks that the value is valid and playable before returning the chosen position to the game system, so that the game can continue.

The part that finds the empty cells that can still be played

    for pos in range(len(state)):
        if state[pos] == 0:
            aval_block.append(pos+1)
    print("get aval_block : {}".format(aval_block))

The part that loops, reading input and checking whether it is usable

    while cont:
        blockInput = input("Human !!!! , your turn! Choose where to place {} to {}: ".format(dict_state[playas],aval_block))
        if blockInput.isdigit(): 
            block_choice = int(blockInput)
            if block_choice not in aval_block:
                print("Please insert only {}".format(aval_block))
            else:
                play_block = block_choice
                cont = False
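The two fragments above could be combined into one function along these lines. This is a sketch, not the article's `Human_turn`: the name `human_turn`, the `read` parameter (added so the prompt can be replaced in tests), and the `dict_state` mapping are assumptions. Unlike the original loop, it also prints the hint when the input is not a number.

```python
# Assumed mapping from stored values to display characters.
dict_state = {1: 'X', -1: 'O', 0: ' '}


def human_turn(state, playas, read=input):
    """Ask until the player names an empty cell (1-9) and return that position."""
    aval_block = [pos + 1 for pos, cell in enumerate(state) if cell == 0]
    prompt = "Your turn! Choose where to place {} from {}: ".format(
        dict_state[playas], aval_block)
    while True:
        answer = read(prompt).strip()
        if answer.isdigit() and int(answer) in aval_block:
            return int(answer)
        print("Please insert only {}".format(aval_block))
```

In the game it would be called as `human_turn(game_state, current_player)`; passing a different `read` function lets the loop be exercised without a keyboard.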

Terminal clearing function

This clears the text currently shown in the Terminal so that new text can take its place. The command for clearing the screen differs between Windows and Linux.

from os import system, name  # scnclear relies on both names

# define our clear function
def scnclear():
    # for windows
    if name == 'nt':
        _ = system('cls')
    # for mac and linux (here, os.name is 'posix')
    else:
        _ = system('clear')

Board update function

The game has to accept moves both from the keyboard and from the Machine,
so this function lets both sources place their move on the same board.

def play_move(state, player, block_num):
    if state[int(block_num-1)] == 0:
        state[int(block_num-1)] = dict_player[player]
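A small usage sketch follows. The `dict_player` mapping is not shown in this part of the article, so the one below is an assumption, and the boolean return value is an addition for illustration only (the original function returns nothing).

```python
# Assumed mapping from a player's mark to the value stored on the board.
dict_player = {'X': 1, 'O': -1}


def play_move(state, player, block_num):
    """Place the player's mark on cell block_num (1-9); report whether it was placed."""
    index = int(block_num) - 1
    if state[index] != 0:
        return False                 # cell already taken, board unchanged
    state[index] = dict_player[player]
    return True


board = [0] * 9
print(play_move(board, 'X', 5))  # True
print(play_move(board, 'O', 5))  # False
print(board)                     # [0, 0, 0, 0, 1, 0, 0, 0, 0]
```

Reporting whether the move was accepted lets the caller notice a rejected move instead of silently losing a Turn.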

Main game system

The game starts by choosing who the Machine plays against. There are three options:

You VS Machine
Random VS Machine
Machine VS Machine

The code for choosing the game Mode keeps asking until the answer matches one of the available options.

    while play_mode not in pMode:
        print('''Choose Play Mode :
                Machine has {} states
                [1] : You VS Machine
                [2] Random VS Machine
                [3] Machine VS Machine'''.format(len(mk.state_action)))
        play_mode = input("Please Choose [1-3]")
        print("PlayMode : {}".format(play_mode))

Once the Mode is chosen, the game asks whether we want to play first.
To play first, answer Y and press Enter.

This code reads and checks the answer. The question is asked only in Mode 1.

        print("New Game! {}".format(playCount))      
        print_board(game_state)
        if play_mode == '1':
            player_choice = input("X is play first , Do you wanna play first [Y/N] : ") 
        winner = None
        
        if player_choice == 'Y' or player_choice == 'y':
            # player_choice = 'X'
            mk.playFirst = False
        else:
            # player_choice = 'O'
            mk.playFirst = True

This code manages the order of play in each Turn.
The variable current_player determines whether the current Turn belongs to X or O,
and each Turn is handled according to the Mode chosen earlier.
The board is updated through the play_move function defined above.

        while current_state == "Not Done":
            print("current game_state : {}".format(game_state))
            if current_player == 1: # play X
                if play_mode == '1':
                    if mk.playFirst: 
                        block_choice = AI_turn(state = game_state, playas = current_player)
                    else:
                        block_choice = Human_turn(state = game_state, playas = current_player)
                elif play_mode == '3':
                    block_choice = AI_turn(state = game_state, playas = current_player)
                else:
                    if mk.playFirst: 
                        block_choice = AI_turn(state = game_state, playas = current_player)
                    else:
                        block_choice = random_turn(state = game_state, playas = current_player)
            else:   # play O
                if play_mode == '1':
                    if not mk.playFirst: 
                        block_choice = AI_turn(state = game_state, playas = current_player)
                    else:
                        block_choice = Human_turn(state = game_state, playas = current_player)
                elif play_mode == '3':
                    block_choice = AI_turn(state = game_state, playas = current_player)
                else:
                    if not mk.playFirst: 
                        block_choice = AI_turn(state = game_state, playas = current_player)
                    else:
                        block_choice = random_turn(state = game_state, playas = current_player)
            play_move(game_state ,dict_state[current_player], block_choice)
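The branching above can be summarised as "who moves for this side?". The sketch below is one possible way to express that decision, assuming the Machine plays X when `mk.playFirst` is true, as the branches suggest. `mover_for` is a hypothetical helper, and the body of `random_turn` is a guess at its behaviour, since the original function is not shown here.

```python
import random


def random_turn(state, playas):
    """Pick a random empty cell and return its 1-based position (assumed behaviour)."""
    return random.choice([pos + 1 for pos, cell in enumerate(state) if cell == 0])


def mover_for(play_mode, machine_plays_first, current_player):
    """Return 'machine', 'human' or 'random' for the side about to move."""
    if play_mode == '3':                       # Machine VS Machine
        return 'machine'
    machine_side = 1 if machine_plays_first else -1
    if current_player == machine_side:
        return 'machine'
    # The other side is the human in Mode 1 and the random player otherwise.
    return 'human' if play_mode == '1' else 'random'


print(mover_for('1', True, 1))    # machine
print(mover_for('1', True, -1))   # human
print(mover_for('2', False, 1))   # random
```

The main loop could then call the matching turn function once, rather than repeating the same branches for X and O.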
