[AI #12] Q-Learning / OpenAI gym / frozenlake

Projects / AI, 2017. 8. 6. 17:46


This post is about implementing artificial intelligence (Deep Reinforcement Learning).

The post is organized as follows.

================================================

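Summary

- A game in which you find your way across a frozen lake without falling into a hole.

- The ice is slippery. If you rely entirely on the guide's directions, wind and similar effects (an uncertain environment) can still make you slip. Relying on the guide only partly, and giving more weight to what you already know, raises the success rate (from about 1.5% to about 66%).

- The real world is also often unpredictable because of the surrounding environment. This approach can be applied in such cases.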

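1. Installing the programs needed for OpenAI gym games

2. # 01_play_frozenlake_det_windows

=> The agent moves in the direction of the arrow key you press.

=> Run the file from a terminal (so that it can receive key input).

=> Go to the folder containing frozenlake_det_windows.py and run it with the python command.

=> The game ends when you fall into a hole or reach the goal.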

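3. # 03_0_q_table_frozenlake_det

=> At first the agent moves at random; from the second pass on it moves toward the larger Q value. The algorithm finds a path that avoids the holes.

=> Note (random argmax): if the values are equal, pick any of them; if one is larger, go there.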

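4. # 03_2_q_table_frozenlake_det

==> If a random number is smaller than e, move at random; otherwise move toward the larger reward (exploit & exploration).

==> It is like moving to a new neighborhood: at first you try restaurants at random, and once you know the area you mostly go to the good ones.

==> Another option is to add noise, which still takes the existing data partly into account (the e approach above ignores the existing data whenever it explores).

. This amounts to choosing the second-best option.

==> discount: a reward received later is multiplied by a discount factor to lower its weight (these notes say 0.9; the code in this post sets dis = .99).

This is how the shortest path is found.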

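5. play_frozenlake_windows

==> Keyboard input is not recognized reliably; this needs fixing later.

==> On the icy surface the agent does not move the way the keyboard says. This stands for a situation in which you cannot rely entirely on what teacher Q says.

==> It models an environment similar to the real world.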

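6. #05_0_q_table_frozenlake

==> In the slippery environment ('FrozenLake-v0') you must not follow big brother Q's advice as-is, because the surroundings can make you slip.

==> On the ice, relying entirely on Q as before gives a success rate of 1.55%; relying on it only partly gives 66%.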

7. # 05_q_table_frozenlake

==> In the slippery environment ('FrozenLake-v0') you must not follow big brother Q's advice as-is, because the surroundings can make you slip.

So Q's advice should be reflected only partly.

==> Q[state, action] = (1-learning_rate) * Q[state, action] \

+ learning_rate*(reward + dis * np.max(Q[new_state, :]))

==> The success rate rises considerably (1.55% ==> 66%).

. On the ice, relying entirely on Q gives 1.55%; relying on it only partly gives 66%.

8. Next Step

==> Implement Q-Learning with a neural network

9. References

=================================================

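[ 1. Installing the programs needed for OpenAI gym games ]

- Installation guide: https://gym.openai.com/docs

- step 1

. Run the anaconda3 prompt

- step 2

. git clone https://github.com/openai/gym ==> download gym

. cd gym ==> move into the gym folder

. pip3 install -e . ==> minimal install; pip3 must be used. This step installs the downloaded gym onto the PC.

- step 3 ==> select the right python (the python install path differs from package to package, so you have to find the matching python and point the editor at it)

. Change the interpreter in the python editor (PyCharm settings)

. Before: the python in the tensorflow folder

. After: c:\\user\dhp\appdata3\python.exe

- step 4 ==> add packages ==> if the tensorflow package is needed, it can be installed from PyCharm

. Click the "+" button at the top right to get the list of installable packages, then select tensorflow and install it.

. As a result, the python at c:\\user\dhp\appdata3\python.exe has both gym and tensorflow installed.

. Install any other packages as they become necessary.

- How to add packages to the python inside the tensorflow folder still needs to be checked: gym is installed there but does not work properly.

- Checking that the installation succeeded: enter the code below in PyCharm and run it.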

# CartPole test
"""CartPole test

Written for the 2017-era gym API: reset() returns the observation
and step() returns four values.
"""
import gym

env = gym.make('CartPole-v0')
env.reset()

# Render 100 random steps
for _ in range(100):
    env.render()
    env.step(env.action_space.sample())  # take a random action

# Run 20 episodes with random actions
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
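Because step 3 depends on which interpreter the editor is actually using, it can help to ask Python itself. The following is a minimal sketch using only the standard library; the helper name `check_packages` is hypothetical and is not part of gym or PyCharm.

```python
import importlib.util
import sys


def check_packages(names):
    """Return {package name: True if it can be imported from this interpreter}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Path of the interpreter that is actually running this script
print("Interpreter:", sys.executable)
for name, found in check_packages(["gym", "tensorflow"]).items():
    print(name, "installed" if found else "MISSING")
```

If the printed interpreter path is not the one selected in step 3, the packages were probably installed into a different python.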

[ # 01_play_frozenlake_det_windows ]

# 01_play_frozenlake_det_windows
"""
Run this file from a terminal (so that it can receive key input).
Go to the folder containing frozenlake_det_windows.py and run it with the python command.
The game ends when you fall into a hole or reach the goal.
"""
import gym
from gym.envs.registration import register
from colorama import init
from kbhit import KBHit

init(autoreset=True)  # Reset the terminal mode to display ansi color

# Register a deterministic (non-slippery) 4x4 FrozenLake
register(
    id='FrozenLake-v3',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4', 'is_slippery': False}
)

env = gym.make('FrozenLake-v3')  # is_slippery False
env.reset()   # Start from the initial state
env.render()  # Show the initial board

key = KBHit()

while True:
    action = key.getarrow()
    if action not in [0, 1, 2, 3]:
        print("Game aborted!")
        break

    state, reward, done, info = env.step(action)
    env.render()
    print("State: ", state, "Action: ", action, "Reward: ", reward, "Info: ", info)

    if done:
        print("Finished with reward", reward)
        break
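Since arrow-key recognition is reported as unreliable later in this post, one possible workaround is to read letter keys instead. The sketch below is an assumption, not part of the course code: the names `KEY_TO_ACTION` and `key_to_action` are hypothetical, and the action numbers follow the "LEFT DOWN RIGHT UP" column order printed by the Q-table scripts.

```python
# Action numbering used by FrozenLake (LEFT DOWN RIGHT UP = 0 1 2 3)
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

# Hypothetical fallback key map: WASD letters instead of arrow keys
KEY_TO_ACTION = {"a": LEFT, "s": DOWN, "d": RIGHT, "w": UP}


def key_to_action(key):
    """Map a typed key to an action id, or None for any other key."""
    return KEY_TO_ACTION.get(key.strip().lower())


for typed in ("w", "A", "x"):
    print(repr(typed), "->", key_to_action(typed))
```

In the game loop, `key.getarrow()` could then be swapped for something like `key_to_action(input("move (w/a/s/d): "))`; any other key returns None, which the existing `action not in [0, 1, 2, 3]` check treats as an abort.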

[ # 03_0_q_table_frozenlake_det ]

# 03_0_q_table_frozenlake_det
"""
# random argmax: if the values are equal, pick any of them; if one is larger, go there
"""
import gym
import numpy as np
import matplotlib.pyplot as plt
from gym.envs.registration import register
import random as pr


def rargmax(vector):  # https://gist.github.com/stober/1943451
    """ Argmax that chooses randomly among eligible maximum indices. """
    m = np.amax(vector)
    indices = np.nonzero(vector == m)[0]
    return pr.choice(indices)


# Register a deterministic (non-slippery) 4x4 FrozenLake
register(
    id='FrozenLake-v3',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4', 'is_slippery': False}
)
env = gym.make('FrozenLake-v3')

# Initialize table with all zeros
Q = np.zeros([env.observation_space.n, env.action_space.n])  # 16 x 4 table

# Set learning parameters
num_episodes = 2000

# create lists to contain total rewards and steps per episode
rList = []
for i in range(num_episodes):
    # Reset environment and get first new observation
    state = env.reset()
    rAll = 0
    done = False

    # The Q-Table learning algorithm
    while not done:
        # random argmax: any action on a tie, otherwise the largest Q value
        action = rargmax(Q[state, :])

        # Get new state and reward from environment
        new_state, reward, done, _ = env.step(action)

        # Update Q-Table with new knowledge (no discount, no learning rate)
        Q[state, action] = reward + np.max(Q[new_state, :])

        rAll += reward
        state = new_state

    rList.append(rAll)

print("Success rate: " + str(sum(rList) / num_episodes))
print("Final Q-Table Values")
print("LEFT DOWN RIGHT UP")
print(Q)

plt.bar(range(len(rList)), rList, color="blue")
#plt.bar(range(len(rList)), rList, color='b', alpha=0.4)
plt.show()
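To see how the update `Q[state, action] = reward + max(Q[new_state])` spreads the goal reward backward, the following is a small sketch on a made-up five-cell corridor rather than FrozenLake itself. It uses only the standard library; the corridor, `step`, and `train` are illustrative assumptions, and `rargmax` is a plain-Python stand-in for the NumPy version above.

```python
import random

LEFT, RIGHT = 0, 1
N_STATES = 5            # states 0..4; state 4 is the goal
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic corridor: returns (new_state, reward, done)."""
    new_state = max(0, state - 1) if action == LEFT else state + 1
    done = new_state == GOAL
    return new_state, (1.0 if done else 0.0), done


def rargmax(values):
    """Index of the maximum, ties broken at random."""
    best = max(values)
    return random.choice([i for i, v in enumerate(values) if v == best])


def train(num_episodes=50, seed=0):
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(num_episodes):
        state, done = 0, False
        while not done:
            action = rargmax(Q[state])
            new_state, reward, done = step(state, action)
            # Same update as 03_0: no discount, no learning rate
            Q[state][action] = reward + max(Q[new_state])
            state = new_state
    return Q


Q = train()
for s, row in enumerate(Q):
    print(s, row)
```

Each episode pushes the value 1 one cell further back from the goal. Because every learned value ends up exactly 1, this table cannot tell a short path from a long one, which is the gap the discount in the next script addresses.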

[# 03_2_q_table_frozenlake_det]

# 03_2_q_table_frozenlake_det
"""
==> If a random number is smaller than e, move at random; otherwise move toward the larger reward (exploit & exploration)
==> Like moving to a new neighborhood: try restaurants at random at first, then mostly go to the good ones once you know the area
==> Another option is to add noise, which still takes the existing data partly into account (the e approach ignores the existing data whenever it explores)
    . This amounts to choosing the second-best option
==> discount: a reward received later is multiplied by the discount factor (dis = .99 below) to lower its weight
    This is how the shortest path is found
"""
import gym
import numpy as np
import matplotlib.pyplot as plt
from gym.envs.registration import register

# Register a deterministic (non-slippery) 4x4 FrozenLake
register(
    id='FrozenLake-v3',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4', 'is_slippery': False}
)
env = gym.make('FrozenLake-v3')

# Initialize table with all zeros
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set learning parameters
dis = .99
num_episodes = 2000

# create lists to contain total rewards and steps per episode
rList = []
for i in range(num_episodes):
    # Reset environment and get first new observation
    state = env.reset()
    rAll = 0
    done = False

    e = 1. / ((i // 100) + 1)  # Python2 & 3
    # e gets smaller in later episodes

    # The Q-Table learning algorithm
    while not done:
        # Choose an action by e-greedy
        if np.random.rand(1) < e:
            # Below e: move at random
            # (too many random moves can lower the success rate)
            action = env.action_space.sample()
        else:
            # Otherwise: move toward the larger reward
            action = np.argmax(Q[state, :])

        # Get new state and reward from environment
        new_state, reward, done, _ = env.step(action)

        # Update Q-Table with new knowledge using the discount rate.
        # A later reward is multiplied by dis to lower its weight,
        # which favors the shortest path.
        Q[state, action] = reward + dis * np.max(Q[new_state, :])

        rAll += reward
        state = new_state

    rList.append(rAll)

print("Success rate: " + str(sum(rList) / num_episodes))
print("Final Q-Table Values")
print("LEFT DOWN RIGHT UP")
print(Q)

#plt.bar(range(len(rList)), rList, color="blue")
plt.bar(range(len(rList)), rList, color='b', alpha=0.4)
plt.show()
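The two knobs in this script, the exploration probability e and the discount dis, can be looked at in isolation. This is a minimal sketch: `epsilon` copies the schedule from the code above, while `discounted` is a hypothetical helper that simply multiplies a reward by dis once per step of delay.

```python
def epsilon(episode):
    """Exploration probability from 03_2: 1, 1/2, 1/3, ... stepping every 100 episodes."""
    return 1.0 / ((episode // 100) + 1)


def discounted(reward, steps_of_delay, dis=0.99):
    """Reward after being multiplied by dis once per step of delay."""
    return reward * dis ** steps_of_delay


for episode in (0, 99, 100, 500, 1999):
    print("episode", episode, "e =", epsilon(episode))

# A goal reached after fewer steps keeps more of its reward
print(discounted(1.0, 5), discounted(1.0, 9))
```

With this schedule the first 100 episodes are fully random, and by the last hundred episodes only about one move in twenty is random. The discount makes the Q values along a shorter route slightly larger than those along a longer one, so the greedy choice prefers the shorter route.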

[# play_frozenlake_windows]

# play_frozenlake_windows
"""Keyboard input is not recognized reliably; this needs fixing later.
On the icy surface the agent does not move the way the keyboard says.
This stands for a situation in which you cannot rely entirely on what teacher Q says."""
import gym
from gym.envs.registration import register
from colorama import init
from kbhit import KBHit

init(autoreset=True)  # Reset the terminal mode to display ansi color

env = gym.make('FrozenLake-v0')  # is_slippery True
env.reset()   # Start from the initial state
env.render()  # Show the initial board

key = KBHit()

while True:
    action = key.getarrow()
    if action not in [0, 1, 2, 3]:
        print("Game aborted!")
        break

    state, reward, done, info = env.step(action)
    env.render()
    print("State: ", state, "Action: ", action, "Reward: ", reward, "Info: ", info)

    if done:
        print("Finished with reward", reward)
        break
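What "slippery" means can be sketched without gym. The code below is an assumption-laden illustration, not gym's source: it follows my understanding that in the slippery FrozenLake the agent moves in the chosen direction or in one of the two perpendicular directions with equal probability, on the standard 4x4 map. The names `move` and `slippery_step` are hypothetical.

```python
import random

LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
MAP = ["SFFF",
       "FHFH",
       "FFFH",
       "HFFG"]   # S start, F frozen, H hole, G goal
SIZE = 4
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def move(state, direction):
    """Apply one move on the grid, staying in place at the border."""
    row, col = divmod(state, SIZE)
    d_row, d_col = MOVES[direction]
    row = min(max(row + d_row, 0), SIZE - 1)
    col = min(max(col + d_col, 0), SIZE - 1)
    return row * SIZE + col


def slippery_step(state, action, rng=random):
    """Move in the chosen direction or one of its two neighbours, each equally likely."""
    actual = rng.choice([(action - 1) % 4, action, (action + 1) % 4])
    new_state = move(state, actual)
    tile = MAP[new_state // SIZE][new_state % SIZE]
    reward = 1.0 if tile == "G" else 0.0
    done = tile in "GH"
    return new_state, reward, done


rng = random.Random(0)
outcomes = [slippery_step(0, RIGHT, rng)[0] for _ in range(3000)]
for s in sorted(set(outcomes)):
    print("state", s, "share", outcomes.count(s) / len(outcomes))
```

Pressing RIGHT from the start cell therefore lands on the cell to the right only part of the time; otherwise the agent slides down or bumps into the top border and stays put. This is why a single observed transition is a poor guide in this environment.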

[#05_0_q_table_frozenlake]

#05_0_q_table_frozenlake
"""
In the slippery environment ('FrozenLake-v0') you must not follow big brother Q's advice as-is,
because the surroundings can make you slip.
. On the ice, relying entirely on Q as before gives 1.55%; relying on it only partly gives 66%
"""
import gym
import numpy as np
import matplotlib.pyplot as plt
from gym.envs.registration import register
import random as pr

env = gym.make('FrozenLake-v0')

# Initialize table with all zeros
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set learning parameters
learning_rate = .85  # defined but not used in this script
dis = .99
num_episodes = 2000

# create lists to contain total rewards and steps per episode
rList = []
for i in range(num_episodes):
    # Reset environment and get first new observation
    state = env.reset()
    rAll = 0
    done = False

    # The Q-Table learning algorithm
    while not done:
        # Greedy choice on Q plus noise that shrinks as episodes go on
        action = np.argmax(Q[state, :] + np.random.randn(1, env.action_space.n) / (i + 1))

        # Get new state and reward from environment
        new_state, reward, done, _ = env.step(action)

        # Update Q-Table by overwriting with the new target (full trust in Q)
        Q[state, action] = reward + dis * np.max(Q[new_state, :])

        state = new_state
        rAll += reward

    rList.append(rAll)

print("Success rate: " + str(sum(rList) / num_episodes))
print("Final Q-Table Values")
print("LEFT DOWN RIGHT UP")
print(Q)

plt.bar(range(len(rList)), rList, color="blue")
#plt.bar(range(len(rList)), rList, color='b', alpha=0.4)
plt.show()
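The action line `np.argmax(Q[state, :] + np.random.randn(...) / (i + 1))` can be tried on its own. The following is a standard-library sketch of the same idea; `noisy_argmax` and the sample Q values are hypothetical, and `random.gauss` stands in for `np.random.randn`.

```python
import random


def noisy_argmax(q_row, episode, rng=random):
    """Argmax of the Q values plus Gaussian noise scaled by 1 / (episode + 1)."""
    scale = 1.0 / (episode + 1)
    noisy = [q + rng.gauss(0.0, 1.0) * scale for q in q_row]
    return max(range(len(noisy)), key=noisy.__getitem__)


rng = random.Random(0)
q_row = [0.0, 0.5, 0.4, 0.0]   # hypothetical Q values for one state
for episode in (0, 10, 1000):
    picks = [noisy_argmax(q_row, episode, rng) for _ in range(1000)]
    print("episode", episode, "picks per action", [picks.count(a) for a in range(4)])
```

Early on the noise is large compared with the Q values, so every action gets tried. As the episode number grows, the noise shrinks and the choice settles on the best action, with the runner-up being the most likely alternative along the way. That is the sense in which this method still reflects the existing values, unlike a purely random e-greedy move.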

[# 05_q_table_frozenlake]

# 05_q_table_frozenlake
"""
In the slippery environment ('FrozenLake-v0') you must not follow big brother Q's advice as-is,
because the surroundings can make you slip.
So Q's advice should be reflected only partly.
==> Q[state, action] = (1-learning_rate) * Q[state, action] \
        + learning_rate*(reward + dis * np.max(Q[new_state, :]))
==> The success rate rises considerably (1.55% ==> 66%)
    . On the ice, relying entirely on Q gives 1.55%; relying on it only partly gives 66%
"""
import gym
import numpy as np
import matplotlib.pyplot as plt
from gym.envs.registration import register
import random as pr

# Deterministic variant, registered for comparison (not used below)
register(
    id='FrozenLake-v3',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4', 'is_slippery': False}
)

#env = gym.make('FrozenLake-v3')
env = gym.make('FrozenLake-v0')

# Initialize table with all zeros
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set learning parameters
learning_rate = .85
dis = .99
num_episodes = 2000

# create lists to contain total rewards and steps per episode
rList = []
for i in range(num_episodes):
    # Reset environment and get first new observation
    state = env.reset()
    rAll = 0
    done = False

    # The Q-Table learning algorithm
    while not done:
        # Add noise ==> the e approach picks from scratch (ignores Q), while the
        # noise approach still reflects the existing values, i.e. it tends to
        # pick the second-best option
        action = np.argmax(Q[state, :] + np.random.randn(1, env.action_space.n) / (i + 1))

        # Get new state and reward from environment
        new_state, reward, done, _ = env.step(action)

        # Reflect Q's advice only partly:
        # update Q-Table with new knowledge using learning rate
        Q[state, action] = (1-learning_rate) * Q[state, action] \
            + learning_rate*(reward + dis * np.max(Q[new_state, :]))

        rAll += reward
        state = new_state

    rList.append(rAll)

print("Success rate: " + str(sum(rList) / num_episodes))
print("Final Q-Table Values")
print("LEFT DOWN RIGHT UP")
print(Q)

#plt.bar(range(len(rList)), rList, color="blue")
plt.bar(range(len(rList)), rList, color='b', alpha=0.4)
plt.show()
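Why blending helps on ice can be shown with a single Q entry. This is a minimal sketch under stated assumptions: `q_update` mirrors the update line above, but the reward stream (1 roughly one time in three) and the learning rate of 0.1 used for the blended estimate are made up for illustration; the post itself uses learning_rate = .85.

```python
import random


def q_update(old_q, reward, next_max, learning_rate=0.85, dis=0.99):
    """Blend the old estimate with the new target, as in 05_q_table_frozenlake."""
    target = reward + dis * next_max
    return (1 - learning_rate) * old_q + learning_rate * target


# Hypothetical situation: the same (state, action) pays 1 only about 1 time in 3
rng = random.Random(0)
overwrite, blended = 0.0, 0.0
for _ in range(10):
    reward = 1.0 if rng.random() < 1 / 3 else 0.0
    overwrite = q_update(overwrite, reward, 0.0, learning_rate=1.0)  # 05_0 style
    blended = q_update(blended, reward, 0.0, learning_rate=0.1)
    print(reward, round(overwrite, 3), round(blended, 3))
```

With learning_rate = 1.0 the entry is simply the latest outcome and jumps between 0 and 1, which is what the 05_0 script does. A smaller learning rate keeps part of the old estimate, so one unlucky slip does not erase what was learned before.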

[References]

https://www.inflearn.com/course/기본적인-머신러닝-딥러닝-강좌

https://github.com/hunkim/deeplearningzerotoall

https://www.tensorflow.org/api_docs/python/tf/layers

https://www.inflearn.com/course/reinforcement-learning/

License: Attribution, Non-Commercial, No Derivatives


TechTogetWorld
