TechTogetWorld

This post is about implementing artificial intelligence (Deep Reinforcement Learning).


The post is organized as follows.


================================================

Summary

 - Implement Q-Learning with a neural network (a Q-network) instead of a Q-table; a Q-table needs exponentially more memory as the state space grows.

  ==> The Q-table approach is impractical for real-world problems, so a network-based approach is needed. For example, FrozenLake only needs a 16 x 4 table, but a state made of raw game pixels would need far more entries than could ever be stored.


1. q_net_frozenlake

 - Convert the Q-table into a network


2. 07_3_dqn_2015_cartpole

  - Q-network issues

   1) Too little data, so accuracy is poor; training on just a couple of samples can produce a completely different fit each time.

      . Go deep (use more layers)

      . Experience replay: after each action, store the state, action, reward, etc. in a buffer, then sample from it randomly (evenly) for training.

   2) The target moves (the same network is used, so changing the prediction also changes the target) => like moving the bullseye the moment the arrow is fired.

      . Create a second (target) network; the main network is trained, and its weights are periodically copied into the target network. (A short sketch of both fixes follows this summary.)


3. Next Step

  ==> Implement Q-Learning using a neural network (Neural Network)


4. References

=================================================
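Before the full code, the two fixes listed in the summary above (experience replay and a separate target network) can be shown in a minimal, runnable sketch. The TinyQNet class and its method names are illustrative stand-ins only, not the code of this post; the actual implementations below use TensorFlow.

import random
from collections import deque

import numpy as np

# A deliberately tiny stand-in "network" (a linear model Q(s) = s @ W), just to make
# this sketch runnable; all names here are illustrative.
class TinyQNet:
    def __init__(self, input_size, output_size, lr=0.1):
        self.W = np.random.uniform(0, 0.01, (input_size, output_size))
        self.lr = lr

    def predict(self, state):
        # state: shape (1, input_size); returns Q values of shape (1, output_size)
        return state @ self.W

    def update(self, state, target):
        # one gradient-descent step on the squared error between target and prediction
        self.W += self.lr * state.T @ (target - self.predict(state))

    def copy_from(self, other):
        # target-network sync: overwrite our weights with the main network's weights
        self.W = other.W.copy()

dis = 0.99
replay_buffer = deque(maxlen=50000)    # 1) experience replay buffer

def train_from_replay(mainQ, targetQ, batch_size=32):
    # 2) sample randomly (evenly) so correlated consecutive transitions are broken up
    batch = random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
    for state, action, reward, next_state, done in batch:
        target = mainQ.predict(state)
        if done:
            target[0, action] = reward
        else:
            # the label comes from the frozen target network, so the "bullseye"
            # does not move while the main network is being trained
            target[0, action] = reward + dis * np.max(targetQ.predict(next_state))
        mainQ.update(state, target)

# During training: store every (state, action, reward, next_state, done) in
# replay_buffer, call train_from_replay() every few episodes, and periodically
# sync the networks with targetQ.copy_from(mainQ).

In the real 2015 DQN code below, the periodic copy is done by get_copy_var_ops(), which assigns the main network's TensorFlow variables to the target network's variables.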




[ 06_q_net_frozenlake ]



'''
06_q_net_frozenlake


This code is based on

https://github.com/hunkim/DeepRL-Agents

'''

import gym

import numpy as np

import matplotlib.pyplot as plt

import time

import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'    # default value = 0  From http://stackoverflow.com/questions/35911252/disable-tensorflow-debugging-information


import tensorflow as tf

env = gym.make('FrozenLake-v0')


# Input and output size based on the Env

input_size = env.observation_space.n

output_size = env.action_space.n

learning_rate = 0.1


# These lines establish the feed-forward part of the network used to choose actions

X = tf.placeholder(shape=[1, input_size], dtype=tf.float32)              # state input

W = tf.Variable(tf.random_uniform([input_size, output_size], 0, 0.01))   # weight


Qpred = tf.matmul(X, W)     # Out Q prediction

Y = tf.placeholder(shape=[1, output_size], dtype=tf.float32)    # Y label


loss = tf.reduce_sum(tf.square(Y-Qpred))

train = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)


# Set Q-learning parameters

dis = .99

num_episodes = 2000


# create lists to contain total rewards and steps per episode

rList = []


def one_hot(x):
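    # Convert state index x into a 1 x 16 one-hot row vector (FrozenLake has 16 states).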

    return np.identity(16)[x:x+1]


start_time = time.time()


init = tf.global_variables_initializer()

with tf.Session() as sess:

    sess.run(init)

    for i in range(num_episodes):

        # Reset environment and get first new observation

        s = env.reset()

        e = 1. / ((i / 50) + 10)   # e-greedy exploration rate; it decays as episode index i grows

        rAll = 0

        done = False

        local_loss = []


        # The Q-Table learning algorithm

        while not done:

            # Choose an action greedily (with a chance of random action)

            # from the Q-network

            Qs = sess.run(Qpred, feed_dict={X: one_hot(s)})
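            # Qs has shape (1, output_size): the current Q-value estimate for each action in state s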

            if np.random.rand(1) < e:

                a = env.action_space.sample()

            else:

                a = np.argmax(Qs)


            # Get new state and reward from environment

            s1, reward, done, _ = env.step(a)

            if done:

                # Update Q, and no Qs+1, since it's a terminal state

                Qs[0, a] = reward

            else:

                # Obtain the Q' values by feeding the new state through our network

                Qs1 = sess.run(Qpred, feed_dict={X: one_hot(s1)})

                # Update Q

                Qs[0, a] = reward + dis*np.max(Qs1)


            # Train our network using target (Y) and predicted Q (Qpred) values

            sess.run(train, feed_dict={X: one_hot(s), Y: Qs})


            rAll += reward

            s = s1


        rList.append(rAll)


print("--- %s seconds ---" % (time.time() - start_time))


print("Success rate: " + str(sum(rList) / num_episodes))

#plt.bar(range(len(rList)), rList, color="blue")

plt.bar(range(len(rList)), rList, color='b', alpha=0.4)

plt.show()



[07_3_dqn_2015_cartpole]


"""
07_3_dqn_2015_cartpole


This code is based on

https://github.com/hunkim/DeepRL-Agents


CF https://github.com/golbin/TensorFlow-Tutorials

https://github.com/dennybritz/reinforcement-learning/blob/master/DQN/dqn.py


Q-network issues

 1. Too little data, so accuracy is poor; training on just a couple of samples can produce a completely different fit each time.

   - Go deep (use more layers)

   - Experience replay: after each action, store the state, action, reward, etc. in a buffer, then sample from it randomly (evenly) for training.

 2. The target moves (the same network is used, so changing the prediction also changes the target) => like moving the bullseye the moment the arrow is fired.

    - Create a second (target) network.

"""


import numpy as np

import tensorflow as tf

import random

from collections import deque

from dqn import dqn


import gym

from gym import wrappers


env = gym.make('CartPole-v0')


# Constants defining our neural network

input_size = env.observation_space.shape[0]

output_size = env.action_space.n


dis = 0.9

REPLAY_MEMORY = 50000


def replay_train(mainDQN, targetDQN, train_batch):

    x_stack = np.empty(0).reshape(0, input_size)

    y_stack = np.empty(0).reshape(0, output_size)


    # Get stored information from the buffer

    for state, action, reward, next_state, done in train_batch:

        Q = mainDQN.predict(state)


        # terminal?

        if done:

            Q[0, action] = reward

        else:

            # get target from target DQN (Q')

            Q[0, action] = reward + dis * np.max(targetDQN.predict(next_state))


        y_stack = np.vstack([y_stack, Q])

        x_stack = np.vstack( [x_stack, state])


    # Train our network using target and predicted Q values on each episode

    return mainDQN.update(x_stack, y_stack)


def ddqn_replay_train(mainDQN, targetDQN, train_batch):


    """Double DQN implementation.

    :param mainDQN: main DQN
    :param targetDQN: target DQN
    :param train_batch: minibatch for training
    :return: loss
    """



    x_stack = np.empty(0).reshape(0, mainDQN.input_size)

    y_stack = np.empty(0).reshape(0, mainDQN.output_size)


    # Get stored information from the buffer

    for state, action, reward, next_state, done in train_batch:

        Q = mainDQN.predict(state)


        # terminal?

        if done:

            Q[0, action] = reward

        else:

            # Double DQN: y = r + gamma * targetDQN(s')[a] where

            # a = argmax(mainDQN(s'))

            Q[0, action] = reward + dis * targetDQN.predict(next_state)[0, np.argmax(mainDQN.predict(next_state))]


        y_stack = np.vstack([y_stack, Q])

        x_stack = np.vstack([x_stack, state])


    # Train our network using target and predicted Q values on each episode

    return mainDQN.update(x_stack, y_stack)


def get_copy_var_ops(*, dest_scope_name="target", src_scope_name="main"):


    # Copy variables src_scope to dest_scope

    op_holder = []


    src_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=src_scope_name)

    dest_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=dest_scope_name)


    for src_var, dest_var in zip(src_vars, dest_vars):

        op_holder.append(dest_var.assign(src_var.value()))


    return op_holder


def bot_play(mainDQN, env=env):

    # See our trained network in action

    state = env.reset()

    reward_sum = 0

    while True:

        env.render()

        action = np.argmax(mainDQN.predict(state))

        state, reward, done, _ = env.step(action)

        reward_sum += reward

        if done:

            print("Total score: {}".format(reward_sum))

            break


def main():

    max_episodes = 5000

    # store the previous observations in replay memory

    replay_buffer = deque()


    with tf.Session() as sess:

        mainDQN = dqn.DQN(sess, input_size, output_size, name="main")

        targetDQN = dqn.DQN(sess, input_size, output_size, name="target")

        tf.global_variables_initializer().run()


        #initial copy q_net -> target_net

        copy_ops = get_copy_var_ops(dest_scope_name="target", src_scope_name="main")

        sess.run(copy_ops)


        for episode in range(max_episodes):

            e = 1. / ((episode / 10) + 1)

            done = False

            step_count = 0

            state = env.reset()


            while not done:

                if np.random.rand(1) < e:

                    action = env.action_space.sample()

                else:

                    # Choose an action greedily from the Q-network

                    action = np.argmax(mainDQN.predict(state))


                # Get new state and reward from environment

                next_state, reward, done, _ = env.step(action)

                if done: # Penalty

                    reward = -100


                # Save the experience to our buffer

                replay_buffer.append((state, action, reward, next_state, done))

                if len(replay_buffer) > REPLAY_MEMORY:

                      replay_buffer.popleft()


                state = next_state

                step_count += 1

                if step_count > 10000:   # Good enough. Let's move on

                    break


            print("Episode: {} steps: {}".format(episode, step_count))

            if step_count > 10000:

                pass

                ## stop at 10,000 steps (to prevent an infinite loop)

                # break


            if episode % 10 == 1: # train every 10 episodes

                # Get a random batch of experiences

                for _ in range(50):

                    minibatch = random.sample(replay_buffer, 10)

                    loss, _ = ddqn_replay_train(mainDQN, targetDQN, minibatch)


                print("Loss: ", loss)

                # copy q_net -> target_net

                sess.run(copy_ops)


        # See our trained bot in action

        env2 = wrappers.Monitor(env, 'gym-results', force=True)


        for i in range(200):

            bot_play(mainDQN, env=env2)


        env2.close()

        # gym.upload("gym-results", api_key="sk_VT2wPcSSOylnlPORltmQ")


if __name__ == "__main__":

    main()



[ References ]


  https://www.inflearn.com/course/기본적인-머신러닝-딥러닝-강좌

  https://github.com/hunkim/deeplearningzerotoall

  https://www.tensorflow.org/api_docs/python/tf/layers

  https://www.inflearn.com/course/reinforcement-learning/



This post is about implementing artificial intelligence (Deep Reinforcement Learning).


The post is organized as follows.


================================================

Summary

 - FrozenLake is a game where you find your way across a frozen lake without falling into a hole.

 - The ice is slippery. If you rely entirely on a guide's directions, the wind and other factors (an uncertain environment)

   can still make you slip. Relying on the guide only partially, and weighting your own judgment more, raises the success rate (from about 1.5% to about 66%).

 - The real world is also often hard to predict because of the surrounding environment; this approach can be applied in such cases.


1. Installing the software for OpenAI Gym games


2. # 01_play_frozenlake_det_windows

 => Move in the direction of the arrow key you press.

 => Run the file from a terminal (so it can receive key input).

 => Go to the folder containing frozenlake_det_windows.py and run it with the python command.

 => The game ends when you fall into a hole or reach the goal.


3. # 03_0_q_table_frozenlake_det

  => At first it moves to random places; after that it moves toward the largest Q value, finding a path that avoids the holes.

  => Note (random argmax): if the values are tied, pick any of them at random; if one is larger, go there.

 

4. # 03_2_q_table_frozenlake_det

   ==> If a random number is below e, move to a random place; otherwise move toward the larger reward (exploration & exploitation, i.e. e-greedy).

  ==> Like moving to a new neighborhood: at first you try restaurants at random, and once you know the area you mostly go to the good ones.

  ==> Another option is to add noise, which still partially reflects the existing data (the e case above ignores the existing data entirely).

      . This is a way of also giving second-best choices a chance.

  ==> discount (0.9): rewards received later are multiplied by 0.9 to reduce their weight.

       This is how the shortest path is found. (A short sketch of both exploration styles follows this summary.)


5. play_frozenlake_windows

 ==> Keyboard input is not recognized reliably; needs improvement later.

 ==> On the icy path the agent does not move exactly as the keys command, which represents a situation where you cannot rely entirely on teacher Q's advice.

 ==> This implements an environment that is closer to the real world.


6. # 05_0_q_table_frozenlake

  ==> In the slippery environment ('FrozenLake-v0') you cannot follow teacher Q's advice blindly, because the environment can make you slip.

  ==> On ice, relying entirely on Q's advice as before succeeds about 1.55% of the time; relying on it only partially succeeds about 66% of the time.


7. # 05_q_table_frozenlake

  ==> In the slippery environment ('FrozenLake-v0') you cannot follow teacher Q's advice blindly, because the environment can make you slip;

     therefore Q's advice should be blended in only partially:

 ==> Q[state, action] = (1-learning_rate) * Q[state, action] \

   + learning_rate*(reward + dis * np.max(Q[new_state, :]))

 ==> Accuracy improves considerably (1.55% ==> 66%). A worked example of this blended update appears in the sketch after this summary.

   . On ice, relying entirely on Q's advice gives about 1.55%; relying on it only partially gives about 66%.


8. Next Step

  ==> Implement Q-Learning using a neural network (Neural Network)


9. References

=================================================
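For items 4 and 7 above, the two exploration styles (e-greedy vs. added noise) and the update that only partially trusts Q can be condensed into a short NumPy-only sketch with made-up numbers. The function names here are illustrative; the full Gym versions appear in the sections below.

import numpy as np

Q = np.zeros([16, 4])            # toy Q-table: 16 states x 4 actions (FrozenLake-sized)
learning_rate, dis = 0.85, 0.99

def action_e_greedy(state, i):
    # e-greedy: with probability e take a completely random action (ignores current Q values)
    e = 1. / ((i // 100) + 1)
    if np.random.rand(1) < e:
        return np.random.randint(4)
    return int(np.argmax(Q[state, :]))

def action_with_noise(state, i):
    # added noise: the existing Q values are still partially reflected, so
    # second-best actions also get a chance; the noise decays as i grows
    return int(np.argmax(Q[state, :] + np.random.randn(1, 4) / (i + 1)))

def blended_update(state, action, reward, new_state):
    # trust "teacher Q" only partially: keep (1 - learning_rate) of the old value
    Q[state, action] = (1 - learning_rate) * Q[state, action] \
        + learning_rate * (reward + dis * np.max(Q[new_state, :]))

# Worked example with made-up numbers: if Q[s, a] = 0.5, reward = 0 and the best
# value in the next state is 1.0, the new value is
#   0.15 * 0.5 + 0.85 * (0 + 0.99 * 1.0) = 0.9165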




[ 1. Installing the software for OpenAI Gym games ]


- Installation guide: https://gym.openai.com/docs

- step 1

  . Open the Anaconda3 prompt.

- step 2

  . git clone https://github.com/openai/gym  ==> download gym

  . cd gym ==> move into the gym folder

  . pip3 install -e . ==> minimal install; use pip3. This installs the downloaded gym package on your PC.

- step 3 ==> Select the matching python (each package may be installed under a different python path, so you have to find and link the right one).

  . Change the interpreter in the Python editor (PyCharm settings).

  . Before: the python under the tensorflow folder

  . After: c:\\user\dhp\appdata3\python.exe



- step 4 ==> Add packages ==> if the tensorflow package is needed, it can be installed from within PyCharm.

  . Click the " + " button at the top right to see the list of installable packages, then select and install tensorflow.

  . As a result, both gym and tensorflow are installed for the python at c:\\user\dhp\appdata3\python.exe.

  . Install any other packages you need along the way.

- Still need to check how to add packages to the python inside the tensorflow environment; gym is installed there but is not working well.



- To check that the installation succeeded, enter the code below in PyCharm and run it.

  # CartPole test



"""CartPole test

"""

import gym

env = gym.make('CartPole-v0')

env.reset()

for _ in range(100):

    env.render()

    env.step(env.action_space.sample()) # take a random action


for i_episode in range(20):

    observation = env.reset()

    for t in range(100):

        env.render()

        print(observation)

        action = env.action_space.sample()

        observation, reward, done, info = env.step(action)

        if done:

            print("Episode finished after {} timesteps".format(t+1))

            break




[ # 01_play_frozenlake_det_windows ]


# 01_play_frozenlake_det_windows

"""

Run the file from a terminal (so it can receive key input).

Go to the folder containing frozenlake_det_windows.py and run it with the python command.

The game ends when you fall into a hole or reach the goal.

"""

import gym

from gym.envs.registration import register

from colorama import init

from kbhit import KBHit


init(autoreset=True)    # Reset the terminal mode to display ansi color


register(

    id='FrozenLake-v3',

    entry_point='gym.envs.toy_text:FrozenLakeEnv',

    kwargs={'map_name' : '4x4', 'is_slippery': False}

)


env = gym.make('FrozenLake-v3')        # is_slippery False

env.render()                             # Show the initial board


key = KBHit()


while True:


    action = key.getarrow()

    if action not in [0, 1, 2, 3]:

        print("Game aborted!")

        break


    state, reward, done, info = env.step(action)

    env.render()

    print("State: ", state, "Action: ", action, "Reward: ", reward, "Info: ", info)


    if done:

        print("Finished with reward", reward)

        break



[# 03_0_q_table_frozenlake_det ]


# 03_0_q_table_frozenlake_det

"""

 # random argmax: if the values are tied, pick any of them at random; if one is larger, go there.


"""


import gym

import numpy as np

import matplotlib.pyplot as plt

from gym.envs.registration import register

import random as pr


def rargmax(vector):    # https://gist.github.com/stober/1943451

    """ Argmax that chooses randomly among eligible maximum idices. """

    m = np.amax(vector)

    indices = np.nonzero(vector == m)[0]

    return pr.choice(indices)


register(

    id='FrozenLake-v3',

    entry_point='gym.envs.toy_text:FrozenLakeEnv',

    kwargs={'map_name' : '4x4', 'is_slippery': False}

)

env = gym.make('FrozenLake-v3')


# Initialize table with all zeros

Q = np.zeros([env.observation_space.n, env.action_space.n])  # a 16 x 4 table (states x actions)

# Set learning parameters

num_episodes = 2000


# create lists to contain total rewards and steps per episode

rList = []

for i in range(num_episodes):

    # Reset environment and get first new observation

    state = env.reset()

    rAll = 0

    done = False


    # The Q-Table learning algorithm

    while not done:

        action = rargmax(Q[state, :])  # random argmax: if tied, pick any; if one is larger, go there


        # Get new state and reward from environment

        new_state, reward, done, _ = env.step(action)


        # Update Q-Table with new knowledge using learning rate

        Q[state, action] = reward + np.max(Q[new_state, :])


        rAll += reward

        state = new_state

    rList.append(rAll)


print("Success rate: " + str(sum(rList) / num_episodes))

print("Final Q-Table Values")

print("LEFT DOWN RIGHT UP")

print(Q)


plt.bar(range(len(rList)), rList, color="blue")

#plt.bar(range(len(rList)), rList, color='b', alpha=0.4)

plt.show()



[# 03_2_q_table_frozenlake_det]


# 03_2_q_table_frozenlake_det

"""

   ==> If a random number is below e, move to a random place; otherwise move toward the larger reward (exploration & exploitation, i.e. e-greedy).

  ==> Like moving to a new neighborhood: at first you try restaurants at random, and once you know the area you mostly go to the good ones.

  ==> Another option is to add noise, which still partially reflects the existing data (the e case above ignores the existing data entirely).

      . This is a way of also giving second-best choices a chance.

  ==> discount (0.9): rewards received later are multiplied by 0.9 to reduce their weight.

       This is how the shortest path is found.



"""


import gym

import numpy as np

import matplotlib.pyplot as plt

from gym.envs.registration import register


register(

    id='FrozenLake-v3',

    entry_point='gym.envs.toy_text:FrozenLakeEnv',

    kwargs={'map_name' : '4x4', 'is_slippery': False}

)

env = gym.make('FrozenLake-v3')


# Initialize table with all zeros

Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set learning parameters

dis = .99

num_episodes = 2000


# create lists to contain total rewards and steps per episode

rList = []

for i in range(num_episodes):

    # Reset environment and get first new observation

    state = env.reset()

    rAll = 0

    done = False


    e = 1. / ((i // 100) + 1)  # Python2 & 3

    # e becomes smaller in later episodes


    # The Q-Table learning algorithm

    while not done:

        # Choose an action by e-greedy

        if np.random.rand(1) < e:

            # if below e, move to a random place

            # if it moves in random directions too often, accuracy can drop

            action = env.action_space.sample()

        else:

            # otherwise, move toward the larger reward

            action = np.argmax(Q[state, :])


        # Get new state and reward from environment

        new_state, reward, done, _ = env.step(action)


        # Update Q-Table with new knowledge using decay rate

        Q[state, action] = reward + dis * np.max(Q[new_state, :])


        # rewards received later are multiplied by the discount factor to reduce their weight; this is how the shortest path is found


        rAll += reward

        state = new_state

    rList.append(rAll)


print("Success rate: " + str(sum(rList) / num_episodes))

print("Final Q-Table Values")

print("LEFT DOWN RIGHT UP")

print(Q)

#plt.bar(range(len(rList)), rList, color="blue")

plt.bar(range(len(rList)), rList, color='b', alpha=0.4)

plt.show()



[# play_frozenlake_windows]


# play_frozenlake_windows


"""keyboard 인식이 잘 않됨, 추후 보완필요

빙판길로, 키보드 조작대로 움직이지 않고, Q 선생의 말에 전적으로 의존하지 않는 상황을 의미함"""


import gym

from gym.envs.registration import register

from colorama import init

from kbhit import KBHit


init(autoreset=True)    # Reset the terminal mode to display ansi color


env = gym.make('FrozenLake-v0')       # is_slippery True

env.render()                            # Show the initial board


key = KBHit()

while True:


    action = key.getarrow()

    if action not in [0, 1, 2, 3]:

        print("Game aborted!")

        break


    state, reward, done, info = env.step(action)

    env.render()

    print("State: ", state, "Action: ", action, "Reward: ", reward, "Info: ", info)


    if done:

        print("Finished with reward", reward)

        break


[#05_0_q_table_frozenlake]


#05_0_q_table_frozenlake

"""

In the slippery environment ('FrozenLake-v0') you cannot follow teacher Q's advice blindly, because the environment can make you slip.

. On ice, relying entirely on Q's advice as before succeeds about 1.55% of the time; relying on it only partially succeeds about 66% of the time.

"""


import gym

import numpy as np

import matplotlib.pyplot as plt

from gym.envs.registration import register

import random as pr


env = gym.make('FrozenLake-v0')


# Initialize table with all zeros

Q = np.zeros([env.observation_space.n, env.action_space.n])


# Set learning parameters

learning_rate = .85

dis = .99

num_episodes = 2000


# create lists to contain total rewards and steps per episode

rList = []

for i in range(num_episodes):

    # Reset environment and get first new observation

    state = env.reset()

    rAll = 0

    done = False


    # The Q-Table learning algorithm

    while not done:

        action = np.argmax(Q[state, :] + np.random.randn(1, env.action_space.n) / (i + 1))
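        # decaying random noise is added to the Q values for exploration; its influence shrinks as i grows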

        # Get new state and reward from environment

        new_state, reward, done, _ = env.step(action)


        # Update Q-Table with new knowledge using learning rate

        Q[state, action] = reward + dis * np.max(Q[new_state, :])

        state = new_state


        rAll += reward


    rList.append(rAll)


print("Success rate: " + str(sum(rList) / num_episodes))

print("Final Q-Table Values")

print("LEFT DOWN RIGHT UP")

print(Q)

plt.bar(range(len(rList)), rList, color="blue")

#plt.bar(range(len(rList)), rList, color='b', alpha=0.4)

plt.show()



[# 05_q_table_frozenlake]

# 05_q_table_frozenlake

"""

In the slippery environment ('FrozenLake-v0') you cannot follow teacher Q's advice blindly, because the environment can make you slip,

so Q's advice should be blended in only partially:

 ==> Q[state, action] = (1-learning_rate) * Q[state, action] \

   + learning_rate*(reward + dis * np.max(Q[new_state, :]))

 ==> Accuracy improves considerably (1.55% ==> 66%).

   . On ice, relying entirely on Q's advice gives about 1.55%; relying on it only partially gives about 66%.

"""


import gym

import numpy as np

import matplotlib.pyplot as plt

from gym.envs.registration import register

import random as pr


register(

    id='FrozenLake-v3',

    entry_point='gym.envs.toy_text:FrozenLakeEnv',

    kwargs={'map_name' : '4x4', 'is_slippery': False}

)


#env = gym.make('FrozenLake-v3')

env = gym.make('FrozenLake-v0')


# Initialize table with all zeros

Q = np.zeros([env.observation_space.n, env.action_space.n])


# Set learning parameters

learning_rate = .85

dis = .99

num_episodes = 2000


# create lists to contain total rewards and steps per episode

rList = []

for i in range(num_episodes):

    # Reset environment and get first new observation

    state = env.reset()

    rAll = 0

    done = False


    # The Q-Table learning algorithm

    while not done:

        action = np.argmax(Q[state, :] + np.random.randn(1, env.action_space.n) / (i + 1))


        # Add noise ==> using e starts from a purely random choice, while adding noise still reflects the existing Q values, i.e. second-best choices also get a chance


        # Get new state and reward from environment

        new_state, reward, done, _ = env.step(action)

        # blend in teacher Q's advice only partially

        # Update Q-Table with new knowledge using learning rate

        Q[state, action] = (1-learning_rate) * Q[state, action] \

           + learning_rate*(reward + dis * np.max(Q[new_state, :]))


        rAll += reward

        state = new_state


    rList.append(rAll)


print("Success rate: " + str(sum(rList) / num_episodes))

print("Final Q-Table Values")

print("LEFT DOWN RIGHT UP")

print(Q)

#plt.bar(range(len(rList)), rList, color="blue")

plt.bar(range(len(rList)), rList, color='b', alpha=0.4)

plt.show()



[References]


  https://www.inflearn.com/course/기본적인-머신러닝-딥러닝-강좌

  https://github.com/hunkim/deeplearningzerotoall

  https://www.tensorflow.org/api_docs/python/tf/layers

  https://www.inflearn.com/course/reinforcement-learning/



This post is about implementing artificial intelligence.


The post is organized as follows.


================================================

1. # lab-12-1-hello-rnn

 Applies when the output of the previous step affects the output of the next step.

  - words, related searches, etc.


2. # lab-12-2-char-seq-rnn

  - RNN applied ==> high accuracy

    . 49 loss: 0.000650434 Prediction: if you want you

    . target y: if you want you


3. # lab-12-3-char-seq-softmax-only

  RNN not applied (softmax only) ==> accuracy is poor

  2999 loss: 0.277323 Prediction: yf you yant you

  target y: if you want you


4. # lab-12-4-rnn_long_char

  error: from __future__ import print_function ==> commented out because it would not run

  Stacking several layers with MultiRNNCell improves accuracy.

  softmax => reshape: the RNN outputs are flattened to [-1, hidden_size] so one FC/softmax layer is applied to every time step, then reshaped back for the sequence loss.


5. # lab-12-5-rnn_stock_prediction

  Predict tomorrow's stock price by training on the previous 7 days of data.

  The graph is currently not being displayed.


6. Further code to explore (additional)

  ==>lab-12-5-rnn_stock_prediction

      lab-13-1-mnist_using_scope

      lab-13-2-mnist_tensorboard

      lab-13-3-mnist_save_restore


7. References

=================================================




[ #lab-12-1-hello-rnn ]


#lab-12-1-hello-rnn

"""

Applies when the output of the previous step affects the output of the next step.

- words, related searches, etc.


"""

import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Lab 12 RNN

import tensorflow as tf

import numpy as np

tf.set_random_seed(777)  # reproducibility


idx2char = ['h', 'i', 'e', 'l', 'o']

# Teach hello: hihell -> ihello

x_data = [[0, 1, 0, 2, 3, 3]]   # hihell

x_one_hot = [[[1, 0, 0, 0, 0],   # h 0

              [0, 1, 0, 0, 0],   # i 1

              [1, 0, 0, 0, 0],   # h 0

              [0, 0, 1, 0, 0],   # e 2

              [0, 0, 0, 1, 0],   # l 3

              [0, 0, 0, 1, 0]]]  # l 3


y_data = [[1, 0, 2, 3, 3, 4]]    # ihello


num_classes = 5

input_dim = 5  # one-hot size

hidden_size = 5  # output from the LSTM. 5 to directly predict one-hot

batch_size = 1   # one sentence

sequence_length = 6  # |ihello| == 6

learning_rate = 0.1


X = tf.placeholder(

    tf.float32, [None, sequence_length, input_dim])  # X one-hot

Y = tf.placeholder(tf.int32, [None, sequence_length])  # Y label


cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size, state_is_tuple=True)

initial_state = cell.zero_state(batch_size, tf.float32)

outputs, _states = tf.nn.dynamic_rnn(

    cell, X, initial_state=initial_state, dtype=tf.float32)


# FC layer

X_for_fc = tf.reshape(outputs, [-1, hidden_size])

# fc_w = tf.get_variable("fc_w", [hidden_size, num_classes])

# fc_b = tf.get_variable("fc_b", [num_classes])

# outputs = tf.matmul(X_for_fc, fc_w) + fc_b

outputs = tf.contrib.layers.fully_connected(

    inputs=X_for_fc, num_outputs=num_classes, activation_fn=None)


# reshape out for sequence_loss

outputs = tf.reshape(outputs, [batch_size, sequence_length, num_classes])


weights = tf.ones([batch_size, sequence_length])

sequence_loss = tf.contrib.seq2seq.sequence_loss(

    logits=outputs, targets=Y, weights=weights)

loss = tf.reduce_mean(sequence_loss)

train = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)


prediction = tf.argmax(outputs, axis=2)


with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    for i in range(50):

        l, _ = sess.run([loss, train], feed_dict={X: x_one_hot, Y: y_data})

        result = sess.run(prediction, feed_dict={X: x_one_hot})

        print(i, "loss:", l, "prediction: ", result, "true Y: ", y_data)


        # print char using dic

        result_str = [idx2char[c] for c in np.squeeze(result)]

        print("\tPrediction str: ", ''.join(result_str))


'''

0 loss: 1.71584 prediction:  [[2 2 2 3 3 2]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  eeelle

1 loss: 1.56447 prediction:  [[3 3 3 3 3 3]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  llllll

2 loss: 1.46284 prediction:  [[3 3 3 3 3 3]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  llllll

3 loss: 1.38073 prediction:  [[3 3 3 3 3 3]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  llllll

4 loss: 1.30603 prediction:  [[3 3 3 3 3 3]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  llllll

5 loss: 1.21498 prediction:  [[3 3 3 3 3 3]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  llllll

6 loss: 1.1029 prediction:  [[3 0 3 3 3 4]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  lhlllo

7 loss: 0.982386 prediction:  [[1 0 3 3 3 4]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  ihlllo

8 loss: 0.871259 prediction:  [[1 0 3 3 3 4]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  ihlllo

9 loss: 0.774338 prediction:  [[1 0 2 3 3 4]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  ihello

10 loss: 0.676005 prediction:  [[1 0 2 3 3 4]] true Y:  [[1, 0, 2, 3, 3, 4]]

Prediction str:  ihello


...


'''


[# lab-12-2-char-seq-rnn ]


# lab-12-2-char-seq-rnn

"""

RNN applied ==> high accuracy

 - 49 loss: 0.000650434 Prediction: if you want you

 - target y: if you want you

"""

import os


os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Lab 12 Character Sequence RNN

import tensorflow as tf

import numpy as np

tf.set_random_seed(777)  # reproducibility


sample = " if you want you"

idx2char = list(set(sample))  # index -> char

char2idx = {c: i for i, c in enumerate(idx2char)}  # char -> index


# hyper parameters

dic_size = len(char2idx)  # RNN input size (one hot size)

hidden_size = len(char2idx)  # RNN output size

num_classes = len(char2idx)  # final output size (RNN or softmax, etc.)

batch_size = 1  # one sample data, one batch

sequence_length = len(sample) - 1  # number of lstm rollings (unit #)

learning_rate = 0.1


sample_idx = [char2idx[c] for c in sample]  # char to index

x_data = [sample_idx[:-1]]  # X data sample (0 ~ n-1) hello: hell

y_data = [sample_idx[1:]]   # Y label sample (1 ~ n) hello: ello


X = tf.placeholder(tf.int32, [None, sequence_length])  # X data

Y = tf.placeholder(tf.int32, [None, sequence_length])  # Y label


x_one_hot = tf.one_hot(X, num_classes)  # one hot: 1 -> 0 1 0 0 0 0 0 0 0 0

cell = tf.contrib.rnn.BasicLSTMCell(

    num_units=hidden_size, state_is_tuple=True)

initial_state = cell.zero_state(batch_size, tf.float32)

outputs, _states = tf.nn.dynamic_rnn(

    cell, x_one_hot, initial_state=initial_state, dtype=tf.float32)


# FC layer

X_for_fc = tf.reshape(outputs, [-1, hidden_size])

outputs = tf.contrib.layers.fully_connected(X_for_fc, num_classes, activation_fn=None)


# reshape out for sequence_loss

outputs = tf.reshape(outputs, [batch_size, sequence_length, num_classes])


weights = tf.ones([batch_size, sequence_length])

sequence_loss = tf.contrib.seq2seq.sequence_loss(

    logits=outputs, targets=Y, weights=weights)

loss = tf.reduce_mean(sequence_loss)

train = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)


prediction = tf.argmax(outputs, axis=2)


with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    for i in range(50):

        l, _ = sess.run([loss, train], feed_dict={X: x_data, Y: y_data})

        result = sess.run(prediction, feed_dict={X: x_data})


        # print char using dic

        result_str = [idx2char[c] for c in np.squeeze(result)]


        print(i, "loss:", l, "Prediction:", ''.join(result_str))



'''

0 loss: 2.35377 Prediction: uuuuuuuuuuuuuuu

1 loss: 2.21383 Prediction: yy you y    you

2 loss: 2.04317 Prediction: yy yoo       ou

3 loss: 1.85869 Prediction: yy  ou      uou

4 loss: 1.65096 Prediction: yy you  a   you

5 loss: 1.40243 Prediction: yy you yan  you

6 loss: 1.12986 Prediction: yy you wann you

7 loss: 0.907699 Prediction: yy you want you

8 loss: 0.687401 Prediction: yf you want you

9 loss: 0.508868 Prediction: yf you want you

10 loss: 0.379423 Prediction: yf you want you

11 loss: 0.282956 Prediction: if you want you

12 loss: 0.208561 Prediction: if you want you


...


'''


[#lab-12-3-char-seq-softmax-only]

#lab-12-3-char-seq-softmax-only

"""

RNN not applied (softmax only) ==> accuracy is poor

 - 2999 loss: 0.277323 Prediction: yf you yant you

 - target y: if you want you

"""


import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Lab 12 Character Sequence Softmax only

import tensorflow as tf

import numpy as np

tf.set_random_seed(777)  # reproducibility


sample = " if you want you"

idx2char = list(set(sample))  # index -> char

char2idx = {c: i for i, c in enumerate(idx2char)}  # char -> index


# hyper parameters

dic_size = len(char2idx)  # RNN input size (one hot size)

rnn_hidden_size = len(char2idx)  # RNN output size

num_classes = len(char2idx)  # final output size (RNN or softmax, etc.)

batch_size = 1  # one sample data, one batch

sequence_length = len(sample) - 1  # number of lstm rollings (unit #)

learning_rate = 0.1


sample_idx = [char2idx[c] for c in sample]  # char to index

x_data = [sample_idx[:-1]]  # X data sample (0 ~ n-1) hello: hell

y_data = [sample_idx[1:]]   # Y label sample (1 ~ n) hello: ello


X = tf.placeholder(tf.int32, [None, sequence_length])  # X data

Y = tf.placeholder(tf.int32, [None, sequence_length])  # Y label


# flatten the data (ignore batches for now). No effect if the batch size is 1

X_one_hot = tf.one_hot(X, num_classes)  # one hot: 1 -> 0 1 0 0 0 0 0 0 0 0

X_for_softmax = tf.reshape(X_one_hot, [-1, rnn_hidden_size])


# softmax layer (rnn_hidden_size -> num_classes)

softmax_w = tf.get_variable("softmax_w", [rnn_hidden_size, num_classes])

softmax_b = tf.get_variable("softmax_b", [num_classes])

outputs = tf.matmul(X_for_softmax, softmax_w) + softmax_b


# expand the data (restore the batch dimension)

outputs = tf.reshape(outputs, [batch_size, sequence_length, num_classes])

weights = tf.ones([batch_size, sequence_length])


# Compute sequence cost/loss

sequence_loss = tf.contrib.seq2seq.sequence_loss(

    logits=outputs, targets=Y, weights=weights)

loss = tf.reduce_mean(sequence_loss)  # mean all sequence loss

train = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)


prediction = tf.argmax(outputs, axis=2)


with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    for i in range(3000):

        l, _ = sess.run([loss, train], feed_dict={X: x_data, Y: y_data})

        result = sess.run(prediction, feed_dict={X: x_data})


        # print char using dic

        result_str = [idx2char[c] for c in np.squeeze(result)]

        print(i, "loss:", l, "Prediction:", ''.join(result_str))


'''

0 loss: 2.29513 Prediction: yu yny y y oyny

1 loss: 2.10156 Prediction: yu ynu y y oynu

2 loss: 1.92344 Prediction: yu you y u  you


..


2997 loss: 0.277323 Prediction: yf you yant you

2998 loss: 0.277323 Prediction: yf you yant you

2999 loss: 0.277323 Prediction: yf you yant you

'''


[# lab-12-4-rnn_long_char]


# lab-12-4-rnn_long_char

"""

error: from __future__ import print_function ==> commented out because it would not run

# Stacking several layers with MultiRNNCell improves accuracy.

# softmax => reshape is performed

"""


import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# from __future__ import print_function


import tensorflow as tf

import numpy as np

from tensorflow.contrib import rnn


tf.set_random_seed(777)  # reproducibility


sentence = ("if you want to build a ship, don't drum up people together to "

            "collect wood and don't assign them tasks and work, but rather "

            "teach them to long for the endless immensity of the sea.")


char_set = list(set(sentence))

char_dic = {w: i for i, w in enumerate(char_set)}


data_dim = len(char_set)

hidden_size = len(char_set)

num_classes = len(char_set)

sequence_length = 10  # Any arbitrary number

learning_rate = 0.1


dataX = []

dataY = []

for i in range(0, len(sentence) - sequence_length):

    x_str = sentence[i:i + sequence_length]

    y_str = sentence[i + 1: i + sequence_length + 1]

    print(i, x_str, '->', y_str)


    x = [char_dic[c] for c in x_str]  # x str to index

    y = [char_dic[c] for c in y_str]  # y str to index


    dataX.append(x)

    dataY.append(y)


batch_size = len(dataX)


X = tf.placeholder(tf.int32, [None, sequence_length])

Y = tf.placeholder(tf.int32, [None, sequence_length])


# One-hot encoding

X_one_hot = tf.one_hot(X, num_classes)

print(X_one_hot)  # check out the shape



# Make a lstm cell with hidden_size (each unit output vector size)

def lstm_cell():

    cell = rnn.BasicLSTMCell(hidden_size, state_is_tuple=True)

    return cell


multi_cells = rnn.MultiRNNCell([lstm_cell() for _ in range(2)], state_is_tuple=True)

# As above, stacking several layers with MultiRNNCell improves accuracy.


# outputs: unfolding size x hidden size, state = hidden size

outputs, _states = tf.nn.dynamic_rnn(multi_cells, X_one_hot, dtype=tf.float32)


# softmax => reshape: flatten the RNN outputs so one FC layer applies to every time step


# FC layer

X_for_fc = tf.reshape(outputs, [-1, hidden_size])

outputs = tf.contrib.layers.fully_connected(X_for_fc, num_classes, activation_fn=None)


# reshape out for sequence_loss

outputs = tf.reshape(outputs, [batch_size, sequence_length, num_classes])


# All weights are 1 (equal weights)

weights = tf.ones([batch_size, sequence_length])


sequence_loss = tf.contrib.seq2seq.sequence_loss(

    logits=outputs, targets=Y, weights=weights)

mean_loss = tf.reduce_mean(sequence_loss)

train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(mean_loss)


sess = tf.Session()

sess.run(tf.global_variables_initializer())


for i in range(500):

    _, l, results = sess.run(

        [train_op, mean_loss, outputs], feed_dict={X: dataX, Y: dataY})

    for j, result in enumerate(results):

        index = np.argmax(result, axis=1)

        print(i, j, ''.join([char_set[t] for t in index]), l)


# Let's print the last char of each result to check it works

results = sess.run(outputs, feed_dict={X: dataX})

for j, result in enumerate(results):

    index = np.argmax(result, axis=1)

    if j == 0:  # print all for the first result to make a sentence

        print(''.join([char_set[t] for t in index]), end='')

    else:

        print(char_set[index[-1]], end='')


'''

0 167 tttttttttt 3.23111

0 168 tttttttttt 3.23111

0 169 tttttttttt 3.23111

499 167  of the se 0.229616

499 168 tf the sea 0.229616

499 169   the sea. 0.229616


g you want to build a ship, don't drum up people together to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea.


'''


[# lab-12-5-rnn_stock_prediction]


# lab-12-5-rnn_stock_prediction
"""
Predict tomorrow's stock price by training on the previous 7 days of data.
The graph is currently not being displayed.
"""

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
'''
This script shows how to predict stock prices using a basic RNN
'''
import tensorflow as tf
import numpy as np
import matplotlib
import os

tf.set_random_seed(777)  # reproducibility

if "DISPLAY" not in os.environ:
    # remove Travis CI Error
    matplotlib.use('Agg')

import matplotlib.pyplot as plt

def MinMaxScaler(data):
    ''' Min Max Normalization

    Parameters
    ----------
    data : numpy.ndarray
        input data to be normalized
        shape: [Batch size, dimension]

    Returns
    ----------
    data : numpy.ndarray
        normalized data
        shape: [Batch size, dimension]

    References
    ----------
    .. [1] http://sebastianraschka.com/Articles/2014_about_feature_scaling.html

    '''
    numerator = data - np.min(data, 0)
    denominator = np.max(data, 0) - np.min(data, 0)
    # noise term prevents the zero division
    return numerator / (denominator + 1e-7)


# train Parameters
seq_length = 7
data_dim = 5
hidden_dim = 10
output_dim = 1
learning_rate = 0.01
iterations = 500

# Open, High, Low, Volume, Close
xy = np.loadtxt('data-02-stock_daily.csv', delimiter=',')
xy = xy[::-1]  # reverse order (chronologically ordered)
xy = MinMaxScaler(xy)
x = xy
y = xy[:, [-1]]  # Close as label

# build a dataset
dataX = []
dataY = []
for i in range(0, len(y) - seq_length):
    _x = x[i:i + seq_length]
    _y = y[i + seq_length]  # Next close price
    print(_x, "->", _y)
    dataX.append(_x)
    dataY.append(_y)

# train/test split
train_size = int(len(dataY) * 0.7)
test_size = len(dataY) - train_size
trainX, testX = np.array(dataX[0:train_size]), np.array(
    dataX[train_size:len(dataX)])
trainY, testY = np.array(dataY[0:train_size]), np.array(
    dataY[train_size:len(dataY)])

# input place holders
X = tf.placeholder(tf.float32, [None, seq_length, data_dim])
Y = tf.placeholder(tf.float32, [None, 1])

# build a LSTM network
cell = tf.contrib.rnn.BasicLSTMCell(
    num_units=hidden_dim, state_is_tuple=True, activation=tf.tanh)
outputs, _states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
Y_pred = tf.contrib.layers.fully_connected(
    outputs[:, -1], output_dim, activation_fn=None)  # We use the last cell's output

# cost/loss
loss = tf.reduce_sum(tf.square(Y_pred - Y))  # sum of the squares
# optimizer
optimizer = tf.train.AdamOptimizer(learning_rate)
train = optimizer.minimize(loss)

# RMSE
targets = tf.placeholder(tf.float32, [None, 1])
predictions = tf.placeholder(tf.float32, [None, 1])
rmse = tf.sqrt(tf.reduce_mean(tf.square(targets - predictions)))

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)

    # Training step
    for i in range(iterations):
        _, step_loss = sess.run([train, loss], feed_dict={
                                X: trainX, Y: trainY})
        print("[step: {}] loss: {}".format(i, step_loss))

    # Test step
    test_predict = sess.run(Y_pred, feed_dict={X: testX})
    rmse_val = sess.run(rmse, feed_dict={
                    targets: testY, predictions: test_predict})
    print("RMSE: {}".format(rmse_val))

    # Plot predictions
    plt.plot(testY)
    plt.plot(test_predict)
    plt.xlabel("Time Period")
    plt.ylabel("Stock Price")
    plt.show()


[References]


  https://www.inflearn.com/course/기본적인-머신러닝-딥러닝-강좌

  https://github.com/hunkim/deeplearningzerotoall

  https://www.tensorflow.org/api_docs/python/tf/layers