Google DeepMind Deep Q-learning: Breakout

2023. 7. 6. 20:35

The most important thing to know is that all the agent is given is sensory input (what you see on the screen), and it is instructed to maximize the score shown on the screen.
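To make "maximize the score" a little more concrete, here is a minimal sketch of a single Q-learning update, the rule that Deep Q-learning builds on. It is a simplification and not the actual DeepMind implementation: it assumes a small lookup table (a Python dict) in place of a neural network reading screen pixels, and the function name `q_learning_update`, the state labels, and the `alpha`/`gamma` values are all hypothetical choices for illustration.

```python
def q_learning_update(q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning step and return the new Q(state, action).

    q is a dict mapping (state, action) to an estimated future score.
    Missing entries are treated as 0.0. This table stands in for the
    neural network used in Deep Q-learning (an assumption of this sketch).
    """
    # Best value the agent currently believes it can get from the next state.
    if done:
        best_next = 0.0
    else:
        best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))

    # Target: the reward just observed plus the discounted future estimate.
    target = reward + gamma * best_next

    # Move the old estimate a fraction alpha of the way toward the target.
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]
```

The only signals the update uses are the observed state, the chosen action index, and the reward (the change in score), which matches the setup described above. For example, starting from an empty table, `q_learning_update({}, "s0", 1, 1.0, "s1", 2, alpha=0.5, gamma=0.9)` moves the estimate from 0.0 halfway toward the target 1.0.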

No domain knowledge is involved! (The domain here is the Breakout game.) This means that the algorithm does not know the concept of a ball or what exactly the controls do.
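One way to picture "not knowing what the controls do" is that, from the agent's side, each control is just an integer index with a value estimate attached. The sketch below shows epsilon-greedy action selection, a common choice in Q-learning setups; the post itself does not say which selection rule was used, so treat this as an assumption. The function name `epsilon_greedy` and the example values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen from a list of value estimates.

    With probability epsilon a random index is tried (exploration);
    otherwise the index with the highest estimate is chosen.
    The indices carry no meaning such as "left" or "right" for the agent.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon` close to 1.0 the agent mostly tries controls at random, and with `epsilon` at 0.0 it always picks the index it currently rates highest, for instance `epsilon_greedy([0.1, 0.9, 0.3], 0.0)` selects index 1.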

The algorithm tries to hit the ball back, but it is still too clumsy to manage it.

After 240 minutes of training

This is where the magic happens: the algorithm realizes that digging a tunnel through the wall is the most effective technique to beat the game.


위드석_with Seok