Hands-On Q-Learning with Python (e-book)

Author: Nazia Habib

Publisher: Packt Publishing

Publication date: 2019-04-19

Word count: 231,000

Category: Imported Books > Foreign-Language Originals > Computers/Networking

Leverage the power of reward-based training for your deep learning models with Python.

Key Features

  • Understand Q-learning algorithms to train neural networks using Markov decision processes (MDPs)
  • Study practical deep reinforcement learning using deep Q-networks
  • Explore state-based unsupervised learning for machine learning models

Book Description

Q-learning is a machine learning algorithm used to solve optimization problems in artificial intelligence (AI). It is part of reinforcement learning, one of the most popular fields of study among AI researchers. This book starts off by introducing you to reinforcement learning and Q-learning, and helps you get familiar with OpenAI Gym as well as libraries such as Keras and TensorFlow. A few chapters into the book, you will gain insights into model-free Q-learning and use deep Q-networks and double deep Q-networks to solve complex problems. This book will guide you in exploring use cases such as self-driving vehicles and OpenAI Gym's CartPole problem. You will also learn how to tune and optimize Q-networks and their hyperparameters. As you progress, you will understand the reinforcement learning approach to solving real-world problems, and explore how to use Q-learning and related algorithms in applications such as scientific research. Toward the end, you'll gain a sense of what's in store for reinforcement learning.

By the end of this book, you will be equipped with the skills you need to solve reinforcement learning problems using Q-learning algorithms with OpenAI Gym, Keras, and TensorFlow.

What you will learn

  • Explore the fundamentals of reinforcement learning and the state-action-reward process
  • Understand Markov decision processes
  • Get well versed with libraries such as Keras and TensorFlow
  • Create and deploy model-free learning and deep Q-learning agents with TensorFlow, Keras, and OpenAI Gym
  • Choose and optimize a Q-network's learning parameters and fine-tune its performance
  • Discover real-world applications and use cases of Q-learning

Who this book is for

If you are a machine learning developer, engineer, or professional who wants to delve into the deep learning approach for a complex environment, then this is the book for you. Proficiency in Python programming and a basic understanding of decision-making in reinforcement learning are assumed.
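
To give a flavor of what the early chapters build toward, here is a minimal, illustrative sketch (not the book's own code) of a tabular Q-learning agent with an epsilon-greedy policy and decaying epsilon, run on OpenAI Gym's Taxi-v2 environment, which the book explores in Chapters 2-4. All hyperparameter values below are placeholder assumptions, not the book's settings.

    # Minimal tabular Q-learning sketch on Taxi-v2 (illustrative only).
    # Uses the classic gym API of the book's era (circa 2019); newer
    # gym/gymnasium releases use Taxi-v3 and different reset/step signatures.
    import numpy as np
    import gym

    env = gym.make("Taxi-v2")
    q_table = np.zeros((env.observation_space.n, env.action_space.n))

    # Placeholder hyperparameters: learning rate, discount, exploration rate.
    alpha, gamma, epsilon = 0.1, 0.9, 1.0

    for episode in range(1000):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if np.random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_table[state]))
            next_state, reward, done, _ = env.step(action)
            # Q-learning (Bellman) update:
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q_table[state, action] += alpha * (
                reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
            )
            state = next_state
        # Decaying epsilon, a theme the table of contents returns to repeatedly.
        epsilon = max(0.01, epsilon * 0.995)

Note the max over next-state Q-values in the update: this is what makes Q-learning off-policy, in contrast to the on-policy SARSA update the book compares it against.
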
Table of Contents

About Packt

Why subscribe?

Packt.com

Contributors

About the author

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1: Q-Learning: A Roadmap

Brushing Up on Reinforcement Learning Concepts

What is RL?

States and actions

The decision-making process

RL, supervised learning, and unsupervised learning

States, actions, and rewards

States

Actions and rewards

Bellman equations

Key concepts in RL

Value-based versus policy-based iteration

Q-learning hyperparameters – alpha, gamma, and epsilon

Alpha – deterministic versus stochastic environments

Gamma – current versus future rewards

Epsilon – exploration versus exploitation

Decaying epsilon

SARSA versus Q-learning – on-policy or off?

SARSA and the cliff-walking problem

When to choose SARSA over Q-learning

Summary

Questions

Getting Started with the Q-Learning Algorithm

Technical requirements

Demystifying MDPs

Control processes

Markov chains

The Markov property

MDPs and state-action diagrams

Solving MDPs with RL

Your Q-learning agent in its environment

Solving the optimization problem

States and actions in Taxi-v2

Fine-tuning your model – learning, discount, and exploration rates

Decaying epsilon

Decaying alpha

Decaying gamma

MABP – a classic exploration versus exploitation problem

Setting up a bandit problem

Bandit optimization strategies

Other applications for bandit problems

Optimal versus safe paths – revisiting SARSA

Summary

Questions

Setting Up Your First Environment with OpenAI Gym

Technical requirements

Getting started with OpenAI Gym

What is Gym?

Setting up Gym

Gym environments

Setting up an environment

Exploring the Taxi-v2 environment

The state space and valid actions

Choosing an action manually

Setting a state manually

Creating a baseline agent

Stepping through actions

Creating a task loop

Baseline models in Q-learning and machine learning research

Summary

Questions

Teaching a Smartcab to Drive Using Q-Learning

Technical requirements

Getting to know your learning agent

Implementing your agent

The value function – calculating the Q-value of a state-action pair

Implementing Bellman equations

The learning parameters – alpha, gamma, and epsilon

Adding an updated alpha value

Adding an updated epsilon value

Model-tuning and tracking your agent's long-term performance

Comparing your models and statistical performance measures

Training your models

Decaying epsilon

Hyperparameter tuning

Summary

Questions

Section 2: Building and Optimizing Q-Learning Agents

Building Q-Networks with TensorFlow

Technical requirements

A brief overview of neural networks

Extensional versus intensional definitions

Taking a closer look

Input, hidden, and output layers

Perceptron functions

ReLU functions

Implementing a neural network with NumPy

Feedforward

Backpropagation

Neural networks and Q-learning

Policy agents versus value agents

Building your first Q-network

Defining the network

Training the network

Summary

Questions

Further reading

Digging Deeper into Deep Q-Networks with Keras and TensorFlow

Technical requirements

Introducing CartPole-v1

More about CartPole states and actions

Getting started with the CartPole task

Building a DQN to solve the CartPole problem

Gamma

Alpha

Epsilon

Building a DQN class

Choosing actions with epsilon-greedy

Updating the Q-values

Running the task loop

Testing and results

Adding in experience replay

About experience replay

Implementation

Experience replay results

Building further on DQNs

Calculating DQN loss

Fixed Q-targets

Double-deep Q-networks

Dueling deep Q-networks

Summary

Questions

Further reading

Section 3: Advanced Q-Learning Challenges with Keras, TensorFlow, and OpenAI Gym

Decoupling Exploration and Exploitation in Multi-Armed Bandits

Technical requirements

Probability distributions and ongoing knowledge

Iterative probability distributions

Revisiting a simple bandit problem

A sample two-armed bandit iteration

Multi-armed bandit strategy overview

Greedy strategy

Epsilon-greedy strategy

Upper confidence bound

Bandit regret

Utility functions and optimal decisions

Contextual bandits and state diagrams

Thompson sampling and the Bayesian control rule

Thompson sampling

Bayesian control rule

Solving a multi-armed bandit problem in Python – user advertisement clicks

Epsilon-greedy selection

Multi-armed bandits in experimental design

The testing process

Bandits with knapsacks – more multi-armed bandit applications

Summary

Questions

Further reading

Further Q-Learning Research and Future Projects

Google's DeepMind and the future of Q-learning

OpenAI Gym and RL research

The standardization of RL research practice with Gym

Tracking your scores with the Gym leaderboard

More OpenAI Gym environments

Pendulum

Acrobot

MountainCar

Continuous control tasks – MuJoCo

Continuous control tasks – Box2D

Robotics research and development

Algorithms

Toy text

Contextual bandits and probability distributions

Probability and intelligence

Updating probability distributions

State spaces

A/B testing versus multi-armed bandit testing

Testing methodologies

Summary

Questions

Further reading

Assessments

Chapter 1, Brushing Up on Reinforcement Learning Concepts

Chapter 2, Getting Started with the Q-Learning Algorithm

Chapter 3, Setting Up Your First Environment with OpenAI Gym

Chapter 4, Teaching a Smartcab to Drive Using Q-Learning

Chapter 5, Building Q-Networks with TensorFlow

Chapter 6, Digging Deeper into Deep Q-Networks with Keras and TensorFlow

Chapter 7, Decoupling Exploration and Exploitation in Multi-Armed Bandits

Chapter 8, Further Q-Learning Research and Future Projects

Other Books You May Enjoy

Leave a review - let other readers know what you think
