


Hands-On Q-Learning with Python (e-book)

Price: ¥


Author: Nazia Habib

Publisher: Packt Publishing


Word count: 231,000




Leverage the power of reward-based training for your deep learning models with Python

Key Features
* Understand Q-learning algorithms to train neural networks using Markov decision processes (MDPs)
* Study practical deep reinforcement learning using Q-Networks
* Explore state-based unsupervised learning for machine learning models

Book Description
Q-learning is a machine learning algorithm used to solve optimization problems in artificial intelligence (AI). It is one of the most popular topics of study among AI researchers. This book starts by introducing you to reinforcement learning and Q-learning and helps you get familiar with OpenAI Gym as well as libraries such as Keras and TensorFlow. A few chapters in, you will gain insights into model-free Q-learning and use deep Q-networks and double deep Q-networks to solve complex problems. The book guides you through use cases such as self-driving vehicles and OpenAI Gym’s CartPole problem. You will also learn how to tune and optimize Q-networks and their hyperparameters. As you progress, you will understand the reinforcement learning approach to solving real-world problems and explore how to use Q-learning and related algorithms in applications such as scientific research. Toward the end, you’ll gain a sense of what’s in store for reinforcement learning. By the end of this book, you will be equipped with the skills you need to solve reinforcement learning problems using Q-learning algorithms with OpenAI Gym, Keras, and TensorFlow.

What you will learn
* Explore the fundamentals of reinforcement learning and the state-action-reward process
* Understand Markov decision processes
* Get well versed with libraries such as Keras and TensorFlow
* Create and deploy model-free learning and deep Q-learning agents with TensorFlow, Keras, and OpenAI Gym
* Choose and optimize a Q-Network’s learning parameters and fine-tune its performance
* Discover real-world applications and use cases of Q-learning

Who this book is for
If you are a machine learning developer, engineer, or professional who wants to delve into the deep learning approach for a complex environment, then this is the book for you. Proficiency in Python programming and a basic understanding of decision-making in reinforcement learning are assumed.

About Packt

Why subscribe?



About the author

About the reviewers

Packt is searching for authors like you


Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch


Section 1: Q-Learning: A Roadmap

Brushing Up on Reinforcement Learning Concepts

What is RL?

States and actions

The decision-making process

RL, supervised learning, and unsupervised learning

States, actions, and rewards


Actions and rewards

Bellman equations

Key concepts in RL

Value-based versus policy-based iteration

Q-learning hyperparameters – alpha, gamma, and epsilon

Alpha – deterministic versus stochastic environments

Gamma – current versus future rewards

Epsilon – exploration versus exploitation

Decaying epsilon

SARSA versus Q-learning – on-policy or off?

SARSA and the cliff-walking problem

When to choose SARSA over Q-learning



Getting Started with the Q-Learning Algorithm

Technical requirements

Demystifying MDPs

Control processes

Markov chains

The Markov property

MDPs and state-action diagrams

Solving MDPs with RL

Your Q-learning agent in its environment

Solving the optimization problem

States and actions in Taxi-v2

Fine-tuning your model – learning, discount, and exploration rates

Decaying epsilon

Decaying alpha

Decaying gamma

MABP – a classic exploration versus exploitation problem

Setting up a bandit problem

Bandit optimization strategies

Other applications for bandit problems

Optimal versus safe paths – revisiting SARSA



Setting Up Your First Environment with OpenAI Gym

Technical requirements

Getting started with OpenAI Gym

What is Gym?

Setting up Gym

Gym environments

Setting up an environment

Exploring the Taxi-v2 environment

The state space and valid actions

Choosing an action manually

Setting a state manually

Creating a baseline agent

Stepping through actions

Creating a task loop

Baseline models in Q-learning and machine learning research



Teaching a Smartcab to Drive Using Q-Learning

Technical requirements

Getting to know your learning agent

Implementing your agent

The value function – calculating the Q-value of a state-action pair

Implementing Bellman equations

The learning parameters – alpha, gamma, and epsilon

Adding an updated alpha value

Adding an updated epsilon value

Model-tuning and tracking your agent's long-term performance

Comparing your models and statistical performance measures

Training your models

Decaying epsilon

Hyperparameter tuning



Section 2: Building and Optimizing Q-Learning Agents

Building Q-Networks with TensorFlow

Technical requirements

A brief overview of neural networks

Extensional versus intensional definitions

Taking a closer look

Input, hidden, and output layers

Perceptron functions

ReLU functions

Implementing a neural network with NumPy



Neural networks and Q-learning

Policy agents versus value agents

Building your first Q-network

Defining the network

Training the network



Further reading

Digging Deeper into Deep Q-Networks with Keras and TensorFlow

Technical requirements

Introducing CartPole-v1

More about CartPole states and actions

Getting started with the CartPole task

Building a DQN to solve the CartPole problem




Building a DQN class

Choosing actions with epsilon-greedy

Updating the Q-values

Running the task loop

Testing and results

Adding in experience replay

About experience replay


Experience replay results

Building further on DQNs

Calculating DQN loss

Fixed Q-targets

Double-deep Q-networks

Dueling deep Q-networks



Further reading

Section 3: Advanced Q-Learning Challenges with Keras, TensorFlow, and OpenAI Gym

Decoupling Exploration and Exploitation in Multi-Armed Bandits

Technical requirements

Probability distributions and ongoing knowledge

Iterative probability distributions

Revisiting a simple bandit problem

A sample two-armed bandit iteration

Multi-armed bandit strategy overview

Greedy strategy

Epsilon-greedy strategy

Upper confidence bound

Bandit regret

Utility functions and optimal decisions

Contextual bandits and state diagrams

Thompson sampling and the Bayesian control rule

Thompson sampling

Bayesian control rule

Solving a multi-armed bandit problem in Python – user advertisement clicks

Epsilon-greedy selection

Multi-armed bandits in experimental design

The testing process

Bandits with knapsacks – more multi-armed bandit applications



Further reading

Further Q-Learning Research and Future Projects

Google's DeepMind and the future of Q-learning

OpenAI Gym and RL research

The standardization of RL research practice with Gym

Tracking your scores with the Gym leaderboard

More OpenAI Gym environments




Continuous control tasks – MuJoCo

Continuous control tasks – Box2D

Robotics research and development


Toy text

Contextual bandits and probability distributions

Probability and intelligence

Updating probability distributions

State spaces

A/B testing versus multi-armed bandit testing

Testing methodologies



Further reading


Chapter 1, Brushing Up on Reinforcement Learning Concepts

Chapter 2, Getting Started with the Q-Learning Algorithm

Chapter 3, Setting Up Your First Environment with OpenAI Gym

Chapter 4, Teaching a Smartcab to Drive Using Q-Learning

Chapter 5, Building Q-Networks with TensorFlow

Chapter 6, Digging Deeper into Deep Q-Networks with Keras and TensorFlow

Chapter 7, Decoupling Exploration and Exploitation in Multi-Armed Bandits

Chapter 8, Further Q-Learning Research and Future Projects

Other Books You May Enjoy

Leave a review - let other readers know what you think





