Large Scale Machine Learning with Python (e-book)

Price: ¥

Authors: Bastiaan Sjardin, Luca Massaron, Alberto Boschetti

Publisher: Packt Publishing

Publication date: 2016-08-01

Word count: 3.386 million

Category: Imported books > Foreign-language originals > Computers/Internet

Learn to build powerful machine learning models quickly and deploy large-scale predictive applications

About This Book

  • Design, engineer and deploy scalable machine learning solutions with the power of Python
  • Take command of Hadoop and Spark with Python for effective machine learning on a MapReduce framework
  • Build state-of-the-art models and develop personalized recommendations to perform machine learning at scale

Who This Book Is For

This book is for anyone who intends to work with large and complex data sets. Familiarity with basic Python and machine learning concepts is recommended. Working knowledge of statistics and computational mathematics would also be helpful.

What You Will Learn

  • Apply the most scalable machine learning algorithms
  • Work with modern state-of-the-art large-scale machine learning techniques
  • Increase predictive accuracy with deep learning and scalable data-handling techniques
  • Improve your work by combining the MapReduce framework with Spark
  • Build powerful ensembles at scale
  • Use data streams to train linear and non-linear predictive models from extremely large datasets using a single machine

In Detail

Large Python machine learning projects involve new problems associated with specialized machine learning architectures and designs that many data scientists have yet to tackle. Finding algorithms, and designing and building platforms that deal with large sets of data, is a growing need. Data scientists have to manage and maintain increasingly complex data projects, and with the rise of big data comes an increasing demand for computational and algorithmic efficiency. Large Scale Machine Learning with Python uncovers a new wave of machine learning algorithms that meet scalability demands while keeping high predictive accuracy.

Dive into scalable machine learning and the three forms of scalability. Speed up algorithms that can be used on a desktop computer with tips on parallelization and memory allocation. Get to grips with new algorithms that are specifically designed for large projects and can handle bigger files, and learn about machine learning in big data environments. We will also cover the most effective machine learning techniques on a MapReduce framework in Hadoop and Spark in Python.

Style and Approach

This efficient and practical title is full of the techniques, tips and tools you need to ensure your large-scale Python machine learning runs swiftly and seamlessly. Large Scale Machine Learning with Python tackles a different issue from what is currently on the market. Those working with Hadoop clusters and in data-intensive environments can now learn effective ways of building powerful machine learning models from prototype to production. This book is written in a style that programmers from other languages (R, Julia, Java, Matlab) can follow.
Large Scale Machine Learning with Python

Table of Contents

Credits

About the Authors

About the Reviewer

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. First Steps to Scalability

Explaining scalability in detail

Making large scale examples

Introducing Python

Scale up with Python

Scale out with Python

Python for large scale machine learning

Choosing between Python 2 and Python 3

Installing Python

Step-by-step installation

The installation of packages

Package upgrades

Scientific distributions

Introducing Jupyter/IPython

Python packages

NumPy

SciPy

Pandas

Scikit-learn

The matplotlib package

Gensim

H2O

XGBoost

Theano

TensorFlow

The sknn library

Theanets

Keras

Other useful packages to install on your system

Summary

2. Scalable Learning in Scikit-learn

Out-of-core learning

Subsampling as a viable option

Optimizing one instance at a time

Building an out-of-core learning system

Streaming data from sources

Datasets to try the real thing yourself

The first example – streaming the bike-sharing dataset

Using pandas I/O tools

Working with databases

Paying attention to the ordering of instances

Stochastic learning

Batch gradient descent

Stochastic gradient descent

The Scikit-learn SGD implementation

Defining SGD learning parameters

Feature management with data streams

Describing the target

The hashing trick

Other basic transformations

Testing and validation in a stream

Trying SGD in action

Summary

3. Fast SVM Implementations

Datasets to experiment with on your own

The bike-sharing dataset

The covertype dataset

Support Vector Machines

Hinge loss and its variants

Understanding the Scikit-learn SVM implementation

Pursuing nonlinear SVMs by subsampling

Achieving SVM at scale with SGD

Feature selection by regularization

Including non-linearity in SGD

Trying explicit high-dimensional mappings

Hyperparameter tuning

Other alternatives for SVM fast learning

Nonlinear and faster with Vowpal Wabbit

Installing VW

Understanding the VW data format

Python integration

A few examples using reductions for SVM and neural nets

Faster bike-sharing

The covertype dataset crunched by VW

Summary

4. Neural Networks and Deep Learning

The neural network architecture

What and how neural networks learn

Choosing the right architecture

The input layer

The hidden layer

The output layer

Neural networks in action

Parallelization for sknn

Neural networks and regularization

Neural networks and hyperparameter optimization

Neural networks and decision boundaries

Deep learning at scale with H2O

Large scale deep learning with H2O

Gridsearch on H2O

Deep learning and unsupervised pretraining

Deep learning with theanets

Autoencoders and unsupervised learning

Autoencoders

Summary

5. Deep Learning with TensorFlow

TensorFlow installation

TensorFlow operations

GPU computing

Linear regression with SGD

A neural network from scratch in TensorFlow

Machine learning on TensorFlow with SkFlow

Deep learning with large files – incremental learning

Keras and TensorFlow installation

Convolutional Neural Networks in TensorFlow through Keras

The convolution layer

The pooling layer

The fully connected layer

CNNs with an incremental approach

GPU Computing

Summary

6. Classification and Regression Trees at Scale

Bootstrap aggregation

Random forest and extremely randomized forest

Fast parameter optimization with randomized search

Extremely randomized trees and large datasets

CART and boosting

Gradient Boosting Machines

max_depth

learning_rate

Subsample

Faster GBM with warm_start

Speeding up GBM with warm_start

Training and storing GBM models

XGBoost

XGBoost regression

XGBoost and variable importance

XGBoost streaming large datasets

XGBoost model persistence

Out-of-core CART with H2O

Random forest and gridsearch on H2O

Stochastic gradient boosting and gridsearch on H2O

Summary

7. Unsupervised Learning at Scale

Unsupervised methods

Feature decomposition – PCA

Randomized PCA

Incremental PCA

Sparse PCA

PCA with H2O

Clustering – K-means

Initialization methods

K-means assumptions

Selection of the best K

Scaling K-means – mini-batch

K-means with H2O

LDA

Scaling LDA – memory, CPUs, and machines

Summary

8. Distributed Environments – Hadoop and Spark

From a standalone machine to a bunch of nodes

Why do we need a distributed framework?

Setting up the VM

VirtualBox

Vagrant

Using the VM

The Hadoop ecosystem

Architecture

HDFS

MapReduce

YARN

Spark

pySpark

Summary

9. Practical Machine Learning with Spark

Setting up the VM for this chapter

Sharing variables across cluster nodes

Broadcast read-only variables

Accumulators write-only variables

Broadcast and accumulators together – an example

Data preprocessing in Spark

JSON files and Spark DataFrames

Dealing with missing data

Grouping and creating tables in-memory

Writing the preprocessed DataFrame or RDD to disk

Working with Spark DataFrames

Machine learning with Spark

Spark on the KDD99 dataset

Reading the dataset

Feature engineering

Training a learner

Evaluating a learner's performance

The power of the ML pipeline

Manual tuning

Cross-validation

Final cleanup

Summary

A. Introduction to GPUs and Theano

GPU computing

Theano – parallel computing on the GPU

Installing Theano

Index
