Cover
Table of Contents
Hadoop: Data Processing and Modelling
Credits
Preface
What this learning path covers
Hadoop Beginner's Guide
Hadoop Real-World Solutions Cookbook, Second Edition
Mastering Hadoop
What you need for this learning path
Who this learning path is for
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
Part 1. Module 1
Chapter 1. What It's All About
Big data processing
The value of data
Historically for the few and not the many
A different approach
Hadoop
Cloud computing with Amazon Web Services
Too many clouds
A third way
Different types of costs
AWS – infrastructure on demand from Amazon
What this book covers
Summary
Chapter 2. Getting Hadoop Up and Running
Hadoop on a local Ubuntu host
Other operating systems
Time for action – checking the prerequisites
What just happened?
Setting up Hadoop
Time for action – downloading Hadoop
What just happened?
Time for action – setting up SSH
What just happened?
Configuring and running Hadoop
Time for action – using Hadoop to calculate Pi
What just happened?
Three modes
Time for action – configuring the pseudo-distributed mode
What just happened?
Configuring the base directory and formatting the filesystem
Time for action – changing the base HDFS directory
What just happened?
Time for action – formatting the NameNode
What just happened?
Starting and using Hadoop
Time for action – starting Hadoop
What just happened?
Time for action – using HDFS
What just happened?
Time for action – WordCount, the Hello World of MapReduce
What just happened?
Have a go hero – WordCount on a larger body of text
Monitoring Hadoop from the browser
Using Elastic MapReduce
Setting up an account in Amazon Web Services
Time for action – WordCount on EMR using the management console
What just happened?
Have a go hero – other EMR sample applications
Other ways of using EMR
The AWS ecosystem
Comparison of local versus EMR Hadoop
Summary
Chapter 3. Understanding MapReduce
Key/value pairs
What it means
Why key/value data?
MapReduce as a series of key/value transformations
Pop quiz – key/value pairs
The Hadoop Java API for MapReduce
The 0.20 MapReduce Java API
Writing MapReduce programs
Time for action – setting up the classpath
What just happened?
Time for action – implementing WordCount
What just happened?
Time for action – building a JAR file
What just happened?
Time for action – running WordCount on a local Hadoop cluster
What just happened?
Time for action – running WordCount on EMR
What just happened?
The pre-0.20 Java MapReduce API
Hadoop-provided mapper and reducer implementations
Time for action – WordCount the easy way
What just happened?
Walking through a run of WordCount
Startup
Splitting the input
Task assignment
Task startup
Ongoing JobTracker monitoring
Mapper input
Mapper execution
Mapper output and reduce input
Partitioning
The optional partition function
Reducer input
Reducer execution
Reducer output
Shutdown
That's all there is to it!
Apart from the combiner…maybe
Time for action – WordCount with a combiner
What just happened?
Time for action – fixing WordCount to work with a combiner
What just happened?
Reuse is your friend
Pop quiz – MapReduce mechanics
Hadoop-specific data types
The Writable and WritableComparable interfaces
Introducing the wrapper classes
Time for action – using the Writable wrapper classes
What just happened?
Have a go hero – playing with Writables
Input/output
Files, splits, and records
InputFormat and RecordReader
Hadoop-provided InputFormat
Hadoop-provided RecordReader
OutputFormat and RecordWriter
Hadoop-provided OutputFormat
Don't forget Sequence files
Summary
Chapter 4. Developing MapReduce Programs
Using languages other than Java with Hadoop
How Hadoop Streaming works
Why use Hadoop Streaming
Time for action – implementing WordCount using Streaming
What just happened?
Differences in jobs when using Streaming
Analyzing a large dataset
Getting the UFO sighting dataset
Getting a feel for the dataset
Time for action – summarizing the UFO data
What just happened?
Time for action – summarizing the shape data
What just happened?
Time for action – correlating sighting duration to UFO shape
What just happened?
Time for action – performing the shape/time analysis from the command line
What just happened?
Java shape and location analysis
Time for action – using ChainMapper for field validation/analysis
What just happened?
Have a go hero
Time for action – using the Distributed Cache to improve location output
What just happened?
Counters, status, and other output
Time for action – creating counters, task states, and writing log output
What just happened?
Too much information!
Summary
Chapter 5. Advanced MapReduce Techniques
Simple, advanced, and in-between
Joins
When this is a bad idea
Map-side versus reduce-side joins
Matching account and sales information
Time for action – reduce-side join using MultipleInputs
What just happened?
Implementing map-side joins
Have a go hero – implementing map-side joins
To join or not to join...
Graph algorithms
Graph 101
Graphs and MapReduce – a match made somewhere
Representing a graph
Time for action – representing the graph
What just happened?
Overview of the algorithm
Time for action – creating the source code
What just happened?
Time for action – the first run
What just happened?
Time for action – the second run
What just happened?
Time for action – the third run
What just happened?
Time for action – the fourth and last run
What just happened?
Running multiple jobs
Final thoughts on graphs
Using language-independent data structures
Candidate technologies
Introducing Avro
Time for action – getting and installing Avro
What just happened?
Avro and schemas
Time for action – defining the schema
What just happened?
Time for action – creating the source Avro data with Ruby
What just happened?
Time for action – consuming the Avro data with Java
What just happened?
Using Avro within MapReduce
Time for action – generating shape summaries in MapReduce
What just happened?
Time for action – examining the output data with Ruby
What just happened?
Time for action – examining the output data with Java
What just happened?
Have a go hero – graphs in Avro
Going forward with Avro
Summary
Chapter 6. When Things Break
Failure
Embrace failure
Or at least don't fear it
Don't try this at home
Types of failure
Hadoop node failure
Time for action – killing a DataNode process
What just happened?
Have a go hero – NameNode log delving
Time for action – the replication factor in action
What just happened?
Time for action – intentionally causing missing blocks
What just happened?
Time for action – killing a TaskTracker process
What just happened?
Killing the cluster masters
Time for action – killing the JobTracker
What just happened?
Have a go hero – moving the JobTracker to a new host
Time for action – killing the NameNode process
What just happened?
Task failure due to software
Time for action – causing task failure
What just happened?
Have a go hero – HDFS programmatic access
Have a go hero – causing tasks to fail
Task failure due to data
Time for action – handling dirty data by using skip mode
What just happened?
Summary
Chapter 7. Keeping Things Running
A note on EMR
Hadoop configuration properties
Default values
Time for action – browsing default properties
What just happened?
Additional property elements
Default storage location
Where to set properties
Setting up a cluster
How many hosts?
Special node requirements
Storage types
Hadoop networking configuration
Time for action – examining the default rack configuration
What just happened?
Time for action – adding a rack awareness script
What just happened?
What is commodity hardware anyway?
Pop quiz – setting up a cluster
Cluster access control
The Hadoop security model
Time for action – demonstrating the default security
What just happened?
Working around the security model via physical access control
Managing the NameNode
Configuring multiple locations for the fsimage file
Time for action – adding an additional fsimage location
What just happened?
Swapping to another NameNode host
Time for action – swapping to a new NameNode host
What just happened?
Have a go hero – swapping to a new NameNode host
Managing HDFS
Where to write data
Using balancer
MapReduce management
Command line job management
Have a go hero – command line job management
Job priorities and scheduling
Time for action – changing job priorities and killing a job
What just happened?
Alternative schedulers
Scaling
Adding capacity to a local Hadoop cluster
Have a go hero – adding a node and running balancer
Adding capacity to an EMR job flow
Summary
Chapter 8. A Relational View on Data with Hive
Overview of Hive
Why use Hive?
Thanks, Facebook!
Setting up Hive
Prerequisites
Getting Hive
Time for action – installing Hive
What just happened?
Using Hive
Time for action – creating a table for the UFO data
What just happened?
Time for action – inserting the UFO data
What just happened?
Validating the data
Time for action – validating the table
What just happened?
Time for action – redefining the table with the correct column separator
What just happened?
Hive tables – real or not?
Time for action – creating a table from an existing file
What just happened?
Time for action – performing a join
What just happened?
Have a go hero – improve the join to use regular expressions
Hive and SQL views
Time for action – using views
What just happened?
Handling dirty data in Hive
Have a go hero – do it!
Time for action – exporting query output
What just happened?
Partitioning the table
Time for action – making a partitioned UFO sighting table
What just happened?
Bucketing, clustering, and sorting... oh my!
User-Defined Functions
Time for action – adding a new User Defined Function (UDF)
What just happened?
To preprocess or not to preprocess...
Hive versus Pig
What we didn't cover
Hive on Amazon Web Services
Time for action – running UFO analysis on EMR
What just happened?
Using interactive job flows for development
Have a go hero – using an interactive EMR cluster
Integration with other AWS products
Summary
Chapter 9. Working with Relational Databases
Common data paths
Hadoop as an archive store
Hadoop as a preprocessing step
Hadoop as a data input tool
The serpent eats its own tail
Setting up MySQL
Time for action – installing and setting up MySQL
What just happened?
Did it have to be so hard?
Time for action – configuring MySQL to allow remote connections
What just happened?
Don't do this in production!
Time for action – setting up the employee database
What just happened?
Be careful with data file access rights
Getting data into Hadoop
Using MySQL tools and manual import
Have a go hero – exporting the employee table into HDFS
Accessing the database from the mapper
A better way – introducing Sqoop
Time for action – downloading and configuring Sqoop
What just happened?
Time for action – exporting data from MySQL to HDFS
What just happened?
Importing data into Hive using Sqoop
Time for action – exporting data from MySQL into Hive
What just happened?
Time for action – a more selective import
What just happened?
Time for action – using a type mapping
What just happened?
Time for action – importing data from a raw query
What just happened?
Have a go hero
Getting data out of Hadoop
Writing data from within the reducer
Writing SQL import files from the reducer
A better way – Sqoop again
Time for action – importing data from Hadoop into MySQL
What just happened?
Have a go hero
Time for action – importing Hive data into MySQL
What just happened?
Time for action – fixing the mapping and re-running the export
What just happened?
AWS considerations
Considering RDS
Summary
Chapter 10. Data Collection with Flume
A note about AWS
Data, data everywhere...
Types of data
Getting network traffic into Hadoop
Time for action – getting web server data into Hadoop
What just happened?
Have a go hero
Getting files into Hadoop
Hidden issues
Introducing Apache Flume
A note on versioning
Time for action – installing and configuring Flume
What just happened?
Using Flume to capture network data
Time for action – capturing network traffic in a log file
What just happened?
Time for action – logging to the console
What just happened?
Writing network data to log files
Time for action – capturing the output of a command to a flat file
What just happened?
Time for action – capturing a remote file in a local flat file
What just happened?
Sources, sinks, and channels
Understanding the Flume configuration files
Have a go hero
It's all about events
Time for action – writing network traffic onto HDFS
What just happened?
Time for action – adding timestamps
What just happened?
To Sqoop or to Flume...
Time for action – multi-level Flume networks
What just happened?
Time for action – writing to multiple sinks
What just happened?
Selectors – replicating and multiplexing
Handling sink failure
Have a go hero – handling sink failure
Next, the world
Have a go hero – Next, the world
The bigger picture
Data lifecycle
Staging data
Scheduling
Summary
Chapter 11. Where to Go Next
What we did and didn't cover in this book
Upcoming Hadoop changes
Alternative distributions
Why alternative distributions?
Other Apache projects
HBase
Oozie
Whir
Mahout
MRUnit
Other programming abstractions
Pig
Cascading
AWS resources
HBase on EMR
SimpleDB
DynamoDB
Sources of information
Source code
Mailing lists and forums
LinkedIn groups
HUGs
Conferences
Summary
Appendix A. Pop Quiz Answers
Chapter 3, Understanding MapReduce
Pop quiz – key/value pairs
Pop quiz – walking through a run of WordCount
Chapter 7, Keeping Things Running
Pop quiz – setting up a cluster
Part 2. Module 2
Chapter 1. Getting Started with Hadoop 2.X
Introduction
Installing a single-node Hadoop Cluster
Getting ready
How to do it...
How it works...
There's more...
Installing a multi-node Hadoop cluster
Getting ready
How to do it...
How it works...
Adding new nodes to existing Hadoop clusters
Getting ready
How to do it...
How it works...
Executing the balancer command for uniform data distribution
Getting ready
How to do it...
How it works...
There's more...
Entering and exiting from the safe mode in a Hadoop cluster
How to do it...
How it works...
Decommissioning DataNodes
Getting ready
How to do it...
How it works...
Performing benchmarking on a Hadoop cluster
Getting ready
How to do it...
How it works...
Chapter 2. Exploring HDFS
Introduction
Loading data from a local machine to HDFS
Getting ready
How to do it...
How it works...
Exporting HDFS data to a local machine
Getting ready
How to do it...
How it works...
Changing the replication factor of an existing file in HDFS
Getting ready
How to do it...
How it works...
Setting the HDFS block size for all the files in a cluster
Getting ready
How to do it...
How it works...
Setting the HDFS block size for a specific file in a cluster
Getting ready
How to do it...
How it works...
Enabling transparent encryption for HDFS
Getting ready
How to do it...
How it works...
Importing data from another Hadoop cluster
Getting ready
How to do it...
How it works...
Recycling deleted data from trash to HDFS
Getting ready
How to do it...
How it works...
Saving compressed data in HDFS
Getting ready
How to do it...
How it works...
Chapter 3. Mastering Map Reduce Programs
Introduction
Writing the Map Reduce program in Java to analyze web log data
Getting ready
How to do it...
How it works...
Executing the Map Reduce program in a Hadoop cluster
Getting ready
How to do it...
How it works...
Adding support for a new writable data type in Hadoop
Getting ready
How to do it...
How it works...
Implementing a user-defined counter in a Map Reduce program
Getting ready
How to do it...
How it works...
Map Reduce program to find the top X
Getting ready
How to do it...
How it works...
Map Reduce program to find distinct values
Getting ready
How to do it...
How it works...
Map Reduce program to partition data using a custom partitioner
Getting ready
How to do it...
How it works...
Writing Map Reduce results to multiple output files
Getting ready
How to do it...
How it works...
Performing Reduce side Joins using Map Reduce
Getting ready
How to do it...
How it works...
Unit testing the Map Reduce code using MRUnit
Getting ready
How to do it...
How it works...
Chapter 4. Data Analysis Using Hive, Pig, and HBase
Introduction
Storing and processing Hive data in a sequential file format
Getting ready
How to do it...
How it works...
Storing and processing Hive data in the RC file format
Getting ready
How to do it...
How it works...
Storing and processing Hive data in the ORC file format
Getting ready
How to do it...
How it works...
Storing and processing Hive data in the Parquet file format
Getting ready
How to do it...
How it works...
Performing FILTER By queries in Pig
Getting ready
How to do it...
How it works...
Performing Group By queries in Pig
Getting ready
How to do it...
How it works...
Performing Order By queries in Pig
Getting ready
How to do it...
How it works...
Performing JOINS in Pig
Getting ready
How to do it...
How it works...
Writing a user-defined function in Pig
Getting ready
How to do it...
How it works...
There's more...
Analyzing web log data using Pig
Getting ready
How to do it...
How it works...
Performing HBase operations in the CLI
Getting ready
How to do it...
How it works...
Performing HBase operations in Java
Getting ready
How to do it...
How it works...
Executing MapReduce programs with an HBase table
Getting ready
How to do it...
How it works...
Chapter 5. Advanced Data Analysis Using Hive
Introduction
Processing JSON data in Hive using JSON SerDe
Getting ready
How to do it...
How it works...
Processing XML data in Hive using XML SerDe
Getting ready
How to do it...
How it works...
Processing Hive data in the Avro format
Getting ready
How to do it...
How it works...
Writing a user-defined function in Hive
Getting ready
How to do it...
How it works...
Performing table joins in Hive
Getting ready
How to do it...
How it works...
Executing map side joins in Hive
Getting ready
How to do it...
How it works...
Performing context Ngram in Hive
Getting ready
How to do it...
How it works...
Call Data Record Analytics using Hive
Getting ready
How to do it...
How it works...
Twitter sentiment analysis using Hive
Getting ready
How to do it...
How it works...
Implementing Change Data Capture using Hive
Getting ready
How to do it...
How it works...
Multiple table inserting using Hive
Getting ready
How to do it...
How it works...
Chapter 6. Data Import/Export Using Sqoop and Flume
Introduction
Importing data from RDBMS to HDFS using Sqoop
Getting ready
How to do it...
How it works...
Exporting data from HDFS to RDBMS
Getting ready
How to do it...
How it works...
Using query operator in Sqoop import
Getting ready
How to do it...
How it works...
Importing data using Sqoop in compressed format
Getting ready
How to do it...
How it works...
Performing Atomic export using Sqoop
Getting ready
How to do it...
How it works...
Importing data into Hive tables using Sqoop
Getting ready
How to do it...
How it works...
Importing data into HDFS from Mainframes
Getting ready
How to do it...
How it works...
Incremental import using Sqoop
Getting ready
How to do it...
How it works...
Creating and executing a Sqoop job
Getting ready
How to do it...
How it works...
Importing data from RDBMS to HBase using Sqoop
Getting ready
How to do it...
How it works...
Importing Twitter data into HDFS using Flume
Getting ready
How to do it...
How it works...
Importing data from Kafka into HDFS using Flume
Getting ready
How to do it...
How it works...
Importing web logs data into HDFS using Flume
Getting ready
How to do it...
How it works...
Chapter 7. Automation of Hadoop Tasks Using Oozie
Introduction
Implementing a Sqoop action job using Oozie
Getting ready
How to do it...
How it works...
Implementing a Map Reduce action job using Oozie
Getting ready
How to do it...
How it works...
Implementing a Java action job using Oozie
Getting ready
How to do it...
How it works...
Implementing a Hive action job using Oozie
Getting ready
How to do it...
How it works...
Implementing a Pig action job using Oozie
Getting ready
How to do it...
How it works...
Implementing an e-mail action job using Oozie
Getting ready
How to do it...
How it works...
Executing parallel jobs using Oozie (fork)
Getting ready
How to do it...
How it works...
Scheduling a job in Oozie
Getting ready
How to do it...
How it works...
Chapter 8. Machine Learning and Predictive Analytics Using Mahout and R
Introduction
Setting up the Mahout development environment
Getting ready
How to do it...
How it works...
Creating an item-based recommendation engine using Mahout
Getting ready
How to do it...
How it works...
Creating a user-based recommendation engine using Mahout
Getting ready
How to do it...
How it works...
Performing Predictive Analytics on Bank Data using Mahout
Getting ready
How to do it...
How it works...
Clustering text data using K-Means
Getting ready
How to do it...
How it works...
Performing Population Data Analytics using R
Getting ready
How to do it...
How it works...
Performing Twitter Sentiment Analytics using R
Getting ready
How to do it...
How it works...
Performing Predictive Analytics using R
Getting ready
How to do it...
How it works...
Chapter 9. Integration with Apache Spark
Introduction
Running Spark standalone
Getting ready
How to do it...
How it works...
Running Spark on YARN
Getting ready
How to do it...
How it works...
Olympics Athletes analytics using the Spark Shell
Getting ready
How to do it...
How it works...
Creating Twitter trending topics using Spark Streaming
Getting ready
How to do it...
How it works...
Twitter trending topics using Spark Streaming
Getting ready
How to do it...
How it works...
Analyzing Parquet files using Spark
Getting ready
How to do it...
How it works...
Analyzing JSON data using Spark
Getting ready
How to do it...
How it works...
Processing graphs using GraphX
Getting ready
How to do it...
How it works...
Conducting predictive analytics using Spark MLlib
Getting ready
How to do it...
How it works...
Chapter 10. Hadoop Use Cases
Introduction
Call Data Record analytics
Getting ready
How to do it...
How it works...
Web log analytics
Getting ready
How to do it...
How it works...
Sensitive data masking and encryption using Hadoop
Getting ready
How to do it...
How it works...
Part 3. Module 3
Chapter 1. Hadoop 2.X
The inception of Hadoop
The evolution of Hadoop
Hadoop's genealogy
Hadoop 2.X
Yet Another Resource Negotiator (YARN)
Storage layer enhancements
Support enhancements
Hadoop distributions
Which Hadoop distribution?
Available distributions
Summary
Chapter 2. Advanced MapReduce
MapReduce input
The InputFormat class
The InputSplit class
The RecordReader class
Hadoop's "small files" problem
Filtering inputs
The Map task
The dfs.blocksize attribute
Sort and spill of intermediate outputs
Node-local Reducers or Combiners
Fetching intermediate outputs – Map-side
The Reduce task
Fetching intermediate outputs – Reduce-side
Merge and spill of intermediate outputs
MapReduce output
Speculative execution of tasks
MapReduce job counters
Handling data joins
Reduce-side joins
Map-side joins
Summary
Chapter 3. Advanced Pig
Pig versus SQL
Different modes of execution
Complex data types in Pig
Compiling Pig scripts
The logical plan
The physical plan
The MapReduce plan
Development and debugging aids
The DESCRIBE command
The EXPLAIN command
The ILLUSTRATE command
The advanced Pig operators
The advanced FOREACH operator
Specialized joins in Pig
User-defined functions
The evaluation functions
The load functions
The store functions
Pig performance optimizations
The optimization rules
Measurement of Pig script performance
Combiners in Pig
Memory for the Bag data type
Number of reducers in Pig
The multiquery mode in Pig
Best practices
The explicit usage of types
Early and frequent projection
Early and frequent filtering
The usage of the LIMIT operator
The usage of the DISTINCT operator
The reduction of operations
The usage of Algebraic UDFs
The usage of Accumulator UDFs
Eliminating nulls in the data
The usage of specialized joins
Compressing intermediate results
Combining smaller files
Summary
Chapter 4. Advanced Hive
The Hive architecture
The Hive metastore
The Hive compiler
The Hive execution engine
The supporting components of Hive
Data types
File formats
Compressed files
ORC files
The Parquet files
The data model
Dynamic partitions
Indexes on Hive tables
Hive query optimizers
Advanced DML
The GROUP BY operation
ORDER BY versus SORT BY clauses
The JOIN operator and its types
Advanced aggregation support
Other advanced clauses
UDF, UDAF, and UDTF
Summary
Chapter 5. Serialization and Hadoop I/O
Data serialization in Hadoop
Writable and WritableComparable
Hadoop versus Java serialization
Avro serialization
Avro and MapReduce
Avro and Pig
Avro and Hive
Comparison – Avro versus Protocol Buffers / Thrift
File formats
The Sequence file format
The MapFile format
Other data structures
Compression
Splits and compressions
Scope for compression
Summary
Chapter 6. YARN – Bringing Other Paradigms to Hadoop
The YARN architecture
Resource Manager (RM)
Application Master (AM)
Node Manager (NM)
YARN clients
Developing YARN applications
Writing YARN clients
Writing the Application Master entity
Monitoring YARN
Job scheduling in YARN
CapacityScheduler
FairScheduler
YARN commands
User commands
Administration commands
Summary
Chapter 7. Storm on YARN – Low Latency Processing in Hadoop
Batch processing versus streaming
Apache Storm
Architecture of an Apache Storm cluster
Computation and data modeling in Apache Storm
Use cases for Apache Storm
Developing with Apache Storm
Apache Storm 0.9.1
Storm on YARN
Installing Apache Storm-on-YARN
Installation procedure
Summary
Chapter 8. Hadoop on the Cloud
Cloud computing characteristics
Hadoop on the cloud
Amazon Elastic MapReduce (EMR)
Provisioning a Hadoop cluster on EMR
Summary
Chapter 9. HDFS Replacements
HDFS – advantages and drawbacks
Amazon AWS S3
Hadoop support for S3
Implementing a filesystem in Hadoop
Implementing an S3 native filesystem in Hadoop
Summary
Chapter 10. HDFS Federation
Limitations of the older HDFS architecture
Architecture of HDFS Federation
Benefits of HDFS Federation
Deploying federated NameNodes
HDFS high availability
Secondary NameNode, Checkpoint Node, and Backup Node
High availability – edits sharing
Useful HDFS tools
Three-layer versus four-layer network topology
HDFS block placement
Pluggable block placement policy
Summary
Chapter 11. Hadoop Security
The security pillars
Authentication in Hadoop
Kerberos authentication
The Kerberos architecture and workflow
Kerberos authentication and Hadoop
Authentication via HTTP interfaces
Authorization in Hadoop
Authorization in HDFS
Limiting HDFS usage
Service-level authorization in Hadoop
Data confidentiality in Hadoop
HTTPS and encrypted shuffle
Audit logging in Hadoop
Summary
Chapter 12. Analytics Using Hadoop
Data analytics workflow
Machine learning
Apache Mahout
Document analysis using Hadoop and Mahout
Term frequency
Document frequency
Term frequency – inverse document frequency
TF-IDF in Pig
Cosine similarity distance measures
Clustering using k-means
K-means clustering using Apache Mahout
RHadoop
Summary
Chapter 13. Hadoop for Microsoft Windows
Deploying Hadoop on Microsoft Windows
Prerequisites
Building Hadoop
Configuring Hadoop
Deploying Hadoop
Summary
Appendix A. Bibliography
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y