Practical Real-time Data Processing and Analytics (e-book)

Authors: Shilpi Saxena, Saurabh Gupta

Publisher: Packt Publishing

Publication date: 2017-09-28

Word count: 379,000

Category: Imported books > Foreign-language originals > Computers/Internet

A practical guide to help you tackle different real-time data processing and analytics problems using the best tools for each scenario

About This Book

  • Learn about the various challenges in real-time data processing and use the right tools to overcome them
  • This book covers popular tools and frameworks such as Spark, Flink, and Apache Storm to solve all your distributed processing problems
  • A practical guide filled with examples, tips, and tricks to help you perform efficient Big Data processing in real-time

Who This Book Is For

If you are a Java developer who would like to be equipped with all the tools required to devise an end-to-end practical solution on real-time data streaming, then this book is for you. Basic knowledge of real-time processing would be helpful, and knowing the fundamentals of Maven, Shell, and Eclipse would be great.

What You Will Learn

  • Get an introduction to the established real-time stack
  • Understand the key integration of all the components
  • Get a thorough understanding of the basic building blocks for real-time solution designing
  • Garnish the search and visualization aspects for your real-time solution
  • Get conceptually and practically acquainted with real-time analytics
  • Be well equipped to apply the knowledge and create your own solutions

In Detail

With the rise of Big Data, there is an increasing need to process large amounts of data continuously, with a shorter turnaround time. Real-time data processing involves continuous input, processing and output of data, with the condition that the time required for processing is as short as possible. This book covers the majority of the existing and evolving open source technology stack for real-time processing and analytics. You will get to know about all the real-time solution aspects, from the source to the presentation to persistence. Through this practical book, you’ll be equipped with a clear understanding of how to solve challenges on your own.

We’ll cover topics such as how to set up components, basic executions, integrations, advanced use cases, alerts, and monitoring. You’ll be exposed to the popular tools used in real-time processing today such as Apache Spark, Apache Flink, and Storm. Finally, you will put your knowledge to practical use by implementing all of the techniques in the form of a practical, real-world use case. By the end of this book, you will have a solid understanding of all the aspects of real-time data processing and analytics, and will know how to deploy the solutions in production environments in the best possible manner.

Style and Approach

In this practical guide to real-time analytics, each chapter begins with a basic high-level concept of the topic, followed by a practical, hands-on implementation of each concept, where you can see the working and execution of it. The book is written in a DIY style, with plenty of practical use cases, well-explained code examples, and relevant screenshots and diagrams.
Table of Contents

Title Page

Copyright

Practical Real-Time Data Processing and Analytics

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Introducing Real-Time Analytics

What is big data?

Big data infrastructure

Real-time analytics – the myth and the reality

Near real-time solution – an architecture that works

NRT – The Storm solution

NRT – The Spark solution

Lambda architecture – analytics possibilities

IOT – thoughts and possibilities

Edge analytics

Cloud – considerations for NRT and IOT

Summary

Real Time Applications – The Basic Ingredients

The NRT system and its building blocks

Data collection

Stream processing

Analytical layer – serve it to the end user

NRT – high-level system view

NRT – technology view

Event producer

Collection

Broker

Transformation and processing

Storage

Summary

Understanding and Tailing Data Streams

Understanding data streams

Setting up infrastructure for data ingestion

Apache Kafka

Apache NiFi

Logstash

Fluentd

Flume

Tapping data from source to the processor - expectations and caveats

Comparing and choosing what works best for your use case

Do it yourself

Setting up Elasticsearch

Summary

Setting up the Infrastructure for Storm

Overview of Storm

Storm architecture and its components

Characteristics

Components

Stream grouping

Setting up and configuring Storm

Setting up Zookeeper

Installing

Configuring

Standalone

Cluster

Running

Setting up Apache Storm

Installing

Configuring

Running

Real-time processing job on Storm

Running job

Local

Cluster

Summary

Configuring Apache Spark and Flink

Setting up and a quick execution of Spark

Building from source

Downloading Spark

Running an example

Setting up and a quick execution of Flink

Build Flink source

Download Flink

Running example

Setting up and a quick execution of Apache Beam

Beam model

Running example

MinimalWordCount example walk through

Balancing in Apache Beam

Summary

Integrating Storm with a Data Source

RabbitMQ – messaging that works

RabbitMQ exchanges

Direct exchanges

Fanout exchanges

Topic exchanges

Headers exchanges

RabbitMQ setup

RabbitMQ – publish and subscribe

RabbitMQ – integration with Storm

AMQPSpout

PubNub data stream publisher

String together Storm-RMQ-PubNub sensor data topology

Summary

From Storm to Sink

Setting up and configuring Cassandra

Setting up Cassandra

Configuring Cassandra

Storm and Cassandra topology

Storm and IMDB integration for dimensional data

Integrating the presentation layer with Storm

Setting up Grafana with the Elasticsearch plugin

Downloading Grafana

Configuring Grafana

Installing the Elasticsearch plugin in Grafana

Running Grafana

Adding the Elasticsearch datasource in Grafana

Writing code

Executing code

Visualizing the output on Grafana

Do It Yourself

Summary

Storm Trident

State retention and the need for Trident

Transactional spout

Opaque transactional Spout

Basic Storm Trident topology

Trident internals

Trident operations

Functions

map and flatMap

peek

Filters

Windowing

Tumbling window

Sliding window

Aggregation

Aggregate

Partition aggregate

Persistence aggregate

Combiner aggregator

Reducer aggregator

Aggregator

Grouping

Merge and joins

DRPC

Do It Yourself

Summary

Working with Spark

Spark overview

Spark framework and schedulers

Distinct advantages of Spark

When to avoid using Spark

Spark – use cases

Spark architecture - working inside the engine

Spark pragmatic concepts

RDD – the name says it all

Spark 2.x – advent of data frames and datasets

Summary

Working with Spark Operations

Spark – packaging and API

RDD pragmatic exploration

Transformations

Actions

Shared variables – broadcast variables and accumulators

Broadcast variables

Accumulators

Summary

Spark Streaming

Spark Streaming concepts

Spark Streaming - introduction and architecture

Packaging structure of Spark Streaming

Spark Streaming APIs

Spark Streaming operations

Connecting Kafka to Spark Streaming

Summary

Working with Apache Flink

Flink architecture and execution engine

Flink basic components and processes

Integration of source stream to Flink

Integration with Apache Kafka

Example

Integration with RabbitMQ

Running example

Flink processing and computation

DataStream API

DataSet API

Flink persistence

Integration with Cassandra

Running example

FlinkCEP

Pattern API

Detecting pattern

Selecting from patterns

Example

Gelly

Gelly API

Graph representation

Graph creation

Graph transformations

DIY

Summary

Case Study

Introduction

Data modeling

Tools and frameworks

Setting up the infrastructure

Implementing the case study

Building the data simulator

Hazelcast loader

Building Storm topology

Parser bolt

Check distance and alert bolt

Generate alert Bolt

Elasticsearch Bolt

Complete Topology

Running the case study

Load Hazelcast

Generate Vehicle static value

Deploy topology

Start simulator

Visualization using Kibana

Summary
