
Mastering Elasticsearch - Second Edition (e-book)


Author: Rafał Kuć

Publisher: Packt Publishing

Publication date: 2015-02-27

Word count: 9,213,000


Book description
This book is for Elasticsearch users who want to extend their knowledge and develop new skills. Prior knowledge of the Query DSL and data indexing is expected.

Table of Contents

Mastering Elasticsearch Second Edition

Credits

About the Author

Acknowledgments

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Introduction to Elasticsearch

Introducing Apache Lucene

Getting familiar with Lucene

Overall architecture

Getting deeper into the Lucene index

Norms

Term vectors

Posting formats

Doc values

Analyzing your data

Indexing and querying

Lucene query language

Understanding the basics

Querying fields

Term modifiers

Handling special characters

Introducing Elasticsearch

Basic concepts

Index

Document

Type

Mapping

Node

Cluster

Shard

Replica

Key concepts behind Elasticsearch architecture

Workings of Elasticsearch

The startup process

Failure detection

Communicating with Elasticsearch

Indexing data

Querying data

The story

Summary

2. Power User Query DSL

Default Apache Lucene scoring explained

When a document is matched

TF/IDF scoring formula

Lucene conceptual scoring formula

Lucene practical scoring formula

Elasticsearch point of view

An example

Query rewrite explained

Prefix query as an example

Getting back to Apache Lucene

Query rewrite properties

Query templates

Introducing query templates

Templates as strings

The Mustache template engine

Conditional expressions

Loops

Default values

Storing templates in files

Handling filters and why it matters

Filters and query relevance

How filters work

Bool or and/or/not filters

Performance considerations

Post filtering and filtered query

Choosing the right filtering method

Choosing the right query for the job

Query categorization

Basic queries

Compound queries

Not analyzed queries

Full text search queries

Pattern queries

Similarity supporting queries

Score altering queries

Position aware queries

Structure aware queries

The use cases

Example data

Basic queries use cases

Searching for values in range

Simplified query for multiple terms

Compound queries use cases

Boosting some of the matched documents

Ignoring lower scoring partial queries

Not analyzed queries use cases

Limiting results to given tags

Efficient query time stopwords handling

Full text search queries use cases

Using Lucene query syntax in queries

Handling user queries without errors

Pattern queries use cases

Autocomplete using prefixes

Pattern matching

Similarity supporting queries use cases

Finding terms similar to a given one

Finding documents with similar field values

Score altering queries use cases

Favoring newer books

Decreasing importance of books with certain value

Position aware queries use cases

Matching phrases

Spans, spans everywhere

Structure aware queries use cases

Returning parent documents having a certain nested document

Affecting parent document score with the score of nested documents

Summary

3. Not Only Full Text Search

Query rescoring

What is query rescoring?

An example query

Structure of the rescore query

Rescore parameters

Choosing the scoring mode

To sum up

Controlling multimatching

Multimatch types

Best fields matching

Cross fields matching

Most fields matching

Phrase matching

Phrase with prefixes matching

Significant terms aggregation

An example

Choosing significant terms

Multiple values analysis

Significant terms aggregation and full text search fields

Additional configuration options

Controlling the number of returned buckets

Background set filtering

Minimum document count

Execution hint

More options

There are limits

Memory consumption

Shouldn't be used as a top-level aggregation

Counts are approximated

Floating point fields are not allowed

Documents grouping

Top hits aggregation

An example

Additional parameters

Relations between documents

The object type

The nested documents

Parent–child relationship

Parent–child relationship in the cluster

A few words about alternatives

Scripting changes between Elasticsearch versions

Scripting changes

Security issues

Groovy – the new default scripting language

Removal of MVEL language

Short Groovy introduction

Using Groovy as your scripting language

Variable definition in scripts

Conditionals

Loops

An example

There is more

Scripting in full text context

Field-related information

Shard level information

Term level information

More advanced term information

Lucene expressions explained

The basics

An example

There is more

Summary

4. Improving the User Search Experience

Correcting user spelling mistakes

Testing data

Getting into technical details

Suggesters

Using the _suggest REST endpoint

Understanding the REST endpoint suggester response

Including suggestion requests in query

The term suggester

Configuration

Common term suggester options

Additional term suggester options

The phrase suggester

Usage example

Configuration

Basic configuration

Configuring smoothing models

Configuring candidate generators

Configuring direct generators

The completion suggester

The logic behind the completion suggester

Using the completion suggester

Indexing data

Querying data

Custom weights

Additional parameters

Improving the query relevance

Data

The quest for relevance improvement

The standard query

The multi match query

Phrases come into play

Let's throw the garbage away

Now, we boost

Performing a misspelling-proof search

Drill downs with faceting

Summary

5. The Index Distribution Architecture

Choosing the right number of shards and replicas

Sharding and overallocation

A positive example of overallocation

Multiple shards versus multiple indices

Replicas

Routing explained

Shards and data

Let's test routing

Indexing with routing

Routing in practice

Querying

Aliases

Multiple routing values

Altering the default shard allocation behavior

Allocation awareness

Forcing allocation awareness

Filtering

What include, exclude, and require mean

Runtime allocation updating

Index level updates

Cluster level updates

Defining total shards allowed per node

Defining total shards allowed per physical server

Inclusion

Requirement

Exclusion

Disk-based allocation

Query execution preference

Introducing the preference parameter

Summary

6. Low-level Index Control

Altering Apache Lucene scoring

Available similarity models

Setting a per-field similarity

Similarity model configuration

Choosing the default similarity model

Configuring the chosen similarity model

Configuring the TF/IDF similarity

Configuring the Okapi BM25 similarity

Configuring the DFR similarity

Configuring the IB similarity

Configuring the LM Dirichlet similarity

Configuring the LM Jelinek Mercer similarity

Choosing the right directory implementation – the store module

The store type

The simple filesystem store

The new I/O filesystem store

The MMap filesystem store

The hybrid filesystem store

The memory store

Additional properties

The default store type

The default store type for Elasticsearch 1.3.0 and higher

The default store type for Elasticsearch versions older than 1.3.0

NRT, flush, refresh, and transaction log

Updating the index and committing changes

Changing the default refresh time

The transaction log

The transaction log configuration

Near real-time GET

Segment merging under control

Choosing the right merge policy

The tiered merge policy

The log byte size merge policy

The log doc merge policy

Merge policies' configuration

The tiered merge policy

The log byte size merge policy

The log doc merge policy

Scheduling

The concurrent merge scheduler

The serial merge scheduler

Setting the desired merge scheduler

When it is too much for I/O – throttling explained

Controlling I/O throttling

Configuration

The throttling type

Maximum throughput per second

Node throttling defaults

Performance considerations

The configuration example

Understanding Elasticsearch caching

The filter cache

Filter cache types

Node-level filter cache configuration

Index-level filter cache configuration

The field data cache

Field data or doc values

Node-level field data cache configuration

Index-level field data cache configuration

The field data cache filtering

Adding field data filtering information

Filtering by term frequency

Filtering by regex

Filtering by regex and term frequency

The filtering example

Field data formats

String-based fields

Numeric fields

Geographical-based fields

Field data loading

The shard query cache

Setting up the shard query cache

Using circuit breakers

The field data circuit breaker

The request circuit breaker

The total circuit breaker

Clearing the caches

Index, indices, and all caches clearing

Clearing specific caches

Summary

7. Elasticsearch Administration

Discovery and recovery modules

Discovery configuration

Zen discovery

Multicast Zen discovery configuration

The unicast Zen discovery configuration

Master node

Configuring master and data nodes

Configuring data-only nodes

Configuring master-only nodes

Configuring the query processing-only nodes

The master election configuration

Zen discovery fault detection and configuration

The Amazon EC2 discovery

The EC2 plugin installation

The EC2 plugin's generic configuration

Optional EC2 discovery configuration options

The EC2 nodes scanning configuration

Other discovery implementations

The gateway and recovery configuration

The gateway recovery process

Configuration properties

Expectations on nodes

The local gateway

Low-level recovery configuration

Cluster-level recovery configuration

Index-level recovery settings

The indices recovery API

The human-friendly status API – using the Cat API

The basics

Using the Cat API

Common arguments

The examples

Getting information about the master node

Getting information about the nodes

Backing up

Saving backups in the cloud

The S3 repository

The HDFS repository

The Azure repository

Federated search

The test clusters

Creating the tribe node

Using the unicast discovery for tribes

Reading data with the tribe node

Master-level read operations

Writing data with the tribe node

Master-level write operations

Handling indices conflicts

Blocking write operations

Summary

8. Improving Performance

Using doc values to optimize your queries

The problem with the field data cache

The example of doc values usage

Knowing about the garbage collector

Java memory

The life cycle of Java objects and garbage collections

Dealing with garbage collection problems

Turning on logging of garbage collection work

Using JStat

Creating memory dumps

More information on the garbage collector work

Adjusting the garbage collector work in Elasticsearch

Using a standard startup script

Service wrapper

Avoid swapping on Unix-like systems

Benchmarking queries

Preparing your cluster configuration for benchmarking

Running benchmarks

Controlling currently running benchmarks

Very hot threads

Usage clarification for the Hot Threads API

The Hot Threads API response

Scaling Elasticsearch

Vertical scaling

Horizontal scaling

Automatically creating replicas

Redundancy and high availability

Cost and performance flexibility

Continuous upgrades

Multiple Elasticsearch instances on a single physical machine

Preventing the shard and its replicas from being on the same node

Designated nodes' roles for larger clusters

Query aggregator nodes

Data nodes

Master eligible nodes

Using Elasticsearch for high load scenarios

General Elasticsearch-tuning advice

Choosing the right store

The index refresh rate

Thread pool tuning

Adjusting the merge process

Data distribution

Advice for high query rate scenarios

Filter caches and shard query caches

Think about the queries

Using routing

Parallelize your queries

Field data cache and breaking the circuit

Keeping size and shard_size under control

High indexing throughput scenarios and Elasticsearch

Bulk indexing

Doc values versus indexing speed

Keep your document fields under control

The index architecture and replication

Tuning the write-ahead log

Think about storage

RAM buffer for indexing

Summary

9. Developing Elasticsearch Plugins

Creating the Apache Maven project structure

Understanding the basics

The structure of the Maven Java project

The idea of POM

Running the build process

Introducing the assembly Maven plugin

Creating custom REST action

The assumptions

Implementation details

Using the REST action class

The constructor

Handling requests

Writing the response

The plugin class

Informing Elasticsearch about our REST action

Time for testing

Building the REST action plugin

Installing the REST action plugin

Checking whether the REST action plugin works

Creating the custom analysis plugin

Implementation details

Implementing TokenFilter

Implementing the TokenFilter factory

Implementing the custom analyzer class

Implementing the analyzer provider

Implementing the analysis binder

Implementing the analyzer indices component

Implementing the analyzer module

Implementing the analyzer plugin

Informing Elasticsearch about our custom analyzer

Testing our custom analysis plugin

Building our custom analysis plugin

Installing the custom analysis plugin

Checking whether our analysis plugin works

Summary

Index