Taste of Tech Topics

A technical blog written by the engineers of Acroquest Technology Co., Ltd.

Elastic{ON} 2017 Report, Day 3 | What will be improved with new Elasticsearch releases #elasticon

A roundup of the Elastic{ON} 2017 reports is here!!

Hello again!
This is Aung Satt. Today I will share a summary of the "Elasticsearch Search Improvements" session, held on the last day of Elastic{ON} 2017.

The main topics were:

  1. Removing _all field
  2. Unified Highlighter
  3. Multi-token Synonyms

In detail:

1. Removing _all field
In today's session, we learned that the _all field will soon be saying goodbye.
As you may already know, the _all field exists so that you can search your documents even if you know nothing about their mappings.
Although _all was useful, it has drawbacks that Elastic could not ignore from a performance point of view.

The reasons why _all will be removed are as follows.

  1. Data is duplicated in _all and in your other fields.
  2. Numeric data does not compress well, since everything in _all is treated as a string.
  3. _all has only one analyzer, so it cannot use per-field analysis at index time or at query time.
  4. Highlighting runs into problems, since _all is not a real field.
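
To make the first two reasons concrete, here is a toy Python model of what _all does: it joins the values of the other fields into one space-separated string. This is only a sketch for illustration, not Elasticsearch's implementation; the function name build_all_field and the sample document are made up for this example.

```python
def build_all_field(doc):
    """Toy model of _all: join every field value into one string.

    Assumption: values are simply stringified and separated by spaces.
    """
    return " ".join(str(value) for value in doc.values())


# Hypothetical sample document (field names and values are invented).
doc = {"title": "Elastic{ON} report", "views": 1024, "published": "2017-03-09"}

all_value = build_all_field(doc)

# Every value now exists twice: once in its own field, once inside all_value.
# The number 1024 has also become plain text, so it is no longer stored as a number.
print(all_value)
```

In this model the extra storage grows with every field you add, and numbers and dates lose their type once they are folded into the string.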

So, from Kibana 5.1.1 onward, the _all field may be disabled by default, and anyone who wants to use it will need to change a configuration setting.
According to the session, it will be removed completely once version 6.0 of the Elastic Stack (ELK) is released.
With the _all field removed, indexing may become faster, since you will no longer run into the restrictions listed above.
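
As a rough sketch of what living without _all can look like, the Python snippet below builds an index mapping that switches _all off and uses the copy_to mapping parameter to fill a custom catch-all field instead. This was not shown in the session; it assumes 5.x-style mapping syntax, and the mapping type "doc" and the field names "title", "body", and "all_text" are hypothetical.

```python
import json


def mapping_without_all(catch_all="all_text"):
    """Build a mapping body with _all disabled and a custom catch-all field.

    Assumptions: 5.x-style mappings with a mapping type called "doc";
    all field names here are invented for illustration.
    """
    return {
        "mappings": {
            "doc": {
                # Turn the built-in _all field off explicitly.
                "_all": {"enabled": False},
                "properties": {
                    # Only the fields you choose are copied into the catch-all.
                    "title": {"type": "text", "copy_to": catch_all},
                    "body": {"type": "text", "copy_to": catch_all},
                    catch_all: {"type": "text"},
                },
            }
        }
    }


print(json.dumps(mapping_without_all(), indent=2))
```

The point of this design is that you decide which fields are searchable together, so numeric and date fields do not have to be copied as text at all.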

2. Unified Highlighter
The highlighter was also improved for performance. Until now there were three highlighters, each behaving differently in areas such as query analysis, snippet extraction, and scoring.
Having several different highlighters made things difficult to maintain.
So the Unified Highlighter was introduced to reduce the overhead of the previous highlighters and make things simpler to manage.
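
For reference, here is a minimal sketch of a search request body that asks for the unified highlighter by setting the highlighter type to "unified". It assumes an Elasticsearch version in which that type is available; the helper name and the field name "content" are hypothetical.

```python
import json


def unified_highlight_request(field, text):
    """Build a search body that highlights `field` with the unified highlighter.

    Assumption: the cluster runs a version that supports "type": "unified".
    """
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {"type": "unified"}}},
    }


# Hypothetical field name and search text.
body = unified_highlight_request("content", "search improvements")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a _search request with whichever HTTP or Elasticsearch client you use.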

3. Multi-token Synonyms

This is a big change for Elasticsearch, because the token stream produced by a Lucene analyzer causes problems for search when you define synonyms, and especially multi-token synonyms, for your data.

The problem is shown in the figures below. When you add synonyms, the previous analyzer cannot properly represent the relationship between the synonyms and the search words.

In a near-future release, that problem will be solved: the concept and architecture will change so that the token stream is handled like a graph, and the relations between words can then be checked properly.


Fig : Problem with Lucene's token stream and synonyms


Fig : Problem with Lucene's token stream and multi-token synonyms


Fig : Concept for resolving the token stream and synonym problems
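
The idea in the figures above can be sketched with a small toy model in Python. Each token carries a start position and a position length, and a "phrase" is any path through the tokens. This is not Lucene's code; the sentence "dns is fragile", the synonym "dns" = "domain name system", and the function name phrases are assumptions chosen for illustration.

```python
from collections import defaultdict


def phrases(tokens):
    """Return every phrase (path) through a list of tokens.

    Each token is (term, start_position, position_length).
    """
    by_start = defaultdict(list)
    for term, start, length in tokens:
        by_start[start].append((term, start + length))
    end = max(start + length for _, start, length in tokens)

    def walk(pos):
        if pos == end:
            yield []
            return
        for term, next_pos in by_start[pos]:
            for rest in walk(next_pos):
                yield [term] + rest

    return sorted(" ".join(path) for path in walk(0))


# Flat stream: every token spans one position, so the three-word synonym
# is stacked on top of the words that follow "dns".
flat = [
    ("dns", 0, 1), ("domain", 0, 1),
    ("is", 1, 1), ("name", 1, 1),
    ("fragile", 2, 1), ("system", 2, 1),
]

# Graph: "dns" spans three positions, so it runs parallel to
# "domain name system" and both rejoin before "is".
graph = [
    ("dns", 0, 3),
    ("domain", 0, 1), ("name", 1, 1), ("system", 2, 1),
    ("is", 3, 1), ("fragile", 4, 1),
]

print(phrases(flat))
print(phrases(graph))
```

In the flat model, mixed-up phrases such as "dns name fragile" become possible while "domain name system is fragile" is lost. In the graph model only "dns is fragile" and "domain name system is fragile" remain, which is the behavior you would want from a synonym.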

It will be interesting to see how Elastic improves search performance once the features I described above are released.
Thank you.

