Concept "Document Text Analysis"

From dataspects::Wiki

What makes up an "Analyzer"?

Character filtering

  • E.g. stripping HTML tags, or replacing characters and shorthand: "I love u 2" → "I love you too", "&" → "and"


Token filtering

  • E.g. lowercasing

Token indexing
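
The stages above can be sketched as a toy pipeline in plain Python. This is only an illustration of the idea, not Elasticsearch's implementation: the replacement table, the function names, and the regular-expression tokenizer are assumptions made for this example. The explicit tokenization step between character filtering and token filtering is added here because token filters need tokens to work on.

```python
import re
from html import unescape

# Hypothetical replacement table, taken from the examples in the text.
CHAR_MAPPINGS = {"&": "and", "u": "you", "2": "too"}


def char_filter(text: str) -> str:
    """Character filtering: strip HTML tags, then apply the replacements."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop HTML tags
    text = unescape(text)                 # decode entities such as &amp;
    # Replace standalone words only, so the "u" inside "you" is untouched.
    return " ".join(CHAR_MAPPINGS.get(word, word) for word in text.split())


def tokenize(text: str) -> list[str]:
    """Tokenization (assumed step): split the filtered text into word tokens."""
    return re.findall(r"\w+", text)


def token_filter(tokens: list[str]) -> list[str]:
    """Token filtering: lowercase every token."""
    return [token.lower() for token in tokens]


def analyze(text: str) -> list[str]:
    """Run the stages in order; the result is what would be indexed."""
    return token_filter(tokenize(char_filter(text)))


print(analyze("<b>I love u 2</b>"))
print(analyze("<p>Tom &amp; Jerry</p>"))
```

The tokens returned by `analyze` are what "token indexing" would then store in the inverted index.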

How are analyzers (to be used by fields) specified?

(Which analyzer a field uses is specified in the mapping.)

An analyzer can be specified at several levels:

  • for the index
  • for a node
  • for a type
  • for a field
  • for a query
  • globally: in Elasticsearch's configuration file
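
As a rough sketch of the index, field, and query levels, the request bodies below are written as Python dictionaries and printed as JSON. The analyzer name `wiki_text`, the char filter name `expand_shorthand`, and the field name `body` are hypothetical; `html_strip`, `mapping`, `standard`, and `lowercase` are built-in Elasticsearch components. The typeless `mappings` layout assumes a recent Elasticsearch version, so older, type-based versions would need a different structure.

```python
import json

# Body for creating an index: the analyzer is defined in the index settings
# and assigned to a field in the mapping.
index_body = {
    "settings": {
        "analysis": {
            "char_filter": {
                # Hypothetical name; "mapping" is the built-in filter type.
                "expand_shorthand": {"type": "mapping", "mappings": ["& => and"]}
            },
            "analyzer": {
                "wiki_text": {  # hypothetical analyzer name
                    "type": "custom",
                    "char_filter": ["html_strip", "expand_shorthand"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Field level: "body" is analyzed with wiki_text.
            "body": {"type": "text", "analyzer": "wiki_text"}
        }
    },
}

# Query level: a match query can name the analyzer to apply to the query text.
match_query = {
    "query": {
        "match": {"body": {"query": "Tom & Jerry", "analyzer": "wiki_text"}}
    }
}

print(json.dumps(index_body, indent=2))
print(json.dumps(match_query, indent=2))
```

These dictionaries are only request bodies; sending them to a cluster (for example with an HTTP client) is left out here.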

What performs analysis?

                                                            Type       Performs analysis?
Concept "Full text queries > match_phrase (aka proximity)"  QueryType  true
Concept "full_text/match_phrase_prefix"                     QueryType  true
Concept "full_text/query_string"                            QueryType  true
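
For orientation, here is a minimal sketch of what request bodies for these three query types might look like, again as Python dictionaries. The helper `build_queries`, the field name `body`, and the `slop` value are assumptions for illustration. In each case the query text is analyzed before it is compared with the indexed tokens.

```python
import json


def build_queries(field: str, text: str) -> dict[str, dict]:
    """Build one request body per query type (hypothetical helper)."""
    return {
        # Phrase (proximity) query; slop allows terms to be this far apart.
        "match_phrase": {
            "query": {"match_phrase": {field: {"query": text, "slop": 1}}}
        },
        # Like match_phrase, but the last term is treated as a prefix.
        "match_phrase_prefix": {
            "query": {"match_phrase_prefix": {field: {"query": text}}}
        },
        # Query-string syntax, searched against a default field.
        "query_string": {
            "query": {"query_string": {"query": text, "default_field": field}}
        },
    }


queries = build_queries("body", "I love you too")
for name, body in queries.items():
    print(name, json.dumps(body))
```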