Concept "Document Text Analysis"

From Dataspects

What makes up an "Analyzer"?

Character filtering

  • E.g. strip HTML tags, or rewrite character sequences: "I love u 2" → "I love you too", "&" → "and"
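The examples above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch implements its character filters; the function name `char_filter` and the `MAPPINGS` table are hypothetical, and the regex-based tag stripping is a deliberate simplification.

```python
import re

# Hypothetical replacement table built from the examples above.
MAPPINGS = {"u": "you", "2": "too", "&": "and"}

def char_filter(text: str) -> str:
    """Clean raw text before it reaches the tokenizer."""
    # Crude HTML tag stripping; a real filter would use a proper HTML parser.
    text = re.sub(r"<[^>]+>", "", text)
    # Replace whitespace-delimited chunks that appear in the table.
    return " ".join(MAPPINGS.get(chunk, chunk) for chunk in text.split())

print(char_filter("<b>I love u 2</b>"))
print(char_filter("salt & pepper"))
```

In Elasticsearch itself, the built-in `html_strip` and `mapping` character filters cover these two cases.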

Tokenization

Token filtering

  • E.g. lowercasing
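As a rough sketch of how tokenization and token filtering fit together, the following Python splits text into word tokens and then lowercases each one. The names `tokenize` and `lowercase_filter` are hypothetical, and splitting on `\w+` is an assumption that only approximates what a real tokenizer does.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word tokens (runs of letters, digits, underscores)."""
    return re.findall(r"\w+", text)

def lowercase_filter(tokens: list[str]) -> list[str]:
    """Token filter: normalize every token to lowercase."""
    return [token.lower() for token in tokens]

tokens = lowercase_filter(tokenize("I love you too, Elasticsearch!"))
print(tokens)
```

Token filters receive the tokenizer's output, so they can change, add, or drop tokens but never see the original character stream.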

Token indexing
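One way to picture token indexing is as an inverted index: a lookup from each token to the documents, and the positions within them, where it occurs. The sketch below is a toy model under that assumption, not Elasticsearch's actual storage format; `build_index` and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs: dict[int, list[str]]) -> dict[str, dict[int, list[int]]]:
    """Map each token to {doc_id: [positions]} for already-analyzed documents."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in docs.items():
        for position, token in enumerate(tokens):
            index[token][doc_id].append(position)
    # Convert to plain dicts so lookups of unknown tokens raise KeyError.
    return {token: dict(postings) for token, postings in index.items()}

index = build_index({
    1: ["i", "love", "you"],
    2: ["you", "too"],
})
print(index["you"])
```

Storing positions is what allows phrase and proximity queries to check that tokens occur next to each other.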

How are analyzers (to be used by fields) specified?

Which analyzer a field uses is specified in the mapping. An analyzer can be set at any of the following levels:

  • for the index
  • for a node
  • for a type
  • for a field
  • for a query
  • globally: in Elasticsearch's configuration file
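A minimal sketch of three of these levels (index, field, query), written as Python dictionaries that mirror the JSON request bodies. It assumes a recent Elasticsearch version in which mappings no longer have a type level; the names `my_analyzer` and `title` are hypothetical.

```python
# Index level: define a custom analyzer in the index settings.
# Field level: attach it to a text field in the mapping.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {  # hypothetical analyzer name
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {  # hypothetical field name
                "type": "text",
                "analyzer": "my_analyzer",      # used at index time
                "search_analyzer": "standard",  # used at query time
            }
        }
    },
}

# Query level: override the analyzer for a single match query.
query_body = {
    "query": {
        "match": {
            "title": {"query": "I love u 2", "analyzer": "my_analyzer"}
        }
    }
}
```

The first body would be sent when creating the index and the second when searching it. A query-level analyzer applies only to that one request.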

What performs analysis?

  Concept                                               Type       Performs analysis?
  "Full text queries > match_phrase (aka proximity)"    QueryType  true
  "full_text/match_phrase_prefix"                       QueryType  true
  "full_text/query_string"                              QueryType  true
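For orientation, the three query types listed above might be written as follows, again as Python dictionaries mirroring the JSON request bodies. In each case the query text is analyzed before matching. The field name `message` and the query strings are hypothetical.

```python
# match_phrase: analyzed terms must appear in order; slop allows gaps.
match_phrase = {
    "query": {"match_phrase": {"message": {"query": "I love you", "slop": 1}}}
}

# match_phrase_prefix: like match_phrase, but the last term is a prefix.
match_phrase_prefix = {
    "query": {"match_phrase_prefix": {"message": {"query": "I love yo"}}}
}

# query_string: parses a query syntax, then analyzes the individual terms.
query_string = {
    "query": {"query_string": {"default_field": "message", "query": "love AND you"}}
}
```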