Configuration "Wikitext Search"
C0927114931
We are interested in two things: the MediaWiki page name and the full wikitext.
Content to MongoDB
They are fed to MongoDB as follows (sketched below):
- MediaWiki page name → namespace and pagename
- full wikitext → full.wikitext
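As a rough illustration (not part of the original setup), the following Python sketch stores one page in MongoDB with exactly these fields; the database and collection names wiki and pages, the helper store_page and the sample values are assumptions.

from pymongo import MongoClient

def store_page(namespace: str, pagename: str, wikitext: str) -> None:
    """Store one MediaWiki page in MongoDB (hypothetical db/collection names)."""
    pages = MongoClient("mongodb://localhost:27017")["wiki"]["pages"]
    pages.replace_one(
        {"namespace": namespace, "pagename": pagename},  # upsert key
        {
            "namespace": namespace,          # MediaWiki page name → namespace
            "pagename": pagename,            # ... and pagename
            "full": {"wikitext": wikitext},  # full wikitext → full.wikitext
        },
        upsert=True,
    )

store_page("Main", "Wikitext Search", "== Wikitext Search ==\nsome wikitext ...")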
Indexed to Elasticsearch
Subsequently these fields are indexed as follows (see the sketch after this list):
- namespace and pagename → HasEntityName: "#{pageMongodoc['namespace']}#{pageMongodoc['pagename']}"
- full.wikitext → HasEntityContentTEXT: pageMongodoc['full']["wikitext"]
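A minimal sketch of this step, assuming the document layout above, a local Elasticsearch on port 9200 and a hypothetical index named wikitext; the two fields are composed from the MongoDB document and sent via the REST API.

import requests

# A MongoDB page document as described above (sample values)
pageMongodoc = {
    "namespace": "Main",
    "pagename": "Wikitext Search",
    "full": {"wikitext": "== Wikitext Search ==\nsome wikitext ..."},
}

# Compose the two Elasticsearch fields from the MongoDB document
esDoc = {
    "HasEntityName": f"{pageMongodoc['namespace']}{pageMongodoc['pagename']}",
    "HasEntityContentTEXT": pageMongodoc["full"]["wikitext"],
}

# Index into the (hypothetical) wikitext index via the REST API
requests.post("http://localhost:9200/wikitext/_doc", json=esDoc)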
Index Settings and Mappings
The index uses the following settings and mappings:
Settings:
{
  "analysis": {
    "analyzer": {
      "camel_case": {
        "type": "pattern",
        "pattern": "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])"
      },
      "my_ngram_tokenizer_analyzer": {
        "type": "custom",
        "tokenizer": "my_ngram_tokenizer",
        "filter": "lowercase"
      }
    },
    "tokenizer": {
      "my_ngram_tokenizer": {
        "type": "ngram",
        "min_gram": 3,
        "max_gram": 3
      }
    }
  }
}
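To check what these analyzers produce, the _analyze API can be called against the index. The sketch below assumes the index is named wikitext and runs locally; the expected tokens in the comments follow from the pattern analyzer (camelCase splitting, lowercased by default) and the 3-gram tokenizer.

import requests

ES = "http://localhost:9200/wikitext"  # hypothetical local index

def tokens(analyzer: str, text: str):
    """Return the tokens the given analyzer produces for the given text."""
    r = requests.post(f"{ES}/_analyze", json={"analyzer": analyzer, "text": text})
    return [t["token"] for t in r.json()["tokens"]]

# camel_case splits on camelCase and letter/digit boundaries and lowercases,
# e.g. ["has", "entity", "name"]
print(tokens("camel_case", "HasEntityName"))

# my_ngram_tokenizer_analyzer emits lowercased 3-grams, e.g. ["wik", "iki"]
print(tokens("my_ngram_tokenizer_analyzer", "Wiki"))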
Mappings:
{
  "properties": {
    "HasEntityName": {
      "type": "text",
      "analyzer": "camel_case"
    },
    "HasEntityContentTEXT": {
      "type": "text",
      "fielddata": true,
      "analyzer": "my_ngram_tokenizer_analyzer",
      "term_vector": "with_positions_offsets_payloads",
      "copy_to": [
        "completion"
      ]
    }
  }
}
Query
The query is:
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "default_field": "HasEntityName",
            "query": "{{query.params.queryString}}",
            "boost": "10"
          }
        },
        {
          "query_string": {
            "default_field": "HasEntityContentTEXT",
            "query": "{{query.params.queryString}}"
          }
        }
      ]
    }
  }
}
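The placeholder {{query.params.queryString}} is filled in by the calling application at query time. As a sketch, assuming the index is named wikitext and runs locally, the same query can be issued directly with the placeholder substituted by hand:

import requests

def search(query_string: str):
    """Run the bool/should query above with the placeholder substituted."""
    body = {
        "query": {
            "bool": {
                "should": [
                    {"query_string": {"default_field": "HasEntityName",
                                      "query": query_string,
                                      "boost": "10"}},
                    {"query_string": {"default_field": "HasEntityContentTEXT",
                                      "query": query_string}},
                ]
            }
        }
    }
    r = requests.post("http://localhost:9200/wikitext/_search", json=body)
    return [hit["_source"] for hit in r.json()["hits"]["hits"]]

print(search("wikitext"))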