Note

For ease of use and to facilitate migration of existing applications, the ElasticSearch indexing inherits Lucene’s configuration.

...

On top of that, ElasticSearch indexing brings new features and some behavioural changes.

Sub-object indexing

First, the properties of a sub-object can be added to the full text index of its parent object. This feature is limited to one level of depth.

...

  • Set the Image object as "indexed",

  • Set the "photographer" property of the image object as "Indexed" or "Indexed and stored",

  • Configure the fields to retrieve from the photographer object, such as "name", "first name", etc.

Note

There is no need to index the "photographer" object itself. This configuration makes it possible to retrieve images by photographer, but not to run a full text search directly on "photographer" objects.

Remember: sub-object indexing is limited to one level. In the previous example, if the photographer object itself has a "camera" sub-object, a full text search cannot find photos by camera type unless the camera is referenced directly from the photo.
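To make the one-level limit concrete, here is a minimal sketch of how a parent document could be flattened before indexing. It is an illustration only, not Wedia's implementation: the SUBOBJECT_FIELDS mapping, the field names and the "photographer.name" key format are hypothetical assumptions.

```python
from typing import Any

# Hypothetical configuration: which fields of each sub-object are copied into
# the parent's full text document (chosen in the "Structure" interface in Wedia).
SUBOBJECT_FIELDS: dict[str, list[str]] = {"photographer": ["name", "firstname"]}


def build_fulltext_doc(obj: dict[str, Any]) -> dict[str, Any]:
    """Flatten an object plus ONE level of its sub-objects into a flat document."""
    doc: dict[str, Any] = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            # Sub-object: copy only the configured scalar fields. Nested
            # sub-objects (a second level) are never followed.
            for field in SUBOBJECT_FIELDS.get(key, []):
                if field in value and not isinstance(value[field], dict):
                    doc[f"{key}.{field}"] = value[field]
        else:
            doc[key] = value
    return doc


image = {
    "title": "Sunset over the bay",
    "photographer": {
        "name": "Doe",
        "firstname": "Jane",
        "camera": {"type": "mirrorless"},  # second level: not indexed
    },
}
doc = build_fulltext_doc(image)
print(doc)
```

The camera type never reaches the image's document, which is why a search by camera type only works if the camera is referenced directly from the image.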

Note

Re-indexing of the full text index is automatic, so there is no need to go to the ElasticSearch administration (/admin/fulltext) to start a re-indexation manually.

Supported properties

Only text, file, image and date fields are indexed.

...

If a field of another type, such as an id, needs to be searchable, denormalize its value into the object and index that denormalized copy. For example, to find an object by its id through a full text search, it is enough to copy the id into a searchableId field, prefixed with '#'. A search for '#1234' then finds the instance with id 1234.
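A minimal sketch of the denormalization step, assuming the object is handled as a plain dictionary before it is saved. The searchable_id helper is hypothetical, and the example assumes the analyzer of the searchableId field keeps the '#' prefix.

```python
def searchable_id(object_id: int) -> str:
    """Hypothetical helper: turn an id into a '#'-prefixed text value."""
    return f"#{object_id}"


record = {"id": 1234, "title": "Sunset"}
# Store the prefixed copy in a text field (named searchableId here) so that
# a full text search for "#1234" can match it.
record["searchableId"] = searchable_id(record["id"])
print(record["searchableId"])  # #1234
```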

Note

There is no need to implement this example yourself: searching by id is a native part of the full text search syntax, which is discussed below.

Full text indexing of file contents

Wedia supports indexing the content of binaries whose "indexed" property has been checked in the "Structure" administration interface. The supported file types are:

  • doc

  • docx

  • xls

  • xlsx

  • ppt

  • pptx

  • rtf

  • txt

  • pdf

  • wxml (content of InDesign files when the Wedia instance is connected to an InDesign Server)

Language management

ElasticSearch indexing is natively multilingual: the indexing and search language can be configured independently for each property. A full text search is then analyzed field by field, in each field's indexing language, to obtain relevant results. For example, if a title field is indexed in French and titleen in English, the query is analyzed in French before it runs on title and in English before it runs on titleen. As a result, for a search on 'car', the French stopwords remove 'car' (French for "because") from the query on title, which returns no results; only the results from the English titleen field are returned.
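The per-field analysis can be pictured with a toy model: the same query is analyzed once per field, with that field's language. This is a simplified sketch, not ElasticSearch itself; the stopword lists are tiny hypothetical samples, and real language analyzers also apply stemming and other filters.

```python
# Toy stopword lists for illustration only; real ElasticSearch language
# analyzers use much larger lists and also apply stemming.
STOPWORDS = {
    "fr": {"car", "le", "la", "et"},
    "en": {"the", "and", "a"},
}

# Hypothetical mapping: indexing language of each field.
FIELD_LANG = {"title": "fr", "titleen": "en"}


def analyze(query: str, lang: str) -> list[str]:
    """Lowercase, split on whitespace and drop the language's stopwords."""
    return [t for t in query.lower().split() if t not in STOPWORDS[lang]]


def analyze_per_field(query: str) -> dict[str, list[str]]:
    """Analyze the same query once per field, in that field's language."""
    return {field: analyze(query, lang) for field, lang in FIELD_LANG.items()}


print(analyze_per_field("car"))
# {'title': [], 'titleen': ['car']}
```

With nothing left to search on title, only titleen can produce matches.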

The indexing/search language can be configured at several levels. They are listed below by decreasing priority: the first level that applies wins.

  1. If a property is tagged with elasticsearch/analyzer/[analyzer name] (e.g. elasticsearch/analyzer/simple), this analyzer is always used, regardless of language. Consult the documentation of your version of Elasticsearch for the list of available analyzers.

  2. If a property is tagged with elasticsearch/default_lang/[language code] (e.g. elasticsearch/default_lang/en), this language is used.

  3. If the property is internationalized, the language is extracted from its suffix (e.g. titreen, the English internationalization of titre, is analyzed in English).

  4. If the object is tagged with elasticsearch/default_lang/[language code], this language is used.

  5. Finally, the default indexing language configured in the administration is used.
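The priority order above can be sketched as a simple resolution function. This is an illustrative model under stated assumptions, not Wedia code: tags are plain strings, and how the language is extracted from an internationalized property's suffix is left out, so that value is passed in as the hypothetical i18n_lang parameter.

```python
from typing import Optional

ANALYZER_PREFIX = "elasticsearch/analyzer/"
LANG_PREFIX = "elasticsearch/default_lang/"


def tag_value(tags: list[str], prefix: str) -> Optional[str]:
    """Return the part after `prefix` of the first matching tag, if any."""
    for tag in tags:
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None


def resolve_analysis(
    property_tags: list[str],
    i18n_lang: Optional[str],
    object_tags: list[str],
    admin_default: str,
) -> tuple[str, str]:
    """Return ("analyzer", name) or ("lang", code), checking levels 1 to 5 in order."""
    analyzer = tag_value(property_tags, ANALYZER_PREFIX)
    if analyzer is not None:
        return ("analyzer", analyzer)  # 1. analyzer forced on the property
    lang = tag_value(property_tags, LANG_PREFIX)
    if lang is not None:
        return ("lang", lang)  # 2. default_lang tag on the property
    if i18n_lang is not None:
        return ("lang", i18n_lang)  # 3. suffix of an internationalized property
    lang = tag_value(object_tags, LANG_PREFIX)
    if lang is not None:
        return ("lang", lang)  # 4. default_lang tag on the object
    return ("lang", admin_default)  # 5. administration default


print(resolve_analysis(["elasticsearch/analyzer/simple"], "en", [], "fr"))  # ('analyzer', 'simple')
print(resolve_analysis([], "en", ["elasticsearch/default_lang/de"], "fr"))  # ('lang', 'en')
print(resolve_analysis([], None, [], "fr"))  # ('lang', 'fr')
```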

Note

Any change to this language configuration automatically triggers a re-indexation of the contents.

SKU, product reference and identifier indexation

When a property contains numbers and separators such as "_" or "/", the ElasticSearch standard analyzer removes these numbers from its search index.

To index this type of property, tag it with elasticsearch/analyzer/words_and_numbers.

...

To search on part of a SKU, for example so that your query matches both ABC-123 and ABC-124, remember to use a wildcard in your query: ABC*. A query for "ABC" only matches the exact string.
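The difference between a plain query and a wildcard query can be modelled with Python's fnmatch module. This is a simplified sketch that assumes each SKU is kept as one whole term; actual ElasticSearch matching also depends on the analyzer, lowercasing and the query parser.

```python
from fnmatch import fnmatchcase

# Simplified model: each SKU is stored as one whole term.
indexed_skus = ["ABC-123", "ABC-124", "ABD-200"]


def search(query: str) -> list[str]:
    """Exact match unless the query contains a '*' wildcard."""
    return [sku for sku in indexed_skus if fnmatchcase(sku, query)]


print(search("ABC*"))  # ['ABC-123', 'ABC-124']
print(search("ABC"))   # []
```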

Using the exact-match keyword analyzer from ElasticSearch

If you need to index the exact string of the property, you can use an existing ElasticSearch analyzer such as "keyword": https://www.elastic.co/guide/en/elasticsearch/reference/6.8/analysis-keyword-analyzer.html

To index this type of property, tag it with elasticsearch/analyzer/keyword.
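The sketch below contrasts the two behaviours on one value. keyword_analyzer mirrors what the ElasticSearch "keyword" analyzer does (the whole input becomes a single, unchanged token). split_on_separators is a hypothetical, simplified stand-in for a tokenizing analyzer and does not reproduce the exact rules of the standard analyzer.

```python
import re


def keyword_analyzer(text: str) -> list[str]:
    """Mimic the ElasticSearch "keyword" analyzer: the whole input is one token."""
    return [text]


def split_on_separators(text: str) -> list[str]:
    """Simplified stand-in for a tokenizing analyzer: lowercase and split on
    any non-alphanumeric character (not the exact standard analyzer rules)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


print(keyword_analyzer("ABC-123/X_9"))     # ['ABC-123/X_9']
print(split_on_separators("ABC-123/X_9"))  # ['abc', '123', 'x', '9']
```

With the keyword analyzer, the index holds a single token for the complete value, so only that complete value can match it.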