300 Indexation

Note

For ease of use and to facilitate the migration of existing applications, ElasticSearch indexing reuses the existing Lucene indexing configuration.

Object properties marked as "indexed" or "indexed and stored" are searchable by full text search. All other properties are excluded.

In the back-office, full text search is only available for objects marked as "indexed".

To sum up: for full text searches to be possible on the title and text fields of an article object, the article object must be configured as "indexed" and its "title" and "text" properties must be configured as "indexed" or "indexed and stored".
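The two-part rule above can be written as a small predicate. This is a minimal sketch, not Wedia's implementation: the function `is_full_text_searchable` and the string labels used for the indexing options are hypothetical. It only models the object/property configuration and ignores property types, which are restricted separately.

```python
# Hypothetical labels for the indexing options of a property.
SEARCHABLE_MODES = {"indexed", "indexed and stored"}


def is_full_text_searchable(object_indexed: bool, property_mode: str) -> bool:
    """A property is searchable only if its object is marked "indexed"
    AND the property itself is "indexed" or "indexed and stored"."""
    return object_indexed and property_mode in SEARCHABLE_MODES


# Article example: both conditions must hold.
print(is_full_text_searchable(True, "indexed"))              # True
print(is_full_text_searchable(False, "indexed and stored"))  # False
print(is_full_text_searchable(True, "not indexed"))          # False
```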

Beyond this, ElasticSearch indexing brings new features and some behavioural changes.

Sub-object indexing

First, the properties of sub-objects can be added to the full text index of their parent object. This feature is limited to one level of depth.

For example, if the "Image" object has a property of type 'child', 'childmultiXXX' or 'collection' pointing to another object, 'Photographer', images taken by a given photographer can be found via full text search. To configure this behaviour:

Set the Image object as "indexed",
Set the "photographer" property of the Image object as "indexed" or "indexed and stored",
Configure the fields to retrieve from the Photographer object, such as "name", "first name", etc.
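To make the one-level rule concrete, here is a minimal sketch of how a parent object could be flattened before indexing. It assumes objects are plain Python dicts; `build_index_document`, the `"photographer.name"` field naming and the sample data are hypothetical and do not describe Wedia's actual index layout. A value that is itself an object (such as a camera referenced by the photographer) is skipped, which mirrors the one-level limit.

```python
from __future__ import annotations

from typing import Any


def build_index_document(
    parent: dict[str, Any],
    indexed_props: list[str],
    subobject_fields: dict[str, list[str]],
) -> dict[str, Any]:
    """Flatten a parent object and its directly referenced sub-objects.

    indexed_props: parent properties marked "indexed" or "indexed and stored".
    subobject_fields: for each sub-object property, the fields to copy from
    the referenced object(s), e.g. {"photographer": ["name", "first name"]}.
    """
    doc: dict[str, Any] = {}
    for prop in indexed_props:
        value = parent.get(prop)
        if prop in subobject_fields:
            # child -> one object; childmultiXXX / collection -> a list of objects.
            if isinstance(value, list):
                targets = value
            elif value:
                targets = [value]
            else:
                targets = []
            for name in subobject_fields[prop]:
                # One level only: values that are themselves objects (or lists)
                # are skipped, so nested sub-objects are never followed.
                doc[f"{prop}.{name}"] = [
                    t[name] for t in targets
                    if name in t and not isinstance(t[name], (dict, list))
                ]
        elif value is not None:
            doc[prop] = value
    return doc


image = {
    "title": "Sunset",
    "photographer": {"name": "Doe", "first name": "Jane",
                     "camera": {"model": "X100"}},
}
print(build_index_document(
    image, ["title", "photographer"],
    {"photographer": ["name", "first name", "camera"]},
))
# {'title': 'Sunset', 'photographer.name': ['Doe'], 'photographer.first name': ['Jane'], 'photographer.camera': []}
```

Note that the Photographer object itself never needs its own entry in the index: its fields only exist as copies inside the Image document.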

Note

The Photographer object itself does not need to be indexed. This configuration makes it possible to retrieve images by photographer, but not to run a full text search directly on Photographer objects.

Remember: sub-object indexing is limited to one level. Continuing the previous example, if the Photographer object has a child property pointing to a Camera object, a full text search cannot find photos taken with a given type of camera, unless the camera is referenced directly from the Image object.

Note

Re-indexing of the full text index is automatic, so there is no need to start a re-indexation manually from the ElasticSearch administration (/admin/fulltext).

Supported properties

Only text, file, image and date fields are indexed.

Fields of type id, integer, decimal and money are not taken into account in full text searches, even if they are configured as "indexed".

Excluding these property types keeps the full text index free of values that are irrelevant to full text searches.
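The type restriction combines with the "indexed" flag: a property reaches the full text index only if both conditions hold. A minimal sketch, assuming property types are available as plain string labels; `full_text_properties` and the labels are hypothetical.

```python
from __future__ import annotations

# Property types that feed the full text index, per this page.
# id, integer, decimal and money are deliberately absent: they are
# ignored even when marked "indexed".
FULL_TEXT_TYPES = {"text", "file", "image", "date"}


def full_text_properties(props: dict[str, str], indexed: set[str]) -> list[str]:
    """Return the properties that actually reach the full text index.

    props maps property name -> property type label;
    indexed is the set of properties marked "indexed" or "indexed and stored".
    """
    return [name for name, ptype in props.items()
            if name in indexed and ptype in FULL_TEXT_TYPES]


props = {"title": "text", "id": "id", "price": "money", "created": "date"}
print(full_text_properties(props, {"title", "id", "price", "created"}))
# ['title', 'created']
```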

If a field of one of these types, such as id, must be searchable, denormalize its value into another field of the object and index that field instead. For example, to find an object by its id via a full text search, denormalize the id into a searchableId field, prefixed with '#'. Searching for '#1234' then finds the instance with id 1234.
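A minimal sketch of that denormalization, assuming the object is represented as a dict with an `id` key. `searchableId` is the field name from the example above; `denormalize` is a hypothetical helper, and when (on save, in a batch job, etc.) the copy would be refreshed is not covered here.

```python
def denormalize(record: dict) -> dict:
    """Return a copy of record with a text field holding the id prefixed by '#'.

    The numeric id itself is ignored by full text search, but the
    searchableId text field can be marked "indexed".
    """
    out = dict(record)
    out["searchableId"] = f"#{record['id']}"
    return out


print(denormalize({"id": 1234, "title": "Sunset"}))
# {'id': 1234, 'title': 'Sunset', 'searchableId': '#1234'}
```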

Note

You do not need to implement the above example: finding an object by its id is a native part of the full text search syntax, discussed below.

Full text indexing of file contents

Wedia can index the content of binary files stored in properties whose "indexed" option has been checked in the "Structure" administration interface. The supported file types are:

doc
docx
xls
xlsx
ppt
pptx
rtf
txt
pdf
wxml (content of InDesign files when the Wedia instance is connected to an InDesign Server)
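The list above can be turned into a quick pre-check. This is a minimal sketch under an assumption not stated on this page: that the file type is determined from the filename extension (Wedia may instead inspect the content or MIME type). `content_is_indexed` is a hypothetical helper.

```python
from pathlib import Path

# Extensions whose content is extracted for full text indexing (list above).
# wxml is only extracted when the instance is connected to an InDesign Server.
CONTENT_INDEXED_EXTENSIONS = {
    "doc", "docx", "xls", "xlsx", "ppt", "pptx", "rtf", "txt", "pdf", "wxml",
}


def content_is_indexed(filename: str, property_indexed: bool) -> bool:
    """True if the file's content should reach the full text index: the file
    property must be marked "indexed" and the extension must be supported."""
    ext = Path(filename).suffix.lower().lstrip(".")
    return property_indexed and ext in CONTENT_INDEXED_EXTENSIONS


print(content_is_indexed("Report.PDF", True))  # True
print(content_is_indexed("photo.jpg", True))   # False
```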

Language management

ElasticSearch indexing is natively multilingual: the indexing and search language can be configured independently for each property. A full text search is then analyzed field by field, in the indexing language of each field, to obtain relevant results. For example, if the title field is indexed in French and titleen in English, the query is analyzed in French before being run on title, and in English before being run on titleen. Searching for 'car' therefore returns only English results: 'car' (French for "because") is a French stop word, so it is removed from the query on title, which returns nothing, while the English analysis of titleen keeps it.
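The per-field behaviour can be illustrated with a toy analyzer. This sketch only shows the principle of analyzing the same query once per field in that field's language: the stop-word lists are tiny hypothetical samples, not the lists used by Elasticsearch's language analyzers, which also apply stemming and other steps.

```python
from __future__ import annotations

# Toy stop-word lists for illustration only.
STOP_WORDS = {
    "fr": {"car", "et", "le", "la"},
    "en": {"and", "the", "a"},
}

# Indexing language of each field, as in the title / titleen example.
FIELD_LANG = {"title": "fr", "titleen": "en"}


def analyze(query: str, lang: str) -> list[str]:
    """Lowercase, split on whitespace and drop the language's stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS[lang]]


def per_field_terms(query: str) -> dict[str, list[str]]:
    """Analyze the same query once per field, in that field's language."""
    return {name: analyze(query, lang) for name, lang in FIELD_LANG.items()}


print(per_field_terms("car"))
# {'title': [], 'titleen': ['car']}
```

An empty term list for title means that field cannot match, so only documents matching on titleen are returned.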

The indexing/search language can be configured at several levels. They are listed below in decreasing order of priority: the first rule that applies wins.

If a property has an elasticsearch/analyzer/[analyzer name] tag (e.g. elasticsearch/analyzer/simple), that analyzer is always used, regardless of language. Consult the documentation of your version of Elasticsearch for the available analyzers.
If a property has an elasticsearch/default_lang/[language code] tag (e.g. elasticsearch/default_lang/en), that language is used.
If the property is an internationalization, the language is taken from its suffix (e.g. titleen, the English internationalization of title, is analyzed in English).
If the object has an elasticsearch/default_lang/[language code] tag, that language is used.
Otherwise, the default indexing language configured in the administration is used.
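The priority order above can be sketched as a resolver that tries each rule in turn. This is a minimal sketch, not Wedia's code: the `Prop` class, its `i18n_of` attribute (the base property an internationalization belongs to) and `resolve_analysis` are hypothetical; only the tag prefixes come from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

ANALYZER_TAG = "elasticsearch/analyzer/"
LANG_TAG = "elasticsearch/default_lang/"


@dataclass
class Prop:
    name: str
    tags: list[str] = field(default_factory=list)
    # Hypothetical: base property name when this property is an
    # internationalization of it (e.g. "title" for "titleen").
    i18n_of: Optional[str] = None


def tag_value(tags: list[str], prefix: str) -> Optional[str]:
    """Return the part after prefix of the first matching tag, if any."""
    for tag in tags:
        if tag.startswith(prefix) and len(tag) > len(prefix):
            return tag[len(prefix):]
    return None


def resolve_analysis(prop: Prop, object_tags: list[str],
                     default_lang: str) -> tuple[str, str]:
    """Apply the rules in decreasing priority; the first match wins.
    Returns ("analyzer", name) or ("lang", code)."""
    # 1. Forced analyzer on the property, regardless of language.
    analyzer = tag_value(prop.tags, ANALYZER_TAG)
    if analyzer:
        return ("analyzer", analyzer)
    # 2. Explicit language tag on the property.
    lang = tag_value(prop.tags, LANG_TAG)
    if lang:
        return ("lang", lang)
    # 3. Internationalized property: language taken from the name suffix.
    if prop.i18n_of and prop.name.startswith(prop.i18n_of):
        suffix = prop.name[len(prop.i18n_of):]
        if suffix:
            return ("lang", suffix)
    # 4. Default language tag on the object.
    lang = tag_value(object_tags, LANG_TAG)
    if lang:
        return ("lang", lang)
    # 5. Default indexing language from the administration.
    return ("lang", default_lang)


print(resolve_analysis(Prop("titleen", i18n_of="title"),
                       ["elasticsearch/default_lang/de"], "fr"))
# ('lang', 'en')
```

Because each rule returns as soon as it matches, an object-level tag never overrides a property-level tag or an internationalization suffix.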

Note

Any modification of this language information automatically leads to a re-indexation of the contents.