Metadata and AI extraction with DAM_Utils

General concepts of Metadata extraction and AI setup.

When inserting a new asset, Wedia can extract and populate metadata from the asset itself, or from an AI analysis of the asset.

This process is handled by the PACKAGED_DAM_Utils plugin. In its configuration, you can set up automatic rules that feed your metadata with information coming from the asset.

Some top-level parameters are needed to define which data objects are part of the processing, which property actually holds the asset binary, and the extraction / transformation rules.

This plugin also performs other tasks such as :

  • activating the creative review on DAM Assets

  • restricting which variations are displayed in the file view…

Plugin Parameters

  • dam_objects_selector (String) : A selector for the data structures that should be processed. Typical setups use either object names (damimport,transfer,massimportitem) or a tag prefixed by a # (“#damobject”).

  • dam_resource_field_selector (String) : Selector for the property that stores the binary file : ‘binary’

  • async_denorm_thread_pool_size (int): since 2021.4 Maximum number of threads for asynchronous operations

  • dam_denormalization_config (JSON) : A long JSON that describes the extractions and transformations applied to an asset.

  • enable_async_transforms (Boolean) : Activates the Async transformations

  • dam_denormalization_force_update (Boolean) : Activates the transformations at each save of the asset.

When set to true, this property overwrites the metadata of an existing asset with IPTC and EXIF information. This is a frequent source of issues in development setups.

 

Other options :

  • dam_enable_creareview_objects_selector (String) : In some cases, you would like to activate the “Comment” feature coming from the Creative Review for DAM assets. This is the data structure selector if you want to activate this feature. The dam_resource_field_selector will be used to find the binary property.

     

  • dam_fileview_config (JSON) : Most often, you do not want all variations to be visible in the FileView : this JSON property selects which variations are displayed for each role.

  • dam_denorm_buttons (JSON) : If you need to show a button to manually trigger the metadata extraction, this JSON defines for which assets and which roles it shows up in the toolbar.

  • dam_denorm_buttons_bypass_security (Boolean) : Bypasses the security rules when displaying the “Extract Metadata” button.

Setting Up metadata Extraction.

Understanding selectors.

Selectors can reference data structures or properties based on their names or tags.

A selector can be a comma separated list.

A selector can be set up with a data structure name, a property name, or a tag attached to the data structure or the field, written with a # prefix.

Sample :

A structure selector such as #damobject,damimport will select all the objects named damimport, or all objects with a tag damobject attached.

How metadata extract works.

The plugin relies on the AbstractObjetcTriggerBusinessServices extension points offered by the Engine API to trigger the automatic metadata extraction based on the plugin configuration.

Creating :

When an object instance whose structure name matches dam_objects_selector is created, the following process is triggered :

  • On the hook onBeforeInsert_beforeValidate, the synchronous transforms are performed. The updated properties are therefore available to the following hooks. If you implement other triggers on these objects, use a hook that runs after onBeforeInsert_beforeValidate. If you must use onBeforeInsert_beforeValidate and your process needs the results of the plugin's extractions, set the priority order of the plugins in the plugin interface.

  • On the hook onAfterInsert_after, the async transforms are started, in a separate process. The data instance is unlocked immediately. When the async process is over, the data instance is updated without triggering any hook.

Updating :

When an object instance whose structure name matches dam_objects_selector is updated, the following process is triggered if the property selected by dam_resource_field_selector is modified, or if the option dam_denormalization_force_update is set to true :

  • On the hook onBeforeUpdate_beforeValidate, the synchronous transforms are performed. The updated properties are therefore available to the following hooks. If you implement other triggers on these objects, use a hook that runs after onBeforeUpdate_beforeValidate. If you must use onBeforeUpdate_beforeValidate and your process needs the results of the plugin's extractions, set the priority order of the plugins in the plugin interface.

  • On the hook onAfterUpdate_after, the async transforms are started, in a separate process. The data instance is unlocked immediately. When the async process is over, the data instance is updated without triggering any hook.

Setting up the transformations

The value to be applied to each property of an instance is defined by executing a chain of transformations. Each transformation receives a context object that gives access to the current user, the modified object, and the previous object (in case of an update).

Transformations can also obtain the result of nested transformations. This architecture keeps each transformation class very simple and lets you cover different needs by chaining transformations.

The JSON configuration object

The configuration of the transformation rules is done by defining a JSON object. The plugin uses the following properties:

aliases

An object that defines aliases for transformation classes.

Aliases are here for convenience: rather than repetitively writing the full class name, it is easier to just use the class alias.

For each entry in this object, the key is the name of the alias. The value can be:

  • A string : the fully qualified class name : “the.complete.name.of.the.transformation.class”

  • Since 11.25 An array of 1 or 2 strings : [“the.complete.name.of.the.transformation.class”, “ThePluginNameThatHoldsTheTransformationClass”]. If the plugin name is omitted, the class is loaded from the current plugin.

  • Since 11.25 An object with the following properties:

    • class a String : the fully qualified class name : “the.complete.name.of.the.transformation.class”

    • plugin a String : the plugin name that holds the class.
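As a sketch of the three alias forms listed above, the fragment below declares one alias of each kind. The class com.example.transformers.MyTransformer, the plugin name MY_CUSTOM_PLUGIN and the alias names myArrayAlias and myObjectAlias are hypothetical placeholders used only for illustration.

```json
{
  "aliases": {
    "resourceProperty": "com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToPropertyTransformer",
    "myArrayAlias": ["com.example.transformers.MyTransformer", "MY_CUSTOM_PLUGIN"],
    "myObjectAlias": {
      "class": "com.example.transformers.MyTransformer",
      "plugin": "MY_CUSTOM_PLUGIN"
    }
  }
}
```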

presets

An array that defines common transformation chains.
The choice of an array definition may seem impractical, but this structure guarantees an orderly definition of presets. It is thus possible to reference another preset in a preset as long as the referenced preset has been defined before in the array.

Each entry in the array is a JSON object describing a preset.

  • key (required) : the name to give to the preset

  • class or classAlias : respectively the full name of the class or the name of the class alias to be used (one of the two is mandatory. class has priority)

  • init (optional) a JSON element (free) to initialize the transform

  • input (optional) a JSON object or a JSON array to define the transformation(s) that will be used as input.
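The fragment below is a minimal sketch of ordered presets, assuming the property descriptions above: each preset only references presets declared earlier in the array, and the last one uses init to pass a tag name. The preset keys (getBinary, getDimensions, getArtist) are illustrative names, not predefined ones.

```json
{
  "presets": [
    {
      "key": "getBinary",
      "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.PropertyToResourceTransformer"
    },
    {
      "key": "getDimensions",
      "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToDimensionTransformer",
      "input": { "preset": "getBinary" }
    },
    {
      "key": "getArtist",
      "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.DimensionToTagValueTransformer",
      "init": "exif:artist",
      "input": { "preset": "getDimensions" }
    }
  ]
}
```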

transformations

A JSON array describing the transformations to be carried out. Each entry in the array is an object with the following properties :

  • objectSelector : a String selector of the objects to which the transformations apply

  • propertiesTransforms an array to define transformations on properties. Each entry of the array is an object containing 2 properties :

    • fieldSelector: a String selector of the property to which the transformations are applied.

    • transformers the array of transformations to be applied. Each entry in this array is an object describing either a preset or a transformation

      • A transformation is defined with following properties:

        • class or classAlias, the full name of the class or the alias name of the class to be used (one of the two is mandatory. class has priority)

        • init (optional) a JSON element (free) to initialize the transform

        • input (optional) a JSON object or a JSON array to define the transformation(s) that will be used as input.

      • A preset is defined with following properties:

        • preset, the name of the preset.

  • Since 11.26 workflowTrigger (optional) a string or an array of strings giving the ordered list of workflow actions that can be run after the transformations. Only the first valid action is run.

  • Since 11.27 preventGuard (optional) a preset or transformation resolving to a boolean. Makes the execution of propertiesTransforms and/or workflowTrigger conditional on the result of this guard.
    If the guard is not defined, OR the guard does not resolve to a Boolean, OR the resolved value is false, propertiesTransforms are run and workflowTrigger is executed.
    If the guard resolves to true OR the guard execution throws an Exception, none of the propertiesTransforms are run and workflowTrigger is not executed.
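The following sketch shows where workflowTrigger and preventGuard could sit in a transformations entry, assuming they are siblings of objectSelector as the list above suggests. The tag name xmp:skipextraction and the workflow action names publish and submit are hypothetical. With the transformers behaving as documented in this page, an asset whose tag holds "true" would skip both the property transforms and the workflow actions, while an asset without the tag would resolve to a non-Boolean value and be processed normally.

```json
{
  "transformations": [
    {
      "objectSelector": "#damobject",
      "preventGuard": {
        "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.StringToBooleanTransformer",
        "input": {
          "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.DimensionToTagValueTransformer",
          "init": "xmp:skipextraction",
          "input": {
            "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToDimensionTransformer",
            "input": {
              "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.PropertyToResourceTransformer"
            }
          }
        }
      },
      "propertiesTransforms": [
        {
          "fieldSelector": "filesize",
          "transformers": [
            {
              "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToPropertyTransformer",
              "init": "filesize",
              "input": {
                "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.PropertyToResourceTransformer"
              }
            }
          ]
        }
      ],
      "workflowTrigger": ["publish", "submit"]
    }
  ]
}
```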

asyncTransformations

Identical to transformations. The transformations described in this section are performed asynchronously.

manualTransformations Since 11.25

Identical to transformations. The transformations described in this section are performed only by a manual call.

Example

{
  "aliases": {
    "resourceFromFieldPattern": "com.wedia.packaged.dam.triggers.datatransformers.impl.PropertyToResourceTransformer",
    "resourceToDimension": "com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToDimensionTransformer",
    "dimensionProperty": "com.wedia.packaged.dam.triggers.datatransformers.impl.DimensionToPropertyTransformer",
    "resourceProperty": "com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToPropertyTransformer"
  },
  "presets": [
    { "key": "getBinary", "classAlias": "resourceFromFieldPattern" },
    { "key": "getDimensions", "classAlias": "resourceToDimension", "input": { "preset": "getBinary" } }
  ],
  "transformations": [
    {
      "objectSelector": "#damobject,damimport",
      "propertiesTransforms": [
        {
          "fieldSelector": "filesize",
          "transformers": [
            { "classAlias": "resourceProperty", "init": "filesize", "input": { "preset": "getBinary" } }
          ]
        }
      ]
    }
  ],
  "asyncTransformations": [
    {
      "objectSelector": "#damobject,damimport",
      "propertiesTransforms": [
        {
          "fieldSelector": "width",
          "transformers": [
            { "classAlias": "dimensionProperty", "init": "pxwidth", "input": { "preset": "getDimensions" } }
          ]
        },
        {
          "fieldSelector": "height",
          "transformers": [
            { "classAlias": "dimensionProperty", "init": "pxheight", "input": { "preset": "getDimensions" } }
          ]
        }
      ]
    }
  ]
}

In the example above,

  • we declare 4 aliases, so we can write our presets or transformations by referencing the aliases rather than the full names of the classes. Thus, the alias resourceFromFieldPattern references the class "com.wedia.packaged.dam.triggers.datatransformers.impl.PropertyToResourceTransformer".

  • 2 presets are declared:

    • getBinary, which calls the transformation referenced by the alias resourceFromFieldPattern

    • getDimensions, which calls the transformation referenced by the alias resourceToDimension, using the result of the preset getBinary as input

  • One synchronous transformation is declared: it populates the filesize property from the binary.

  • Two asynchronous property transformations are declared: they populate the width and height properties from the asset dimensions.

Out of the box Transformations

This is the standard list of the Transformations offered by the plugin :

  • booleanToActivated

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.BooleanToActivatedTransformer

    • init : none

    • input : a transformation returning java.lang.Boolean

    • output : java.lang.String Value for a property of type child and nature activated (true → "1", false → "2"). If the input is not a Boolean, it is parsed as a Boolean (if the input is "true", the output is "1"). If the input is null, null is returned (and can be processed by the caller).

  • blurHash

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.BlurhashTransformer

    • init : org.json.JSONObject (optional) may contain :

      • baseVariations : String or array of strings ; the base variations used to compute the hash. By default: ["photo", "poster", "thumbnailtiny"]

      • components: Integer or array of Integer, the number of components to be included in the hash. If an array is provided, the first one represents the number of components on the X axis, the second one on the Y axis. Default: 4

    • output : java.lang.String the asset blurhash
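As an illustration of the init options above, here is a sketch of a property transform that stores a blurhash computed from a single base variation with 4 components on the X axis and 3 on the Y axis. The target property name blurhash is an assumption, not a field that necessarily exists in your data model.

```json
{
  "fieldSelector": "blurhash",
  "transformers": [
    {
      "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.BlurhashTransformer",
      "init": {
        "baseVariations": ["thumbnailtiny"],
        "components": [4, 3]
      }
    }
  ]
}
```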

  • dimensionProperty

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.DimensionToPropertyTransformer

    • init : java.lang.String (required)

    • input : a transformation returning a com.noheto.image.interfaces.IDimension object (cf resourceToDimension)

    • output : Depending on the init value :

      • pxwidth : java.lang.Long the width in pixels (com.noheto.image.interfaces.IDimension.getPixelWidth)

      • pxheight : java.lang.Long the height in pixels (com.noheto.image.interfaces.IDimension.getPixelHeight)

      • duration : java.lang.Double the duration in seconds (com.noheto.image.interfaces.IDimension.getDurationVideo)

      • colorspace : java.lang.String the color space (com.noheto.image.interfaces.IDimension.getColorspace)

      • ratio : java.lang.Double the image ratio or -1 if width or height <= 0

      • xdpi : java.lang.Integer pixel density on the X axis (com.noheto.image.interfaces.IDimension.getXDpi)

      • ydpi : java.lang.Integer pixel density on the Y axis (com.noheto.image.interfaces.IDimension.getYDpi)

  • dimensionTag

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.DimensionToTagValueTransformer

    • init : java.lang.String (required), the name of the tag to return (ex: “exif:artist”)

    • input : a transformation returning a com.noheto.image.interfaces.IDimension object (cf resourceToDimension)

    • output : java.lang.String the value of the tag passed in init. Returns null if the value of the tag is empty or null.

  • dimensionMultiTag

    • class: com.wedia.packaged.dam.triggers.datatransformers.impl.DimensionToMultiTagValue

    • init: java.lang.String (required), the name of the tag to return (ex: “iptc:2:25”)

    • input: a transformation returning a com.noheto.image.interfaces.IDimension object (cf resourceToDimension)

    • output : java.util.List<java.lang.String> containing the value of the tag as well as those of tags with the same prefix (ex: “iptc:2:25_1”, “iptc:2:25_2”, …)

  • duplicatesFinder

    • class: com.wedia.packaged.dam.triggers.datatransformers.impl.DuplicatesFinderTransformer

    • init: org.json.JSONObject or java.lang.String or null

      • if a JSONObject, valid properties are:

        • objects: java.lang.String a selector on structures to check (default: #damobject)

        • properties: java.lang.String a comma-separated list of properties to check (default: phdiff)

        • Since 11.28 properties can be defined as a

          • JSONObject. Each key is treated as a fieldSelector. Value is a boolean defining if a proximity search should be done on this field

          • JSONArray, each value can be

            • a String

            • a JSONObject (as above)

          • default value Since 11.28 is: {"phavg,phdiff": true, "sha": false}

      • if a String, the value will define properties to check

    • input: none

    • output: A JSONArray containing found duplicates. Each item in the array is a JSON object containing:

      • $resource: String, name of the structure having a duplicate of current instance

      • id: Long, id of instance in the structure
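A possible use of duplicatesFinder, sketched with the JSONObject form of init described above (the properties value shown is the documented Since 11.28 default, written out explicitly). The target property name duplicates is hypothetical and would need to be able to store the resulting JSONArray.

```json
{
  "fieldSelector": "duplicates",
  "transformers": [
    {
      "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.DuplicatesFinderTransformer",
      "init": {
        "objects": "#damobject",
        "properties": { "phavg,phdiff": true, "sha": false }
      }
    }
  ]
}
```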

  • parseExifDate

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.ExifToDateTransformer

    • init : java.lang.String name of the tag to read (don't forget the prefix "exif:")

    • input : a transformation returning a com.noheto.image.interfaces.IDimension object (cf resourceToDimension)

    • output : java.util.Date if the parse did not fail. Otherwise, the input is returned.
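The sketch below chains the binary, its dimensions and parseExifDate to fill a date property. Both the tag name exif:datetimeoriginal and the target property shootingdate are assumptions made for illustration; check the tag names actually exposed for your assets before reusing it.

```json
{
  "fieldSelector": "shootingdate",
  "transformers": [
    {
      "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.ExifToDateTransformer",
      "init": "exif:datetimeoriginal",
      "input": {
        "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToDimensionTransformer",
        "input": {
          "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.PropertyToResourceTransformer"
        }
      }
    }
  ]
}
```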

  • parseBoolean

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.StringToBooleanTransformer

    • init : none

    • input : a transformation returning a java.lang.String

    • output : the result of a Boolean.parseBoolean on the String passed in parameter. Returns the input if the input is not a String

  • resourceFromFieldPattern

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.PropertyToResourceTransformer

    • init : java.lang.String if null or empty, we use the value of the dam_resource_field_selector parameter of the plugin

    • input : none

    • output : com.noheto.remote.interfaces.IResourceDimensionExtended obtained using the first field which is validated by the selector in init and which is a file or image type

  • resourceToDimension

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToDimensionTransformer

    • init : none

    • input : a transformation returning a com.noheto.remote.interfaces.IResourceDimensionExtended (cf resourceFromFieldPattern)

    • output : com.noheto.image.interfaces.IDimension the IDimension object obtained from the input resource

  • resourceProperty

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToPropertyTransformer

    • init : java.lang.String (required)

    • input : a transformation returning a com.noheto.remote.interfaces.IResourceDimensionExtended (cf resourceFromFieldPattern)

    • output : Depending on the init value :

      • filesize : java.lang.Long the size of the resource in bytes (com.noheto.remote.interfaces.IResourceDimensionExtended.getSize)

      • contenttype : java.lang.String the content-type of the resource (com.noheto.remote.interfaces.IResourceDimensionExtended.getContentType)

      • pagecount : java.lang.Integer the number of pages in the resource (com.noheto.image.interfaces.IImaging.getNumberOfPages)

      • imagehash : java.lang.String the hash of the image (com.noheto.image.interfaces.IImaging.getHashImage)

      • sha256 : java.lang.String the sha256 of the resource (beware this processing is expensive)

      • phavg : java.lang.Long the average perceptual hash of the resource (com.noheto.image.interfaces.IImaging.getHashImageAverage)

      • phdiff : java.lang.Long the difference perceptual hash of the resource (com.noheto.image.interfaces.IImaging.getHashImageDifference)

  • asChild

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.StringToChild

    • init :

      • java.lang.String the property that will host the value to create in a Wedia structure (by default "name")

      • Since 11.25, it is possible to pass a JSON object which will be extended with the following values:

        • lookupProperty (default to: “name”) the name of the property that will host the value to create in a Wedia structure.

        • createIfNotExist (default to: true) whether to create the instance if it is not found

        • withoutSecurity (default to false) whether to bypass security when creating the target instance (the surfer at the origin of the denormalization is used)

        • withoutTriggers (default to false) whether to ignore the triggers

    • input : A transformation returning java.lang.String or a java.util.Collection<java.lang.String>
      Since 11.25 also accepts :

      • com.wedia.packaged.utils.i18n.ITranslated

      • com.wedia.packaged.utils.i18n.ITranslatable

      • java.util.Collection<com.wedia.packaged.utils.i18n.ITranslated>

      • java.util.Collection<com.wedia.packaged.utils.i18n.ITranslatable>
        which allow target instances to be created in different locales (i18n fields management)

    • output : wsnoheto.engine.IObjectReadOnly or java.util.List<wsnoheto.engine.IObjectReadOnly>. For each value of the input, searches, in the nature of the transformed property, for the instance whose property passed in init has this value. If not found, creates the instance (with the current user).
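To illustrate asChild with the JSON form of init, the sketch below feeds it the list of values returned by dimensionMultiTag for the tag iptc:2:25, so that each value is looked up (and created if missing) in the structure targeted by the property. The property name keywords is hypothetical; the init values shown are the documented defaults, written out for readability.

```json
{
  "fieldSelector": "keywords",
  "transformers": [
    {
      "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.StringToChild",
      "init": {
        "lookupProperty": "name",
        "createIfNotExist": true,
        "withoutSecurity": false,
        "withoutTriggers": false
      },
      "input": {
        "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.DimensionToMultiTagValue",
        "init": "iptc:2:25",
        "input": {
          "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToDimensionTransformer",
          "input": {
            "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.PropertyToResourceTransformer"
          }
        }
      }
    }
  ]
}
```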

  • resourcePath

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.SelectorToPath

    • init : java.lang.String (optional Since 11.25) if null or empty, the value of the dam_resource_field_selector parameter of the plugin is used.

    • input : none

    • output : java.lang.String representing the server path to the resource

  • regexExtract

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.RegExpPatternMatcher

      • .matches() is used in the Java service, so the entire input must match the regular expression.

      • The regular expression should contain a capturing group “( )” and, as noted above, must cover the entire input (for example by using .*).

        • Example : Regex to capture the first 7 characters of your input : "^(.{7}).*"

    • init : java.lang.String (a regular expression) or an org.json.JSONObject that can define :

      • regexp : java.lang.String a regular expression

      • outputNullIfNoMatch : boolean (default false)

      • outputFirstNotEmptyGroup : boolean (default false)

    • input : a transformation returning java.lang.String

    • output : Checks the regular expression on the input. Depending on the options :

      • If the regular expression does not match, returns null if outputNullIfNoMatch is true, returns the input otherwise

      • If the regular expression matches

        • returns the value of the first non-null, non-empty capturing group if outputFirstNotEmptyGroup is true (group 0 excluded)

        • returns the value of group 1 otherwise
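A sketch of regexExtract using the JSONObject form of init and the example expression from this section. It would keep the first 7 characters of the exif:artist tag and return null when the value is shorter than 7 characters, since the whole input must match. The target property artistcode is a hypothetical name.

```json
{
  "fieldSelector": "artistcode",
  "transformers": [
    {
      "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.RegExpPatternMatcher",
      "init": {
        "regexp": "^(.{7}).*",
        "outputNullIfNoMatch": true
      },
      "input": {
        "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.DimensionToTagValueTransformer",
        "init": "exif:artist",
        "input": {
          "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.ResourceToDimensionTransformer",
          "input": {
            "class": "com.wedia.packaged.dam.triggers.datatransformers.impl.PropertyToResourceTransformer"
          }
        }
      }
    }
  ]
}
```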

 

  • relFinder

    • class : com.wedia.packaged.dam.triggers.datatransformers.impl.KeyToChildTransformer