Preprocessing

What is preprocessing?

Preprocessing in larod can be used to process input data so that it has the format, size and shape a neural network expects. For optimal performance the processing operations can be offloaded to specialized preprocessing hardware. Each supported preprocessing hardware accelerator is exposed to applications through a larodDevice struct in larod.

Currently only image processing operations are supported.

Preprocessing job configuration

Preprocessing jobs are configured with key-value parameters in a larodMap. The configuration describes the input data you have and the output data you want. The selected backend will crop, scale and convert as the description requires.

Below is an example job configuration. It describes a job that takes a 1280x720 input image in NV12 format, crops out 200x200 from the center of the image (X offset 540 and Y offset 260), converts it to the RGB interleaved format and scales it down to 48x48. In this case the libyuv backend will be used to perform these operations. In the interest of brevity error handling has been omitted.

larodMap* modelParams = larodCreateMap(NULL);
larodMapSetStr(modelParams, "image.input.format", "nv12", NULL);
larodMapSetIntArr2(modelParams, "image.input.size", 1280, 720, NULL);
larodMapSetStr(modelParams, "image.output.format", "rgb-interleaved", NULL);
larodMapSetIntArr2(modelParams, "image.output.size", 48, 48, NULL);
// Our modelParams larodMap replaces the model fd as a model description.
larodModel *model = larodLoadModel(conn, -1,
                                   larodGetDevice(conn, "cpu-proc", 0, NULL),
                                   LAROD_ACCESS_PRIVATE, "model-name",
                                   modelParams, NULL);
larodMap* jobParams = larodCreateMap(NULL);
// We can change the value of "image.input.crop" in our map between running jobs
// on the same model if we wish to.
larodMapSetIntArr4(jobParams, "image.input.crop", 540, 260, 200, 200, NULL);
larodJobRequest* jobReq = larodCreateJobRequest(...);
larodSetJobRequestParams(jobReq, jobParams, NULL);
larodRunJob(conn, jobReq, NULL);

Note that to just scale the original image (from 1280x720 to 48x48) without cropping it first, one can simply omit the larodMap from the larodJobRequest altogether.
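The crop offsets in the example (X offset 540, Y offset 260) follow from centering a 200x200 window in a 1280x720 image. The arithmetic can be sketched as below. The struct and the helper center_crop() are hypothetical names for illustration and are not part of larod.

```c
#include <stdbool.h>

/* Crop window in the order used by "image.input.crop": x, y, width, height. */
struct crop_window {
    int x;
    int y;
    int w;
    int h;
};

/*
 * Hypothetical helper (not part of larod): centers a crop_w x crop_h window
 * in an in_w x in_h image. Returns false if the window does not fit.
 */
bool center_crop(int in_w, int in_h, int crop_w, int crop_h,
                 struct crop_window *out)
{
    if (crop_w <= 0 || crop_h <= 0 || crop_w > in_w || crop_h > in_h) {
        return false;
    }
    out->x = (in_w - crop_w) / 2;
    out->y = (in_h - crop_h) / 2;
    out->w = crop_w;
    out->h = crop_h;
    return true;
}
```

The resulting values could then be passed on with larodMapSetIntArr4(jobParams, "image.input.crop", c.x, c.y, c.w, c.h, NULL), where c is the filled-in struct.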

Common operations

Image preprocessing backends may support the following common image processing operations. Backends are not required to support all operations and image formats.

Operation	Description
Image crop	Crop out a part of an image.
Image scale	Scale an image up or down.
Image convert	Convert an image between two color formats.

Common configuration parameters

Image preprocessing backends may support the common image processing parameters in the tables below to describe processing jobs. Backends are not required to support all parameters and values.

Load model parameters

The following are parameters that can be set on a larodMap provided when loading a model using e.g. larodLoadModel.

Key	Value
image.input.format*	String, describing input image format.
image.input.size*	2-integer-tuple, describing input image width and height.
image.input.row-pitch	Integer, describing input image width including padding, in bytes. Inferred if not explicitly given.
image.output.format*	String, describing output image format.
image.output.size*	2-integer-tuple, describing output image width and height.
image.output.row-pitch	Integer, describing output image width including padding, in bytes. Inferred if not explicitly given.

*: This parameter is mandatory for all preprocessing backends outlined in this document.
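To make the relation between size, row pitch and buffer size concrete, here is a minimal sketch for the two formats used in the example above. It assumes 8 bits per sample and, for the pitch helpers, rows without padding. How larod itself infers a missing row pitch is not described here, so these helpers are illustrative assumptions and not larod functions.

```c
#include <stddef.h>

/* Row pitch in bytes for tightly packed rows (no padding), 8 bits per sample. */
size_t nv12_row_pitch(size_t width)
{
    return width; /* Y plane: one byte per pixel. */
}

size_t rgb_interleaved_row_pitch(size_t width)
{
    return 3 * width; /* R, G and B bytes stored per pixel. */
}

/*
 * NV12 stores a full-resolution Y plane followed by an interleaved UV plane
 * with half as many rows. Both planes are assumed to share the same pitch.
 */
size_t nv12_buffer_size(size_t row_pitch, size_t height)
{
    return row_pitch * height + row_pitch * (height / 2);
}

size_t rgb_interleaved_buffer_size(size_t row_pitch, size_t height)
{
    return row_pitch * height;
}
```

Under these assumptions a 1280x720 NV12 image occupies 1382400 bytes and a 48x48 RGB interleaved image occupies 6912 bytes.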

Job request parameters

The following are parameters that can be set on a larodMap of a larodJobRequest. The map can be attached to a job request upon its creation (larodCreateJobRequest) or later using larodSetJobRequestParams. The parameters of the map will then be used in a subsequent call to e.g. larodRunJob using this job request.

Since these parameters are not attached to a model it's possible to send job requests having larodMaps with different values for these parameters to the same model.

Key	Value
image.input.crop	4-integer-tuple, describing the crop window. The elements in the tuple are: X offset in input image, Y offset in input image, crop window width, crop window height.
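Since the crop window can change between jobs, an application may want to check it against the input size before sending a job request. Below is a small sketch of such a check. The helper crop_fits() is a hypothetical name, and the requirement that the window lies fully inside the input image is an assumption rather than something stated above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of larod): returns true if the crop window
 * (x, y, w, h) lies completely inside an in_w x in_h image.
 */
bool crop_fits(int in_w, int in_h, int x, int y, int w, int h)
{
    if (in_w <= 0 || in_h <= 0 || x < 0 || y < 0 || w <= 0 || h <= 0) {
        return false;
    }
    /* Written with subtraction so that x + w cannot overflow. */
    return w <= in_w - x && h <= in_h - y;
}
```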

Supported backends

Currently the following image preprocessing backends/devices are supported by larod.

libyuv

The libyuv backend uses the open source library libyuv. It runs on most CPUs and in particular uses the SIMD technology Neon on Arm architectures to accelerate parallelizable computation. It supports image crop, scale and format conversion.

The device name of this backend is "cpu-proc"; a device handle can be retrieved by providing this device name to larodGetDevice().

libyuv backend constraints

The width, height and row pitch of both the input and the output image must each be a multiple of 2.
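The constraint can be checked up front before loading a model. The following is a minimal sketch; the struct and function names are hypothetical and not part of larod.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one image; row_pitch is in bytes. */
struct image_geometry {
    size_t width;
    size_t height;
    size_t row_pitch;
};

static bool all_even(const struct image_geometry *g)
{
    return g->width % 2 == 0 && g->height % 2 == 0 && g->row_pitch % 2 == 0;
}

/* Returns true if both images satisfy the multiple-of-2 constraint. */
bool libyuv_geometry_ok(const struct image_geometry *in,
                        const struct image_geometry *out)
{
    return all_even(in) && all_even(out);
}
```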

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop will allow the libyuv backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor) the libyuv backend may be able to cache a tensor's mapping and thus allow for a very efficient zero-copy map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

ACE

The ACE backend uses Axis Compute Engine in Axis ARTPEC series chips. It only supports image format conversion.

The device name of this backend is "axis-ace-proc"; a device handle can be retrieved by providing this device name to larodGetDevice().

ACE backend constraints

The input image width must be a multiple of 8.
The input image height must be 4 or larger.
The input image row pitch must be equal to input image width.
The output image row pitch must be equal to output image width.
The input image size must be small enough so that width*height < 4,194,304.
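The constraints listed above can be collected into a single predicate, sketched below. The function name and parameter list are hypothetical; only the numeric limits are taken from the list.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of larod): returns true if the image
 * geometry satisfies the ACE backend constraints listed above.
 */
bool ace_constraints_ok(uint32_t in_w, uint32_t in_h, uint32_t in_pitch,
                        uint32_t out_w, uint32_t out_pitch)
{
    if (in_w % 8 != 0) {
        return false; /* Input width must be a multiple of 8. */
    }
    if (in_h < 4) {
        return false; /* Input height must be 4 or larger. */
    }
    if (in_pitch != in_w || out_pitch != out_w) {
        return false; /* Row pitch must equal width for input and output. */
    }
    /* A 64-bit product avoids overflow for large dimensions. */
    return (uint64_t) in_w * in_h < UINT64_C(4194304);
}
```

Note that a 2048x2048 input is rejected by this check, since the limit is a strict inequality.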

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates the former fd prop will allow the ACE backend to map (using mmap) the tensor's file descriptor instead of reading or writing directly from it. Combined with tensor tracking (e.g. using larodTrackTensor) the ACE backend may be able to cache a tensor's mapping. The backend does not support zero copy, meaning that data will still be copied from the memory mapping to the actual buffer for the job.

The access type LAROD_FD_PROP_READWRITE will do a memory copy through read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

VProc

The VProc backend uses VPROC in Ambarella CV series chips. It supports image crop, scale and format conversion.

The device name of this backend is "ambarella-cvflow-proc"; a device handle can be retrieved by providing this device name to larodGetDevice().

VProc backend constraints

The input image width must be a multiple of 2 when input format is nv12.
The input image height must be a multiple of 2 when input format is nv12.
The input image row pitch must be a multiple of 32.
The output image width must be a multiple of 2 when output format is nv12.
The output image height must be a multiple of 2 when output format is nv12.
The output image row pitch must be a multiple of 32.
For operations requiring both color format conversion and scaling the scale factor must be at most 4.
For operations not requiring color format conversion the scale factor must be at most 256.
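Because the row pitch must be a multiple of 32, a tightly packed row often has to be padded. The sketch below rounds a minimum pitch up to the next multiple of 32 and checks the per-image constraints. The scale factor limits are left out, since the list above does not define how the factor is measured. All names are hypothetical and not part of larod.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Rounds a minimum row pitch (in bytes) up to the next multiple of 32. */
size_t vproc_align_pitch(size_t min_pitch_bytes)
{
    return (min_pitch_bytes + 31) / 32 * 32;
}

/*
 * Checks one image (input or output) against the per-image constraints:
 * row pitch a multiple of 32, and even width and height for nv12.
 */
bool vproc_image_ok(const char *format, size_t width, size_t height,
                    size_t row_pitch)
{
    if (row_pitch % 32 != 0) {
        return false;
    }
    if (strcmp(format, "nv12") == 0 && (width % 2 != 0 || height % 2 != 0)) {
        return false;
    }
    return true;
}
```

For example, a row that needs at least 144 bytes would get a pitch of 160 bytes.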

Supported buffer properties for running jobs

This backend supports the fd access types LAROD_FD_PROP_DMABUF and LAROD_FD_PROP_READWRITE.

The access type LAROD_FD_PROP_DMABUF provides less overhead since the buffer will be passed directly to the underlying processing framework without extra copies in larod. When using LAROD_FD_PROP_DMABUF, the application is responsible for ensuring that the external RAM is up to date with the CPU cache before the job is started (the service will not initiate any cache flush operations). Refer to About dma-buf for more information about dma-buf and user space synchronization.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

With a model loaded to this backend, the calls larodAllocModelInputs() and larodAllocModelOutputs() can allocate two kinds of tensor buffers. If LAROD_FD_PROP_READWRITE is not among the fd props required in the call, tensors with mappable file descriptors based on Cavalry Mem dma-bufs are returned; these tensors have the fd props LAROD_FD_PROP_MAP and LAROD_FD_PROP_DMABUF set. If LAROD_FD_PROP_READWRITE is required, tensors with readable, writable and mappable file descriptors are returned; these tensors have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.
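The allocation rule for this backend can be summarized as a small decision function. To keep the sketch self-contained it uses stand-in flag values with an EX_ prefix; these are assumptions for illustration, and real code should use the LAROD_FD_PROP_* constants from larod.h.

```c
#include <stdint.h>

/* Stand-in flag values for illustration only (see larod.h for the real ones). */
#define EX_FD_PROP_READWRITE (UINT32_C(1) << 0)
#define EX_FD_PROP_MAP       (UINT32_C(1) << 1)
#define EX_FD_PROP_DMABUF    (UINT32_C(1) << 2)

/*
 * Hypothetical helper: given the fd props required in the allocation call,
 * returns the fd props the allocated VProc tensors are described to have.
 */
uint32_t vproc_allocated_props(uint32_t required_props)
{
    if (required_props & EX_FD_PROP_READWRITE) {
        /* Readable, writable and mappable file descriptors. */
        return EX_FD_PROP_READWRITE | EX_FD_PROP_MAP;
    }
    /* Mappable file descriptors based on dma-bufs. */
    return EX_FD_PROP_MAP | EX_FD_PROP_DMABUF;
}
```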

OpenCL

OpenCL is a compute framework that enables programmers to write programs that execute across heterogeneous platforms such as CPUs, GPUs and more. larod contains predefined OpenCL programs that let a larod user conveniently run image crop, scale and format conversion through its OpenCL backend.

The platform larod runs on may have several devices supporting the OpenCL framework. larod can run its operations on any of these devices; each OpenCL device has a unique device name.

Choosing a specific device

There are currently three available OpenCL backends; these are "axis-a7-gpu-proc", "axis-a8-dlpu-proc" and "axis-a8-gpu-proc". A device handle to one of these backends can be retrieved by providing the respective device name to larodGetDevice().

The names of the three devices are rather self-explanatory: on platforms using one of these SoCs, OpenCL can run on the GPU on ARTPEC-7, and on both the GPU and the DLPU on ARTPEC-8.

Supported buffer properties for running jobs

The backends only support the fd access type LAROD_FD_PROP_READWRITE.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

Supported operations

The following table describes supported operations for each backend.

Backend	crop	convert	scale
libyuv	Yes	Yes	Yes
ACE		Yes
VProc	Yes	Yes	Yes
OpenCL	Yes	Yes	Yes
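An application that needs to pick a device at run time could encode the table above as a lookup keyed on the device names given earlier. This is a minimal sketch; the enum, struct and function names are hypothetical, and the table contents simply mirror the documentation rather than querying larod.

```c
#include <stddef.h>
#include <string.h>

enum {
    OP_CROP = 1 << 0,
    OP_CONVERT = 1 << 1,
    OP_SCALE = 1 << 2,
    OP_ALL = OP_CROP | OP_CONVERT | OP_SCALE
};

struct backend_ops {
    const char *device_name;
    unsigned ops;
};

/* Mirrors the supported operations table; one row per device name. */
static const struct backend_ops k_backend_ops[] = {
    {"cpu-proc", OP_ALL},              /* libyuv */
    {"axis-ace-proc", OP_CONVERT},     /* ACE */
    {"ambarella-cvflow-proc", OP_ALL}, /* VProc */
    {"axis-a7-gpu-proc", OP_ALL},      /* OpenCL */
    {"axis-a8-dlpu-proc", OP_ALL},     /* OpenCL */
    {"axis-a8-gpu-proc", OP_ALL},      /* OpenCL */
};

/* Returns 1 if the device supports every operation in needed_ops, else 0. */
int backend_supports(const char *device_name, unsigned needed_ops)
{
    size_t count = sizeof k_backend_ops / sizeof k_backend_ops[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(k_backend_ops[i].device_name, device_name) == 0) {
            return (k_backend_ops[i].ops & needed_ops) == needed_ops;
        }
    }
    return 0; /* Unknown device name. */
}
```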

Supported formats

Operations requiring a color format conversion

The following table describes supported input formats for operations requiring a color format conversion.

Backend	nv12	rgb-interleaved
libyuv	Yes	Yes
ACE	Yes
VProc	Yes
OpenCL	Yes

The following table describes supported output formats for operations requiring a color format conversion.

Backend	nv12	rgb-interleaved	rgb-planar
libyuv	Yes	Yes	Yes
ACE		Yes
VProc			Yes
OpenCL		Yes

Operations not requiring a color format conversion

The following table describes supported image formats for operations not requiring a color format conversion, i.e. the input and output formats are identical. This could be e.g. a pure scaling operation.

Backend	nv12	rgb-interleaved	rgb-planar
libyuv	Yes	Yes
VProc	Yes		Yes
OpenCL	Yes

Supported buffer properties

This is an overview of which file descriptor properties are supported by the various preprocessing backends. Note that the LAROD_FD_PROP_ prefix has been omitted from the table headers in the interest of brevity. Please see larod.h for more info about the LAROD_FD_PROP_* flags.

When running jobs

Please note that though several properties may be supported by a backend, a tensor buffer supplied for running a job need only have at least one of the backend's supported properties to be usable for the job. Having said that, each property comes with different implications on memory access performance.

Input tensors

Backend	READWRITE	MAP	DMABUF
libyuv	Yes	Yes
ACE	Yes	Yes
VProc	Yes	Yes	Yes
OpenCL	Yes

Output tensors

Backend	READWRITE	MAP	DMABUF
libyuv	Yes	Yes
ACE	Yes	Yes
VProc	Yes	Yes	Yes
OpenCL	Yes
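The rule that a tensor buffer needs at least one of the backend's supported properties amounts to a bitmask intersection test. The sketch below uses stand-in flag values with an EX_ prefix so that it is self-contained; these are assumptions for illustration, and real code should use the LAROD_FD_PROP_* constants from larod.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for illustration only (see larod.h for the real ones). */
#define EX_FD_PROP_READWRITE (UINT32_C(1) << 0)
#define EX_FD_PROP_MAP       (UINT32_C(1) << 1)
#define EX_FD_PROP_DMABUF    (UINT32_C(1) << 2)

/*
 * Hypothetical helper: a tensor is usable for a job if it has at least one
 * of the fd props that the backend supports when running jobs.
 */
bool tensor_usable(uint32_t tensor_props, uint32_t backend_props)
{
    return (tensor_props & backend_props) != 0;
}
```

With the tables above, a tensor having only the MAP property would be usable with libyuv (READWRITE and MAP) but not with the OpenCL backends (READWRITE only).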

When allocating tensors

Please note that though several properties may be supported by a backend, it may not be possible to allocate buffers having all the properties at the same time.

Input tensors

Backend	READWRITE	MAP	DMABUF
libyuv	Yes	Yes
ACE	Yes	Yes
VProc	Yes	Yes	Yes
OpenCL	Yes	Yes

Output tensors

Backend	READWRITE	MAP	DMABUF
libyuv	Yes	Yes
ACE	Yes	Yes
VProc	Yes	Yes	Yes
OpenCL	Yes	Yes