liblarod
3.1.28
Preprocessing in larod can be used to process input data so that it has the format, size and shape a neural network expects. For optimal performance the processing operations can be offloaded to specialized preprocessing hardware. Each supported preprocessing hardware accelerator is exposed to applications through a larodDevice struct in larod.
Currently only image processing operations are supported.
Preprocessing jobs are configured with key-value parameters in a larodMap. The configuration describes the input data you have and the output data you want. The selected backend will crop, scale and convert as the description requires.
As an example, consider a job that takes a 1280x720 input image in NV12 format, crops out 200x200 from the center of the image (X offset 540 and Y offset 260), converts it to the RGB interleaved format and scales it down to 48x48. In this case the libyuv backend will be used to perform these operations.
Note that to just scale the original image (from 1280x720 to 48x48) without cropping it first, one can simply omit the larodMap from the larodJobRequest.
Image preprocessing backends may support the following common image processing operations. Backends are not required to support all operations and image formats.
Operation | Description |
---|---|
Image crop | Crop out a part of an image. |
Image scale | Scale an image up or down. |
Image convert | Convert an image between two color formats. |
Image preprocessing backends may support the common image processing parameters in the tables below to describe processing jobs. Backends are not required to support all parameters and values.
The following are parameters that can be set on a larodMap provided when loading a model using e.g. larodLoadModel.
Key | Value |
---|---|
image.input.format* | String, describing input image format. |
image.input.size* | 2-integer-tuple, describing input image width and height. |
image.input.row-pitch | Integer, describing input image width including padding, in bytes. Inferred if not explicitly given. |
image.output.format* | String, describing output image format. |
image.output.size* | 2-integer-tuple, describing output image width and height. |
image.output.row-pitch | Integer, describing output image width including padding, in bytes. Inferred if not explicitly given. |
*: This parameter is mandatory for all preprocessing backends outlined in this document.
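The source does not say how a backend infers a missing row pitch. As a rough illustration of what the row-pitch parameters describe, the sketch below computes the pitch and total buffer size for tightly packed images, i.e. with no padding at all. That is an assumption and may differ from what a given backend infers. The helper names `packed_row_pitch` and `image_buffer_size` are hypothetical and not part of larod; the format strings are the ones used in the format tables of this document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes per row with no padding (assumption: tightly packed rows).
 * For nv12 and rgb-planar this is the pitch of a single plane.
 * Returns 0 for an unknown format. */
size_t packed_row_pitch(const char* format, size_t width)
{
    assert(format != NULL);
    if (strcmp(format, "rgb-interleaved") == 0) {
        return width * 3; /* R, G and B bytes side by side. */
    }
    if (strcmp(format, "nv12") == 0 || strcmp(format, "rgb-planar") == 0) {
        return width; /* One byte per pixel in each plane row. */
    }
    return 0;
}

/* Total buffer size in bytes for a given row pitch and (even) height.
 * Returns 0 for an unknown format. */
size_t image_buffer_size(const char* format, size_t row_pitch, size_t height)
{
    assert(format != NULL);
    if (strcmp(format, "rgb-interleaved") == 0) {
        return row_pitch * height;
    }
    if (strcmp(format, "rgb-planar") == 0) {
        return row_pitch * height * 3; /* Three full-size planes. */
    }
    if (strcmp(format, "nv12") == 0) {
        return row_pitch * height * 3 / 2; /* Y plane + half-height UV plane. */
    }
    return 0;
}
```

With these assumptions a 1280x720 nv12 image has a row pitch of 1280 bytes, and a 48x48 rgb-interleaved image has a row pitch of 144 bytes. If the actual buffers carry padding, the pitch should be given explicitly instead.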
The following are parameters that can be set on a larodMap of a larodJobRequest. The map can be attached to a job request upon its creation (larodCreateJobRequest) or later using larodSetJobRequestParams. The parameters of the map will then be used in a subsequent call to e.g. larodRunJob using this job request.
Since these parameters are not attached to a model it's possible to send job requests having larodMaps with different values for these parameters to the same model.
Key | Value |
---|---|
image.input.crop | 4-integer-tuple, describing the crop window. The elements in the tuple are: X offset in input image, Y offset in input image, crop window width, crop window height. |
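As an illustration of the tuple order, the sketch below computes a centered crop window and checks that it fits inside the input image. `CropWindow`, `center_crop` and `crop_fits` are hypothetical helpers, not part of larod, and any further constraints a backend may place on the crop window (such as alignment) are not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Crop window in the element order used by image.input.crop:
 * X offset, Y offset, width, height. Hypothetical helper type. */
typedef struct {
    int64_t x;
    int64_t y;
    int64_t w;
    int64_t h;
} CropWindow;

/* Returns true if the window lies fully inside an in_w x in_h image. */
bool crop_fits(CropWindow c, int64_t in_w, int64_t in_h)
{
    return c.w > 0 && c.h > 0 && c.x >= 0 && c.y >= 0 &&
           c.w <= in_w - c.x && c.h <= in_h - c.y;
}

/* Centers a crop_w x crop_h window in an in_w x in_h image.
 * Returns false and leaves *out untouched if the window does not fit. */
bool center_crop(int64_t in_w, int64_t in_h, int64_t crop_w, int64_t crop_h,
                 CropWindow* out)
{
    assert(out != NULL);
    if (crop_w <= 0 || crop_h <= 0 || crop_w > in_w || crop_h > in_h) {
        return false;
    }
    out->x = (in_w - crop_w) / 2;
    out->y = (in_h - crop_h) / 2;
    out->w = crop_w;
    out->h = crop_h;
    return true;
}
```

For a 1280x720 input and a 200x200 crop, `center_crop` yields X offset 540 and Y offset 260, which matches the example job described earlier in this document.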
Currently the following image preprocessing backends/devices are supported by larod.
The libyuv backend uses the open source library libyuv. It runs on most CPUs and in particular uses the SIMD technology Neon on Arm architectures to accelerate parallelizable computation. It supports image crop, scale and format conversion.
The device name of this backend is "cpu-proc"; a device handle can be retrieved by providing this device name to larodGetDevice().
This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.
As the name indicates, the former fd prop allows the libyuv backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor) the libyuv backend may be able to cache a tensor's mapping and thus allow for a very efficient zero-copy map-once memory access pattern.
The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer; these extra operations will degrade performance.
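To make the difference between the two access types concrete, here is a generic POSIX sketch of the two access patterns: copying a buffer's contents out of a file descriptor versus mapping the descriptor and working on it in place. It does not show larod internals. The helpers `make_scratch_fd`, `copy_from_fd` and `map_fd` are hypothetical, and the scratch file merely stands in for a tensor buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates an unlinked scratch file of len bytes that stands in for a
 * tensor buffer. Returns its fd, or -1 on failure. */
int make_scratch_fd(size_t len)
{
    char path[] = "/tmp/fd-access-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path); /* The file lives on until fd is closed. */
    if (ftruncate(fd, (off_t) len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Copy path: pread() calls plus a copy into dst.
 * Returns the number of bytes copied, or -1 on error. */
ssize_t copy_from_fd(int fd, void* dst, size_t len)
{
    assert(dst != NULL);
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (uint8_t*) dst + done, len - done, (off_t) done);
        if (n < 0) {
            return -1;
        }
        if (n == 0) {
            break; /* End of file. */
        }
        done += (size_t) n;
    }
    return (ssize_t) done;
}

/* Map path: the buffer is accessed in place through the returned pointer.
 * Returns NULL on failure; release with munmap(ptr, len). */
void* map_fd(int fd, size_t len)
{
    void* p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A mapping obtained once with `map_fd` can be reused for many jobs, whereas `copy_from_fd` pays for system calls and a copy every time.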
Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.
The ACE backend uses Axis Compute Engine in Axis ARTPEC series chips. It only supports image format conversion.
The device name of this backend is "axis-ace-proc"; a device handle can be retrieved by providing this device name to larodGetDevice().
This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.
As the name indicates, the former fd prop allows the ACE backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it directly. Combined with tensor tracking (e.g. using larodTrackTensor) the ACE backend may be able to cache a tensor's mapping. The backend does not support zero copy, meaning that data will still be copied from the memory mapping to the actual buffer for the job.
The access type LAROD_FD_PROP_READWRITE will do a memory copy through read()/write() calls for each input and output tensor buffer; these extra operations will degrade performance.
Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.
The VProc backend uses VPROC in Ambarella CV series chips. It supports image crop, scale and format conversion.
The device name of this backend is "ambarella-cvflow-proc"; a device handle can be retrieved by providing this device name to larodGetDevice().
This backend supports the fd access types LAROD_FD_PROP_DMABUF and LAROD_FD_PROP_READWRITE.
The access type LAROD_FD_PROP_DMABUF provides less overhead since the buffer will be passed directly to the underlying processing framework without extra copies in larod. When using LAROD_FD_PROP_DMABUF, the application is responsible for ensuring that the external RAM is up to date with the CPU cache before the inference is started (the service will not initiate any cache flush operations). Refer to About dma-buf for more information about dma-buf and user space synchronization.
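One common way for user space to bracket CPU access to a dma-buf on Linux is the `DMA_BUF_IOCTL_SYNC` ioctl from `<linux/dma-buf.h>`. The sketch below wraps it for a CPU write. Whether this is the right mechanism for the buffers used with this backend is an assumption not confirmed by this document, and the `dmabuf_*` helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <linux/dma-buf.h>
#include <sys/ioctl.h>

/* Issues one DMA_BUF_IOCTL_SYNC call, retrying if it is interrupted.
 * Returns 0 on success and -1 on failure with errno set. */
static int dmabuf_sync(int fd, __u64 flags)
{
    assert((flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK) == 0);
    struct dma_buf_sync sync = {.flags = flags};
    int ret;
    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
    return ret;
}

/* Call before the CPU starts writing to the mapped dma-buf. */
int dmabuf_cpu_write_begin(int fd)
{
    return dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE);
}

/* Call after the CPU has finished writing and before the job is run,
 * so that the device sees the written data. */
int dmabuf_cpu_write_end(int fd)
{
    return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```

An application would typically call `dmabuf_cpu_write_begin`, fill the mapped input buffer, call `dmabuf_cpu_write_end` and only then submit the job.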
The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer; these extra operations will degrade performance.
Two kinds of tensor buffers can be allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend. If LAROD_FD_PROP_READWRITE is not set as required in the call, then tensors with mappable file descriptors based on Cavalry Mem dma-bufs will be returned. As such these tensors will have the fd props LAROD_FD_PROP_MAP and LAROD_FD_PROP_DMABUF set. If however LAROD_FD_PROP_READWRITE is required, then tensors with readable, writable and mappable file descriptors will be returned. As such these tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.
OpenCL is a compute framework which enables programmers to write programs that execute across heterogeneous platforms such as CPUs, GPUs and more. larod contains predefined OpenCL programs which let a larod user conveniently run image crop, scale and format conversion through its OpenCL backend.
The platform larod runs on may have several devices supporting the OpenCL framework. larod can run its operations on any of these devices; each OpenCL device has a unique device name.
There are currently three available OpenCL backends: "axis-a7-gpu-proc", "axis-a8-dlpu-proc" and "axis-a8-gpu-proc". A device handle to one of these backends can be retrieved by providing the respective device name to larodGetDevice().
The names of the three devices are rather self-explanatory: OpenCL can run on the GPU of ARTPEC-7 and on both the GPU and the DLPU of ARTPEC-8, on platforms using one of these SoCs.
The backends only support the fd access type LAROD_FD_PROP_READWRITE.
Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.
The following table describes supported operations for each backend.
Backend | crop | convert | scale |
---|---|---|---|
libyuv | Yes | Yes | Yes |
ACE | | Yes | |
VProc | Yes | Yes | Yes |
OpenCL | Yes | Yes | Yes |
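An application that needs to pick a device at run time could encode the per-backend operation support in a small lookup table. The sketch below does so using the device names given in this document and the operation support stated in the backend descriptions above (ACE: format conversion only; libyuv, VProc and OpenCL: crop, scale and format conversion). The `OP_*` flags, `BackendOps` and `device_supports` are hypothetical names, not part of larod, and the table says nothing about which image formats a backend accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation flags for this sketch. */
enum { OP_CROP = 1, OP_CONVERT = 2, OP_SCALE = 4 };
#define OP_ALL (OP_CROP | OP_CONVERT | OP_SCALE)

typedef struct {
    const char* device; /* Device name as passed to larodGetDevice(). */
    unsigned ops;       /* Bitwise OR of the OP_* flags it supports. */
} BackendOps;

static const BackendOps kBackendOps[] = {
    {"cpu-proc", OP_ALL},              /* libyuv */
    {"axis-ace-proc", OP_CONVERT},     /* ACE */
    {"ambarella-cvflow-proc", OP_ALL}, /* VProc */
    {"axis-a7-gpu-proc", OP_ALL},      /* OpenCL */
    {"axis-a8-dlpu-proc", OP_ALL},     /* OpenCL */
    {"axis-a8-gpu-proc", OP_ALL},      /* OpenCL */
};

/* Returns true if the named device supports every operation in wanted.
 * Unknown device names yield false. */
bool device_supports(const char* device, unsigned wanted)
{
    assert(device != NULL);
    for (size_t i = 0; i < sizeof(kBackendOps) / sizeof(kBackendOps[0]); i++) {
        if (strcmp(kBackendOps[i].device, device) == 0) {
            return (kBackendOps[i].ops & wanted) == wanted;
        }
    }
    return false;
}
```

For instance, `device_supports("axis-ace-proc", OP_CROP | OP_CONVERT)` is false, so a job that crops and converts would need another device on that platform.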
The following table describes supported input formats for operations requiring a color format conversion.
Backend | nv12 | rgb-interleaved | rgb-planar |
---|---|---|---|
libyuv | Yes | Yes | |
ACE | Yes | ||
VProc | Yes | ||
OpenCL | Yes |
The following table describes supported output formats for operations requiring a color format conversion.
Backend | nv12 | rgb-interleaved | rgb-planar |
---|---|---|---|
libyuv | Yes | Yes | Yes |
ACE | Yes | ||
VProc | Yes | ||
OpenCL | Yes |
The following table describes supported image formats for operations not requiring a color format conversion, i.e. the input and output formats are identical. This could be e.g. a pure scaling operation.
Backend | nv12 | rgb-interleaved | rgb-planar |
---|---|---|---|
libyuv | Yes | Yes | |
VProc | Yes | Yes | |
OpenCL | Yes |
This is an overview of what file descriptor properties are supported by the various preprocessing backends. Note that the LAROD_FD_PROP_ prefix has been omitted from the table headers in the interest of brevity. Please see larod.h for more info about the LAROD_FD_PROP_* flags.
Please note that though several properties may be supported by a backend, a tensor buffer supplied for running a job need only have at least one of the backend's supported properties to be usable for the job. Having said that, each property comes with different implications on memory access performance.
Backend | READWRITE | MAP | DMABUF |
---|---|---|---|
libyuv | Yes | Yes | |
ACE | Yes | Yes | |
VProc | Yes | Yes | Yes |
OpenCL | Yes | | |
Backend | READWRITE | MAP | DMABUF |
---|---|---|---|
libyuv | Yes | Yes | |
ACE | Yes | Yes | |
VProc | Yes | Yes | Yes |
OpenCL | Yes | | |
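The rule that a buffer needs at least one of the backend's supported properties amounts to a bitmask intersection. The sketch below shows that check, plus a helper that picks one common property. The `PROP_*` values are stand-ins for illustration only (real code should use the LAROD_FD_PROP_* constants from larod.h, whose values may differ), the function names are hypothetical, and the preference order in `pick_access_prop` is an assumption based on the remark that READWRITE access adds copies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in flags for this sketch; not the values from larod.h. */
#define PROP_READWRITE (UINT32_C(1) << 0)
#define PROP_MAP (UINT32_C(1) << 1)
#define PROP_DMABUF (UINT32_C(1) << 2)

/* A buffer is usable if it shares at least one property with the backend. */
bool buffer_usable(uint32_t buffer_props, uint32_t backend_props)
{
    return (buffer_props & backend_props) != 0;
}

/* Picks one property common to buffer and backend, or 0 if there is none.
 * Assumed preference: DMABUF, then MAP, then READWRITE. */
uint32_t pick_access_prop(uint32_t buffer_props, uint32_t backend_props)
{
    static const uint32_t preference[] = {PROP_DMABUF, PROP_MAP, PROP_READWRITE};
    uint32_t common = buffer_props & backend_props;
    for (size_t i = 0; i < sizeof(preference) / sizeof(preference[0]); i++) {
        if (common & preference[i]) {
            return preference[i];
        }
    }
    return 0;
}
```

With a backend mask of `PROP_READWRITE | PROP_MAP`, as described for libyuv, a buffer that only has `PROP_DMABUF` is not usable, while a buffer with both `PROP_READWRITE` and `PROP_MAP` would be accessed through `PROP_MAP` under the assumed preference.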
Please note that though several properties may be supported by a backend, it may not be possible to allocate buffers having all the properties at the same time.
Backend | READWRITE | MAP | DMABUF |
---|---|---|---|
libyuv | Yes | Yes | |
ACE | Yes | Yes | |
VProc | Yes | Yes | Yes |
OpenCL | Yes | Yes | |
Backend | READWRITE | MAP | DMABUF |
---|---|---|---|
libyuv | Yes | Yes | |
ACE | Yes | Yes | |
VProc | Yes | Yes | Yes |
OpenCL | Yes | Yes | |