
Preface

This is the reference manual of MoonCL, which is a Lua binding library for the Khronos OpenCL API. [1]

It is assumed that the reader is familiar with both OpenCL and the Lua programming language.

For convenience of reference, this document contains external (deep) links to the Lua Reference Manual and the OpenCL Reference Pages.

Getting and installing

For installation instructions, refer to the README file in the MoonCL official repository on GitHub.

Module organization

The MoonCL module is loaded using Lua’s require() and returns a table containing the functions it provides (as usual with Lua modules). This manual assumes that this table is named cl, i.e. that it is loaded with:

 cl = require("mooncl")

but nothing forbids the use of a different name.

Examples

Complete examples can be found in the examples/ directory of the release package.

License

MoonCL is released under the MIT/X11 license (same as Lua, and with the same sole requirement to give proper credit to the original author). The copyright notice is in the LICENSE file in the base directory of the official repository on GitHub.

See also

MoonCL is part of MoonLibs, a collection of Lua libraries for graphics and audio programming.

Introduction

MoonCL is an (almost) one-to-one Lua binding library to the OpenCL API. It provides the means to implement the host part of OpenCL applications using Lua instead of C or C++, with all its pros and cons.

This section gives a brief overview, while the details of the bindings are given in the sections that follow.

As a general rule, each OpenCL API function is bound by a MoonCL function whose name is the snake_case version of the original one (for example, clGetPlatformIDs( ) is bound by cl.get_platform_ids( )).

If not stated otherwise, on error all MoonCL functions raise a Lua error. If needed, this behaviour can be overridden by wrapping function calls in the standard Lua pcall( ).
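The pcall( ) wrapping mentioned above can be sketched as a small helper. This is only an illustration: try( ) is a hypothetical name that is not part of MoonCL, and failing_binding( ) is a stand-in for a binding that raises an error.

```lua
-- Calls f(...) in protected mode. Returns the results of f on success,
-- or nil followed by the error message on failure.
-- (Hypothetical helper, not part of MoonCL.)
local function try(f, ...)
  local results = table.pack(pcall(f, ...))
  if results[1] then
    return table.unpack(results, 2, results.n)
  end
  return nil, results[2]
end

-- Stand-in for a binding that fails (for illustration only).
local function failing_binding()
  error('device not found', 0)
end

local value, errmsg = try(failing_binding)
print(value, errmsg)
```

With MoonCL loaded, the same helper would be used as, for example, try(cl.get_platform_ids). Note that a function that legitimately returns nil cannot be told apart from a failure with this helper, in which case calling pcall( ) directly is the safer choice.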

MoonCL binds OpenCL objects (platform, device, etc.) to Lua userdata, which are returned by the creating or getter functions (cl.get_platform_ids( ), cl.create_context( ), etc.) and are then used to refer to objects in Lua in the same way as one would use OpenCL handles in C.

In the rest of this manual we will refer to userdata bound to OpenCL objects simply as 'objects', or as 'MoonCL objects' (vs. 'OpenCL objects') when disambiguation is needed.

Objects are garbage collected at exit (which includes on error), and automatically released at the OpenCL level, so there is no need to explicitly invoke the bindings to clReleaseXxx( ) at exit for cleanup.

Apart from at exit, however, objects are not automatically garbage collected [2] and one must release them explicitly when needed, e.g. to release resources when the application is not exiting and some objects are no longer needed.

Releasing an object causes the automatic (pre) destruction of all its children objects, and the invalidation of any reference to the object and to its children. [3]

OpenCL structs (and lists, and arrays) used to pass parameters and results across the OpenCL API are mapped in MoonCL to tables, having more or less the same contents as their C counterparts but again with snake_case named fields. Enumerations are mapped to/from sets of string literals, while flags bitmasks are represented as plain integers encoded in the same way as in C. More details are given in the respective sections of this document (structs, enums, flags).

In addition to the bindings to the OpenCL API, which are described in the sections that follow, MoonCL also provides a few other utilities and object 'methods' that do not correspond to OpenCL functions. These are described mainly in the 'Miscellanea' subsections.

Objects

The following tree shows the MoonCL (OpenCL) objects and their parent-child relationships.

platform (cl_platform_id)
├─ device (cl_device_id)
│   └─ sub device (cl_device_id)
└─ context (cl_context)
    ├─ queue (cl_command_queue)
    ├─ program (cl_program)
    │   └─ kernel (cl_kernel)
    ├─ event (cl_event)
    ├─ buffer (cl_mem)
    │   └─ sub buffer (cl_mem)
    ├─ image (cl_mem)
    ├─ pipe (cl_mem)
    ├─ sampler (cl_sampler)
    └─ svm (void*)
hostmem (host accessible memory)

platform

  • boolean = is_extension_supported(platform|device, extensionname)
    extensionname: string (e.g. 'cl_khr_gl_sharing')
    Returns true if the given extension is supported by the given platform or device, false otherwise.
    Rfr: EXTENSION.

device

  • boolean, endianness = check_endianness(device)
    Returns true if device has the same endianness as the host, false otherwise.
    The second return value is a string indicating the endianness of the device ('little' or 'big').

context

queue

program

  • program = create_program_with_source(context, source)
    program = create_program_with_sourcefile(context, filename)
    source: string containing the program source code,
    filename: name of a file containing the program source code.
    Rfr: clCreateProgramWithSource.

  • build_program(program, [{device}], [options])
    options: string.
    Raises an error if the underlying call to clBuildProgram( ) fails.
    If the error is a 'build program failure', also collects and prints the logs from all the devices in the given list.
    Rfr: clBuildProgram.


  • program, {errmsg} = create_program_with_binary(context, {device}, {binary})
    Returns nil followed by a list of error messages (one per device/binary) if an invalid binary is encountered for any device.
    Rfr: clCreateProgramWithBinary.


  • ok, errmsg = build_program_(program, [{device}], [options])
    options: string.
    Returns false followed by an error message if the underlying call to clBuildProgram( ) fails, otherwise it returns true.
    Rfr: clBuildProgram.

  • ok, errmsg = compile_program_(program, [{device}], [options], [headers, headernames])
    options: string,
    headers: {program},
    headernames: {string}.
    Returns false followed by an error message if the underlying call to clCompileProgram( ) fails, otherwise it returns true.
    Rfr: clCompileProgram.
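The non-raising build_program_( ) variant described above lends itself to explicit error handling. The following is a minimal sketch, assuming that cl is the table returned by require("mooncl") and that context is an already created context object; build_from_source( ) is a hypothetical helper name, not a MoonCL function.

```lua
-- Builds a program from source for all the devices of the context.
-- Returns the program on success, or nil followed by an error message.
-- (Hypothetical helper; 'cl' is the MoonCL table, passed in explicitly.)
local function build_from_source(cl, context, source)
  local program = cl.create_program_with_source(context, source)
  -- The device list and the options string are optional and omitted here.
  local ok, errmsg = cl.build_program_(program)
  if not ok then
    return nil, errmsg
  end
  return program
end
```

Passing cl as a parameter is just a choice made to keep the sketch self-contained. In an application one would normally use the module table directly.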

kernel

  • set_kernel_arg(kernel, argindex, size, [ptr])
    set_kernel_arg(kernel, argindex, nil, data)
    set_kernel_arg(kernel, argindex, primtype, value)
    set_kernel_arg(kernel, argindex, primtype, value1, …​, valueN)
    set_kernel_arg(kernel, argindex, primtype, {value1, …​, valueN})
    set_kernel_arg(kernel, argindex, buffer)
    set_kernel_arg(kernel, argindex, image)
    set_kernel_arg(kernel, argindex, pipe)
    set_kernel_arg(kernel, argindex, sampler)
    set_kernel_arg(kernel, argindex, queue)
    argindex: 0-based argument index,
    ptr : lightuserdata containing a valid void* to size bytes of data, or nil to allocate size bytes of local memory,
    data: binary string,
    value, valuei: integer or number (according to primtype).
    Rfr: clSetKernelArg.

  • set_kernel_arg_svm_pointer(kernel, argindex, ptr)
    set_kernel_arg_svm_pointer(kernel, argindex, svm, offset)
    argindex: 0-based argument index.
    Rfr: clSetKernelArgSVMPointer.
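As an illustration of some of the set_kernel_arg( ) forms listed above, the sketch below sets the arguments of a hypothetical kernel declared as kernel void scale(global float *data, float factor, local float *scratch). The helper name and the kernel are assumptions made for the example; cl is the table returned by require("mooncl").

```lua
-- Sets the three arguments of the hypothetical 'scale' kernel.
-- (Hypothetical helper; 'cl' is the MoonCL table, passed in explicitly.)
local function set_scale_args(cl, kernel, buffer, factor, scratch_bytes)
  cl.set_kernel_arg(kernel, 0, buffer)           -- memory object
  cl.set_kernel_arg(kernel, 1, 'float', factor)  -- primitive value
  cl.set_kernel_arg(kernel, 2, scratch_bytes)    -- local memory: size only, no ptr
end
```

Note that argument indices are 0-based, as in the C API.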

event

Event objects can be created in two ways: with the create_user_event( ) function (user events), or by setting the ge parameter to true when issuing a command with an enqueue_xxx( ) call (command events).

  • set_user_event_status(event, status)
    status: 'complete' or negative integer error code.
    (Note: the default status for event is 'submitted', and this function can be called on event only once.)
    Rfr: clSetUserEventStatus.

  • value = get_event_profiling_info(event, profilinginfo)
    To use this function, the command identified by event must have been enqueued in a command queue created with the 'profiling enabled' flag set.
    Rfr: clGetEventProfilingInfo.

  • set_event_callback(event, type)
    status = check_event_callback(event, type)
    type: 'submitted', 'running', or 'complete'.
    status: boolean | integer (error code).
    The set_event_callback( ) function registers a C callback that, when called by the OpenCL driver, just stores the execution status information passed to it. The application can then poll the status information using the check_event_callback( ) function.
    Rfr: clSetEventCallback.

buffer

  • buffer = create_buffer(context, memflags, size, [ptr])
    ptr: lightuserdata,
    If the 'use host ptr' or the 'copy host ptr' flags are set, then ptr must contain a valid pointer (void*) to at least size bytes of host memory.
    (Such a pointer can be obtained, for example, using a hostmem object.)
    Rfr: clCreateBuffer.

image

pipe

GL objects

svm

The following methods are also available for the svm object type:

  • size = svm:size( )

  • alignment = svm:alignment( )

  • ptr = svm:ptr( )
    Returns a lightuserdata encapsulating a pointer to the beginning of the SVM.

  • memflags = svm:memflags( )

sampler

hostmem

A hostmem object encapsulates (a pointer to) host accessible memory, with methods to access it from Lua and to retrieve pointers to any of its locations in the form of lightuserdata.

(Note that hostmem objects are specific to MoonCL, i.e. they do not correspond to OpenCL objects).

The memory encapsulated by a hostmem object may be either host memory allocated via the cl.malloc( ) or the cl.aligned_alloc( ) functions, or memory obtained by other means (e.g. mapped memory or shared virtual memory) and passed to the cl.hostmem( ) constructor.

Hostmem objects are automatically deleted at exit, but they may also be deleted manually via the cl.free( ) function (or the corresponding method).

Hostmem objects come in handy, for example, when creating buffers, to back them with host memory, and when using asynchronous commands to read from, write to, or map/unmap OpenCL memory objects, i.e. in those operations where a pointer to a memory area must be passed to or is returned by OpenCL.

  • hostmem = malloc(size)
    hostmem = malloc(data)
    hostmem = malloc(primtype, {value1, …​, valueN})
    hostmem = malloc(primtype, value1, …​, valueN)
    Allocates host memory and creates a hostmem object to encapsulate it.
    malloc(size), where size is an integer, allocates size bytes of contiguous memory and initializes them to 0;
    malloc(data), where data is a binary string, allocates #data bytes of contiguous memory and initializes them with the contents of data;
    malloc(primtype, …​) is functionally equivalent to malloc(cl.pack(primtype, …​)).

  • hostmem = aligned_alloc(alignment, size)
    hostmem = aligned_alloc(alignment, data)
    hostmem = aligned_alloc(alignment, primtype, {value1, …​, valueN})
    hostmem = aligned_alloc(alignment, primtype, value1, …​, valueN)
    Same as malloc( ), with the additional alignment parameter to control memory address alignment (rfr. aligned_alloc(3)).

  • hostmem = hostmem(size, ptr)
    hostmem = hostmem(data)
    Creates a hostmem object encapsulating user provided memory.
    Such memory may be provided in form of a lightuserdata (ptr) containing a valid pointer to size bytes of contiguous memory, or as a binary string (data).
    In both cases, care must be taken that the memory area remains valid until the hostmem object is deleted, or at least until it is accessed via its methods. In the data case, this means ensuring that data is not garbage collected during the hostmem object lifetime.
    (Note that malloc(data) and hostmem(data) differ in that the former allocates memory and copies data into it, while the latter just stores a pointer to data).

  • free(hostmem)
    hostmem:free( )
    Deletes the hostmem object. If hostmem was created with cl.malloc( ) or cl.aligned_alloc( ), this function also releases the encapsulated memory.

  • ptr = hostmem:ptr([offset=0], [nbytes=0])
    Returns a pointer (lightuserdata) to the location at offset bytes from the beginning of the encapsulated memory.
    Raises an error if the requested location is beyond the boundaries of the memory area, or if there are not at least nbytes of memory after it.

  • nbytes = hostmem:size([offset=0])
    nbytes = hostmem:size(ptr)
    Returns the number of bytes of memory available after offset bytes from the beginning of the encapsulated memory area, or after ptr (a lightuserdata obtained with hostmem:ptr( )).

  • data = hostmem:read([offset], [nbytes])
    {val1, …​, valN} = hostmem:read([offset], [nbytes], primtype)
    Reads nbytes of data starting from offset, and returns it as a binary string or as a table of primitive values.
    The offset parameter defaults to 0, and nbytes defaults to the memory size minus offset.
    hostmem:read(offset, nbytes, primtype) is functionally equivalent to cl.unpack(primtype, hostmem:read(offset, nbytes)).

  • hostmem:write(offset, nil, data)
    hostmem:write(offset, primtype, val1, …​, valN)
    hostmem:write(offset, primtype, {value1, …​, valueN})
    Writes to the encapsulated memory area, starting from the byte at offset.
    write(offset, nil, data) writes the contents of data (a binary string);
    write(offset, primtype, …​) is equivalent to write(offset, nil, cl.pack(primtype, …​)).

  • hostmem:copy(offset, size, srcptr)
    hostmem:copy(offset, size, srchostmem, srcoffset)
    Copies size bytes to the encapsulated memory area, starting from the byte at offset.
    copy(offset, size, srcptr) copies the size bytes pointed to by srcptr (a lightuserdata).
    copy(offset, size, srchostmem, srcoffset) copies size bytes from the memory encapsulated by srchostmem (a hostmem object), starting from the location at srcoffset.

  • hostmem:clear(offset, nbytes, [val=0])
    Clears nbytes of memory starting from offset. If the val parameter is given, the bytes are set to its value instead of 0 (val may be an integer or a character, i.e. a string of length 1).
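The hostmem functions and methods described above can be combined as in the following sketch, which allocates three floats, overwrites the second one, and reads all of them back. The helper name is hypothetical, and cl is assumed to be the table returned by require("mooncl").

```lua
-- Allocates a hostmem holding three floats, overwrites the second one,
-- then reads the whole area back as a table of numbers.
-- (Hypothetical helper; 'cl' is the MoonCL table, passed in explicitly.)
local function hostmem_roundtrip(cl)
  local mem = cl.malloc('float', {1.0, 2.0, 3.0})
  local nbytes = mem:size()                     -- total size in bytes
  mem:write(cl.sizeof('float'), 'float', 20.0)  -- offset of the 2nd element
  local values = mem:read(0, nbytes, 'float')
  mem:free()                                    -- also releases the memory
  return nbytes, values
end
```

Offsets and sizes are always expressed in bytes, hence the use of cl.sizeof( ) to locate the second element.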

Commands

This section describes the bindings to API functions that enqueue commands in command queues for execution on devices.

Unless otherwise specified, the following facts apply to any enqueue_xxx( ) function described in this section:

  • the blocking parameter is a boolean indicating if the call must block until the command has been executed (true), or return immediately after the command has been enqueued (false);

  • the {we} parameter ('wait events') is an optional list of events to wait for before command execution;

  • the ge parameter ('generate event') is an optional boolean indicating if the function must generate and return an event identifying the command (ge=true), or if it must not (ge=false or nil, in which case the returned event is nil);

  • the {integer}[3] notation denotes an array of up to 3 integers (missing entries default to 0, e.g. {5} is equivalent to {5, 0, 0});

  • the ptr parameter or return value is a lightuserdata encapsulating a valid pointer (void*) to memory;

  • the semantics and optionality of functions' parameters are the same as for the corresponding parameters of the underlying clEnqueueXxxx( ) functions.
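The defaulting rule of the {integer}[3] notation can be illustrated in plain Lua as follows. This is only a model of the documented rule: pad3( ) is a hypothetical helper, and there is no need to pad tables by hand before passing them to MoonCL.

```lua
-- Expands a list of up to 3 integers to exactly 3 entries,
-- filling the missing ones with 0.
-- (Hypothetical helper that models the documented default.)
local function pad3(t)
  assert(#t <= 3, 'at most 3 entries expected')
  return { t[1] or 0, t[2] or 0, t[3] or 0 }
end

local origin = pad3({5})
print(table.concat(origin, ', '))
```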

buffer commands




image commands

  • event = enqueue_fill_image(queue, image, fillcolor, origin, region, [{we}, ge])
    fillcolor: binary string,
    origin, region: {integer}[3].
    The fillcolor binary string may encode either a float, a float[4], an int[4], or a uint[4].
    Rfr: clEnqueueFillImage.


GL objects commands

svm commands



kernel commands

  • event = enqueue_task(queue, kernel, [{we}, ge])
    Equivalent to cl.enqueue_ndrange_kernel(queue, kernel, 1, {0}, {1}, {1}, {we}, ge).

synchronization

  • flush(queue)
    Blocks until all previously enqueued commands have been submitted.
    Rfr: clFlush.

  • finish(queue)
    Blocks until all previously enqueued commands have been completed.
    Rfr: clFinish.

Structs

  • contextproperties = {
    platform: platform (opt.),
    interop_user_sync: boolean (opt.),
    context_terminate: boolean (opt.),
    gl_context: lightuserdata or 0 (opt.),
    cgl_sharegroup: lightuserdata or 0 (opt.),
    glx_display: lightuserdata or 0 (opt.),
    egl_display: lightuserdata or 0 (opt.),
    wgl_hdc: lightuserdata or 0 (opt.),
    } (rfr: clCreateContext)

  • devicepartitionproperty = {
    -- Note: the following fields are mutually exclusive:
    equally: integer,
    by_affinity_domain: affinitydomainflags,
    by_counts: {integer},
    } (rfr: clCreateSubDevices)

  • devicesupportedpartitionproperty = {
    equally: boolean,
    by_affinity_domain: boolean,
    by_counts: boolean,
    } (rfr: clGetDeviceInfo)

  • imagedesc = {
    type: memobjecttype,
    width: integer (defaults to 1),
    height: integer (defaults to 1),
    depth: integer (defaults to 1),
    array_size: integer (defaults to 1),
    row_pitch: integer (defaults to 0),
    slice_pitch: integer (defaults to 0),
    num_mip_levels: integer (defaults to 0),
    num_samples: integer (defaults to 0),
    buffer: buffer (opt.),
    image: image (opt., ignored if buffer is present),
    } (rfr: cl_image_desc)

  • imageformat = {
    channel_order: channelorder,
    channel_type: channeltype,
    } (rfr: cl_image_format)

  • pipeproperties = {
    -- reserved for future use
    }

  • queueproperties = {
    properties: queueflags (opt.),
    size: integer (opt.),
    queue_priority: queuepriority (opt.),
    queue_throttle: queuethrottle (opt.),
    } (rfr: clCreateCommandQueueWithProperties)

  • samplerproperties = {
    normalized_coords: boolean,
    addressing_mode: addressingmode,
    filter_mode: filtermode,
    } (rfr: clCreateSamplerWithProperties)

Enums

OpenCL enums are mapped in MoonCL to sets of string literals (as is customary in Lua). Admitted literals are available in the cl table (e.g. cl.PLATFORM_XXX for CL_PLATFORM_XXX), and can also be inferred from the corresponding C enum names. For example, given the cl.PLATFORM_XXX hint for the platform enum type, the literals it admits are obtained by lowercasing the XXX part of the name and replacing any underscore with a space.

The example contained in the code snippets section should hopefully be clear enough.
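The naming rule described above (lowercase the XXX part and replace underscores with spaces) can be sketched in plain Lua. The literal( ) function below is a hypothetical helper written for illustration and is not provided by MoonCL.

```lua
-- Derives the string literal from a C enum name, given the C prefix
-- that corresponds to the hint of the enum type.
-- (Hypothetical helper, not part of MoonCL.)
local function literal(cname, prefix)
  assert(cname:sub(1, #prefix) == prefix, 'name does not start with prefix')
  local rest = cname:sub(#prefix + 1)
  return (rest:lower():gsub('_', ' '))  -- parentheses drop gsub's count
end

print(literal('CL_ADDRESS_CLAMP_TO_EDGE', 'CL_ADDRESS_'))
print(literal('CL_MEM_OBJECT_IMAGE2D_ARRAY', 'CL_MEM_OBJECT_'))
```

The two calls yield 'clamp to edge' and 'image2d array', which match the values listed for addressingmode and memobjecttype. For enum types whose hint carries a suffix (e.g. the _KHR ones) the rule needs some care, so the lists in this section and cl.enum( ) remain the authoritative source.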

If needed, the following function can be used to obtain the list of literals admitted by a particular enum type.

  • {literal} = cl.enum(enumtype)
    Returns a table listing the literals admitted by enumtype (given as a string, e.g. 'addressingmode', 'filtermode', etc.).

Below is the list of the enum types, each with its hint, the list of string values it admits (if not too long), and a reference to the original OpenCL enum type, where semantic and usage information can be found.

For enum types denoting object info, the type of the info value is also listed.

addressingmode: cl.ADDRESS_XXX (cl_addressing_mode)
Values: 'none', 'clamp to edge', 'clamp', 'repeat', 'mirrored repeat'.

argaccessqualifier: cl.KERNEL_ARG_ACCESS_XXX (cl_kernel_arg_access_qualifier)
Values: 'read only', 'write only', 'read write', 'none'.

argaddressqualifier: cl.KERNEL_ARG_ADDRESS_XXX (cl_kernel_arg_address_qualifier)
Values: 'global', 'local', 'constant', 'private'.

buffercreatetype: cl.BUFFER_CREATE_TYPE_XXX (cl_buffer_create_type)
Values: 'region'.

buildstatus: cl.BUILD_XXX (cl_build_status)
Values: 'success', 'none', 'error', 'in progress'.

channelorder: cl.XXX (cl_channel_order)
Values: 'r', 'a', 'rg', 'ra', 'rgb', 'rgba', 'bgra', 'argb', 'intensity', 'luminance', 'rx', 'rgx', 'rgbx', 'depth', 'depth stencil', 'srgb', 'srgbx', 'srgba', 'sbgra', 'abgr'.

channeltype: cl.XXX (cl_channel_type)
Values: 'snorm int8', 'snorm int16', 'unorm int8', 'unorm int16', 'unorm short 565', 'unorm short 555', 'unorm int 101010', 'signed int8', 'signed int16', 'signed int32', 'unsigned int8', 'unsigned int16', 'unsigned int32', 'half float', 'float', 'unorm int24', 'unorm int 101010 2'.

commandqueueinfo: cl.QUEUE_XXX (cl_command_queue_info)
'reference count': integer
'size': integer
'properties': queueflags
'context': context
'device': device
'device default': queue
Rfr: clGetCommandQueueInfo.

commandtype: cl.COMMAND_XXX (cl_command_type)
Values: 'ndrange kernel', 'task', 'native kernel', 'read buffer', 'write buffer', 'copy buffer', 'read image', 'write image', 'copy image', 'copy image to buffer', 'copy buffer to image', 'map buffer', 'map image', 'unmap mem object', 'marker', 'acquire gl objects', 'release gl objects', 'read buffer rect', 'write buffer rect', 'copy buffer rect', 'user', 'barrier', 'migrate mem objects', 'fill buffer', 'fill image', 'svm free', 'svm memcpy', 'svm memfill', 'svm map', 'svm unmap', 'gl fence sync object'.

contextinfo: cl.CONTEXT_XXX (cl_context_info)
'reference count': integer
'num devices': integer
'devices': {device}
'properties': contextproperties
Rfr: clGetContextInfo.

deviceinfo: cl.DEVICE_XXX (cl_device_info)
'type': devicetypeflags
'vendor id': integer
'max compute units': integer
'max work item dimensions': integer
'max work item sizes': {integer}
'max work group size': integer
'preferred vector width char': integer
'preferred vector width short': integer
'preferred vector width int': integer
'preferred vector width long': integer
'preferred vector width float': integer
'preferred vector width double': integer
'preferred vector width half': integer
'native vector width char': integer
'native vector width short': integer
'native vector width int': integer
'native vector width long': integer
'native vector width float': integer
'native vector width double': integer
'native vector width half': integer
'max clock frequency': integer
'address bits': integer
'max mem alloc size': integer
'image support': boolean
'max read image args': integer
'max write image args': integer
'max read write image args': integer
'image2d max width': integer
'image2d max height': integer
'image3d max width': integer
'image3d max height': integer
'image3d max depth': integer
'image max buffer size': integer
'image max array size': integer
'max samplers': integer
'image pitch alignment': integer
'image base address alignment': integer
'max pipe args': integer
'pipe max active reservations': integer
'pipe max packet size': integer
'max parameter size': integer
'mem base addr align': integer
'single fp config': fpflags
'double fp config': fpflags
'half fp config': fpflags
'global mem cache type': devicememcachetype
'global mem cacheline size': integer
'global mem cache size': integer
'global mem size': integer
'max constant buffer size': integer
'max constant args': integer
'max global variable size': integer
'global variable preferred total size': integer
'local mem type': devicelocalmemtype
'local mem size': integer
'error correction support': boolean
'profiling timer resolution': integer
'endian little': boolean
'available': boolean
'compiler available': boolean
'linker available': boolean
'execution capabilities': execflags
'queue on host properties': queueflags
'queue on device properties': queueflags
'queue on device preferred size': integer
'queue on device max size': integer
'max on device queues': integer
'max on device events': integer
'built in kernels': string
'platform': platform
'name': string
'vendor': string
'profile': string
'version': string
'opencl c version': string
'extensions': string
'printf buffer size': integer
'preferred interop user sync': boolean
'parent device': device
'partition max sub devices': integer
'partition properties': devicesupportedpartitionproperty
'partition affinity domain': affinitydomainflags
'partition type': devicepartitionproperty
'reference count': integer
'svm capabilities': svmflags
'preferred platform atomic alignment': integer
'preferred global atomic alignment': integer
'preferred local atomic alignment': integer
'min data type align size': integer
'il version': string
'max num sub groups': integer
'sub group independent forward progress': boolean
Rfr: clGetDeviceInfo.

devicelocalmemtype: cl.XXX (cl_device_local_mem_type)
Values: 'local', 'global'.

devicememcachetype: cl.XXX (cl_device_mem_cache_type)
Values: 'none', 'read only cache', 'read write cache'.

eventinfo: cl.EVENT_XXX (cl_event_info)
'reference count': integer
'command type': commandtype
'command queue': queue
'context': context
'command execution status': executionstatus or an integer code
Rfr: clGetEventInfo.

executionstatus: cl.XXX (command execution status)
Values: 'complete', 'running', 'submitted', 'queued'.

filtermode: cl.FILTER_XXX (cl_filter_mode)
Values: 'nearest', 'linear'.

glcontextinfo: cl.GL_CONTEXT_INFO (cl_gl_context_info)
'current device': device
'devices': {device}
Rfr: clGetGLContextInfoKHR.

globjecttype: - (GL_OBJECT_XXX)
Values: 'buffer', 'texture 2d', 'texture 3d', 'renderbuffer', 'texture 2d array', 'texture 1d', 'texture 1d array', 'texture buffer'.
Rfr: clGetGLObjectInfo.

gltextureinfo: cl.GL_TEXTURE_INFO (cl_gl_texture_info)
'texture target': gltexturetarget
'mipmap level': integer
'num samples': integer
Rfr: clGetGLTextureInfo.

gltexturetarget: - (GL_TEXTURE_XXX)
Values: '1d', '2d', '3d', '1d array', '2d array', 'buffer', 'cube map', 'cube map positive x', 'cube map negative x', 'cube map positive y', 'cube map negative y', 'cube map positive z', 'cube map negative z', 'rectangle', '2d multisample', '2d multisample array'.
Rfr: clCreateFromGLTexture.

imageinfo: cl.IMAGE_XXX (cl_image_info)
'format': imageformat
'element size', 'array size': integer
'row pitch', 'slice pitch': integer
'width', 'height', 'depth': integer
'num mip levels': integer
'num samples': integer
Rfr: clGetImageInfo.

kernelarginfo: cl.KERNEL_ARG_XXX (cl_kernel_arg_info)
'type name': string
'name': string
'address qualifier': argaddressqualifier
'access qualifier': argaccessqualifier
'type qualifier': argtypeflags
Rfr: clGetKernelArgInfo.

kernelexecinfo: cl.KERNEL_EXEC_INFO_XXX (cl_kernel_exec_info)
'svm fine grain system': boolean
'svm ptrs': {lightuserdata} (containing valid svm void* pointers)
Rfr: clSetKernelExecInfo.

kernelinfo: cl.KERNEL_XXX (cl_kernel_info)
'function name' : string
'attributes' : string
'reference count': integer
'num args': integer
'max num sub groups': integer
'compile num sub groups': integer
'program': program
'context': context
Rfr: clGetKernelInfo.

kernelsubgroupinfo: cl.KERNEL_XXX (cl_kernel_sub_group_info)
(inputvalue → value)
'max sub group size for ndrange': {integer} → integer
'sub group count for ndrange': {integer} → integer
'local size for sub group count': integer → {integer}
'max num sub groups': nil → integer
'compile num sub groups': nil → integer
Rfr: clGetKernelSubGroupInfo.

kernelworkgroupinfo: cl.KERNEL_XXX (cl_kernel_work_group_info)
'work group size': integer
'preferred work group size multiple': integer
'local mem size': integer
'private mem size': integer
'global work size': {integer}
'compile work group size': {integer}
Rfr: clGetKernelWorkGroupInfo.

meminfo: cl.MEM_XXX (cl_mem_info)
'type': memobjecttype
'reference count': integer
'flags': memflags
'size': integer
'offset': integer
'map count': integer
'context': context
'associated memobject': buffer or image or pipe or nil
'uses svm pointer': boolean
'host ptr': lightuserdata (void*) or nil
Rfr: clGetMemObjectInfo.

memobjecttype: cl.MEM_OBJECT_XXX (cl_mem_object_type)
Values: 'buffer', 'image2d', 'image3d', 'image2d array', 'image1d', 'image1d array', 'image1d buffer', 'pipe'.

pipeinfo: cl.PIPE_XXX (cl_pipe_info)
'packet size': integer
'max packets': integer
Rfr: clGetPipeInfo.

platforminfo: cl.PLATFORM_XXX (cl_platform_info)
'profile': string
'version': string
'name': string
'vendor': string
'extensions': string
'host timer resolution': integer
Rfr: clGetPlatformInfo.

primtype: OpenCL primitive types (cl_char, cl_int, etc.)
Values: 'char', 'uchar', 'short', 'ushort', 'int', 'uint', 'long', 'ulong', 'half', 'float', 'double'.

profilinginfo: cl.PROFILING_XXX (cl_profiling_info)
'command queued', 'command submit', 'command start', 'command end', 'command complete': integer (denoting time in nanoseconds)
Rfr: clGetEventProfilingInfo.

programbinarytype: cl.PROGRAM_BINARY_TYPE_XXX (cl_program_binary_type)
Values: 'none', 'compiled object', 'library', 'executable', 'intermediate'.

programbuildinfo: cl.PROGRAM_BUILD_XXX (cl_program_build_info)
'status': buildstatus
'binary type': programbinarytype
'options': string
'log': string
'global variable total size': integer
Rfr: clGetProgramBuildInfo.

programinfo: cl.PROGRAM_XXX (cl_program_info)
'reference count': integer
'context': context
'num devices': integer
'num kernels': integer
'devices': {device}
'kernel names': string
'source': string
'binary sizes': {integer}
'binaries': {string}
'il': binary string
Rfr: clGetProgramInfo.

queuepriority: cl.QUEUE_PRIORITY_XXX_KHR (cl_queue_priority_khr)
Values: 'high', 'medium', 'low'.
Rfr: clCreateCommandQueueWithProperties.

queuethrottle: cl.QUEUE_THROTTLE_XXX_KHR (cl_queue_throttle_khr)
Values: 'high', 'med', 'low'.
Rfr: clCreateCommandQueueWithProperties.

samplerinfo: cl.SAMPLER_XXX (cl_sampler_info)
'context': context
'reference count': integer
'normalized coords': boolean
'addressing mode': addressingmode
'filter mode': filtermode
'mip filter mode': filtermode
'lod min': integer
'lod max': integer
Rfr: clGetSamplerInfo.

Flags

Flags in MoonCL functions and structs are always represented as plain integers, and encoded in the same way as the corresponding flags in the C OpenCL API.

The cl table contains the CL_XXX values, renamed as cl.XXX (e.g. cl.DEVICE_TYPE_CPU, cl.DEVICE_TYPE_GPU, etc.).

For each flags type (see the list below), a utility function is also available to map an integer code to a list of string literals, each corresponding to an individual bit set in the code, and vice versa to encode an integer value from the individual bits given as a list of string literals. The generic definition of such functions is the following, where xxxflags stands for devicetypeflags, memflags, etc:

  • code = xxxflags(s1, s2, …​)
    s1, s2, …​ = xxxflags(code)
    Maps the integer code to/from the list of string values s1, s2, …​.

See also the example contained in the code snippets section.
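The two directions of the xxxflags( ) mapping can be modelled in plain Lua (5.3 or later, for the bitwise operators) as shown below. This is a sketch of the idea and not MoonCL's implementation: encode( ) and decode( ) are hypothetical names, only a subset of the memflags values is covered, and the order in which the real functions return the literals is not specified here. The bit values are those of the corresponding CL_MEM_XXX constants in the OpenCL headers.

```lua
-- A few cl_mem_flags bits, in increasing bit order.
local MEM_BITS = {
  { 'read write',     1 << 0 },
  { 'write only',     1 << 1 },
  { 'read only',      1 << 2 },
  { 'use host ptr',   1 << 3 },
  { 'alloc host ptr', 1 << 4 },
  { 'copy host ptr',  1 << 5 },
}

-- Literals -> integer code (models code = xxxflags(s1, s2, ...)).
local function encode(...)
  local code = 0
  for _, name in ipairs({...}) do
    local found = false
    for _, entry in ipairs(MEM_BITS) do
      if entry[1] == name then
        code = code | entry[2]
        found = true
      end
    end
    assert(found, 'unknown flag: ' .. tostring(name))
  end
  return code
end

-- Integer code -> literals (models s1, s2, ... = xxxflags(code)).
local function decode(code)
  local names = {}
  for _, entry in ipairs(MEM_BITS) do
    if (code & entry[2]) ~= 0 then
      names[#names + 1] = entry[1]
    end
  end
  return table.unpack(names)
end

local code = encode('read only', 'copy host ptr')
print(code, decode(code))
```

Since flags are plain integers, the same code can also be obtained by combining the cl.XXX values with the bitwise OR operator, e.g. cl.MEM_READ_ONLY | cl.MEM_COPY_HOST_PTR.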

affinitydomainflags: cl.DEVICE_AFFINITY_DOMAIN_XXX (cl_device_affinity_domain)
Values: 'numa', 'l4 cache', 'l3 cache', 'l2 cache', 'l1 cache', 'next partitionable'.

argtypeflags: cl.KERNEL_ARG_TYPE_XXX (cl_kernel_arg_type_qualifier)
Values: 'const', 'restrict', 'volatile', 'pipe'.

devicetypeflags: cl.DEVICE_TYPE_XXX (cl_device_type)
Values: 'default', 'cpu', 'gpu', 'accelerator', 'custom', 'all'.

execflags: cl.EXEC_XXX (cl_device_exec_capabilities)
Values: 'kernel', 'native kernel'.

fpflags: cl.FP_XXX (cl_device_fp_config)
Values: 'denorm', 'inf nan', 'round to nearest', 'round to zero', 'round to inf', 'fma', 'float', 'correctly rounded divide sqrt'.

mapflags: cl.MAP_XXX (cl_map_flags)
Values: 'read', 'write', 'write invalidate region'.

memflags: cl.MEM_XXX (cl_mem_flags, cl_svm_mem_flags)
Values: 'read write', 'write only', 'read only', 'use host ptr', 'alloc host ptr', 'copy host ptr', 'host write only', 'host read only', 'host no access', 'fine grain buffer', 'atomics', 'kernel read and write'.

migrateflags: cl.MIGRATE_MEM_OBJECT_XXX (cl_mem_migration_flags)
Values: 'host', 'content undefined'.

queueflags: cl.QUEUE_XXX (cl_command_queue_properties)
Values: 'out of order exec mode enable', 'profiling enable', 'on device', 'on device default'.

svmflags: cl.DEVICE_SVM_XXX (cl_device_svm_capabilities)
Values: 'coarse grain buffer', 'fine grain buffer', 'fine grain system', 'atomics'.

Miscellanea

Version handling

The cl table contains the following version-related information:

  • cl._VERSION: MoonCL version (a string).

  • cl.CL_VERSION_n_m: OpenCL versions supported by MoonCL (e.g. cl.CL_VERSION_1_2 = true if OpenCL v1.2 is supported).
    Note that this indicates only that a particular version is supported by MoonCL, while it may or may not be supported by the available devices (use get_device_info( ) to check it).

  • cl.CL_VERSIONS: a table listing the supported versions as strings (e.g. { 'CL_VERSION_1_0', 'CL_VERSION_2_0', …​ }).
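A version check based on the cl.CL_VERSION_n_m fields might look like the following sketch, where supports_version( ) is a hypothetical helper and cl is assumed to be the table returned by require("mooncl").

```lua
-- Returns true if the given MoonCL table was built with support for
-- OpenCL version major.minor, false otherwise.
-- (Hypothetical helper; 'cl' is the MoonCL table, passed in explicitly.)
local function supports_version(cl, major, minor)
  local field = string.format('CL_VERSION_%d_%d', major, minor)
  return cl[field] == true
end
```

As noted above, a true result only says that MoonCL itself supports that version, not that the available devices do.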

Common methods

The following methods are common to MoonCL objects.

  • handle = object:raw( )
    Returns the raw OpenCL handle for the given object in a lightuserdata (e.g. platform:raw( ) returns a lightuserdata containing a cl_platform_id).

  • typestring = object:type( )
    typestring = cl.type(object)
    object: any object type.
    Returns a string denoting the type of the given object (e.g. platform:type( ) returns 'platform').
    The cl.type( ) function returns nil if object is not a valid MoonCL object.

  • parent = object:parent( )
    object: any object type.
    Returns the parent object of the given object, or nil if it has no parent.

  • device = object:device( )
    object: queue.
    Returns the device the given object belongs to.

  • program = object:program( )
    object: kernel.
    Returns the program the given object belongs to.

Data handling

This section describes additional utilities that can be used to encode data from Lua variables to binary strings and vice versa.

  • val1, …​, valN = flatten(table)
    Flattens out the given table and returns the terminal elements in the order they are found.
    Similar to Lua’s table.unpack( ), but it also unpacks any nested table. Only the array part of the table and of nested tables is considered.

  • {val1, …​, valN} = flatten_table(table)
    Same as flatten( ), but returns the values in a flat table. Unlike flatten( ), this function can also be used with very large tables.

  • size = sizeof(primtype)
    Returns the size in bytes of the given primtype.

  • data = pack(primtype, val1, …​, valN)
    data = pack(primtype, table)
    Packs the numbers val1, …​, valN, encoding them according to the given primtype, and returns the resulting binary string.
    The values may also be passed in a (possibly nested) table. Only the array part of the table (and of nested tables) is considered.

  • {val1, …​, valN} = unpack(primtype, data)
    Unpacks the binary string data, interpreting it as a sequence of values of the given primtype, and returns the extracted values in a flat table.
    The length of data must be a multiple of sizeof(primtype).
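
The behaviour described above can be illustrated with pure-Lua stand-ins. This is a sketch for Lua 5.3 or later, not MoonCL's implementation: pack_floats( ) and unpack_floats( ) are hypothetical names that mimic pack( ) and unpack( ) for a single-precision float type only, using the standard string.pack( ) and string.unpack( ) with the native float format.

```lua
-- Illustrative stand-ins, NOT MoonCL code.

-- Collects the terminal elements of the array part of t (and of any
-- nested table) into a flat table, in the order they are found.
local function flatten_table(t, out)
  out = out or {}
  for _, v in ipairs(t) do
    if type(v) == "table" then
      flatten_table(v, out)
    else
      out[#out + 1] = v
    end
  end
  return out
end

-- Same, but returns the values as multiple results. table.unpack() is
-- bounded by the Lua stack size, hence the limit on very large tables.
local function flatten(t)
  return table.unpack(flatten_table(t))
end

-- Mimics pack() for floats: the values may be given directly or in
-- (possibly nested) tables; each is encoded as a native float.
local function pack_floats(...)
  local chunks = {}
  for i, v in ipairs(flatten_table({ ... })) do
    chunks[i] = string.pack("f", v)
  end
  return table.concat(chunks)
end

-- Mimics unpack() for floats: returns the decoded values in a flat table.
local function unpack_floats(data)
  local size = string.packsize("f")
  assert(#data % size == 0, "data length must be a multiple of the element size")
  local vals, pos = {}, 1
  while pos <= #data do
    local v
    v, pos = string.unpack("f", data, pos)
    vals[#vals + 1] = v
  end
  return vals
end

print(flatten({ 1, { 2, { 3, 4 } }, 5 })) --> 1 2 3 4 5 (tab-separated)

local data = pack_floats(1.5, { 2.5, 3.5 })
print(#data == 3 * string.packsize("f")) --> true
print(table.concat(unpack_floats(data), ', ')) --> 1.5, 2.5, 3.5
```

The sample values are exactly representable in single precision, so the round trip is lossless; in general a value packed as a float comes back rounded to float precision.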

Tracing utilities

  • trace_objects(boolean)
    Enable/disable tracing of object creation and destruction (disabled by default).
    If enabled, a message is printed whenever an object is created or deleted, indicating the object type and the value of its raw OpenCL handle.

  • t = now( )
    Returns the current time in seconds (a Lua number).
    This is implemented with monotonic clock_gettime(3), if available, or with gettimeofday(3) otherwise.

  • dt = since(t)
    Returns the time in seconds (a Lua number) elapsed since the time t, previously obtained with the now( ) function.
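
A typical use of the now( )/since( ) pair is timing a piece of host code. The sketch below assumes nothing beyond the two signatures given above: timed( ) is a hypothetical helper that receives the two functions as arguments, and the local now( ) and since( ) are stand-ins built on os.clock( ) (which measures CPU time, not a monotonic wall clock) so that the snippet runs without MoonCL.

```lua
-- Hypothetical helper, not part of MoonCL: calls f(...) and returns the
-- elapsed time followed by all the results of f.
local function timed(now, since, f, ...)
  local t = now()
  local results = table.pack(f(...))
  return since(t), table.unpack(results, 1, results.n)
end

-- Stand-ins with the same signatures as the documented now() and since().
-- They are assumptions for illustration; os.clock() reports CPU time.
local function now() return os.clock() end
local function since(t) return os.clock() - t end

-- Example workload: sum of the integers from 1 to n.
local function sum_to(n)
  local s = 0
  for i = 1, n do s = s + i end
  return s
end

local dt, sum = timed(now, since, sum_to, 1000000)
print(sum) --> 500000500000
print(string.format("elapsed: %.6f s", dt))
```

With MoonCL loaded, the library's own now( ) and since( ) would be passed to timed( ) in place of the stand-ins.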

Code snippets

Dealing with enums
 -- Alternative and equivalent ways to assign values of the 'filtermode' enum type
 -- (whose hint is cl.FILTER_XXX):
 op1 = cl.FILTER_NEAREST
 op2 = 'nearest' -- XXX=NEAREST
 op3 = cl.FILTER_LINEAR
 op4 = 'linear' -- XXX=LINEAR

 print(op1) --> 'nearest'
 print(op2) --> 'nearest'
 print(op3) --> 'linear'
 print(op4) --> 'linear'
 print(op1 == 'nearest') --> true
 print(op3 == op4) --> true

 -- List the literals admitted by the 'filtermode' enum type:
 ops = cl.enum('filtermode')
 print(table.concat(ops, ', ')) --> 'nearest', 'linear'
Dealing with flags
 -- Two alternative equivalent ways to produce a 'devicetypeflags' value:
 code1 = cl.DEVICE_TYPE_CPU | cl.DEVICE_TYPE_GPU
 code2 = cl.devicetypeflags('cpu', 'gpu')

 assert(code1 == code2) -- true
 print(code1) --> 3
 print(cl.devicetypeflags(code1)) --> cpu gpu
 print(cl.devicetypeflags(code2)) --> cpu gpu

 if (code1 & cl.DEVICE_TYPE_GPU) ~= 0 then -- NB: 0 is not false in Lua
   print("'gpu' bit is set")
 else
   print("'gpu' bit is not set")
 end

1. This manual is written in AsciiDoc, rendered with AsciiDoctor and a CSS from the AsciiDoctor Stylesheet Factory.
2. Objects are anchored to the Lua registry at their creation, so even if the script holds no references to an object, a reference always exists in the registry, and this prevents the GC from collecting it.
3. It is good practice not to leave invalid references to objects around, because they prevent the GC from collecting the memory associated with the userdata.