Efficient rendering of geometric data using OpenGL VBOs in SPECviewperf
Introduction
The goal of SPECviewperf is to be a good predictor of graphics performance
for real-world applications. The testing files (viewsets) within SPECviewperf
generate OpenGL command streams that are similar to those generated by the
applications they represent. SPECviewperf provides a measure of graphics subsystem performance
and its impact on the complete system, without the full overhead of
an application.
Given its connection to real applications, it is important that SPECviewperf
can provide performance measurement based on new technologies implemented
in those applications. An example of such a development is vertex buffer
objects (VBOs), which are included in OpenGL 1.5.
VBOs offer a way to obtain performance and flexibility benefits for
OpenGL applications. This white paper details some of the motivations
behind VBOs, as well as the specific OpenGL functions related to their
use. It also touches on potential performance implications and shows
where VBO functions are placed within the SPECviewperf test-harness
code.
Background
OpenGL traditionally provides two main approaches for rendering geometric
data – immediate mode and display lists.
When using immediate mode, applications send all the geometric data
to the graphics processor (GPU) every frame, which is advantageous in
situations such as modeling or animation where geometry is frequently
created or modified. If geometric data does not change frequently, however,
immediate mode can result in wasted data transfer when compared with
storing the same geometric data within graphics memory.
Because immediate mode transfers data as individual elements, such
as a single vertex or normal, it typically creates significant traffic
to and from system memory and over the CPU’s front-side bus. This
translates into increased CPU cycles to perform the actual transfer.
These effects are further compounded by greater function call overhead
in the API, which at a hardware level results in increased traffic and
CPU cycles.
As attributes such as colors and texture coordinates are associated
with vertices to improve visual quality, the problem grows. Triangle
strips, triangle fans and line strips mitigate some of the data transfer
needs by letting each additional vertex define a new triangle or line
segment, reusing previously sent vertices.
In spite of this, however, immediate mode frequently causes data retrieval,
transfer and CPU bottlenecks that inhibit overall graphics performance.
As an alternative to immediate mode, OpenGL provides display lists.
These enable a series of graphics commands to be grouped together. This
gives OpenGL implementations more opportunity to process and store data
in ways that can improve overall graphics performance. Display lists
can be stored within graphics memory, for example, to avoid transfer
over the graphics bus.
Display lists also make it attractive for OpenGL implementations to
allow GPUs to pull data directly from system memory with DMA transfers.
While it is possible to transfer an individual vertex by a DMA transfer,
the benefits of reduced CPU cycles and front-side bus traffic are more
than outweighed by the setup costs involved. Display lists allow more
data and/or commands to be transferred in one transfer and setup.
Despite these benefits, display lists do have some disadvantages. In
some situations, geometric data changes require creating a new display
list. Depending on the frequency with which geometric data is updated,
the potential performance advantages may be outweighed by the complexities
of managing creation/deletion of display lists. Similarly, for best
performance it is assumed that some OpenGL states will not change within
the display list. If a state does change, the benefits of display lists
may not apply, because the implementation may have to store values in
system memory and/or update the GPU’s settings. This would prevent
commands and data from being processed as a block and require CPU
intervention.
Display lists are created by a program and issued to the OpenGL client.
Ultimately, however, they are processed by the GPU from a copy stored
by the OpenGL server. Because the application usually keeps its own copy
of the data as well, this doubles the memory required compared with
immediate mode. It also raises another issue: the size of the OpenGL
server copy of the display list is not visible to the OpenGL program,
which can cause problems when memory is constrained.
As an alternative to display lists, OpenGL also implements vertex arrays.
These allow vertex and attribute data to be grouped and treated as a
block, which provides some of the data transfer efficiencies afforded
by display lists. Vertex arrays also allow data such as geometry and
color to be interleaved, which can be convenient when creating and
referencing that data. Unfortunately, because the arrays live in
application memory that may be modified at any time, an implementation
cannot assume that any individual piece of data is unchanged between
draws. As a result, when drawing an object using vertex arrays, the data
in the array must be validated each time it is referenced. This adds
overhead to data transfer. Vertex arrays do not suffer, however, from
the limitation of storing two copies of all data.
VBOs are intended to enhance the capabilities of OpenGL by providing
many of the benefits of immediate mode, display lists and vertex arrays,
while avoiding some of their limitations. Like vertex arrays, they allow
data to be grouped and stored as blocks for efficient transfer. They
also provide a mechanism for programs to give hints about data usage
patterns so that OpenGL implementations can decide the form in which
data should be stored and where. VBOs give applications the flexibility
to modify data without incurring validation overhead during transfer.
When combined with programmability, VBOs extend OpenGL’s capabilities
into new areas, such as render to vertex array, in which previously
rendered pixel data is used to modify vertex data.
Detailed description of VBOs
The idea behind VBOs is to provide regions of memory (buffers) accessible
through identifiers. A buffer is made active through binding, following
the same pattern as other OpenGL entities such as display lists or textures.
VBOs provide control over mapping and unmapping buffer objects
and define the usage type of the buffers. This allows graphics drivers
to optimize internal memory management and choose the best type of memory
– such as cached/uncached system memory or graphics memory –
in which to store the buffers.
When a buffer is bound, each pointer passed to a client-state array
function (such as glVertexPointer) is interpreted as an offset relative
to the currently bound buffer. As a result, the bind operation turns a
client-state function into a server-state function.
Data used by client-state functions is accessible only to the OpenGL
client itself; other OpenGL clients cannot access that client’s data.
Because the VBO mechanism changes client-state functions into
server-state functions, it is now possible to share VBO data among
various clients. As a result, OpenGL clients can share buffers in the
same way as textures or display lists.
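Because a bound buffer turns the pointer argument into a byte offset, C code has to pass an integer offset where a `const GLvoid *` is expected; the SPECviewperf excerpts later in this paper do this with a direct cast. The sketch below wraps that conversion in a pair of hypothetical helpers (`buffer_offset` and `offset_from_pointer` are not OpenGL functions), assuming a C99 compiler on which a round trip through `uintptr_t` preserves the value, as it does on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn a byte offset into the pointer-typed argument
   expected by glVertexPointer() and friends while a buffer is bound.
   Going through uintptr_t avoids pointer arithmetic on a null pointer. */
static const void *buffer_offset(size_t bytes)
{
    return (const void *)(uintptr_t)bytes;
}

/* Inverse conversion, useful when inspecting a "pointer" that is really
   an offset into the bound buffer. */
static size_t offset_from_pointer(const void *p)
{
    return (size_t)(uintptr_t)p;
}
```

With a buffer bound, a call such as `glNormalPointer(GL_FLOAT, 0, buffer_offset(nmlOffs))` would read normals starting nmlOffs bytes into that buffer.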
The following is an outline of the key OpenGL calls associated with
VBO usage:
- glBindBuffer: This allows client-state functions to use
bound buffers instead of working in absolute memory on the client
side. Binding buffer 0 switches off VBOs and reverts to the usual
client-state mode with absolute pointers.
- glBufferData, glBufferSubData, and glGetBufferSubData:
These functions set the size of the buffer data, provide usage
hints, and copy data into and out of a buffer.
- glMapBuffer and glUnmapBuffer: These functions
lock and unlock buffers, allowing data to be loaded into them or
relinquishing control to the server. glMapBuffer maps the buffer into
client memory and returns a temporary pointer to the beginning of the
buffer. OpenGL is responsible for how this mapping into the client’s
absolute memory occurs. Because of this, a buffer should stay mapped
only for a short operation; the pointer is not persistent and should
not be stored for further use.
VBOs are intended to work with the following OpenGL buffer targets:
- Array buffers (ARRAY_BUFFER): These buffers contain
vertex attributes, such as vertex coordinates, texture coordinates,
per-vertex colors, and normals. They can be interleaved
(using the stride parameter) or sequential, with one array after another
(write 1,000 vertices, then 1,000 normals, and so on). glVertexPointer
and glNormalPointer each point to the appropriate offsets.
- Element array buffers (ELEMENT_ARRAY_BUFFER):
This type of buffer is used mainly for the indices argument of
glDraw[Range]Elements(). It contains only element indices.
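To make the interleaved form concrete, the sketch below computes the stride and per-attribute offsets for one possible interleaved vertex layout. The struct `interleaved_vertex` and the helper functions are hypothetical; with a buffer bound, the stride and offsets are the values that would be passed as the stride and pointer arguments of glVertexPointer, glNormalPointer and glTexCoordPointer.

```c
#include <stddef.h>

/* Hypothetical interleaved vertex: position, normal and texture
   coordinate stored together (the "stride" form of ARRAY_BUFFER). */
struct interleaved_vertex {
    float position[3];
    float normal[3];
    float texcoord[2];
};

/* Stride and byte offsets an application would hand to the
   *Pointer functions while the interleaved buffer is bound. */
struct interleaved_layout {
    size_t stride;
    size_t position_offset;
    size_t normal_offset;
    size_t texcoord_offset;
};

static struct interleaved_layout interleaved_layout_get(void)
{
    struct interleaved_layout l;
    l.stride = sizeof(struct interleaved_vertex);
    l.position_offset = offsetof(struct interleaved_vertex, position);
    l.normal_offset = offsetof(struct interleaved_vertex, normal);
    l.texcoord_offset = offsetof(struct interleaved_vertex, texcoord);
    return l;
}

/* Byte offset of one attribute of vertex i: base offset + i * stride. */
static size_t attribute_byte_offset(size_t attrib_offset, size_t stride, size_t i)
{
    return attrib_offset + i * stride;
}
```

With this layout the stride is 32 bytes, and the normal of vertex 2 starts 12 + 2 * 32 = 76 bytes into the buffer, which is the address the GPU derives from the stride and offset arguments.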
The two targets have separate bindings, so an element array buffer can
be bound at the same time as an array buffer for glDraw[Range]Elements().
This lets users switch among various element buffers while keeping the
same vertex array buffer, which can be used to implement level of detail
(LOD) and other effects by changing the element table while working on
the same database of vertices.
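As an illustration of switching element buffers over one vertex array, the sketch below builds GL_TRIANGLES index lists at two levels of detail for a hypothetical n x n grid of vertices. Only the index list changes between levels; each list would be stored in its own ELEMENT_ARRAY_BUFFER while the vertex ARRAY_BUFFER stays bound. The grid layout and the function name are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical LOD sketch: an n x n grid of vertices stored row by row in
   one ARRAY_BUFFER. step = 1 uses every vertex; step = 2 skips every other
   row and column of the SAME vertices (n odd gives full coverage).
   step must be at least 1. Writes up to cap indices into out and returns
   the number of indices the level needs. */
static size_t grid_lod_indices(unsigned n, unsigned step, unsigned *out, size_t cap)
{
    size_t count = 0;
    for (unsigned row = 0; row + step < n; row += step) {
        for (unsigned col = 0; col + step < n; col += step) {
            unsigned a = row * n + col;   /* top-left     */
            unsigned b = a + step;        /* top-right    */
            unsigned c = a + step * n;    /* bottom-left  */
            unsigned d = c + step;        /* bottom-right */
            unsigned quad[6] = { a, c, b, b, c, d };  /* two triangles */
            for (int k = 0; k < 6; k++) {
                if (count < cap)
                    out[count] = quad[k];
                count++;
            }
        }
    }
    return count;
}
```

For a 5 x 5 grid the full level needs 96 indices and the coarse level 24, yet both reference vertex indices 0 through 24 of the same array.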
New procedures, functions and tokens
Usage flags
- STREAM_DRAW
- STREAM_READ
- STREAM_COPY
- STATIC_DRAW
- STATIC_READ
- STATIC_COPY
- DYNAMIC_DRAW
- DYNAMIC_READ
- DYNAMIC_COPY
Access flags
- READ_ONLY
- WRITE_ONLY
- READ_WRITE
Targets
- ARRAY_BUFFER
- ELEMENT_ARRAY_BUFFER
void BindBuffer (enum target, uint buffer);
The BindBuffer function binds the buffer object with the given ID to
the target, making it the buffer in use. An ID of zero switches off the
use of buffer objects for that target.
void *MapBuffer (enum target, enum access);
boolean UnmapBuffer (enum target);
The function MapBuffer provides a pointer corresponding to the mapped
area of the current buffer object. UnmapBuffer releases the mapping.
void BufferData (enum target, sizeiptr size, const void *data, enum usage);
The BufferData function can be used in two ways:
- To set the amount of memory and the usage for the current buffer
object, with data set to NULL. The user can map the buffer later to
fill in its data.
- To allocate memory, set the usage, and copy data in one call;
typically used with a static memory model.
void BufferSubData (enum target, intptr offset, sizeiptr size, const void *data);
The BufferSubData function copies data into a specific range inside the
buffer object.
void GetBufferSubData (enum target, intptr offset, sizeiptr size, void *data);
The GetBufferSubData function retrieves data from a specific range in
the current buffer object.
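The copy semantics of BufferData, BufferSubData and GetBufferSubData can be summarized with a CPU-only model. The sketch below is not OpenGL and not how a driver stores data; `struct model_buffer` and its functions are hypothetical stand-ins that mirror the documented behavior: BufferData replaces the data store and copies from data only when it is non-NULL, and the sub-data calls copy a range in or out, failing when the range falls outside the store (OpenGL reports GL_INVALID_VALUE in that case).

```c
#include <stdlib.h>
#include <string.h>

/* CPU-only model of a buffer object's data store (illustration only). */
struct model_buffer {
    unsigned char *store;
    size_t size;
};

/* Like BufferData: discard any previous store and allocate a new one.
   If data is NULL the contents are left undefined (uninitialized). */
static int model_buffer_data(struct model_buffer *b, size_t size, const void *data)
{
    unsigned char *p = malloc(size > 0 ? size : 1);
    if (p == NULL)
        return 0;
    free(b->store);
    b->store = p;
    b->size = size;
    if (data != NULL)
        memcpy(b->store, data, size);
    return 1;
}

/* True when [offset, offset + size) lies inside the store. */
static int range_ok(const struct model_buffer *b, size_t offset, size_t size)
{
    return offset <= b->size && size <= b->size - offset;
}

/* Like BufferSubData: copy size bytes from data into the store at offset. */
static int model_buffer_sub_data(struct model_buffer *b, size_t offset,
                                 size_t size, const void *data)
{
    if (!range_ok(b, offset, size))
        return 0;
    memcpy(b->store + offset, data, size);
    return 1;
}

/* Like GetBufferSubData: copy size bytes from the store at offset into data. */
static int model_get_buffer_sub_data(const struct model_buffer *b, size_t offset,
                                     size_t size, void *data)
{
    if (!range_ok(b, offset, size))
        return 0;
    memcpy(data, b->store + offset, size);
    return 1;
}
```

This mirrors the two BufferData styles described above: allocate with NULL and fill later with BufferSubData, or allocate and copy in a single call.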
void DeleteBuffers (sizei n, const uint *buffers);
void GenBuffers (sizei n, uint *buffers);
boolean IsBuffer (uint buffer);
These three functions manage buffer object identifiers much as the
corresponding functions do for display lists and textures: they free,
allocate or query identifiers for buffer objects.
void GetBufferParameteriv (enum target, enum pname, int *params);
The GetBufferParameteriv function returns various parameters
concerning the current buffer object. pname can be:
- BUFFER_SIZE: Returns the size of the buffer object.
- BUFFER_USAGE: Returns the usage of the buffer object.
- BUFFER_ACCESS: Returns the access flag of the buffer object.
- BUFFER_MAPPED: Indicates if this buffer is mapped.
void GetBufferPointerv (enum target, enum pname, void **params);
The GetBufferPointerv function returns the actual pointer of the buffer
if it has been mapped (MapBuffer). The only accepted pname is currently
BUFFER_MAP_POINTER.
Tokens for Get{Boolean, Integer, Float, Double}v
The buffer object ID zero is reserved, and when buffer object zero
is bound to a given target, the commands affected by that buffer binding
behave as they do without buffer objects. When a nonzero buffer ID is
bound, the pointer argument represents an offset into that buffer object
and is handled by VBO management.
These tokens show which buffers are bound as VBO offsets:
- ARRAY_BUFFER_BINDING
- ELEMENT_ARRAY_BUFFER_BINDING
- VERTEX_ARRAY_BUFFER_BINDING
- NORMAL_ARRAY_BUFFER_BINDING
- COLOR_ARRAY_BUFFER_BINDING
- INDEX_ARRAY_BUFFER_BINDING
- TEXTURE_COORD_ARRAY_BUFFER_BINDING
- EDGE_FLAG_ARRAY_BUFFER_BINDING
- SECONDARY_COLOR_ARRAY_BUFFER_BINDING
- FOG_COORDINATE_ARRAY_BUFFER_BINDING
- WEIGHT_ARRAY_BUFFER_BINDING
Token for GetVertexAttribiv:
When working with VBOs and vertex programs, some attributes can have
arbitrary meanings. An array of normals, for example, can be used to
store other information. Instead of using a token from the previous
section, the index of the attribute can be used. With this token, the
user can query which buffer object supplies the data for the generic
vertex attribute at a given index.
- VERTEX_ATTRIB_ARRAY_BUFFER_BINDING
Purposes of various VBO functions
glBufferData()
This function is an abstraction layer between the memory and the application.
Behind each buffer object is a complex memory management system. The
glBufferData() function looks at the requested size and usage of the data
store, reserves storage, and optionally initializes the data from the
user’s pointer. If storage space was previously allocated for
this buffer, an individual implementation may choose to either reuse
the previous storage or discard the current storage and allocate a new
storage. If the data pointer specified is not NULL, the storage for
the buffer is initialized with size machine units (typically
bytes) from the data pointer. For specifics on when memory associated
with the buffer is freed instead of resized, please consult documentation
from individual GPU vendors.
Usage flags
The usage argument is a key value for helping the VBO memory manager
fully optimize buffers. While these values are only hints, and they
can be ignored by the implementation, applications are strongly encouraged
to provide correct usage flags. Additional implementation-specific information
on the interpretation of hints may be available from GPU vendors.
| Name of flag | Definition |
|---|---|
| STATIC_... | Assumed to be a 1-to-n update-to-draw. Means the data is specified once, or possibly very rarely. |
| DYNAMIC_... | Assumed to be an n-to-n update-to-draw. Generally, it means data that is updated frequently, but is drawn multiple times per update, such as any dynamic data that is updated every few frames or so. |
| STREAM_... | Assumed to be a 1-to-1 update-to-draw. Can be thought of as data that is updated about once each time it’s drawn. STREAM is like DYNAMIC: data will be changed over time and is expected to change frequently. |
| ..._READ_... | Means there must be easy access to read the data. This option is typically not meaningful for VBOs by themselves. |
| ..._COPY_... | Means _READ_ and _DRAW_ operations will be used on this buffer. This option is typically not meaningful for VBOs by themselves. |
| ..._DRAW_... | Means the buffer will be used for sending data to the GPU. |

Table 1: List of usage flags
These usage hints can help an implementation’s memory manager balance
between different kinds of memory, such as system, uncached and video
memory. Since different categories of memory have different access
characteristics for the CPU and GPU, these usage hints allow the proper
selection to occur. On the client side, these are not hard restrictions,
but suggestions that help graphics drivers decide where to store the
data and how to manage it. Nothing prevents creating a STATIC data store
and then updating it every frame. Nor is there any reason the user can’t
create a STREAM data store that is never modified, although such usage
patterns in conflict with supplied hints are strongly discouraged.
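As a rough illustration of how Table 1 maps onto an application’s update pattern, the sketch below suggests a *_DRAW hint from the expected number of updates and draws over a buffer’s lifetime. The enum names, the helper, and the threshold of 100 draws per update are all assumptions for this example; a real program would pass GL_STATIC_DRAW, GL_DYNAMIC_DRAW or GL_STREAM_DRAW to glBufferData().

```c
/* Hypothetical names for the three *_DRAW hints in Table 1. */
enum draw_usage { USAGE_STATIC_DRAW, USAGE_DYNAMIC_DRAW, USAGE_STREAM_DRAW };

/* Suggest a hint from expected updates and draws. The 100-draws-per-update
   cutoff for "very rarely updated" is an arbitrary assumption. */
static enum draw_usage suggest_draw_usage(unsigned long updates, unsigned long draws)
{
    if (updates <= 1)
        return USAGE_STATIC_DRAW;   /* specified once: 1-to-n */
    if (draws <= updates)
        return USAGE_STREAM_DRAW;   /* about one draw per update: 1-to-1 */
    if (draws / updates >= 100)
        return USAGE_STATIC_DRAW;   /* updated only very rarely */
    return USAGE_DYNAMIC_DRAW;      /* frequent updates, several draws each */
}
```

A buffer rewritten every frame and drawn once falls into STREAM, while one rewritten every few frames and drawn in between falls into DYNAMIC.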
glBufferSubData()
This function gives the user a way to replace a range of data in an
existing buffer. It works much in the same way as glTexSubImage() does
for textures. Because rendering operations already issued must still
see the old data, an implementation may either wait for those operations
to complete (interlock) or queue the update.
glBindBuffer()
This sets the current buffer object. All subsequent calls to set array
pointers will refer to this object, and all updates will occur to this
buffer. Binding the special buffer name zero tells the driver not
to use buffer objects.
glMapBuffer()
This function maps the buffer object into the client’s memory,
if it is possible. The pointer returned can be both read from and written
to directly by the CPU, allowing arbitrary updates. To maintain the
proper OpenGL semantics, where operations always appear to occur in
order, the implementation may be required to either stall or make a
copy of the buffer to allow the mapping to occur, if the buffer is still
in use by the GPU. When the buffer cannot be mapped, the implementation
will return a NULL pointer.
glUnmapBuffer()
This function unmaps the buffer object from the client’s memory.
It returns a success code that the application should check to ensure
the update occurred correctly. When a failure is reported, the contents
of the buffer may have become undefined due to an extraordinary event
occurring while the buffer was mapped. In this case, the data should
be resubmitted by the application.
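The check-and-resubmit pattern for glUnmapBuffer() can be sketched without a GL context by routing map and unmap through function pointers. `struct buffer_ops`, `upload_with_retry` and the `fake_buffer` test double are hypothetical; in a real program map would wrap glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY) and unmap would wrap glUnmapBuffer(GL_ARRAY_BUFFER). The retry limit is an assumed policy, not something OpenGL requires.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for glMapBuffer()/glUnmapBuffer(): map() returns
   NULL if the buffer cannot be mapped; unmap() returns 0 if the contents
   became undefined while the buffer was mapped. */
struct buffer_ops {
    void *(*map)(void *ctx);
    int (*unmap)(void *ctx);
    void *ctx;
};

/* Write size bytes into the buffer, resubmitting when unmap() reports a
   failure. Returns the number of attempts used, or 0 if mapping failed or
   all max_attempts attempts were lost (assumed policy). */
static int upload_with_retry(const struct buffer_ops *ops, const void *src,
                             size_t size, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        void *dst = ops->map(ops->ctx);
        if (dst == NULL)
            return 0;            /* caller could fall back to glBufferSubData() */
        memcpy(dst, src, size);
        if (ops->unmap(ops->ctx))
            return attempt;      /* update succeeded */
        /* contents undefined: map again and rewrite the data */
    }
    return 0;
}

/* Test double: an in-memory "buffer" whose unmap() fails
   failures_left times, wiping its contents each time. */
struct fake_buffer {
    unsigned char data[64];
    int failures_left;
};

static void *fake_map(void *ctx)
{
    struct fake_buffer *fb = ctx;
    return fb->data;
}

static int fake_unmap(void *ctx)
{
    struct fake_buffer *fb = ctx;
    if (fb->failures_left > 0) {
        fb->failures_left--;
        memset(fb->data, 0, sizeof fb->data);
        return 0;
    }
    return 1;
}
```

With one simulated failure, the upload succeeds on the second attempt and the buffer holds the resubmitted data.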
glVertexPointer()
When a buffer object is bound, the pointer argument of this function is
interpreted as an offset into the current buffer object rather than as
a pointer into client memory.
Suggestions for efficient VBO usage
Keep in mind that the driver cannot guess what the application will do
with the memory pointer returned by glMapBuffer(). Will a few bytes be
changed, or will the whole buffer be updated? The pointer returned by
glMapBuffer() refers to the actual location of the data. The GPU could
still be working with this data, so requesting it for an update will
force the driver to wait for the GPU to finish its task.
To solve this conflict, glBufferData() can be called with a NULL pointer
to discard the previous buffer contents, or the glBufferSubData()
function can be used instead to specify the exact subregion. Calling
glBufferData() with a NULL pointer before glMapBuffer() tells the driver
that the previous data is no longer valid. As a consequence, if the GPU
is still working on the data, there will not be a conflict, and the
implementation may allocate a new buffer. glMapBuffer() may then return
a pointer to this new storage, which can be written while the GPU is
working on the previous set of data. In the glBufferSubData() case, the
data must be updated in a contiguous block. No reading of the data
is allowed, so the implementation may be able to queue the update.
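The stall-versus-discard trade-off can be shown with a toy model in which the GPU draw issued each frame is still pending when the next frame’s update arrives. This is a simplified illustration, not driver code: `struct toy_buffer` and its functions are hypothetical, and a real implementation decides for itself whether to allocate new storage.

```c
/* Toy model: each frame the CPU rewrites the buffer, then the GPU draws
   from it; the draw is still queued when the next update arrives. */
struct toy_buffer {
    int gpu_busy;     /* current storage is referenced by queued GPU work */
    int allocations;  /* fresh storages obtained by discarding old data */
    int stalls;       /* times the CPU waited for the GPU before writing */
};

/* discard_first != 0 models calling glBufferData(..., NULL, ...) before
   mapping; otherwise glMapBuffer() is called on the old storage directly. */
static void toy_update(struct toy_buffer *b, int discard_first)
{
    if (b->gpu_busy) {
        if (discard_first)
            b->allocations++;  /* old storage is released once the GPU is done */
        else
            b->stalls++;       /* must wait for the GPU to finish with it */
        b->gpu_busy = 0;       /* CPU now writes storage the GPU is not using */
    }
}

static void toy_draw(struct toy_buffer *b)
{
    b->gpu_busy = 1;
}

static struct toy_buffer toy_run(int frames, int discard_first)
{
    struct toy_buffer b = { 0, 0, 0 };
    for (int f = 0; f < frames; f++) {
        toy_update(&b, discard_first);
        toy_draw(&b);
    }
    return b;
}
```

Over 10 frames, mapping the busy buffer directly stalls 9 times, while discarding first trades those stalls for 9 fresh allocations.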
While vertex buffer objects offer great potential in the efficiency
of providing data to the GPU, they are often highly inefficient when
coupled with operations that require CPU processing. As a result,
feedback and selection may not perform well when combined with vertex
buffer objects. Additionally, building display lists from data in
a vertex buffer object or using glArrayElement() with vertex
buffer objects will typically be highly inefficient.
- Utilize GPU-friendly data types and alignment
With vertex buffer objects, it is now the job of the GPU to directly
interpret the data, whereas the CPU could previously reformat it as
needed during submission. If the data format that is placed in a vertex
buffer object cannot be directly handled by the GPU, the implementation
may have to read the data back to the CPU for processing, which is
often highly inefficient. It is best to check with GPU vendors for
the full list of optimal formats, but most common data types are presently
supported, as long as the attribute is aligned on a 32-bit boundary.
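A quick way to apply the 32-bit alignment guideline is to check both the attribute’s byte offset and the stride between vertices. The helper and the two example vertex structs below are hypothetical; the stride argument is the real byte distance between vertices (not OpenGL’s 0 shorthand for tightly packed data), and the sizes in the comments assume the usual 4-byte float and 2-byte short.

```c
#include <stddef.h>

/* Both the attribute's byte offset in the buffer and the byte distance
   between consecutive vertices should be multiples of 4. */
static int attribute_is_4byte_aligned(size_t offset, size_t stride)
{
    return offset % 4 == 0 && stride % 4 == 0;
}

struct packed_color_vertex {     /* 12 + 4 = 16 bytes: aligned */
    float position[3];
    unsigned char color[4];
};

struct short_position_vertex {   /* 6 bytes: stride not a multiple of 4 */
    short position[3];
};
```

Padding the short positions to four components (8 bytes) would be one way to bring such a layout back onto 4-byte boundaries.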
- Use the first argument of glDrawArrays
In the function:
glDrawArrays (GLenum mode, GLint first, GLsizei count);
instead of changing glVertexPointer() to a specific offset and leaving
first at zero, it can be more efficient to change the first argument
of glDrawArrays.
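Both approaches read the same bytes, which the sketch below makes explicit: for a tightly packed attribute, vertex k of a draw is fetched from pointer_offset + (first + k) * element_size. Changing first only alters a draw argument and leaves the array pointer state untouched. The helper name is hypothetical.

```c
#include <stddef.h>

/* Byte address (relative to the buffer start) read for vertex k of a
   glDrawArrays() call, for a tightly packed attribute of elem_size bytes. */
static size_t vertex_byte_address(size_t pointer_offset, size_t elem_size,
                                  int first, int k)
{
    return pointer_offset + ((size_t)first + (size_t)k) * elem_size;
}
```

For 12-byte positions, a pointer offset of 1,200 with first = 0 and a pointer offset of 0 with first = 100 both fetch vertex 5 of the draw from byte 1,260.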
- Use glDrawRangeElements instead of glDrawElements
Using range elements is more efficient for two reasons:
- If the specified range can fit into a 16-bit integer, the driver
can optimize the format of the indices passed to the GPU, turning
a 32-bit integer format into a 16-bit integer format. This halves the
amount of index data transferred, which can noticeably improve
performance when index transfer is a bottleneck.
- The range is valuable information for the VBO manager, which can
use it to optimize its internal memory configuration.
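The range and the 16-bit conversion can be sketched as follows. `index_range` computes the start and end values passed to glDrawRangeElements(), and `rebase_to_16bit` shows the kind of conversion a driver may perform when end - start fits in 16 bits. Both functions are hypothetical; if an application rebased indices itself, it would also have to advance its vertex pointers by start vertices, since glDrawRangeElements() does not add start to the indices.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest and largest index in idx[0..count-1] (count must be > 0):
   the start and end arguments of glDrawRangeElements(). */
static void index_range(const uint32_t *idx, size_t count,
                        uint32_t *start, uint32_t *end)
{
    uint32_t lo = UINT32_MAX, hi = 0;
    for (size_t i = 0; i < count; i++) {
        if (idx[i] < lo) lo = idx[i];
        if (idx[i] > hi) hi = idx[i];
    }
    *start = lo;
    *end = hi;
}

/* If end - start fits in 16 bits, store idx[i] - start in out[] (half the
   bytes of the 32-bit list), report start in *base, and return 1. */
static int rebase_to_16bit(const uint32_t *idx, size_t count,
                           uint16_t *out, uint32_t *base)
{
    uint32_t start, end;
    if (count == 0)
        return 0;
    index_range(idx, count, &start, &end);
    if (end - start > UINT16_MAX)
        return 0;                          /* range too wide for 16 bits */
    for (size_t i = 0; i < count; i++)
        out[i] = (uint16_t)(idx[i] - start);
    *base = start;
    return 1;
}
```

Indices such as 70000 to 70003 cannot be stored in 16 bits directly, but their range of 3 can, which is exactly the information the start and end arguments provide.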
Implementing VBOs within the SPECviewperf test harness
As mentioned in the introduction, the goal of SPECviewperf is to test
graphics hardware by delivering OpenGL command streams taken from real
applications. As a performance evaluation tool, SPECviewperf has to
be able to use VBOs in a wide variety of ways to reflect application
usage.
Many of the applications represented by the current SPECviewperf
viewsets would typically use a static data model for VBOs, where the
data is defined once and drawn many times. Because of this,
the GL_STATIC_DRAW usage hint is the default. As applications
adopt and use VBOs, SPECviewperf can easily accommodate different usage
patterns.
The current implementation of VBOs within SPECviewperf fills each buffer
once, when it is created, and does not update or read back buffer
contents afterward, so the glMapBuffer and glUnmapBuffer calls are not
made. As VBOs become adopted and implemented within applications, it is
expected that SPECviewperf will be modified accordingly.
Here are the key places where VBOs are implemented within SPECviewperf:
Step 1 - Create pointers, allocate memory, generate buffer handles, and define buffer attributes (viewperf.c):
……
mode.useVertexBufferObjects = 0;
mode.vboUsageMode = GL_STATIC_DRAW;   /* default usage hint */
mode.vboMaxSize = 0;
mode.vboMaxPrims = 0;
……
GLuint currVboID;
/* Sequential layout: all positions, then all colors, normals and
   texture coordinates, each array starting where the previous one ends. */
int vtxSize = numVertsInVBO * sizeof(struct vector);
int colSize = numVertsInVBO * sizeof(struct colorvector);
int nmlSize = numVertsInVBO * sizeof(struct vector);
int texSize = numVertsInVBO * sizeof(struct texvector);
int vtxOffs = 0;
int colOffs = vtxOffs + vtxSize;
int nmlOffs = colOffs + colSize;
int texOffs = nmlOffs + nmlSize;
int vboSize = texOffs + texSize;
int vtxDelta = prevVertexPointer - vertexData;

/* Grow the list of buffer names so every buffer can be deleted on exit. */
if (numVertexBufferObjects >= allocVertexBufferObjects) {
    allocVertexBufferObjects = 2 * allocVertexBufferObjects + 16;
    vertexBufferObjects = realloc(vertexBufferObjects,
                                  allocVertexBufferObjects * sizeof(GLuint));
    if (!vertexBufferObjects) {
        printf("Error: could not allocate memory for vertexBufferObjects\n");
        exit(EXIT_FAILURE);
    }
}

/* Create the buffer, reserve storage with the usage hint, then copy
   each array into its own region. */
glGenBuffers(1, &currVboID);
vertexBufferObjects[numVertexBufferObjects++] = currVboID;
glBindBuffer(GL_ARRAY_BUFFER, currVboID);
glBufferData(GL_ARRAY_BUFFER, vboSize, NULL, mode.vboUsageMode);
glBufferSubData(GL_ARRAY_BUFFER, vtxOffs, vtxSize, prevVertexPointer);
glBufferSubData(GL_ARRAY_BUFFER, colOffs, colSize, prevColorPointer);
glBufferSubData(GL_ARRAY_BUFFER, nmlOffs, nmlSize, prevNormalPointer);
glBufferSubData(GL_ARRAY_BUFFER, texOffs, texSize, prevTexturePointer);

/* Record, for each data block, which buffer holds its data and where. */
for (i = prevdb; i <= db; i++) {
    pDataBlock[i].vertexIndex -= vtxDelta;
    pDataBlock[i].vertexBufferID = currVboID;
    pDataBlock[i].texCoordOffset = texOffs;
    pDataBlock[i].normalOffset = nmlOffs;
    pDataBlock[i].colorOffset = colOffs;
    pDataBlock[i].vertexOffset = vtxOffs;
}
pevent->rb->vertexBufferObjects = vertexBufferObjects;
pevent->rb->numVertexBufferObjects = numVertexBufferObjects;
……
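The offset arithmetic in Step 1 can be checked with a worked example. The element structs below are assumptions inferred from the component counts in Step 2 (3-float vertices and normals, 4-float colors, 2-float texture coordinates); the real SPECviewperf definitions may differ, and `sequential_layout` is a hypothetical helper. With 1,000 vertices per buffer, colors start at byte 12,000, normals at 28,000 and texture coordinates at 40,000, for a 48,000-byte buffer.

```c
#include <stddef.h>

/* Assumed element types (inferred from Step 2, may differ from viewperf). */
struct vector { float x, y, z; };
struct colorvector { float r, g, b, a; };
struct texvector { float s, t; };

struct vbo_layout { size_t vtxOffs, colOffs, nmlOffs, texOffs, vboSize; };

/* Same arithmetic as Step 1: the four arrays stored back to back. */
static struct vbo_layout sequential_layout(size_t numVertsInVBO)
{
    struct vbo_layout l;
    size_t vtxSize = numVertsInVBO * sizeof(struct vector);
    size_t colSize = numVertsInVBO * sizeof(struct colorvector);
    size_t nmlSize = numVertsInVBO * sizeof(struct vector);
    size_t texSize = numVertsInVBO * sizeof(struct texvector);
    l.vtxOffs = 0;
    l.colOffs = l.vtxOffs + vtxSize;
    l.nmlOffs = l.colOffs + colSize;
    l.texOffs = l.nmlOffs + nmlSize;
    l.vboSize = l.texOffs + texSize;
    return l;
}
```

Because every element size is a multiple of 4 bytes, each array in this layout also starts on a 32-bit boundary.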
Step 2 - Within the draw loop, bind the current buffer and set the appropriate pointers (viewperf.c):
……
/* Rebind and re-point the arrays only when this data block lives in a
   different buffer from the previous one. */
if (pDb->vertexBufferID != currVertexBufferID) {
    glBindBuffer(GL_ARRAY_BUFFER, pDb->vertexBufferID);
    /* With a buffer bound, the last argument is a byte offset into it. */
    if (pDb->colorOffset >= 0) {
        glColorPointer(4, GL_FLOAT, 0, (const GLvoid *) pDb->colorOffset);
    }
    if (pDb->normalOffset >= 0) {
        glNormalPointer(GL_FLOAT, 0, (const GLvoid *) pDb->normalOffset);
    }
    if (pDb->texCoordOffset >= 0) {
        glTexCoordPointer(2, GL_FLOAT, 0, (const GLvoid *) pDb->texCoordOffset);
    }
    if (pDb->vertexOffset >= 0) {
        glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *) pDb->vertexOffset);
    }
    currVertexBufferID = pDb->vertexBufferID;
}
(void) pPrimitiveLoop(tb, pDb, pDb + 1);
pDb++;
……
Step 3 - Upon exit, free VBO buffers (viewperf.c):
glBindBuffer(GL_ARRAY_BUFFER, 0);
glDeleteBuffers(renderblock.numVertexBufferObjects, renderblock.vertexBufferObjects);
free(renderblock.vertexBufferObjects);
renderblock.numVertexBufferObjects = 0;
renderblock.vertexBufferObjects = NULL;
Preparing for the future
SPEC’s OpenGL Performance Characterization (SPECopc) project
group, the developer of SPECviewperf, expects VBOs to be an integral
part of the rendering path for future graphics-intensive applications.
VBOs have been added to SPECviewperf 8.1 to enable users and vendors
to begin testing performance for graphics applications that will potentially
use VBOs. No performance results using VBOs will be published on the
SPEC/GPC web site until VBOs become a part of applications represented
by viewsets within SPECviewperf.
This document was written by Ian Williams of NVIDIA (SPECopc chair)
and Evan Hart of ATI.