The PAX 2D Rendering Engine
Selected papers from: IBM RISC System/6000 Technology: Volume II, Sept. 23, 1993
SA23-2619-00

Charles R. Johns and Taggart Robertson
   The POWER GXT100 and GXT150 are 2D graphics adapters based on the Pixel Accelerator for X (PAX) 2D rendering engine and the IBM RGB 530 Palette Digital to Analog Converter (Palette–DAC). The PAX engine draws or renders the graphical data sent from the 601 processor complex and the RGB 530 displays the data on the screen.
   The GXT100 has enough memory to support a 1024 x 768 screen. The GXT150 has additional memory to support up to a 1280 x 1024 screen and to utilize the multiple color palettes in the IBM RGB 530 Palette–DAC. Both the GXT100 and the GXT150 are 8–bit adapters that require AIXwindows Environment/6000 Version 1.2.5 2D and AIX Version 3.2.5.

Design Goals and Considerations
   The PAX 2D rendering engine is designed primarily for entry workstations. The architecture is dedicated to accelerating X Window applications. The architecture of a graphics subsystem can influence X Window performance and time to market in several areas. The PAX architecture targeted three areas: efficient interfaces, a simple programming model, and rendering speed.

Graphics Subsystem Organization
   The POWER GXT100 and GXT150 consist of the PAX 2D rendering engine, the IBM RGB 530 Palette–DAC, 1M to 3M bytes of frame buffer memory, and the initialization Read Only Memory (ROM). Figure 1 shows the complete block diagram and the major interfaces of the graphics subsystem. The PAX chip serves as the graphics accelerator as well as the system interface for the adapter. It contains the control for accessing the initialization ROM, the Palette–DAC control port, and the video RAM (VRAM) parallel port.

Figure 1 Graphics Subsystem Block Diagram

   PAX attaches directly to the PowerPC 601 Microprocessor bus. Attaching directly to the PowerPC 601 bus improves performance by avoiding the latency and synchronization overhead created when converting from one bus to another. The 601 bus is a split address and data bus with 32 bits of address and 64 bits of data [1]. The processor uses byte, half–word, or word operations to access the graphics subsystem. Based on the address of the operation, PAX directs the access to one of four locations: the internal registers, the frame buffer (i.e., the VRAM), the RGB 530 Palette–DAC, or the initialization ROM.

Figure 2 PAX Block Diagram

   The frame buffer is constructed of specialized memory called video RAM (VRAM). These memory devices contain key features for increasing graphics performance, including:
 • Serial output port
 • Block Write
 • Write per bit

   The serial output port is a dedicated high speed interface used by the Palette–DAC for scanning pixels out of the frame buffer VRAM. This interface eliminates the overhead of refreshing the screen from the parallel port which is often required with more conventional DRAM interfaces.

   Block Write is a unique feature of VRAM which allows a constant color to be written to multiple locations within a single write cycle. This feature allows PAX to render up to 32 pixels in a single write cycle.

   The Write per bit feature provides a write mask which the rendering engine uses to select which bits of the pixel are to be updated. This feature eliminates costly read / modify / write operations.
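   The effect of the write mask can be modeled in software as a single masked merge. The following is a minimal sketch, assuming an 8–bit pixel; the function name `masked_write` is illustrative and does not correspond to a PAX register or command.

```python
def masked_write(old: int, new: int, mask: int) -> int:
    """Model one write-per-bit cycle on an 8-bit pixel.

    Bits set in `mask` are taken from `new`; every other bit keeps the
    value already stored in the frame buffer (`old`). In hardware the
    VRAM performs this merge itself, so no read cycle is needed.
    """
    return (old & ~mask & 0xFF) | (new & mask & 0xFF)


# Update only the low four bit planes of a pixel.
result = masked_write(old=0b10101010, new=0b01010101, mask=0b00001111)
```

   Without the mask, software would have to read the pixel, merge the bits, and write the result back, which is the read / modify / write sequence the feature avoids.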

   PAX’s 32–bit frame buffer architecture exploits these VRAM features and employs advanced interleaving techniques, referred to as Pixel Interleaving [2] and Load Clock Interleaving, to enhance performance. The architecture supports 1M to 5M bytes of memory which provides the capability to support screen resolutions ranging from 1024 x 768 to 1280 x 1024. With 2M bytes or more of memory, PAX can support a double buffer display.

   The IBM RGB 530 Palette–DAC serves as the video controller. It provides four color palettes, video output (red, green, blue), display timings, VRAM serial port control, and two hardware cursors. This device also provides an on–chip programmable Phase Locked Loop (PLL) which allows the graphics subsystem to support a wide range of monitors with varying timing requirements.

   As the block diagram and descriptions illustrate, most of the logic required for the graphics subsystem is integrated into the PAX and the IBM RGB 530 custom chips. This level of integration is key to keeping the cost of the subsystem to a minimum while enhancing the performance and function. 

Programming Models
The PAX chip supports two different programming models: Direct Frame Buffer Access (DFA) and “Poly” commands.

   Direct Frame Buffer Access allows the Frame Buffer to be accessed by the PowerPC 601 processor as if it were part of system memory. This simplified interface is very effective in reducing the X Window software development time. X Window software development uses DFA to directly map the X Window System code received from the Massachusetts Institute of Technology (MIT) into the initial device driver for the adapter. Then, X Window software development optimizes the device driver to exploit the hardware accelerated functions. DFA is also key to the performance of certain X Window commands such as points, circles, and complex area fills because there is no additional hardware acceleration provided for these primitives.

   The “Poly” command interface provides a rich set of rendering instructions which complement the DFA programming model. The word “Poly” refers to the ability to render multiple primitives of the same type with only one command, which eliminates the overhead of sending a new command with each primitive. The command set is designed to map closely to the X Window protocol. Accelerated functions using the “Poly” command interface are significantly faster than DFA. Some of these accelerated functions include line draw, area fill, and bit block transfer.

Rendering Engine Architecture
The PAX rendering engine architecture includes rendering functions, rendering attributes, and 3D Application Programming Interface (API) assist functions.

Rendering Functions
The PAX architecture consists of several processing units for accelerating X Window System rendering commands. These processing units are used to accelerate:
 • Line Draw
 • Points
 • Area Fill
 • Bit Block Transfer

See Figure 2 for a block diagram of the internal architecture. These processing units provide the X Window server (X server) with dedicated hardware to accelerate the rendering of lines, points, and area fills. In addition to rendering, PAX also provides assistance for moving pixels to and from system memory or to and from another screen location. This function is referred to as Bit Block Transfer (Blits).

   Vertices define the boundary of the region to be rendered. These vertices are included in “Poly” commands. All vertices for these drawing commands can be sent as 16–bit, two’s complement, window relative coordinates. This 16–bit vertex provides the X server with a 64 K x 64 K virtual screen. The origin of this virtual screen space is at the center, thus allowing for both positive and negative X and Y addresses.
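   The range of a 16–bit two’s complement coordinate (-32768 to 32767) can be illustrated with a short decoding sketch. The packing of X into the upper half and Y into the lower half of a 32–bit word is an assumption made for illustration only; the paper does not give the vertex layout.

```python
from typing import Tuple


def decode_coord(raw: int) -> int:
    """Interpret the low 16 bits of `raw` as a two's complement integer."""
    raw &= 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw


def decode_vertex(word: int) -> Tuple[int, int]:
    """Split a 32-bit word into window relative (x, y).

    Assumed layout (hypothetical): X in bits 31..16, Y in bits 15..0.
    """
    return decode_coord(word >> 16), decode_coord(word)


vertex = decode_vertex(0xFFFE0003)  # x = -2, y = 3
```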

Line Draw 
   The line draw engine provides the rasterization of a line between two points provided by an application. A technique known as Bresenham’s line algorithm [3] is used to render the line. PAX supports two commands for rendering lines: Poly Line and Poly Segment.
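   For reference, the textbook integer form of Bresenham’s algorithm is sketched below. It is a software illustration only; the paper does not describe how the PAX line draw engine formulates the error term internally.

```python
from typing import List, Tuple


def bresenham(x0: int, y0: int, x1: int, y1: int) -> List[Tuple[int, int]]:
    """Return the pixels on the line from (x0, y0) to (x1, y1), inclusive.

    Uses only integer additions and comparisons and handles all octants.
    """
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1      # step direction in X
    sy = 1 if y0 < y1 else -1      # step direction in Y
    err = dx + dy                  # running error term
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:               # error allows a step in X
            err += dy
            x0 += sx
        if e2 <= dx:               # error allows a step in Y
            err += dx
            y0 += sy
    return points
```

   Because each step needs only an add and a compare, the algorithm maps naturally onto dedicated hardware.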

   Every vertex after the first sent with the Poly Line command defines a new line that starts at the previous vertex. This is useful for drawing connected lines. The Poly Segment command requires two vertices to define a line. This command is used to render multiple non–connecting lines.
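   The difference between the two commands lies in how a vertex list is grouped into lines. The sketch below shows that grouping only; the function names are illustrative and are not PAX command encodings.

```python
def poly_line(vertices):
    """Connected lines: each vertex after the first ends one line,
    and that same vertex starts the next line."""
    return list(zip(vertices, vertices[1:]))


def poly_segment(vertices):
    """Independent lines: vertices are consumed two at a time."""
    return list(zip(vertices[0::2], vertices[1::2]))


verts = [(0, 0), (4, 0), (4, 3), (0, 3)]
connected = poly_line(verts)      # 3 lines from 4 vertices
separate = poly_segment(verts)    # 2 lines from 4 vertices
```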

   The X Window protocol supports styled lines. These are referred to as OnOffDashed and DoubleDashed lines [4]. An OnOffDashed line appears as a line with sections, or dashes, rendered in the foreground color and sections not rendered. DoubleDashed lines appear as a line with sections rendered in the foreground color and sections rendered in the background color.

   PAX supports a set of dash counters which allow the server to define a line style with up to 8 unique segments or dashes. The X server selects between OnOffDashed and DoubleDashed by enabling or disabling transparent rendering.
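   The interaction between the dash list and transparent rendering can be sketched as follows. This is a software model under stated assumptions: the dash list holds positive pixel counts, the first segment is drawn in the foreground color, and `None` stands for a pixel that is left untouched. None of these names come from the PAX register set.

```python
def dash_colors(n_pixels, dashes, fg, bg, transparent):
    """Return the color written for each pixel along a styled line.

    transparent=True  models OnOffDashed (gaps are not rendered).
    transparent=False models DoubleDashed (gaps use the background color).
    """
    if not 1 <= len(dashes) <= 8 or any(d <= 0 for d in dashes):
        raise ValueError("expected 1 to 8 positive dash lengths")
    out = []
    on = True                       # first segment is a foreground dash
    index, remaining = 0, dashes[0]
    for _ in range(n_pixels):
        out.append(fg if on else (None if transparent else bg))
        remaining -= 1
        if remaining == 0:          # move to the next segment
            index = (index + 1) % len(dashes)
            remaining = dashes[index]
            on = not on
    return out


on_off = dash_colors(6, [2, 1], fg="F", bg="B", transparent=True)
double = dash_colors(6, [2, 1], fg="F", bg="B", transparent=False)
```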

   The major performance bottleneck for the line generation logic is the frame buffer. As mentioned earlier, PAX employs Pixel Interleaving and Load Clock Interleaving to reduce this bottleneck.

Points
   Points are rendered using either DFA or “Poly” point commands. Points are not accelerated since there is no significant processing required to generate them. Instead, points are provided a fast path through the hardware so that they are rendered into the frame buffer at the maximum frame buffer bandwidth.

Area Fill
   PAX supports four types of area fills: Spans, Triangles, Rectangles, and Quadrilaterals. A unique command is dedicated to each type of area fill operation to reduce the number of stores required. The Block Write VRAM feature is used to increase the fill rate for these objects. The Block Write feature allows PAX to write up to 32 pixels in one memory cycle, four times faster than normal writes.

   Spans are a continuous row of pixels. These are the basic area fill primitives. All other types of area fills are broken down into spans by the internal area fill logic.

   The triangle and quadrilateral fill logic uses the line logic and another simpler line generator to find the edges of the area. Since only two line generators (edge walkers) are available, PAX can only support quadrilaterals which are convex in the Y direction. See Figure 3 for examples of the supported and unsupported areas.

   Rectangles are a special case of a quadrilateral and are handled separately. Special casing rectangles provides additional performance since the overhead of walking the edges is eliminated. This extra performance is beneficial to window management and clearing areas on the screen.

   Quadrilaterals, rectangles, and triangles can be drawn in one of two modes: X Window compliant or Full Fill. The X Window mode draws the fill area such that when an object is connected to other objects along an edge, no pixel is written twice. This is accomplished by rendering pixels whose centers lie inside the area or on its left or top edge, but not pixels whose centers lie on the right or bottom edge. Full Fill mode draws all pixels including the edges.
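   For an axis aligned rectangle the two modes reduce to a choice between half open and closed coordinate ranges. The sketch below assumes, as in the X protocol, that integer coordinates coincide with pixel centers; the function name is illustrative.

```python
def rect_pixels(x0, y0, x1, y1, full_fill=False):
    """Pixels covered by the rectangle with corners (x0, y0) and (x1, y1).

    X Window mode: left and top edges are drawn, right and bottom are not.
    Full Fill mode: all four edges are drawn.
    """
    extra = 1 if full_fill else 0
    return [(x, y)
            for y in range(y0, y1 + extra)
            for x in range(x0, x1 + extra)]


# Two rectangles sharing the edge x = 2.
left = set(rect_pixels(0, 0, 2, 2))
right = set(rect_pixels(2, 0, 4, 2))
shared = left & right               # empty in X Window mode
```

   In X Window mode the shared edge belongs only to the right hand rectangle, so a Boolean operation such as XOR gives the same result regardless of how an area is tiled.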

Figure 3 Area Fill Examples

Bit Block Transfer
   Bit Block Transfers (Blits) are a hardware assist for the movement of blocks of pixel data. PAX supports three types of Blits: Screen to Screen, System to Screen, and Screen to System. They are used extensively by the X server to accelerate the movement of windows.

   Screen to Screen Blits are used to copy a block of pixels from one location on the screen to another. PAX automatically handles overlapping source and destination blocks so that the source block appears correctly at the destination.
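   The paper does not say how PAX resolves overlapping blocks. One common software approach, sketched here as an assumption, is to choose the copy direction so that no source pixel is overwritten before it has been read. The frame buffer is modeled as a list of rows.

```python
def blit(fb, sx, sy, dx, dy, w, h):
    """Copy a w x h block in `fb` from (sx, sy) to (dx, dy), in place.

    When the destination lies below or to the right of the source, the
    copy runs bottom-up or right-to-left so overlapping data survives.
    """
    rows = range(h - 1, -1, -1) if dy > sy else range(h)
    cols = range(w - 1, -1, -1) if dx > sx else range(w)
    for r in rows:
        for c in cols:
            fb[dy + r][dx + c] = fb[sy + r][sx + c]


fb = [[0, 1, 2, 3, 4, 5]]
blit(fb, sx=0, sy=0, dx=2, dy=0, w=4, h=1)   # shift right by two pixels
```

   A naive left-to-right copy of the same block would produce `[0, 1, 0, 1, 0, 1]`, because the first two pixels would be copied over pixels that are still needed as source data.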

   Screen to Screen Blits continuously switch between frame buffer reads and writes. These transitions drastically reduce the usable frame buffer bandwidth. PAX has an internal buffer to reduce the number of transitions which increases the utilization of the frame buffer’s bandwidth.

   System to Screen and Screen to System Blits are used to copy pixels between system memory and the frame buffer. There are two modes of operation for these commands: Direct and Indirect. In Direct mode, system software controls the pixel transfer from system memory and PAX only controls the frame buffer address. In Indirect mode, the PAX chip becomes a master of the PowerPC bus and completes the transfer of data with no processor intervention. The software in this case only supplies the source and destination addresses.

   PAX supports a special System to Screen Blit mode which accelerates character performance. When operating in this mode, each bit of the data sent is interpreted as a pixel. The hardware renders the foreground color for all the 1s in the data word. The background color is rendered for all the 0s in the data word if transparency is disabled and nothing is rendered if transparency is enabled. With transparency enabled, this function uses the Block Write VRAM feature.
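   The bit to pixel expansion can be modeled as follows. The sketch assumes the most significant bit maps to the leftmost pixel, which the paper does not specify; the function name and parameters are illustrative.

```python
def expand_bits(word, width, fg, bg, transparent, dest):
    """Expand the low `width` bits of `word` into pixels over `dest`.

    1 bits write the foreground color. 0 bits write the background
    color, or leave the destination untouched when `transparent` is set.
    Bit order (MSB = leftmost pixel) is an assumption.
    """
    out = list(dest)
    for i in range(width):
        bit = (word >> (width - 1 - i)) & 1
        if bit:
            out[i] = fg
        elif not transparent:
            out[i] = bg
    return out


glyph_row = 0b1010
over_text = expand_bits(glyph_row, 4, fg=7, bg=0, transparent=True, dest=[9, 9, 9, 9])
boxed = expand_bits(glyph_row, 4, fg=7, bg=0, transparent=False, dest=[9, 9, 9, 9])
```

   In the transparent case only one color is ever written, which is consistent with the use of Block Write described above.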

Rendering Attributes
   PAX also supports a variety of rendering attributes. These attributes can be applied to both the DFA and “Poly” command programming models. Some of these attributes are listed below.
 • Boolean operations
 • Window Management
 • Stipple

These attributes modify and control the rendering functions by modifying vertices and pixel generation. Different attributes apply at different stages in the rendering process.

Boolean Operations
   The X Window protocol specifies the rendered pixel to be a combination of the pixel’s current color (destination) and the source color. One of 16 logical functions can be selected, all of which are supported by the PAX architecture. Below is a list of the logical functions.

 • Clear (0)                 Set (1)
 • Destination (D) (NoOp)    Source (S) (Copy)
 • !S                        !D
 • S & D                     S | D
 • S & !D                    S | !D
 • !S & D                    !S | D
 • !S & !D                   !S | !D
 • S ^ D                     !(S ^ D)

  By implementing Boolean operations in hardware, the X server is relieved of the slow task of reading the frame buffer, modifying the color, and writing the new color to the frame buffer. This function is extremely important to applications which require Boolean operations.
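   The sixteen functions listed above are exactly the sixteen possible truth tables of two one–bit inputs, applied to every bit of the pixel. A minimal software model for 8–bit pixels follows; the dictionary keys mirror the list above and the name `ROPS` is illustrative.

```python
MASK = 0xFF  # 8-bit pixels

# Each entry maps (source, destination) to the value written back.
ROPS = {
    "Clear":    lambda s, d: 0,
    "Set":      lambda s, d: MASK,
    "NoOp":     lambda s, d: d,
    "Copy":     lambda s, d: s,
    "!S":       lambda s, d: ~s & MASK,
    "!D":       lambda s, d: ~d & MASK,
    "S & D":    lambda s, d: s & d,
    "S | D":    lambda s, d: s | d,
    "S & !D":   lambda s, d: s & ~d & MASK,
    "S | !D":   lambda s, d: (s | ~d) & MASK,
    "!S & D":   lambda s, d: ~s & d & MASK,
    "!S | D":   lambda s, d: (~s | d) & MASK,
    "!S & !D":  lambda s, d: ~s & ~d & MASK,
    "!S | !D":  lambda s, d: (~s | ~d) & MASK,
    "S ^ D":    lambda s, d: s ^ d,
    "!(S ^ D)": lambda s, d: ~(s ^ d) & MASK,
}

# XOR applied twice restores the destination, the classic rubber band trick.
once = ROPS["S ^ D"](0x0F, 0x3C)
twice = ROPS["S ^ D"](0x0F, once)
```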

Window Management
   Applications running under the X Window System request a window from the X server. The user is free to resize and move these windows at any time. Since multiple windows may be open on the screen, there is a possibility a window may be partially obscured by others. It is the X server’s responsibility to manage these windows so that pixels rendered to the obscured sections are not visible. PAX supports the following functions which assist the X server:
 • Rectangular Clippers
 • Clipping Planes
 • Window Origin Offset
 • Window ID planes

   The Rectangular Clipping logic consists of four pairs of extent registers. Each pair defines a rectangular region with either an inclusive (all pixels inside the region are rendered) or exclusive (all pixels inside the region are NOT rendered) attribute. The X server uses these registers to define the window’s geometry to the PAX chip so that pixels in the obscured sections of the window are clipped.
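   A per pixel test against the clippers might look like the sketch below. The paper does not state how the four regions are combined; the rule used here (a pixel must lie inside every inclusive region and outside every exclusive region) is an assumption, as are the type and field names.

```python
from typing import List, NamedTuple


class Clipper(NamedTuple):
    """One pair of extent registers (hypothetical field names)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    inclusive: bool


def pixel_visible(x: int, y: int, clippers: List[Clipper]) -> bool:
    """Assumed rule: inside all inclusive regions, outside all exclusive ones."""
    if len(clippers) > 4:
        raise ValueError("only four clip regions are available")
    for c in clippers:
        inside = c.x_min <= x <= c.x_max and c.y_min <= y <= c.y_max
        if inside != c.inclusive:
            return False
    return True


window = Clipper(0, 0, 99, 99, inclusive=True)
overlap = Clipper(50, 50, 99, 99, inclusive=False)  # corner hidden by another window
```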

   Four regions are not always enough to define a window’s geometry. Such is the case when the window is obscured in four or more unique areas or when shaped (i.e., non–rectangular) windows are used. For these cases, the X server may render the window’s geometry to off–screen memory, referred to as clipping planes, with a unique ID. PAX then reads the pixel’s corresponding clip plane value and compares it with the clipping ID to determine if the pixel should be written to the frame buffer.

   Applications draw using a window relative coordinate system. Vertices in this system must be converted to screen coordinates before they can be used to render an object. This conversion is accomplished, in hardware, by adding the Window Origin Offset to every pixel before it is rendered. Providing this capability in the PAX chip eliminates the task of converting the coordinates in software, which increases the rate at which the vertices are sent to PAX. Overall, adding the window origin offset increases the performance of processor bound primitives such as points.

   Some applications require a different color palette than the default. If only one palette is available, the colors of other windows will change when the focus is on such an application. The PAX chip supports an additional four planes of memory, referred to as Window ID planes, which allow the X server to select a unique palette on a per pixel basis. These planes also identify other attributes of the pixels such as frame buffer select, pixel type, and overlay plane enable.

Stipple
   The X Window protocol allows a fill pattern, such as a checkerboard, to be applied to the objects rendered [4]. This pattern is referred to as the Stipple pattern. The pattern can be transparent or opaque. Transparent patterns result in the foreground color being rendered where there are 1s in the pattern and nothing rendered where there are 0s in the pattern. The only difference for an opaque pattern is that the background color is rendered where there are 0s in the pattern.

   PAX supports a fixed 16 x 16 Stipple pattern. This pattern is addressed by the four least significant bits of the window coordinate of the pixel to be rendered. The value at that location is the stipple value for that pixel. PAX supports stippling for all rendering operations.
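   Addressing with the four least significant bits makes the pattern tile every 16 pixels in both directions. The lookup can be sketched as follows, assuming the pattern is stored as sixteen 16–bit rows with the most significant bit as column 0; that storage layout is an assumption.

```python
def stipple_bit(pattern, x, y):
    """Return the stipple value (0 or 1) for window coordinate (x, y).

    `pattern` is a list of 16 integers, one 16-bit row each.
    Only the low four bits of x and y take part in the lookup.
    """
    row = pattern[y & 0xF]
    return (row >> (15 - (x & 0xF))) & 1


# One-pixel checkerboard: alternating rows of 1010... and 0101...
checker = [0xAAAA if r % 2 == 0 else 0x5555 for r in range(16)]
corner = stipple_bit(checker, 0, 0)
wrapped = stipple_bit(checker, 16, 16)   # same cell as (0, 0)
```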

   Applying a transparent stipple pattern to an area fill operation does not affect the ability to use the Block Write function. However, applying an opaque stipple prevents the use of Block Write since two colors must be rendered (foreground and background). This normally reduces the performance of large opaque stippled objects by a factor of approximately four. To prevent such a drastic drop in performance, the PAX architecture employs a unique feature called Stipple Invert. This feature allows an opaque stippled object to be rendered twice using Block Write. The second time it is rendered, the stipple pattern is inverted and the foreground and background colors are swapped. This simple feature almost doubles the performance for opaque stippled objects.
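   The two pass approach can be checked with a small model. Each pass is a transparent (single color) stipple, which is the case Block Write can handle. Taking the factor of four quoted above at face value, two Block Write passes cost roughly 2/4 of one normal pass, which matches the "almost doubles" figure. The function names are illustrative.

```python
def transparent_pass(dest, pattern_bits, color):
    """Write `color` where the pattern is 1; leave other pixels untouched."""
    return [color if bit else old for old, bit in zip(dest, pattern_bits)]


def opaque_stipple_two_pass(dest, pattern_bits, fg, bg):
    """Model Stipple Invert: two single-color passes give an opaque result."""
    first = transparent_pass(dest, pattern_bits, fg)
    inverted = [1 - bit for bit in pattern_bits]     # pass 2 inverts the pattern
    return transparent_pass(first, inverted, bg)     # and swaps in the background


span = opaque_stipple_two_pass([9, 9, 9, 9], [1, 0, 0, 1], fg=7, bg=0)
```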

3D API Assist Functions
   PAX supports a few functions which are intended to enhance the performance of 3D Application Programming Interfaces (APIs). These additional functions are listed below:
 • Anti–Aliased Lines
 • Sub pixel positioning of lines
 • 24–bit RGB to 8–bit RGB Dither

   The Anti–Aliased line draw function provides the system software with the ability to render lines without the familiar problem of “stair steps” or “jaggies” (i.e., aliasing). This function uses a proprietary two pixel approximation technique to visually remove the aliasing caused by the discrete pixels on a raster display.

   Sub pixel positioning of lines allows the system software to more precisely place the anti–aliased lines on the display.

   Dithering is a technique which trades spatial resolution for more color resolution. Essentially a 24–bit color value is converted to an 8–bit value and then slightly modified based on the pixel’s position in the window. The overall appearance is that the graphics subsystem has more than 256 colors.
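   The paper does not describe the dither algorithm PAX uses. As an illustration of the general idea, the sketch below uses ordered dithering with a 4 x 4 Bayer matrix and a 3–3–2 bit split of red, green, and blue; both choices are assumptions and may differ from the hardware.

```python
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def quantize(value, bits, threshold):
    """Reduce an 8-bit channel to `bits` bits.

    The fractional part lost by truncation is compared with a position
    dependent threshold (0..15) to decide whether to round up.
    """
    levels = (1 << bits) - 1
    base, frac = divmod(value * levels, 255)
    if frac * 32 > (2 * threshold + 1) * 255:
        base += 1
    return base


def dither_rgb(r, g, b, x, y):
    """Convert 24-bit (r, g, b) at window position (x, y) to an 8-bit pixel."""
    t = BAYER_4X4[y & 3][x & 3]
    return (quantize(r, 3, t) << 5) | (quantize(g, 3, t) << 2) | quantize(b, 2, t)


# A mid gray maps to different 8-bit values at neighboring positions.
tile = {dither_rgb(128, 128, 128, x, y) for x in range(4) for y in range(4)}
```

   Colors that are exactly representable in 8 bits are unchanged at every position; in-between colors alternate between the two nearest values so that their average over a small area approximates the requested color.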
