Table of Contents for GPGPU Programming for Games and Science

1 Introduction
2 CPU Computing
  2.1 Numerical Computing
      2.1.1 The Curse: An Example from Games
      2.1.2 The Curse: An Example from Science
      2.1.3 The Need to Understand Floating-Point Systems
  2.2 Balancing Robustness, Accuracy, and Speed
      2.2.1 Robustness
            2.2.1.1 Formal Definitions
            2.2.1.2 Algorithms and Implementations
            2.2.1.3 Practical Definitions
      2.2.2 Accuracy
      2.2.3 Speed
      2.2.4 Computer Science is a Study of Trade-offs
  2.3 IEEE Floating Point Standard
  2.4 Binary Scientific Notation
      2.4.1 Conversion from Rational to Binary Scientific Numbers
      2.4.2 Arithmetic Properties of Binary Scientific Numbers
            2.4.2.1 Addition of Binary Scientific Numbers
            2.4.2.2 Subtraction of Binary Scientific Numbers
            2.4.2.3 Multiplication of Binary Scientific Numbers
            2.4.2.4 Division of Binary Scientific Numbers
      2.4.3 Algebraic Properties of Binary Scientific Numbers
  2.5 Floating-Point Arithmetic
      2.5.1 Binary Encodings
            2.5.1.1 8-Bit Floating-Point Numbers
            2.5.1.2 16-Bit Floating-Point Numbers
            2.5.1.3 32-Bit Floating-Point Numbers
            2.5.1.4 64-Bit Floating-Point Numbers
            2.5.1.5 n-Bit Floating-Point Numbers
            2.5.1.6 Classifications of Floating-Point Numbers
      2.5.2 Rounding and Conversions
            2.5.2.1 Rounding with Ties-to-Even
            2.5.2.2 Rounding with Ties-to-Away
            2.5.2.3 Rounding Toward Zero
            2.5.2.4 Rounding Toward Positive
            2.5.2.5 Rounding Toward Negative
            2.5.2.6 Rounding from Floating-Point to Integral Floating-Point
            2.5.2.7 Conversion from Integer to Floating-Point
            2.5.2.8 Conversion from Floating-Point to Rational
            2.5.2.9 Conversion from Rational to Floating-Point
            2.5.2.10 Conversion to Wider Format
            2.5.2.11 Conversion to Narrower Format
      2.5.3 Arithmetic Operations
      2.5.4 Mathematical Functions
      2.5.5 Floating-Point Oddities
            2.5.5.1 Where Have All My Digits Gone?
            2.5.5.2 Have a Nice Stay!
            2.5.5.3 The Best I Can Do is That Bad?
            2.5.5.4 You Have Been More Than Helpful
            2.5.5.5 Hardware and Optimizing Compiler Issues
3 SIMD Computing
  3.1 Intel Streaming SIMD Extensions
      3.1.1 Shuffling Components
      3.1.2 Single-Component versus All-Component Access
      3.1.3 Load and Store Instructions
      3.1.4 Logical Instructions
      3.1.5 Comparison Instructions
      3.1.6 Arithmetic Instructions
      3.1.7 Matrix Multiplication and Transpose
      3.1.8 IEEE Floating-Point Support
      3.1.9 Keep the Pipeline Running
      3.1.10 Flattening of Branches
  3.2 SIMD Wrappers
  3.3 Function Approximations
      3.3.1 Minimax Approximations
      3.3.2 Inverse Square Root Function using Root Finding
      3.3.3 Square Root Function
      3.3.4 Inverse Square Root Function using a Minimax Algorithm
      3.3.5 Sine Function
      3.3.6 Cosine Function
      3.3.7 Tangent Function
      3.3.8 Inverse Sine Function
      3.3.9 Inverse Cosine Function
      3.3.10 Inverse Tangent Function
      3.3.11 Exponential Functions
      3.3.12 Logarithmic Functions
4 GPU Computing
  4.1 Drawing a 3D Object
      4.1.1 Model Space
      4.1.2 World Space
      4.1.3 View Space
      4.1.4 Projection Space
      4.1.5 Window Space
      4.1.6 Summary of the Transformations
      4.1.7 Rasterization
  4.2 High Level Shading Language (HLSL)
      4.2.1 Vertex and Pixel Shaders
      4.2.2 Geometry Shaders
      4.2.3 Compute Shaders
      4.2.4 Compiling HLSL Shaders
            4.2.4.1 Compiling the Vertex Coloring Shaders
            4.2.4.2 Compiling the Texturing Shaders
            4.2.4.3 Compiling the Billboard Shaders
            4.2.4.4 Compiling the Gaussian Blurring Shaders
      4.2.5 Reflecting HLSL Shaders
  4.3 Devices, Contexts, and Swap Chains
      4.3.1 Creating a Device and an Immediate Context
      4.3.2 Creating Swap Chains
      4.3.3 Creating the Back Buffer
  4.4 Resources
      4.4.1 Resource Usage and CPU Access
      4.4.2 Resource Views
      4.4.3 Subresources
      4.4.4 Buffers
            4.4.4.1 Constant Buffers
            4.4.4.2 Texture Buffers
            4.4.4.3 Vertex Buffers
            4.4.4.4 Index Buffers
            4.4.4.5 Structured Buffers
            4.4.4.6 Raw Buffers
            4.4.4.7 Indirect-Argument Buffers
      4.4.5 Textures
            4.4.5.1 1D Textures
            4.4.5.2 2D Textures
            4.4.5.3 3D Textures
      4.4.6 Texture Arrays
            4.4.6.1 1D Texture Arrays
            4.4.6.2 2D Texture Arrays
            4.4.6.3 Cubemap Textures
            4.4.6.4 Cubemap Texture Arrays
      4.4.7 Draw Targets
  4.5 States
  4.6 Shaders
      4.6.1 Creating Shaders
      4.6.2 Vertex, Geometry, and Pixel Shader Execution
      4.6.3 Compute Shader Execution
  4.7 Copying Data between CPU and GPU
      4.7.1 Mapped Writes for Dynamic Update
      4.7.2 Staging Resources
      4.7.3 Copy from CPU to GPU
      4.7.4 Copy from GPU to CPU
      4.7.5 Copy from GPU to GPU
  4.8 Multiple GPUs
      4.8.1 Enumerating the Adapters
      4.8.2 Copying Data between Multiple GPUs
  4.9 IEEE Floating-Point on the GPU
5 Practical Matters
  5.1 Engine Design and Architecture
      5.1.1 A Simple Low-Level D3D11 Application
      5.1.2 HLSL Compilation in Microsoft Visual Studio
      5.1.3 Design Goals for the Geometric Tools Engine
            5.1.3.1 An HLSL Factory
            5.1.3.2 Resource Bridges
            5.1.3.3 Visual Effects
            5.1.3.4 Visual Objects and Scene Graphs
            5.1.3.5 Cameras
  5.2 Debugging
      5.2.1 Debugging on the CPU
      5.2.2 Debugging on the GPU
      5.2.3 Be Mindful of Your Surroundings
            5.2.3.1 An Example of an HLSL Compiler Bug
            5.2.3.2 An Example of a Programmer Bug
  5.3 Performance
      5.3.1 Performance on the CPU
      5.3.2 Performance on the GPU
      5.3.3 Performance Guidelines
  5.4 Code Testing
      5.4.1 Topics in Code Testing
      5.4.2 Code Coverage and Unit Testing on the GPU
6 Linear Algebra
  6.1 Vectors
      6.1.1 Robust Length and Normalization Computations
      6.1.2 Orthogonality
            6.1.2.1 Orthogonality in 2D
            6.1.2.2 Orthogonality in 3D
            6.1.2.3 Orthogonality in 4D
            6.1.2.4 Gram-Schmidt Orthonormalization
      6.1.3 Orthonormal Sets
            6.1.3.1 Orthonormal Sets in 2D
            6.1.3.2 Orthonormal Sets in 3D
            6.1.3.3 Orthonormal Sets in 4D
      6.1.4 Barycentric Coordinates
      6.1.5 Intrinsic Dimensionality
  6.2 Matrices
      6.2.1 Matrix Storage and Transform Conventions
      6.2.2 Base Class Matrix Operations
      6.2.3 Square Matrix Operations in 2D
      6.2.4 Square Matrix Operations in 3D
      6.2.5 Square Matrix Operations in 4D
      6.2.6 The Laplace Expansion Theorem
  6.3 Rotations
      6.3.1 Rotations in 2D
      6.3.2 Rotations in 3D
      6.3.3 Rotations in 4D
      6.3.4 Quaternions
            6.3.4.1 Algebraic Operations
            6.3.4.2 Relationship of Quaternions to Rotations
            6.3.4.3 Spherical Linear Interpolation of Quaternions
      6.3.5 Euler Angles
            6.3.5.1 World Coordinates versus Body Coordinates
      6.3.6 Conversion between Representations
            6.3.6.1 Quaternion to Matrix
            6.3.6.2 Matrix to Quaternion
            6.3.6.3 Axis-Angle to Matrix
            6.3.6.4 Matrix to Axis-Angle
            6.3.6.5 Axis-Angle to Quaternion
            6.3.6.6 Quaternion to Axis-Angle
            6.3.6.7 Euler Angles to Matrix
            6.3.6.8 Matrix to Euler Angles
            6.3.6.9 Euler Angles to and from Quaternion or Axis-Angle
  6.4 Coordinate Systems
      6.4.1 Geometry and Affine Algebra
      6.4.2 Transformations
            6.4.2.1 Composition of Affine Transformations
            6.4.2.2 Decomposition of Affine Transformations
            6.4.2.3 A Simple Transformation Factory
      6.4.3 Coordinate System Conventions
      6.4.4 Converting Between Coordinate Systems
7 Sample Applications
  7.1 Video Streams
      7.1.1 The VideoStream Class
      7.1.2 The VideoStreamManager Class
  7.2 Root Finding
      7.2.1 Root Bounding
      7.2.2 Bisection
      7.2.3 Newton's Method
      7.2.4 Exhaustive Evaluation
            7.2.4.1 CPU Root Finding using a Single Thread
            7.2.4.2 CPU Root Finding using Multiple Threads
            7.2.4.3 GPU Root Finding
  7.3 Least Squares Fitting
      7.3.1 Fit a Line to 2D Points
      7.3.2 Fit a Plane to 3D Points
      7.3.3 Orthogonal Regression
            7.3.3.1 Fitting with Lines
            7.3.3.2 Fitting with Planes
      7.3.4 Estimation of Tangent Planes
  7.4 Partial Sums
  7.5 All-Pairs Triangle Intersection
  7.6 Shortest Path in a Weighted Graph
  7.7 Convolution
  7.8 Median Filtering
      7.8.1 Median by Sorting
      7.8.2 Median of 3x3 using Min-Max Operations
      7.8.3 Median of 5x5 using Min-Max Operations
  7.9 Level Surface Extraction
  7.10 Mass-Spring Systems
  7.11 Fluid Dynamics
      7.11.1 Numerical Methods
      7.11.2 Solving Fluid Flow in 2D
            7.11.2.1 Initialization of State
            7.11.2.2 Initialization of External Forces
            7.11.2.3 Updating the State with Advection
            7.11.2.4 Applying the State Boundary Conditions
            7.11.2.5 Computing the Divergence of Velocity
            7.11.2.6 Solving the Poisson Equation
            7.11.2.7 Updating the Velocity to be Divergence-Free
            7.11.2.8 Screen Captures from the Simulation
      7.11.3 Solving Fluid Flow in 3D
            7.11.3.1 Initialization of State
            7.11.3.2 Initialization of External Forces
            7.11.3.3 Updating the State with Advection
            7.11.3.4 Applying the State Boundary Conditions
            7.11.3.5 Computing the Divergence of Velocity
            7.11.3.6 Solving the Poisson Equation
            7.11.3.7 Updating the Velocity to be Divergence-Free
            7.11.3.8 Screen Captures from the Simulation