2052 lines
89 KiB
HTML
2052 lines
89 KiB
HTML
<meta charset="utf-8">
|
|
**NVIDIA Vulkan Ray Tracing Tutorial**
|
|
<small>
|
|
By [Martin-Karl Lefrançois](https://devblogs.nvidia.com/author/mlefrancois/),
|
|
[Pascal Gautron](https://devblogs.nvidia.com/author/pgautron/), Neil Bickford
|
|
</small>
|
|
|
|
|
|
The focus of this document and the provided code is to showcase a basic integration of
|
|
ray tracing within an existing Vulkan sample, using the
|
|
[`VK_KHR_ray_tracing`](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VK_KHR_ray_tracing)
|
|
extension. This tutorial starts from a basic Vulkan application and provides step-by-step instructions to modify and add
|
|
methods and functions. The sections are organized by components, with subsections identifying the modified functions.
|
|
|
|

|
|
|
|
!!! Note GitHub repository
|
|
https://github.com/nvpro-samples/vk_raytracing_tutorial_KHR
|
|
|
|
# Introduction
|
|
<script type="preformatted">
|
|
This tutorial highlights the steps to add ray tracing to an existing Vulkan application, and assumes a working knowledge
|
|
of Vulkan in general. The code verbosity of classical components such as swapchain management, render passes etc. is
|
|
reduced using [C++ API helpers](https://github.com/nvpro-samples/shared_sources/tree/master/nvvk) and
|
|
NVIDIA's [nvpro-samples](https://github.com/nvpro-samples/build_all) framework. This framework contains many advanced
|
|
examples and best practices for Vulkan and OpenGL. We also use a helper for the creation of the ray tracing acceleration
|
|
structures, but we will document its contents extensively in this tutorial. The code is further simplified by using the
|
|
[Vulkan C++ API](https://github.com/KhronosGroup/Vulkan-Hpp), whose type safety and constructors reduce both its
|
|
verbosity and its potential for errors.
|
|
|
|
!!! Note Note
|
|
For educational purposes all the code is contained in a very small set of files.
|
|
A real integration would require additional levels of abstraction.
|
|
|
|
[//]: # This may be the most platform independent comment
|
|
|
|
# Environment Setup
|
|
|
|
**The preferred way** to download the project (including NVVK) is to use the
|
|
nvpro-samples `build_all` script.
|
|
|
|
In a command line, clone the `nvpro-samples/build_all` repository from
|
|
https://github.com/nvpro-samples/build_all:
|
|
|
|
~~~~~
|
|
git clone https://github.com/nvpro-samples/build_all.git
|
|
~~~~~
|
|
|
|
Then open the `build_all` folder and run either `clone_all.bat` (Windows) or
|
|
`clone_all.sh` (Linux).
|
|
|
|
**If you want to clone as few repositories as possible**, open a command line,
|
|
and run the following commands to clone the repositories you need:
|
|
~~~~~
|
|
git clone https://github.com/nvpro-samples/shared_sources.git
|
|
git clone https://github.com/nvpro-samples/shared_external.git
|
|
git clone https://github.com/nvpro-samples/vk_raytracing_tutorial_KHR.git
|
|
~~~~~
|
|
|
|
## Generating the Solution
|
|
|
|
One typical way to store the build system is to create a `build` directory below the
|
|
main project. You can use CMake-GUI or do the following steps.
|
|
|
|
~~~~~
|
|
cd vk_raytracing_tutorial_KHR
|
|
mkdir build
|
|
cd build
|
|
cmake ..
|
|
~~~~~
|
|
|
|
## Beta Installation
|
|
|
|
The SDK 1.2.161 and up which can be found under https://vulkan.lunarg.com/sdk/home will work with this project.
|
|
|
|
Nevertheless, if you are in the Beta period, it is suggested to install and compile all of the following and replace
|
|
with the current environment.
|
|
|
|
* Latest driver: https://developer.nvidia.com/vulkan-driver
|
|
* Vulkan headers: https://github.com/KhronosGroup/Vulkan-Headers
|
|
* Validator: https://github.com/KhronosGroup/Vulkan-ValidationLayers
|
|
* Vulkan-Hpp: https://github.com/KhronosGroup/Vulkan-Hpp
|
|
|
|
!!! Tip Visual Assist
|
|
To get auto-completion, edit vulkan.hpp and change two places from:<br>
|
|
`namespace VULKAN_HPP_NAMESPACE` to `namespace vk`
|
|
|
|
# Compiling & Running
|
|
|
|
Open the solution located in the build directory, then compile and run `vk_ray_tracing__before_KHR`.
|
|
|
|
This will be the starting point of the tutorial. This project is a simple framework allowing us to load OBJ files and rasterize them
|
|
using Vulkan.
|
|
|
|

|
|
|
|
|
|
The following steps in the tutorial will be modifying this project
|
|
`vk_ray_tracing__before_KHR` and will add support for ray tracing. The
|
|
end result of the tutorial is the project `vk_ray_tracing__simple_KHR`.
|
|
It is possible to look in that project if something went wrong.
|
|
|
|
The project `vk_ray_tracing__simple_KHR` will be the starting point for the
|
|
extra tutorials.
|
|
|
|
|
|
# Ray Tracing Setup
|
|
|
|
Go to the `main` function of the `main.cpp` file, and find where we request Vulkan extensions with
|
|
`nvvk::ContextCreateInfo`.
|
|
To be able to use ray tracing, we will need VK_KHR_ACCELERATION_STRUCTURE and VK_KHR_RAY_TRACING_PIPELINE.
|
|
Those extensions have also dependencies on other extension, therefore all the following
|
|
extensions will need to be added.
|
|
|
|
```` C
|
|
// #VKRay: Activate the ray tracing extension
|
|
vk::PhysicalDeviceAccelerationStructureFeaturesKHR accelFeature;
|
|
contextInfo.addDeviceExtension(VK_KHR_ACCELERATION_STRUCTURE_EXTENSION_NAME, false,
|
|
&accelFeature);
|
|
vk::PhysicalDeviceRayTracingPipelineFeaturesKHR rtPipelineFeature;
|
|
contextInfo.addDeviceExtension(VK_KHR_RAY_TRACING_PIPELINE_EXTENSION_NAME, false,
|
|
&rtPipelineFeature);
|
|
contextInfo.addDeviceExtension(VK_KHR_MAINTENANCE3_EXTENSION_NAME);
|
|
contextInfo.addDeviceExtension(VK_KHR_PIPELINE_LIBRARY_EXTENSION_NAME);
|
|
contextInfo.addDeviceExtension(VK_KHR_DEFERRED_HOST_OPERATIONS_EXTENSION_NAME);
|
|
contextInfo.addDeviceExtension(VK_KHR_BUFFER_DEVICE_ADDRESS_EXTENSION_NAME);
|
|
|
|
````
|
|
|
|
Before creating the device, a linked structure of features must past. Not all extensions
|
|
requires a set of features, but ray tracing features must be enabled before the creation of the device.
|
|
By providing `accelFeature`, and `rtPipelineFeature`, the context creation will query the capable features
|
|
for ray tracing and will use the filled structure to create the device.
|
|
|
|
In the `HelloVulkan` class in `hello_vulkan.h`, add an initialization function and a member storing the capabilities of
|
|
the GPU for ray tracing:
|
|
|
|
```` C
|
|
// #VKRay
|
|
void initRayTracing();
|
|
vk::PhysicalDeviceRayTracingPipelinePropertiesKHR m_rtProperties;
|
|
````
|
|
|
|
At the end of `hello_vulkan.cpp`, add the body of `initRayTracing()`, which will query the ray tracing capabilities
|
|
of the GPU using this extension. In particular, it will obtain the maximum recursion depth,
|
|
ie. the number of nested ray tracing calls that can be performed from a single ray. This can be seen as the number
|
|
of times a ray can bounce in the scene in a recursive path tracer. Note that for performance purposes, recursion
|
|
should in practice be kept to a minimum, favoring a loop formulation. The shader header size will be useful when
|
|
creating the shader binding table in a later section.
|
|
|
|
|
|
```` C
|
|
//--------------------------------------------------------------------------------------------------
|
|
// Initialize Vulkan ray tracing
|
|
// #VKRay
|
|
void HelloVulkan::initRayTracing()
|
|
{
|
|
// Requesting ray tracing properties
|
|
auto properties =
|
|
m_physicalDevice.getProperties2<vk::PhysicalDeviceProperties2,
|
|
vk::PhysicalDeviceRayTracingPipelinePropertiesKHR>();
|
|
m_rtProperties = properties.get<vk::PhysicalDeviceRayTracingPipelinePropertiesKHR>();
|
|
}
|
|
````
|
|
|
|
## main
|
|
|
|
In `main.cpp`, in the `main()` function, we call the initialization method right after
|
|
`helloVk.updateDescriptorSet();`
|
|
|
|
```` C
|
|
// #VKRay
|
|
helloVk.initRayTracing();
|
|
````
|
|
|
|
!!! Note: Exercise
|
|
When running the program, you can put a breakpoint in the `initRayTracing()` method to inspect
|
|
the resulting values. On a Quadro RTX 6000, the maximum recursion depth is 31, and the shader
|
|
group handle size is 16.
|
|
|
|
# Acceleration Structure
|
|
|
|
To be efficient, ray tracing requires organizing the geometry into an acceleration structure (AS)
|
|
that will reduce the number of ray-triangle intersection tests during rendering.
|
|
This structure is divided into a two-level tree. Intuitively, this can directly map to the notion
|
|
of a simplified scene graph, in which the internal nodes of the graph have been collapsed into a single
|
|
transform matrix for each instance. The geometry of an instance is stored in a bottom-level acceleration structure
|
|
(BLAS) object, which holds the actual vertex data. It is also possible to further simplify the scene graph by combining
|
|
multiple objects within a single bottom-level AS: for that, a single BLAS can be built from multiple vertex buffers, each with
|
|
its own transform matrix. Note that if an object is instantiated several times within a same BLAS, its geometry
|
|
will be duplicated. This can be particularly useful for improving performance on static, non-instantiated
|
|
scene components (as a rule of thumb, the fewer BLAS, the better).
|
|
|
|
The top-level AS (TLAS) will contain the object instances, each
|
|
with its own transformation matrix and reference to a corresponding BLAS.
|
|
We will start with a single bottom-level AS and a top-level AS instancing it once with an identity transform.
|
|
|
|
|
|
![Figure [step]: Acceleration Structure](Images/AccelerationStructure.svg)
|
|
|
|
This sample loads an OBJ file and stores its indices, vertices and material data into an `ObjModel` structure. This
|
|
model is referenced by an `ObjInstance` structure which also contains the transformation matrix of that particular
|
|
instance. For ray tracing the `ObjModel` and `ObjInstance` will then naturally fit the BLAS and TLAS, respectively.
|
|
|
|
To simplify the ray tracing setup we use a helper class containing utility functions for
|
|
acceleration structure builds. In the header file, include the`raytrace_vkpp` helper
|
|
|
|
```` C
|
|
// #VKRay
|
|
#include "nvvk/raytrace_vk.hpp"
|
|
````
|
|
|
|
so that we can add that helper as a member in the `HelloVulkan` class,
|
|
|
|
```` C
|
|
nvvk::RaytracingBuilder m_rtBuilder;
|
|
````
|
|
|
|
and initialize it at the end of `initRaytracing()`:
|
|
|
|
```` C
|
|
m_rtBuilder.setup(m_device, m_alloc, m_graphicsQueueIndex);
|
|
````
|
|
|
|
## Bottom-Level Acceleration Structure
|
|
|
|
The first step of building a BLAS object consists in converting the geometry data of an `ObjModel` into a
|
|
multiple structures than can be used by the AS builder. We are holding all those structure under
|
|
`nvvk::RaytracingBuilderKHR::Blas`
|
|
|
|
Add a new method to the `HelloVulkan`
|
|
class:
|
|
|
|
```` C
|
|
nvvk::RaytracingBuilderKHR::Blas objectToVkGeometryKHR(const ObjModel& model);
|
|
````
|
|
|
|
Its implementation will fill three structures
|
|
|
|
* vk::AccelerationStructureGeometryTrianglesDataKHR: defines the data from which the AS will be constructed.
|
|
* vk::AccelerationStructureGeometryKHR: the geometry type for building the AS, in this case, from triangles.
|
|
* vk::AccelerationStructureBuildRangeInfoKHR: the offset, which correspond to the actual wanted geometry when building.
|
|
|
|
Multiple of the above structure can be combined to create a single blas. In this example,
|
|
the array will always be a length of one.
|
|
|
|
Note that we consider all objects opaque for now, and indicate this to the builder for
|
|
potential optimization.
|
|
|
|
```` C
|
|
//--------------------------------------------------------------------------------------------------
|
|
// Converting a OBJ primitive to the ray tracing geometry used for the BLAS
|
|
//
|
|
nvvk::RaytracingBuilderKHR::Blas HelloVulkan::objectToVkGeometryKHR(const ObjModel& model)
|
|
{
|
|
// Building part
|
|
vk::DeviceAddress vertexAddress = m_device.getBufferAddress({model.vertexBuffer.buffer});
|
|
vk::DeviceAddress indexAddress = m_device.getBufferAddress({model.indexBuffer.buffer});
|
|
|
|
uint32_t maxPrimitiveCount = model.nbIndices / 3;
|
|
|
|
vk::AccelerationStructureGeometryTrianglesDataKHR triangles;
|
|
triangles.setVertexFormat(vk::Format::eR32G32B32Sfloat);
|
|
triangles.setVertexData(vertexAddress);
|
|
triangles.setVertexStride(sizeof(VertexObj));
|
|
triangles.setIndexType(vk::IndexType::eUint32);
|
|
triangles.setIndexData(indexAddress);
|
|
triangles.setTransformData({});
|
|
triangles.setMaxVertex(model.nbVertices);
|
|
|
|
// Setting up the build info of the acceleration
|
|
vk::AccelerationStructureGeometryKHR asGeom;
|
|
asGeom.setGeometryType(vk::GeometryTypeKHR::eTriangles);
|
|
asGeom.setFlags(vk::GeometryFlagBitsKHR::eOpaque);
|
|
asGeom.geometry.setTriangles(triangles);
|
|
|
|
// The primitive itself
|
|
vk::AccelerationStructureBuildRangeInfoKHR offset;
|
|
offset.setFirstVertex(0);
|
|
offset.setPrimitiveCount(maxPrimitiveCount);
|
|
offset.setPrimitiveOffset(0);
|
|
offset.setTransformOffset(0);
|
|
|
|
// Our blas is only one geometry, but could be made of many geometries
|
|
nvvk::RaytracingBuilderKHR::Blas blas;
|
|
blas.asGeometry.emplace_back(asGeom);
|
|
blas.asBuildOffsetInfo.emplace_back(offset);
|
|
|
|
return blas;
|
|
}
|
|
````
|
|
|
|
In the `HelloVulkan` class declaration, we can now add the `createBottomLevelAS()` method that will generate a
|
|
`nvvk::RaytracingBuilderKHR::Blas` for each object, and trigger a BLAS build:
|
|
|
|
```` C
|
|
void createBottomLevelAS();
|
|
````
|
|
|
|
The implementation loops over all the loaded models and fills in an array of `nvvk::RaytracingBuilderKHR::Blas` before
|
|
triggering a build of all BLAS's in a batch. The resulting acceleration structures will be stored
|
|
within the helper in the order of construction, so that they can be directly referenced by index later.
|
|
|
|
```` C
|
|
void HelloVulkan::createBottomLevelAS()
|
|
{
|
|
// BLAS - Storing each primitive in a geometry
|
|
std::vector<nvvk::RaytracingBuilderKHR::Blas> allBlas;
|
|
allBlas.reserve(m_objModel.size());
|
|
for(const auto& obj : m_objModel)
|
|
{
|
|
auto blas = objectToVkGeometryKHR(obj);
|
|
|
|
// We could add more geometry in each BLAS, but we add only one for now
|
|
allBlas.emplace_back(blas);
|
|
}
|
|
m_rtBuilder.buildBlas(allBlas, vk::BuildAccelerationStructureFlagBitsKHR::ePreferFastTrace);
|
|
}
|
|
````
|
|
|
|
|
|
### Helper Details: RaytracingBuilder::buildBlas()
|
|
|
|
This helper function is already present in `raytraceKHR_vkpp.hpp`: it can be reused in many projects, and is
|
|
part of the set of helpers provided by the [nvpro-samples](https://github.com/nvpro-samples). The function
|
|
will generate one BLAS for each `RaytracingBuilderKHR::Blas`:
|
|
|
|
```` C
|
|
void buildBlas(const std::vector<RaytracingBuilderKHR::Blas>& blas_,
|
|
VkBuildAccelerationStructureFlagsKHR flags = VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR)
|
|
{
|
|
m_blas = blas_; // Keeping a copy
|
|
|
|
VkDeviceSize maxScratch{0}; // Largest scratch buffer for our BLAS
|
|
|
|
// Is compaction requested?
|
|
bool doCompaction = (flags & VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR)
|
|
== VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR;
|
|
std::vector<VkDeviceSize> originalSizes;
|
|
originalSizes.resize(m_blas.size());
|
|
|
|
// Iterate over the groups of geometries, creating one BLAS for each group
|
|
int idx{0};
|
|
for(auto& blas : m_blas)
|
|
{
|
|
````
|
|
|
|
The creation of the acceleration structure needs all `vk::AccelerationStructureCreateGeometryTypeInfoKHR` previously set and
|
|
set into `vk::AccelerationStructureCreateInfoKHR`.
|
|
|
|
```` C
|
|
VkAccelerationStructureCreateInfoKHR asCreateInfo{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_KHR};
|
|
asCreateInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR;
|
|
asCreateInfo.flags = flags;
|
|
asCreateInfo.maxGeometryCount = (uint32_t)blas.asCreateGeometryInfo.size();
|
|
asCreateInfo.pGeometryInfos = blas.asCreateGeometryInfo.data();
|
|
````
|
|
|
|
The creation information is then passed to the allocator, that will internally create an acceleration structure handle.
|
|
It will also query `vk::Device::getAccelerationStructureMemoryRequirementsKHR` to obtain the size of the resulting BLAS,
|
|
and allocate memory accordingly.
|
|
|
|
```` C
|
|
// Create an acceleration structure identifier and allocate memory to
|
|
// store the resulting structure data
|
|
blas.as = m_alloc.createAcceleration(asCreateInfo);
|
|
m_debug.setObjectName(blas.as.accel, (std::string("Blas" + std::to_string(idx)).c_str()));
|
|
````
|
|
|
|
The acceleration structure builder requires some scratch memory to generate the BLAS. Since we generate all the
|
|
BLAS's in a batch, we query the scratch memory requirements for each BLAS, and find the maximum such requirement.
|
|
The amount of memory for the scratch is determined by filling the memory requirement structure, and setting
|
|
the previous created acceleration structure. At the time to write those lines, only the device can be use
|
|
for building the acceleration structure. The same scratch buffer is used by each BLAS, which is the reason to
|
|
allocate the largest size, to avoid any realocation. At the end of building all BLAS, we can dispose the scratch
|
|
buffer.
|
|
|
|
We are querying the size the acceleration structure is taking on the device as well. This has no real use except
|
|
for statistics and to compare it to the compact size which can happen in a second step.
|
|
|
|
```` C
|
|
// Estimate the amount of scratch memory required to build the BLAS, and
|
|
// update the size of the scratch buffer that will be allocated to
|
|
// sequentially build all BLASes
|
|
VkAccelerationStructureMemoryRequirementsInfoKHR memoryRequirementsInfo{
|
|
VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_INFO_KHR};
|
|
memoryRequirementsInfo.type = VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_BUILD_SCRATCH_KHR;
|
|
memoryRequirementsInfo.accelerationStructure = blas.as.accel;
|
|
memoryRequirementsInfo.buildType = VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR;
|
|
|
|
VkMemoryRequirements2 reqMem{VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2};
|
|
vkGetAccelerationStructureMemoryRequirementsKHR(m_device, &memoryRequirementsInfo, &reqMem);
|
|
VkDeviceSize scratchSize = reqMem.memoryRequirements.size;
|
|
|
|
|
|
blas.flags = flags;
|
|
maxScratch = std::max(maxScratch, scratchSize);
|
|
|
|
// Original size
|
|
memoryRequirementsInfo.type = VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_OBJECT_KHR;
|
|
vkGetAccelerationStructureMemoryRequirementsKHR(m_device, &memoryRequirementsInfo, &reqMem);
|
|
originalSizes[idx] = reqMem.memoryRequirements.size;
|
|
|
|
idx++;
|
|
}
|
|
````
|
|
|
|
Once that maximum has been found, we allocate a scratch buffer.
|
|
|
|
```` C
|
|
// Allocate the scratch buffers holding the temporary data of the acceleration structure builder
|
|
nvvkBuffer scratchBuffer =
|
|
m_alloc.createBuffer(maxScratch, VK_BUFFER_USAGE_RAY_TRACING_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
|
|
VkBufferDeviceAddressInfo bufferInfo{VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO};
|
|
bufferInfo.buffer = scratchBuffer.buffer;
|
|
VkDeviceAddress scratchAddress = vkGetBufferDeviceAddress(m_device, &bufferInfo);
|
|
````
|
|
|
|
To know the size that the BLAS is really taking, we use queries and setting the type to `VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR`.
|
|
This is needed if we want to compact the acceleration structure in a second step. By default, the
|
|
memory allocated by the creation of the acceleration structure has the size of the worst case. After creation,
|
|
the real space can be smaller, and it is possible to copy the acceleration structure to one that is
|
|
using exactly what is needed. This could save over 50% of the device memory usage.
|
|
|
|
```` C
|
|
// Query size of compact BLAS
|
|
VkQueryPoolCreateInfo qpci{VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO};
|
|
qpci.queryCount = (uint32_t)m_blas.size();
|
|
qpci.queryType = VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR;
|
|
VkQueryPool queryPool;
|
|
vkCreateQueryPool(m_device, &qpci, nullptr, &queryPool);
|
|
````
|
|
|
|
We then use multiple command buffers to launch all the BLAS builds. We are using multiple
|
|
command buffers instead of one, to allow the driver to allow system interuption and avoid a
|
|
TDR if the job was to heavy.
|
|
|
|
Note the barrier after each
|
|
build call: this is required as we reuse the scratch space across builds, and hence need to ensure
|
|
the previous build has completed before starting the next. We could have used multiple scratch buffers,
|
|
but it would have been expensive memory wise, and the device can only build one BLAS at a time, so we
|
|
wouldn't be faster.
|
|
|
|
```` C
|
|
// Query size of compact BLAS
|
|
VkQueryPoolCreateInfo qpci{VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO};
|
|
qpci.queryCount = (uint32_t)m_blas.size();
|
|
qpci.queryType = VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR;
|
|
VkQueryPool queryPool;
|
|
vkCreateQueryPool(m_device, &qpci, nullptr, &queryPool);
|
|
|
|
|
|
// Create a command buffer containing all the BLAS builds
|
|
nvvk::CommandPool genCmdBuf(m_device, m_queueIndex);
|
|
int ctr{0};
|
|
std::vector<VkCommandBuffer> allCmdBufs;
|
|
allCmdBufs.reserve(m_blas.size());
|
|
for(auto& blas : m_blas)
|
|
{
|
|
VkCommandBuffer cmdBuf = genCmdBuf.createCommandBuffer();
|
|
allCmdBufs.push_back(cmdBuf);
|
|
|
|
const VkAccelerationStructureGeometryKHR* pGeometry = blas.asGeometry.data();
|
|
VkAccelerationStructureBuildGeometryInfoKHR bottomASInfo{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_GEOMETRY_INFO_KHR};
|
|
bottomASInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR;
|
|
bottomASInfo.flags = flags;
|
|
bottomASInfo.update = VK_FALSE;
|
|
bottomASInfo.srcAccelerationStructure = VK_NULL_HANDLE;
|
|
bottomASInfo.dstAccelerationStructure = blas.as.accel;
|
|
bottomASInfo.geometryArrayOfPointers = VK_FALSE;
|
|
bottomASInfo.geometryCount = (uint32_t)blas.asGeometry.size();
|
|
bottomASInfo.ppGeometries = &pGeometry;
|
|
bottomASInfo.scratchData.deviceAddress = scratchAddress;
|
|
|
|
// Pointers of offset
|
|
std::vector<const VkAccelerationStructureBuildOffsetInfoKHR*> pBuildOffset(blas.asBuildOffsetInfo.size());
|
|
for(size_t i = 0; i < blas.asBuildOffsetInfo.size(); i++)
|
|
pBuildOffset[i] = &blas.asBuildOffsetInfo[i];
|
|
|
|
// Building the AS
|
|
vkCmdBuildAccelerationStructureKHR(cmdBuf, 1, &bottomASInfo, pBuildOffset.data());
|
|
|
|
// Since the scratch buffer is reused across builds, we need a barrier to ensure one build
|
|
// is finished before starting the next one
|
|
VkMemoryBarrier barrier{VK_STRUCTURE_TYPE_MEMORY_BARRIER};
|
|
barrier.srcAccessMask = VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_KHR;
|
|
barrier.dstAccessMask = VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR;
|
|
vkCmdPipelineBarrier(cmdBuf, VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR,
|
|
VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR, 0, 1, &barrier, 0, nullptr, 0, nullptr);
|
|
|
|
// Query the compact size
|
|
if(doCompaction)
|
|
{
|
|
vkCmdWriteAccelerationStructuresPropertiesKHR(cmdBuf, 1, &blas.as.accel,
|
|
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR, queryPool, ctr++);
|
|
}
|
|
}
|
|
genCmdBuf.submitAndWait(allCmdBufs);
|
|
allCmdBufs.clear();
|
|
````
|
|
|
|
While this approach has the advantage of keeping all BLAS's independent, building many BLAS's efficiently would
|
|
require allocating a larger scratch buffer, and launch several builds simultaneously. This current tutorial
|
|
does not make use of compaction, which could reduce significantly the memory footprint of the acceleration structures. Both
|
|
of those aspects will be part of a future advanced tutorial.
|
|
|
|
The following is when compation flag is enabled. This part, which is optional, will compact the BLAS in the memory that it is really using. It needs to wait that all BLASes
|
|
are constructred, to make a copy in the more fitted memory space.
|
|
|
|
```` C
|
|
|
|
// Compacting all BLAS
|
|
if(doCompaction)
|
|
{
|
|
cmdBuf = genCmdBuf.createCommandBuffer();
|
|
|
|
// Get the size result back
|
|
std::vector<VkDeviceSize> compactSizes(m_blas.size());
|
|
vkGetQueryPoolResults(m_device, queryPool, 0, (uint32_t)compactSizes.size(), compactSizes.size() * sizeof(VkDeviceSize),
|
|
compactSizes.data(), sizeof(VkDeviceSize), VK_QUERY_RESULT_WAIT_BIT);
|
|
|
|
|
|
// Compacting
|
|
std::vector<nvvkAccel> cleanupAS(m_blas.size());
|
|
uint32_t totOriginalSize{0}, totCompactSize{0};
|
|
for(int i = 0; i < m_blas.size(); i++)
|
|
{
|
|
// LOGI("Reducing %i, from %d to %d \n", i, originalSizes[i], compactSizes[i]);
|
|
totOriginalSize += (uint32_t)originalSizes[i];
|
|
totCompactSize += (uint32_t)compactSizes[i];
|
|
|
|
// Creating a compact version of the AS
|
|
VkAccelerationStructureCreateInfoKHR asCreateInfo{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_KHR};
|
|
asCreateInfo.compactedSize = compactSizes[i];
|
|
asCreateInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR;
|
|
asCreateInfo.flags = flags;
|
|
auto as = m_alloc.createAcceleration(asCreateInfo);
|
|
|
|
// Copy the original BLAS to a compact version
|
|
VkCopyAccelerationStructureInfoKHR copyInfo{VK_STRUCTURE_TYPE_COPY_ACCELERATION_STRUCTURE_INFO_KHR};
|
|
copyInfo.src = m_blas[i].as.accel;
|
|
copyInfo.dst = as.accel;
|
|
copyInfo.mode = VK_COPY_ACCELERATION_STRUCTURE_MODE_COMPACT_KHR;
|
|
vkCmdCopyAccelerationStructureKHR(cmdBuf, ©Info);
|
|
cleanupAS[i] = m_blas[i].as;
|
|
m_blas[i].as = as;
|
|
}
|
|
genCmdBuf.submitAndWait(cmdBuf);
|
|
|
|
// Destroying the previous version
|
|
for(auto as : cleanupAS)
|
|
m_alloc.destroy(as);
|
|
|
|
LOGI("------------------\n");
|
|
LOGI("Total: %d -> %d = %d (%2.2f%s smaller) \n", totOriginalSize, totCompactSize,
|
|
totOriginalSize - totCompactSize, (totOriginalSize - totCompactSize) / float(totOriginalSize) * 100.f, "%%");
|
|
}
|
|
````
|
|
|
|
Finally, destroying what was allocated.
|
|
|
|
```` C
|
|
vkDestroyQueryPool(m_device, queryPool, nullptr);
|
|
m_alloc.destroy(scratchBuffer);
|
|
m_alloc.finalizeAndReleaseStaging();
|
|
}
|
|
````
|
|
|
|
## Top-Level Acceleration Structure
|
|
|
|
The TLAS is the entry point in the ray tracing scene description, and stores all the instances. Add a new method
|
|
to the `HelloVulkan` class:
|
|
|
|
```` C
|
|
void createTopLevelAS();
|
|
````
|
|
|
|
An instance is represented by a `nvvk::RaytracingBuilder::Instance`, which stores its transform matrix (`transform`)
|
|
and the identifier of its corresponding BLAS (`blasId`). It also contains an instance identifier that will be available
|
|
during shading as `gl_InstanceCustomIndex`, as well as the index of the hit group that represents the shaders that will be
|
|
invoked upon hitting the object (`hitGroupId`).
|
|
This index and the notion of hit group are tied to the definition of the ray tracing pipeline and the Shader Binding
|
|
Table, described later in this tutorial. For now
|
|
it suffices to say that we will use only one hit group for the whole scene, and hence the hit group index is always 0.
|
|
Finally, the instance may indicate culling preferences, such as backface culling, using its `vk::GeometryInstanceFlagsKHR
|
|
flags` member. In our example we decide to disable culling altogether
|
|
for simplicity and independence on the winding of the input models.
|
|
|
|
Once all the instance objects are created we trigger the TLAS build, directing the builder to prefer generating a TLAS
|
|
optimized for tracing performance (rather than AS size, for example).
|
|
|
|
```` C
|
|
void HelloVulkan::createTopLevelAS()
|
|
{
|
|
std::vector<nvvk::RaytracingBuilderKHR::Instance> tlas;
|
|
tlas.reserve(m_objInstance.size());
|
|
for(int i = 0; i < static_cast<int>(m_objInstance.size()); i++)
|
|
{
|
|
nvvk::RaytracingBuilderKHR::Instance rayInst;
|
|
rayInst.transform = m_objInstance[i].transform; // Position of the instance
|
|
rayInst.instanceId = i; // gl_InstanceID
|
|
rayInst.blasId = m_objInstance[i].objIndex;
|
|
rayInst.hitGroupId = 0; // We will use the same hit group for all objects
|
|
rayInst.flags = VK_GEOMETRY_INSTANCE_TRIANGLE_FACING_CULL_DISABLE_BIT_KHR;
|
|
tlas.emplace_back(rayInst);
|
|
}
|
|
m_rtBuilder.buildTlas(tlas, vk::BuildAccelerationStructureFlagBitsKHR::ePreferFastTrace);
|
|
}
|
|
````
|
|
|
|
As usual in Vulkan, we need to explicitly destroy the objects we created by adding a call at the end of
|
|
`HelloVulkan::destroyResources`:
|
|
|
|
```` C
|
|
// #VKRay
|
|
m_rtBuilder.destroy();
|
|
````
|
|
|
|
### Helper Details: RaytracingBuilder::buildTlas()
|
|
|
|
The helper function for building top-level acceleration structures is part of the
|
|
[nvpro-samples](https://github.com/nvpro-samples)
|
|
and builds a TLAS from a vector of `Instance` objects. We first store some basic information about the TLAS, namely
|
|
the number of instances it will hold, and flags indicating preferences for the builder, such as whether to prefer faster
|
|
builds or better performance.
|
|
|
|
```` C
|
|
void buildTlas(const std::vector<Instance>& instances,
|
|
VkBuildAccelerationStructureFlagsKHR flags = VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR)
|
|
{
|
|
m_tlas.flags = flags;
|
|
|
|
VkAccelerationStructureCreateGeometryTypeInfoKHR geometryCreate{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_GEOMETRY_TYPE_INFO_KHR};
|
|
geometryCreate.geometryType = VK_GEOMETRY_TYPE_INSTANCES_KHR;
|
|
geometryCreate.maxPrimitiveCount = (static_cast<uint32_t>(instances.size()));
|
|
geometryCreate.allowsTransforms = (VK_TRUE);
|
|
|
|
VkAccelerationStructureCreateInfoKHR asCreateInfo{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_KHR};
|
|
asCreateInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR;
|
|
asCreateInfo.flags = flags;
|
|
asCreateInfo.maxGeometryCount = 1;
|
|
asCreateInfo.pGeometryInfos = &geometryCreate;
|
|
````
|
|
|
|
We then call the allocator, which will create an acceleration structure handle for the TLAS. It will also query the
|
|
resulting size of the TLAS using `vk::Device::getAccelerationStructureMemoryRequirementsKHR` and allocate that
|
|
amount of memory:
|
|
|
|
```` C
|
|
// Create the acceleration structure object and allocate the memory
|
|
// required to hold the TLAS data
|
|
m_tlas.as = m_alloc.createAcceleration(asCreateInfo);
|
|
m_debug.setObjectName(m_tlas.as.accel, "Tlas");
|
|
````
|
|
|
|
As with the BLAS, we also query the amount of scratch memory required by the builder to generate the TLAS,
|
|
and allocate a scratch buffer. Note that since the BLAS and TLAS both require a scratch buffer, we could also have used
|
|
one buffer and thus saved an allocation. However, for the purpose of this tutorial, we keep the BLAS and TLAS builds
|
|
independent.
|
|
|
|
```` C
|
|
// Compute the amount of scratch memory required by the acceleration structure builder
|
|
VkAccelerationStructureMemoryRequirementsInfoKHR memoryRequirementsInfo{
|
|
VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_INFO_KHR};
|
|
memoryRequirementsInfo.type = VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_BUILD_SCRATCH_KHR;
|
|
memoryRequirementsInfo.accelerationStructure = m_tlas.as.accel;
|
|
memoryRequirementsInfo.buildType = VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR;
|
|
|
|
VkMemoryRequirements2 reqMem{VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2};
|
|
vkGetAccelerationStructureMemoryRequirementsKHR(m_device, &memoryRequirementsInfo, &reqMem);
|
|
VkDeviceSize scratchSize = reqMem.memoryRequirements.size;
|
|
|
|
// Allocate the scratch memory
|
|
nvvkBuffer scratchBuffer =
|
|
m_alloc.createBuffer(scratchSize, VK_BUFFER_USAGE_RAY_TRACING_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
|
|
VkBufferDeviceAddressInfo bufferInfo{VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO};
|
|
bufferInfo.buffer = scratchBuffer.buffer;
|
|
VkDeviceAddress scratchAddress = vkGetBufferDeviceAddress(m_device, &bufferInfo);
|
|
````
|
|
|
|
An `Instance` object is nearly identical to a `VkGeometryInstanceKHR` object: the only difference is the transform
|
|
matrix of the instance. The former uses a $4\times4$ matrix from GLM (column-major), while the latter uses a raw
|
|
array of floating-point values representing a row-major $4\times3$ matrix. Using the `Instance` object on the
|
|
application side allows us to use the more intuitive $4\times4$ matrices, making the code clearer. When generating the
|
|
TLAS we then convert all the `Instance` objects to `VkGeometryInstanceKHR`:
|
|
|
|
```` C
|
|
// For each instance, build the corresponding instance descriptor
|
|
std::vector<VkAccelerationStructureInstanceKHR> geometryInstances;
|
|
geometryInstances.reserve(instances.size());
|
|
for(const auto& inst : instances)
|
|
{
|
|
geometryInstances.push_back(instanceToVkGeometryInstanceKHR(inst));
|
|
}
|
|
````
|
|
|
|
We then upload the instance descriptions to the device using a one-time command buffer. This command buffer will also be
|
|
used to generate the TLAS itself, and so we add a barrier after the copy to ensure it has completed before launching the
|
|
TLAS build.
|
|
|
|
```` C
|
|
// Building the TLAS
|
|
nvvk::CommandPool genCmdBuf(m_device, m_queueIndex);
|
|
VkCommandBuffer cmdBuf = genCmdBuf.createCommandBuffer();
|
|
|
|
// Create a buffer holding the actual instance data for use by the AS
|
|
// builder
|
|
VkDeviceSize instanceDescsSizeInBytes = instances.size() * sizeof(VkAccelerationStructureInstanceKHR);
|
|
|
|
// Allocate the instance buffer and copy its contents from host to device
|
|
// memory
|
|
m_instBuffer = m_alloc.createBuffer(cmdBuf, geometryInstances,
|
|
VK_BUFFER_USAGE_RAY_TRACING_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
|
|
m_debug.setObjectName(m_instBuffer.buffer, "TLASInstances");
|
|
//VkBufferDeviceAddressInfo bufferInfo{VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO};
|
|
bufferInfo.buffer = m_instBuffer.buffer;
|
|
VkDeviceAddress instanceAddress = vkGetBufferDeviceAddress(m_device, &bufferInfo);
|
|
|
|
// Make sure the copy of the instance buffer are copied before triggering the
|
|
// acceleration structure build
|
|
VkMemoryBarrier barrier{VK_STRUCTURE_TYPE_MEMORY_BARRIER};
|
|
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
|
|
barrier.dstAccessMask = VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_KHR;
|
|
vkCmdPipelineBarrier(cmdBuf, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR,
|
|
0, 1, &barrier, 0, nullptr, 0, nullptr);
|
|
````
|
|
|
|
The build is then triggered, and we execute the command buffer before destroying the temporary buffers.
|
|
|
|
```` C
|
|
// Build the TLAS
|
|
VkAccelerationStructureGeometryDataKHR geometry{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_INSTANCES_DATA_KHR};
|
|
geometry.instances.arrayOfPointers = VK_FALSE;
|
|
geometry.instances.data.deviceAddress = instanceAddress;
|
|
VkAccelerationStructureGeometryKHR topASGeometry{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_KHR};
|
|
topASGeometry.geometryType = VK_GEOMETRY_TYPE_INSTANCES_KHR;
|
|
topASGeometry.geometry = geometry;
|
|
|
|
|
|
const VkAccelerationStructureGeometryKHR* pGeometry = &topASGeometry;
|
|
VkAccelerationStructureBuildGeometryInfoKHR topASInfo{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_GEOMETRY_INFO_KHR};
|
|
topASInfo.flags = flags;
|
|
topASInfo.update = VK_FALSE;
|
|
topASInfo.srcAccelerationStructure = VK_NULL_HANDLE;
|
|
topASInfo.dstAccelerationStructure = m_tlas.as.accel;
|
|
topASInfo.geometryArrayOfPointers = VK_FALSE;
|
|
topASInfo.geometryCount = 1;
|
|
topASInfo.ppGeometries = &pGeometry;
|
|
topASInfo.scratchData.deviceAddress = scratchAddress;
|
|
|
|
// Build Offsets info: n instances
|
|
VkAccelerationStructureBuildOffsetInfoKHR buildOffsetInfo{static_cast<uint32_t>(instances.size()), 0, 0, 0};
|
|
const VkAccelerationStructureBuildOffsetInfoKHR* pBuildOffsetInfo = &buildOffsetInfo;
|
|
|
|
// Build the TLAS
|
|
vkCmdBuildAccelerationStructureKHR(cmdBuf, 1, &topASInfo, &pBuildOffsetInfo);
|
|
|
|
|
|
genCmdBuf.submitAndWait(cmdBuf);
|
|
m_alloc.finalizeAndReleaseStaging();
|
|
m_alloc.destroy(scratchBuffer);
|
|
}
|
|
````
|
|
|
|
## main
|
|
|
|
In the `main` function, we can now add the creation of the geometry instances and acceleration structures
|
|
right after initializing ray tracing:
|
|
|
|
```` C
|
|
// #VKRay
|
|
helloVk.initRayTracing();
|
|
helloVk.createBottomLevelAS();
|
|
helloVk.createTopLevelAS();
|
|
````
|
|
|
|
# Ray Tracing Descriptor Set
|
|
|
|
The ray tracing shaders, like the rasterization shaders, use external resources referenced by a descriptor set. A key
|
|
difference, however, is that in a scene requiring several types of shaders, the rasterization would allow each set of
|
|
shaders to have their own descriptor set(s). For example, objects with different materials may each have a descriptor
|
|
set containing the handles of the textures it needs. This is easily done since for a given material, we would create its
|
|
corresponding rasterization pipeline and use that pipeline to render all the objects with that material. On the
|
|
contrary, with ray tracing it is not possible to know in advance which objects will be hit by a ray, so any shader may
|
|
be invoked at any time. The Vulkan ray tracing extension then uses a single set of descriptor sets containing all the
|
|
resources necessary to render the scene: for example, it would contain all the textures for all the materials.
|
|
|
|
To maintain compatibility between rasterization and ray tracing, the ray tracing pipeline will use the same descriptor
|
|
set containing the scene information, and will add another descriptor set referencing the TLAS and the buffer in which
|
|
we store the output image.
|
|
|
|
In the header, we declare the objects related to this additional descriptor set:
|
|
|
|
```` C
|
|
void createRtDescriptorSet();
|
|
|
|
nvvk::DescriptorSetBindings m_rtDescSetLayoutBind;
|
|
vk::DescriptorPool m_rtDescPool;
|
|
vk::DescriptorSetLayout m_rtDescSetLayout;
|
|
vk::DescriptorSet m_rtDescSet;
|
|
````
|
|
|
|
The acceleration structure will be accessible by the Ray Generation shader, as we want to call `TraceRayEXT()` from this
|
|
shader. Later in this document, we will also make it accessible from the Closest Hit shader, in order to send rays from
|
|
there as well. The output image is the offscreen buffer used by the rasterization, and will be written only by the
|
|
RayGen shader.
|
|
|
|
```` C
|
|
//--------------------------------------------------------------------------------------------------
|
|
// This descriptor set holds the Acceleration structure and the output image
|
|
//
|
|
void HelloVulkan::createRtDescriptorSet()
|
|
{
|
|
using vkDT = vk::DescriptorType;
|
|
using vkSS = vk::ShaderStageFlagBits;
|
|
using vkDSLB = vk::DescriptorSetLayoutBinding;
|
|
|
|
m_rtDescSetLayoutBind.addBinding(vkDSLB(0, vkDT::eAccelerationStructureKHR, 1,
|
|
vkSS::eRaygenKHR | vkSS::eClosestHitKHR)); // TLAS
|
|
m_rtDescSetLayoutBind.addBinding(
|
|
vkDSLB(1, vkDT::eStorageImage, 1, vkSS::eRaygenKHR)); // Output image
|
|
|
|
m_rtDescPool = m_rtDescSetLayoutBind.createPool(m_device);
|
|
m_rtDescSetLayout = m_rtDescSetLayoutBind.createLayout(m_device);
|
|
m_rtDescSet = m_device.allocateDescriptorSets({m_rtDescPool, 1, &m_rtDescSetLayout})[0];
|
|
|
|
vk::AccelerationStructureKHR tlas = m_rtBuilder.getAccelerationStructure();
|
|
vk::WriteDescriptorSetAccelerationStructureKHR descASInfo;
|
|
descASInfo.setAccelerationStructureCount(1);
|
|
descASInfo.setPAccelerationStructures(&tlas);
|
|
vk::DescriptorImageInfo imageInfo{
|
|
{}, m_offscreenColor.descriptor.imageView, vk::ImageLayout::eGeneral};
|
|
|
|
std::vector<vk::WriteDescriptorSet> writes;
|
|
writes.emplace_back(m_rtDescSetLayoutBind.makeWrite(m_rtDescSet, 0, &descASInfo));
|
|
writes.emplace_back(m_rtDescSetLayoutBind.makeWrite(m_rtDescSet, 1, &imageInfo));
|
|
m_device.updateDescriptorSets(static_cast<uint32_t>(writes.size()), writes.data(), 0, nullptr);
|
|
}
|
|
````
|
|
|
|
## Additions to the Scene Descriptor Set
|
|
|
|
As the ray tracing shaders also have to access the scene description, we need to extend the access flags of the
|
|
corresponding buffers in the original `createDescriptorSetLayout()`. The RayGen should access the camera matrices to
|
|
compute ray directions, and the ClosestHit needs access to the materials, scene instances, textures, vertex buffers, and
|
|
index buffers. Even though the vertex and index buffers will only be used by the ray tracing shaders we add them to this
|
|
descriptor set as they semantically fit the Scene descriptor set.
|
|
|
|
```` C
|
|
// Camera matrices (binding = 0)
|
|
m_descSetLayoutBind.addBinding(
|
|
vkDS(0, vkDT::eUniformBuffer, 1, vkSS::eVertex | vkSS::eRaygenKHR));
|
|
// Materials (binding = 1)
|
|
m_descSetLayoutBind.addBinding(
|
|
vkDS(1, vkDT::eStorageBuffer, nbObj, vkSS::eVertex | vkSS::eFragment | vkSS::eClosestHitKHR));
|
|
// Scene description (binding = 2)
|
|
m_descSetLayoutBind.addBinding( //
|
|
vkDS(2, vkDT::eStorageBuffer, 1, vkSS::eVertex | vkSS::eFragment | vkSS::eClosestHitKHR));
|
|
// Textures (binding = 3)
|
|
m_descSetLayoutBind.addBinding(
|
|
vkDS(3, vkDT::eCombinedImageSampler, nbTxt, vkSS::eFragment | vkSS::eClosestHitKHR));
|
|
// Materials (binding = 4)
|
|
m_descSetLayoutBind.addBinding(
|
|
vkDS(4, vkDT::eStorageBuffer, nbObj, vkSS::eFragment | vkSS::eClosestHitKHR));
|
|
// Storing vertices (binding = 5)
|
|
m_descSetLayoutBind.addBinding( //
|
|
vkDS(5, vkDT::eStorageBuffer, nbObj, vkSS::eClosestHitKHR));
|
|
// Storing indices (binding = 6)
|
|
m_descSetLayoutBind.addBinding( //
|
|
vkDS(6, vkDT::eStorageBuffer, nbObj, vkSS::eClosestHitKHR));
|
|
````
|
|
|
|
We set the actual contents of the descriptor set by adding those buffers in `updateDescriptorSet()`:
|
|
|
|
```` C
|
|
// All material buffers, 1 buffer per OBJ
|
|
std::vector<vk::DescriptorBufferInfo> dbiMat;
|
|
std::vector<vk::DescriptorBufferInfo> dbiMatIdx;
|
|
std::vector<vk::DescriptorBufferInfo> dbiVert;
|
|
std::vector<vk::DescriptorBufferInfo> dbiIdx;
|
|
for(size_t i = 0; i < m_objModel.size(); ++i)
|
|
{
|
|
dbiMat.push_back({m_objModel[i].matColorBuffer.buffer, 0, VK_WHOLE_SIZE});
|
|
dbiMatIdx.push_back({m_objModel[i].matIndexBuffer.buffer, 0, VK_WHOLE_SIZE});
|
|
dbiVert.push_back({m_objModel[i].vertexBuffer.buffer, 0, VK_WHOLE_SIZE});
|
|
dbiIdx.push_back({m_objModel[i].indexBuffer.buffer, 0, VK_WHOLE_SIZE});
|
|
}
|
|
writes.emplace_back(m_descSetLayoutBind.makeWriteArray(m_descSet, 1, dbiMat.data()));
|
|
writes.emplace_back(m_descSetLayoutBind.makeWriteArray(m_descSet, 4, dbiMatIdx.data()));
|
|
writes.emplace_back(m_descSetLayoutBind.makeWriteArray(m_descSet, 5, dbiVert.data()));
|
|
writes.emplace_back(m_descSetLayoutBind.makeWriteArray(m_descSet, 6, dbiIdx.data()));
|
|
````
|
|
|
|
Originally the buffers containing the vertices and indices were only used by the rasterization pipeline.
|
|
The ray tracing will need to use those buffers as storage buffers (`VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT`),
|
|
the address to those buffers are needed to fill the `VkAccelerationStructureGeometryTrianglesDataKHR` structure,
|
|
and because they are use for constructing the acceleration structure, they also need
|
|
the `VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR` flag.
|
|
|
|
We update the usage of the buffers in `loadModel`:
|
|
|
|
```` C
|
|
model.vertexBuffer =
|
|
m_alloc.createBuffer(cmdBuf, loader.m_vertices,
|
|
vkBU::eVertexBuffer | vkBU::eStorageBuffer | vkBU::eShaderDeviceAddress
|
|
| vkBU::eAccelerationStructureBuildInputReadOnlyKHR);
|
|
model.indexBuffer =
|
|
m_alloc.createBuffer(cmdBuf, loader.m_indices,
|
|
vkBU::eIndexBuffer | vkBU::eStorageBuffer | vkBU::eShaderDeviceAddress
|
|
| vkBU::eAccelerationStructureBuildInputReadOnlyKHR);
|
|
````
|
|
|
|
!!! Note: Array of Buffers
|
|
Each model (OBJ) was constructed with a buffer of vertices, indices, and materials. Therefore the
|
|
scene has vectors of those buffers. In the shaders, we access the right buffer using the
|
|
the ObjectID used by the Instance. This is convenient, as we have access to all the data
|
|
of the scene while ray tracing.
|
|
|
|
## Descriptor Update
|
|
|
|
As with the rasterization descriptor set, the ray tracing descriptor set needs to be updated if its contents change.
|
|
This typically happens when resizing the window, as the output image is recreated and needs to be re-linked to the
|
|
descriptor set. The update is performed in a new method of the `HelloVulkan` class:
|
|
|
|
```` C
|
|
void updateRtDescriptorSet();
|
|
````
|
|
|
|
The implementation is straightforward, simply updating the output image reference:
|
|
|
|
```` C
|
|
//--------------------------------------------------------------------------------------------------
|
|
// Writes the output image to the descriptor set
|
|
// - Required when changing resolution
|
|
//
|
|
void HelloVulkan::updateRtDescriptorSet()
|
|
{
|
|
using vkDT = vk::DescriptorType;
|
|
|
|
// (1) Output buffer
|
|
vk::DescriptorImageInfo imageInfo{
|
|
{}, m_offscreenColor.descriptor.imageView, vk::ImageLayout::eGeneral};
|
|
vk::WriteDescriptorSet wds{m_rtDescSet, 1, 0, 1, vkDT::eStorageImage, &imageInfo};
|
|
m_device.updateDescriptorSets(wds, nullptr);
|
|
}
|
|
````
|
|
|
|
We can then add the update call to the `onResize()` method to link it to the resizing event:
|
|
|
|
```` C
|
|
updateRtDescriptorSet();
|
|
````
|
|
|
|
The resources created in this section need to be destroyed when closing the application by adding the following to
|
|
`destroyResources`:
|
|
|
|
```` C
|
|
m_device.destroy(m_rtDescPool);
|
|
m_device.destroy(m_rtDescSetLayout);
|
|
````
|
|
|
|
## main
|
|
|
|
In the `main` function, we create the descriptor set after the other ray tracing calls:
|
|
|
|
```` C
|
|
helloVk.createRtDescriptorSet();
|
|
````
|
|
|
|
# Ray Tracing Pipeline
|
|
|
|
When creating rasterization shaders with Vulkan, the application compiles them into executable shaders, which are bound
|
|
to the rasterization pipeline. All objects rendered using this pipeline will use those shaders. To render an image with
|
|
several types of shaders, the rasterization pipeline needs to be set to use each before calling the draw commands.
|
|
|
|
In a ray tracing context, a ray traced through the scene can hit any object and thus trigger the execution of any
|
|
shader. Instead of using one shader executable at a time, we now need to have all shaders available at once. The
|
|
pipeline then contains all the shaders required to render the scene, and information on how to execute it. To be able to
|
|
ray trace some geometry, the Vulkan ray tracing extension typically uses at least these 3 shader programs:
|
|
|
|
* The **ray generation** shader will be the starting point for ray tracing, and will be called for each pixel. It will
|
|
typically initialize a ray starting at the location of the camera, in a direction given by evaluating the camera lens
|
|
model at the pixel location. It will then invoke `traceRayEXT()`, that will shoot the ray in the scene. Other shaders below
|
|
will process further events, and return their result to the ray generation shader through the ray payload.
|
|
|
|
* The **miss** shader is executed when a ray does not intersect any geometry. For instance, it might sample an
|
|
environment map, or return a simple color through the ray payload.
|
|
|
|
* The **closest hit** shader is called upon hitting the geometric instance closest to the starting point of the ray.
|
|
This shader can for example perform lighting calculations and return the results through the ray payload. There can be
|
|
as many closest hit shaders as needed, much like how a rasterization-based application has multiple pixel shaders
|
|
depending on its objects.
|
|
|
|
Two more shader types can optionally be used:
|
|
|
|
* The **intersection** shader, which allows intersecting user-defined geometry. For example, this can be used to
|
|
intersect geometry placeholders for on-demand geometry loading, or intersecting procedural geometry without tessellating
|
|
them beforehand. Using this shader requires modifying how the acceleration structures are built, and is beyond the scope
|
|
of this tutorial. We will instead rely on the built-in triangle intersection shader provided by the extension, which
|
|
returns 2 floating-point values representing the barycentric coordinates `(u,v)` of the hit point inside the triangle.
|
|
For a triangle made of vertices `v0`, `v1`, `v2`, the barycentric coordinates define the weights of the vertices as
|
|
follows:
|
|
|
|
***********************
|
|
* . u *
|
|
* / \ *
|
|
* / v1\ *
|
|
* / \ *
|
|
* / \ *
|
|
* 1-u-v / v0 v2 \ v *
|
|
* '-----------' *
|
|
***********************
|
|
|
|
|
|
* The **any hit** shader is executed on each potential intersection: when searching for the hit point closest to the ray
|
|
origin, several candidates may be found on the way. The any hit shader can frequently be used to efficiently implement
|
|
alpha-testing. If the alpha test fails, the ray traversal can continue without having to call `traceRayEXT()` again. The
|
|
built-in any hit shader is simply a pass-through returning the intersection to the traversal engine, which will
|
|
determine which ray intersection is the closest.
|
|
|
|
![Figure [step]: The Ray Tracing Pipeline](Images/ShaderPipeline.svg)
|
|
|
|
We will start with a pipeline containing only the 3 main shader programs: a single ray generation shader, a single miss
|
|
shader, and a single hit group made only of a closest hit shader. This is done by first compiling each GLSL shader
|
|
program into SPIR-V. These SPIR-V shaders will be linked together into a ray tracing pipeline, which will be able to
|
|
route the intersection calculations to the right hit shaders.
|
|
|
|
To be able to focus on the pipeline generation, we provide simple shaders:
|
|
|
|
## Adding Shaders
|
|
|
|
!!! Warning: [Download Ray Tracing Shaders](files/shaders.zip)
|
|
Download the shaders and extract the content into `src/shaders`. Then rerun CMake, which will add those files to the project.
|
|
|
|
The `shaders` folder now contains 3 more files:
|
|
|
|
* `raytrace.rgen` contains the ray generation program. It also declares its access to the ray tracing output buffer
|
|
`image`, and the ray tracing acceleration structure `topLevelAS`, bound as an `accelerationStructureKHR`. For now this
|
|
shader program simply writes a constant color into the output buffer.
|
|
|
|
* `raytrace.rmiss` defines the miss shader. This shader will be executed when no geometry is hit, and will write a
|
|
constant color into the ray payload `rayPayloadInEXT`, which is provided automatically. Since our current ray generation
|
|
program does not trace any rays for now, this shader will not be called.
|
|
|
|
* `raytrace.rchit` contains a very simple closest hit shader. It will be executed upon hitting the geometry (our
|
|
triangles). As the miss shader, it takes the ray payload `rayPayloadInEXT`. It also has a second input defining the
|
|
intersection attributes `hitAttributeEXT` as provided by the intersection shader, i.e. the barycentric coordinates. This
|
|
shader simply writes a constant color to the payload.
|
|
|
|
In the header file, let's add the definition of the ray tracing pipeline building method, and the storage members of the
|
|
pipeline:
|
|
|
|
```` C
|
|
void createRtPipeline();
|
|
std::vector<vk::RayTracingShaderGroupCreateInfoKHR> m_rtShaderGroups;
|
|
vk::PipelineLayout m_rtPipelineLayout;
|
|
vk::Pipeline m_rtPipeline;
|
|
````
|
|
|
|
The pipeline will also use push constants to store global uniform values, namely the background color and
|
|
the light source information:
|
|
|
|
```` C
|
|
struct RtPushConstant
|
|
{
|
|
nvmath::vec4f clearColor;
|
|
nvmath::vec3f lightPosition;
|
|
float lightIntensity;
|
|
int lightType;
|
|
} m_rtPushConstants;
|
|
````
|
|
|
|
Our implementation of the ray tracing pipeline generation starts by adding the ray generation and miss shader stages,
|
|
followed by the closest hit shader. Note that this order is arbitrary, as the extension allows the developer to set up
|
|
the pipeline in any order.
|
|
|
|
All stages are stored in an array of `vk::PipelineShaderStageCreateInfo` objects. Indices within this vector will be
|
|
used as unique identifiers for the shaders in the Shader Binding Table. These identifiers are stored in the
|
|
`RayTracingShaderGroupCreateInfoKHR` structure. This structure first specifies a `type`, which represents the kind of
|
|
shader group represented in the structure. Ray generation, miss shaders are called 'general' shaders. In this case the
|
|
type is `eGeneral`, and only the `generalShader` member of the structure is filled. The other ones are set to
|
|
`VK_SHADER_UNUSED_KHR`. This is also the case for the callable shaders, not used in this tutorial. In our layout the ray
|
|
generation comes first (0), followed by the miss shader (1).
|
|
|
|
```` C
|
|
//--------------------------------------------------------------------------------------------------
|
|
// Pipeline for the ray tracer: all shaders, raygen, chit, miss
|
|
//
|
|
void HelloVulkan::createRtPipeline()
|
|
{
|
|
std::vector<std::string> paths = defaultSearchPaths;
|
|
|
|
vk::ShaderModule raygenSM =
|
|
nvvk::createShaderModule(m_device, //
|
|
nvh::loadFile("shaders/raytrace.rgen.spv", true, paths, true));
|
|
vk::ShaderModule missSM =
|
|
nvvk::createShaderModule(m_device, //
|
|
nvh::loadFile("shaders/raytrace.rmiss.spv", true, paths, true));
|
|
|
|
std::vector<vk::PipelineShaderStageCreateInfo> stages;
|
|
|
|
// Raygen
|
|
vk::RayTracingShaderGroupCreateInfoKHR rg{vk::RayTracingShaderGroupTypeKHR::eGeneral,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR};
|
|
stages.push_back({{}, vk::ShaderStageFlagBits::eRaygenKHR, raygenSM, "main"});
|
|
rg.setGeneralShader(static_cast<uint32_t>(stages.size() - 1));
|
|
m_rtShaderGroups.push_back(rg);
|
|
// Miss
|
|
vk::RayTracingShaderGroupCreateInfoKHR mg{vk::RayTracingShaderGroupTypeKHR::eGeneral,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR};
|
|
stages.push_back({{}, vk::ShaderStageFlagBits::eMissKHR, missSM, "main"});
|
|
mg.setGeneralShader(static_cast<uint32_t>(stages.size() - 1));
|
|
m_rtShaderGroups.push_back(mg);
|
|
|
|
````
|
|
|
|
As detailed before, intersections are managed by 3 kinds of shaders: the intersection shader computes the ray-geometry
|
|
intersections, the any-hit shader is run for every potential intersection, and the closest hit shader is applied to the
|
|
closest hit point along the ray. Those 3 shaders are bound into a hit group. In our case the geometry is made of
|
|
triangles, so the `type` of the `RayTracingShaderGroupCreateInfoKHR` is `eTrianglesHitGroup`. The intersection shader is
|
|
then built-in, and we set the `intersectionShader` member to `VK_SHADER_UNUSED_KHR`. We do not use a any-hit shader,
|
|
letting the system use a built-in pass-through shader. Therefore, we also leave the `anyHitShader` to
|
|
`VK_SHADER_UNUSED_KHR`. The only shader we define is then the closest hit shader, by setting the `closestHitShader`
|
|
member to the index `2` (`stages.size()-1`), since the `stages` vector already contains the ray generation and miss
|
|
shaders.
|
|
|
|
```` C
|
|
// Hit Group - Closest Hit + AnyHit
|
|
vk::ShaderModule chitSM =
|
|
nvvk::createShaderModule(m_device, //
|
|
nvh::loadFile("shaders/raytrace.rchit.spv", true, paths, true));
|
|
|
|
vk::RayTracingShaderGroupCreateInfoKHR hg{vk::RayTracingShaderGroupTypeKHR::eTrianglesHitGroup,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR};
|
|
stages.push_back({{}, vk::ShaderStageFlagBits::eClosestHitKHR, chitSM, "main"});
|
|
hg.setClosestHitShader(static_cast<uint32_t>(stages.size() - 1));
|
|
m_rtShaderGroups.push_back(hg);
|
|
````
|
|
|
|
Note that if the geometry were not triangles, we would have set the `type` to `eProceduralHitGroup`, and would have to
|
|
define an intersection shader.
|
|
|
|
After creating the shader groups, we need to set up the pipeline layout that will describe how the pipeline
will access external data:

```` C
  vk::PipelineLayoutCreateInfo pipelineLayoutCreateInfo;
````

We first add the push constant range to allow the ray tracing shaders to access the global uniform values:

```` C
  // Push constant: we want to be able to update constants used by the shaders
  vk::PushConstantRange pushConstant{vk::ShaderStageFlagBits::eRaygenKHR
                                         | vk::ShaderStageFlagBits::eClosestHitKHR
                                         | vk::ShaderStageFlagBits::eMissKHR,
                                     0, sizeof(RtPushConstant)};
  pipelineLayoutCreateInfo.setPushConstantRangeCount(1);
  pipelineLayoutCreateInfo.setPPushConstantRanges(&pushConstant);
````

As described earlier, the pipeline uses two descriptor sets: `set=0` is specific to the ray tracing pipeline (TLAS and
output image), and `set=1` is shared with the rasterization (scene data):

```` C
  // Descriptor sets: one specific to ray tracing, and one shared with the rasterization pipeline
  std::vector<vk::DescriptorSetLayout> rtDescSetLayouts = {m_rtDescSetLayout, m_descSetLayout};
  pipelineLayoutCreateInfo.setSetLayoutCount(static_cast<uint32_t>(rtDescSetLayouts.size()));
  pipelineLayoutCreateInfo.setPSetLayouts(rtDescSetLayouts.data());
````

The pipeline layout information is now complete, allowing us to create the layout itself.

```` C
  m_rtPipelineLayout = m_device.createPipelineLayout(pipelineLayoutCreateInfo);
````

The creation of the ray tracing pipeline is different from the classical graphics pipeline. In the graphics pipeline we
simply need to fill in the fixed set of programmable stages (vertex, fragment, etc.). The ray tracing pipeline can
contain an arbitrary number of stages depending on the number of active shaders in the scene.

We first provide all the stages that will be used:

```` C
  // Assemble the shader stages and recursion depth info into the ray tracing pipeline
  vk::RayTracingPipelineCreateInfoKHR rayPipelineInfo;
  rayPipelineInfo.setStageCount(static_cast<uint32_t>(stages.size()));  // Stages are shaders
  rayPipelineInfo.setPStages(stages.data());
````

Then, we indicate how the shaders can be assembled into groups. A ray generation or miss shader is a group by
itself, but hit groups can comprise up to 3 shaders (intersection, any hit, closest hit).

```` C
  rayPipelineInfo.setGroupCount(
      static_cast<uint32_t>(m_rtShaderGroups.size()));  // 1-raygen, n-miss, n-(hit[+anyhit+intersect])
  rayPipelineInfo.setPGroups(m_rtShaderGroups.data());
````

The ray generation and closest hit shaders can trace rays, making the ray tracing a potentially recursive process. To
allow the underlying RTX layer to optimize the pipeline, we indicate the maximum recursion depth used by our shaders. For
the simplistic shaders we currently have, we set this depth to 1, meaning that even if the shaders would trigger
recursion (i.e. a hit shader calling `traceRayEXT()`), this recursion would be prevented by setting the result of this trace
call as a miss. Note that it is preferable to keep the recursion level as low as possible, replacing it by a loop
formulation instead.

```` C
  rayPipelineInfo.setMaxPipelineRayRecursionDepth(1);  // Ray depth
  rayPipelineInfo.setLayout(m_rtPipelineLayout);
  m_rtPipeline = static_cast<const vk::Pipeline&>(
      m_device.createRayTracingPipelineKHR({}, {}, rayPipelineInfo));
````

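As an aside, the loop formulation mentioned above means accumulating bounces iteratively in the ray generation shader
instead of recursing from the hit shaders. A purely hypothetical GLSL sketch (not part of this tutorial's shaders, and
assuming the payload carries whatever is needed to continue a path, e.g. the next origin and direction):

```` C
  // Hypothetical iterative formulation in a ray generation shader: every bounce is traced
  // from here, so a pipeline recursion depth of 1 is sufficient.
  vec3 color      = vec3(0);
  vec3 throughput = vec3(1);
  for(int bounce = 0; bounce < MAX_BOUNCES; ++bounce)
  {
    traceRayEXT(topLevelAS, rayFlags, 0xFF, 0, 0, 0, origin.xyz, tMin, direction.xyz, tMax, 0);
    color += throughput * prd.hitValue;
    // The payload would also have to return the next origin/direction and a termination flag
    // for the loop to continue; omitted here for brevity.
  }
````
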
Once the pipeline has been created we discard the shader modules:

```` C
  m_device.destroy(raygenSM);
  m_device.destroy(missSM);
  m_device.destroy(chitSM);
}
````

The pipeline layout and the pipeline itself also have to be cleaned up upon closing, hence we add this to
`destroyResources`:

```` C
  m_device.destroy(m_rtPipeline);
  m_device.destroy(m_rtPipelineLayout);
````

## main

In the `main` function, we call the pipeline construction after the other ray tracing calls:

```` C
  helloVk.createRtPipeline();
````

# Shader Binding Table

In a typical rasterization setup, a current shader and its associated resources are bound prior to drawing the
corresponding objects, then another shader and resource set can be bound for some other objects, and so on. Since ray
tracing can hit any surface of the scene at any time, all shaders must be available simultaneously.

The Shader Binding Table is the blueprint of the ray tracing process. It indicates which ray generation shader to start
with, which miss shader to execute if no intersections are found, and which hit shader groups can be executed for each
instance. This association between instances and shader groups is created when setting up the geometry: for each
instance we provided a `hitGroupId` in the TLAS. This value represents the index in the SBT corresponding to the hit
group for that instance.

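As a reminder, that value was assigned when filling the TLAS instances earlier in this tutorial; a short sketch of that
assignment (the exact instance structure and field names may differ slightly in your code):

```` C
  // Sketch: every instance points at hit group 0 in the SBT, since we only have one hit group
  nvvk::RaytracingBuilderKHR::Instance rayInst;
  rayInst.hitGroupId = 0;  // Index of the hit group in the SBT, shared by all objects here
````
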
## Handles

The SBT is an array containing the handles to the shader groups used in the ray tracing pipeline. In our example, we
will create a buffer for the three groups: raygen, miss and closest hit. The size of a handle is given by the
`shaderGroupHandleSize` member of the ray tracing properties, but the offset of each entry needs to be aligned on
`shaderGroupBaseAlignment`. We will then allocate a buffer of size `3 * shaderGroupBaseAlignment` and will consecutively
write the handle of each shader group. To retrieve all the handles, we will call `vkGetRayTracingShaderGroupHandlesKHR`.

The buffer will have the following information, which will later be used when calling `vkCmdTraceRaysKHR`:

******************
*+--------------+*
*|    RayGen    |*
*|    Handle    |*
*+--------------+*
*|     Miss     |*
*|    Handle    |*
*+--------------+*
*|   HitGroup   |*
*|    Handle    |*
*+--------------+*
******************

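For a concrete feel of the sizes involved, here is a small sketch of the alignment arithmetic. The numeric values are
only an example (many NVIDIA GPUs report a handle size of 32 and a base alignment of 64, but your driver may report
different values):

```` C
  // Example only: round the handle size up to the base alignment to get the stride of one SBT entry
  uint32_t handleSize    = 32;  // e.g. m_rtProperties.shaderGroupHandleSize
  uint32_t baseAlignment = 64;  // e.g. m_rtProperties.shaderGroupBaseAlignment
  uint32_t entryStride   = nvh::align_up(handleSize, baseAlignment);  // -> 64
  uint32_t sbtSize       = 3 * entryStride;                           // -> 192 bytes for 3 groups
````
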
We first add the declarations of the SBT creation method and the SBT buffer itself in the `HelloVulkan` class:

```` C
  void createRtShaderBindingTable();
  nvvk::Buffer m_rtSBTBuffer;
````

In this function, we start by computing the size of the binding table from the number of groups and the handle size so
that we can allocate the SBT buffer.

```` C
//--------------------------------------------------------------------------------------------------
// The Shader Binding Table (SBT)
// - getting all shader handles and writing them in a SBT buffer
// - Besides exceptions, this can always be done exactly like this
// See how the SBT buffer is used in run()
//
void HelloVulkan::createRtShaderBindingTable()
{
  auto groupCount =
      static_cast<uint32_t>(m_rtShaderGroups.size());               // 3 shaders: raygen, miss, chit
  uint32_t groupHandleSize = m_rtProperties.shaderGroupHandleSize;  // Size of a program identifier
  uint32_t groupSizeAligned =
      nvh::align_up(groupHandleSize, m_rtProperties.shaderGroupBaseAlignment);

  // Fetch all the shader handles used in the pipeline, so that they can be written in the SBT
  uint32_t sbtSize = groupCount * groupSizeAligned;
````

We then fetch the handles to the shader groups of the pipeline, and let the allocator
allocate the device memory and copy the handles into the SBT. Note that the SBT buffer needs the
`VK_BUFFER_USAGE_SHADER_BINDING_TABLE_BIT_KHR` usage flag, and since we will later query the address
of the SBT buffer, it also needs the `VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT` flag.

```` C
  std::vector<uint8_t> shaderHandleStorage(sbtSize);
  m_device.getRayTracingShaderGroupHandlesKHR(m_rtPipeline, 0, groupCount, sbtSize,
                                              shaderHandleStorage.data());

  // Allocate the SBT buffer
  m_rtSBTBuffer = m_alloc.createBuffer(
      sbtSize,
      vk::BufferUsageFlagBits::eTransferSrc | vk::BufferUsageFlagBits::eShaderDeviceAddress
          | vk::BufferUsageFlagBits::eShaderBindingTableKHR,
      vk::MemoryPropertyFlagBits::eHostVisible | vk::MemoryPropertyFlagBits::eHostCoherent);
  m_debug.setObjectName(m_rtSBTBuffer.buffer, std::string("SBT").c_str());

  // Write the handles in the SBT, one aligned entry per group
  void* mapped = m_alloc.map(m_rtSBTBuffer);
  auto* pData  = reinterpret_cast<uint8_t*>(mapped);
  for(uint32_t g = 0; g < groupCount; g++)
  {
    memcpy(pData, shaderHandleStorage.data() + g * groupHandleSize, groupHandleSize);
    pData += groupSizeAligned;
  }
  m_alloc.unmap(m_rtSBTBuffer);

  m_alloc.finalizeAndReleaseStaging();
}
````

As with other resources, we destroy the SBT in `destroyResources`:

```` C
  m_alloc.destroy(m_rtSBTBuffer);
````

## main

In the `main` function, we now add the construction of the Shader Binding Table:

```` C
  helloVk.createRtShaderBindingTable();
````

# Ray Tracing

Let's create a function that will call the execution of the ray tracer. First, add the declaration to the header:

```` C
  void raytrace(const vk::CommandBuffer& cmdBuf, const nvmath::vec4f& clearColor);
````

We first bind the pipeline and its layout, and set the push constants that will be available throughout the pipeline:

```` C
//--------------------------------------------------------------------------------------------------
// Ray Tracing the scene
//
void HelloVulkan::raytrace(const vk::CommandBuffer& cmdBuf, const nvmath::vec4f& clearColor)
{
  m_debug.beginLabel(cmdBuf, "Ray trace");
  // Initializing push constant values
  m_rtPushConstants.clearColor     = clearColor;
  m_rtPushConstants.lightPosition  = m_pushConstant.lightPosition;
  m_rtPushConstants.lightIntensity = m_pushConstant.lightIntensity;
  m_rtPushConstants.lightType      = m_pushConstant.lightType;

  cmdBuf.bindPipeline(vk::PipelineBindPoint::eRayTracingKHR, m_rtPipeline);
  cmdBuf.bindDescriptorSets(vk::PipelineBindPoint::eRayTracingKHR, m_rtPipelineLayout, 0,
                            {m_rtDescSet, m_descSet}, {});
  cmdBuf.pushConstants<RtPushConstant>(m_rtPipelineLayout,
                                       vk::ShaderStageFlagBits::eRaygenKHR
                                           | vk::ShaderStageFlagBits::eClosestHitKHR
                                           | vk::ShaderStageFlagBits::eMissKHR,
                                       0, m_rtPushConstants);
````

Since the structure of the Shader Binding Table is up to the developer, we need to indicate to the ray tracing pipeline
how to interpret it. In particular, we compute the offsets in the SBT where the ray generation shader, miss shaders and
hit groups can be found. Miss shaders and hit groups are stored contiguously, hence we also compute the stride separating
each shader. In our case the stride is simply the aligned size of a shader group handle, but more advanced uses may embed
shader-group-specific data within the SBT, resulting in a larger stride.

```` C
  // Size of a program identifier
  uint32_t groupSize =
      nvh::align_up(m_rtProperties.shaderGroupHandleSize, m_rtProperties.shaderGroupBaseAlignment);
  uint32_t          groupStride = groupSize;
  vk::DeviceAddress sbtAddress  = m_device.getBufferAddress({m_rtSBTBuffer.buffer});

  using Stride = vk::StridedDeviceAddressRegionKHR;
  std::array<Stride, 4> strideAddresses{
      Stride{sbtAddress + 0u * groupSize, groupStride, groupSize * 1},  // raygen
      Stride{sbtAddress + 1u * groupSize, groupStride, groupSize * 1},  // miss
      Stride{sbtAddress + 2u * groupSize, groupStride, groupSize * 1},  // hit
      Stride{0u, 0u, 0u}};                                              // callable
````

We can finally call `traceRaysKHR`, which will add the ray tracing launch to the command buffer. Note that the SBT buffer
is mentioned several times. This is due to the possibility of separating the SBT into several buffers, one for each
type: ray generation, miss shaders, hit groups, and callable shaders (outside the scope of this tutorial). The last
three parameters are equivalent to the grid size of a compute launch, and represent the total number of threads. Since
we want to trace one ray per pixel, the grid size has the width and height of the output image, and a depth of 1.

```` C
  cmdBuf.traceRaysKHR(&strideAddresses[0], &strideAddresses[1], &strideAddresses[2],
                      &strideAddresses[3],              //
                      m_size.width, m_size.height, 1);  //

  m_debug.endLabel(cmdBuf);
}
````

# Let's Ray Trace

Now we have everything set up to be able to trace rays: the acceleration structure, the descriptor sets, the ray tracing
pipeline and the shader binding table. Let's try to make images from this.

## main

In the `main` function, we will define a local variable to switch between rasterization and ray tracing. Add the
following right after the ray tracing initialization calls:

```` C
  bool useRaytracer = true;
````

In the same function, we will add a UI checkbox to make that switch at runtime. Right after the line
`ImGui::ColorEdit3(`, we add

```` C
  ImGui::Checkbox("Ray Tracer mode", &useRaytracer);  // Switch between raster and ray tracing
````

A few lines below, you can find a block containing the `helloVk.rasterize` call. Since our application will now have two
render modes, we replace that block by

```` C
  // Rendering Scene
  if(useRaytracer)
  {
    helloVk.raytrace(cmdBuff, clearColor);
  }
  else
  {
    cmdBuff.beginRenderPass(offscreenRenderPassBeginInfo, vk::SubpassContents::eInline);
    helloVk.rasterize(cmdBuff);
    cmdBuff.endRenderPass();
  }
````

Note that the ray tracing behaves more like a compute shader than a graphics task, and is therefore recorded outside of a render pass.

We should now be able to alternate between rasterization and ray tracing. However, the ray tracing result only renders a
flat gray image: the simplistic ray generation shader does not trace any ray yet, and simply returns a fixed color.

Raster | | Ray Trace
:-----------------------------:|:---:|:--------------------------------:
 | <-> | 

# Camera Setup

In the context of rasterization, the vertices of the objects are projected from their world-space position into a
$[0,1]\times[0,1]\times[0,1]$ cube, before being rasterized on the XY plane. For ray tracing, we need to initialize some
rays at the camera position, and intersect the geometry in world space. To achieve this, we need to store the inverse
view and projection matrices in the `CameraMatrices` at the beginning of the `hello_vulkan.cpp` file:

```` C
struct CameraMatrices
{
  nvmath::mat4f view;
  nvmath::mat4f proj;
  nvmath::mat4f viewInverse;
  // #VKRay
  nvmath::mat4f projInverse;
};
````

Since the camera matrices will be used by the ray generation shader (see the next subsection), the corresponding
descriptor set binding must also include that stage in its stage flags. This was done in the section Additions to the
Scene Descriptor Set.

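As a reminder, the camera uniform buffer binding then looks roughly like this. This is only a sketch: `vkDSLB`, `vkDT`
and `vkSS` are the shorthands used elsewhere in this tutorial, and the container name `m_descSetLayoutBind` may differ
in your code:

```` C
  // Sketch: the camera matrices (binding 0) are visible to the vertex and ray generation stages
  m_descSetLayoutBind.emplace_back(
      vkDSLB(0, vkDT::eUniformBuffer, 1, vkSS::eVertex | vkSS::eRaygenKHR));
````
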
## updateUniformBuffer

The computation of the matrix inverses is done in `updateUniformBuffer`, after setting the `ubo.proj` matrix:

```` C
  // #VKRay
  ubo.projInverse = nvmath::invert(ubo.proj);
````

## Ray generation (raytrace.rgen)

It is now time to enrich the ray generation shader to allow it to trace rays. We will first add a new binding to allow
the shader to access the camera matrices.

```` C
layout(binding = 0, set = 1) uniform CameraProperties
{
  mat4 view;
  mat4 proj;
  mat4 viewInverse;
  mat4 projInverse;
}
cam;
````

!!! Note: Binding
    The camera buffer uses `binding = 0`, as described in `createDescriptorSetLayout()`. The
    `set = 1` comes from the fact that it is the second descriptor set bound in `raytrace()`.

When tracing a ray, the hit or miss shaders need to be able to return some information to the shader program that
invoked the ray tracing. This is done through the use of a payload, identified by the `rayPayloadEXT` qualifier.

Since the payload struct will be reused in several shaders, we create a new shader file `raycommon.glsl` and add it to
the Visual Studio folder.

This file contains only the payload definition:

~~~~ C++
struct hitPayload
{
  vec3 hitValue;
};
~~~~

We now modify `raytrace.rgen` to include this new file. Note that the `#include` directive is a GLSL extension, which
we also enable:

~~~~ C++
#extension GL_GOOGLE_include_directive : enable
#include "raycommon.glsl"
~~~~

The payload, identified with `rayPayloadEXT`, is then our `hitPayload` structure.

```` C
layout(location = 0) rayPayloadEXT hitPayload prd;
````

### Note

> In incoming shaders, like miss and closest hit, the payload will be `rayPayloadInEXT`.

The `main` function of the shader then starts by computing the floating-point pixel coordinates, normalized between 0
and 1. The `gl_LaunchIDEXT` contains the integer coordinates of the pixel being rendered, while `gl_LaunchSizeEXT`
corresponds to the image size provided when calling `vkCmdTraceRaysKHR`.

```` C
void main()
{
  const vec2 pixelCenter = vec2(gl_LaunchIDEXT.xy) + vec2(0.5);
  const vec2 inUV        = pixelCenter / vec2(gl_LaunchSizeEXT.xy);
  vec2       d           = inUV * 2.0 - 1.0;
````

From the pixel coordinates, we can apply the inverse transformation of the view and projection matrices of the camera to
obtain the origin and direction of the ray.

```` C
  vec4 origin    = cam.viewInverse * vec4(0, 0, 0, 1);
  vec4 target    = cam.projInverse * vec4(d.x, d.y, 1, 1);
  vec4 direction = cam.viewInverse * vec4(normalize(target.xyz), 0);
````

In addition, we provide some flags for the ray: first, a flag indicating that all geometry will be considered opaque, as
we also indicated when creating the acceleration structures. We also indicate the minimum and maximum distance of the
potential intersections along the ray. Those distances can be useful to reduce the ray tracing costs if intersections
before or after a given point do not matter. A typical use case is for computing ambient occlusion.

```` C
  uint  rayFlags = gl_RayFlagsOpaqueEXT;
  float tMin     = 0.001;
  float tMax     = 10000.0;
````

We now trace the ray itself, by first providing `traceRayEXT` with the top-level acceleration structure and the ray flags.
The `cullMask` value is a mask that will be binary AND-ed with the mask of the geometry instances. Since all instances
have a `0xFF` mask as well, they will all be visible. The next 3 parameters indicate which hit group will be called
when hitting a surface. For example, a single object may be associated with 2 hit groups representing the behavior when
hit by a direct camera ray, or by a shadow ray. Since each instance has an index indicating the offset of the hit
groups for the instance in the shader binding table, the `sbtRecordOffset` allows fetching the right kind of shader
for that instance. In the case of the primary rays we may want to use the first hit group and use an offset of 0, while
for shadow rays the second hit group would be required, hence an offset of 1. The stride indicates the number of hit
groups for a single instance. This is particularly useful if the instance offset is not set when creating the instances
in the acceleration structure. A stride of 0 indicates that all hit groups are packed together, and the instance offset
can be used directly to find them in the SBT. The index of the miss shader comes next, followed by the ray origin,
direction and extents. The last parameter identifies the payload that will be carried by the ray, by giving its location
index. The last `0` corresponds to the location of our payload, `layout(location = 0) rayPayloadEXT hitPayload prd;`.

```` C
  traceRayEXT(topLevelAS,     // acceleration structure
              rayFlags,       // rayFlags
              0xFF,           // cullMask
              0,              // sbtRecordOffset
              0,              // sbtRecordStride
              0,              // missIndex
              origin.xyz,     // ray origin
              tMin,           // ray min range
              direction.xyz,  // ray direction
              tMax,           // ray max range
              0               // payload (location = 0)
  );
````

Finally, we write the resulting payload into the output image.

```` C
  imageStore(image, ivec2(gl_LaunchIDEXT.xy), vec4(prd.hitValue, 1.0));
}
````

Raster | | Ray Trace
:-----------------------------:|:---:|:--------------------------------:
 | <-> | 

## Miss shader (raytrace.rmiss)

To share the clear color of the rasterization with the ray tracer, we will change the return value of the miss shader to
return the clear value passed as a push constant. While the `Constants` struct contains more members, here we use the
fact that `clearColor` is the first member in the struct, and do not even declare the subsequent members.

```` C
#extension GL_GOOGLE_include_directive : enable
#include "raycommon.glsl"

layout(location = 0) rayPayloadInEXT hitPayload prd;

layout(push_constant) uniform Constants
{
  vec4 clearColor;
};

void main()
{
  prd.hitValue = clearColor.xyz * 0.8;
}
````

!!! Note:
    The color of the background is slightly darker to differentiate the two renderers.

# Simple Lighting

The current closest hit shader only returns a flat color. To add some lighting, we will need to introduce the concept of
surface normals. However, the ray tracing only provides the barycentric coordinates of the hit point. To obtain the
normals and the other vertex attributes, we will need to find them in the vertex buffer and interpolate them using the
barycentric coordinates. This is why we extended the usage of the vertex and index buffers when creating the ray tracing
descriptor set.

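Concretely, if the triangle vertices carry attributes $a_0$, $a_1$ and $a_2$ and the reported hit attributes are $(u, v)$,
the interpolated value is $a = (1 - u - v)\,a_0 + u\,a_1 + v\,a_2$. This is exactly the weighting applied below to the
normals, positions and texture coordinates.
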
## Closest Hit (raytrace.rchit)

When we created the ray tracing descriptor set, we already included the geometry definition. Therefore, we can reference
the vertex and index buffers directly in the closest hit shader, via the scene description buffer at `binding = 2`.

We first include the payload definition and the OBJ-Wavefront structures:

```` C
#extension GL_EXT_scalar_block_layout : enable
#extension GL_GOOGLE_include_directive : enable
#include "raycommon.glsl"
#include "wavefront.glsl"
````

Then we describe the resources according to the descriptor set layout:

```` C
layout(location = 0) rayPayloadInEXT hitPayload prd;

layout(binding = 2, set = 1, scalar) buffer ScnDesc { sceneDesc i[]; } scnDesc;
layout(binding = 5, set = 1, scalar) buffer Vertices { Vertex v[]; } vertices[];
layout(binding = 6, set = 1) buffer Indices { uint i[]; } indices[];
````

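The barycentric coordinates of the hit are delivered to the closest hit shader through a hit attribute. The `main`
function below reads them from a declaration along these lines (the variable name `attribs` matches its later use):

```` C
// Barycentric coordinates (u, v) of the hit point inside the triangle
hitAttributeEXT vec3 attribs;
````
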
In the Hit shader we need all the members of the push constant block:

```` C
layout(push_constant) uniform Constants
{
  vec4  clearColor;
  vec3  lightPosition;
  float lightIntensity;
  int   lightType;
}
pushC;
````

In the `main` function, the `gl_PrimitiveID` allows us to find the vertices of the triangle hit by the ray:

```` C
void main()
{
  // Object of this instance
  uint objId = scnDesc.i[gl_InstanceID].objId;

  // Indices of the triangle
  ivec3 ind = ivec3(indices[nonuniformEXT(objId)].i[3 * gl_PrimitiveID + 0],   //
                    indices[nonuniformEXT(objId)].i[3 * gl_PrimitiveID + 1],   //
                    indices[nonuniformEXT(objId)].i[3 * gl_PrimitiveID + 2]);  //
  // Vertex of the triangle
  Vertex v0 = vertices[nonuniformEXT(objId)].v[ind.x];
  Vertex v1 = vertices[nonuniformEXT(objId)].v[ind.y];
  Vertex v2 = vertices[nonuniformEXT(objId)].v[ind.z];
````

Using the hit point's barycentric coordinates, we can interpolate the normal:

```` C
  const vec3 barycentrics = vec3(1.0 - attribs.x - attribs.y, attribs.x, attribs.y);

  // Computing the normal at hit position
  vec3 normal = v0.nrm * barycentrics.x + v1.nrm * barycentrics.y + v2.nrm * barycentrics.z;
  // Transforming the normal to world space
  normal = normalize(vec3(scnDesc.i[gl_InstanceID].transfoIT * vec4(normal, 0.0)));
````

The world-space position can be calculated in two ways. The first one uses the information available in the hit
shader, but this could have precision issues if the hit point is very far away.

```` C
  vec3 worldPos = gl_WorldRayOriginEXT + gl_WorldRayDirectionEXT * gl_HitTEXT;
````

Another, more precise, solution consists of computing the position by interpolation, as was done for the normal:

```` C
  // Computing the coordinates of the hit position
  vec3 worldPos = v0.pos * barycentrics.x + v1.pos * barycentrics.y + v2.pos * barycentrics.z;
  // Transforming the position to world space
  worldPos = vec3(scnDesc.i[gl_InstanceID].transfo * vec4(worldPos, 1.0));
````

The light source specified in the constants can then be used to compute the dot product of the normal with the lighting
direction, giving a simple diffuse lighting effect:

```` C
  // Vector toward the light
  vec3  L;
  float lightIntensity = pushC.lightIntensity;
  float lightDistance  = 100000.0;
  // Point light
  if(pushC.lightType == 0)
  {
    vec3 lDir      = pushC.lightPosition - worldPos;
    lightDistance  = length(lDir);
    lightIntensity = pushC.lightIntensity / (lightDistance * lightDistance);
    L              = normalize(lDir);
  }
  else  // Directional light
  {
    L = normalize(pushC.lightPosition - vec3(0));
  }

  float dotNL = max(dot(normal, L), 0.2);

  prd.hitValue = vec3(dotNL);
}
````



# Simple Materials

The rendering above could be made more interesting by adding support for materials. The imported OBJ objects provide
simplified Alias Wavefront material definitions.

## raytrace.rchit

These materials define their basic reflectance properties using simple color coefficients, and also support texturing.
The buffer containing the materials has already been created for rasterization, and has also been added into the ray
tracing descriptor set. Add the binding of the material buffer and the array of texture samplers:

```` C
layout(binding = 1, set = 1, scalar) buffer MatColorBufferObject { WaveFrontMaterial m[]; } materials[];
layout(binding = 3, set = 1) uniform sampler2D textureSamplers[];
layout(binding = 4, set = 1) buffer MatIndexColorBuffer { int i[]; } matIndex[];
````

The declaration of the material is the same as that used for the rasterizer and is defined in
`wavefront.glsl`.

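For reference, the `WaveFrontMaterial` structure in `wavefront.glsl` holds the usual OBJ/MTL coefficients; it contains
roughly the following fields (member names are taken from the sample code and may differ slightly in your version):

```` C
struct WaveFrontMaterial
{
  vec3  ambient;        // Ka
  vec3  diffuse;        // Kd
  vec3  specular;       // Ks
  vec3  transmittance;  // Kt
  vec3  emission;       // Ke
  float shininess;      // Ns
  float ior;            // Index of refraction (Ni)
  float dissolve;       // Opacity (d), 1 == opaque
  int   illum;          // Illumination model
  int   textureId;      // Diffuse texture index, -1 if none
};
````
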
Each triangle also has an entry in the material index buffer (`matIndex`), which we will use to find the corresponding
material in the material buffer.

We first remove these lines at the end of `main()`

```` C
  float dotNL = max(dot(normal, L), 0.2);
  prd.hitValue = vec3(dotNL);
````

and fetch the material definition instead:

```` C
  // Material of the object
  int               matIdx = matIndex[nonuniformEXT(objId)].i[gl_PrimitiveID];
  WaveFrontMaterial mat    = materials[nonuniformEXT(objId)].m[matIdx];
````

!!! Note Note
    There is one buffer of materials per object, and each material can be accessed via its index.
    Each triangle stores the index of the material it uses.

From that material definition, we use the diffuse and specular reflectances to compute the lighting. This code also
supports textures to modulate the surface albedo.

```` C
  // Diffuse
  vec3 diffuse = computeDiffuse(mat, L, normal);
  if(mat.textureId >= 0)
  {
    uint txtId = mat.textureId + scnDesc.i[gl_InstanceID].txtOffset;
    vec2 texCoord =
        v0.texCoord * barycentrics.x + v1.texCoord * barycentrics.y + v2.texCoord * barycentrics.z;
    diffuse *= texture(textureSamplers[nonuniformEXT(txtId)], texCoord).xyz;
  }

  // Specular
  vec3 specular = computeSpecular(mat, gl_WorldRayDirectionEXT, L, normal);
````

The final lighting is then computed as

```` C
  prd.hitValue = vec3(lightIntensity * (diffuse + specular));
````



## main

The OBJ model is loaded in `main.cpp` by calling `helloVk.loadModel`. Let's load something more interesting than a cube:

```` C
  // Creation of the example
  helloVk.loadModel(nvh::findFile("media/scenes/Medieval_building.obj", defaultSearchPaths, true));
  helloVk.loadModel(nvh::findFile("media/scenes/plane.obj", defaultSearchPaths, true));
````

Since that model is larger, we can change the `CameraManip.setLookat` call to

```` C
  CameraManip.setLookat(nvmath::vec3f(4, 4, 4), nvmath::vec3f(0, 1, 0), nvmath::vec3f(0, 1, 0));
````



# Shadows

The above allows us to ray trace a scene and apply some lighting, but it is still missing shadows. To this end, we will
add a new ray type, and shoot rays from the closest hit shader. This new ray type will require adding a new miss shader.

## `createRtPipeline`

For simple shadow rays we only need to know whether some geometry was hit along the ray or not. This can be achieved
using a Boolean payload initialized as if a hit were found, and tracing the ray with only an additional miss shader that
will set the payload to "no hit".

!!! Warning: [Download Shadow Shader](files/shadowShaders.zip)
    Download and add shader file

This archive contains only one file, `raytraceShadow.rmiss`. Add this file to the `src/shaders` directory and rerun
CMake. The shader file should compile, and the resulting SPIR-V file should be stored in the `shaders` folder alongside
the GLSL file.

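For reference, the shadow miss shader is tiny; its content is essentially the following sketch (use the downloaded file
as the authoritative version):

```` C
#version 460
#extension GL_EXT_ray_tracing : require

layout(location = 1) rayPayloadInEXT bool isShadowed;

void main()
{
  isShadowed = false;
}
````
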
In the body of `createRtPipeline`, we need to define the new miss shader right after the previous miss shader:

```` C
  // The second miss shader is invoked when a shadow ray misses the geometry. It
  // simply indicates that no occlusion has been found
  vk::ShaderModule shadowmissSM =
      nvvk::createShaderModule(m_device,
                               nvh::loadFile("shaders/raytraceShadow.rmiss.spv", true, paths, true));
````

After pushing the miss shader `missSM`, we also push the miss shader for the shadow rays:

```` C
  // Shadow Miss
  stages.push_back({{}, vk::ShaderStageFlagBits::eMissKHR, shadowmissSM, "main"});
  mg.setGeneralShader(static_cast<uint32_t>(stages.size() - 1));
  m_rtShaderGroups.push_back(mg);
````

The pipeline now has to allow shooting rays from the closest hit program, which requires increasing the recursion level to 2:

```` C
  // The ray tracing process can shoot rays from the camera, and a shadow ray can be shot from the
  // hit points of the camera rays, hence a recursion level of 2. This number should be kept as low
  // as possible for performance reasons. Even recursive ray tracing should be flattened into a loop
  // in the ray generation to avoid deep recursion.
  rayPipelineInfo.setMaxPipelineRayRecursionDepth(2);  // Ray depth
````

At the end of the method, we destroy the shader module for the shadow miss shader:

```` C
  m_device.destroy(shadowmissSM);
````

## `traceRaysKHR`

The addition of the new miss shader group has modified our shader binding table, which now looks like:

******************
*+--------------+*
*|    RayGen    |*
*|    Handle    |*
*+--------------+*
*|     Miss     |*
*|    Handle    |*
*+--------------+*
*|  ShadowMiss  |*
*|    Handle    |*
*+--------------+*
*|   HitGroup   |*
*|    Handle    |*
*+--------------+*
******************

Therefore, we have to change `HelloVulkan::raytrace` to adjust the closest hit offset before calling `traceRaysKHR`: the
hit group region now starts after the ray generation shader and the two miss shaders. This also points out that in
real-world applications, the SBT creation should be abstracted by a helper so that such offsets are handled
automatically.

```` C
      // The hit group now sits after the raygen and the 2 miss shaders
      Stride{sbtAddress + 3u * groupSize, groupStride, groupSize * 1},  // hit
````

## `createRtDescriptorSet`

For each resource entry in the descriptor set, we indicated which shader stage would be able to use it. Since shadow
rays will be traced from the closest hit shader, we add `vkSS::eClosestHitKHR` to the acceleration structure binding:

```` C
  // Top-level acceleration structure, usable by both the ray generation and the closest hit (to
  // shoot shadow rays)
  m_rtDescSetLayoutBind.emplace_back(
      vkDSLB(0, vkDT::eAccelerationStructureKHR, 1, vkSS::eRaygenKHR | vkSS::eClosestHitKHR));  // TLAS
````

## `raytrace.rchit`

The closest hit shader now needs to be aware of the acceleration structure to be able to shoot rays:

```` C
layout(binding = 0, set = 0) uniform accelerationStructureEXT topLevelAS;
````

Those rays will also carry a payload, which will need to be defined at a different location from the payload of the
current ray. In this case, the payload will be a simple Boolean value indicating whether an occluder has been found or
not:

```` C
layout(location = 1) rayPayloadEXT bool isShadowed;
````

In the `main` function, instead of simply writing the shaded color into the payload, we will first initiate a new ray.
Note that the index of the miss shader is now 1, since the SBT contains 2 miss shaders. The payload location is defined
to match the declaration `layout(location = 1)` above. When invoking `traceRayEXT()` we set the following ray flags:

* `gl_RayFlagsSkipClosestHitShaderEXT`: will not invoke the hit shader, only the miss shader
* `gl_RayFlagsOpaqueEXT`: will not call the any hit shader, so all objects will be opaque
* `gl_RayFlagsTerminateOnFirstHitEXT`: the first hit is enough, since we only need to know whether the ray is blocked

Since we skip the shadow hit group, no code will be invoked when hitting a surface. Therefore, we initialize the payload
`isShadowed` to `true`, and will rely on the miss shader to set it to `false` if no surfaces have been encountered. We
also set the ray flags to optimize the ray tracing: since these simple shadow rays only need to return whether the ray
intersects any surface, we can instruct the ray tracing engine to stop the traversal after finding the first
intersection, without trying to execute a closest hit shader.

Shadow rays only need to be cast if the light is in front of the surface, and specular lighting should not be computed
if we are in shadow (since the light source won't be visible from the shading point). The code that previously computed
the specular term will then look like this:

```` C
  vec3  specular    = vec3(0);
  float attenuation = 1;

  // Tracing shadow ray only if the light is visible from the surface
  if(dot(normal, L) > 0)
  {
    float tMin   = 0.001;
    float tMax   = lightDistance;
    vec3  origin = gl_WorldRayOriginEXT + gl_WorldRayDirectionEXT * gl_HitTEXT;
    vec3  rayDir = L;
    uint  flags =
        gl_RayFlagsTerminateOnFirstHitEXT | gl_RayFlagsOpaqueEXT | gl_RayFlagsSkipClosestHitShaderEXT;
    isShadowed = true;
    traceRayEXT(topLevelAS,  // acceleration structure
                flags,       // rayFlags
                0xFF,        // cullMask
                0,           // sbtRecordOffset
                0,           // sbtRecordStride
                1,           // missIndex
                origin,      // ray origin
                tMin,        // ray min range
                rayDir,      // ray direction
                tMax,        // ray max range
                1            // payload (location = 1)
    );

    if(isShadowed)
    {
      attenuation = 0.3;
    }
    else
    {
      // Specular
      specular = computeSpecular(mat, gl_WorldRayDirectionEXT, L, normal);
    }
  }
````

The final payload value can then be adjusted depending on the result of the shadow ray:

```` C
  prd.hitValue = vec3(lightIntensity * attenuation * (diffuse + specular));
````


|
|
|
|
The final project can be found under the [ray_tracing__simple](https://github.com/nvpro-samples/vk_raytracing_tutorial_KHR/tree/master/ray_tracing__simple) directory.
|
|
|
|
|
|
# Going Further
|
|
|
|
From this point on, you can continue creating your own ray types and shaders, and experiment
|
|
with more advanced ray tracing based algorithms.
|
|
</script>



----

<!-- Markdeep: -->
<script src="https://developer.nvidia.com/sites/default/files/akamai/gameworks/whitepapers/markdeep.min.js?" charset="utf-8"></script>
<script>
window.alreadyProcessedMarkdeep || (document.body.style.visibility = "visible")
</script>