2052 lines
89 KiB
HTML
2052 lines
89 KiB
HTML
<meta charset="utf-8">
|
|
**NVIDIA Vulkan Ray Tracing Tutorial**
|
|
<small>
|
|
By [Martin-Karl Lefrançois](https://devblogs.nvidia.com/author/mlefrancois/),
|
|
[Pascal Gautron](https://devblogs.nvidia.com/author/pgautron/), Neil Bickford
|
|
</small>
|
|
|
|
|
|
The focus of this document and the provided code is to showcase a basic integration of
|
|
ray tracing within an existing Vulkan sample, using the
|
|
[`VK_KHR_ray_tracing`](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VK_KHR_ray_tracing)
|
|
extension. This tutorial starts from a basic Vulkan application and provides step-by-step instructions to modify and add
|
|
methods and functions. The sections are organized by components, with subsections identifying the modified functions.
|
|
|
|

|
|
|
|
!!! Note GitHub repository
|
|
https://github.com/nvpro-samples/vk_raytracing_tutorial_KHR
|
|
|
|
# Introduction
|
|
<script type="preformatted">
|
|
This tutorial highlights the steps to add ray tracing to an existing Vulkan application, and assumes a working knowledge
|
|
of Vulkan in general. The code verbosity of classical components such as swapchain management, render passes etc. is
|
|
reduced using [C++ API helpers](https://github.com/nvpro-samples/shared_sources/tree/master/nvvk) and
|
|
NVIDIA's [nvpro-samples](https://github.com/nvpro-samples/build_all) framework. This framework contains many advanced
|
|
examples and best practices for Vulkan and OpenGL. We also use a helper for the creation of the ray tracing acceleration
|
|
structures, but we will document its contents extensively in this tutorial. The code is further simplified by using the
|
|
[Vulkan C++ API](https://github.com/KhronosGroup/Vulkan-Hpp), whose type safety and constructors reduce both its
|
|
verbosity and its potential for errors.
|
|
|
|
!!! Note Note
|
|
For educational purposes all the code is contained in a very small set of files.
|
|
A real integration would require additional levels of abstraction.
|
|
|
|
[//]: # This may be the most platform independent comment
|
|
|
|
# Environment Setup
|
|
|
|
**The preferred way** to download the project (including NVVK) is to use the
|
|
nvpro-samples `build_all` script.
|
|
|
|
In a command line, clone the `nvpro-samples/build_all` repository from
|
|
https://github.com/nvpro-samples/build_all:
|
|
|
|
~~~~~
|
|
git clone https://github.com/nvpro-samples/build_all.git
|
|
~~~~~
|
|
|
|
Then open the `build_all` folder and run either `clone_all.bat` (Windows) or
|
|
`clone_all.sh` (Linux).
|
|
|
|
**If you want to clone as few repositories as possible**, open a command line,
|
|
and run the following commands to clone the repositories you need:
|
|
~~~~~
|
|
git clone https://github.com/nvpro-samples/shared_sources.git
|
|
git clone https://github.com/nvpro-samples/shared_external.git
|
|
git clone https://github.com/nvpro-samples/vk_raytracing_tutorial_KHR.git
|
|
~~~~~
|
|
|
|
## Generating the Solution
|
|
|
|
One typical way to store the build system is to create a `build` directory below the
|
|
main project. You can use CMake-GUI or do the following steps.
|
|
|
|
~~~~~
|
|
cd vk_raytracing_tutorial_KHR
|
|
mkdir build
|
|
cd build
|
|
cmake ..
|
|
~~~~~
|
|
|
|
## Beta Installation
|
|
|
|
The SDK 1.2.161 and up which can be found under https://vulkan.lunarg.com/sdk/home will work with this project.
|
|
|
|
Nevertheless, if you are in the Beta period, it is suggested to install and compile all of the following and replace
|
|
with the current environment.
|
|
|
|
* Latest driver: https://developer.nvidia.com/vulkan-driver
|
|
* Vulkan headers: https://github.com/KhronosGroup/Vulkan-Headers
|
|
* Validator: https://github.com/KhronosGroup/Vulkan-ValidationLayers
|
|
* Vulkan-Hpp: https://github.com/KhronosGroup/Vulkan-Hpp
|
|
|
|
!!! Tip Visual Assist
|
|
To get auto-completion, edit vulkan.hpp and change two places from:<br>
|
|
`namespace VULKAN_HPP_NAMESPACE` to `namespace vk`
|
|
|
|
# Compiling & Running
|
|
|
|
Open the solution located in the build directory, then compile and run `vk_ray_tracing__before_KHR`.
|
|
|
|
This will be the starting point of the tutorial. This project is a simple framework allowing us to load OBJ files and rasterize them
|
|
using Vulkan.
|
|
|
|

|
|
|
|
|
|
The following steps in the tutorial will be modifying this project
|
|
`vk_ray_tracing__before_KHR` and will add support for ray tracing. The
|
|
end result of the tutorial is the project `vk_ray_tracing__simple_KHR`.
|
|
It is possible to look in that project if something went wrong.
|
|
|
|
The project `vk_ray_tracing__simple_KHR` will be the starting point for the
|
|
extra tutorials.
|
|
|
|
|
|
# Ray Tracing Setup
|
|
|
|
Go to the `main` function of the `main.cpp` file, and find where we request Vulkan extensions with
|
|
`nvvk::ContextCreateInfo`.
|
|
To be able to use ray tracing, we will need VK_KHR_ACCELERATION_STRUCTURE and VK_KHR_RAY_TRACING_PIPELINE.
|
|
Those extensions have also dependencies on other extension, therefore all the following
|
|
extensions will need to be added.
|
|
|
|
```` C
|
|
// #VKRay: Activate the ray tracing extension
|
|
vk::PhysicalDeviceAccelerationStructureFeaturesKHR accelFeature;
|
|
contextInfo.addDeviceExtension(VK_KHR_ACCELERATION_STRUCTURE_EXTENSION_NAME, false,
|
|
&accelFeature);
|
|
vk::PhysicalDeviceRayTracingPipelineFeaturesKHR rtPipelineFeature;
|
|
contextInfo.addDeviceExtension(VK_KHR_RAY_TRACING_PIPELINE_EXTENSION_NAME, false,
|
|
&rtPipelineFeature);
|
|
contextInfo.addDeviceExtension(VK_KHR_MAINTENANCE3_EXTENSION_NAME);
|
|
contextInfo.addDeviceExtension(VK_KHR_PIPELINE_LIBRARY_EXTENSION_NAME);
|
|
contextInfo.addDeviceExtension(VK_KHR_DEFERRED_HOST_OPERATIONS_EXTENSION_NAME);
|
|
contextInfo.addDeviceExtension(VK_KHR_BUFFER_DEVICE_ADDRESS_EXTENSION_NAME);
|
|
|
|
````
|
|
|
|
Before creating the device, a linked structure of features must past. Not all extensions
|
|
requires a set of features, but ray tracing features must be enabled before the creation of the device.
|
|
By providing `accelFeature`, and `rtPipelineFeature`, the context creation will query the capable features
|
|
for ray tracing and will use the filled structure to create the device.
|
|
|
|
In the `HelloVulkan` class in `hello_vulkan.h`, add an initialization function and a member storing the capabilities of
|
|
the GPU for ray tracing:
|
|
|
|
```` C
|
|
// #VKRay
|
|
void initRayTracing();
|
|
vk::PhysicalDeviceRayTracingPipelinePropertiesKHR m_rtProperties;
|
|
````
|
|
|
|
At the end of `hello_vulkan.cpp`, add the body of `initRayTracing()`, which will query the ray tracing capabilities
|
|
of the GPU using this extension. In particular, it will obtain the maximum recursion depth,
|
|
ie. the number of nested ray tracing calls that can be performed from a single ray. This can be seen as the number
|
|
of times a ray can bounce in the scene in a recursive path tracer. Note that for performance purposes, recursion
|
|
should in practice be kept to a minimum, favoring a loop formulation. The shader header size will be useful when
|
|
creating the shader binding table in a later section.
|
|
|
|
|
|
```` C
|
|
//--------------------------------------------------------------------------------------------------
|
|
// Initialize Vulkan ray tracing
|
|
// #VKRay
|
|
void HelloVulkan::initRayTracing()
|
|
{
|
|
// Requesting ray tracing properties
|
|
auto properties =
|
|
m_physicalDevice.getProperties2<vk::PhysicalDeviceProperties2,
|
|
vk::PhysicalDeviceRayTracingPipelinePropertiesKHR>();
|
|
m_rtProperties = properties.get<vk::PhysicalDeviceRayTracingPipelinePropertiesKHR>();
|
|
}
|
|
````
|
|
|
|
## main
|
|
|
|
In `main.cpp`, in the `main()` function, we call the initialization method right after
|
|
`helloVk.updateDescriptorSet();`
|
|
|
|
```` C
|
|
// #VKRay
|
|
helloVk.initRayTracing();
|
|
````
|
|
|
|
!!! Note: Exercise
|
|
When running the program, you can put a breakpoint in the `initRayTracing()` method to inspect
|
|
the resulting values. On a Quadro RTX 6000, the maximum recursion depth is 31, and the shader
|
|
group handle size is 16.
|
|
|
|
# Acceleration Structure
|
|
|
|
To be efficient, ray tracing requires organizing the geometry into an acceleration structure (AS)
|
|
that will reduce the number of ray-triangle intersection tests during rendering.
|
|
This structure is divided into a two-level tree. Intuitively, this can directly map to the notion
|
|
of a simplified scene graph, in which the internal nodes of the graph have been collapsed into a single
|
|
transform matrix for each instance. The geometry of an instance is stored in a bottom-level acceleration structure
|
|
(BLAS) object, which holds the actual vertex data. It is also possible to further simplify the scene graph by combining
|
|
multiple objects within a single bottom-level AS: for that, a single BLAS can be built from multiple vertex buffers, each with
|
|
its own transform matrix. Note that if an object is instantiated several times within a same BLAS, its geometry
|
|
will be duplicated. This can be particularly useful for improving performance on static, non-instantiated
|
|
scene components (as a rule of thumb, the fewer BLAS, the better).
|
|
|
|
The top-level AS (TLAS) will contain the object instances, each
|
|
with its own transformation matrix and reference to a corresponding BLAS.
|
|
We will start with a single bottom-level AS and a top-level AS instancing it once with an identity transform.
|
|
|
|
|
|
![Figure [step]: Acceleration Structure](Images/AccelerationStructure.svg)
|
|
|
|
This sample loads an OBJ file and stores its indices, vertices and material data into an `ObjModel` structure. This
|
|
model is referenced by an `ObjInstance` structure which also contains the transformation matrix of that particular
|
|
instance. For ray tracing the `ObjModel` and `ObjInstance` will then naturally fit the BLAS and TLAS, respectively.
|
|
|
|
To simplify the ray tracing setup we use a helper class containing utility functions for
|
|
acceleration structure builds. In the header file, include the`raytrace_vkpp` helper
|
|
|
|
```` C
|
|
// #VKRay
|
|
#include "nvvk/raytrace_vk.hpp"
|
|
````
|
|
|
|
so that we can add that helper as a member in the `HelloVulkan` class,
|
|
|
|
```` C
|
|
nvvk::RaytracingBuilder m_rtBuilder;
|
|
````
|
|
|
|
and initialize it at the end of `initRaytracing()`:
|
|
|
|
```` C
|
|
m_rtBuilder.setup(m_device, m_alloc, m_graphicsQueueIndex);
|
|
````
|
|
|
|
## Bottom-Level Acceleration Structure
|
|
|
|
The first step of building a BLAS object consists in converting the geometry data of an `ObjModel` into a
|
|
multiple structures than can be used by the AS builder. We are holding all those structure under
|
|
`nvvk::RaytracingBuilderKHR::Blas`
|
|
|
|
Add a new method to the `HelloVulkan`
|
|
class:
|
|
|
|
```` C
|
|
nvvk::RaytracingBuilderKHR::Blas objectToVkGeometryKHR(const ObjModel& model);
|
|
````
|
|
|
|
Its implementation will fill three structures
|
|
|
|
* vk::AccelerationStructureGeometryTrianglesDataKHR: defines the data from which the AS will be constructed.
|
|
* vk::AccelerationStructureGeometryKHR: the geometry type for building the AS, in this case, from triangles.
|
|
* vk::AccelerationStructureBuildRangeInfoKHR: the offset, which correspond to the actual wanted geometry when building.
|
|
|
|
Multiple of the above structure can be combined to create a single blas. In this example,
|
|
the array will always be a length of one.
|
|
|
|
Note that we consider all objects opaque for now, and indicate this to the builder for
|
|
potential optimization.
|
|
|
|
```` C
|
|
//--------------------------------------------------------------------------------------------------
|
|
// Converting a OBJ primitive to the ray tracing geometry used for the BLAS
|
|
//
|
|
nvvk::RaytracingBuilderKHR::Blas HelloVulkan::objectToVkGeometryKHR(const ObjModel& model)
|
|
{
|
|
// Building part
|
|
vk::DeviceAddress vertexAddress = m_device.getBufferAddress({model.vertexBuffer.buffer});
|
|
vk::DeviceAddress indexAddress = m_device.getBufferAddress({model.indexBuffer.buffer});
|
|
|
|
uint32_t maxPrimitiveCount = model.nbIndices / 3;
|
|
|
|
vk::AccelerationStructureGeometryTrianglesDataKHR triangles;
|
|
triangles.setVertexFormat(vk::Format::eR32G32B32Sfloat);
|
|
triangles.setVertexData(vertexAddress);
|
|
triangles.setVertexStride(sizeof(VertexObj));
|
|
triangles.setIndexType(vk::IndexType::eUint32);
|
|
triangles.setIndexData(indexAddress);
|
|
triangles.setTransformData({});
|
|
triangles.setMaxVertex(model.nbVertices);
|
|
|
|
// Setting up the build info of the acceleration
|
|
vk::AccelerationStructureGeometryKHR asGeom;
|
|
asGeom.setGeometryType(vk::GeometryTypeKHR::eTriangles);
|
|
asGeom.setFlags(vk::GeometryFlagBitsKHR::eOpaque);
|
|
asGeom.geometry.setTriangles(triangles);
|
|
|
|
// The primitive itself
|
|
vk::AccelerationStructureBuildRangeInfoKHR offset;
|
|
offset.setFirstVertex(0);
|
|
offset.setPrimitiveCount(maxPrimitiveCount);
|
|
offset.setPrimitiveOffset(0);
|
|
offset.setTransformOffset(0);
|
|
|
|
// Our blas is only one geometry, but could be made of many geometries
|
|
nvvk::RaytracingBuilderKHR::Blas blas;
|
|
blas.asGeometry.emplace_back(asGeom);
|
|
blas.asBuildOffsetInfo.emplace_back(offset);
|
|
|
|
return blas;
|
|
}
|
|
````
|
|
|
|
In the `HelloVulkan` class declaration, we can now add the `createBottomLevelAS()` method that will generate a
|
|
`nvvk::RaytracingBuilderKHR::Blas` for each object, and trigger a BLAS build:
|
|
|
|
```` C
|
|
void createBottomLevelAS();
|
|
````
|
|
|
|
The implementation loops over all the loaded models and fills in an array of `nvvk::RaytracingBuilderKHR::Blas` before
|
|
triggering a build of all BLAS's in a batch. The resulting acceleration structures will be stored
|
|
within the helper in the order of construction, so that they can be directly referenced by index later.
|
|
|
|
```` C
|
|
void HelloVulkan::createBottomLevelAS()
|
|
{
|
|
// BLAS - Storing each primitive in a geometry
|
|
std::vector<nvvk::RaytracingBuilderKHR::Blas> allBlas;
|
|
allBlas.reserve(m_objModel.size());
|
|
for(const auto& obj : m_objModel)
|
|
{
|
|
auto blas = objectToVkGeometryKHR(obj);
|
|
|
|
// We could add more geometry in each BLAS, but we add only one for now
|
|
allBlas.emplace_back(blas);
|
|
}
|
|
m_rtBuilder.buildBlas(allBlas, vk::BuildAccelerationStructureFlagBitsKHR::ePreferFastTrace);
|
|
}
|
|
````
|
|
|
|
|
|
### Helper Details: RaytracingBuilder::buildBlas()
|
|
|
|
This helper function is already present in `raytraceKHR_vkpp.hpp`: it can be reused in many projects, and is
|
|
part of the set of helpers provided by the [nvpro-samples](https://github.com/nvpro-samples). The function
|
|
will generate one BLAS for each `RaytracingBuilderKHR::Blas`:
|
|
|
|
```` C
|
|
void buildBlas(const std::vector<RaytracingBuilderKHR::Blas>& blas_,
|
|
VkBuildAccelerationStructureFlagsKHR flags = VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR)
|
|
{
|
|
m_blas = blas_; // Keeping a copy
|
|
|
|
VkDeviceSize maxScratch{0}; // Largest scratch buffer for our BLAS
|
|
|
|
// Is compaction requested?
|
|
bool doCompaction = (flags & VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR)
|
|
== VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR;
|
|
std::vector<VkDeviceSize> originalSizes;
|
|
originalSizes.resize(m_blas.size());
|
|
|
|
// Iterate over the groups of geometries, creating one BLAS for each group
|
|
int idx{0};
|
|
for(auto& blas : m_blas)
|
|
{
|
|
````
|
|
|
|
The creation of the acceleration structure needs all `vk::AccelerationStructureCreateGeometryTypeInfoKHR` previously set and
|
|
set into `vk::AccelerationStructureCreateInfoKHR`.
|
|
|
|
```` C
|
|
VkAccelerationStructureCreateInfoKHR asCreateInfo{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_KHR};
|
|
asCreateInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR;
|
|
asCreateInfo.flags = flags;
|
|
asCreateInfo.maxGeometryCount = (uint32_t)blas.asCreateGeometryInfo.size();
|
|
asCreateInfo.pGeometryInfos = blas.asCreateGeometryInfo.data();
|
|
````
|
|
|
|
The creation information is then passed to the allocator, that will internally create an acceleration structure handle.
|
|
It will also query `vk::Device::getAccelerationStructureMemoryRequirementsKHR` to obtain the size of the resulting BLAS,
|
|
and allocate memory accordingly.
|
|
|
|
```` C
|
|
// Create an acceleration structure identifier and allocate memory to
|
|
// store the resulting structure data
|
|
blas.as = m_alloc.createAcceleration(asCreateInfo);
|
|
m_debug.setObjectName(blas.as.accel, (std::string("Blas" + std::to_string(idx)).c_str()));
|
|
````
|
|
|
|
The acceleration structure builder requires some scratch memory to generate the BLAS. Since we generate all the
|
|
BLAS's in a batch, we query the scratch memory requirements for each BLAS, and find the maximum such requirement.
|
|
The amount of memory for the scratch is determined by filling the memory requirement structure, and setting
|
|
the previous created acceleration structure. At the time to write those lines, only the device can be use
|
|
for building the acceleration structure. The same scratch buffer is used by each BLAS, which is the reason to
|
|
allocate the largest size, to avoid any realocation. At the end of building all BLAS, we can dispose the scratch
|
|
buffer.
|
|
|
|
We are querying the size the acceleration structure is taking on the device as well. This has no real use except
|
|
for statistics and to compare it to the compact size which can happen in a second step.
|
|
|
|
```` C
|
|
// Estimate the amount of scratch memory required to build the BLAS, and
|
|
// update the size of the scratch buffer that will be allocated to
|
|
// sequentially build all BLASes
|
|
VkAccelerationStructureMemoryRequirementsInfoKHR memoryRequirementsInfo{
|
|
VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_INFO_KHR};
|
|
memoryRequirementsInfo.type = VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_BUILD_SCRATCH_KHR;
|
|
memoryRequirementsInfo.accelerationStructure = blas.as.accel;
|
|
memoryRequirementsInfo.buildType = VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR;
|
|
|
|
VkMemoryRequirements2 reqMem{VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2};
|
|
vkGetAccelerationStructureMemoryRequirementsKHR(m_device, &memoryRequirementsInfo, &reqMem);
|
|
VkDeviceSize scratchSize = reqMem.memoryRequirements.size;
|
|
|
|
|
|
blas.flags = flags;
|
|
maxScratch = std::max(maxScratch, scratchSize);
|
|
|
|
// Original size
|
|
memoryRequirementsInfo.type = VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_OBJECT_KHR;
|
|
vkGetAccelerationStructureMemoryRequirementsKHR(m_device, &memoryRequirementsInfo, &reqMem);
|
|
originalSizes[idx] = reqMem.memoryRequirements.size;
|
|
|
|
idx++;
|
|
}
|
|
````
|
|
|
|
Once that maximum has been found, we allocate a scratch buffer.
|
|
|
|
```` C
|
|
// Allocate the scratch buffers holding the temporary data of the acceleration structure builder
|
|
nvvkBuffer scratchBuffer =
|
|
m_alloc.createBuffer(maxScratch, VK_BUFFER_USAGE_RAY_TRACING_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
|
|
VkBufferDeviceAddressInfo bufferInfo{VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO};
|
|
bufferInfo.buffer = scratchBuffer.buffer;
|
|
VkDeviceAddress scratchAddress = vkGetBufferDeviceAddress(m_device, &bufferInfo);
|
|
````
|
|
|
|
To know the size that the BLAS is really taking, we use queries and setting the type to `VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR`.
|
|
This is needed if we want to compact the acceleration structure in a second step. By default, the
|
|
memory allocated by the creation of the acceleration structure has the size of the worst case. After creation,
|
|
the real space can be smaller, and it is possible to copy the acceleration structure to one that is
|
|
using exactly what is needed. This could save over 50% of the device memory usage.
|
|
|
|
```` C
|
|
// Query size of compact BLAS
|
|
VkQueryPoolCreateInfo qpci{VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO};
|
|
qpci.queryCount = (uint32_t)m_blas.size();
|
|
qpci.queryType = VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR;
|
|
VkQueryPool queryPool;
|
|
vkCreateQueryPool(m_device, &qpci, nullptr, &queryPool);
|
|
````
|
|
|
|
We then use multiple command buffers to launch all the BLAS builds. We are using multiple
|
|
command buffers instead of one, to allow the driver to allow system interuption and avoid a
|
|
TDR if the job was to heavy.
|
|
|
|
Note the barrier after each
|
|
build call: this is required as we reuse the scratch space across builds, and hence need to ensure
|
|
the previous build has completed before starting the next. We could have used multiple scratch buffers,
|
|
but it would have been expensive memory wise, and the device can only build one BLAS at a time, so we
|
|
wouldn't be faster.
|
|
|
|
```` C
|
|
// Query size of compact BLAS
|
|
VkQueryPoolCreateInfo qpci{VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO};
|
|
qpci.queryCount = (uint32_t)m_blas.size();
|
|
qpci.queryType = VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR;
|
|
VkQueryPool queryPool;
|
|
vkCreateQueryPool(m_device, &qpci, nullptr, &queryPool);
|
|
|
|
|
|
// Create a command buffer containing all the BLAS builds
|
|
nvvk::CommandPool genCmdBuf(m_device, m_queueIndex);
|
|
int ctr{0};
|
|
std::vector<VkCommandBuffer> allCmdBufs;
|
|
allCmdBufs.reserve(m_blas.size());
|
|
for(auto& blas : m_blas)
|
|
{
|
|
VkCommandBuffer cmdBuf = genCmdBuf.createCommandBuffer();
|
|
allCmdBufs.push_back(cmdBuf);
|
|
|
|
const VkAccelerationStructureGeometryKHR* pGeometry = blas.asGeometry.data();
|
|
VkAccelerationStructureBuildGeometryInfoKHR bottomASInfo{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_GEOMETRY_INFO_KHR};
|
|
bottomASInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR;
|
|
bottomASInfo.flags = flags;
|
|
bottomASInfo.update = VK_FALSE;
|
|
bottomASInfo.srcAccelerationStructure = VK_NULL_HANDLE;
|
|
bottomASInfo.dstAccelerationStructure = blas.as.accel;
|
|
bottomASInfo.geometryArrayOfPointers = VK_FALSE;
|
|
bottomASInfo.geometryCount = (uint32_t)blas.asGeometry.size();
|
|
bottomASInfo.ppGeometries = &pGeometry;
|
|
bottomASInfo.scratchData.deviceAddress = scratchAddress;
|
|
|
|
// Pointers of offset
|
|
std::vector<const VkAccelerationStructureBuildOffsetInfoKHR*> pBuildOffset(blas.asBuildOffsetInfo.size());
|
|
for(size_t i = 0; i < blas.asBuildOffsetInfo.size(); i++)
|
|
pBuildOffset[i] = &blas.asBuildOffsetInfo[i];
|
|
|
|
// Building the AS
|
|
vkCmdBuildAccelerationStructureKHR(cmdBuf, 1, &bottomASInfo, pBuildOffset.data());
|
|
|
|
// Since the scratch buffer is reused across builds, we need a barrier to ensure one build
|
|
// is finished before starting the next one
|
|
VkMemoryBarrier barrier{VK_STRUCTURE_TYPE_MEMORY_BARRIER};
|
|
barrier.srcAccessMask = VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_KHR;
|
|
barrier.dstAccessMask = VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR;
|
|
vkCmdPipelineBarrier(cmdBuf, VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR,
|
|
VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR, 0, 1, &barrier, 0, nullptr, 0, nullptr);
|
|
|
|
// Query the compact size
|
|
if(doCompaction)
|
|
{
|
|
vkCmdWriteAccelerationStructuresPropertiesKHR(cmdBuf, 1, &blas.as.accel,
|
|
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR, queryPool, ctr++);
|
|
}
|
|
}
|
|
genCmdBuf.submitAndWait(allCmdBufs);
|
|
allCmdBufs.clear();
|
|
````
|
|
|
|
While this approach has the advantage of keeping all BLAS's independent, building many BLAS's efficiently would
|
|
require allocating a larger scratch buffer, and launch several builds simultaneously. This current tutorial
|
|
does not make use of compaction, which could reduce significantly the memory footprint of the acceleration structures. Both
|
|
of those aspects will be part of a future advanced tutorial.
|
|
|
|
The following is when compation flag is enabled. This part, which is optional, will compact the BLAS in the memory that it is really using. It needs to wait that all BLASes
|
|
are constructred, to make a copy in the more fitted memory space.
|
|
|
|
```` C
|
|
|
|
// Compacting all BLAS
|
|
if(doCompaction)
|
|
{
|
|
cmdBuf = genCmdBuf.createCommandBuffer();
|
|
|
|
// Get the size result back
|
|
std::vector<VkDeviceSize> compactSizes(m_blas.size());
|
|
vkGetQueryPoolResults(m_device, queryPool, 0, (uint32_t)compactSizes.size(), compactSizes.size() * sizeof(VkDeviceSize),
|
|
compactSizes.data(), sizeof(VkDeviceSize), VK_QUERY_RESULT_WAIT_BIT);
|
|
|
|
|
|
// Compacting
|
|
std::vector<nvvkAccel> cleanupAS(m_blas.size());
|
|
uint32_t totOriginalSize{0}, totCompactSize{0};
|
|
for(int i = 0; i < m_blas.size(); i++)
|
|
{
|
|
// LOGI("Reducing %i, from %d to %d \n", i, originalSizes[i], compactSizes[i]);
|
|
totOriginalSize += (uint32_t)originalSizes[i];
|
|
totCompactSize += (uint32_t)compactSizes[i];
|
|
|
|
// Creating a compact version of the AS
|
|
VkAccelerationStructureCreateInfoKHR asCreateInfo{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_KHR};
|
|
asCreateInfo.compactedSize = compactSizes[i];
|
|
asCreateInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR;
|
|
asCreateInfo.flags = flags;
|
|
auto as = m_alloc.createAcceleration(asCreateInfo);
|
|
|
|
// Copy the original BLAS to a compact version
|
|
VkCopyAccelerationStructureInfoKHR copyInfo{VK_STRUCTURE_TYPE_COPY_ACCELERATION_STRUCTURE_INFO_KHR};
|
|
copyInfo.src = m_blas[i].as.accel;
|
|
copyInfo.dst = as.accel;
|
|
copyInfo.mode = VK_COPY_ACCELERATION_STRUCTURE_MODE_COMPACT_KHR;
|
|
vkCmdCopyAccelerationStructureKHR(cmdBuf, ©Info);
|
|
cleanupAS[i] = m_blas[i].as;
|
|
m_blas[i].as = as;
|
|
}
|
|
genCmdBuf.submitAndWait(cmdBuf);
|
|
|
|
// Destroying the previous version
|
|
for(auto as : cleanupAS)
|
|
m_alloc.destroy(as);
|
|
|
|
LOGI("------------------\n");
|
|
LOGI("Total: %d -> %d = %d (%2.2f%s smaller) \n", totOriginalSize, totCompactSize,
|
|
totOriginalSize - totCompactSize, (totOriginalSize - totCompactSize) / float(totOriginalSize) * 100.f, "%%");
|
|
}
|
|
````
|
|
|
|
Finally, destroying what was allocated.
|
|
|
|
```` C
|
|
vkDestroyQueryPool(m_device, queryPool, nullptr);
|
|
m_alloc.destroy(scratchBuffer);
|
|
m_alloc.finalizeAndReleaseStaging();
|
|
}
|
|
````
|
|
|
|
## Top-Level Acceleration Structure
|
|
|
|
The TLAS is the entry point in the ray tracing scene description, and stores all the instances. Add a new method
|
|
to the `HelloVulkan` class:
|
|
|
|
```` C
|
|
void createTopLevelAS();
|
|
````
|
|
|
|
An instance is represented by a `nvvk::RaytracingBuilder::Instance`, which stores its transform matrix (`transform`)
|
|
and the identifier of its corresponding BLAS (`blasId`). It also contains an instance identifier that will be available
|
|
during shading as `gl_InstanceCustomIndex`, as well as the index of the hit group that represents the shaders that will be
|
|
invoked upon hitting the object (`hitGroupId`).
|
|
This index and the notion of hit group are tied to the definition of the ray tracing pipeline and the Shader Binding
|
|
Table, described later in this tutorial. For now
|
|
it suffices to say that we will use only one hit group for the whole scene, and hence the hit group index is always 0.
|
|
Finally, the instance may indicate culling preferences, such as backface culling, using its `vk::GeometryInstanceFlagsKHR
|
|
flags` member. In our example we decide to disable culling altogether
|
|
for simplicity and independence on the winding of the input models.
|
|
|
|
Once all the instance objects are created we trigger the TLAS build, directing the builder to prefer generating a TLAS
|
|
optimized for tracing performance (rather than AS size, for example).
|
|
|
|
```` C
|
|
void HelloVulkan::createTopLevelAS()
|
|
{
|
|
std::vector<nvvk::RaytracingBuilderKHR::Instance> tlas;
|
|
tlas.reserve(m_objInstance.size());
|
|
for(int i = 0; i < static_cast<int>(m_objInstance.size()); i++)
|
|
{
|
|
nvvk::RaytracingBuilderKHR::Instance rayInst;
|
|
rayInst.transform = m_objInstance[i].transform; // Position of the instance
|
|
rayInst.instanceId = i; // gl_InstanceID
|
|
rayInst.blasId = m_objInstance[i].objIndex;
|
|
rayInst.hitGroupId = 0; // We will use the same hit group for all objects
|
|
rayInst.flags = VK_GEOMETRY_INSTANCE_TRIANGLE_FACING_CULL_DISABLE_BIT_KHR;
|
|
tlas.emplace_back(rayInst);
|
|
}
|
|
m_rtBuilder.buildTlas(tlas, vk::BuildAccelerationStructureFlagBitsKHR::ePreferFastTrace);
|
|
}
|
|
````
|
|
|
|
As usual in Vulkan, we need to explicitly destroy the objects we created by adding a call at the end of
|
|
`HelloVulkan::destroyResources`:
|
|
|
|
```` C
|
|
// #VKRay
|
|
m_rtBuilder.destroy();
|
|
````
|
|
|
|
### Helper Details: RaytracingBuilder::buildTlas()
|
|
|
|
The helper function for building top-level acceleration structures is part of the
|
|
[nvpro-samples](https://github.com/nvpro-samples)
|
|
and builds a TLAS from a vector of `Instance` objects. We first store some basic information about the TLAS, namely
|
|
the number of instances it will hold, and flags indicating preferences for the builder, such as whether to prefer faster
|
|
builds or better performance.
|
|
|
|
```` C
|
|
void buildTlas(const std::vector<Instance>& instances,
|
|
VkBuildAccelerationStructureFlagsKHR flags = VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR)
|
|
{
|
|
m_tlas.flags = flags;
|
|
|
|
VkAccelerationStructureCreateGeometryTypeInfoKHR geometryCreate{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_GEOMETRY_TYPE_INFO_KHR};
|
|
geometryCreate.geometryType = VK_GEOMETRY_TYPE_INSTANCES_KHR;
|
|
geometryCreate.maxPrimitiveCount = (static_cast<uint32_t>(instances.size()));
|
|
geometryCreate.allowsTransforms = (VK_TRUE);
|
|
|
|
VkAccelerationStructureCreateInfoKHR asCreateInfo{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_KHR};
|
|
asCreateInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR;
|
|
asCreateInfo.flags = flags;
|
|
asCreateInfo.maxGeometryCount = 1;
|
|
asCreateInfo.pGeometryInfos = &geometryCreate;
|
|
````
|
|
|
|
We then call the allocator, which will create an acceleration structure handle for the TLAS. It will also query the
|
|
resulting size of the TLAS using `vk::Device::getAccelerationStructureMemoryRequirementsKHR` and allocate that
|
|
amount of memory:
|
|
|
|
```` C
|
|
// Create the acceleration structure object and allocate the memory
|
|
// required to hold the TLAS data
|
|
m_tlas.as = m_alloc.createAcceleration(asCreateInfo);
|
|
m_debug.setObjectName(m_tlas.as.accel, "Tlas");
|
|
````
|
|
|
|
As with the BLAS, we also query the amount of scratch memory required by the builder to generate the TLAS,
|
|
and allocate a scratch buffer. Note that since the BLAS and TLAS both require a scratch buffer, we could also have used
|
|
one buffer and thus saved an allocation. However, for the purpose of this tutorial, we keep the BLAS and TLAS builds
|
|
independent.
|
|
|
|
```` C
|
|
// Compute the amount of scratch memory required by the acceleration structure builder
|
|
VkAccelerationStructureMemoryRequirementsInfoKHR memoryRequirementsInfo{
|
|
VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_INFO_KHR};
|
|
memoryRequirementsInfo.type = VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_BUILD_SCRATCH_KHR;
|
|
memoryRequirementsInfo.accelerationStructure = m_tlas.as.accel;
|
|
memoryRequirementsInfo.buildType = VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR;
|
|
|
|
VkMemoryRequirements2 reqMem{VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2};
|
|
vkGetAccelerationStructureMemoryRequirementsKHR(m_device, &memoryRequirementsInfo, &reqMem);
|
|
VkDeviceSize scratchSize = reqMem.memoryRequirements.size;
|
|
|
|
// Allocate the scratch memory
|
|
nvvkBuffer scratchBuffer =
|
|
m_alloc.createBuffer(scratchSize, VK_BUFFER_USAGE_RAY_TRACING_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
|
|
VkBufferDeviceAddressInfo bufferInfo{VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO};
|
|
bufferInfo.buffer = scratchBuffer.buffer;
|
|
VkDeviceAddress scratchAddress = vkGetBufferDeviceAddress(m_device, &bufferInfo);
|
|
````
|
|
|
|
An `Instance` object is nearly identical to a `VkGeometryInstanceKHR` object: the only difference is the transform
|
|
matrix of the instance. The former uses a $4\times4$ matrix from GLM (column-major), while the latter uses a raw
|
|
array of floating-point values representing a row-major $4\times3$ matrix. Using the `Instance` object on the
|
|
application side allows us to use the more intuitive $4\times4$ matrices, making the code clearer. When generating the
|
|
TLAS we then convert all the `Instance` objects to `VkGeometryInstanceKHR`:
|
|
|
|
```` C
|
|
// For each instance, build the corresponding instance descriptor
|
|
std::vector<VkAccelerationStructureInstanceKHR> geometryInstances;
|
|
geometryInstances.reserve(instances.size());
|
|
for(const auto& inst : instances)
|
|
{
|
|
geometryInstances.push_back(instanceToVkGeometryInstanceKHR(inst));
|
|
}
|
|
````
|
|
|
|
We then upload the instance descriptions to the device using a one-time command buffer. This command buffer will also be
|
|
used to generate the TLAS itself, and so we add a barrier after the copy to ensure it has completed before launching the
|
|
TLAS build.
|
|
|
|
```` C
|
|
// Building the TLAS
|
|
nvvk::CommandPool genCmdBuf(m_device, m_queueIndex);
|
|
VkCommandBuffer cmdBuf = genCmdBuf.createCommandBuffer();
|
|
|
|
// Create a buffer holding the actual instance data for use by the AS
|
|
// builder
|
|
VkDeviceSize instanceDescsSizeInBytes = instances.size() * sizeof(VkAccelerationStructureInstanceKHR);
|
|
|
|
// Allocate the instance buffer and copy its contents from host to device
|
|
// memory
|
|
m_instBuffer = m_alloc.createBuffer(cmdBuf, geometryInstances,
|
|
VK_BUFFER_USAGE_RAY_TRACING_BIT_KHR | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT);
|
|
m_debug.setObjectName(m_instBuffer.buffer, "TLASInstances");
|
|
//VkBufferDeviceAddressInfo bufferInfo{VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO};
|
|
bufferInfo.buffer = m_instBuffer.buffer;
|
|
VkDeviceAddress instanceAddress = vkGetBufferDeviceAddress(m_device, &bufferInfo);
|
|
|
|
// Make sure the copy of the instance buffer are copied before triggering the
|
|
// acceleration structure build
|
|
VkMemoryBarrier barrier{VK_STRUCTURE_TYPE_MEMORY_BARRIER};
|
|
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
|
|
barrier.dstAccessMask = VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_KHR;
|
|
vkCmdPipelineBarrier(cmdBuf, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR,
|
|
0, 1, &barrier, 0, nullptr, 0, nullptr);
|
|
````
|
|
|
|
The build is then triggered, and we execute the command buffer before destroying the temporary buffers.
|
|
|
|
```` C
|
|
// Build the TLAS
|
|
VkAccelerationStructureGeometryDataKHR geometry{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_INSTANCES_DATA_KHR};
|
|
geometry.instances.arrayOfPointers = VK_FALSE;
|
|
geometry.instances.data.deviceAddress = instanceAddress;
|
|
VkAccelerationStructureGeometryKHR topASGeometry{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_KHR};
|
|
topASGeometry.geometryType = VK_GEOMETRY_TYPE_INSTANCES_KHR;
|
|
topASGeometry.geometry = geometry;
|
|
|
|
|
|
const VkAccelerationStructureGeometryKHR* pGeometry = &topASGeometry;
|
|
VkAccelerationStructureBuildGeometryInfoKHR topASInfo{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_GEOMETRY_INFO_KHR};
|
|
topASInfo.flags = flags;
|
|
topASInfo.update = VK_FALSE;
|
|
topASInfo.srcAccelerationStructure = VK_NULL_HANDLE;
|
|
topASInfo.dstAccelerationStructure = m_tlas.as.accel;
|
|
topASInfo.geometryArrayOfPointers = VK_FALSE;
|
|
topASInfo.geometryCount = 1;
|
|
topASInfo.ppGeometries = &pGeometry;
|
|
topASInfo.scratchData.deviceAddress = scratchAddress;
|
|
|
|
// Build Offsets info: n instances
|
|
VkAccelerationStructureBuildOffsetInfoKHR buildOffsetInfo{static_cast<uint32_t>(instances.size()), 0, 0, 0};
|
|
const VkAccelerationStructureBuildOffsetInfoKHR* pBuildOffsetInfo = &buildOffsetInfo;
|
|
|
|
// Build the TLAS
|
|
vkCmdBuildAccelerationStructureKHR(cmdBuf, 1, &topASInfo, &pBuildOffsetInfo);
|
|
|
|
|
|
genCmdBuf.submitAndWait(cmdBuf);
|
|
m_alloc.finalizeAndReleaseStaging();
|
|
m_alloc.destroy(scratchBuffer);
|
|
}
|
|
````
|
|
|
|
## main
|
|
|
|
In the `main` function, we can now add the creation of the geometry instances and acceleration structures
|
|
right after initializing ray tracing:
|
|
|
|
```` C
|
|
// #VKRay
|
|
helloVk.initRayTracing();
|
|
helloVk.createBottomLevelAS();
|
|
helloVk.createTopLevelAS();
|
|
````
|
|
|
|
# Ray Tracing Descriptor Set
|
|
|
|
The ray tracing shaders, like the rasterization shaders, use external resources referenced by a descriptor set. A key
|
|
difference, however, is that in a scene requiring several types of shaders, the rasterization would allow each set of
|
|
shaders to have their own descriptor set(s). For example, objects with different materials may each have a descriptor
|
|
set containing the handles of the textures it needs. This is easily done since for a given material, we would create its
|
|
corresponding rasterization pipeline and use that pipeline to render all the objects with that material. On the
|
|
contrary, with ray tracing it is not possible to know in advance which objects will be hit by a ray, so any shader may
|
|
be invoked at any time. The Vulkan ray tracing extension then uses a single set of descriptor sets containing all the
|
|
resources necessary to render the scene: for example, it would contain all the textures for all the materials.
|
|
|
|
To maintain compatibility between rasterization and ray tracing, the ray tracing pipeline will use the same descriptor
|
|
set containing the scene information, and will add another descriptor set referencing the TLAS and the buffer in which
|
|
we store the output image.
|
|
|
|
In the header, we declare the objects related to this additional descriptor set:
|
|
|
|
```` C
|
|
void createRtDescriptorSet();
|
|
|
|
nvvk::DescriptorSetBindings m_rtDescSetLayoutBind;
|
|
vk::DescriptorPool m_rtDescPool;
|
|
vk::DescriptorSetLayout m_rtDescSetLayout;
|
|
vk::DescriptorSet m_rtDescSet;
|
|
````
|
|
|
|
The acceleration structure will be accessible by the Ray Generation shader, as we want to call `TraceRayEXT()` from this
|
|
shader. Later in this document, we will also make it accessible from the Closest Hit shader, in order to send rays from
|
|
there as well. The output image is the offscreen buffer used by the rasterization, and will be written only by the
|
|
RayGen shader.
|
|
|
|
```` C
|
|
//--------------------------------------------------------------------------------------------------
|
|
// This descriptor set holds the Acceleration structure and the output image
|
|
//
|
|
void HelloVulkan::createRtDescriptorSet()
|
|
{
|
|
using vkDT = vk::DescriptorType;
|
|
using vkSS = vk::ShaderStageFlagBits;
|
|
using vkDSLB = vk::DescriptorSetLayoutBinding;
|
|
|
|
m_rtDescSetLayoutBind.addBinding(vkDSLB(0, vkDT::eAccelerationStructureKHR, 1,
|
|
vkSS::eRaygenKHR | vkSS::eClosestHitKHR)); // TLAS
|
|
m_rtDescSetLayoutBind.addBinding(
|
|
vkDSLB(1, vkDT::eStorageImage, 1, vkSS::eRaygenKHR)); // Output image
|
|
|
|
m_rtDescPool = m_rtDescSetLayoutBind.createPool(m_device);
|
|
m_rtDescSetLayout = m_rtDescSetLayoutBind.createLayout(m_device);
|
|
m_rtDescSet = m_device.allocateDescriptorSets({m_rtDescPool, 1, &m_rtDescSetLayout})[0];
|
|
|
|
vk::AccelerationStructureKHR tlas = m_rtBuilder.getAccelerationStructure();
|
|
vk::WriteDescriptorSetAccelerationStructureKHR descASInfo;
|
|
descASInfo.setAccelerationStructureCount(1);
|
|
descASInfo.setPAccelerationStructures(&tlas);
|
|
vk::DescriptorImageInfo imageInfo{
|
|
{}, m_offscreenColor.descriptor.imageView, vk::ImageLayout::eGeneral};
|
|
|
|
std::vector<vk::WriteDescriptorSet> writes;
|
|
writes.emplace_back(m_rtDescSetLayoutBind.makeWrite(m_rtDescSet, 0, &descASInfo));
|
|
writes.emplace_back(m_rtDescSetLayoutBind.makeWrite(m_rtDescSet, 1, &imageInfo));
|
|
m_device.updateDescriptorSets(static_cast<uint32_t>(writes.size()), writes.data(), 0, nullptr);
|
|
}
|
|
````
|
|
|
|
## Additions to the Scene Descriptor Set
|
|
|
|
As the ray tracing shaders also have to access the scene description, we need to extend the access flags of the
|
|
corresponding buffers in the original `createDescriptorSetLayout()`. The RayGen should access the camera matrices to
|
|
compute ray directions, and the ClosestHit needs access to the materials, scene instances, textures, vertex buffers, and
|
|
index buffers. Even though the vertex and index buffers will only be used by the ray tracing shaders we add them to this
|
|
descriptor set as they semantically fit the Scene descriptor set.
|
|
|
|
```` C
|
|
// Camera matrices (binding = 0)
|
|
m_descSetLayoutBind.addBinding(
|
|
vkDS(0, vkDT::eUniformBuffer, 1, vkSS::eVertex | vkSS::eRaygenKHR));
|
|
// Materials (binding = 1)
|
|
m_descSetLayoutBind.addBinding(
|
|
vkDS(1, vkDT::eStorageBuffer, nbObj, vkSS::eVertex | vkSS::eFragment | vkSS::eClosestHitKHR));
|
|
// Scene description (binding = 2)
|
|
m_descSetLayoutBind.addBinding( //
|
|
vkDS(2, vkDT::eStorageBuffer, 1, vkSS::eVertex | vkSS::eFragment | vkSS::eClosestHitKHR));
|
|
// Textures (binding = 3)
|
|
m_descSetLayoutBind.addBinding(
|
|
vkDS(3, vkDT::eCombinedImageSampler, nbTxt, vkSS::eFragment | vkSS::eClosestHitKHR));
|
|
// Materials (binding = 4)
|
|
m_descSetLayoutBind.addBinding(
|
|
vkDS(4, vkDT::eStorageBuffer, nbObj, vkSS::eFragment | vkSS::eClosestHitKHR));
|
|
// Storing vertices (binding = 5)
|
|
m_descSetLayoutBind.addBinding( //
|
|
vkDS(5, vkDT::eStorageBuffer, nbObj, vkSS::eClosestHitKHR));
|
|
// Storing indices (binding = 6)
|
|
m_descSetLayoutBind.addBinding( //
|
|
vkDS(6, vkDT::eStorageBuffer, nbObj, vkSS::eClosestHitKHR));
|
|
````
|
|
|
|
We set the actual contents of the descriptor set by adding those buffers in `updateDescriptorSet()`:
|
|
|
|
```` C
|
|
// All material buffers, 1 buffer per OBJ
|
|
std::vector<vk::DescriptorBufferInfo> dbiMat;
|
|
std::vector<vk::DescriptorBufferInfo> dbiMatIdx;
|
|
std::vector<vk::DescriptorBufferInfo> dbiVert;
|
|
std::vector<vk::DescriptorBufferInfo> dbiIdx;
|
|
for(size_t i = 0; i < m_objModel.size(); ++i)
|
|
{
|
|
dbiMat.push_back({m_objModel[i].matColorBuffer.buffer, 0, VK_WHOLE_SIZE});
|
|
dbiMatIdx.push_back({m_objModel[i].matIndexBuffer.buffer, 0, VK_WHOLE_SIZE});
|
|
dbiVert.push_back({m_objModel[i].vertexBuffer.buffer, 0, VK_WHOLE_SIZE});
|
|
dbiIdx.push_back({m_objModel[i].indexBuffer.buffer, 0, VK_WHOLE_SIZE});
|
|
}
|
|
writes.emplace_back(m_descSetLayoutBind.makeWriteArray(m_descSet, 1, dbiMat.data()));
|
|
writes.emplace_back(m_descSetLayoutBind.makeWriteArray(m_descSet, 4, dbiMatIdx.data()));
|
|
writes.emplace_back(m_descSetLayoutBind.makeWriteArray(m_descSet, 5, dbiVert.data()));
|
|
writes.emplace_back(m_descSetLayoutBind.makeWriteArray(m_descSet, 6, dbiIdx.data()));
|
|
````
|
|
|
|
Originally the buffers containing the vertices and indices were only used by the rasterization pipeline.
|
|
The ray tracing will need to use those buffers as storage buffers (`VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT`),
|
|
the address to those buffers are needed to fill the `VkAccelerationStructureGeometryTrianglesDataKHR` structure,
|
|
and because they are use for constructing the acceleration structure, they also need
|
|
the `VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR` flag.
|
|
|
|
We update the usage of the buffers in `loadModel`:
|
|
|
|
```` C
|
|
model.vertexBuffer =
|
|
m_alloc.createBuffer(cmdBuf, loader.m_vertices,
|
|
vkBU::eVertexBuffer | vkBU::eStorageBuffer | vkBU::eShaderDeviceAddress
|
|
| vkBU::eAccelerationStructureBuildInputReadOnlyKHR);
|
|
model.indexBuffer =
|
|
m_alloc.createBuffer(cmdBuf, loader.m_indices,
|
|
vkBU::eIndexBuffer | vkBU::eStorageBuffer | vkBU::eShaderDeviceAddress
|
|
| vkBU::eAccelerationStructureBuildInputReadOnlyKHR);
|
|
````
|
|
|
|
!!! Note: Array of Buffers
|
|
Each model (OBJ) was constructed with a buffer of vertices, indices, and materials. Therefore the
|
|
scene has vectors of those buffers. In the shaders, we access the right buffer using the
|
|
the ObjectID used by the Instance. This is convenient, as we have access to all the data
|
|
of the scene while ray tracing.
|
|
|
|
## Descriptor Update
|
|
|
|
As with the rasterization descriptor set, the ray tracing descriptor set needs to be updated if its contents change.
|
|
This typically happens when resizing the window, as the output image is recreated and needs to be re-linked to the
|
|
descriptor set. The update is performed in a new method of the `HelloVulkan` class:
|
|
|
|
```` C
|
|
void updateRtDescriptorSet();
|
|
````
|
|
|
|
The implementation is straightforward, simply updating the output image reference:
|
|
|
|
```` C
|
|
//--------------------------------------------------------------------------------------------------
|
|
// Writes the output image to the descriptor set
|
|
// - Required when changing resolution
|
|
//
|
|
void HelloVulkan::updateRtDescriptorSet()
|
|
{
|
|
using vkDT = vk::DescriptorType;
|
|
|
|
// (1) Output buffer
|
|
vk::DescriptorImageInfo imageInfo{
|
|
{}, m_offscreenColor.descriptor.imageView, vk::ImageLayout::eGeneral};
|
|
vk::WriteDescriptorSet wds{m_rtDescSet, 1, 0, 1, vkDT::eStorageImage, &imageInfo};
|
|
m_device.updateDescriptorSets(wds, nullptr);
|
|
}
|
|
````
|
|
|
|
We can then add the update call to the `onResize()` method to link it to the resizing event:
|
|
|
|
```` C
|
|
updateRtDescriptorSet();
|
|
````
|
|
|
|
The resources created in this section need to be destroyed when closing the application by adding the following to
|
|
`destroyResources`:
|
|
|
|
```` C
|
|
m_device.destroy(m_rtDescPool);
|
|
m_device.destroy(m_rtDescSetLayout);
|
|
````
|
|
|
|
## main
|
|
|
|
In the `main` function, we create the descriptor set after the other ray tracing calls:
|
|
|
|
```` C
|
|
helloVk.createRtDescriptorSet();
|
|
````
|
|
|
|
# Ray Tracing Pipeline
|
|
|
|
When creating rasterization shaders with Vulkan, the application compiles them into executable shaders, which are bound
|
|
to the rasterization pipeline. All objects rendered using this pipeline will use those shaders. To render an image with
|
|
several types of shaders, the rasterization pipeline needs to be set to use each before calling the draw commands.
|
|
|
|
In a ray tracing context, a ray traced through the scene can hit any object and thus trigger the execution of any
|
|
shader. Instead of using one shader executable at a time, we now need to have all shaders available at once. The
|
|
pipeline then contains all the shaders required to render the scene, and information on how to execute it. To be able to
|
|
ray trace some geometry, the Vulkan ray tracing extension typically uses at least these 3 shader programs:
|
|
|
|
* The **ray generation** shader will be the starting point for ray tracing, and will be called for each pixel. It will
|
|
typically initialize a ray starting at the location of the camera, in a direction given by evaluating the camera lens
|
|
model at the pixel location. It will then invoke `traceRayEXT()`, that will shoot the ray in the scene. Other shaders below
|
|
will process further events, and return their result to the ray generation shader through the ray payload.
|
|
|
|
* The **miss** shader is executed when a ray does not intersect any geometry. For instance, it might sample an
|
|
environment map, or return a simple color through the ray payload.
|
|
|
|
* The **closest hit** shader is called upon hitting the geometric instance closest to the starting point of the ray.
|
|
This shader can for example perform lighting calculations and return the results through the ray payload. There can be
|
|
as many closest hit shaders as needed, much like how a rasterization-based application has multiple pixel shaders
|
|
depending on its objects.
|
|
|
|
Two more shader types can optionally be used:
|
|
|
|
* The **intersection** shader, which allows intersecting user-defined geometry. For example, this can be used to
|
|
intersect geometry placeholders for on-demand geometry loading, or intersecting procedural geometry without tessellating
|
|
them beforehand. Using this shader requires modifying how the acceleration structures are built, and is beyond the scope
|
|
of this tutorial. We will instead rely on the built-in triangle intersection shader provided by the extension, which
|
|
returns 2 floating-point values representing the barycentric coordinates `(u,v)` of the hit point inside the triangle.
|
|
For a triangle made of vertices `v0`, `v1`, `v2`, the barycentric coordinates define the weights of the vertices as
|
|
follows:
|
|
|
|
***********************
|
|
* . u *
|
|
* / \ *
|
|
* / v1\ *
|
|
* / \ *
|
|
* / \ *
|
|
* 1-u-v / v0 v2 \ v *
|
|
* '-----------' *
|
|
***********************
|
|
|
|
|
|
* The **any hit** shader is executed on each potential intersection: when searching for the hit point closest to the ray
|
|
origin, several candidates may be found on the way. The any hit shader can frequently be used to efficiently implement
|
|
alpha-testing. If the alpha test fails, the ray traversal can continue without having to call `traceRayEXT()` again. The
|
|
built-in any hit shader is simply a pass-through returning the intersection to the traversal engine, which will
|
|
determine which ray intersection is the closest.
|
|
|
|
![Figure [step]: The Ray Tracing Pipeline](Images/ShaderPipeline.svg)
|
|
|
|
We will start with a pipeline containing only the 3 main shader programs: a single ray generation shader, a single miss
|
|
shader, and a single hit group made only of a closest hit shader. This is done by first compiling each GLSL shader
|
|
program into SPIR-V. These SPIR-V shaders will be linked together into a ray tracing pipeline, which will be able to
|
|
route the intersection calculations to the right hit shaders.
|
|
|
|
To be able to focus on the pipeline generation, we provide simple shaders:
|
|
|
|
## Adding Shaders
|
|
|
|
!!! Warning: [Download Ray Tracing Shaders](files/shaders.zip)
|
|
Download the shaders and extract the content into `src/shaders`. Then rerun CMake, which will add those files to the project.
|
|
|
|
The `shaders` folder now contains 3 more files:
|
|
|
|
* `raytrace.rgen` contains the ray generation program. It also declares its access to the ray tracing output buffer
|
|
`image`, and the ray tracing acceleration structure `topLevelAS`, bound as an `accelerationStructureKHR`. For now this
|
|
shader program simply writes a constant color into the output buffer.
|
|
|
|
* `raytrace.rmiss` defines the miss shader. This shader will be executed when no geometry is hit, and will write a
|
|
constant color into the ray payload `rayPayloadInEXT`, which is provided automatically. Since our current ray generation
|
|
program does not trace any rays for now, this shader will not be called.
|
|
|
|
* `raytrace.rchit` contains a very simple closest hit shader. It will be executed upon hitting the geometry (our
|
|
triangles). As the miss shader, it takes the ray payload `rayPayloadInEXT`. It also has a second input defining the
|
|
intersection attributes `hitAttributeEXT` as provided by the intersection shader, i.e. the barycentric coordinates. This
|
|
shader simply writes a constant color to the payload.
|
|
|
|
In the header file, let's add the definition of the ray tracing pipeline building method, and the storage members of the
|
|
pipeline:
|
|
|
|
```` C
|
|
void createRtPipeline();
|
|
std::vector<vk::RayTracingShaderGroupCreateInfoKHR> m_rtShaderGroups;
|
|
vk::PipelineLayout m_rtPipelineLayout;
|
|
vk::Pipeline m_rtPipeline;
|
|
````
|
|
|
|
The pipeline will also use push constants to store global uniform values, namely the background color and
|
|
the light source information:
|
|
|
|
```` C
|
|
struct RtPushConstant
|
|
{
|
|
nvmath::vec4f clearColor;
|
|
nvmath::vec3f lightPosition;
|
|
float lightIntensity;
|
|
int lightType;
|
|
} m_rtPushConstants;
|
|
````
|
|
|
|
Our implementation of the ray tracing pipeline generation starts by adding the ray generation and miss shader stages,
|
|
followed by the closest hit shader. Note that this order is arbitrary, as the extension allows the developer to set up
|
|
the pipeline in any order.
|
|
|
|
All stages are stored in an array of `vk::PipelineShaderStageCreateInfo` objects. Indices within this vector will be
|
|
used as unique identifiers for the shaders in the Shader Binding Table. These identifiers are stored in the
|
|
`RayTracingShaderGroupCreateInfoKHR` structure. This structure first specifies a `type`, which represents the kind of
|
|
shader group represented in the structure. Ray generation, miss shaders are called 'general' shaders. In this case the
|
|
type is `eGeneral`, and only the `generalShader` member of the structure is filled. The other ones are set to
|
|
`VK_SHADER_UNUSED_KHR`. This is also the case for the callable shaders, not used in this tutorial. In our layout the ray
|
|
generation comes first (0), followed by the miss shader (1).
|
|
|
|
```` C
|
|
//--------------------------------------------------------------------------------------------------
|
|
// Pipeline for the ray tracer: all shaders, raygen, chit, miss
|
|
//
|
|
void HelloVulkan::createRtPipeline()
|
|
{
|
|
std::vector<std::string> paths = defaultSearchPaths;
|
|
|
|
vk::ShaderModule raygenSM =
|
|
nvvk::createShaderModule(m_device, //
|
|
nvh::loadFile("shaders/raytrace.rgen.spv", true, paths, true));
|
|
vk::ShaderModule missSM =
|
|
nvvk::createShaderModule(m_device, //
|
|
nvh::loadFile("shaders/raytrace.rmiss.spv", true, paths, true));
|
|
|
|
std::vector<vk::PipelineShaderStageCreateInfo> stages;
|
|
|
|
// Raygen
|
|
vk::RayTracingShaderGroupCreateInfoKHR rg{vk::RayTracingShaderGroupTypeKHR::eGeneral,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR};
|
|
stages.push_back({{}, vk::ShaderStageFlagBits::eRaygenKHR, raygenSM, "main"});
|
|
rg.setGeneralShader(static_cast<uint32_t>(stages.size() - 1));
|
|
m_rtShaderGroups.push_back(rg);
|
|
// Miss
|
|
vk::RayTracingShaderGroupCreateInfoKHR mg{vk::RayTracingShaderGroupTypeKHR::eGeneral,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR};
|
|
stages.push_back({{}, vk::ShaderStageFlagBits::eMissKHR, missSM, "main"});
|
|
mg.setGeneralShader(static_cast<uint32_t>(stages.size() - 1));
|
|
m_rtShaderGroups.push_back(mg);
|
|
|
|
````
|
|
|
|
As detailed before, intersections are managed by 3 kinds of shaders: the intersection shader computes the ray-geometry
|
|
intersections, the any-hit shader is run for every potential intersection, and the closest hit shader is applied to the
|
|
closest hit point along the ray. Those 3 shaders are bound into a hit group. In our case the geometry is made of
|
|
triangles, so the `type` of the `RayTracingShaderGroupCreateInfoKHR` is `eTrianglesHitGroup`. The intersection shader is
|
|
then built-in, and we set the `intersectionShader` member to `VK_SHADER_UNUSED_KHR`. We do not use a any-hit shader,
|
|
letting the system use a built-in pass-through shader. Therefore, we also leave the `anyHitShader` to
|
|
`VK_SHADER_UNUSED_KHR`. The only shader we define is then the closest hit shader, by setting the `closestHitShader`
|
|
member to the index `2` (`stages.size()-1`), since the `stages` vector already contains the ray generation and miss
|
|
shaders.
|
|
|
|
```` C
|
|
// Hit Group - Closest Hit + AnyHit
|
|
vk::ShaderModule chitSM =
|
|
nvvk::createShaderModule(m_device, //
|
|
nvh::loadFile("shaders/raytrace.rchit.spv", true, paths, true));
|
|
|
|
vk::RayTracingShaderGroupCreateInfoKHR hg{vk::RayTracingShaderGroupTypeKHR::eTrianglesHitGroup,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR,
|
|
VK_SHADER_UNUSED_KHR, VK_SHADER_UNUSED_KHR};
|
|
stages.push_back({{}, vk::ShaderStageFlagBits::eClosestHitKHR, chitSM, "main"});
|
|
hg.setClosestHitShader(static_cast<uint32_t>(stages.size() - 1));
|
|
m_rtShaderGroups.push_back(hg);
|
|
````
|
|
|
|
Note that if the geometry were not triangles, we would have set the `type` to `eProceduralHitGroup`, and would have to
|
|
define an intersection shader.
|
|
|
|
After creating the shader groups, we need to set up the pipeline layout that will describe how the pipeline
will access external data:

```` C
  vk::PipelineLayoutCreateInfo pipelineLayoutCreateInfo;
````

We first add the push constant range to allow the ray tracing shaders to access the global uniform values:

```` C
  // Push constant: we want to be able to update constants used by the shaders
  vk::PushConstantRange pushConstant{vk::ShaderStageFlagBits::eRaygenKHR
                                         | vk::ShaderStageFlagBits::eClosestHitKHR
                                         | vk::ShaderStageFlagBits::eMissKHR,
                                     0, sizeof(RtPushConstant)};
  pipelineLayoutCreateInfo.setPushConstantRangeCount(1);
  pipelineLayoutCreateInfo.setPPushConstantRanges(&pushConstant);
````

As described earlier, the pipeline uses two descriptor sets: `set=0` is specific to the ray tracing pipeline (TLAS and
output image), and `set=1` is shared with the rasterization (scene data):

```` C
  // Descriptor sets: one specific to ray tracing, and one shared with the rasterization pipeline
  std::vector<vk::DescriptorSetLayout> rtDescSetLayouts = {m_rtDescSetLayout, m_descSetLayout};
  pipelineLayoutCreateInfo.setSetLayoutCount(static_cast<uint32_t>(rtDescSetLayouts.size()));
  pipelineLayoutCreateInfo.setPSetLayouts(rtDescSetLayouts.data());
````

The pipeline layout information is now complete, allowing us to create the layout itself.

```` C
  m_rtPipelineLayout = m_device.createPipelineLayout(pipelineLayoutCreateInfo);
````

The creation of the ray tracing pipeline is different from the classical graphics pipeline. In the graphics pipeline we
simply need to fill in the fixed set of programmable stages (vertex, fragment, etc.). The ray tracing pipeline can
contain an arbitrary number of stages depending on the number of active shaders in the scene.

We first provide all the stages that will be used:

```` C
  // Assemble the shader stages and recursion depth info into the ray tracing pipeline
  vk::RayTracingPipelineCreateInfoKHR rayPipelineInfo;
  rayPipelineInfo.setStageCount(static_cast<uint32_t>(stages.size()));  // Stages are shaders
  rayPipelineInfo.setPStages(stages.data());
````

Then, we indicate how the shaders can be assembled into groups. A ray generation or miss shader is a group by
itself, but hit groups can comprise up to 3 shaders (intersection, any hit, closest hit).

```` C
  rayPipelineInfo.setGroupCount(
      static_cast<uint32_t>(m_rtShaderGroups.size()));  // 1-raygen, n-miss, n-(hit[+anyhit+intersect])
  rayPipelineInfo.setPGroups(m_rtShaderGroups.data());
````

The ray generation and closest hit shaders can trace rays, making the ray tracing a potentially recursive process. To
allow the underlying RTX layer to optimize the pipeline, we indicate the maximum recursion depth used by our shaders. For
the simplistic shaders we currently have, we set this depth to 1, meaning that even if the shaders would trigger
recursion (i.e. a hit shader calling `traceRayEXT()`), this recursion would be prevented by setting the result of this trace
call as a miss. Note that it is preferable to keep the recursion level as low as possible, replacing it by a loop
formulation instead.

```` C
  rayPipelineInfo.setMaxPipelineRayRecursionDepth(1);  // Ray depth
  rayPipelineInfo.setLayout(m_rtPipelineLayout);
  m_rtPipeline = static_cast<const vk::Pipeline&>(
      m_device.createRayTracingPipelineKHR({}, {}, rayPipelineInfo));
````

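As an aside, the loop formulation mentioned above means accumulating bounces iteratively in the ray generation shader
instead of recursing from the hit shaders. A purely hypothetical GLSL sketch (not part of this tutorial's shaders, and
assuming the payload carries whatever is needed to continue a path, e.g. the next origin and direction):

```` C
  // Hypothetical iterative formulation in a ray generation shader: every bounce is traced
  // from here, so a pipeline recursion depth of 1 is sufficient.
  vec3 color      = vec3(0);
  vec3 throughput = vec3(1);
  for(int bounce = 0; bounce < MAX_BOUNCES; ++bounce)
  {
    traceRayEXT(topLevelAS, rayFlags, 0xFF, 0, 0, 0, origin.xyz, tMin, direction.xyz, tMax, 0);
    color += throughput * prd.hitValue;
    // The payload would also have to return the next origin/direction and a termination flag
    // for the loop to continue; omitted here for brevity.
  }
````
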
Once the pipeline has been created we discard the shader modules:

```` C
  m_device.destroy(raygenSM);
  m_device.destroy(missSM);
  m_device.destroy(chitSM);
}
````

The pipeline layout and the pipeline itself also have to be cleaned up upon closing, hence we add this to
`destroyResources`:

```` C
  m_device.destroy(m_rtPipeline);
  m_device.destroy(m_rtPipelineLayout);
````

## main

In the `main` function, we call the pipeline construction after the other ray tracing calls:

```` C
  helloVk.createRtPipeline();
````

# Shader Binding Table

In a typical rasterization setup, a current shader and its associated resources are bound prior to drawing the
corresponding objects, then another shader and resource set can be bound for some other objects, and so on. Since ray
tracing can hit any surface of the scene at any time, all shaders must be available simultaneously.

The Shader Binding Table is the blueprint of the ray tracing process. It indicates which ray generation shader to start
with, which miss shader to execute if no intersections are found, and which hit shader groups can be executed for each
instance. This association between instances and shader groups is created when setting up the geometry: for each
instance we provided a `hitGroupId` in the TLAS. This value represents the index in the SBT corresponding to the hit
group for that instance.

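As a reminder, that value was assigned when filling the TLAS instances earlier in this tutorial; a short sketch of that
assignment (the exact instance structure and field names may differ slightly in your code):

```` C
  // Sketch: every instance points at hit group 0 in the SBT, since we only have one hit group
  nvvk::RaytracingBuilderKHR::Instance rayInst;
  rayInst.hitGroupId = 0;  // Index of the hit group in the SBT, shared by all objects here
````
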
## Handles

The SBT is an array containing the handles to the shader groups used in the ray tracing pipeline. In our example, we
will create a buffer for the three groups: raygen, miss and closest hit. The size of a handle is given by the
`shaderGroupHandleSize` member of the ray tracing properties, but the offset of each entry needs to be aligned on
`shaderGroupBaseAlignment`. We will then allocate a buffer of size `3 * shaderGroupBaseAlignment` and will consecutively
write the handle of each shader group. To retrieve all the handles, we will call `vkGetRayTracingShaderGroupHandlesKHR`.

The buffer will have the following information, which will later be used when calling `vkCmdTraceRaysKHR`:

******************
*+--------------+*
*|    RayGen    |*
*|    Handle    |*
*+--------------+*
*|     Miss     |*
*|    Handle    |*
*+--------------+*
*|   HitGroup   |*
*|    Handle    |*
*+--------------+*
******************

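For a concrete feel of the sizes involved, here is a small sketch of the alignment arithmetic. The numeric values are
only an example (many NVIDIA GPUs report a handle size of 32 and a base alignment of 64, but your driver may report
different values):

```` C
  // Example only: round the handle size up to the base alignment to get the stride of one SBT entry
  uint32_t handleSize    = 32;  // e.g. m_rtProperties.shaderGroupHandleSize
  uint32_t baseAlignment = 64;  // e.g. m_rtProperties.shaderGroupBaseAlignment
  uint32_t entryStride   = nvh::align_up(handleSize, baseAlignment);  // -> 64
  uint32_t sbtSize       = 3 * entryStride;                           // -> 192 bytes for 3 groups
````
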
We first add the declarations of the SBT creation method and the SBT buffer itself in the `HelloVulkan` class:

```` C
  void createRtShaderBindingTable();
  nvvk::Buffer m_rtSBTBuffer;
````

In this function, we start by computing the size of the binding table from the number of groups and the handle size so
that we can allocate the SBT buffer.

```` C
//--------------------------------------------------------------------------------------------------
// The Shader Binding Table (SBT)
// - getting all shader handles and writing them in a SBT buffer
// - Besides exceptions, this can always be done exactly like this
// See how the SBT buffer is used in run()
//
void HelloVulkan::createRtShaderBindingTable()
{
  auto groupCount =
      static_cast<uint32_t>(m_rtShaderGroups.size());               // 3 shaders: raygen, miss, chit
  uint32_t groupHandleSize = m_rtProperties.shaderGroupHandleSize;  // Size of a program identifier
  uint32_t groupSizeAligned =
      nvh::align_up(groupHandleSize, m_rtProperties.shaderGroupBaseAlignment);

  // Fetch all the shader handles used in the pipeline, so that they can be written in the SBT
  uint32_t sbtSize = groupCount * groupSizeAligned;
````

We then fetch the handles to the shader groups of the pipeline, and let the allocator
allocate the device memory and copy the handles into the SBT. Note that the SBT buffer needs the
`VK_BUFFER_USAGE_SHADER_BINDING_TABLE_BIT_KHR` usage flag, and since we will later query the address
of the SBT buffer, it also needs the `VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT` flag.

```` C
  std::vector<uint8_t> shaderHandleStorage(sbtSize);
  m_device.getRayTracingShaderGroupHandlesKHR(m_rtPipeline, 0, groupCount, sbtSize,
                                              shaderHandleStorage.data());

  // Allocate the SBT buffer
  m_rtSBTBuffer = m_alloc.createBuffer(
      sbtSize,
      vk::BufferUsageFlagBits::eTransferSrc | vk::BufferUsageFlagBits::eShaderDeviceAddress
          | vk::BufferUsageFlagBits::eShaderBindingTableKHR,
      vk::MemoryPropertyFlagBits::eHostVisible | vk::MemoryPropertyFlagBits::eHostCoherent);
  m_debug.setObjectName(m_rtSBTBuffer.buffer, std::string("SBT").c_str());

  // Write the handles in the SBT, one aligned entry per group
  void* mapped = m_alloc.map(m_rtSBTBuffer);
  auto* pData  = reinterpret_cast<uint8_t*>(mapped);
  for(uint32_t g = 0; g < groupCount; g++)
  {
    memcpy(pData, shaderHandleStorage.data() + g * groupHandleSize, groupHandleSize);
    pData += groupSizeAligned;
  }
  m_alloc.unmap(m_rtSBTBuffer);

  m_alloc.finalizeAndReleaseStaging();
}
````

As with other resources, we destroy the SBT in `destroyResources`:

```` C
  m_alloc.destroy(m_rtSBTBuffer);
````

## main

In the `main` function, we now add the construction of the Shader Binding Table:

```` C
  helloVk.createRtShaderBindingTable();
````

# Ray Tracing

Let's create a function that will call the execution of the ray tracer. First, add the declaration to the header:

```` C
  void raytrace(const vk::CommandBuffer& cmdBuf, const nvmath::vec4f& clearColor);
````

We first bind the pipeline and its layout, and set the push constants that will be available throughout the pipeline:

```` C
//--------------------------------------------------------------------------------------------------
// Ray Tracing the scene
//
void HelloVulkan::raytrace(const vk::CommandBuffer& cmdBuf, const nvmath::vec4f& clearColor)
{
  m_debug.beginLabel(cmdBuf, "Ray trace");
  // Initializing push constant values
  m_rtPushConstants.clearColor     = clearColor;
  m_rtPushConstants.lightPosition  = m_pushConstant.lightPosition;
  m_rtPushConstants.lightIntensity = m_pushConstant.lightIntensity;
  m_rtPushConstants.lightType      = m_pushConstant.lightType;

  cmdBuf.bindPipeline(vk::PipelineBindPoint::eRayTracingKHR, m_rtPipeline);
  cmdBuf.bindDescriptorSets(vk::PipelineBindPoint::eRayTracingKHR, m_rtPipelineLayout, 0,
                            {m_rtDescSet, m_descSet}, {});
  cmdBuf.pushConstants<RtPushConstant>(m_rtPipelineLayout,
                                       vk::ShaderStageFlagBits::eRaygenKHR
                                           | vk::ShaderStageFlagBits::eClosestHitKHR
                                           | vk::ShaderStageFlagBits::eMissKHR,
                                       0, m_rtPushConstants);
````

Since the structure of the Shader Binding Table is up to the developer, we need to indicate to the ray tracing pipeline
how to interpret it. In particular, we compute the offsets in the SBT where the ray generation shader, miss shaders and
hit groups can be found. Miss shaders and hit groups are stored contiguously, hence we also compute the stride separating
each shader. In our case the stride is simply the aligned size of a shader group handle, but more advanced uses may embed
shader-group-specific data within the SBT, resulting in a larger stride.

```` C
  // Size of a program identifier
  uint32_t groupSize =
      nvh::align_up(m_rtProperties.shaderGroupHandleSize, m_rtProperties.shaderGroupBaseAlignment);
  uint32_t          groupStride = groupSize;
  vk::DeviceAddress sbtAddress  = m_device.getBufferAddress({m_rtSBTBuffer.buffer});

  using Stride = vk::StridedDeviceAddressRegionKHR;
  std::array<Stride, 4> strideAddresses{
      Stride{sbtAddress + 0u * groupSize, groupStride, groupSize * 1},  // raygen
      Stride{sbtAddress + 1u * groupSize, groupStride, groupSize * 1},  // miss
      Stride{sbtAddress + 2u * groupSize, groupStride, groupSize * 1},  // hit
      Stride{0u, 0u, 0u}};                                              // callable
````

We can finally call `traceRaysKHR`, which will add the ray tracing launch to the command buffer. Note that the SBT buffer
is mentioned several times. This is due to the possibility of separating the SBT into several buffers, one for each
type: ray generation, miss shaders, hit groups, and callable shaders (outside the scope of this tutorial). The last
three parameters are equivalent to the grid size of a compute launch, and represent the total number of threads. Since
we want to trace one ray per pixel, the grid size has the width and height of the output image, and a depth of 1.

```` C
  cmdBuf.traceRaysKHR(&strideAddresses[0], &strideAddresses[1], &strideAddresses[2],
                      &strideAddresses[3],              //
                      m_size.width, m_size.height, 1);  //

  m_debug.endLabel(cmdBuf);
}
````

# Let's Ray Trace

Now we have everything set up to be able to trace rays: the acceleration structure, the descriptor sets, the ray tracing
pipeline and the shader binding table. Let's try to make images from this.

## main

In the `main` function, we will define a local variable to switch between rasterization and ray tracing. Add the
following right after the ray tracing initialization calls:

```` C
  bool useRaytracer = true;
````

In the same function, we will add a UI checkbox to make that switch at runtime. Right after the line
`ImGui::ColorEdit3(`, we add

```` C
  ImGui::Checkbox("Ray Tracer mode", &useRaytracer);  // Switch between raster and ray tracing
````

A few lines below, you can find a block containing the `helloVk.rasterize` call. Since our application will now have two
render modes, we replace that block by

```` C
  // Rendering Scene
  if(useRaytracer)
  {
    helloVk.raytrace(cmdBuff, clearColor);
  }
  else
  {
    cmdBuff.beginRenderPass(offscreenRenderPassBeginInfo, vk::SubpassContents::eInline);
    helloVk.rasterize(cmdBuff);
    cmdBuff.endRenderPass();
  }
````

Note that the ray tracing behaves more like a compute shader than a graphics task, and is therefore recorded outside of a render pass.

We should now be able to alternate between rasterization and ray tracing. However, the ray tracing result only renders a
flat gray image: the simplistic ray generation shader does not trace any ray yet, and simply returns a fixed color.

Raster | | Ray Trace
:-----------------------------:|:---:|:--------------------------------:
 | <-> | 

# Camera Setup

In the context of rasterization, the vertices of the objects are projected from their world-space position into a
$[0,1]\times[0,1]\times[0,1]$ cube, before being rasterized on the XY plane. For ray tracing, we need to initialize some
rays at the camera position, and intersect the geometry in world space. To achieve this, we need to store the inverse
view and projection matrices in the `CameraMatrices` at the beginning of the `hello_vulkan.cpp` file:

```` C
struct CameraMatrices
{
  nvmath::mat4f view;
  nvmath::mat4f proj;
  nvmath::mat4f viewInverse;
  // #VKRay
  nvmath::mat4f projInverse;
};
````

Since the camera matrices will be used by the ray generation shader (see the next subsection), the corresponding
descriptor set binding must also include that stage in its stage flags. This was done in the section Additions to the
Scene Descriptor Set.

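As a reminder, the camera uniform buffer binding then looks roughly like this. This is only a sketch: `vkDSLB`, `vkDT`
and `vkSS` are the shorthands used elsewhere in this tutorial, and the container name `m_descSetLayoutBind` may differ
in your code:

```` C
  // Sketch: the camera matrices (binding 0) are visible to the vertex and ray generation stages
  m_descSetLayoutBind.emplace_back(
      vkDSLB(0, vkDT::eUniformBuffer, 1, vkSS::eVertex | vkSS::eRaygenKHR));
````
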
## updateUniformBuffer

The computation of the matrix inverses is done in `updateUniformBuffer`, after setting the `ubo.proj` matrix:

```` C
  // #VKRay
  ubo.projInverse = nvmath::invert(ubo.proj);
````

## Ray generation (raytrace.rgen)

It is now time to enrich the ray generation shader to allow it to trace rays. We will first add a new binding to allow
the shader to access the camera matrices.

```` C
layout(binding = 0, set = 1) uniform CameraProperties
{
  mat4 view;
  mat4 proj;
  mat4 viewInverse;
  mat4 projInverse;
}
cam;
````

!!! Note: Binding
    The camera buffer uses `binding = 0`, as described in `createDescriptorSetLayout()`. The
    `set = 1` comes from the fact that it is the second descriptor set bound in `raytrace()`.

When tracing a ray, the hit or miss shaders need to be able to return some information to the shader program that
invoked the ray tracing. This is done through the use of a payload, identified by the `rayPayloadEXT` qualifier.

Since the payload struct will be reused in several shaders, we create a new shader file `raycommon.glsl` and add it to
the Visual Studio folder.

This file contains only the payload definition:

~~~~ C++
struct hitPayload
{
  vec3 hitValue;
};
~~~~

We now modify `raytrace.rgen` to include this new file. Note that the `#include` directive is a GLSL extension, which
we also enable:

~~~~ C++
#extension GL_GOOGLE_include_directive : enable
#include "raycommon.glsl"
~~~~

The payload, identified with `rayPayloadEXT`, is then our `hitPayload` structure.

```` C
layout(location = 0) rayPayloadEXT hitPayload prd;
````

### Note

> In incoming shaders, like miss and closest hit, the payload will be `rayPayloadInEXT`.

The `main` function of the shader then starts by computing the floating-point pixel coordinates, normalized between 0
and 1. The `gl_LaunchIDEXT` contains the integer coordinates of the pixel being rendered, while `gl_LaunchSizeEXT`
corresponds to the image size provided when calling `vkCmdTraceRaysKHR`.

```` C
void main()
{
  const vec2 pixelCenter = vec2(gl_LaunchIDEXT.xy) + vec2(0.5);
  const vec2 inUV        = pixelCenter / vec2(gl_LaunchSizeEXT.xy);
  vec2       d           = inUV * 2.0 - 1.0;
````

From the pixel coordinates, we can apply the inverse transformation of the view and projection matrices of the camera to
obtain the origin and direction of the ray.

```` C
  vec4 origin    = cam.viewInverse * vec4(0, 0, 0, 1);
  vec4 target    = cam.projInverse * vec4(d.x, d.y, 1, 1);
  vec4 direction = cam.viewInverse * vec4(normalize(target.xyz), 0);
````

In addition, we provide some flags for the ray: first, a flag indicating that all geometry will be considered opaque, as
we also indicated when creating the acceleration structures. We also indicate the minimum and maximum distance of the
potential intersections along the ray. Those distances can be useful to reduce the ray tracing costs if intersections
before or after a given point do not matter. A typical use case is for computing ambient occlusion.

```` C
  uint  rayFlags = gl_RayFlagsOpaqueEXT;
  float tMin     = 0.001;
  float tMax     = 10000.0;
````

We now trace the ray itself, by first providing `traceRayEXT` with the top-level acceleration structure and the ray flags.
The `cullMask` value is a mask that will be binary AND-ed with the mask of the geometry instances. Since all instances
have a `0xFF` mask as well, they will all be visible. The next 3 parameters indicate which hit group will be called
when hitting a surface. For example, a single object may be associated with 2 hit groups representing the behavior when
hit by a direct camera ray, or by a shadow ray. Since each instance has an index indicating the offset of the hit
groups for the instance in the shader binding table, the `sbtRecordOffset` allows fetching the right kind of shader
for that instance. In the case of the primary rays we may want to use the first hit group and use an offset of 0, while
for shadow rays the second hit group would be required, hence an offset of 1. The stride indicates the number of hit
groups for a single instance. This is particularly useful if the instance offset is not set when creating the instances
in the acceleration structure. A stride of 0 indicates that all hit groups are packed together, and the instance offset
can be used directly to find them in the SBT. The index of the miss shader comes next, followed by the ray origin,
direction and extents. The last parameter identifies the payload that will be carried by the ray, by giving its location
index. The last `0` corresponds to the location of our payload, `layout(location = 0) rayPayloadEXT hitPayload prd;`.

```` C
  traceRayEXT(topLevelAS,     // acceleration structure
              rayFlags,       // rayFlags
              0xFF,           // cullMask
              0,              // sbtRecordOffset
              0,              // sbtRecordStride
              0,              // missIndex
              origin.xyz,     // ray origin
              tMin,           // ray min range
              direction.xyz,  // ray direction
              tMax,           // ray max range
              0               // payload (location = 0)
  );
````

Finally, we write the resulting payload into the output image.

```` C
  imageStore(image, ivec2(gl_LaunchIDEXT.xy), vec4(prd.hitValue, 1.0));
}
````

Raster | | Ray Trace
:-----------------------------:|:---:|:--------------------------------:
 | <-> | 

## Miss shader (raytrace.rmiss)

To share the clear color of the rasterization with the ray tracer, we will change the return value of the miss shader to
return the clear value passed as a push constant. While the `Constants` struct contains more members, here we use the
fact that `clearColor` is the first member in the struct, and do not even declare the subsequent members.

```` C
#extension GL_GOOGLE_include_directive : enable
#include "raycommon.glsl"

layout(location = 0) rayPayloadInEXT hitPayload prd;

layout(push_constant) uniform Constants
{
  vec4 clearColor;
};

void main()
{
  prd.hitValue = clearColor.xyz * 0.8;
}
````

!!! Note:
    The color of the background is slightly darker to differentiate the two renderers.

# Simple Lighting

The current closest hit shader only returns a flat color. To add some lighting, we will need to introduce the concept of
surface normals. However, the ray tracing only provides the barycentric coordinates of the hit point. To obtain the
normals and the other vertex attributes, we will need to find them in the vertex buffer and interpolate them using the
barycentric coordinates. This is why we extended the usage of the vertex and index buffers when creating the ray tracing
descriptor set.

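Concretely, if the triangle vertices carry attributes $a_0$, $a_1$ and $a_2$ and the reported hit attributes are $(u, v)$,
the interpolated value is $a = (1 - u - v)\,a_0 + u\,a_1 + v\,a_2$. This is exactly the weighting applied below to the
normals, positions and texture coordinates.
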
## Closest Hit (raytrace.rchit)

When we created the ray tracing descriptor set, we already included the geometry definition. Therefore, we can reference
the vertex and index buffers directly in the closest hit shader, via the scene description buffer at `binding = 2`.

We first include the payload definition and the OBJ-Wavefront structures:

```` C
#extension GL_EXT_scalar_block_layout : enable
#extension GL_GOOGLE_include_directive : enable
#include "raycommon.glsl"
#include "wavefront.glsl"
````

Then we describe the resources according to the descriptor set layout:

```` C
layout(location = 0) rayPayloadInEXT hitPayload prd;

layout(binding = 2, set = 1, scalar) buffer ScnDesc { sceneDesc i[]; } scnDesc;
layout(binding = 5, set = 1, scalar) buffer Vertices { Vertex v[]; } vertices[];
layout(binding = 6, set = 1) buffer Indices { uint i[]; } indices[];
````

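The barycentric coordinates of the hit are delivered to the closest hit shader through a hit attribute. The `main`
function below reads them from a declaration along these lines (the variable name `attribs` matches its later use):

```` C
// Barycentric coordinates (u, v) of the hit point inside the triangle
hitAttributeEXT vec3 attribs;
````
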
In the Hit shader we need all the members of the push constant block:

```` C
layout(push_constant) uniform Constants
{
  vec4  clearColor;
  vec3  lightPosition;
  float lightIntensity;
  int   lightType;
}
pushC;
````

In the `main` function, the `gl_PrimitiveID` allows us to find the vertices of the triangle hit by the ray:

```` C
void main()
{
  // Object of this instance
  uint objId = scnDesc.i[gl_InstanceID].objId;

  // Indices of the triangle
  ivec3 ind = ivec3(indices[nonuniformEXT(objId)].i[3 * gl_PrimitiveID + 0],   //
                    indices[nonuniformEXT(objId)].i[3 * gl_PrimitiveID + 1],   //
                    indices[nonuniformEXT(objId)].i[3 * gl_PrimitiveID + 2]);  //
  // Vertex of the triangle
  Vertex v0 = vertices[nonuniformEXT(objId)].v[ind.x];
  Vertex v1 = vertices[nonuniformEXT(objId)].v[ind.y];
  Vertex v2 = vertices[nonuniformEXT(objId)].v[ind.z];
````

Using the hit point's barycentric coordinates, we can interpolate the normal:

```` C
  const vec3 barycentrics = vec3(1.0 - attribs.x - attribs.y, attribs.x, attribs.y);

  // Computing the normal at hit position
  vec3 normal = v0.nrm * barycentrics.x + v1.nrm * barycentrics.y + v2.nrm * barycentrics.z;
  // Transforming the normal to world space
  normal = normalize(vec3(scnDesc.i[gl_InstanceID].transfoIT * vec4(normal, 0.0)));
````

The world-space position can be calculated in two ways. The first one uses the information available in the hit
shader, but this could have precision issues if the hit point is very far away.

```` C
  vec3 worldPos = gl_WorldRayOriginEXT + gl_WorldRayDirectionEXT * gl_HitTEXT;
````

Another, more precise, solution consists of computing the position by interpolation, as was done for the normal:

```` C
  // Computing the coordinates of the hit position
  vec3 worldPos = v0.pos * barycentrics.x + v1.pos * barycentrics.y + v2.pos * barycentrics.z;
  // Transforming the position to world space
  worldPos = vec3(scnDesc.i[gl_InstanceID].transfo * vec4(worldPos, 1.0));
````

The light source specified in the constants can then be used to compute the dot product of the normal with the lighting
direction, giving a simple diffuse lighting effect:

```` C
  // Vector toward the light
  vec3  L;
  float lightIntensity = pushC.lightIntensity;
  float lightDistance  = 100000.0;
  // Point light
  if(pushC.lightType == 0)
  {
    vec3 lDir      = pushC.lightPosition - worldPos;
    lightDistance  = length(lDir);
    lightIntensity = pushC.lightIntensity / (lightDistance * lightDistance);
    L              = normalize(lDir);
  }
  else  // Directional light
  {
    L = normalize(pushC.lightPosition - vec3(0));
  }

  float dotNL = max(dot(normal, L), 0.2);

  prd.hitValue = vec3(dotNL);
}
````



# Simple Materials

The rendering above could be made more interesting by adding support for materials. The imported OBJ objects provide
simplified Alias Wavefront material definitions.

## raytrace.rchit

These materials define their basic reflectance properties using simple color coefficients, and also support texturing.
The buffer containing the materials has already been created for rasterization, and has also been added into the ray
tracing descriptor set. Add the binding of the material buffer and the array of texture samplers:

```` C
layout(binding = 1, set = 1, scalar) buffer MatColorBufferObject { WaveFrontMaterial m[]; } materials[];
layout(binding = 3, set = 1) uniform sampler2D textureSamplers[];
layout(binding = 4, set = 1) buffer MatIndexColorBuffer { int i[]; } matIndex[];
````

The declaration of the material is the same as that used for the rasterizer and is defined in
`wavefront.glsl`.

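For reference, the `WaveFrontMaterial` structure in `wavefront.glsl` holds the usual OBJ/MTL coefficients; it contains
roughly the following fields (member names are taken from the sample code and may differ slightly in your version):

```` C
struct WaveFrontMaterial
{
  vec3  ambient;        // Ka
  vec3  diffuse;        // Kd
  vec3  specular;       // Ks
  vec3  transmittance;  // Kt
  vec3  emission;       // Ke
  float shininess;      // Ns
  float ior;            // Index of refraction (Ni)
  float dissolve;       // Opacity (d), 1 == opaque
  int   illum;          // Illumination model
  int   textureId;      // Diffuse texture index, -1 if none
};
````
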
Each triangle also has an entry in the material index buffer (`matIndex`), which we will use to find the corresponding
material in the material buffer.

We first remove these lines at the end of `main()`

```` C
  float dotNL = max(dot(normal, L), 0.2);
  prd.hitValue = vec3(dotNL);
````

and fetch the material definition instead:

```` C
  // Material of the object
  int               matIdx = matIndex[nonuniformEXT(objId)].i[gl_PrimitiveID];
  WaveFrontMaterial mat    = materials[nonuniformEXT(objId)].m[matIdx];
````

!!! Note Note
    There is one buffer of materials per object, and each material can be accessed via its index.
    Each triangle stores the index of the material it uses.

From that material definition, we use the diffuse and specular reflectances to compute the lighting. This code also
supports textures to modulate the surface albedo.

```` C
  // Diffuse
  vec3 diffuse = computeDiffuse(mat, L, normal);
  if(mat.textureId >= 0)
  {
    uint txtId = mat.textureId + scnDesc.i[gl_InstanceID].txtOffset;
    vec2 texCoord =
        v0.texCoord * barycentrics.x + v1.texCoord * barycentrics.y + v2.texCoord * barycentrics.z;
    diffuse *= texture(textureSamplers[nonuniformEXT(txtId)], texCoord).xyz;
  }

  // Specular
  vec3 specular = computeSpecular(mat, gl_WorldRayDirectionEXT, L, normal);
````

The final lighting is then computed as

```` C
  prd.hitValue = vec3(lightIntensity * (diffuse + specular));
````



## main

The OBJ model is loaded in `main.cpp` by calling `helloVk.loadModel`. Let's load something more interesting than a cube:

```` C
  // Creation of the example
  helloVk.loadModel(nvh::findFile("media/scenes/Medieval_building.obj", defaultSearchPaths, true));
  helloVk.loadModel(nvh::findFile("media/scenes/plane.obj", defaultSearchPaths, true));
````

Since that model is larger, we can change the `CameraManip.setLookat` call to

```` C
  CameraManip.setLookat(nvmath::vec3f(4, 4, 4), nvmath::vec3f(0, 1, 0), nvmath::vec3f(0, 1, 0));
````



# Shadows

The above allows us to ray trace a scene and apply some lighting, but it is still missing shadows. To this end, we will
add a new ray type, and shoot rays from the closest hit shader. This new ray type will require adding a new miss shader.

## `createRtPipeline`

For simple shadow rays we only need to know whether some geometry was hit along the ray or not. This can be achieved
using a Boolean payload initialized as if a hit were found, and tracing the ray with only an additional miss shader that
will set the payload to "no hit".

!!! Warning: [Download Shadow Shader](files/shadowShaders.zip)
    Download and add shader file

This archive contains only one file, `raytraceShadow.rmiss`. Add this file to the `src/shaders` directory and rerun
CMake. The shader file should compile, and the resulting SPIR-V file should be stored in the `shaders` folder alongside
the GLSL file.

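For reference, the shadow miss shader is tiny; its content is essentially the following sketch (use the downloaded file
as the authoritative version):

```` C
#version 460
#extension GL_EXT_ray_tracing : require

layout(location = 1) rayPayloadInEXT bool isShadowed;

void main()
{
  isShadowed = false;
}
````
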
In the body of `createRtPipeline`, we need to define the new miss shader right after the previous miss shader:

```` C
  // The second miss shader is invoked when a shadow ray misses the geometry. It
  // simply indicates that no occlusion has been found
  vk::ShaderModule shadowmissSM =
      nvvk::createShaderModule(m_device,
                               nvh::loadFile("shaders/raytraceShadow.rmiss.spv", true, paths, true));
````

After pushing the miss shader `missSM`, we also push the miss shader for the shadow rays:

```` C
  // Shadow Miss
  stages.push_back({{}, vk::ShaderStageFlagBits::eMissKHR, shadowmissSM, "main"});
  mg.setGeneralShader(static_cast<uint32_t>(stages.size() - 1));
  m_rtShaderGroups.push_back(mg);
````

The pipeline now has to allow shooting rays from the closest hit program, which requires increasing the recursion level to 2:

```` C
  // The ray tracing process can shoot rays from the camera, and a shadow ray can be shot from the
  // hit points of the camera rays, hence a recursion level of 2. This number should be kept as low
  // as possible for performance reasons. Even recursive ray tracing should be flattened into a loop
  // in the ray generation to avoid deep recursion.
  rayPipelineInfo.setMaxPipelineRayRecursionDepth(2);  // Ray depth
````

At the end of the method, we destroy the shader module for the shadow miss shader:

```` C
  m_device.destroy(shadowmissSM);
````

## `traceRaysKHR`

The addition of the new miss shader group has modified our shader binding table, which now looks like:

******************
*+--------------+*
*|    RayGen    |*
*|    Handle    |*
*+--------------+*
*|     Miss     |*
*|    Handle    |*
*+--------------+*
*|  ShadowMiss  |*
*|    Handle    |*
*+--------------+*
*|   HitGroup   |*
*|    Handle    |*
*+--------------+*
******************

Therefore, we have to change `HelloVulkan::raytrace` to adjust the closest hit offset before calling `traceRaysKHR`: the
hit group region now starts after the ray generation shader and the two miss shaders. This also points out that in
real-world applications, the SBT creation should be abstracted by a helper so that such offsets are handled
automatically.

```` C
      // The hit group now sits after the raygen and the 2 miss shaders
      Stride{sbtAddress + 3u * groupSize, groupStride, groupSize * 1},  // hit
````

## `createRtDescriptorSet`

For each resource entry in the descriptor set, we indicated which shader stage would be able to use it. Since shadow
rays will be traced from the closest hit shader, we add `vkSS::eClosestHitKHR` to the acceleration structure binding:

```` C
  // Top-level acceleration structure, usable by both the ray generation and the closest hit (to
  // shoot shadow rays)
  m_rtDescSetLayoutBind.emplace_back(
      vkDSLB(0, vkDT::eAccelerationStructureKHR, 1, vkSS::eRaygenKHR | vkSS::eClosestHitKHR));  // TLAS
````

## `raytrace.rchit`

The closest hit shader now needs to be aware of the acceleration structure to be able to shoot rays:

```` C
layout(binding = 0, set = 0) uniform accelerationStructureEXT topLevelAS;
````

Those rays will also carry a payload, which will need to be defined at a different location from the payload of the
current ray. In this case, the payload will be a simple Boolean value indicating whether an occluder has been found or
not:

```` C
layout(location = 1) rayPayloadEXT bool isShadowed;
````

In the `main` function, instead of simply writing the shaded color into the payload, we will first initiate a new ray.
Note that the index of the miss shader is now 1, since the SBT contains 2 miss shaders. The payload location is defined
to match the declaration `layout(location = 1)` above. When invoking `traceRayEXT()` we set the following ray flags:

* `gl_RayFlagsSkipClosestHitShaderEXT`: will not invoke the hit shader, only the miss shader
* `gl_RayFlagsOpaqueEXT`: will not call the any hit shader, so all objects will be opaque
* `gl_RayFlagsTerminateOnFirstHitEXT`: the first hit is enough, since we only need to know whether the ray is blocked

Since we skip the shadow hit group, no code will be invoked when hitting a surface. Therefore, we initialize the payload
`isShadowed` to `true`, and will rely on the miss shader to set it to `false` if no surfaces have been encountered. We
also set the ray flags to optimize the ray tracing: since these simple shadow rays only need to return whether the ray
intersects any surface, we can instruct the ray tracing engine to stop the traversal after finding the first
intersection, without trying to execute a closest hit shader.

Shadow rays only need to be cast if the light is in front of the surface, and specular lighting should not be computed
if we are in shadow (since the light source won't be visible from the shading point). The code that previously computed
the specular term will then look like this:

```` C
  vec3  specular    = vec3(0);
  float attenuation = 1;

  // Tracing shadow ray only if the light is visible from the surface
  if(dot(normal, L) > 0)
  {
    float tMin   = 0.001;
    float tMax   = lightDistance;
    vec3  origin = gl_WorldRayOriginEXT + gl_WorldRayDirectionEXT * gl_HitTEXT;
    vec3  rayDir = L;
    uint  flags =
        gl_RayFlagsTerminateOnFirstHitEXT | gl_RayFlagsOpaqueEXT | gl_RayFlagsSkipClosestHitShaderEXT;
    isShadowed = true;
    traceRayEXT(topLevelAS,  // acceleration structure
                flags,       // rayFlags
                0xFF,        // cullMask
                0,           // sbtRecordOffset
                0,           // sbtRecordStride
                1,           // missIndex
                origin,      // ray origin
                tMin,        // ray min range
                rayDir,      // ray direction
                tMax,        // ray max range
                1            // payload (location = 1)
    );

    if(isShadowed)
    {
      attenuation = 0.3;
    }
    else
    {
      // Specular
      specular = computeSpecular(mat, gl_WorldRayDirectionEXT, L, normal);
    }
  }
````

The final payload value can then be adjusted depending on the result of the shadow ray:

```` C
  prd.hitValue = vec3(lightIntensity * attenuation * (diffuse + specular));
````


|
|
|
|
The final project can be found under the [ray_tracing__simple](https://github.com/nvpro-samples/vk_raytracing_tutorial_KHR/tree/master/ray_tracing__simple) directory.
|
|
|
|
|
|
# Going Further
|
|
|
|
From this point on, you can continue creating your own ray types and shaders, and experiment
|
|
with more advanced ray tracing based algorithms.
|
|
</script>



----

<!-- Markdeep: -->
<script src="https://developer.nvidia.com/sites/default/files/akamai/gameworks/whitepapers/markdeep.min.js?" charset="utf-8"></script>
<script>
window.alreadyProcessedMarkdeep || (document.body.style.visibility = "visible")
</script>