cleanup and refactoring

2024-05-25 11:53:25 +02:00 · 2024-05-25 11:53:25 +02:00 · 76f6bf62a4
commit 76f6bf62a4
parent 2302158928
1285 changed files with 757994 additions and 8 deletions
--- a/raytracer/nvpro_core/nvh/README.md
+++ b/raytracer/nvpro_core/nvh/README.md
@@ -0,0 +1,603 @@
+## Table of Contents
+- [appwindowcamerainertia.hpp](#appwindowcamerainertiahpp)
+- [appwindowprofiler.hpp](#appwindowprofilerhpp)
+- [bitarray.hpp](#bitarrayhpp)
+- [boundingbox.hpp](#boundingboxhpp)
+- [cameracontrol.hpp](#cameracontrolhpp)
+- [camerainertia.hpp](#camerainertiahpp)
+- [cameramanipulator.hpp](#cameramanipulatorhpp)
+- [commandlineparser.hpp](#commandlineparserhpp)
+- [fileoperations.hpp](#fileoperationshpp)
+- [geometry.hpp](#geometryhpp)
+- [gltfscene.hpp](#gltfscenehpp)
+- [inputparser.h](#inputparserh)
+- [misc.hpp](#mischpp)
+- [nvml_monitor.hpp](#nvml_monitorhpp)
+- [nvprint.hpp](#nvprinthpp)
+- [parallel_work.hpp](#parallel_workhpp)
+- [parametertools.hpp](#parametertoolshpp)
+- [primitives.hpp](#primitiveshpp)
+- [profiler.hpp](#profilerhpp)
+- [radixsort.hpp](#radixsorthpp)
+- [shaderfilemanager.hpp](#shaderfilemanagerhpp)
+- [threading.hpp](#threadinghpp)
+- [timesampler.hpp](#timesamplerhpp)
+- [trangeallocator.hpp](#trangeallocatorhpp)
+
+## appwindowcamerainertia.hpp
+### class AppWindowCameraInertia
+
+> AppWindowCameraInertia is a window base class for samples that adds a camera with inertia.
+
+It derives from NVPWindow.
+
+## appwindowprofiler.hpp
+### class nvh::AppWindowProfiler
+
+nvh::AppWindowProfiler provides an alternative utility wrapper class around NVPWindow.
+It is useful to derive single-window applications from and is used by some
+but not all nvpro-samples.
+
+It also provides:
+- built-in profiler/timer reporting to the console
+- command-line argument parsing as well as config file parsing using ParameterTools
+  (see AppWindowProfiler::setupParameters() for the built-in commands)
+- benchmark/automation mode using ParameterTools
+- screenshot creation
+- logfile based on the device name (depends on context)
+- optional context/swapchain interface
+  (the derived classes nvvk/appwindowprofiler_vk and nvgl/appwindowprofiler_gl make use of this)
+
+## bitarray.hpp
+### class nvh::BitArray
+
+
+> The nvh::BitArray class implements a tightly packed boolean array using single bits stored in uint64_t values.
+Whenever you want large boolean arrays this representation is preferred for cache-efficiency.
+The Visitor and OffsetVisitor traversal mechanisms make use of cpu intrinsics to speed up iteration over bits.
+
+Example:
+```cpp
+BitArray modifiedObjects(1024);
+
+// set some bits
+modifiedObjects.setBit(24,true);
+modifiedObjects.setBit(37,true);
+
+// iterate over all set bits using the built-in traversal mechanism
+
+struct MyVisitor {
+  void operator()( size_t index ){
+    // called with the index of a set bit
+    myObjects[index].update();
+  }
+};
+
+MyVisitor visitor;
+modifiedObjects.traverseBits(visitor);
+```
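The traversal idea described above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the nvh::BitArray implementation: the `PackedBits` and `CollectVisitor` types are hypothetical, and a portable loop stands in for the CPU intrinsics that the real class is described as using.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a packed boolean array; not the nvh::BitArray API.
struct PackedBits
{
  std::vector<uint64_t> words;

  explicit PackedBits(size_t numBits)
      : words((numBits + 63) / 64, 0)
  {
  }

  void setBit(size_t index, bool value)
  {
    const uint64_t mask = uint64_t(1) << (index % 64);
    if(value)
      words[index / 64] |= mask;
    else
      words[index / 64] &= ~mask;
  }

  bool getBit(size_t index) const { return ((words[index / 64] >> (index % 64)) & 1u) != 0; }

  // Calls visitor(index) for every set bit, in ascending order.
  // Words that are entirely zero are skipped with a single comparison.
  template <class Visitor>
  void traverseBits(Visitor& visitor) const
  {
    for(size_t w = 0; w < words.size(); w++)
    {
      uint64_t word = words[w];
      while(word != 0)
      {
        // Portable count of trailing zeros; an intrinsic could replace this loop.
        size_t bit = 0;
        while(((word >> bit) & 1u) == 0)
          bit++;
        visitor(w * 64 + bit);
        word &= word - 1;  // clear the lowest set bit
      }
    }
  }
};

// Hypothetical visitor that records the indices it is called with.
struct CollectVisitor
{
  std::vector<size_t> indices;
  void operator()(size_t index) { indices.push_back(index); }
};
```

Skipping zero words is what makes sparse arrays cheap to traverse: the cost scales with the number of set bits plus the number of words, not with the number of bits.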
+
+## boundingbox.hpp
+
+```nvh::Bbox``` is a class to create bounding boxes.
+A box grows as 3D vectors are added to it, and it can be combined with other bounding boxes.
+It also reports information such as its volume, its center, and its min and max.
+
+
+## cameracontrol.hpp
+### class nvh::CameraControl
+
+
+> nvh::CameraControl is a utility class to create a viewmatrix based on mouse inputs.
+
+It can operate in perspective or orthographic mode (`m_sceneOrtho==true`).
+
+perspective:
+- LMB: rotate
+- RMB or WHEEL: zoom via dolly movement
+- MMB: pan/move within camera plane
+
+ortho:
+- LMB: pan/move within camera plane
+- RMB or WHEEL: zoom via dolly movement, application needs to use `m_sceneOrthoZoom` for projection matrix adjustment
+- MMB: rotate
+
+The camera can be orbiting (`m_useOrbit==true`) around `m_sceneOrbit` or
+otherwise provide "first person/fly through"-like controls.
+
+Speed of movement/rotation etc. is influenced by `m_sceneDimension` as well as the
+sensitivity values.
+
+## camerainertia.hpp
+### struct InertiaCamera
+
+>  Struct that offers a camera moving with some inertia effect around a target point
+
+InertiaCamera exposes a mix of pseudo polar rotation around a target point and
+some other movements to translate the target point, zoom in and out.
+
+Either the keyboard or mouse can be used for all of the moves.
+
+## cameramanipulator.hpp
+### class nvh::CameraManipulator
+
+
+nvh::CameraManipulator is a camera manipulator helper class.
+It supports the following movements:
+- Orbit        (LMB)
+- Pan          (LMB + CTRL  | MMB)
+- Dolly        (LMB + SHIFT | RMB)
+- Look Around  (LMB + ALT   | LMB + CTRL + SHIFT)
+
+These are available in several modes:
+- examiner (orbit around object)
+- walk (look up or down but stay on a plane)
+- fly (go toward the interest point)
+
+To use the camera manipulator, do the following:
+- Call setWindowSize() when the application is created and whenever the window size changes
+- Call setLookat() at creation to initialize the camera look position
+- Call setMousePosition() on application mouse down
+- Call mouseMove() on application mouse move
+
+Retrieve the camera matrix by calling getMatrix()
+
+See: appbase_vkpp.hpp
+
+Note: There is a singleton `CameraManip` which can be used across the entire application.
+
+```cpp
+// Retrieve/set camera information
+CameraManip.getLookat(eye, center, up);
+CameraManip.setLookat(eye, center, glm::vec3(m_upVector == 0, m_upVector == 1, m_upVector == 2));
+CameraManip.getFov();
+CameraManip.setSpeed(navSpeed);
+CameraManip.setMode(navMode == 0 ? nvh::CameraManipulator::Examine : nvh::CameraManipulator::Fly);
+// On mouse down, keep mouse coordinates
+CameraManip.setMousePosition(x, y);
+// On mouse move and mouse button down
+if(m_inputs.lmb || m_inputs.rmb || m_inputs.mmb)
+{
+  CameraManip.mouseMove(x, y, m_inputs);
+}
+// Wheel changes the FOV
+CameraManip.wheel(delta > 0 ? 1 : -1, m_inputs);
+// Retrieve the matrix to push to the shader
+m_ubo.view = CameraManip.getMatrix();
+```
+
+
+## commandlineparser.hpp
+Command line parser.
+```cpp
+std::string inFilename = "";
+bool printHelp = false;
+CommandLineParser args("Test Parser");
+args.addArgument({"-f", "--filename"}, &inFilename, "Input filename");
+args.addArgument({"-h", "--help"}, &printHelp, "Print Help");
+bool result = args.parse(argc, argv);
+```
+
+## fileoperations.hpp
+### functions in nvh
+
+
+- nvh::fileExists : check if file exists
+- nvh::findFile : finds filename in provided search directories
+- nvh::loadFile : (multiple overloads) loads file as std::string, binary or text, can also search in provided directories
+- nvh::getFileName : splits filename from filename with path
+- nvh::getFilePath : splits filepath from filename with path
+
+## geometry.hpp
+### namespace nvh::geometry
+
+The geometry namespace provides a few procedural mesh primitives
+that are subdivided.
+
+nvh::geometry::Mesh template uses the provided TVertex which must have a
+constructor from nvh::geometry::Vertex. You can also use nvh::geometry::Vertex
+directly.
+
+It provides triangle indices, as well as outline line indices. The outline indices
+are typical feature lines (rectangle for plane, some circles for sphere/torus).
+
+All basic primitives span the range [-1, 1] along the axes they use.
+
+- nvh::geometry::Plane (x,y subdivision)
+- nvh::geometry::Box (x,y,z subdivision, made of 6 planes)
+- nvh::geometry::Sphere (lat,long subdivision)
+- nvh::geometry::Torus (inner, outer circle subdivision)
+- nvh::geometry::RandomMengerSponge (subdivision, tree depth, probability)
+
+Example:
+
+```cpp
+// single primitive
+nvh::geometry::Box<nvh::geometry::Vertex> box(4,4,4);
+
+// construct from primitives
+
+```
+
+## gltfscene.hpp
+### `nvh::GltfScene`
+
+
+These utilities are for loading glTF models in a
+canonical scene representation. From this representation
+you would create the appropriate 3D API resources (buffers
+and textures).
+
+```cpp
+// Typical Usage
+// Load the GLTF Scene using TinyGLTF
+
+tinygltf::Model    gltfModel;
+tinygltf::TinyGLTF gltfContext;
+std::string        warn, error;
+bool fileLoaded = gltfContext.LoadASCIIFromFile(&gltfModel, &error, &warn, m_filename);
+
+// Fill the data in the gltfScene
+gltfScene.getMaterials(gltfModel);
+gltfScene.getDrawableNodes(gltfModel, GltfAttributes::Normal | GltfAttributes::Texcoord_0);
+
+// Todo in App:
+//   create buffers for vertices and indices, from gltfScene.m_position, gltfScene.m_index
+//   create textures from images: using tinygltf directly
+//   create descriptorSet for material using directly gltfScene.m_materials
+```
+
+
+## inputparser.h
+### class InputParser
+
+> InputParser is a simple command-line parser.
+
+Example usage for: test.exe -f name.txt -size 200 100
+
+Parsing the command line: mandatory '-f' for the filename of the scene
+
+```cpp
+nvh::InputParser parser(argc, argv);
+std::string filename = parser.getString("-f");
+if(filename.empty())  filename = "default.txt";
+if(parser.exist("-size")) {
+  auto values = parser.getInt2("-size");
+}
+```
+
+## misc.hpp
+### functions in nvh
+
+
+- mipMapLevels : compute number of mip maps
+- stringFormat : sprintf for std::string
+- frand : random float using rand()
+- permutation : fills uint vector with random permutation of values [0... vec.size-1]
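Two of the helpers listed above are easy to illustrate. The sketch below is not the nvh code: `mipLevelCount` and `fillPermutation` are hypothetical names and signatures, and the use of `std::mt19937` with an explicit seed is an assumption made so that the example is reproducible.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: number of mip levels for a 2D image, counting the base level.
// The larger dimension is halved until it reaches zero; each step is one level.
inline uint32_t mipLevelCount(uint32_t width, uint32_t height)
{
  uint32_t largest = std::max(width, height);
  uint32_t levels  = 0;
  while(largest > 0)
  {
    levels++;
    largest >>= 1;
  }
  return levels;
}

// Hypothetical helper: fills data with a random permutation of 0 .. data.size()-1.
inline void fillPermutation(std::vector<uint32_t>& data, uint32_t seed)
{
  std::iota(data.begin(), data.end(), 0u);  // 0, 1, 2, ...
  std::mt19937 rng(seed);
  std::shuffle(data.begin(), data.end(), rng);
}
```

For a 1024x512 image the loop runs for 1024, 512, ..., 1, which gives 11 levels.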
+
+## nvml_monitor.hpp
+
+Capture the GPU load and memory for all GPUs on the system.
+
+Usage:
+- There should be only one instance of NvmlMonitor
+- call refresh() in each frame. It will not poll for new measurements more often than the interval (ms)
+- isValid() : returns whether the monitor can be used
+- nbGpu()   : returns the number of GPUs in the computer
+- getGpuInfo()     : static info about the GPU
+- getDeviceMemory() : memory consumption info
+- getDeviceUtilization() : GPU and memory utilization
+- getDevicePerformanceState() : clock speeds and throttle reasons
+- getDevicePowerState() : power, temperature and fan speed
+
+Measurements:
+- Measurements are stored in a cyclic buffer.
+- The offset refers to the most recent measurement.
+
+
+## nvprint.hpp
+Multiple functions and macros that should be used for logging purposes,
+rather than printf. These can print to multiple places at once.
+### Function nvprintf etc
+
+
+Configuration:
+- nvprintSetLevel : sets default loglevel
+- nvprintGetLevel : gets default loglevel
+- nvprintSetLogFileName : sets log filename
+- nvprintSetLogging : sets file logging state
+- nvprintSetCallback : sets custom callback
+
+Printf-style functions and macros.
+These take printf-style specifiers.
+- nvprintf : prints at default loglevel
+- nvprintfLevel : prints at a certain loglevel
+- LOGI : macro that does nvprintfLevel(LOGLEVEL_INFO)
+- LOGW : macro that does nvprintfLevel(LOGLEVEL_WARNING)
+- LOGE : macro that does nvprintfLevel(LOGLEVEL_ERROR)
+- LOGE_FILELINE : macro that does nvprintfLevel(LOGLEVEL_ERROR) combined with filename/line
+- LOGD : macro that does nvprintfLevel(LOGLEVEL_DEBUG) (only in debug builds)
+- LOGOK : macro that does nvprintfLevel(LOGLEVEL_OK)
+- LOGSTATS : macro that does nvprintfLevel(LOGLEVEL_STATS)
+
+std::print-style functions and macros.
+These take std::format-style specifiers
+(https://en.cppreference.com/w/cpp/utility/format/formatter#Standard_format_specification).
+- nvprintLevel : print at a certain loglevel
+- PRINTI : macro that does nvprintLevel(LOGLEVEL_INFO)
+- PRINTW : macro that does nvprintLevel(LOGLEVEL_WARNING)
+- PRINTE : macro that does nvprintLevel(LOGLEVEL_ERROR)
+- PRINTE_FILELINE : macro that does nvprintLevel(LOGLEVEL_ERROR) combined with filename/line
+- PRINTD : macro that does nvprintLevel(LOGLEVEL_DEBUG) (only in debug builds)
+- PRINTOK : macro that does nvprintLevel(LOGLEVEL_OK)
+- PRINTSTATS : macro that does nvprintLevel(LOGLEVEL_STATS)
+
+Safety:
+On error, all functions print an error message.
+All functions are thread-safe.
+Printf-style functions have annotations that should produce warnings at
+compile-time or when performing static analysis. Their format strings may be
+dynamic - but this can be bad if an adversary can choose the content of the
+format string.
+std::print-style functions are safer: they produce compile-time errors, and
+their format strings must be compile-time constants. Dynamic formatting
+should be performed outside of printing, like this:
+```cpp
+ImGui::InputText("Enter a format string: ", userFormat, sizeof(userFormat));
+std::string formatted;  // declared outside the try block so it is still in scope below
+try
+{
+  formatted = fmt::vformat(userFormat, ...);
+}
+catch (const std::exception& e)
+{
+  // (error handling...)
+}
+PRINTI("{}", formatted);
+```
+
+Text encoding:
+Printing to the Windows debug console is the only operation that assumes a
+text encoding, which is ANSI. In all other cases, strings are copied into
+the output.
+
+## parallel_work.hpp
+Distributes loop work across multiple threads in batches of BATCHSIZE items. numItems is the total number
+of items to process.
+
+- batches: fn (uint64_t itemIndex, uint32_t threadIndex);
+  the callback processes a single item
+- ranges:  fn (uint64_t itemBegin, uint64_t itemEnd, uint32_t threadIndex);
+  the callback runs the loop `for (uint64_t itemIndex = itemBegin; itemIndex < itemEnd; itemIndex++)`
+
+
+## parametertools.hpp
+### class nvh::ParameterList
+
+
+The nvh::ParameterList helps parse command-line arguments,
+or command-line arguments stored within ASCII config files.
+
+Parameters always update the values they point to, and optionally
+can trigger a callback that can be provided per-parameter.
+
+```cpp
+ParameterList list;
+std::string   modelFilename;
+float         modelScale;
+
+list.addFilename(".gltf|model filename", &modelFilename);
+list.add("scale|model scale", &modelScale);
+
+list.applyTokens(3, {"blah.gltf","-scale","4"}, "-", "/assets/");
+```
+
+Use in combination with the ParameterSequence class to iterate
+sequences of parameter changes for benchmarking/automation.
+### class nvh::ParameterSequence
+
+
+The nvh::ParameterSequence processes provided tokens in sequences.
+The sequences are terminated by a special "separator" token.
+All tokens between the last iteration and the separator are applied
+to the provided ParameterList.
+Useful to process commands in sequences (automation, benchmarking etc.).
+
+Example:
+
+```cpp
+ParameterSequence sequence;
+ParameterList     list;
+int               mode;
+list.add("mode", &mode);
+
+std::vector<const char*> tokens;
+ParameterList::tokenizeString("benchmark simple -mode 10 benchmark complex -mode 20", tokens);
+sequence.init(&list, tokens);
+
+// 1 means our separator is followed by one argument (simple/complex)
+// "-" as parameters in the string are prefixed with -
+
+while(!sequence.advanceIteration("benchmark", 1, "-")) {
+printf("%d %s mode %d\n", sequence.getIteration(), sequence.getSeparatorArg(0), mode);
+}
+
+// would print:
+//   0 simple mode 10
+//   1 complex mode 20
+```
+
+## primitives.hpp
+### struct `nvh::PrimitiveMesh`
+
+- Common primitive type, made of vertices: position, normal and texture coordinates.
+- All primitives are made of triangles; every three consecutive indices form one triangle.
+
+### struct `nvh::Node`
+
+- Structure to hold a reference to a mesh, with a material and transformation.
+
+Primitives that can be created:
+* Tetrahedron
+* Icosahedron
+* Octahedron
+* Plane
+* Cube
+* SphereUv
+* Cone
+* SphereMesh
+* Torus
+
+Node creator: returns the instance and the position
+* MengerSponge
+* SunFlower
+
+Other utilities
+* mergeNodes
+* removeDuplicateVertices
+* wobblePrimitive
+
+
+## profiler.hpp
+### class nvh::Profiler
+
+
+> The nvh::Profiler class is designed to measure timed sections.
+
+Each section has a cpu and gpu time. Gpu times are typically provided
+by derived classes for each individual api (e.g. OpenGL, Vulkan etc.).
+
+There is functionality to pretty print the sections with their nesting level.
+Multiple profilers can reference the same database, so one profiler
+can serve as master that the others contribute to. Typically the
+base class measuring only CPU time could be the master, and the api
+derived classes reference it to share the same database.
+
+Profiler::Clock can be used standalone for time measuring.
+
+## radixsort.hpp
+### function nvh::radixsort
+
+
+The radixsort function sorts the provided keys based on
+BYTES many bytes stored inside TKey starting at BYTEOFFSET.
+The sorting result is returned as indices into the keys array.
+
+For example:
+
+```cpp
+struct MyData {
+uint32_t objectIdentifier;
+uint16_t objectSortKey;
+};
+
+
+// 4-byte offset of objectSortKey within MyData
+// 2-byte size of sorting key
+
+result = radixsort<4,2>(keys, indicesIn, indicesTemp);
+
+// after sorting the following is true
+
+keys[result[i]].objectSortKey <= keys[result[i + 1]].objectSortKey
+
+// result can point either to indicesIn or indicesTemp (we swap the arrays
+// after each byte iteration)
+```
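The index-based sorting described above can be illustrated with a small least-significant-byte-first radix sort. This is a sketch under stated assumptions, not the nvh::radixsort implementation: `radixSortIndices` is a hypothetical name with a simplified signature that allocates its own index arrays, and it assumes a little-endian key layout, so the byte at BYTEOFFSET is the least significant one.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the nvh::radixsort signature.
// Returns indices into keys such that the BYTES-wide key stored at BYTEOFFSET
// inside each TKey is in ascending order. Assumes little-endian key bytes.
template <size_t BYTEOFFSET, size_t BYTES, class TKey>
std::vector<uint32_t> radixSortIndices(const std::vector<TKey>& keys)
{
  static_assert(BYTEOFFSET + BYTES <= sizeof(TKey), "key bytes must lie inside TKey");

  std::vector<uint32_t> indices(keys.size());
  std::vector<uint32_t> temp(keys.size());
  for(size_t i = 0; i < indices.size(); i++)
    indices[i] = uint32_t(i);

  for(size_t pass = 0; pass < BYTES; pass++)
  {
    // Byte of the key that this pass sorts by.
    auto byteOf = [&](uint32_t index) {
      return reinterpret_cast<const unsigned char*>(&keys[index])[BYTEOFFSET + pass];
    };

    // Histogram, then exclusive prefix sum to get the start of each bucket.
    size_t counts[256] = {};
    for(uint32_t index : indices)
      counts[byteOf(index)]++;
    size_t start = 0;
    for(size_t& c : counts)
    {
      const size_t n = c;
      c              = start;
      start += n;
    }

    // Stable scatter into the other array, then swap the roles of the two arrays.
    for(uint32_t index : indices)
      temp[counts[byteOf(index)]++] = index;
    indices.swap(temp);
  }
  return indices;
}

// Same layout as the example above: the sort key sits 4 bytes into the struct.
struct MyData
{
  uint32_t objectIdentifier;
  uint16_t objectSortKey;
};
```

Because each pass is stable, entries with equal keys keep their original relative order. That is also why the ordering guarantee between neighbours is "less than or equal" rather than strictly "less than".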
+
+## shaderfilemanager.hpp
+### class nvh::ShaderFileManager
+
+
+The nvh::ShaderFileManager class is meant to be derived from to create the actual api-specific
+shader/program managers.
+
+The ShaderFileManager provides a system to find/load shader files.
+It also allows resolving #include instructions in HLSL/GLSL source files.
+Includes can also be registered beforehand so that they point to strings in memory.
+
+If m_handleIncludePasting is true, then `#include`s are replaced by
+the include file contents (recursively) before presenting the
+loaded shader source code to the caller. Otherwise, the include file
+loader is still available but `#include`s are left unchanged.
+
+Furthermore it handles injecting prepended strings (typically used
+for #defines) after the #version statement of GLSL files,
+regardless of m_handleIncludePasting's value.
+
+
+## threading.hpp
+### class nvh::delayed_call 
+
+Class returned by delay_noreturn_for to track the created thread and, if needed, reset the
+delay timer.
+nvh::delay_noreturn_for delays a call to a void function for sleep_duration.
+
+`return`: A delayed_call object that holds the running thread.
+
+Example:
+```cpp
+// Create or update a delayed call to callback. Useful to consolidate multiple events into one call.
+if(!m_delayedCall.delay_for(delay))
+  m_delayedCall = nvh::delay_noreturn_for(delay, callback);
+```
+
+## timesampler.hpp
+### struct TimeSampler
+
+TimeSampler does time sampling work
+### struct nvh::Stopwatch
+
+> Timer in milliseconds.
+
+Starts the timer at creation and the elapsed time is retrieved by calling `elapsed()`.
+The timer can be reset if it needs to start timing later in the code execution.
+
+Usage:
+````cpp
+{
+nvh::Stopwatch sw;
+... work ...
+LOGI("Elapsed: %f ms\n", sw.elapsed()); // --> Elapsed: 128.157 ms
+}
+````
+
+## trangeallocator.hpp
+### class nvh::TRangeAllocator
+
+
+The nvh::TRangeAllocator<GRANULARITY> template allows sub-allocating ranges from a fixed
+maximum size. Ranges are allocated at GRANULARITY and are merged back on freeing.
+Its primary use is within allocators that sub-allocate from fixed-size blocks.
+
+The implementation is based on [MakeID by Emil Persson](http://www.humus.name/3D/MakeID.h).
+
+Example :
+
+```cpp
+TRangeAllocator<256> range;
+
+// initialize to a certain range
+range.init(range.alignedSize(128 * 1024 * 1024));
+
+...
+
+// allocate a sub range
+// example
+uint32_t size = vertexBufferSize;
+uint32_t alignment = vertexAlignment;
+
+uint32_t allocOffset;
+uint32_t allocSize;
+uint32_t alignedOffset;
+
+if (range.subAllocate(size, alignment, allocOffset, alignedOffset, allocSize)) {
+  // ... use the allocation space
+  // [alignedOffset + size] is guaranteed to be within [allocOffset + allocSize]
+}
+
+// give back the memory range for re-use
+range.subFree(allocOffset, allocSize);
+
+...
+
+// at the end cleanup
+range.deinit();
+```
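The guarantee stated in the example comment, that `alignedOffset + size` stays within `allocOffset + allocSize`, follows from simple alignment arithmetic. The sketch below is not the TRangeAllocator implementation: `SubRange` and `placeInFreeRange` are hypothetical, byte units are assumed throughout, and only `alignUp` mirrors a formula that appears in this commit (nvh::align_up in alignment.hpp).

```cpp
#include <cstdint>

// Same formula as nvh::align_up in alignment.hpp; a must be a power of two.
inline uint32_t alignUp(uint32_t x, uint32_t a)
{
  return (x + (a - 1)) & ~(a - 1);
}

// Hypothetical result record; the field names mirror the out-parameters of subAllocate.
struct SubRange
{
  uint32_t allocOffset;    // start of the reserved range
  uint32_t alignedOffset;  // first aligned position inside the reserved range
  uint32_t allocSize;      // reserved size, a multiple of the granularity
};

// Places size bytes with the requested alignment at the start of a free range that
// begins at freeOffset. Assumes alignment and granularity are powers of two.
inline SubRange placeInFreeRange(uint32_t freeOffset, uint32_t size, uint32_t alignment, uint32_t granularity)
{
  SubRange r{};
  r.allocOffset   = freeOffset;
  r.alignedOffset = alignUp(freeOffset, alignment);
  // Reserve the alignment padding plus the payload, rounded up to the granularity.
  r.allocSize = alignUp((r.alignedOffset - r.allocOffset) + size, granularity);
  return r;
}
```

For a free range starting at 256, a 1000-byte request with 1024-byte alignment and a granularity of 256 yields `alignedOffset` 1024 and `allocSize` 1792, so the payload ends at 2024 while the reserved range ends at 2048.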
--- a/raytracer/nvpro_core/nvh/alignment.hpp
+++ b/raytracer/nvpro_core/nvh/alignment.hpp
@@ -0,0 +1,49 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+/// @DOC_SKIP (keyword to exclude this file from automatic README.md generation)
+
+#pragma once
+
+#ifndef NVH_ALIGNEMENT_HPP
+#define NVH_ALIGNEMENT_HPP 1
+
+#include <stddef.h>  // for size_t
+
+namespace nvh {
+template <class integral>
+constexpr bool is_aligned(integral x, size_t a) noexcept
+{
+  return (x & (integral(a) - 1)) == 0;
+}
+
+template <class integral>
+constexpr integral align_up(integral x, size_t a) noexcept
+{
+  return integral((x + (integral(a) - 1)) & ~integral(a - 1));
+}
+
+template <class integral>
+constexpr integral align_down(integral x, size_t a) noexcept
+{
+  return integral(x & ~integral(a - 1));
+}
+}  // namespace nvh
+
+#endif  // !NVH_ALIGNEMENT_HPP
--- a/raytracer/nvpro_core/nvh/appwindowcamerainertia.hpp
+++ b/raytracer/nvpro_core/nvh/appwindowcamerainertia.hpp
@@ -0,0 +1,429 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+//--------------------------------------------------------------------
+#include <nvpwindow.hpp>
+#ifdef WIN32
+#include <windows.h>
+#endif
+
+#include "nvh/camerainertia.hpp"
+#include "nvh/timesampler.hpp"
+#include <imgui/imgui_helper.h>
+
+#ifdef NVP_SUPPORTS_NVTOOLSEXT
+#include "nvh/nsightevents.h"
+#else
+// Note: they are defined inside "nsightevents.h"
+// but let's define them again here as empty defines for the case when NSIGHT is not needed at all
+#define NX_RANGE int
+#define NX_MARK(name)
+#define NX_RANGESTART(name) 0
+#define NX_RANGEEND(id)
+#define NX_RANGEPUSH(name)
+#define NX_RANGEPUSHCOL(name, c)
+#define NX_RANGEPOP()
+#define NXPROFILEFUNC(name)
+#define NXPROFILEFUNCCOL(name, c)
+#define NXPROFILEFUNCCOL2(name, c, a)
+#endif
+
+#include <map>
+#include <string>  // for std::string in ToggleInfo
+using std::map;
+
+#define KEYTAU 0.10f
+//-----------------------------------------------------------------------------
+// GLOBALS
+//-----------------------------------------------------------------------------
+#ifndef WIN32
+struct POINT
+{
+  int x;
+  int y;
+};
+#endif
+struct ToggleInfo
+{
+  bool*       p;
+  bool        addToUI;
+  std::string desc;
+};
+#ifdef WINDOWINERTIACAMERA_EXTERN
+extern std::map<char, ToggleInfo> g_toggleMap;
+#else
+std::map<char, ToggleInfo> g_toggleMap;
+#endif
+inline void addToggleKey(char c, bool* target, const char* desc, bool addToUI = true)
+{
+  LOGI("%s", desc);
+  g_toggleMap[c].desc    = desc;
+  g_toggleMap[c].p       = target;
+  g_toggleMap[c].addToUI = addToUI;
+}
+//------------------------------------------------------------------------------
+//
+//------------------------------------------------------------------------------
+inline void DrawToggles()
+{
+  for(auto& it : g_toggleMap)
+  {
+    if(!it.second.addToUI)
+      continue;
+    bool* pB        = it.second.p;
+    bool  prevValue = *pB;
+    ImGui::Checkbox(it.second.desc.c_str(), pB);
+  }
+}
+
+/* @DOC_START
+# class AppWindowCameraInertia
+> AppWindowCameraInertia is a window base class for samples that adds a camera with inertia.
+
+It derives from NVPWindow.
+@DOC_END */
+class AppWindowCameraInertia : public NVPWindow
+{
+public:
+  AppWindowCameraInertia(const glm::vec3 eye    = glm::vec3(0.0f, 1.0f, -3.0f),
+                         const glm::vec3 focus  = glm::vec3(0, 0, 0),
+                         const glm::vec3 object = glm::vec3(0, 0, 0),
+                         float           fov_   = 50.0,
+                         float           near_  = 0.01f,
+                         float           far_   = 10.0)
+      : m_camera(eye, focus, object)
+  {
+    m_renderCnt          = 1;
+    m_bCameraMode        = true;
+    m_bContinue          = true;
+    m_moveStep           = 0.2f;
+    m_ptLastMousePosit.x = m_ptLastMousePosit.y = 0;
+    m_ptCurrentMousePosit.x = m_ptCurrentMousePosit.y = 0;
+    m_ptOriginalMousePosit.x = m_ptOriginalMousePosit.y = 0;
+    m_bMousing                                          = false;
+    m_bRMousing                                         = false;
+    m_bMMousing                                         = false;
+    m_bNewTiming                                        = false;
+    m_bAdjustTimeScale                                  = true;
+    m_fov                                               = fov_;
+    m_near                                              = near_;
+    m_far                                               = far_;
+  }
+
+  bool  m_bCameraMode;
+  bool  m_bContinue;
+  float m_moveStep;
+  POINT m_ptLastMousePosit;
+  POINT m_ptCurrentMousePosit;
+  POINT m_ptOriginalMousePosit;
+  bool  m_bMousing;
+  bool  m_bRMousing;
+  bool  m_bMMousing;
+  bool  m_bNewTiming;
+  bool  m_bAdjustTimeScale;
+
+  int           m_renderCnt;
+  TimeSampler   m_realtime;
+  bool          m_timingGlitch;
+  InertiaCamera m_camera;
+  glm::mat4     m_projection;
+  float         m_fov, m_near, m_far;
+
+public:
+  inline glm::mat4& projMat() { return m_projection; }
+  inline glm::mat4& viewMat() { return m_camera.m4_view; }
+  inline bool&      nonStopRendering() { return m_realtime.bNonStopRendering; }
+
+  bool open(int posX, int posY, int width, int height, const char* title, bool requireGLContext) override;
+
+  virtual void onWindowClose() override;
+  virtual void onWindowResize(int w, int h) override;
+  virtual void onWindowRefresh() override;
+  virtual void onMouseMotion(int x, int y) override;
+  virtual void onMouseWheel(int delta) override;
+  virtual void onMouseButton(NVPWindow::MouseButton button, ButtonAction action, int mods, int x, int y) override;
+  virtual void onKeyboard(AppWindowCameraInertia::KeyCode key, ButtonAction action, int mods, int x, int y) override;
+  virtual void onKeyboardChar(unsigned char key, int mods, int x, int y) override;
+
+  virtual int idle();
+
+  const char* getHelpText(int* lines = NULL)
+  {
+    if(lines)
+      *lines = 7;
+    return "Left mouse button: rotate around the target\n"
+           "Right mouse button: translate target forward backward (+ Y axis rotate)\n"
+           "Middle mouse button: Pan target along view plane\n"
+           "Mouse wheel or PgUp/PgDn: zoom in/out\n"
+           "Arrow keys: rotate around the target\n"
+           "Ctrl+Arrow keys: Pan target\n"
+           "Ctrl+PgUp/PgDn: translate target forward/backward\n";
+  }
+};
+#ifndef WINDOWINERTIACAMERA_EXTERN
+
+//------------------------------------------------------------------------------
+//
+//------------------------------------------------------------------------------
+bool AppWindowCameraInertia::open(int posX, int posY, int width, int height, const char* title, bool requireGLContext)
+{
+  m_realtime.bNonStopRendering = true;
+
+  float r      = (float)width / (float)height;
+  m_projection = glm::perspective(glm::radians(m_fov), r, m_near, m_far);
+
+  ImGuiH::Init(width, height, this);
+  return NVPWindow::open(posX, posY, width, height, title, requireGLContext);
+}
+
+void AppWindowCameraInertia::onWindowClose()
+{
+  ImGuiH::Deinit();
+}
+
+//------------------------------------------------------------------------------
+//
+//------------------------------------------------------------------------------
+#define CAMERATAU 0.03f
+void AppWindowCameraInertia::onMouseMotion(int x, int y)
+{
+  m_ptCurrentMousePosit.x = x;
+  m_ptCurrentMousePosit.y = y;
+  if(ImGuiH::mouse_pos(x, y))
+    return;
+  //---------------------------- LEFT
+  if(m_bMousing)
+  {
+    float hval   = 2.0f * (float)(m_ptCurrentMousePosit.x - m_ptLastMousePosit.x) / (float)getWidth();
+    float vval   = 2.0f * (float)(m_ptCurrentMousePosit.y - m_ptLastMousePosit.y) / (float)getHeight();
+    m_camera.tau = CAMERATAU;
+    m_camera.rotateH(hval);
+    m_camera.rotateV(vval);
+    m_renderCnt++;
+  }
+  //---------------------------- MIDDLE
+  if(m_bMMousing)
+  {
+    float hval   = 2.0f * (float)(m_ptCurrentMousePosit.x - m_ptLastMousePosit.x) / (float)getWidth();
+    float vval   = 2.0f * (float)(m_ptCurrentMousePosit.y - m_ptLastMousePosit.y) / (float)getHeight();
+    m_camera.tau = CAMERATAU;
+    m_camera.rotateH(hval, true);
+    m_camera.rotateV(vval, true);
+    m_renderCnt++;
+  }
+  //---------------------------- RIGHT
+  if(m_bRMousing)
+  {
+    float hval   = 2.0f * (float)(m_ptCurrentMousePosit.x - m_ptLastMousePosit.x) / (float)getWidth();
+    float vval   = -2.0f * (float)(m_ptCurrentMousePosit.y - m_ptLastMousePosit.y) / (float)getHeight();
+    m_camera.tau = CAMERATAU;
+    m_camera.rotateH(hval, !!(getKeyModifiers() & KMOD_CONTROL));
+    m_camera.move(vval, !!(getKeyModifiers() & KMOD_CONTROL));
+    m_renderCnt++;
+  }
+
+  m_ptLastMousePosit.x = m_ptCurrentMousePosit.x;
+  m_ptLastMousePosit.y = m_ptCurrentMousePosit.y;
+}
+
+//------------------------------------------------------------------------------
+//
+//------------------------------------------------------------------------------
+void AppWindowCameraInertia::onMouseWheel(int delta)
+{
+  if(ImGuiH::mouse_wheel(delta))
+    return;
+  m_camera.tau = KEYTAU;
+  m_camera.move(delta > 0 ? m_moveStep : -m_moveStep, !!(getKeyModifiers() & KMOD_CONTROL));
+  m_renderCnt++;
+}
+//------------------------------------------------------------------------------
+//
+//------------------------------------------------------------------------------
+void AppWindowCameraInertia::onMouseButton(NVPWindow::MouseButton button, NVPWindow::ButtonAction state, int mods, int x, int y)
+{
+  if(ImGuiH::mouse_button(button, state))
+    return;
+  switch(button)
+  {
+    case NVPWindow::MOUSE_BUTTON_LEFT:
+      if(state == NVPWindow::BUTTON_PRESS)
+      {
+        m_renderCnt++;
+        // TODO: equivalent of glfwSetInputMode(window, GLFW_CURSOR, GLFW_CURSOR_DISABLED/NORMAL);
+        m_bMousing = true;
+        m_renderCnt++;
+        if(getKeyModifiers() & KMOD_CONTROL)
+        {
+        }
+        else if(getKeyModifiers() & KMOD_SHIFT)
+        {
+        }
+      }
+      else
+      {
+        m_bMousing = false;
+        m_renderCnt++;
+      }
+      break;
+    case NVPWindow::MOUSE_BUTTON_RIGHT:
+      if(state == NVPWindow::BUTTON_PRESS)
+      {
+        m_ptLastMousePosit.x = m_ptCurrentMousePosit.x = x;
+        m_ptLastMousePosit.y = m_ptCurrentMousePosit.y = y;
+        m_bRMousing                                    = true;
+        m_renderCnt++;
+        if(getKeyModifiers() & KMOD_CONTROL)
+        {
+        }
+      }
+      else
+      {
+        m_bRMousing = false;
+        m_renderCnt++;
+      }
+      break;
+    case NVPWindow::MOUSE_BUTTON_MIDDLE:
+      if(state == NVPWindow::BUTTON_PRESS)
+      {
+        m_ptLastMousePosit.x = m_ptCurrentMousePosit.x = x;
+        m_ptLastMousePosit.y = m_ptCurrentMousePosit.y = y;
+        m_bMMousing                                    = true;
+        m_renderCnt++;
+      }
+      else
+      {
+        m_bMMousing = false;
+        m_renderCnt++;
+      }
+      break;
+  }
+}
+//------------------------------------------------------------------------------
+//
+//------------------------------------------------------------------------------
+void AppWindowCameraInertia::onKeyboard(NVPWindow::KeyCode key, NVPWindow::ButtonAction action, int mods, int x, int y)
+{
+  m_renderCnt++;
+  if(ImGuiH::key_button(key, action, mods))
+    return;
+  if(action == NVPWindow::BUTTON_RELEASE)
+    return;
+  switch(key)
+  {
+    case NVPWindow::KEY_F1:
+      break;
+    case NVPWindow::KEY_F2:
+      break;
+    case NVPWindow::KEY_F3:
+    case NVPWindow::KEY_F4:
+    case NVPWindow::KEY_F5:
+    case NVPWindow::KEY_F6:
+    case NVPWindow::KEY_F7:
+    case NVPWindow::KEY_F8:
+    case NVPWindow::KEY_F9:
+    case NVPWindow::KEY_F10:
+    case NVPWindow::KEY_F11:
+      break;
+    case NVPWindow::KEY_F12:
+      break;
+    case NVPWindow::KEY_LEFT:
+      m_camera.tau = KEYTAU;
+      m_camera.rotateH(m_moveStep, !!(getKeyModifiers() & KMOD_CONTROL));
+      break;
+    case NVPWindow::KEY_UP:
+      m_camera.tau = KEYTAU;
+      m_camera.rotateV(m_moveStep, !!(getKeyModifiers() & KMOD_CONTROL));
+      break;
+    case NVPWindow::KEY_RIGHT:
+      m_camera.tau = KEYTAU;
+      m_camera.rotateH(-m_moveStep, !!(getKeyModifiers() & KMOD_CONTROL));
+      break;
+    case NVPWindow::KEY_DOWN:
+      m_camera.tau = KEYTAU;
+      m_camera.rotateV(-m_moveStep, !!(getKeyModifiers() & KMOD_CONTROL));
+      break;
+    case NVPWindow::KEY_PAGE_UP:
+      m_camera.tau = KEYTAU;
+      m_camera.move(m_moveStep, !!(getKeyModifiers() & KMOD_CONTROL));
+      break;
+    case NVPWindow::KEY_PAGE_DOWN:
+      m_camera.tau = KEYTAU;
+      m_camera.move(-m_moveStep, !!(getKeyModifiers() & KMOD_CONTROL));
+      break;
+    case NVPWindow::KEY_ESCAPE:
+      close();
+      break;
+  }
+}
+
+//------------------------------------------------------------------------------
+//
+//------------------------------------------------------------------------------
+void AppWindowCameraInertia::onKeyboardChar(unsigned char key, int mods, int x, int y)
+{
+  m_renderCnt++;
+  if(ImGuiH::key_char(key))
+    return;
+  // check registered toggles
+  auto it = g_toggleMap.find(key);
+  if(it != g_toggleMap.end())
+  {
+    it->second.p[0] = !it->second.p[0];
+  }
+}
+
+//------------------------------------------------------------------------------
+//
+//------------------------------------------------------------------------------
+int AppWindowCameraInertia::idle()
+{
+  //
+  // Camera motion
+  //
+  m_bContinue = m_camera.update((float)m_realtime.getFrameDT());
+  //
+  // time sampling
+  //
+  m_realtime.update(m_bContinue, &m_timingGlitch);
+  //
+  // if requested, trigger rendering of another frame
+  //
+  if(m_bContinue || m_realtime.bNonStopRendering)
+    m_renderCnt++;
+  return m_renderCnt;
+}
+
+//------------------------------------------------------------------------------
+//
+//------------------------------------------------------------------------------
+void AppWindowCameraInertia::onWindowRefresh() {}
+
+//------------------------------------------------------------------------------
+//
+//------------------------------------------------------------------------------
+void AppWindowCameraInertia::onWindowResize(int w, int h)
+{
+  NVPWindow::onWindowResize(w, h);
+  auto& imgui_io       = ImGui::GetIO();
+  imgui_io.DisplaySize = ImVec2(float(w), float(h));
+
+  float r      = (float)w / (float)h;
+  m_projection = glm::perspective(glm::radians(m_fov), r, m_near, m_far);
+  m_renderCnt++;
+}
+#endif  //WINDOWINERTIACAMERA_EXTERN
--- a/raytracer/nvpro_core/nvh/appwindowprofiler.cpp
+++ b/raytracer/nvpro_core/nvh/appwindowprofiler.cpp
@ -0,0 +1,593 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifdef _WIN32
+#ifndef NOMINMAX
+#define NOMINMAX
+#endif
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN
+#endif
+#include <windows.h>
+#endif
+
+#include "appwindowprofiler.hpp"
+
+#include <algorithm>
+#include <assert.h>
+#include <float.h>  // FLT_MAX
+#include <fstream>
+#include <iostream>
+#include <sstream>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>  // exit, EXIT_SUCCESS, EXIT_FAILURE
+
+#include "fileoperations.hpp"
+#include "misc.hpp"
+#include <fileformats/bmp.hpp>
+
+namespace nvh {
+
+static void replace(std::string& str, const std::string& from, const std::string& to)
+{
+  size_t start_pos = 0;
+  while((start_pos = str.find(from, start_pos)) != std::string::npos)
+  {
+    str.replace(start_pos, from.length(), to);
+    start_pos += to.length();
+  }
+}
+
+static void fixDeviceName(std::string& deviceName)
+{
+  replace(deviceName, "INTEL(R) ", "");
+  replace(deviceName, "AMD ", "");
+  replace(deviceName, "DRI ", "");
+  replace(deviceName, "(TM) ", "");
+  replace(deviceName, " Series", "");
+  replace(deviceName, " Graphics", "");
+  replace(deviceName, "/PCIe/SSE2", "");
+  std::replace(deviceName.begin(), deviceName.end(), ' ', '_');
+
+  deviceName.erase(std::remove(deviceName.begin(), deviceName.end(), '/'), deviceName.end());
+  deviceName.erase(std::remove(deviceName.begin(), deviceName.end(), '\\'), deviceName.end());
+  deviceName.erase(std::remove(deviceName.begin(), deviceName.end(), ':'), deviceName.end());
+  deviceName.erase(std::remove(deviceName.begin(), deviceName.end(), '?'), deviceName.end());
+  deviceName.erase(std::remove(deviceName.begin(), deviceName.end(), '*'), deviceName.end());
+  deviceName.erase(std::remove(deviceName.begin(), deviceName.end(), '<'), deviceName.end());
+  deviceName.erase(std::remove(deviceName.begin(), deviceName.end(), '>'), deviceName.end());
+  deviceName.erase(std::remove(deviceName.begin(), deviceName.end(), '|'), deviceName.end());
+  deviceName.erase(std::remove(deviceName.begin(), deviceName.end(), '"'), deviceName.end());
+  deviceName.erase(std::remove(deviceName.begin(), deviceName.end(), ','), deviceName.end());
+}
+
+void AppWindowProfiler::onMouseMotion(int x, int y)
+{
+  AppWindowProfiler::WindowState& window = m_windowState;
+
+  if(!window.m_mouseButtonFlags && mouse_pos(x, y))
+    return;
+
+  window.m_mouseCurrent[0] = x;
+  window.m_mouseCurrent[1] = y;
+}
+
+void AppWindowProfiler::onMouseButton(MouseButton Button, ButtonAction Action, int mods, int x, int y)
+{
+  AppWindowProfiler::WindowState& window = m_windowState;
+  m_profiler.reset();
+
+  if(mouse_button(Button, Action))
+    return;
+
+  switch(Action)
+  {
+    case BUTTON_PRESS: {
+      switch(Button)
+      {
+        case MOUSE_BUTTON_LEFT: {
+          window.m_mouseButtonFlags |= MOUSE_BUTTONFLAG_LEFT;
+        }
+        break;
+        case MOUSE_BUTTON_MIDDLE: {
+          window.m_mouseButtonFlags |= MOUSE_BUTTONFLAG_MIDDLE;
+        }
+        break;
+        case MOUSE_BUTTON_RIGHT: {
+          window.m_mouseButtonFlags |= MOUSE_BUTTONFLAG_RIGHT;
+        }
+        break;
+      }
+    }
+    break;
+    case BUTTON_RELEASE: {
+      if(!window.m_mouseButtonFlags)
+        break;
+
+      switch(Button)
+      {
+        case MOUSE_BUTTON_LEFT: {
+          window.m_mouseButtonFlags &= ~MOUSE_BUTTONFLAG_LEFT;
+        }
+        break;
+        case MOUSE_BUTTON_MIDDLE: {
+          window.m_mouseButtonFlags &= ~MOUSE_BUTTONFLAG_MIDDLE;
+        }
+        break;
+        case MOUSE_BUTTON_RIGHT: {
+          window.m_mouseButtonFlags &= ~MOUSE_BUTTONFLAG_RIGHT;
+        }
+        break;
+      }
+    }
+    break;
+  }
+}
+
+void AppWindowProfiler::onMouseWheel(int y)
+{
+  AppWindowProfiler::WindowState& window = m_windowState;
+  m_profiler.reset();
+
+  if(mouse_wheel(y))
+    return;
+
+  window.m_mouseWheel += y;
+}
+
+void AppWindowProfiler::onKeyboard(KeyCode key, ButtonAction action, int mods, int x, int y)
+{
+  AppWindowProfiler::WindowState& window = m_windowState;
+  m_profiler.reset();
+
+  if(key_button(key, action, mods))
+    return;
+
+  bool newState = false;
+
+  switch(action)
+  {
+    case BUTTON_PRESS:
+    case BUTTON_REPEAT: {
+      newState = true;
+      break;
+    }
+    case BUTTON_RELEASE: {
+      newState = false;
+      break;
+    }
+  }
+
+  window.m_keyToggled[key] = window.m_keyPressed[key] != newState;
+  window.m_keyPressed[key] = newState;
+}
+
+void AppWindowProfiler::onKeyboardChar(unsigned char key, int mods, int x, int y)
+{
+  m_profiler.reset();
+
+  if(key_char(key))
+    return;
+}
+
+void AppWindowProfiler::parseConfigFile(const char* filename)
+{
+  std::string result = loadFile(filename, false);
+  if(result.empty())
+  {
+    LOGW("file not found: %s\n", filename);
+    return;
+  }
+  std::vector<const char*> args;
+  ParameterList::tokenizeString(result, args);
+
+  std::string path = getFilePath(filename);
+
+  parseConfig(uint32_t(args.size()), args.data(), path);
+}
+
+void AppWindowProfiler::onWindowClose()
+{
+  exitScreenshot();
+}
+
+void AppWindowProfiler::onWindowResize(int width, int height)
+{
+  m_profiler.reset();
+
+  if(width == 0 || height == 0)
+  {
+    return;
+  }
+
+  m_windowState.m_winSize[0] = width;
+  m_windowState.m_winSize[1] = height;
+  if(m_activeContext)
+  {
+    swapResize(width, height);
+  }
+  if(m_active)
+  {
+    resize(m_windowState.m_swapSize[0], m_windowState.m_swapSize[1]);
+  }
+}
+
+
+void AppWindowProfiler::setVsync(bool state)
+{
+  if(m_internal)
+  {
+    swapVsync(state);
+    LOGI("vsync: %s\n", state ? "on" : "off");
+  }
+  m_config.vsyncstate = state;
+  m_vsync             = state;
+}
+
+int AppWindowProfiler::run(const std::string& title, int argc, const char** argv, int width, int height, bool requireGLContext)
+{
+  m_config.winsize[0] = m_config.winsize[0] ? m_config.winsize[0] : width;
+  m_config.winsize[1] = m_config.winsize[1] ? m_config.winsize[1] : height;
+
+  // skip first argument here (exe file)
+  parseConfig(argc - 1, argv + 1, ".");
+  if(!validateConfig())
+  {
+    return EXIT_FAILURE;
+  }
+
+  if(!NVPWindow::open(m_config.winpos[0], m_config.winpos[1], m_config.winsize[0], m_config.winsize[1], title.c_str(), requireGLContext))
+  {
+    LOGE("Could not create window\n");
+    return EXIT_FAILURE;
+  }
+  m_windowState.m_winSize[0] = m_config.winsize[0];
+  m_windowState.m_winSize[1] = m_config.winsize[1];
+
+  postConfigPreContext();
+  contextInit();
+  m_activeContext = true;
+
+  // hack: re-run the log callback now that the context exists, so a $DEVICE$ placeholder in the log filename can be resolved
+  if(!m_config.logFilename.empty())
+  {
+    parameterCallback(m_paramLog);
+  }
+
+  if(contextGetDeviceName())
+  {
+    std::string deviceName = contextGetDeviceName();
+    fixDeviceName(deviceName);
+    LOGOK("DEVICE: %s\n", deviceName.c_str());
+  }
+
+  initBenchmark();
+
+  setVsync(m_config.vsyncstate);
+
+  bool Run = begin();
+  m_active = true;
+
+  bool quickExit = m_config.quickexit;
+  if(m_config.frameLimit)
+  {
+    m_profilerPrint = false;
+    quickExit       = true;
+  }
+
+  double timeStart = getTime();
+  double timeBegin = getTime();
+  double frames    = 0;
+
+  bool lastVsync = m_vsync;
+
+  m_hadProfilerPrint = false;
+
+  double lastProfilerPrintTime = 0;
+
+
+  if(Run)
+  {
+    while(pollEvents())
+    {
+      bool wasClosed = false;
+      while(!isOpen())
+      {
+        NVPSystem::waitEvents();
+        wasClosed = true;
+      }
+      if(wasClosed)
+      {
+        continue;
+      }
+
+      if(m_windowState.onPress(KEY_V))
+      {
+        setVsync(!m_vsync);
+      }
+
+      std::string stats;
+      {
+        bool   benchmarkActive = m_benchmark.sequence.isActive();
+        double curTime         = getTime();
+        double printInterval   = (m_profilerPrint && !benchmarkActive) ? double(m_config.intervalSeconds) : double(FLT_MAX);
+        bool   printStats      = ((curTime - lastProfilerPrintTime) > printInterval);
+
+        if(printStats)
+        {
+          lastProfilerPrintTime = curTime;
+        }
+        m_profiler.beginFrame();
+
+        swapPrepare();
+        {
+          //const nvh::Profiler::Section profile(m_profiler, "App");
+          think(getTime() - timeStart);
+        }
+        memset(m_windowState.m_keyToggled, 0, sizeof(m_windowState.m_keyToggled));
+        swapBuffers();
+
+        m_profiler.endFrame();
+        if(printStats)
+        {
+          m_profiler.print(stats);
+        }
+      }
+
+      m_hadProfilerPrint = false;
+
+      if(m_profilerPrint && !stats.empty())
+      {
+        if(!m_config.timerLimit || m_config.timerLimit == 1)
+        {
+          LOGI("%s\n", stats.c_str());
+          m_hadProfilerPrint = true;
+        }
+        if(m_config.timerLimit == 1)
+        {
+          m_config.frameLimit = 1;
+        }
+        if(m_config.timerLimit)
+        {
+          m_config.timerLimit--;
+        }
+      }
+
+      advanceBenchmark();
+      postProfiling();
+
+      frames++;
+
+      double timeCurrent = getTime();
+      double timeDelta   = timeCurrent - timeBegin;
+      if(timeDelta > double(m_config.intervalSeconds) || lastVsync != m_vsync || m_config.frameLimit == 1)
+      {
+        std::ostringstream combined;
+
+        if(lastVsync != m_vsync)
+        {
+          timeDelta = 0;
+        }
+
+        if(m_timeInTitle)
+        {
+          combined << title << ": " << (timeDelta * 1000.0 / (frames)) << " [ms]"
+                   << (m_vsync ? " (vsync on - V for toggle)" : "");
+          setTitle(combined.str().c_str());
+        }
+
+        if(m_config.frameLimit == 1)
+        {
+          LOGI("frametime: %f ms\n", (timeDelta * 1000.0 / (frames)));
+        }
+
+        frames    = 0;
+        timeBegin = timeCurrent;
+        lastVsync = m_vsync;
+      }
+
+      if(m_windowState.m_keyPressed[KEY_ESCAPE] || m_config.frameLimit == 1)
+        break;
+
+      if(m_config.frameLimit)
+        m_config.frameLimit--;
+    }
+  }
+  contextSync();
+  exitScreenshot();
+
+  if(quickExit)
+  {
+    exit(EXIT_SUCCESS);
+    return EXIT_SUCCESS;
+  }
+
+  end();
+  m_active = false;
+  contextDeinit();
+  postEnd();
+
+  return Run ? EXIT_SUCCESS : EXIT_FAILURE;
+}
+
+void AppWindowProfiler::leave()
+{
+  m_config.frameLimit = 1;
+}
+
+std::string AppWindowProfiler::specialStrings(const char* original)
+{
+  std::string str(original);
+
+  if(strstr(original, "$DEVICE$"))
+  {
+    if(contextGetDeviceName())
+    {
+      std::string deviceName = contextGetDeviceName();
+      fixDeviceName(deviceName);
+      if(deviceName.empty())
+      {
+        // no proper device name available
+        return std::string();
+      }
+
+      // replace $DEVICE$
+      replace(str, "$DEVICE$", deviceName);
+    }
+    else
+    {
+      // no proper device name available
+      return std::string();
+    }
+  }
+  return str;
+}
+
+void AppWindowProfiler::parameterCallback(uint32_t param)
+{
+  if(param == m_paramLog)
+  {
+    std::string logfileName = specialStrings(m_config.logFilename.c_str());
+    if(!logfileName.empty())
+    {
+      nvprintSetLogFileName(logfileName.c_str());
+    }
+  }
+  else if(param == m_paramCfg || param == m_paramBat)
+  {
+    parseConfigFile(m_config.configFilename.c_str());
+  }
+  else if(param == m_paramWinsize)
+  {
+    if(m_internal)
+    {
+      setWindowSize(m_config.winsize[0], m_config.winsize[1]);
+    }
+  }
+
+  if(!m_active)
+    return;
+
+  if(param == m_paramVsync)
+  {
+    setVsync(m_config.vsyncstate);
+  }
+  else if(param == m_paramScreenshot)
+  {
+    std::string filename = specialStrings(m_config.screenshotFilename.c_str());
+    if(!filename.empty())
+    {
+      screenshot(filename.c_str());
+    }
+  }
+  else if(param == m_paramClear)
+  {
+    clear(m_config.clearColor[0], m_config.clearColor[1], m_config.clearColor[2]);
+  }
+}
+
+void AppWindowProfiler::setupParameters()
+{
+  nvh::ParameterList::Callback callback = [this](uint32_t param) { parameterCallback(param); };
+
+  m_paramWinsize = m_parameterList.add("winsize|Set window size (width and height)", m_config.winsize, callback, 2);
+  m_paramVsync   = m_parameterList.add("vsync|Enable or disable vsync", &m_config.vsyncstate, callback);
+  m_paramLog     = m_parameterList.addFilename("logfile|Set logfile", &m_config.logFilename, callback);
+  m_paramCfg = m_parameterList.addFilename(".cfg|load parameters from this config file", &m_config.configFilename, callback);
+  m_paramBat = m_parameterList.addFilename(".bat|load parameters from this batch file", &m_config.configFilename, callback);
+  m_parameterList.add("winpos|Set window position (x and y)", m_config.winpos, nullptr, 2);
+  m_parameterList.add("frames|Set number of frames to render before exit", &m_config.frameLimit);
+  m_parameterList.add("timerprints|Set number of timer prints to do before exit", &m_config.timerLimit);
+  m_parameterList.add("timerinterval|Set interval of timer prints in seconds", &m_config.intervalSeconds);
+  m_parameterList.add("bmpatexit|Set file to store a bitmap image of the last frame at exit", &m_config.dumpatexitFilename);
+  m_parameterList.addFilename("benchmark|Set benchmark filename", &m_benchmark.filename);
+  m_parameterList.add("benchmarkframes|Set number of benchmarkframes", &m_benchmark.frameLength);
+  m_parameterList.add("quickexit|skips tear down", &m_config.quickexit);
+  m_paramScreenshot = m_parameterList.add("screenshot|makes a screenshot into this file", &m_config.screenshotFilename, callback);
+  m_paramClear = m_parameterList.add("clear|clears window color (r,g,b in 0-255) using OS", m_config.clearColor, callback, 3);
+}
+
+void AppWindowProfiler::exitScreenshot()
+{
+  if(!m_config.dumpatexitFilename.empty() && !m_hadScreenshot)
+  {
+    screenshot(m_config.dumpatexitFilename.c_str());
+    m_hadScreenshot = true;
+  }
+}
+
+void AppWindowProfiler::initBenchmark()
+{
+  if(m_benchmark.filename.empty())
+    return;
+
+  m_benchmark.content = loadFile(m_benchmark.filename.c_str(), false);
+  if(!m_benchmark.content.empty())
+  {
+    std::vector<const char*> tokens;
+    ParameterList::tokenizeString(m_benchmark.content, tokens);
+
+    std::string path = getFilePath(m_benchmark.filename.c_str());
+
+    m_benchmark.sequence.init(&m_parameterList, tokens);
+
+    // do first iteration manually, due to custom arg parsing
+    uint32_t argBegin;
+    uint32_t argCount;
+    if(!m_benchmark.sequence.advanceIteration("benchmark", 1, argBegin, argCount))
+    {
+      parseConfig(argCount, &tokens[argBegin], path);
+    }
+
+    m_profiler.reset(nvh::Profiler::CONFIG_DELAY);
+
+    m_benchmark.frame = 0;
+    m_profilerPrint   = false;
+  }
+}
+
+void AppWindowProfiler::advanceBenchmark()
+{
+  if(!m_benchmark.sequence.isActive())
+    return;
+
+  m_benchmark.frame++;
+
+  if(m_benchmark.frame > m_benchmark.frameLength + nvh::Profiler::CONFIG_DELAY + nvh::Profiler::FRAME_DELAY)
+  {
+    m_benchmark.frame = 0;
+
+    std::string stats;
+    m_profiler.print(stats);
+    LOGI("BENCHMARK %d \"%s\" {\n", m_benchmark.sequence.getIteration(), m_benchmark.sequence.getSeparatorArg(0));
+    LOGI("%s}\n\n", stats.c_str());
+
+    bool done = m_benchmark.sequence.applyIteration("benchmark", 1, "-");
+    m_profiler.reset(nvh::Profiler::CONFIG_DELAY);
+
+    postBenchmarkAdvance();
+
+    if(done)
+    {
+      leave();
+    }
+  }
+}
+
+}  // namespace nvh
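Note on `replace` and `fixDeviceName` above: the sanitizing logic can be tried in isolation. The following is a minimal sketch, not the nvh implementation. The names `replaceAll` and `sanitizeDeviceName` are hypothetical, only two of the vendor substrings are stripped, and the ten separate erase-remove passes are folded into one `std::remove_if` pass over a string of forbidden characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-alone helper (not part of nvh):
// replaces every occurrence of `from` in `str` with `to`.
static void replaceAll(std::string& str, const std::string& from, const std::string& to)
{
  assert(!from.empty());  // an empty pattern would match at every position
  std::size_t pos = 0;
  while((pos = str.find(from, pos)) != std::string::npos)
  {
    str.replace(pos, from.length(), to);
    pos += to.length();  // continue behind the inserted text so it is not matched again
  }
}

// Hypothetical helper: returns a copy of `name` that can be used inside a file name.
static std::string sanitizeDeviceName(std::string name)
{
  // strip a subset of the vendor noise handled by fixDeviceName
  replaceAll(name, "(TM) ", "");
  replaceAll(name, "/PCIe/SSE2", "");
  std::replace(name.begin(), name.end(), ' ', '_');

  // single erase-remove pass over all characters that are unsafe in file names
  const std::string forbidden = "/\\:?*<>|\",";
  name.erase(std::remove_if(name.begin(), name.end(),
                            [&forbidden](char c) { return forbidden.find(c) != std::string::npos; }),
             name.end());
  return name;
}
```

Under these assumptions, a name such as `GeForce RTX 3080/PCIe/SSE2` would become `GeForce_RTX_3080`, which is the kind of string `specialStrings` substitutes for `$DEVICE$`.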
--- a/raytracer/nvpro_core/nvh/appwindowprofiler.hpp
+++ b/raytracer/nvpro_core/nvh/appwindowprofiler.hpp
@ -0,0 +1,252 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifndef NV_PROJECTBASE_INCLUDED
+#define NV_PROJECTBASE_INCLUDED
+
+#include <nvpwindow.hpp>
+#include <string.h>  // for memset
+
+#include "parametertools.hpp"
+#include "profiler.hpp"
+
+
+namespace nvh {
+
+/** @DOC_START
+    # class nvh::AppWindowProfiler
+    nvh::AppWindowProfiler provides an alternative utility wrapper class around NVPWindow.
+    It is a convenient base class for single-window applications and is used by some,
+    but not all, nvpro-samples.
+
+    It also provides:
+    - built-in profiler/timer reporting to the console
+    - command-line argument and config file parsing using ParameterTools
+      (see AppWindowProfiler::setupParameters() for the built-in commands)
+    - benchmark/automation mode using ParameterTools
+    - screenshot creation
+    - log file name based on the device name (depends on the context)
+    - an optional context/swapchain interface, used by the derived classes
+      nvvk/appwindowprofiler_vk and nvgl/appwindowprofiler_gl
+@DOC_END  */
+
+#define NV_PROFILE_BASE_SECTION(name) nvh::Profiler::Section _tempTimer(m_profiler, name)
+#define NV_PROFILE_BASE_SPLIT() m_profiler.accumulationSplit()
+
+class AppWindowProfiler : public NVPWindow
+{
+public:
+  class WindowState
+  {
+  public:
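+    WindowState()
+        : m_mouseButtonFlags(0)
+        , m_mouseWheel(0)
+    {
+      // zero the size and mouse state too, so they are never read uninitialized
+      memset(m_winSize, 0, sizeof(m_winSize));
+      memset(m_swapSize, 0, sizeof(m_swapSize));
+      memset(m_mouseCurrent, 0, sizeof(m_mouseCurrent));
+      memset(m_keyPressed, 0, sizeof(m_keyPressed));
+      memset(m_keyToggled, 0, sizeof(m_keyToggled));
+    }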
+
+    int m_winSize[2];
+    int m_swapSize[2];
+    int m_mouseCurrent[2];
+    int m_mouseButtonFlags;
+    int m_mouseWheel;
+
+    bool m_keyPressed[KEY_LAST + 1];
+    bool m_keyToggled[KEY_LAST + 1];
+
+    bool onPress(int key) { return m_keyPressed[key] && m_keyToggled[key]; }
+  };
+
+  //////////////////////////////////////////////////////////////////////////
+
+  WindowState   m_windowState;
+  nvh::Profiler m_profiler;
+  bool          m_profilerPrint;
+  bool          m_hadProfilerPrint;
+  bool          m_timeInTitle;
+
+  ParameterList m_parameterList;
+
+
+  AppWindowProfiler(bool deprecated = true)
+      : m_profilerPrint(true)
+      , m_hadProfilerPrint(false)
+      , m_timeInTitle(true)
+      , m_active(false)
+      , m_vsync(false)
+      , m_hadScreenshot(false)
+  {
+    setupParameters();
+  }
+
+  // Sample Related
+  //////////////////////////////////////////////////////////////////////////
+
+  // setup sample (this is executed after window/context creation)
+  virtual bool begin() { return false; }
+  // tear down sample (triggered by ESC/window close)
+  virtual void end() {}
+  // do primary logic/drawing etc. here
+  virtual void think(double time) {}
+  // react to swapchain resizes here
+  // (the swapchain size may differ from winWidth/winHeight!)
+  virtual void resize(int swapWidth, int swapHeight) {}
+
+  // return true to prevent m_windowState updates
+  virtual bool mouse_pos(int x, int y) { return false; }
+  virtual bool mouse_button(int button, int action) { return false; }
+  virtual bool mouse_wheel(int wheel) { return false; }
+  virtual bool key_button(int button, int action, int modifier) { return false; }
+  virtual bool key_char(int button) { return false; }
+
+  virtual void parseConfig(int argc, const char** argv, const std::string& path)
+  {
+    // if you want to handle parameters not represented in
+    // m_parameterList then override this function accordingly.
+    m_parameterList.applyTokens(argc, argv, "-", path.c_str());
+    // This function is called before "begin" and provided with the commandline used in "run".
+    // It can also be called by the benchmarking system, and parseConfigFile.
+  }
+  virtual bool validateConfig()
+  {
+    // override if you want to test the state of app after parsing configs
+    // returning false terminates app
+    return true;
+  }
+
+  // additional special-purpose callbacks
+
+  virtual void postProfiling() {}
+  virtual void postEnd() {}
+  virtual void postBenchmarkAdvance() {}
+  virtual void postConfigPreContext() {}
+
+  //////////////////////////////////////////////////////////////////////////
+
+  // initial kickoff (typically called from main)
+  int  run(const std::string& name, int argc, const char** argv, int width, int height, bool requireGLContext);
+  void leave();
+
+  void parseConfigFile(const char* filename);
+
+  // handles special strings (returns empty string if
+  // could not do the replacement properly)
+  // known specials:
+  // $DEVICE$
+  std::string specialStrings(const char* original);
+
+  void setVsync(bool state);
+  bool getVsync() const { return m_vsync; }
+
+  //////////////////////////////////////////////////////////////////////////
+  // Context Window (if desired, not mandatory )
+  //
+  // Used when deriving from this class for the purpose of providing 3D Api contexts
+  // nvvk/appwindowprofiler_vk or nvgl/appwindowprofiler_gl make use of this.
+
+  virtual void        contextInit() {}
+  virtual void        contextDeinit() {}
+  virtual void        contextSync() {}
+  virtual const char* contextGetDeviceName() { return NULL; }
+
+  virtual void swapResize(int winWidth, int winHeight)
+  {
+    m_windowState.m_swapSize[0] = winWidth;
+    m_windowState.m_swapSize[1] = winHeight;
+  }
+  virtual void swapPrepare() {}
+  virtual void swapBuffers() {}
+  virtual void swapVsync(bool state) {}
+
+
+  //////////////////////////////////////////////////////////////////////////
+
+  // inherited from NVPWindow, don't use them directly, use the "Sample-related" ones
+  void onWindowClose() override;
+  void onWindowResize(int w, int h) override;
+  void onWindowRefresh() override {}  // leave empty, we call redraw ourselves in think
+  void onMouseMotion(int x, int y) override;
+  void onMouseWheel(int delta) override;
+  void onMouseButton(MouseButton button, ButtonAction action, int mods, int x, int y) override;
+  void onKeyboard(KeyCode key, ButtonAction action, int mods, int x, int y) override;
+  void onKeyboardChar(unsigned char key, int mods, int x, int y) override;
+
+private:
+  struct Benchmark
+  {
+    std::string            filename;
+    std::string            content;
+    nvh::ParameterSequence sequence;
+    uint32_t               frameLength = 256;
+    uint32_t               frame       = 0;
+  };
+
+  struct Config
+  {
+    int32_t     winpos[2];
+    int32_t     winsize[2];
+    bool        vsyncstate      = true;
+    bool        quickexit       = false;
+    uint32_t    intervalSeconds = 2;
+    uint32_t    frameLimit      = 0;
+    uint32_t    timerLimit      = 0;
+    std::string dumpatexitFilename;
+    std::string screenshotFilename;
+    std::string logFilename;
+    std::string configFilename;
+    uint32_t    clearColor[3] = {127, 0, 0};
+
+    Config()
+    {
+      winpos[0]  = 50;
+      winpos[1]  = 50;
+      winsize[0] = 0;
+      winsize[1] = 0;
+    }
+  };
+
+  void parameterCallback(uint32_t param);
+
+  void setupParameters();
+  void exitScreenshot();
+
+  void initBenchmark();
+  void advanceBenchmark();
+
+  bool      m_activeContext = false;
+  bool      m_active        = false;
+  bool      m_vsync;
+  bool      m_hadScreenshot;
+  Config    m_config;
+  Benchmark m_benchmark;
+
+  uint32_t m_paramWinsize;
+  uint32_t m_paramVsync;
+  uint32_t m_paramScreenshot;
+  uint32_t m_paramLog;
+  uint32_t m_paramCfg;
+  uint32_t m_paramBat;
+  uint32_t m_paramClear;
+};
+}  // namespace nvh
+
+
+#endif
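Note on `WindowState::onPress` above: it reports a key only in the frame in which it changed from released to pressed, because `onKeyboard` records a change in `m_keyToggled` and `run()` clears that array after every `think()` call. The sketch below reproduces this bookkeeping in isolation. The struct `KeyState`, its eight-entry table, and the method names `onKeyEvent` and `endFrame` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical reduced key table; the real class sizes its arrays with KEY_LAST + 1.
struct KeyState
{
  static const int KEY_COUNT = 8;
  bool pressed[KEY_COUNT];
  bool toggled[KEY_COUNT];

  KeyState()
  {
    std::memset(pressed, 0, sizeof(pressed));
    std::memset(toggled, 0, sizeof(toggled));
  }

  // Same bookkeeping as onKeyboard: remember whether the pressed state changed.
  void onKeyEvent(int key, bool down)
  {
    assert(key >= 0 && key < KEY_COUNT);
    toggled[key] = pressed[key] != down;
    pressed[key] = down;
  }

  // Same effect as the per-frame memset in run(): an edge is valid for one frame only.
  void endFrame() { std::memset(toggled, 0, sizeof(toggled)); }

  // True only while the key is down and its state changed during this frame.
  bool onPress(int key) const { return pressed[key] && toggled[key]; }
};
```

With this scheme a held key (or a repeat event) does not trigger `onPress` again, which is why the vsync toggle on `KEY_V` in `run()` flips once per key press rather than once per frame.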
--- a/raytracer/nvpro_core/nvh/bitarray.cpp
+++ b/raytracer/nvpro_core/nvh/bitarray.cpp
@ -0,0 +1,218 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#include "bitarray.hpp"
+
+
+namespace nvh {
+/** \brief Create a new, empty BitArray.
+  **/
+BitArray::BitArray()
+    : m_size(0)
+    , m_bits(NULL)
+{
+}
+
+/** \brief Create a new BitArray with all bits set to false.
+      \param size Number of bits in the array
+  **/
+BitArray::BitArray(size_t size)
+    : m_size(size)
+    , m_bits(new BitStorageType[determineNumberOfElements()])
+{
+  clear();
+}
+
+BitArray::BitArray(const BitArray& rhs)
+    : m_size(rhs.m_size)
+    , m_bits(new BitStorageType[determineNumberOfElements()])
+{
+  std::copy(rhs.m_bits, rhs.m_bits + determineNumberOfElements(), m_bits);
+}
+
+BitArray::~BitArray()
+{
+  delete[] m_bits;
+}
+
+void BitArray::resize(size_t newSize, bool defaultValue)
+{
+  // if the default value for the new bits is true, set the unused bits in the last element first.
+  if(defaultValue)
+  {
+    setUnusedBits();
+  }
+
+  size_t oldNumberOfElements = determineNumberOfElements();
+  m_size                     = newSize;
+  size_t newNumberOfElements = determineNumberOfElements();
+
+  // the number of elements has changed, reallocate array
+  if(oldNumberOfElements != newNumberOfElements)
+  {
+    BitStorageType* NV_RESTRICT newBits = new BitStorageType[newNumberOfElements];
+    if(newNumberOfElements < oldNumberOfElements)
+    {
+      std::copy(m_bits, m_bits + newNumberOfElements, newBits);
+    }
+    else
+    {
+      std::copy(m_bits, m_bits + oldNumberOfElements, newBits);
+      std::fill(newBits + oldNumberOfElements, newBits + newNumberOfElements,
+                defaultValue ? ~BitStorageType(0) : BitStorageType(0));
+    }
+    delete[] m_bits;
+    m_bits = newBits;
+  }
+  clearUnusedBits();
+}
+
+BitArray& BitArray::operator=(const BitArray& rhs)
+{
+  if(m_size != rhs.m_size)
+  {
+    m_size = rhs.m_size;
+    delete[] m_bits;
+    m_bits = new BitStorageType[determineNumberOfElements()];
+  }
+  std::copy(rhs.m_bits, rhs.m_bits + determineNumberOfElements(), m_bits);
+
+  return *this;
+}
+
+bool BitArray::operator==(const BitArray& rhs)
+{
+  return (m_size == rhs.m_size) ? std::equal(m_bits, m_bits + determineNumberOfElements(), rhs.m_bits) : false;
+}
+
+BitArray BitArray::operator^(BitArray const& rhs)
+{
+  NV_ASSERT(getSize() == rhs.getSize());
+
+  BitArray result(getSize());
+  for(size_t index = 0; index < determineNumberOfElements(); ++index)
+  {
+    result.m_bits[index] = m_bits[index] ^ rhs.m_bits[index];
+  }
+  result.clearUnusedBits();
+
+  return result;
+}
+
+BitArray BitArray::operator|(BitArray const& rhs)
+{
+  NV_ASSERT(getSize() == rhs.getSize());
+
+  BitArray result(getSize());
+  for(size_t index = 0; index < determineNumberOfElements(); ++index)
+  {
+    result.m_bits[index] = m_bits[index] | rhs.m_bits[index];
+  }
+  result.clearUnusedBits();
+
+  return result;
+}
+
+BitArray BitArray::operator&(BitArray const& rhs)
+{
+  NV_ASSERT(getSize() == rhs.getSize());
+
+  BitArray result(getSize());
+  for(size_t index = 0; index < determineNumberOfElements(); ++index)
+  {
+    result.m_bits[index] = m_bits[index] & rhs.m_bits[index];
+  }
+  result.clearUnusedBits();
+
+  return result;
+}
+
+BitArray& BitArray::operator^=(BitArray const& rhs)
+{
+  NV_ASSERT(getSize() == rhs.getSize());
+
+  for(size_t index = 0; index < determineNumberOfElements(); ++index)
+  {
+    m_bits[index] ^= rhs.m_bits[index];
+  }
+  clearUnusedBits();
+
+  return *this;
+}
+
+BitArray& BitArray::operator|=(BitArray const& rhs)
+{
+  NV_ASSERT(getSize() == rhs.getSize());
+
+  for(size_t index = 0; index < determineNumberOfElements(); ++index)
+  {
+    m_bits[index] |= rhs.m_bits[index];
+  }
+
+  return *this;
+}
+
+BitArray& BitArray::operator&=(BitArray const& rhs)
+{
+  NV_ASSERT(getSize() == rhs.getSize());
+
+  for(size_t index = 0; index < determineNumberOfElements(); ++index)
+  {
+    m_bits[index] &= rhs.m_bits[index];
+  }
+
+  return *this;
+}
+
+void BitArray::clear()
+{
+  std::fill(m_bits, m_bits + determineNumberOfElements(), 0);
+}
+
+void BitArray::fill()
+{
+  if(determineNumberOfElements())
+  {
+    std::fill(m_bits, m_bits + determineNumberOfElements(), ~0);
+
+    clearUnusedBits();
+  }
+}
+
+size_t BitArray::countLeadingZeroes() const
+{
+  size_t index = 0;
+
+  // skip the storage elements that contain no set bits
+  while(index < determineNumberOfElements() && !m_bits[index])
+  {
+    ++index;
+  }
+
+  size_t leadingZeroes = index * StorageBitsPerElement;
+  if(index < determineNumberOfElements())
+  {
+    leadingZeroes += ctz(m_bits[index]);
+  }
+
+  return std::min(leadingZeroes, getSize());
+}
+
+}  // namespace nvh
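
Note on `BitArray::countLeadingZeroes()` above: despite its name, it counts the zero bits starting at bit index 0 up to the first set bit, skipping all-zero storage elements and then applying `ctz` to the first non-zero one. The following is a minimal standalone sketch of that logic, assuming the bits live in a `std::vector<uint64_t>`; `trailingZeroes` and `countZeroesFromBitZero` are hypothetical names and not part of nvh.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable count of trailing zero bits; returns 64 for an all-zero word.
// Hypothetical helper: the real code uses compiler intrinsics through ctz().
inline std::size_t trailingZeroes(std::uint64_t word)
{
  if(word == 0)
    return 64;
  std::size_t count = 0;
  while((word & 1u) == 0)
  {
    word >>= 1;
    ++count;
  }
  return count;
}

// Number of zero bits before the first set bit, counted from bit index 0.
// sizeInBits is the logical size; words.size() is assumed to be ceil(sizeInBits / 64).
inline std::size_t countZeroesFromBitZero(const std::vector<std::uint64_t>& words, std::size_t sizeInBits)
{
  std::size_t index = 0;
  while(index < words.size() && words[index] == 0)
    ++index;  // skip storage elements without any set bit

  std::size_t zeroes = index * 64;
  if(index < words.size())
    zeroes += trailingZeroes(words[index]);

  // Clamp to the logical size, as the original does with getSize().
  return std::min(zeroes, sizeInBits);
}
```

For an array with no set bit the result equals the logical size, so callers can compare the return value against the size to detect "nothing set".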
--- a/raytracer/nvpro_core/nvh/bitarray.hpp
+++ b/raytracer/nvpro_core/nvh/bitarray.hpp
@ -0,0 +1,324 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifndef NV_BITARRAY_H__
+#define NV_BITARRAY_H__
+
+#include <algorithm>
+#include <platform.h>
+#if(defined(NV_X86) || defined(NV_X64)) && defined(_MSC_VER)
+#include <intrin.h>
+#endif
+
+namespace nvh {
+
+//////////////////////////////////////////////////////////////////////////
+/** @DOC_START
+    # class nvh::BitArray
+
+    > The nvh::BitArray class implements a tightly packed boolean array using single bits stored in uint64_t values.
+    Whenever you want large boolean arrays this representation is preferred for cache-efficiency.
+    The Visitor and OffsetVisitor traversal mechanisms make use of cpu intrinsics to speed up iteration over bits.
+  
+    Example:
+    ```cpp
+    BitArray modifiedObjects(1024);
+  
+    // set some bits
+    modifiedObjects.setBit(24,true);
+    modifiedObjects.setBit(37,true);
+  
+    // iterate over all set bits using the built-in traversal mechanism
+  
+    struct MyVisitor {
+    void operator()( size_t index ){
+        // called with the index of a set bit
+        myObjects[index].update();
+      }
+    };
+  
+    MyVisitor visitor;
+    modifiedObjects.traverseBits(visitor);
+    ```
+@DOC_END  */
+
+/** >  Visitor which forwards the visitor operator with a fixed offset **/
+template <typename Visitor>
+struct OffsetVisitor
+{
+  inline OffsetVisitor(Visitor& visitor, size_t offset)
+      : m_visitor(visitor)
+      , m_offset(offset)
+  {
+  }
+
+  inline void operator()(size_t index) { m_visitor(index + m_offset); }
+
+private:
+  Visitor& m_visitor;
+  size_t   m_offset;
+};
+
+
+#if(defined(NV_X86) || defined(NV_X64)) && defined(_MSC_VER)
+template <typename Visitor>
+inline void bitTraverse(uint32_t bits, Visitor& visitor)
+{
+  unsigned long localIndex;
+  while(_BitScanForward(&localIndex, bits))
+  {
+    visitor(localIndex);
+    bits ^= uint32_t(1) << localIndex;  // clear the current bit so that the bitscan finds the next one
+  }
+}
+
+template <typename Visitor>
+inline void bitTraverse(uint64_t bits, Visitor& visitor)
+{
+  unsigned long localIndex;
+  while(_BitScanForward64(&localIndex, bits))
+  {
+    visitor(localIndex);
+    bits ^= uint64_t(1) << localIndex;  // clear the current bit so that the bitscan finds the next one
+  }
+}
+
+inline size_t ctz(uint64_t bits)
+{
+  unsigned long localIndex;
+  return _BitScanForward64(&localIndex, bits) ? localIndex : 64;
+}
+
+inline size_t ctz(uint32_t bits)
+{
+  unsigned long localIndex;
+  return _BitScanForward(&localIndex, bits) ? localIndex : 32;
+}
+#else
+inline size_t ctz(uint64_t bits)
+{
+  return (bits != 0) ? __builtin_ctzll(bits) : 64;
+}
+
+inline size_t ctz(uint32_t bits)
+{
+  return (bits != 0) ? __builtin_ctz(bits) : 32;
+}
+
+// TODO implement GCC version!
+template <typename BitType, typename Visitor>
+inline void bitTraverse(BitType bits, Visitor visitor)
+{
+  size_t index = 0;
+  while(bits)
+  {
+    if(bits & 0xff)  // skip ifs if the byte is 0
+    {
+      if(bits & 0x01)
+        visitor(index + 0);
+      if(bits & 0x02)
+        visitor(index + 1);
+      if(bits & 0x04)
+        visitor(index + 2);
+      if(bits & 0x08)
+        visitor(index + 3);
+      if(bits & 0x10)
+        visitor(index + 4);
+      if(bits & 0x20)
+        visitor(index + 5);
+      if(bits & 0x40)
+        visitor(index + 6);
+      if(bits & 0x80)
+        visitor(index + 7);
+    }
+    bits >>= 8;
+    index += 8;
+  }
+}
+#endif
+
+/** >  Call visitor(index) for each bit set **/
+template <typename BitType, typename Visitor>
+inline void bitTraverse(BitType* elements, size_t numberOfElements, Visitor& visitor)
+{
+  size_t baseIndex = 0;
+  for(size_t elementIndex = 0; elementIndex < numberOfElements; ++elementIndex)
+  {
+    OffsetVisitor<Visitor> offsetVisitor(visitor, baseIndex);
+    bitTraverse(elements[elementIndex], offsetVisitor);
+    baseIndex += sizeof(*elements) * 8;
+  }
+}
+
+class BitArray
+{
+public:
+  typedef uint64_t BitStorageType;
+  enum
+  {
+    StorageBitsPerElement = sizeof(BitStorageType) * 8
+  };
+
+  BitArray();
+  BitArray(size_t size);
+  BitArray(const BitArray& rhs);
+  ~BitArray();
+
+  BitArray& operator=(const BitArray& rhs);
+  bool      operator==(const BitArray& rhs);
+  BitArray  operator^(BitArray const& rhs);
+  BitArray  operator&(BitArray const& rhs);
+  BitArray  operator|(BitArray const& rhs);
+  BitArray& operator^=(BitArray const& rhs);
+  BitArray& operator&=(BitArray const& rhs);
+  BitArray& operator|=(BitArray const& rhs);
+
+  void clear();
+  void fill();
+
+  /** >  Change the number of bits in this array. The state of the remaining bits is kept.
+               New bits are initialized to defaultValue.
+        \param size New number of bits in this array
+        \param defaultValue The value assigned to the newly added bits
+    **/
+  void resize(size_t size, bool defaultValue = false);
+
+  size_t getSize() const { return m_size; }
+
+  // inline functions
+  void enableBit(size_t index);
+  void disableBit(size_t index);
+  void setBit(size_t index, bool value);
+  bool getBit(size_t index) const;
+
+  BitStorageType const* getBits() const;
+
+  template <typename Visitor>
+  void traverseBits(Visitor visitor);
+
+  size_t countLeadingZeroes() const;
+
+private:
+  size_t                      m_size;
+  BitStorageType* NV_RESTRICT m_bits;
+
+  void   determineBitPosition(size_t index, size_t& element, size_t& bit) const;
+  size_t determineNumberOfElements() const;
+
+  /** >  Clear the last unused bits in the last element.
+        \remarks Clears the bits whose index is >= m_size. Those bits are traversed unconditionally and would produce invalid results.
+                 The shift amount is restricted to the range 0 to StorageBitsPerElement - 1 to handle the case usedBitsInLastElement == 0,
+                 which would otherwise shift by StorageBitsPerElement; that is undefined by the standard and not the desired operation.
+    **/
+  void clearUnusedBits();
+
+  /** >  Set the last unused bits in the last element.
+        \remarks Set bits whose number is >= m_size. This is required when expanding the vector with the bits set to true.
+    **/
+  void setUnusedBits();
+};
+
+/** >  Determine the element / bit for the given index **/
+inline void BitArray::determineBitPosition(size_t index, size_t& element, size_t& bit) const
+{
+  element = index / StorageBitsPerElement;
+  bit     = index % StorageBitsPerElement;
+}
+
+inline size_t BitArray::determineNumberOfElements() const
+{
+  return (m_size + StorageBitsPerElement - 1) / StorageBitsPerElement;
+}
+
+inline void BitArray::enableBit(size_t index)
+{
+  NV_ASSERT(index < m_size);
+  size_t element;
+  size_t bit;
+  determineBitPosition(index, element, bit);
+  m_bits[element] |= BitStorageType(1) << bit;
+}
+
+inline void BitArray::disableBit(size_t index)
+{
+  NV_ASSERT(index < m_size);
+
+  size_t element;
+  size_t bit;
+  determineBitPosition(index, element, bit);
+  m_bits[element] &= ~(BitStorageType(1) << bit);
+}
+
+inline void BitArray::setBit(size_t index, bool value)
+{
+  NV_ASSERT(index < m_size);
+  if(value)
+  {
+    enableBit(index);
+  }
+  else
+  {
+    disableBit(index);
+  }
+}
+
+inline BitArray::BitStorageType const* BitArray::getBits() const
+{
+  return m_bits;
+}
+
+inline bool BitArray::getBit(size_t index) const
+{
+  NV_ASSERT(index < m_size);
+  size_t element;
+  size_t bit;
+  determineBitPosition(index, element, bit);
+  return !!(m_bits[element] & (BitStorageType(1) << bit));
+}
+
+/** >  call Visitor( size_t index ) on all bits which are set. **/
+template <typename Visitor>
+inline void BitArray::traverseBits(Visitor visitor)
+{
+  bitTraverse(m_bits, determineNumberOfElements(), visitor);
+}
+
+inline void BitArray::clearUnusedBits()
+{
+  if(m_size)
+  {
+    size_t usedBitsInLastElement = m_size % StorageBitsPerElement;
+    m_bits[determineNumberOfElements() - 1] &=
+        ~BitStorageType(0) >> ((StorageBitsPerElement - usedBitsInLastElement) & (StorageBitsPerElement - 1));
+  }
+}
+
+inline void BitArray::setUnusedBits()
+{
+  if(m_size % StorageBitsPerElement)  // a completely used last element has no unused bits to set
+  {
+    size_t usedBitsInLastElement = m_size % StorageBitsPerElement;
+    m_bits[determineNumberOfElements() - 1] |= ~BitStorageType(0) << usedBitsInLastElement;
+  }
+}
+}  // namespace nvh
+
+
+#endif
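
Note on the traversal helpers in `bitarray.hpp` above: the non-MSVC fallback tests every bit of every non-zero byte and is marked with a TODO. A portable alternative is to repeatedly locate and clear the lowest set bit of each storage element. The sketch below illustrates that idea under the assumption that the bits are held in a `std::vector<uint64_t>`; `visitSetBits` and `collectSetBits` are hypothetical names, not part of nvh, and the inner search loop stands in for a `ctz` intrinsic.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Calls visitor(bitIndex) for every set bit, lowest index first.
// Hypothetical portable stand-in for the intrinsic-based bitTraverse overloads.
template <typename Visitor>
void visitSetBits(const std::vector<std::uint64_t>& words, Visitor&& visitor)
{
  std::size_t baseIndex = 0;
  for(std::uint64_t word : words)
  {
    while(word != 0)
    {
      // Find the index of the lowest set bit in this word.
      std::size_t local = 0;
      while(((word >> local) & 1u) == 0)
        ++local;

      visitor(baseIndex + local);  // same role as OffsetVisitor: add the element's base index
      word &= word - 1;            // clear the lowest set bit
    }
    baseIndex += 64;
  }
}

// Convenience wrapper that collects the visited indices into a vector.
inline std::vector<std::size_t> collectSetBits(const std::vector<std::uint64_t>& words)
{
  std::vector<std::size_t> indices;
  visitSetBits(words, [&indices](std::size_t index) { indices.push_back(index); });
  return indices;
}
```

Because the outer `while` loop only runs while the word is non-zero, the cost per element is proportional to the number of set bits rather than to the element width.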
--- a/raytracer/nvpro_core/nvh/boundingbox.hpp
+++ b/raytracer/nvpro_core/nvh/boundingbox.hpp
@ -0,0 +1,121 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2022 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#include <glm/gtc/matrix_access.hpp>
+
+namespace nvh {
+
+/* @DOC_START
+
+```nvh::Bbox``` is a class to create axis-aligned bounding boxes.
+It grows by inserting 3D points and can be combined with other bounding boxes.
+It also returns information such as its min, max, extents, center, and radius.
+
+@DOC_END */
+struct Bbox
+{
+  Bbox() = default;
+  Bbox(glm::vec3 _min, glm::vec3 _max)
+      : m_min(_min)
+      , m_max(_max)
+  {
+  }
+  Bbox(const std::vector<glm::vec3>& corners)
+  {
+    for(auto& c : corners)
+    {
+      insert(c);
+    }
+  }
+
+  void insert(const glm::vec3& v)
+  {
+    m_min = {std::min(m_min.x, v.x), std::min(m_min.y, v.y), std::min(m_min.z, v.z)};
+    m_max = {std::max(m_max.x, v.x), std::max(m_max.y, v.y), std::max(m_max.z, v.z)};
+  }
+
+  void insert(const Bbox& b)
+  {
+    insert(b.m_min);
+    insert(b.m_max);
+  }
+
+  inline Bbox& operator+=(float v)
+  {
+    m_min -= v;
+    m_max += v;
+    return *this;
+  }
+
+  inline bool isEmpty() const
+  {
+    return m_min == glm::vec3{std::numeric_limits<float>::max()} || m_max == glm::vec3{std::numeric_limits<float>::lowest()};
+  }
+
+  inline uint32_t rank() const
+  {
+    uint32_t result{0};
+    result += m_min.x < m_max.x;
+    result += m_min.y < m_max.y;
+    result += m_min.z < m_max.z;
+    return result;
+  }
+  inline bool      isPoint() const { return m_min == m_max; }
+  inline bool      isLine() const { return rank() == 1u; }
+  inline bool      isPlane() const { return rank() == 2u; }
+  inline bool      isVolume() const { return rank() == 3u; }
+  inline glm::vec3 min() const { return m_min; }
+  inline glm::vec3 max() const { return m_max; }
+  inline glm::vec3 extents() const { return m_max - m_min; }
+  inline glm::vec3 center() const { return (m_min + m_max) * 0.5f; }
+  inline float     radius() const { return glm::length(m_max - m_min) * 0.5f; }
+
+  Bbox transform(glm::mat4 mat)
+  {
+    // Make sure this is a 3D transformation + translation:
+    auto        r       = glm::row(mat, 3);
+    const float epsilon = 1e-6f;
+    assert(fabs(r.x) < epsilon && fabs(r.y) < epsilon && fabs(r.z) < epsilon && fabs(r.w - 1.0f) < epsilon);
+
+    std::vector<glm::vec3> corners(8);
+    corners[0] = glm::vec3(mat * glm::vec4(m_min.x, m_min.y, m_min.z, 1.f));
+    corners[1] = glm::vec3(mat * glm::vec4(m_min.x, m_min.y, m_max.z, 1.f));
+    corners[2] = glm::vec3(mat * glm::vec4(m_min.x, m_max.y, m_min.z, 1.f));
+    corners[3] = glm::vec3(mat * glm::vec4(m_min.x, m_max.y, m_max.z, 1.f));
+    corners[4] = glm::vec3(mat * glm::vec4(m_max.x, m_min.y, m_min.z, 1.f));
+    corners[5] = glm::vec3(mat * glm::vec4(m_max.x, m_min.y, m_max.z, 1.f));
+    corners[6] = glm::vec3(mat * glm::vec4(m_max.x, m_max.y, m_min.z, 1.f));
+    corners[7] = glm::vec3(mat * glm::vec4(m_max.x, m_max.y, m_max.z, 1.f));
+
+    Bbox result(corners);
+    return result;
+  }
+
+private:
+  glm::vec3 m_min{std::numeric_limits<float>::max()};
+  glm::vec3 m_max{-std::numeric_limits<float>::max()};
+};
+
+template <typename T, typename TFlag>
+inline bool hasFlag(T a, TFlag flag)
+{
+  return (a & flag) == flag;
+}
+
+}  // namespace nvh
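
Note on `nvh::Bbox` above: the box starts "inverted" (min at the largest float, max at the lowest), so the first inserted point sets both corners. A minimal sketch of that growth scheme follows; to stay self-contained it assumes `std::array<float, 3>` in place of `glm::vec3`, and `Vec3` and `MiniBbox` are hypothetical names rather than part of nvh.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>

using Vec3 = std::array<float, 3>;  // stand-in for glm::vec3 (assumption)

struct MiniBbox
{
  // Start inverted so that the first insert() defines both corners.
  Vec3 lo{{std::numeric_limits<float>::max(), std::numeric_limits<float>::max(), std::numeric_limits<float>::max()}};
  Vec3 hi{{std::numeric_limits<float>::lowest(), std::numeric_limits<float>::lowest(), std::numeric_limits<float>::lowest()}};

  // Grow the box so that it contains point p.
  void insert(const Vec3& p)
  {
    for(std::size_t i = 0; i < 3; ++i)
    {
      lo[i] = std::min(lo[i], p[i]);
      hi[i] = std::max(hi[i], p[i]);
    }
  }

  // Midpoint between the two corners.
  Vec3 center() const
  {
    return {{(lo[0] + hi[0]) * 0.5f, (lo[1] + hi[1]) * 0.5f, (lo[2] + hi[2]) * 0.5f}};
  }

  // Number of axes along which the box has a non-zero extent:
  // 1 = line, 2 = plane, 3 = volume.
  int rank() const
  {
    int result = 0;
    for(std::size_t i = 0; i < 3; ++i)
      result += (lo[i] < hi[i]) ? 1 : 0;
    return result;
  }
};
```

Inserting two points that share a z coordinate yields a box of rank 2, which corresponds to `isPlane()` in the original.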
--- a/raytracer/nvpro_core/nvh/cameracontrol.hpp
+++ b/raytracer/nvpro_core/nvh/cameracontrol.hpp
@ -0,0 +1,236 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifndef NV_CAMCONTROL_INCLUDED
+#define NV_CAMCONTROL_INCLUDED
+
+#include <algorithm>
+#include <glm/ext/matrix_transform.hpp>
+#include <glm/gtx/euler_angles.hpp>
+
+
+namespace nvh {
+//////////////////////////////////////////////////////////////////////////
+/** @DOC_START
+    # class nvh::CameraControl
+
+    > nvh::CameraControl is a utility class to create a viewmatrix based on mouse inputs.
+  
+    It can operate in perspective or orthographic mode (`m_sceneOrtho==true`).
+  
+    perspective:
+    - LMB: rotate
+    - RMB or WHEEL: zoom via dolly movement
+    - MMB: pan/move within camera plane
+  
+    ortho:
+    - LMB: pan/move within camera plane
+    - RMB or WHEEL: zoom via dolly movement, application needs to use `m_sceneOrthoZoom` for projection matrix adjustment
+    - MMB: rotate
+  
+    The camera can be orbiting (`m_useOrbit==true`) around `m_sceneOrbit` or
+    otherwise provide "first person/fly through"-like controls.
+  
+    Speed of movement/rotation etc. is influenced by `m_sceneDimension` as well as the 
+    sensitivity values.
+@DOC_END  */
+
+class CameraControl
+{
+public:
+  CameraControl()
+      : m_lastButtonFlags(0)
+      , m_lastWheel(0)
+      , m_senseWheelZoom(0.05f / 120.0f)
+      , m_senseZoom(0.001f)
+      , m_senseRotate((glm::pi<float>() * 0.5f) / 256.0f)
+      , m_sensePan(1.0f)
+      , m_sceneOrbit(0.0f)
+      , m_sceneDimension(1.0f)
+      , m_sceneOrtho(false)
+      , m_sceneOrthoZoom(1.0f)
+      , m_useOrbit(true)
+      , m_sceneUp(0, 1, 0)
+  {
+  }
+
+  inline void processActions(const glm::ivec2& window, const glm::vec2& mouse, int mouseButtonFlags, int wheel)
+  {
+    int changed       = m_lastButtonFlags ^ mouseButtonFlags;
+    m_lastButtonFlags = mouseButtonFlags;
+
+    int panFlag  = m_sceneOrtho ? 1 << 0 : 1 << 2;
+    int zoomFlag = 1 << 1;
+    int rotFlag  = m_sceneOrtho ? 1 << 2 : 1 << 0;
+
+
+    m_panning      = !!(mouseButtonFlags & panFlag);
+    m_zooming      = !!(mouseButtonFlags & zoomFlag);
+    m_rotating     = !!(mouseButtonFlags & rotFlag);
+    m_zoomingWheel = wheel != m_lastWheel;
+
+    m_startZoomWheel = m_lastWheel;
+    m_lastWheel      = wheel;
+
+    if(m_rotating)
+    {
+      m_panning = false;
+      m_zooming = false;
+    }
+
+    if(m_panning && (changed & panFlag))
+    {
+      // pan
+      m_startPan    = mouse;
+      m_startMatrix = m_viewMatrix;
+    }
+    if(m_zooming && (changed & zoomFlag))
+    {
+      // zoom
+      m_startMatrix    = m_viewMatrix;
+      m_startZoom      = mouse;
+      m_startZoomOrtho = m_sceneOrthoZoom;
+    }
+    if(m_rotating && (changed & rotFlag))
+    {
+      // rotate
+      m_startRotate = mouse;
+      m_startMatrix = m_viewMatrix;
+    }
+
+    if(m_zooming || m_zoomingWheel)
+    {
+      float dist = m_zooming ? -(glm::dot(mouse - m_startZoom, glm::vec2(-1, 1)) * m_sceneDimension * m_senseZoom) :
+                               (float(wheel - m_startZoomWheel) * m_sceneDimension * m_senseWheelZoom);
+
+      if(m_zoomingWheel)
+      {
+        m_startZoomOrtho = m_sceneOrthoZoom;
+        m_startMatrix    = m_viewMatrix;
+      }
+
+      if(m_sceneOrtho)
+      {
+        float newzoom = m_startZoomOrtho - (dist);
+        if(m_zoomingWheel)
+        {
+          if(newzoom < 0)
+          {
+            m_sceneOrthoZoom *= 0.5;
+          }
+          else if(m_sceneOrthoZoom < glm::abs(dist))
+          {
+            m_sceneOrthoZoom *= 2.0;
+          }
+          else
+          {
+            m_sceneOrthoZoom = newzoom;
+          }
+        }
+        else
+        {
+          m_sceneOrthoZoom = newzoom;
+        }
+        m_sceneOrthoZoom = std::max(0.0001f, m_sceneOrthoZoom);
+      }
+      else
+      {
+        glm::mat4 delta = glm::translate(glm::mat4(1), glm::vec3(0, 0, dist * 2.0f));
+        m_viewMatrix    = delta * m_startMatrix;
+      }
+    }
+
+    if(m_panning)
+    {
+      float aspect = float(window.x) / float(window.y);
+
+      glm::vec3 winsize(window.x, window.y, 1.0f);
+      glm::vec3 ortho(m_sceneOrthoZoom * aspect, m_sceneOrthoZoom, 1.0f);
+      glm::vec3 sub(mouse - m_startPan, 0.0f);
+      sub /= winsize;
+      sub *= ortho;
+      sub.y *= -1.0;
+      if(!m_sceneOrtho)
+      {
+        sub *= m_sensePan * m_sceneDimension;
+      }
+
+      glm::mat4 delta = glm::translate(glm::mat4(1), sub);
+      m_viewMatrix    = delta * m_startMatrix;
+    }
+
+    if(m_rotating)
+    {
+      float aspect = float(window.x) / float(window.y);
+
+      glm::vec2 angles = (mouse - m_startRotate) * m_senseRotate;
+
+
+      if(m_useOrbit)
+      {
+        glm::mat4 rot    = glm::yawPitchRoll(angles.x, angles.y, 0.0f);
+        glm::vec3 center = glm::vec3(m_startMatrix * glm::vec4(m_sceneOrbit, 1.0f));
+        glm::mat4 delta  = glm::translate(glm::mat4(1), center) * rot * glm::translate(glm::mat4(1), -center);
+
+        m_viewMatrix = delta * m_startMatrix;
+      }
+      else
+      {
+        // FIXME use sceneUP
+        glm::mat4 rot = glm::yawPitchRoll(angles.x, angles.y, 0.0f);
+
+        m_viewMatrix = rot * m_startMatrix;
+      }
+    }
+  }
+
+  bool  m_useOrbit;
+  bool  m_sceneOrtho;
+  float m_sceneOrthoZoom;
+  float m_sceneDimension;
+
+  glm::vec3 m_sceneUp;
+  glm::vec3 m_sceneOrbit;
+  glm::mat4 m_viewMatrix;
+
+  float m_senseWheelZoom;
+  float m_senseZoom;
+  float m_senseRotate;
+  float m_sensePan;
+
+private:
+  bool m_zooming;
+  bool m_zoomingWheel;
+  bool m_panning;
+  bool m_rotating;
+
+  glm::vec2 m_startPan;
+  glm::vec2 m_startZoom;
+  glm::vec2 m_startRotate;
+  glm::mat4 m_startMatrix;
+  int       m_startZoomWheel;
+  float     m_startZoomOrtho;
+
+  int m_lastButtonFlags;
+  int m_lastWheel;
+};
+}  // namespace nvh
+
+#endif
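
Note on `CameraControl::processActions` above: a drag starts only on the frame in which a button goes from released to pressed, which is detected by XOR-ing the previous and current button flags. The sketch below isolates that edge detection, assuming the bit layout implied by the code (bit 0 = left, bit 1 = right, bit 2 = middle); `ButtonEdges` and the `kLeft`/`kRight`/`kMiddle` constants are hypothetical names, not part of nvh.

```cpp
// Mouse button bit flags, following the layout processActions appears to assume.
enum : int
{
  kLeft   = 1 << 0,
  kRight  = 1 << 1,
  kMiddle = 1 << 2
};

struct ButtonEdges
{
  int lastFlags = 0;  // button state seen in the previous call

  // Returns the buttons that went from released to pressed since the previous call.
  int update(int currentFlags)
  {
    int changed = lastFlags ^ currentFlags;  // bits that differ from the last frame
    lastFlags   = currentFlags;
    return changed & currentFlags;  // changed and now down => just pressed
  }
};
```

In the original, the mouse position and view matrix are captured at exactly this moment (`m_startPan`, `m_startMatrix`, and so on), and later frames compute the motion relative to that snapshot.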
--- a/raytracer/nvpro_core/nvh/camerainertia.hpp
+++ b/raytracer/nvpro_core/nvh/camerainertia.hpp
@ -0,0 +1,237 @@
+/*
+ * Copyright (c) 2013-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2013-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+//--------------------------------------------------------------------
+#pragma once
+
+#include <nvh/nvprint.hpp>
+#include <glm/glm.hpp>
+
+#include <cmath>
+#include "glm/gtc/matrix_transform.hpp"
+
+/* @DOC_START
+# struct InertiaCamera
+>  Struct that offers a camera moving with some inertia effect around a target point
+
+InertiaCamera exposes a mix of pseudo polar rotation around a target point and
+some other movements to translate the target point, zoom in and out.
+
+Either the keyboard or mouse can be used for all of the moves.
+@DOC_END */
+struct InertiaCamera
+{
+  glm::vec3 curEyePos, curFocusPos, curObjectPos;  ///< Current position of the motion
+  glm::vec3 eyePos, focusPos, objectPos;           ///< expected positions to reach
+  float     tau;                                   ///< acceleration factor in the motion function
+  float     epsilon;
+  float     eyeD;
+  float     focusD;
+  float     objectD;
+  glm::mat4 m4_view;  ///< transformation matrix resulting from the computation
+  //------------------------------------------------------------------------------
+  //
+  //------------------------------------------------------------------------------
+  InertiaCamera(const glm::vec3 eye    = glm::vec3(0.0f, 1.0f, -3.0f),
+                const glm::vec3 focus  = glm::vec3(0, 0, 0),
+                const glm::vec3 object = glm::vec3(0, 0, 0))
+  {
+    epsilon          = 0.001f;
+    tau              = 0.2f;
+    curEyePos        = eye;
+    eyePos           = eye;
+    curFocusPos      = focus;
+    focusPos         = focus;
+    curObjectPos     = object;
+    objectPos        = object;
+    eyeD             = 0.0f;
+    focusD           = 0.0f;
+    objectD          = 0.0f;
+    m4_view          = glm::mat4(1);
+    glm::mat4 Lookat = glm::lookAt(curEyePos, curFocusPos, glm::vec3(0, 1, 0));
+    m4_view *= Lookat;
+  }
+  //------------------------------------------------------------------------------
+  //
+  //------------------------------------------------------------------------------
+  void rotateH(float s, bool bPan = false)
+  {
+    glm::vec3 p  = eyePos;
+    glm::vec3 o  = focusPos;
+    glm::vec3 po = p - o;
+    float     l  = glm::length(po);
+    glm::vec3 dv = glm::cross(po, glm::vec3(0, 1, 0));
+    dv *= s;
+    p += dv;
+    po       = p - o;
+    float l2 = glm::length(po);
+    l        = l2 - l;
+    p -= (l / l2) * (po);
+    eyePos = p;
+    if(bPan)
+      focusPos += dv;
+  }
+  //------------------------------------------------------------------------------
+  //
+  //------------------------------------------------------------------------------
+  void rotateV(float s, bool bPan = false)
+  {
+    glm::vec3 p   = eyePos;
+    glm::vec3 o   = focusPos;
+    glm::vec3 po  = p - o;
+    float     l   = glm::length(po);
+    glm::vec3 dv  = glm::cross(po, glm::vec3(0, -1, 0));
+    dv            = glm::normalize(dv);
+    glm::vec3 dv2 = glm::cross(po, dv);
+    dv2 *= s;
+    p += dv2;
+    po       = p - o;
+    float l2 = glm::length(po);
+
+    if(bPan)
+      focusPos += dv2;
+
+    // protect against gimbal lock
+    if(std::fabs(dot(po / l2, glm::vec3(0, 1, 0))) > 0.99)
+      return;
+
+    l = l2 - l;
+    p -= (l / l2) * (po);
+    eyePos = p;
+  }
+  //------------------------------------------------------------------------------
+  //
+  //------------------------------------------------------------------------------
+  void move(float s, bool bPan)
+  {
+    glm::vec3 p  = eyePos;
+    glm::vec3 o  = focusPos;
+    glm::vec3 po = p - o;
+    po *= s;
+    p -= po;
+    if(bPan)
+      focusPos -= po;
+    eyePos = p;
+  }
+  //------------------------------------------------------------------------------------
+  /// >  simulation step to call with a proper time interval to update the animation
+  //------------------------------------------------------------------------------------
+  bool update(float dt)
+  {
+    if(dt > (1.0f / 60.0f))
+      dt = (1.0f / 60.0f);
+    bool             bContinue = false;
+    static glm::vec3 eyeVel    = glm::vec3(0, 0, 0);
+    static glm::vec3 eyeAcc    = glm::vec3(0, 0, 0);
+    eyeD                       = glm::length(curEyePos - eyePos);
+    if(eyeD > epsilon)
+    {
+      bContinue    = true;
+      glm::vec3 dV = curEyePos - eyePos;
+      eyeAcc       = (-2.0f / tau) * eyeVel - dV / (tau * tau);
+      // integrate
+      eyeVel += eyeAcc * glm::vec3(dt, dt, dt);
+      curEyePos += eyeVel * glm::vec3(dt, dt, dt);
+    }
+    else
+    {
+      eyeVel = glm::vec3(0, 0, 0);
+      eyeAcc = glm::vec3(0, 0, 0);
+    }
+
+    static glm::vec3 focusVel = glm::vec3(0, 0, 0);
+    static glm::vec3 focusAcc = glm::vec3(0, 0, 0);
+    focusD                    = glm::length(curFocusPos - focusPos);
+    if(focusD > epsilon)
+    {
+      bContinue    = true;
+      glm::vec3 dV = curFocusPos - focusPos;
+      focusAcc     = (-2.0f / tau) * focusVel - dV / (tau * tau);
+      // integrate
+      focusVel += focusAcc * glm::vec3(dt, dt, dt);
+      curFocusPos += focusVel * glm::vec3(dt, dt, dt);
+    }
+    else
+    {
+      focusVel = glm::vec3(0, 0, 0);
+      focusAcc = glm::vec3(0, 0, 0);
+    }
+
+    static glm::vec3 objectVel = glm::vec3(0, 0, 0);
+    static glm::vec3 objectAcc = glm::vec3(0, 0, 0);
+    objectD                    = glm::length(curObjectPos - objectPos);
+    if(objectD > epsilon)
+    {
+      bContinue    = true;
+      glm::vec3 dV = curObjectPos - objectPos;
+      objectAcc    = (-2.0f / tau) * objectVel - dV / (tau * tau);
+      // integrate
+      objectVel += objectAcc * glm::vec3(dt, dt, dt);
+      curObjectPos += objectVel * glm::vec3(dt, dt, dt);
+    }
+    else
+    {
+      objectVel = glm::vec3(0, 0, 0);
+      objectAcc = glm::vec3(0, 0, 0);
+    }
+    //
+    // Camera View matrix
+    //
+    glm::vec3 up(0, 1, 0);
+    m4_view          = glm::mat4(1);
+    glm::mat4 Lookat = glm::lookAt(curEyePos, curFocusPos, up);
+    m4_view *= Lookat;
+    return bContinue;
+  }
+  //------------------------------------------------------------------------------
+  /// >  Call this function to update the camera position and the target position
+  /// \arg *reset* set to true will directly update the actual positions without
+  /// performing the animation for transitioning.
+  //------------------------------------------------------------------------------
+  void look_at(const glm::vec3& eye, const glm::vec3& center /*, const glm::vec3& up*/, bool reset = false)
+  {
+    eyePos   = eye;
+    focusPos = center;
+    if(reset)
+    {
+      curEyePos   = eye;
+      curFocusPos = center;
+      glm::vec3 up(0, 1, 0);
+      m4_view          = glm::mat4(1);
+      glm::mat4 Lookat = glm::lookAt(curEyePos, curFocusPos, up);
+      m4_view *= Lookat;
+    }
+  }
+  //------------------------------------------------------------------------------
+  /// >  debug information of camera position and target position
+  /// Particularly useful for recording a set of positions that can later be
+  /// reused as "recorded" presets
+  //------------------------------------------------------------------------------
+  void print_look_at(bool cppLike = false)
+  {
+    if(cppLike)
+    {
+      LOGI("{glm::vec3(%.2f, %.2f, %.2f), glm::vec3(%.2f, %.2f, %.2f)},\n", eyePos.x, eyePos.y, eyePos.z, focusPos.x,
+           focusPos.y, focusPos.z);
+    }
+    else
+    {
+      LOGI("%.2f %.2f %.2f %.2f %.2f %.2f 0.0\n", eyePos.x, eyePos.y, eyePos.z, focusPos.x, focusPos.y, focusPos.z);
+    }
+  }
+};
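
Note on `InertiaCamera::update` above: each of the three positions follows its target through the acceleration `(-2 / tau) * velocity - distance / (tau * tau)`, which is the form of a critically damped spring, integrated by updating the velocity first and then the position. The sketch below reduces this to one dimension; `Spring1D` and its member names are hypothetical, and unlike the original (which keeps the velocities in function-local `static` variables shared by all instances) it stores the velocity per object.

```cpp
#include <cmath>

// One-dimensional version of the damped motion applied per component in update().
struct Spring1D
{
  float pos     = 0.0f;    // current position
  float vel     = 0.0f;    // current velocity
  float tau     = 0.2f;    // time constant, same default as InertiaCamera
  float epsilon = 0.001f;  // distance below which the motion counts as finished

  // Advances the simulation by dt seconds; returns true while still moving.
  bool step(float target, float dt)
  {
    if(dt > 1.0f / 60.0f)
      dt = 1.0f / 60.0f;  // clamp the time step as the original does

    const float d = pos - target;
    if(std::fabs(d) <= epsilon)
    {
      vel = 0.0f;  // close enough: stop and reset the velocity
      return false;
    }

    const float acc = (-2.0f / tau) * vel - d / (tau * tau);
    vel += acc * dt;  // integrate the velocity first ...
    pos += vel * dt;  // ... then the position, using the updated velocity
    return true;
  }
};
```

Calling `step` once per frame until it returns `false` mirrors how the boolean result of `update` can be used to decide whether another redraw is needed.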
--- a/raytracer/nvpro_core/nvh/cameramanipulator.cpp
+++ b/raytracer/nvpro_core/nvh/cameramanipulator.cpp
@ -0,0 +1,564 @@
+/*
+ * Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2018-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+//--------------------------------------------------------------------
+
+#include "cameramanipulator.hpp"
+#include <chrono>
+#include <iostream>
+#include <nvpwindow.hpp>
+
+namespace nvh {
+
+//--------------------------------------------------------------------------------------------------
+//
+//
+CameraManipulator::CameraManipulator()
+{
+  update();
+}
+
+//--------------------------------------------------------------------------------------------------
+// Set the new camera as a goal
+//
+void CameraManipulator::setCamera(Camera camera, bool instantSet /*=true*/)
+{
+  m_anim_done = true;
+
+  if(instantSet)
+  {
+    m_current = camera;
+    update();
+  }
+  else if(camera != m_current)
+  {
+    m_goal       = camera;
+    m_snapshot   = m_current;
+    m_anim_done  = false;
+    m_start_time = getSystemTime();
+    findBezierPoints();
+  }
+}
+
+//--------------------------------------------------------------------------------------------------
+// Creates a viewing matrix derived from an eye point, a reference point indicating the center of
+// the scene, and an up vector
+//
+void CameraManipulator::setLookat(const glm::vec3& eye, const glm::vec3& center, const glm::vec3& up, bool instantSet)
+{
+  Camera camera{eye, center, up, m_current.fov};
+  setCamera(camera, instantSet);
+}
+
+//-----------------------------------------------------------------------------
+// Get the current camera's look-at parameters.
+void CameraManipulator::getLookat(glm::vec3& eye, glm::vec3& center, glm::vec3& up) const
+{
+  eye    = m_current.eye;
+  center = m_current.ctr;
+  up     = m_current.up;
+}
+
+//--------------------------------------------------------------------------------------------------
+// Pan the camera perpendicularly to the line of sight.
+//
+void CameraManipulator::pan(float dx, float dy)
+{
+  if(m_mode == Fly)
+  {
+    dx *= -1;
+    dy *= -1;
+  }
+
+  glm::vec3 z(m_current.eye - m_current.ctr);
+  float     length = static_cast<float>(glm::length(z)) / 0.785f;  // 45 degrees
+  z                = glm::normalize(z);
+  glm::vec3 x      = glm::cross(m_current.up, z);
+  glm::vec3 y      = glm::cross(z, x);
+  x                = glm::normalize(x);
+  y                = glm::normalize(y);
+
+  glm::vec3 panVector = (-dx * x + dy * y) * length;
+  m_current.eye += panVector;
+  m_current.ctr += panVector;
+}
+
+//--------------------------------------------------------------------------------------------------
+// Orbit the camera around the center of interest. If 'invert' is true,
+// then the camera stays in place and the interest point orbits around the camera.
+//
+void CameraManipulator::orbit(float dx, float dy, bool invert)
+{
+  if(dx == 0 && dy == 0)
+    return;
+
+  // Full width will do a full turn
+  dx *= glm::two_pi<float>();
+  dy *= glm::two_pi<float>();
+
+  // Get the camera
+  glm::vec3 origin(invert ? m_current.eye : m_current.ctr);
+  glm::vec3 position(invert ? m_current.ctr : m_current.eye);
+
+  // Get the length of sight
+  glm::vec3 centerToEye(position - origin);
+  float     radius = glm::length(centerToEye);
+  centerToEye      = glm::normalize(centerToEye);
+  glm::vec3 axe_z  = centerToEye;
+
+  // Find the rotation around the UP axis (Y)
+  glm::mat4 rot_y = glm::rotate(glm::mat4(1), -dx, m_current.up);
+
+  // Apply the (Y) rotation to the eye-center vector
+  centerToEye = rot_y * glm::vec4(centerToEye, 0);
+
+  // Find the rotation around the X vector: cross between eye-center and up (X)
+  glm::vec3 axe_x = glm::normalize(glm::cross(m_current.up, axe_z));
+  glm::mat4 rot_x = glm::rotate(glm::mat4(1), -dy, axe_x);
+
+  // Apply the (X) rotation to the eye-center vector
+  glm::vec3 vect_rot = rot_x * glm::vec4(centerToEye, 0);
+
+  if(glm::sign(vect_rot.x) == glm::sign(centerToEye.x))
+    centerToEye = vect_rot;
+
+  // Make the vector as long as it was originally
+  centerToEye *= radius;
+
+  // Finding the new position
+  glm::vec3 newPosition = centerToEye + origin;
+
+  if(!invert)
+  {
+    m_current.eye = newPosition;  // Normal: change the position of the camera
+  }
+  else
+  {
+    m_current.ctr = newPosition;  // Inverted: change the interest point
+  }
+}
+
+//--------------------------------------------------------------------------------------------------
+// Move the camera toward the interest point, but don't cross it
+//
+void CameraManipulator::dolly(float dx, float dy)
+{
+  glm::vec3 z      = m_current.ctr - m_current.eye;
+  float     length = static_cast<float>(glm::length(z));
+
+  // We are at the point of interest, and don't know any direction, so do nothing!
+  if(length < 0.000001f)
+    return;
+
+  // Use the larger movement.
+  float dd;
+  if(m_mode != Examine)
+    dd = -dy;
+  else
+    dd = fabs(dx) > fabs(dy) ? dx : -dy;
+  float factor = m_speed * dd;
+
+  // Adjust speed based on distance.
+  if(m_mode == Examine)
+  {
+    // Don't move over the point of interest.
+    if(factor >= 1.0f)
+      return;
+
+    z *= factor;
+  }
+  else
+  {
+    // Normalize the Z vector and make it faster
+    z *= factor / length * 10.0f;
+  }
+
+  // Not going up
+  if(m_mode == Walk)
+  {
+    if(m_current.up.y > m_current.up.z)
+      z.y = 0;
+    else
+      z.z = 0;
+  }
+
+  m_current.eye += z;
+
+  // In fly and walk modes, the interest point moves with us.
+  if(m_mode != Examine)
+    m_current.ctr += z;
+}
+
+//--------------------------------------------------------------------------------------------------
+// Modify the position of the camera over time
+// - The camera can be updated through keys. A key sets a direction which is added to both
+//   eye and center, until the key is released
+// - A new position of the camera is defined and the camera will reach that position
+//   over time.
+void CameraManipulator::updateAnim()
+{
+  auto elapse = static_cast<float>(getSystemTime() - m_start_time) / 1000.f;
+
+  // Key animation
+  if(m_key_vec != glm::vec3(0, 0, 0))
+  {
+    m_current.eye += m_key_vec * elapse;
+    m_current.ctr += m_key_vec * elapse;
+    update();
+    m_start_time = getSystemTime();
+    return;
+  }
+
+  // Camera moving to new position
+  if(m_anim_done)
+    return;
+
+  float t = std::min(elapse / float(m_duration), 1.0f);
+  // Evaluate polynomial (smoother step from Perlin)
+  t = t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f);
+  if(t >= 1.0f)
+  {
+    m_current   = m_goal;
+    m_anim_done = true;
+    return;
+  }
+
+  // Interpolate camera position and interest
+  // The distance between the camera and the interest point is preserved to
+  // create a nicer interpolation
+  m_current.ctr = glm::mix(m_snapshot.ctr, m_goal.ctr, t);
+  m_current.up  = glm::mix(m_snapshot.up, m_goal.up, t);
+  m_current.eye = computeBezier(t, m_bezier[0], m_bezier[1], m_bezier[2]);
+  m_current.fov = glm::mix(m_snapshot.fov, m_goal.fov, t);
+
+  update();
+}
+
+//--------------------------------------------------------------------------------------------------
+//
+void CameraManipulator::setMatrix(const glm::mat4& matrix, bool instantSet, float centerDistance)
+{
+  Camera camera;
+  camera.eye = matrix[3];
+
+  auto rotMat = glm::mat3(matrix);
+  camera.ctr  = {0, 0, -centerDistance};
+  camera.ctr  = camera.eye + (rotMat * camera.ctr);
+  camera.up   = {0, 1, 0};
+  camera.fov  = m_current.fov;
+
+  m_anim_done = instantSet;
+
+  if(instantSet)
+  {
+    m_current = camera;
+  }
+  else
+  {
+    m_goal       = camera;
+    m_snapshot   = m_current;
+    m_start_time = getSystemTime();
+    findBezierPoints();
+  }
+  update();
+}
+
+//--------------------------------------------------------------------------------------------------
+//
+//
+void CameraManipulator::setMousePosition(int x, int y)
+{
+  m_mouse = glm::vec2(x, y);
+}
+
+//--------------------------------------------------------------------------------------------------
+//
+//
+void CameraManipulator::getMousePosition(int& x, int& y)
+{
+  x = static_cast<int>(m_mouse.x);
+  y = static_cast<int>(m_mouse.y);
+}
+
+//--------------------------------------------------------------------------------------------------
+//
+//
+void CameraManipulator::setWindowSize(int w, int h)
+{
+  m_width  = w;
+  m_height = h;
+}
+
+//--------------------------------------------------------------------------------------------------
+//
+// Low-level function for when the camera moves.
+//
+void CameraManipulator::motion(int x, int y, int action)
+{
+  float dx = float(x - m_mouse[0]) / float(m_width);
+  float dy = float(y - m_mouse[1]) / float(m_height);
+
+  switch(action)
+  {
+    case Orbit:
+      orbit(dx, dy, false);
+      break;
+    case CameraManipulator::Dolly:
+      dolly(dx, dy);
+      break;
+    case CameraManipulator::Pan:
+      pan(dx, dy);
+      break;
+    case CameraManipulator::LookAround:
+      orbit(dx, -dy, true);
+      break;
+  }
+
+  // Resetting animation
+  m_anim_done = true;
+
+  update();
+
+  m_mouse[0] = static_cast<float>(x);
+  m_mouse[1] = static_cast<float>(y);
+}
+
+//
+// Function for when the camera moves with keys (e.g. WASD).
+//
+void CameraManipulator::keyMotion(float dx, float dy, int action)
+{
+  if(action == NoAction)
+  {
+    m_key_vec = {0, 0, 0};
+    return;
+  }
+
+  auto d = glm::normalize(m_current.ctr - m_current.eye);
+  dx *= m_speed * 2.f;
+  dy *= m_speed * 2.f;
+
+  glm::vec3 key_vec;
+  if(action == Dolly)
+  {
+    key_vec = d * dx;
+    if(m_mode == Walk)
+    {
+      if(m_current.up.y > m_current.up.z)
+        key_vec.y = 0;
+      else
+        key_vec.z = 0;
+    }
+  }
+  else if(action == Pan)
+  {
+    auto r  = glm::cross(d, m_current.up);
+    key_vec = r * dx + m_current.up * dy;
+  }
+
+  m_key_vec += key_vec;
+
+  // Resetting animation
+  m_start_time = getSystemTime();
+}
+
+//--------------------------------------------------------------------------------------------------
+// Call this when the mouse is moving.
+// It finds the appropriate camera action, based on the mouse button pressed and the
+// keyboard modifiers (shift, ctrl, alt)
+//
+// Returns the action that was activated
+//
+CameraManipulator::Actions CameraManipulator::mouseMove(int x, int y, const Inputs& inputs)
+{
+  if(!inputs.lmb && !inputs.rmb && !inputs.mmb)
+  {
+    setMousePosition(x, y);
+    return NoAction;  // no mouse button pressed
+  }
+
+  Actions curAction = NoAction;
+  if(inputs.lmb)
+  {
+    if(((inputs.ctrl) && (inputs.shift)) || inputs.alt)
+      curAction = m_mode == Examine ? LookAround : Orbit;
+    else if(inputs.shift)
+      curAction = Dolly;
+    else if(inputs.ctrl)
+      curAction = Pan;
+    else
+      curAction = m_mode == Examine ? Orbit : LookAround;
+  }
+  else if(inputs.mmb)
+    curAction = Pan;
+  else if(inputs.rmb)
+    curAction = Dolly;
+
+  if(curAction != NoAction)
+    motion(x, y, curAction);
+
+  return curAction;
+}
+
+//--------------------------------------------------------------------------------------------------
+// Trigger a dolly when the wheel changes, or change the FOV if the shift key is pressed
+//
+void CameraManipulator::wheel(int value, const Inputs& inputs)
+{
+  float fval(static_cast<float>(value));
+  float dx = (fval * fabsf(fval)) / static_cast<float>(m_width);
+
+  if(inputs.shift)
+  {
+    setFov(m_current.fov + fval);
+  }
+  else
+  {
+    dolly(dx * m_speed, dx * m_speed);
+    update();
+  }
+}
+
+// Set and clamp FOV between 0.01 and 179 degrees
+void CameraManipulator::setFov(float _fov)
+{
+  m_current.fov = std::min(std::max(_fov, 0.01f), 179.0f);
+}
+
+glm::vec3 CameraManipulator::computeBezier(float t, glm::vec3& p0, glm::vec3& p1, glm::vec3& p2)
+{
+  float u  = 1.f - t;
+  float tt = t * t;
+  float uu = u * u;
+
+  glm::vec3 p = uu * p0;  // first term
+  p += 2 * u * t * p1;    // second term
+  p += tt * p2;           // third term
+
+  return p;
+}
+
+void CameraManipulator::findBezierPoints()
+{
+  glm::vec3 p0 = m_current.eye;
+  glm::vec3 p2 = m_goal.eye;
+  glm::vec3 p1, pc;
+
+  // point of interest
+  glm::vec3 pi = (m_goal.ctr + m_current.ctr) * 0.5f;
+
+  glm::vec3 p02    = (p0 + p2) * 0.5f;                            // mid p0-p2
+  float     radius = (length(p0 - pi) + length(p2 - pi)) * 0.5f;  // Radius for p1
+  glm::vec3 p02pi(p02 - pi);                                      // Vector from interest to mid point
+  p02pi = glm::normalize(p02pi);
+  p02pi *= radius;
+  pc   = pi + p02pi;                        // Calculated point to go through
+  p1   = 2.f * pc - p0 * 0.5f - p2 * 0.5f;  // Computing p1 for t=0.5
+  p1.y = p02.y;                             // Clamping the P1 to be in the same height as p0-p2
+
+  m_bezier[0] = p0;
+  m_bezier[1] = p1;
+  m_bezier[2] = p2;
+}
+
+//--------------------------------------------------------------------------------------------------
+// Return the system time in milliseconds, with fractional (sub-millisecond) precision
+//
+double CameraManipulator::getSystemTime()
+{
+  auto now(std::chrono::system_clock::now());
+  auto duration = now.time_since_epoch();
+  return std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000.0;
+}
+
+//--------------------------------------------------------------------------------------------------
+// Return a string which can be included in help dialogs
+//
+const std::string& CameraManipulator::getHelp()
+{
+  static std::string helpText =
+      "LMB: rotate around the target\n"
+      "RMB: Dolly in/out\n"
+      "MMB: Pan along view plane\n"
+      "LMB + Shift: Dolly in/out\n"
+      "LMB + Ctrl: Pan\n"
+      "LMB + Alt: Look around\n"
+      "Mouse wheel: Dolly in/out\n"
+      "Mouse wheel + Shift: Zoom in/out\n";
+  return helpText;
+}
+
+//--------------------------------------------------------------------------------------------------
+// Move the camera closer to or further from the center of the bounding box, to see it completely
+//
+// boxMin - lower corner of the bounding box
+// boxMax - upper corner of the bounding box
+// instantFit - true: set the new position immediately, false: animate to the new position.
+// tightFit - true: fit exactly to the corners, false: fit to the bounding sphere radius (larger view, will not get closer or further away)
+// aspect - aspect ratio of the window.
+//
+void CameraManipulator::fit(const glm::vec3& boxMin, const glm::vec3& boxMax, bool instantFit /*= true*/, bool tightFit /*=false*/, float aspect /*=1.0f*/)
+{
+  // Calculate the half extents of the bounding box
+  const glm::vec3 boxHalfSize = 0.5f * (boxMax - boxMin);
+
+  // Calculate the center of the bounding box
+  const glm::vec3 boxCenter = 0.5f * (boxMin + boxMax);
+
+  const float yfov = tan(glm::radians(m_current.fov * 0.5f));
+  const float xfov = yfov * aspect;
+
+  // Calculate the ideal distance for a tight fit or fit to radius
+  float idealDistance = 0;
+
+  if(tightFit)
+  {
+    // Get only the rotation matrix
+    glm::mat3 mView = glm::lookAt(m_current.eye, boxCenter, m_current.up);
+
+    // Check each of the 8 corners of the box
+    for(int i = 0; i < 8; i++)
+    {
+      // Rotate the bounding box in the camera view
+      glm::vec3 vct(i & 1 ? boxHalfSize.x : -boxHalfSize.x,   //
+                    i & 2 ? boxHalfSize.y : -boxHalfSize.y,   //
+                    i & 4 ? boxHalfSize.z : -boxHalfSize.z);  //
+      vct = mView * vct;
+
+      if(vct.z < 0)  // Take only points in front of the center
+      {
+        // Keep the largest offset to see that vertex
+        idealDistance = std::max(fabs(vct.y) / yfov + fabs(vct.z), idealDistance);
+        idealDistance = std::max(fabs(vct.x) / xfov + fabs(vct.z), idealDistance);
+      }
+    }
+  }
+  else  // Using the bounding sphere
+  {
+    const float radius = glm::length(boxHalfSize);
+    idealDistance      = std::max(radius / xfov, radius / yfov);
+  }
+
+  // Calculate the new camera position based on the ideal distance
+  const glm::vec3 newEye = boxCenter - idealDistance * glm::normalize(boxCenter - m_current.eye);
+
+  // Set the new camera position and interest point
+  setLookat(newEye, boxCenter, m_current.up, instantFit);
+}
+
+}  // namespace nvh
--- a/raytracer/nvpro_core/nvh/cameramanipulator.hpp
+++ b/raytracer/nvpro_core/nvh/cameramanipulator.hpp
@ -0,0 +1,252 @@
+/*
+ * Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2018-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+//--------------------------------------------------------------------
+
+#pragma once
+
+#include <array>
+#include <glm/glm.hpp>
+#include <glm/gtc/matrix_transform.hpp>
+#include <string>
+
+namespace nvh {
+/** @DOC_START
+  # class nvh::CameraManipulator
+
+  nvh::CameraManipulator is a camera manipulator helper class.
+  It provides the following camera actions:
+  - Orbit        (LMB)
+  - Pan          (LMB + CTRL  | MMB)
+  - Dolly        (LMB + SHIFT | RMB)
+  - Look Around  (LMB + ALT   | LMB + CTRL + SHIFT)
+
+  In various modes:
+  - examine (orbit around an object)
+  - walk (look up or down but stay on a plane)
+  - fly (go toward the interest point)
+
+  To use the camera manipulator, you need to do the following:
+  - Call setWindowSize() at creation of the application and when the window size changes
+  - Call setLookat() at creation to initialize the camera look position
+  - Call setMousePosition() on application mouse down
+  - Call mouseMove() on application mouse move
+
+  Retrieve the camera matrix by calling getMatrix()
+
+  See: appbase_vkpp.hpp
+
+  Note: There is a singleton `CameraManip` which can be used across the entire application
+
+  ```cpp
+  // Retrieve/set camera information
+  CameraManip.getLookat(eye, center, up);
+  CameraManip.setLookat(eye, center, glm::vec3(m_upVector == 0, m_upVector == 1, m_upVector == 2));
+  CameraManip.getFov();
+  CameraManip.setSpeed(navSpeed);
+  CameraManip.setMode(navMode == 0 ? nvh::CameraManipulator::Examine : nvh::CameraManipulator::Fly);
+  // On mouse down, keep mouse coordinates
+  CameraManip.setMousePosition(x, y);
+  // On mouse move and mouse button down
+  if(m_inputs.lmb || m_inputs.rmb || m_inputs.mmb)
+  {
+    CameraManip.mouseMove(x, y, m_inputs);
+  }
+  // Wheel dollies the camera, or changes the FOV when Shift is held
+  CameraManip.wheel(delta > 0 ? 1 : -1, m_inputs);
+  // Retrieve the matrix to push to the shader
+  m_ubo.view = CameraManip.getMatrix();
+  ```
+
+@DOC_END */
+
+class CameraManipulator
+{
+public:
+  // clang-format off
+    enum Modes { Examine, Fly, Walk};
+    enum Actions { NoAction, Orbit, Dolly, Pan, LookAround };
+    struct Inputs {bool lmb=false; bool mmb=false; bool rmb=false; 
+                   bool shift=false; bool ctrl=false; bool alt=false;};
+  // clang-format on
+
+  struct Camera
+  {
+    glm::vec3 eye = glm::vec3(10, 10, 10);
+    glm::vec3 ctr = glm::vec3(0, 0, 0);
+    glm::vec3 up  = glm::vec3(0, 1, 0);
+    float     fov = 60.0f;
+
+    bool operator!=(const Camera& rhr) const
+    {
+      return (eye != rhr.eye) || (ctr != rhr.ctr) || (up != rhr.up) || (fov != rhr.fov);
+    }
+    bool operator==(const Camera& rhr) const
+    {
+      return (eye == rhr.eye) && (ctr == rhr.ctr) && (up == rhr.up) && (fov == rhr.fov);
+    }
+  };
+
+public:
+  // Main function to call from the application
+  // On application mouse move, call this function with the current mouse position, mouse
+  // button presses and keyboard modifiers. The camera matrix will be updated and
+  // can be retrieved by calling getMatrix()
+  Actions mouseMove(int x, int y, const Inputs& inputs);
+
+  // Set the camera to look at the interest point
+  // instantSet = true will not interpolate to the new position
+  void setLookat(const glm::vec3& eye, const glm::vec3& center, const glm::vec3& up, bool instantSet = true);
+
+  // Call this in the application loop to update the camera matrix while the camera is animated (moving to a new position, or key movement)
+  void updateAnim();
+
+  // Call when the size of the window changes. This allows nicer movement, scaled to the window size.
+  void setWindowSize(int w, int h);
+
+  // Set the current mouse position; call on mouse button down so that the motion deltas are computed properly
+  void setMousePosition(int x, int y);
+
+  Camera getCamera() const { return m_current; }
+  void   setCamera(Camera camera, bool instantSet = true);
+
+  // Retrieve the position, interest and up vector of the camera
+  void      getLookat(glm::vec3& eye, glm::vec3& center, glm::vec3& up) const;
+  glm::vec3 getEye() const { return m_current.eye; }
+  glm::vec3 getCenter() const { return m_current.ctr; }
+  glm::vec3 getUp() const { return m_current.up; }
+
+  // Set the manipulator mode: Examine, Fly, or Walk
+  void setMode(Modes mode) { m_mode = mode; }
+
+  // Retrieve the current manipulator mode
+  Modes getMode() const { return m_mode; }
+
+  // Retrieving the transformation matrix of the camera
+  const glm::mat4& getMatrix() const { return m_matrix; }
+
+  // Set the position and interest point from the matrix.
+  // instantSet = true will not interpolate to the new position
+  // centerDistance is the distance of the center from the eye
+  void setMatrix(const glm::mat4& mat_, bool instantSet = true, float centerDistance = 1.f);
+
+  // Changing the default speed movement
+  void setSpeed(float speed) { m_speed = speed; }
+
+  // Retrieving the current speed
+  float getSpeed() { return m_speed; }
+
+  // Retrieving the last mouse position
+  void getMousePosition(int& x, int& y);
+
+  // Main function which is called to apply a camera motion.
+  // It is preferable to call mouseMove() instead.
+  void motion(int x, int y, int action = 0);
+
+  void keyMotion(float dx, float dy, int action);
+
+  // Call when the mouse wheel changes
+  void wheel(int value, const Inputs& inputs);
+
+  // Retrieve the screen dimension
+  int   getWidth() const { return m_width; }
+  int   getHeight() const { return m_height; }
+  float getAspectRatio() const { return static_cast<float>(m_width) / static_cast<float>(m_height); }
+
+  // Field of view in degrees
+  void  setFov(float _fov);
+  float getFov() { return m_current.fov; }
+
+  // Clip planes
+  void             setClipPlanes(glm::vec2 clip) { m_clipPlanes = clip; }
+  const glm::vec2& getClipPlanes() const { return m_clipPlanes; }
+
+  // Animation duration
+  double getAnimationDuration() const { return m_duration; }
+  void   setAnimationDuration(double val) { m_duration = val; }
+  bool   isAnimated() { return m_anim_done == false; }
+
+  // Returning a default help string
+  const std::string& getHelp();
+
+  // Fitting the camera position and interest to see the bounding box
+  void fit(const glm::vec3& boxMin, const glm::vec3& boxMax, bool instantFit = true, bool tight = false, float aspect = 1.0f);
+
+protected:
+  CameraManipulator();
+
+private:
+  // Update the internal matrix.
+  void update() { m_matrix = glm::lookAt(m_current.eye, m_current.ctr, m_current.up); }
+
+  // Do panning: movement parallel to the screen
+  void pan(float dx, float dy);
+  // Do orbiting: rotation around the center of interest. If invert, the interest point orbits around the camera position
+  void orbit(float dx, float dy, bool invert = false);
+  // Do dolly: movement toward the interest.
+  void dolly(float dx, float dy);
+
+
+  double getSystemTime();
+
+  glm::vec3 computeBezier(float t, glm::vec3& p0, glm::vec3& p1, glm::vec3& p2);
+  void      findBezierPoints();
+
+protected:
+  glm::mat4 m_matrix = glm::mat4(1);
+
+  Camera m_current;   // Current camera position
+  Camera m_goal;      // Target camera position
+  Camera m_snapshot;  // Copy of the current camera at the moment a new look-at is set
+
+  // Animation
+  std::array<glm::vec3, 3> m_bezier;
+  double                   m_start_time = 0;
+  double                   m_duration   = 0.5;
+  bool                     m_anim_done{true};
+  glm::vec3                m_key_vec{0, 0, 0};
+
+  // Screen
+  int m_width  = 1;
+  int m_height = 1;
+
+  // Other
+  float     m_speed      = 3.f;
+  glm::vec2 m_mouse      = glm::vec2(0.f, 0.f);
+  glm::vec2 m_clipPlanes = glm::vec2(0.001f, 100000000.f);
+
+  bool  m_button = false;  // Button pressed
+  bool  m_moving = false;  // Mouse is moving
+  float m_tbsize = 0.8f;   // Trackball size;
+
+  Modes m_mode = Examine;
+
+public:
+  // Singleton accessor.
+  static CameraManipulator& Singleton()
+  {
+    static CameraManipulator manipulator;
+    return manipulator;
+  }
+};
+
+// Global Manipulator
+
+}  // namespace nvh
+
+#define CameraManip nvh::CameraManipulator::Singleton()
--- a/raytracer/nvpro_core/nvh/commandlineparser.hpp
+++ b/raytracer/nvpro_core/nvh/commandlineparser.hpp
@ -0,0 +1,232 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2022 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#pragma once
+#include <iostream>
+#include <string>
+#include <variant>
+#include <vector>
+#include <algorithm>
+#include <iomanip>
+#include <sstream>
+
+#include "nvprint.hpp"
+
+
+static constexpr int MAX_LINE_WIDTH = 60;
+
+namespace nvh {
+/* @DOC_START
+Command line parser.
+```cpp
+ std::string inFilename = "";
+ bool printHelp = false;
+ CommandLineParser args("Test Parser");
+ args.addArgument({"-f", "--filename"}, &inFilename, "Input filename");
+ args.addArgument({"-h", "--help"}, &printHelp, "Print Help");
+ bool result = args.parse(argc, argv);
+```
+@DOC_END */
+class CommandLineParser
+{
+public:
+  // These are the possible variables the options may point to. Bool and
+  // std::string are handled in a special way, all other values are parsed
+  // with a std::stringstream. This std::variant can be easily extended if
+  // the stream operator>> is overloaded. If not, you have to add a special
+  // case to the parse() method.
+  using Value = std::variant<int32_t*, uint32_t*, double*, float*, bool*, std::string*>;
+
+  // The description is printed as part of the help message.
+  CommandLineParser(const std::string& description)
+      : m_description(description)
+  {
+  }
+
+  void addArgument(std::vector<std::string> const& flags, Value const& value, std::string const& help)
+  {
+    m_arguments.emplace_back(Argument{flags, value, help});
+  }
+
+  // Prints the description given to the constructor and the help for each option.
+  void printHelp(std::ostream& os = std::cout) const
+  {
+    // Print the general description.
+    os << m_description << std::endl;
+
+    // Find the argument with the longest combined flag length (in order to align the help messages).
+    uint32_t maxFlagLength = 0;
+    for(auto const& argument : m_arguments)
+    {
+      uint32_t flagLength = 0;
+      for(auto const& flag : argument.m_flags)
+      {
+        // Plus comma and space.
+        flagLength += static_cast<uint32_t>(flag.size()) + 2;
+      }
+
+      maxFlagLength = std::max(maxFlagLength, flagLength);
+    }
+
+    // Now print each argument.
+    for(auto const& argument : m_arguments)
+    {
+      std::string flags;
+      for(auto const& flag : argument.m_flags)
+      {
+        flags += flag + ", ";
+      }
+
+      // Remove last comma and space and add padding according to the longest flags in order to align the help messages.
+      std::stringstream sstr;
+      sstr << std::left << std::setw(maxFlagLength) << flags.substr(0, flags.size() - 2);
+
+      // Print the help for each argument. This is a bit more involved since we do line wrapping for long descriptions.
+      size_t spacePos  = 0;
+      size_t lineWidth = 0;
+      while(spacePos != std::string::npos)
+      {
+        size_t nextspacePos = argument.m_help.find_first_of(' ', spacePos + 1);
+        sstr << argument.m_help.substr(spacePos, nextspacePos - spacePos);
+        lineWidth += nextspacePos - spacePos;
+        spacePos = nextspacePos;
+
+        if(lineWidth > MAX_LINE_WIDTH)
+        {
+          os << sstr.str() << std::endl;
+          sstr = std::stringstream();
+          sstr << std::left << std::setw(maxFlagLength - 1) << " ";
+          lineWidth = 0;
+        }
+      }
+    }
+  }
+
+
+  // The command line arguments are traversed from start to end. That means,
+  // if an option is set multiple times, the last will be the one which is
+  // finally used. This call logs an error and returns false if a value is
+  // missing for a given option. Unknown flags cause a warning on
+  // std::cerr and make parse() return false.
+  bool parse(int argc, char* argv[])
+  {
+    bool result = true;
+
+    // Skip the first argument (name of the program).
+    int i = 1;
+    while(i < argc)
+    {
+      // First we have to identify whether the value is separated by a space or a '='.
+      std::string flag(argv[i]);
+      std::string value;
+      bool        valueIsSeparate = false;
+
+      // If there is an '=' in the flag, the part after the '=' is actually
+      // the value.
+      size_t equalPos = flag.find('=');
+      if(equalPos != std::string::npos)
+      {
+        value = flag.substr(equalPos + 1);
+        flag  = flag.substr(0, equalPos);
+      }
+      // Else the following argument is the value.
+      else if(i + 1 < argc)
+      {
+        value           = argv[i + 1];
+        valueIsSeparate = true;
+      }
+
+      // Search for an argument with the provided flag.
+      bool foundArgument = false;
+
+      for(auto const& argument : m_arguments)
+      {
+        if(std::find(argument.m_flags.begin(), argument.m_flags.end(), flag) != std::end(argument.m_flags))
+        {
+
+          foundArgument = true;
+
+          // In the case of booleans, the value is not needed.
+          if(std::holds_alternative<bool*>(argument.m_value))
+          {
+            if(!value.empty() && value != "true" && value != "false")
+            {
+              valueIsSeparate = false;  // No value
+            }
+            *std::get<bool*>(argument.m_value) = (value != "false");
+          }
+          // In all other cases there must be a value.
+          else if(value.empty())
+          {
+            LOGE("Failed to parse command line arguments. Missing value for argument %s\n", flag.c_str());
+            return false;
+          }
+          // For a std::string, we take the entire value.
+          else if(std::holds_alternative<std::string*>(argument.m_value))
+          {
+            *std::get<std::string*>(argument.m_value) = value;
+          }
+          // In all other cases we use a std::stringstream to convert the value.
+          else
+          {
+            std::visit(
+                [&value](auto&& arg) {
+                  std::stringstream sstr(value);
+                  sstr >> *arg;
+                },
+                argument.m_value);
+          }
+
+          break;
+        }
+      }
+
+      // Print a warning if there was an unknown argument.
+      if(!foundArgument)
+      {
+        std::cerr << "Ignoring unknown command line argument \"" << flag << "\"." << std::endl;
+        result = false;
+      }
+
+      // Advance to the next flag.
+      ++i;
+
+      // If the value was separated, we have to advance our index once more.
+      if(foundArgument && valueIsSeparate)
+      {
+        ++i;
+      }
+    }
+
+    return result;
+  }
+
+private:
+  struct Argument
+  {
+    std::vector<std::string> m_flags;
+    Value                    m_value;
+    std::string              m_help;
+  };
+
+  std::string           m_description;
+  std::vector<Argument> m_arguments;
+};
+
+}  // namespace nvh
--- a/raytracer/nvpro_core/nvh/container_utils.hpp
+++ b/raytracer/nvpro_core/nvh/container_utils.hpp
@ -0,0 +1,105 @@
+#ifndef NVPRO_CORE_NVH_CONTAINER_UTILS_HPP_
+#define NVPRO_CORE_NVH_CONTAINER_UTILS_HPP_
+
+#include <array>
+#include <cassert>
+#include <stddef.h>
+#include <stdint.h>
+#include <vector>
+
+/// @DOC_SKIP (keyword to exclude this file from automatic README.md generation)
+
+// constexpr array size functions for C and C++ style arrays.
+// Truncated to 32-bits (with error checking) to support the common case in Vulkan.
+template <typename T, size_t size>
+constexpr uint32_t arraySize(const T (&)[size])
+{
+  constexpr uint32_t u32_size = static_cast<uint32_t>(size);
+  static_assert(size == u32_size, "32-bit overflow");
+  return u32_size;
+}
+
+template <typename T, size_t size>
+constexpr uint32_t arraySize(const std::array<T, size>&)
+{
+  constexpr uint32_t u32_size = static_cast<uint32_t>(size);
+  static_assert(size == u32_size, "32-bit overflow");
+  return u32_size;
+}
+
+// Checked 32-bit array size function for vectors.
+template <typename T, typename Allocator>
+constexpr uint32_t arraySize(const std::vector<T, Allocator>& vector)
+{
+  auto     size     = vector.size();
+  uint32_t u32_size = static_cast<uint32_t>(size);
+  if(u32_size != size)
+  {
+    assert(!"32-bit overflow");
+  }
+  return u32_size;
+}
+
+namespace nvh {
+
+//---- Hash Combination ----
+// http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf
+template <typename T>
+void hashCombine(std::size_t& seed, const T& val)
+{
+  seed ^= std::hash<T>()(val) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
+}
+// Auxiliary generic functions to create a hash value using a seed
+template <typename T, typename... Types>
+void hashCombine(std::size_t& seed, const T& val, const Types&... args)
+{
+  hashCombine(seed, val);
+  hashCombine(seed, args...);
+}
+// Base case of the recursion; it also lets hashVal() be called without arguments
+inline void hashCombine(std::size_t& seed) {}
+// Generic function to create a hash value out of a heterogeneous list of arguments
+template <typename... Types>
+std::size_t hashVal(const Types&... args)
+{
+  std::size_t seed = 0;
+  hashCombine(seed, args...);
+  return seed;
+}
+//--------------
+
+template <typename T>
+std::size_t hashAligned32(const T& v)
+{
+  const uint32_t  size  = sizeof(T) / sizeof(uint32_t);
+  const uint32_t* vBits = reinterpret_cast<const uint32_t*>(&v);
+  std::size_t     seed  = 0;
+  for(uint32_t i = 0u; i < size; i++)
+  {
+    hashCombine(seed, vBits[i]);
+  }
+  return seed;
+}
+
+
+// Generic hash functor for using a struct as the key of a std::unordered_map-like container.
+// Important: sizeof(T) must be a multiple of 32 bits and the struct must hold only plain values
+// without padding, as the raw bits are hashed and no pointers are followed.
+template <typename T>
+struct HashAligned32
+{
+  std::size_t operator()(const T& s) const { return hashAligned32(s); }
+};
+
+// Generic equality functor for using a struct as the key of a std::unordered_map-like container.
+// Important: this compares raw bytes, so the struct must hold only plain values without padding;
+// pointers are compared by address, not by what they point to.
+template <typename T>
+struct EqualMem
+{
+  bool operator()(const T& l, const T& r) const { return memcmp(&l, &r, sizeof(T)) == 0; }
+};
+
+}  // namespace nvh
+
+#endif
--- a/raytracer/nvpro_core/nvh/filemapping.cpp
+++ b/raytracer/nvpro_core/nvh/filemapping.cpp
@ -0,0 +1,219 @@
+/*
+ * Copyright (c) 2020-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2020-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#include "filemapping.hpp"
+#include <assert.h>
+
+#if defined(LINUX)
+#include <errno.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/resource.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#endif
+
+#if defined(_WIN32)
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN
+#endif
+#include <windows.h>
+
+inline DWORD HIDWORD(size_t x)
+{
+  return (DWORD)(x >> 32);
+}
+inline DWORD LODWORD(size_t x)
+{
+  return (DWORD)x;
+}
+#endif
+
+
+namespace nvh {
+
+bool FileMapping::open(const char* fileName, MappingType mappingType, size_t fileSize)
+{
+  if(!g_pageSize)
+  {
+#if defined(_WIN32)
+    SYSTEM_INFO si;
+    GetSystemInfo(&si);
+    g_pageSize = (size_t)si.dwAllocationGranularity;
+#elif defined(LINUX)
+    g_pageSize = (size_t)getpagesize();
+#endif
+  }
+
+  m_mappingType = mappingType;
+
+  if(mappingType == MAPPING_READOVERWRITE)
+  {
+    assert(fileSize);
+    m_fileSize    = fileSize;
+    m_mappingSize = ((fileSize + g_pageSize - 1) / g_pageSize) * g_pageSize;
+
+    // check if the current process is allowed to save a file of that size
+#if defined(_WIN32)
+    TCHAR          dir[MAX_PATH + 1];
+    BOOL           success = FALSE;
+    ULARGE_INTEGER numFreeBytes;
+
+    DWORD length = GetVolumePathName(fileName, dir, MAX_PATH + 1);
+
+    if(length > 0)
+    {
+      success = GetDiskFreeSpaceEx(dir, NULL, NULL, &numFreeBytes);
+    }
+
+    m_isValid = (!!success) && (m_mappingSize <= numFreeBytes.QuadPart);
+#elif defined(LINUX)
+    struct rlimit rlim;
+    getrlimit(RLIMIT_FSIZE, &rlim);
+    m_isValid = (m_mappingSize <= rlim.rlim_cur);
+#endif
+    if(!m_isValid)
+    {
+      return false;
+    }
+  }
+
+#if defined(_WIN32)
+  m_win32.file = mappingType == MAPPING_READONLY ?
+                     CreateFile(fileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_READONLY, NULL) :
+                     CreateFile(fileName, GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
+
+  m_isValid = (m_win32.file != INVALID_HANDLE_VALUE);
+  if(m_isValid)
+  {
+    if(mappingType == MAPPING_READONLY)
+    {
+      DWORD sizeHi  = 0;
+      DWORD sizeLo  = GetFileSize(m_win32.file, &sizeHi);
+      m_mappingSize = (static_cast<size_t>(sizeHi) << 32) | sizeLo;
+      m_fileSize    = m_mappingSize;
+    }
+
+    m_win32.fileMapping = CreateFileMapping(m_win32.file, NULL, mappingType == MAPPING_READONLY ? PAGE_READONLY : PAGE_READWRITE,
+                                            HIDWORD(m_mappingSize), LODWORD(m_mappingSize), NULL);
+
+    m_isValid = (m_win32.fileMapping != NULL);
+    if(m_isValid)
+    {
+      m_mappingPtr = MapViewOfFile(m_win32.fileMapping, mappingType == MAPPING_READONLY ? FILE_MAP_READ : FILE_MAP_ALL_ACCESS,
+                                   HIDWORD(0), LODWORD(0), (SIZE_T)0);
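+      if(!m_mappingPtr)
+      {
+#if 0
+      DWORD err = GetLastError();
+#endif
+        // release both handles here; close() skips them once m_isValid is false
+        CloseHandle(m_win32.fileMapping);
+        CloseHandle(m_win32.file);
+        m_isValid = false;
+      }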
+    }
+    else
+    {
+      CloseHandle(m_win32.file);
+    }
+  }
+#elif defined(LINUX)
+  m_unix.file = mappingType == MAPPING_READONLY ? ::open(fileName, O_RDONLY) : ::open(fileName, O_RDWR | O_CREAT | O_TRUNC, 0666);
+
+  m_isValid = (m_unix.file != -1);
+  if(m_isValid)
+  {
+    if(mappingType == MAPPING_READONLY)
+    {
+      struct stat s;
+      m_isValid &= (fstat(m_unix.file, &s) >= 0);
+      m_mappingSize = s.st_size;
+      m_fileSize    = m_mappingSize;  // write mappings keep the caller-requested size, as on Windows
+    }
+    else
+    {
+      // make file large enough to hold the complete scene
+      m_isValid &= (lseek(m_unix.file, m_mappingSize - 1, SEEK_SET) >= 0);
+      m_isValid &= (write(m_unix.file, "", 1) >= 0);
+      m_isValid &= (lseek(m_unix.file, 0, SEEK_SET) >= 0);
+    }
+    if(m_isValid)
+    {
+      m_mappingPtr = mmap(0, m_mappingSize, mappingType == MAPPING_READONLY ? PROT_READ : (PROT_READ | PROT_WRITE),
+                          MAP_SHARED, m_unix.file, 0);
+      m_isValid = (m_mappingPtr != MAP_FAILED);
+    }
+    if(!m_isValid)
+    {
+      ::close(m_unix.file);
+      m_unix.file = -1;
+    }
+  }
+#endif
+  return m_isValid;
+}
+
+void FileMapping::close()
+{
+  if(m_isValid)
+  {
+#if defined(_WIN32)
+    assert((m_win32.file != INVALID_HANDLE_VALUE) && (m_win32.fileMapping != NULL));
+
+    UnmapViewOfFile(m_mappingPtr);
+    CloseHandle(m_win32.fileMapping);
+
+    if(m_mappingType == MAPPING_READOVERWRITE)
+    {
+      // truncate file to minimum size
+      // To work with 64-bit file pointers, you can declare a LONG, treat it as the upper half
+      // of the 64-bit file pointer, and pass its address in lpDistanceToMoveHigh. This means
+      // you have to treat two different variables as a logical unit, which is error-prone.
+      // The problems can be ameliorated by using the LARGE_INTEGER structure to create a 64-bit
+      // value and passing the two 32-bit values by means of the appropriate elements of the union.
+      // (see msdn documentation on SetFilePointer)
+      LARGE_INTEGER li;
+      li.QuadPart = (__int64)m_fileSize;
+      SetFilePointer(m_win32.file, li.LowPart, &li.HighPart, FILE_BEGIN);
+
+      SetEndOfFile(m_win32.file);
+    }
+    CloseHandle(m_win32.file);
+
+    m_mappingPtr        = nullptr;
+    m_win32.fileMapping = nullptr;
+    m_win32.file        = nullptr;
+
+#elif defined(LINUX)
+    assert(m_unix.file != -1);
+
+    munmap(m_mappingPtr, m_mappingSize);
+    ::close(m_unix.file);
+
+    m_mappingPtr = nullptr;
+    m_unix.file = -1;
+#endif
+
+    m_isValid = false;
+  }
+}
+
+size_t FileMapping::g_pageSize = 0;
+
+}  // namespace nvh
--- a/raytracer/nvpro_core/nvh/filemapping.hpp
+++ b/raytracer/nvpro_core/nvh/filemapping.hpp
@ -0,0 +1,123 @@
+/*
+ * Copyright (c) 2020-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2020-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+/// @DOC_SKIP (keyword to exclude this file from automatic README.md generation)
+
+#pragma once
+
+#include <cstddef>
+#include <utility>
+
+namespace nvh {
+
+class FileMapping
+{
+public:
+  FileMapping(FileMapping&& other) noexcept { this->operator=(std::move(other)); }
+
+  FileMapping& operator=(FileMapping&& other) noexcept
+  {
+    close();  // release any mapping this object still owns before taking over other's handles
+    m_isValid     = other.m_isValid;
+    m_fileSize    = other.m_fileSize;
+    m_mappingType = other.m_mappingType;
+    m_mappingPtr  = other.m_mappingPtr;
+    m_mappingSize = other.m_mappingSize;
+#ifdef _WIN32
+    m_win32.file              = other.m_win32.file;
+    m_win32.fileMapping       = other.m_win32.fileMapping;
+    other.m_win32.file        = nullptr;
+    other.m_win32.fileMapping = nullptr;
+#else
+    m_unix.file       = other.m_unix.file;
+    other.m_unix.file = -1;
+#endif
+    other.m_isValid    = false;
+    other.m_mappingPtr = nullptr;
+
+    return *this;
+  }
+
+  FileMapping(const FileMapping&)                  = delete;
+  FileMapping& operator=(const FileMapping& other) = delete;
+  FileMapping() {}
+
+  ~FileMapping() { close(); }
+
+  enum MappingType
+  {
+    MAPPING_READONLY,       // opens existing file for read-only access
+    MAPPING_READOVERWRITE,  // creates new file with read/write access, overwriting existing files
+  };
+
+  // fileSize only for write access
+  bool open(const char* filename, MappingType mappingType, size_t fileSize = 0);
+  void close();
+
+  const void* data() const { return m_mappingPtr; }
+  void*       data() { return m_mappingPtr; }
+  size_t      size() const { return m_mappingSize; }
+  bool        valid() const { return m_isValid; }
+
+protected:
+  static size_t g_pageSize;
+
+#ifdef _WIN32
+  struct
+  {
+    void* file        = nullptr;
+    void* fileMapping = nullptr;
+  } m_win32;
+#else
+  struct
+  {
+    int file = -1;
+  } m_unix;
+#endif
+
+  bool        m_isValid     = false;
+  size_t      m_fileSize    = 0;
+  MappingType m_mappingType = MappingType::MAPPING_READONLY;
+  void*       m_mappingPtr  = nullptr;
+  size_t      m_mappingSize = 0;
+};
+
+// convenience types
+class FileReadMapping : private FileMapping
+{
+public:
+  bool        open(const char* filename) { return FileMapping::open(filename, MAPPING_READONLY, 0); }
+  void        close() { FileMapping::close(); }
+  const void* data() const { return m_mappingPtr; }
+  size_t      size() const { return m_fileSize; }
+  bool        valid() const { return m_isValid; }
+};
+
+class FileReadOverWriteMapping : private FileMapping
+{
+public:
+  bool open(const char* filename, size_t fileSize)
+  {
+    return FileMapping::open(filename, MAPPING_READOVERWRITE, fileSize);
+  }
+  void   close() { FileMapping::close(); }
+  void*  data() { return m_mappingPtr; }
+  size_t size() const { return m_fileSize; }
+  bool   valid() const { return m_isValid; }
+};
+}  // namespace nvh
--- a/raytracer/nvpro_core/nvh/fileoperations.hpp
+++ b/raytracer/nvpro_core/nvh/fileoperations.hpp
@ -0,0 +1,181 @@
+/*
+ * Copyright (c) 2019-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2019-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#pragma once
+
+#include <algorithm>  // std::max
+#include <fstream>
+#include <sstream>
+#include <vector>
+
+#include "nvprint.hpp"
+
+/** @DOC_START
+ # functions in nvh
+
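+ - nvh::fileExists : checks whether a file exists (i.e. can be opened for reading)
+ - nvh::findFile : returns the first path under which the file can be opened, trying the name as given and then each provided search directory
+ - nvh::loadFile : (multiple overloads) loads a file into a std::string, as binary or text; can also search the provided directories
+ - nvh::getFileName : returns the filename portion of a path (everything after the last separator)
+ - nvh::getFilePath : returns the directory portion of a path, or "." if there is none
+ - nvh::endsWith : checks whether a string (e.g. a filename) ends with a given suffix such as ".png"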
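+
+ Example (a minimal sketch; the file name and the search directories are placeholders,
+ not files that ship with this library):
+
+ ```cpp
+ std::vector<std::string> searchDirs = {".", "shaders"};
+ std::string              foundPath;
+
+ // warn = true logs the searched directories if the file cannot be found
+ std::string source = nvh::loadFile("raytrace.glsl", false, searchDirs, foundPath, true);
+ if(foundPath.empty())
+ {
+   // not found, neither under the given name nor in any search directory
+ }
+ ```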
+ @DOC_END */
+
+namespace nvh {
+
+inline bool fileExists(const char* filename)
+{
+  std::ifstream stream;
+  stream.open(filename);
+  return stream.is_open();
+}
+
+// returns first found filename (searches within directories provided)
+inline std::string findFile(const std::string& infilename, const std::vector<std::string>& directories, bool warn = false)
+{
+  std::ifstream stream;
+
+  {
+    stream.open(infilename.c_str());
+    if(stream.is_open())
+    {
+      // nvprintfLevel(LOGLEVEL_INFO, "Found: %s\n", infilename.c_str());
+      return infilename;
+    }
+  }
+
+  for(const auto& directory : directories)
+  {
+    std::string filename = directory + "/" + infilename;
+    stream.open(filename.c_str());
+    if(stream.is_open())
+    {
+      // nvprintfLevel(LOGLEVEL_INFO, "Found: %s\n", filename.c_str());
+      return filename;
+    }
+  }
+
+  if(warn)
+  {
+    nvprintfLevel(LOGLEVEL_WARNING, "File not found: %s\n", infilename.c_str());
+    nvprintfLevel(LOGLEVEL_WARNING, "In directories: \n");
+    for(const auto& directory : directories)
+    {
+      nvprintfLevel(LOGLEVEL_WARNING, " - %s\n", directory.c_str());
+    }
+    nvprintfLevel(LOGLEVEL_WARNING, "\n");
+  }
+
+  return {};
+}
+
+inline std::string loadFile(const std::string& filename, bool binary)
+{
+  std::string   result;
+  std::ifstream stream(filename, std::ios::ate | (binary ? std::ios::binary : std::ios_base::openmode(0)));
+
+  if(!stream.is_open())
+  {
+    return result;
+  }
+
+  result.reserve(stream.tellg());
+  stream.seekg(0, std::ios::beg);
+
+  result.assign((std::istreambuf_iterator<char>(stream)), std::istreambuf_iterator<char>());
+  return result;
+}
+
+inline std::string loadFile(const char* filename, bool binary)
+{
+  std::string name(filename);
+  return loadFile(name, binary);
+}
+
+inline std::string loadFile(const std::string&              filename,
+                            bool                            binary,
+                            const std::vector<std::string>& directories,
+                            std::string&                    filenameFound,
+                            bool                            warn = false)
+{
+  filenameFound = findFile(filename, directories, warn);
+  if(filenameFound.empty())
+  {
+    return {};
+  }
+  else
+  {
+    return loadFile(filenameFound, binary);
+  }
+}
+
+inline std::string loadFile(const std::string& filename, bool binary, const std::vector<std::string>& directories, bool warn = false)
+{
+  std::string filenameFound;
+  return loadFile(filename, binary, directories, filenameFound, warn);
+}
+
+// returns the filename without its directory path
+inline std::string getFileName(std::string const& fullPath)
+{
+  // Determine the last occurrence of path separator
+  std::size_t lastSeparator = fullPath.find_last_of("/\\");
+  if(lastSeparator == std::string::npos)
+  {
+    // If no separator found, return fullPath as it is (considered as filename)
+    return fullPath;
+  }
+  // Extract the filename from fullPath
+  return fullPath.substr(lastSeparator + 1);
+}
+
+// returns the directory path without the filename ("." if there is no directory part)
+inline std::string getFilePath(const char* filename)
+{
+  std::string path;
+  // find path in filename
+  {
+    std::string filepath(filename);
+
+    size_t pos0 = filepath.rfind('\\');
+    size_t pos1 = filepath.rfind('/');
+
+    pos0 = pos0 == std::string::npos ? 0 : pos0;
+    pos1 = pos1 == std::string::npos ? 0 : pos1;
+
+    path = filepath.substr(0, std::max(pos0, pos1));
+  }
+
+  if(path.empty())
+  {
+    path = ".";
+  }
+
+  return path;
+}
+
+// Returns true if value ends with ending, e.g. ".png"
+inline bool endsWith(std::string const& value, std::string const& ending)
+{
+  if(ending.size() > value.size())
+    return false;
+  return std::equal(ending.rbegin(), ending.rend(), value.rbegin());
+}
+
+}  // namespace nvh
--- a/raytracer/nvpro_core/nvh/geometry.hpp
+++ b/raytracer/nvpro_core/nvh/geometry.hpp
@ -0,0 +1,547 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifndef NV_GEOMETRY_INCLUDED
+#define NV_GEOMETRY_INCLUDED
+
+#include <glm/glm.hpp>
+#include <glm/gtc/constants.hpp>
+#include <glm/gtc/matrix_transform.hpp>
+
+#include <stdint.h>
+
+#include <cmath>
+#include <vector>
+
+namespace nvh {
+
+//////////////////////////////////////////////////////////////////////////
+/** @DOC_START
+    # namespace nvh::geometry
+    The geometry namespace provides a few procedural mesh primitives
+    that are subdivided.
+    
+    nvh::geometry::Mesh template uses the provided TVertex which must have a 
+    constructor from nvh::geometry::Vertex. You can also use nvh::geometry::Vertex
+    directly.
+    
+    It provides triangle indices, as well as outline line indices. The outline indices
+    are typical feature lines (rectangle for plane, some circles for sphere/torus).
+    
+    All basic primitives lie within the [-1, 1] range along the axes they use.
+    
+    - nvh::geometry::Plane (x,y subdivision)
+    - nvh::geometry::Box (x,y,z subdivision, made of 6 planes)
+    - nvh::geometry::Sphere (lat,long subdivision)
+    - nvh::geometry::Torus (inner, outer circle subdivision)
+    - nvh::geometry::RandomMengerSponge (subdivision, tree depth, probability)
+
+    Example:
+
+    ```cpp
+    // single primitive
+    nvh::geometry::Box<nvh::geometry::Vertex> box(4);
+
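+    // construct from primitives
+    nvh::geometry::Mesh<nvh::geometry::Vertex> mesh;
+    nvh::geometry::Box<nvh::geometry::Vertex>::add(mesh, glm::mat4(1), 4, 4, 4);
+    nvh::geometry::Sphere<nvh::geometry::Vertex>::add(mesh, glm::translate(glm::mat4(1), glm::vec3(2, 0, 0)), 16, 8);
+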
+    ```
+@DOC_END  */
+
+
+namespace geometry {
+struct Vertex
+{
+  Vertex(glm::vec3 const& position, glm::vec3 const& normal, glm::vec2 const& texcoord)
+      : position(glm::vec4(position, 1.0f))
+      , normal(glm::vec4(normal, 0.0f))
+      , texcoord(glm::vec4(texcoord, 0.0f, 0.0f))
+  {
+  }
+
+  glm::vec4 position;
+  glm::vec4 normal;
+  glm::vec4 texcoord;
+};
+
+
+// The provided TVertex must have a constructor from Vertex
+
+template <class TVertex = Vertex>
+class Mesh
+{
+public:
+  std::vector<TVertex>    m_vertices;
+  std::vector<glm::uvec3> m_indicesTriangles;
+  std::vector<glm::uvec2> m_indicesOutline;
+
+  void append(Mesh<TVertex>& geo)
+  {
+    m_vertices.reserve(geo.m_vertices.size() + m_vertices.size());
+    m_indicesTriangles.reserve(geo.m_indicesTriangles.size() + m_indicesTriangles.size());
+    m_indicesOutline.reserve(geo.m_indicesOutline.size() + m_indicesOutline.size());
+
+    uint32_t offset = uint32_t(m_vertices.size());
+
+    for(size_t i = 0; i < geo.m_vertices.size(); i++)
+    {
+      m_vertices.push_back(geo.m_vertices[i]);
+    }
+
+    for(size_t i = 0; i < geo.m_indicesTriangles.size(); i++)
+    {
+      m_indicesTriangles.push_back(geo.m_indicesTriangles[i] + glm::uvec3(offset));
+    }
+
+    for(size_t i = 0; i < geo.m_indicesOutline.size(); i++)
+    {
+      m_indicesOutline.push_back(geo.m_indicesOutline[i] + glm::uvec2(offset));
+    }
+  }
+
+  void flipWinding()
+  {
+    for(size_t i = 0; i < m_indicesTriangles.size(); i++)
+    {
+      std::swap(m_indicesTriangles[i].x, m_indicesTriangles[i].z);
+    }
+  }
+
+  size_t getTriangleIndicesSize() const { return m_indicesTriangles.size() * sizeof(glm::uvec3); }
+
+  uint32_t getTriangleIndicesCount() const { return (uint32_t)m_indicesTriangles.size() * 3; }
+
+  size_t getOutlineIndicesSize() const { return m_indicesOutline.size() * sizeof(glm::uvec2); }
+
+  uint32_t getOutlineIndicesCount() const { return (uint32_t)m_indicesOutline.size() * 2; }
+
+  size_t getVerticesSize() const { return m_vertices.size() * sizeof(TVertex); }
+
+  uint32_t getVerticesCount() const { return (uint32_t)m_vertices.size(); }
+};
+
+template <class TVertex = Vertex>
+class Plane : public Mesh<TVertex>
+{
+public:
+  static void add(Mesh<TVertex>& geo, const glm::mat4& mat, int w, int h)
+  {
+    int xdim = w;
+    int ydim = h;
+
+    float xmove = 1.0f / (float)xdim;
+    float ymove = 1.0f / (float)ydim;
+
+    int width = (xdim + 1);
+
+    uint32_t vertOffset = (uint32_t)geo.m_vertices.size();
+
+    int x, y;
+    for(y = 0; y < ydim + 1; y++)
+    {
+      for(x = 0; x < xdim + 1; x++)
+      {
+        float     xpos = ((float)x * xmove);
+        float     ypos = ((float)y * ymove);
+        glm::vec3 pos;
+        glm::vec2 uv;
+        glm::vec3 normal;
+
+        pos[0] = (xpos - 0.5f) * 2.0f;
+        pos[1] = (ypos - 0.5f) * 2.0f;
+        pos[2] = 0;
+
+        uv[0] = xpos;
+        uv[1] = ypos;
+
+        normal[0] = 0.0f;
+        normal[1] = 0.0f;
+        normal[2] = 1.0f;
+
+        Vertex vert   = Vertex(pos, normal, uv);
+        vert.position = mat * vert.position;
+        vert.normal   = mat * vert.normal;
+        geo.m_vertices.push_back(TVertex(vert));
+      }
+    }
+
+    for(y = 0; y < ydim; y++)
+    {
+      for(x = 0; x < xdim; x++)
+      {
+        // upper tris
+        geo.m_indicesTriangles.push_back(glm::uvec3((x) + (y + 1) * width + vertOffset, (x) + (y)*width + vertOffset,
+                                                    (x + 1) + (y + 1) * width + vertOffset));
+        // lower tris
+        geo.m_indicesTriangles.push_back(glm::uvec3((x + 1) + (y + 1) * width + vertOffset,
+                                                    (x) + (y)*width + vertOffset, (x + 1) + (y)*width + vertOffset));
+      }
+    }
+
+    for(y = 0; y < ydim; y++)
+    {
+      geo.m_indicesOutline.push_back(glm::uvec2((y)*width + vertOffset, (y + 1) * width + vertOffset));
+    }
+    for(y = 0; y < ydim; y++)
+    {
+      geo.m_indicesOutline.push_back(glm::uvec2((y)*width + xdim + vertOffset, (y + 1) * width + xdim + vertOffset));
+    }
+    for(x = 0; x < xdim; x++)
+    {
+      geo.m_indicesOutline.push_back(glm::uvec2((x) + vertOffset, (x + 1) + vertOffset));
+    }
+    for(x = 0; x < xdim; x++)
+    {
+      geo.m_indicesOutline.push_back(glm::uvec2((x) + ydim * width + vertOffset, (x + 1) + ydim * width + vertOffset));
+    }
+  }
+
+  Plane(int segments = 1) { add(*this, glm::mat4(1), segments, segments); }
+};
+
+template <class TVertex = Vertex>
+class Box : public Mesh<TVertex>
+{
+public:
+  static void add(Mesh<TVertex>& geo, const glm::mat4& mat, int w, int h, int d)
+  {
+    int configs[6][2] = {
+        {w, h}, {w, h},
+
+        {d, h}, {d, h},
+
+        {w, d}, {w, d},
+    };
+
+    for(int side = 0; side < 6; side++)
+    {
+      glm::mat4 matrixRot(1);
+
+      switch(side)
+      {
+        case 0:
+          break;
+        case 1:
+          matrixRot = glm::rotate(glm::mat4(1), glm::pi<float>(), glm::vec3(0, 1, 0));
+          break;
+        case 2:
+          matrixRot = glm::rotate(glm::mat4(1), glm::pi<float>() * 0.5f, glm::vec3(0, 1, 0));
+          break;
+        case 3:
+          matrixRot = glm::rotate(glm::mat4(1), glm::pi<float>() * 1.5f, glm::vec3(0, 1, 0));
+          break;
+        case 4:
+          matrixRot = glm::rotate(glm::mat4(1), glm::pi<float>() * 0.5f, glm::vec3(1, 0, 0));
+          break;
+        case 5:
+          matrixRot = glm::rotate(glm::mat4(1), glm::pi<float>() * 1.5f, glm::vec3(1, 0, 0));
+          break;
+      }
+
+      glm::mat4 matrixMove = glm::translate(glm::mat4(1.f), {0.0f, 0.0f, 1.0f});
+
+      Plane<TVertex>::add(geo, mat * matrixRot * matrixMove, configs[side][0], configs[side][1]);
+    }
+  }
+
+  Box(int segments = 1) { add(*this, glm::mat4(1), segments, segments, segments); }
+};
+
+template <class TVertex = Vertex>
+class Sphere : public Mesh<TVertex>
+{
+public:
+  static void add(Mesh<TVertex>& geo, const glm::mat4& mat, int w, int h)
+  {
+    int xydim = w;
+    int zdim  = h;
+
+    uint32_t vertOffset = (uint32_t)geo.m_vertices.size();
+
+    float xyshift = 1.0f / (float)xydim;
+    float zshift  = 1.0f / (float)zdim;
+    int   width   = xydim + 1;
+
+
+    int index = 0;
+    int xy, z;
+    for(z = 0; z < zdim + 1; z++)
+    {
+      for(xy = 0; xy < xydim + 1; xy++)
+      {
+        glm::vec3 pos;
+        glm::vec3 normal;
+        glm::vec2 uv;
+        float     curxy   = xyshift * (float)xy;
+        float     curz    = zshift * (float)z;
+        float     anglexy = curxy * glm::pi<float>() * 2.0f;
+        float     anglez  = (1.0f - curz) * glm::pi<float>();
+        pos[0]            = cosf(anglexy) * sinf(anglez);
+        pos[1]            = sinf(anglexy) * sinf(anglez);
+        pos[2]            = cosf(anglez);
+        normal            = pos;
+        uv[0]             = curxy;
+        uv[1]             = curz;
+
+        Vertex vert   = Vertex(pos, normal, uv);
+        vert.position = mat * vert.position;
+        vert.normal   = mat * vert.normal;
+
+        geo.m_vertices.push_back(TVertex(vert));
+      }
+    }
+
+    int vertex = 0;
+    for(z = 0; z < zdim; z++)
+    {
+      for(xy = 0; xy < xydim; xy++, vertex++)
+      {
+        glm::uvec3 indices;
+        if(z != zdim - 1)
+        {
+          indices[2] = vertex + vertOffset;
+          indices[1] = vertex + width + vertOffset;
+          indices[0] = vertex + width + 1 + vertOffset;
+          geo.m_indicesTriangles.push_back(indices);
+        }
+
+        if(z != 0)
+        {
+          indices[2] = vertex + width + 1 + vertOffset;
+          indices[1] = vertex + 1 + vertOffset;
+          indices[0] = vertex + vertOffset;
+          geo.m_indicesTriangles.push_back(indices);
+        }
+      }
+      vertex++;
+    }
+
+    int middlez = zdim / 2;
+
+    for(xy = 0; xy < xydim; xy++)
+    {
+      glm::uvec2 indices;
+      indices[0] = middlez * width + xy + vertOffset;
+      indices[1] = middlez * width + xy + 1 + vertOffset;
+      geo.m_indicesOutline.push_back(indices);
+    }
+
+    for(int i = 0; i < 4; i++)
+    {
+      int x = (xydim * i) / 4;
+      for(z = 0; z < zdim; z++)
+      {
+        glm::uvec2 indices;
+        indices[0] = x + width * (z) + vertOffset;
+        indices[1] = x + width * (z + 1) + vertOffset;
+        geo.m_indicesOutline.push_back(indices);
+      }
+    }
+  }
+
+  Sphere(int w = 16, int h = 8) { add(*this, glm::mat4(1), w, h); }
+};
+
+template <class TVertex = Vertex>
+class Torus : public Mesh<TVertex>
+{
+public:
+  static void add(Mesh<TVertex>& geo, const glm::mat4& mat, int w, int h)
+  {
+    // Ring (major) radius and tube (minor) radius, despite the variable names
+    float innerRadius = 0.8f;
+    float outerRadius = 0.2f;
+
+    unsigned int numVertices = (w + 1) * (h + 1);
+
+    float wf = (float)w;
+    float hf = (float)h;
+
+    float phi_step   = 2.0f * glm::pi<float>() / wf;
+    float theta_step = 2.0f * glm::pi<float>() / hf;
+
+    // Setup vertices and normals
+    // Generate the Torus exactly like the sphere with rings around the origin along the latitudes.
+    for(unsigned int latitude = 0; latitude <= (unsigned int)h; latitude++)  // theta angle (h segments)
+    {
+      float theta    = (float)latitude * theta_step;
+      float sinTheta = sinf(theta);
+      float cosTheta = cosf(theta);
+
+      float radius = innerRadius + outerRadius * cosTheta;
+
+      for(unsigned int longitude = 0; longitude <= (unsigned int)w; longitude++)  // phi angle (w segments)
+      {
+        float phi    = (float)longitude * phi_step;
+        float sinPhi = sinf(phi);
+        float cosPhi = cosf(phi);
+
+        glm::vec3 position = glm::vec3(radius * cosPhi, outerRadius * sinTheta, radius * -sinPhi);
+        glm::vec3 normal   = glm::vec3(cosPhi * cosTheta, sinTheta, -sinPhi * cosTheta);
+        glm::vec2 uv       = glm::vec2((float)longitude / wf, (float)latitude / hf);
+
+        Vertex vertex(position, normal, uv);
+        geo.m_vertices.push_back(TVertex(vertex));
+      }
+    }
+
+    const unsigned int columns = w + 1;
+
+    // Setup indices
+    for(unsigned int latitude = 0; latitude < (unsigned int)h; latitude++)
+    {
+      for(unsigned int longitude = 0; longitude < (unsigned int)w; longitude++)
+      {
+        // Indices for triangles
+        glm::uvec3 triangle1(latitude * columns + longitude, latitude * columns + longitude + 1, (latitude + 1) * columns + longitude);
+        glm::uvec3 triangle2((latitude + 1) * columns + longitude, latitude * columns + longitude + 1,
+                             (latitude + 1) * columns + longitude + 1);
+
+        geo.m_indicesTriangles.push_back(triangle1);
+        geo.m_indicesTriangles.push_back(triangle2);
+      }
+    }
+
+    // Setup outline indices
+    // Outline for outer ring
+    for(unsigned int longitude = 0; longitude < (unsigned int)w; longitude++)
+    {
+      for(unsigned int y = 0; y < 4; y++)
+      {
+        unsigned int latitude = y * (0.25 * h);
+        glm::uvec2   line(latitude * columns + longitude, latitude * columns + longitude + 1);
+        geo.m_indicesOutline.push_back(line);
+      }
+    }
+    // Outline for inner rings
+    for(unsigned int x = 0; x < 4; x++)
+    {
+      for(unsigned int latitude = 0; latitude < (unsigned int)h; latitude++)
+      {
+        unsigned int longitude = x * (0.25 * w);
+        glm::uvec2   line(latitude * columns + longitude, (latitude + 1) * columns + longitude);
+        geo.m_indicesOutline.push_back(line);
+      }
+    }
+  }
+
+  Torus(int w = 16, int h = 16) { add(*this, glm::mat4(1), w, h); }
+};
+
+template <class TVertex = Vertex>
+class RandomMengerSponge : public Mesh<TVertex>
+{
+public:
+  static void add(Mesh<TVertex>& geo, const glm::mat4& mat, int w, int h, int d, int level = 3, float probability = -1.f)
+  {
+    struct Cube
+    {
+      glm::vec3 m_topLeftFront;
+      float     m_size;
+
+      void split(std::vector<Cube>& cubes)
+      {
+        float     size         = m_size / 3.f;
+        glm::vec3 topLeftFront = m_topLeftFront;
+        for(int x = 0; x < 3; x++)
+        {
+          topLeftFront[0] = m_topLeftFront[0] + static_cast<float>(x) * size;
+          for(int y = 0; y < 3; y++)
+          {
+            if(x == 1 && y == 1)
+              continue;
+            topLeftFront[1] = m_topLeftFront[1] + static_cast<float>(y) * size;
+            for(int z = 0; z < 3; z++)
+            {
+              if(x == 1 && z == 1)
+                continue;
+              if(y == 1 && z == 1)
+                continue;
+
+              topLeftFront[2] = m_topLeftFront[2] + static_cast<float>(z) * size;
+              cubes.push_back({topLeftFront, size});
+            }
+          }
+        }
+      }
+
+      void splitProb(std::vector<Cube>& cubes, float prob)
+      {
+
+        float     size         = m_size / 3.f;
+        glm::vec3 topLeftFront = m_topLeftFront;
+        for(int x = 0; x < 3; x++)
+        {
+          topLeftFront[0] = m_topLeftFront[0] + static_cast<float>(x) * size;
+          for(int y = 0; y < 3; y++)
+          {
+            topLeftFront[1] = m_topLeftFront[1] + static_cast<float>(y) * size;
+            for(int z = 0; z < 3; z++)
+            {
+              float sample = rand() / static_cast<float>(RAND_MAX);
+              if(sample > prob)
+                continue;
+              topLeftFront[2] = m_topLeftFront[2] + static_cast<float>(z) * size;
+              cubes.push_back({topLeftFront, size});
+            }
+          }
+        }
+      }
+    };
+
+    Cube cube = {glm::vec3(-0.25, -0.25, -0.25), 0.5f};
+    //Cube cube = { glm::vec3(-25, -25, -25), 50.f };
+    //Cube cube = { glm::vec3(-40, -40, -40), 10.f };
+
+    std::vector<Cube> cubes1 = {cube};
+    std::vector<Cube> cubes2 = {};
+
+    auto previous = &cubes1;
+    auto next     = &cubes2;
+
+    for(int i = 0; i < level; i++)
+    {
+      for(Cube& c : *previous)
+      {
+        if(probability < 0.f)
+          c.split(*next);
+        else
+          c.splitProb(*next, probability);
+      }
+      auto temp = previous;
+      previous  = next;
+      next      = temp;
+      next->clear();
+    }
+    for(Cube& c : *previous)
+    {
+      glm::mat4 matrixMove  = glm::translate(glm::mat4(1.f), c.m_topLeftFront);
+      glm::mat4 matrixScale = glm::scale(glm::mat4(1.f), glm::vec3(c.m_size));
+      Box<TVertex>::add(geo, matrixMove * matrixScale, 1, 1, 1);
+    }
+  }
+};
+
+}  // namespace geometry
+}  // namespace nvh
+
+
+#endif
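Review note on `RandomMengerSponge::add` above: when `probability < 0`, `Cube::split` skips every sub-cube that has at least two of its three grid coordinates equal to 1. That leaves 20 of the 27 sub-cubes per split, so the default `level = 3` should produce 20^3 = 8000 boxes. The standalone sketch below reproduces only that counting rule. `keepsSubCube` and `mengerCubeCount` are hypothetical names introduced for illustration and are not part of nvh.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: mirrors the three `continue` tests in Cube::split.
// A sub-cube is dropped when at least two of its coordinates are the middle one.
inline bool keepsSubCube(int x, int y, int z)
{
  const int middleAxes = (x == 1) + (y == 1) + (z == 1);
  return middleAxes < 2;
}

// Hypothetical helper: number of cubes after `level` deterministic splits.
inline std::size_t mengerCubeCount(int level)
{
  assert(level >= 0);

  // Count the sub-cubes that survive one split of the 3x3x3 grid.
  std::size_t perSplit = 0;
  for(int x = 0; x < 3; x++)
    for(int y = 0; y < 3; y++)
      for(int z = 0; z < 3; z++)
        if(keepsSubCube(x, y, z))
          perSplit++;

  // Every surviving cube is split again at the next level.
  std::size_t count = 1;
  for(int i = 0; i < level; i++)
    count *= perSplit;
  return count;
}
```

Because the count grows by a factor of 20 per level, each extra level multiplies the number of `Box<TVertex>::add` calls accordingly, which is worth keeping in mind before raising `level`.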
--- a/raytracer/nvpro_core/nvh/gltfscene.cpp
+++ b/raytracer/nvpro_core/nvh/gltfscene.cpp
--- a/raytracer/nvpro_core/nvh/gltfscene.hpp
+++ b/raytracer/nvpro_core/nvh/gltfscene.hpp
@ -0,0 +1,711 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+/** @DOC_START
+# `nvh::GltfScene`
+
+  These utilities are for loading glTF models in a
+  canonical scene representation. From this representation
+  you would create the appropriate 3D API resources (buffers
+  and textures).
+ 
+  ```cpp
+  // Typical Usage
+  // Load the GLTF Scene using TinyGLTF
+ 
+  tinygltf::Model    gltfModel;
+  tinygltf::TinyGLTF gltfContext;
+  std::string        warn, error;
+  bool fileLoaded = gltfContext.LoadASCIIFromFile(&gltfModel, &error, &warn, m_filename);
+ 
+  // Fill the data in the gltfScene
+  nvh::GltfScene gltfScene;
+  gltfScene.importMaterials(gltfModel);
+  gltfScene.importDrawableNodes(gltfModel, nvh::GltfAttributes::Normal | nvh::GltfAttributes::Texcoord_0);
+
+  // Todo in App:
+  //   create buffers for vertices and indices, from gltfScene.m_positions, gltfScene.m_indices
+  //   create textures from images: using tinygltf directly
+  //   create descriptorSet for material using directly gltfScene.m_materials
+  ```
+
+@DOC_END */
+
+#pragma once
+#include <glm/glm.hpp>
+#include "tiny_gltf.h"
+#include <algorithm>
+#include <cassert>
+#include <cstdint>
+#include <functional>
+#include <limits>
+#include <map>
+#include <set>
+#include <string>
+#include <string.h>
+#include <type_traits>
+#include <unordered_map>
+#include <vector>
+
+#define KHR_LIGHTS_PUNCTUAL_EXTENSION_NAME "KHR_lights_punctual"
+
+namespace nvh {
+
+// https://github.com/KhronosGroup/glTF/blob/main/extensions/2.0/Khronos/KHR_materials_specular/README.md
+#define KHR_MATERIALS_SPECULAR_EXTENSION_NAME "KHR_materials_specular"
+struct KHR_materials_specular
+{
+  float     specularFactor{1.f};
+  int       specularTexture{-1};
+  glm::vec3 specularColorFactor{1.f, 1.f, 1.f};
+  int       specularColorTexture{-1};
+};
+
+// https://github.com/KhronosGroup/glTF/tree/master/extensions/2.0/Khronos/KHR_texture_transform
+#define KHR_TEXTURE_TRANSFORM_EXTENSION_NAME "KHR_texture_transform"
+struct KHR_texture_transform
+{
+  glm::vec2 offset{0.f, 0.f};
+  float     rotation{0.f};
+  glm::vec2 scale{1.f};
+  int       texCoord{0};
+  glm::mat3 uvTransform{1};  // Computed transform of offset, rotation, scale
+};
+
+
+// https://github.com/KhronosGroup/glTF/blob/master/extensions/2.0/Khronos/KHR_materials_clearcoat/README.md
+#define KHR_MATERIALS_CLEARCOAT_EXTENSION_NAME "KHR_materials_clearcoat"
+struct KHR_materials_clearcoat
+{
+  float factor{0.f};
+  int   texture{-1};
+  float roughnessFactor{0.f};
+  int   roughnessTexture{-1};
+  int   normalTexture{-1};
+};
+
+// https://github.com/KhronosGroup/glTF/blob/master/extensions/2.0/Khronos/KHR_materials_sheen/README.md
+#define KHR_MATERIALS_SHEEN_EXTENSION_NAME "KHR_materials_sheen"
+struct KHR_materials_sheen
+{
+  glm::vec3 colorFactor{0.f, 0.f, 0.f};
+  int       colorTexture{-1};
+  float     roughnessFactor{0.f};
+  int       roughnessTexture{-1};
+};
+
+// https://github.com/DassaultSystemes-Technology/glTF/tree/KHR_materials_volume/extensions/2.0/Khronos/KHR_materials_transmission
+#define KHR_MATERIALS_TRANSMISSION_EXTENSION_NAME "KHR_materials_transmission"
+struct KHR_materials_transmission
+{
+  float factor{0.f};
+  int   texture{-1};
+};
+
+// https://github.com/KhronosGroup/glTF/tree/master/extensions/2.0/Khronos/KHR_materials_unlit
+#define KHR_MATERIALS_UNLIT_EXTENSION_NAME "KHR_materials_unlit"
+struct KHR_materials_unlit
+{
+  int active{0};
+};
+
+// PBR Next : KHR_materials_anisotropy
+#define KHR_MATERIALS_ANISOTROPY_EXTENSION_NAME "KHR_materials_anisotropy"
+struct KHR_materials_anisotropy
+{
+  float     factor{0.f};
+  glm::vec3 direction{1.f, 0.f, 0.f};
+  int       texture{-1};
+};
+
+
+// https://github.com/DassaultSystemes-Technology/glTF/tree/KHR_materials_ior/extensions/2.0/Khronos/KHR_materials_ior
+#define KHR_MATERIALS_IOR_EXTENSION_NAME "KHR_materials_ior"
+struct KHR_materials_ior
+{
+  float ior{1.5f};
+};
+
+// https://github.com/DassaultSystemes-Technology/glTF/tree/KHR_materials_volume/extensions/2.0/Khronos/KHR_materials_volume
+#define KHR_MATERIALS_VOLUME_EXTENSION_NAME "KHR_materials_volume"
+struct KHR_materials_volume
+{
+  float     thicknessFactor{0};
+  int       thicknessTexture{-1};
+  float     attenuationDistance{std::numeric_limits<float>::max()};
+  glm::vec3 attenuationColor{1.f, 1.f, 1.f};
+};
+
+
+// https://github.com/KhronosGroup/glTF/blob/main/extensions/2.0/Khronos/KHR_texture_basisu/README.md
+#define KHR_TEXTURE_BASISU_NAME "KHR_texture_basisu"
+struct KHR_texture_basisu
+{
+  int source{-1};
+};
+
+// https://github.com/KhronosGroup/glTF/issues/948
+#define KHR_MATERIALS_DISPLACEMENT_NAME "KHR_materials_displacement"
+struct KHR_materials_displacement
+{
+  float displacementGeometryFactor{1.0f};
+  float displacementGeometryOffset{0.0f};
+  int   displacementGeometryTexture{-1};
+};
+
+
+// https://github.com/KhronosGroup/glTF/blob/main/extensions/2.0/Khronos/KHR_materials_emissive_strength/README.md
+#define KHR_MATERIALS_EMISSIVE_STRENGTH_NAME "KHR_materials_emissive_strength"
+struct KHR_materials_emissive_strength
+{
+  float emissiveStrength{1.0};
+};
+
+// https://github.com/KhronosGroup/glTF/blob/master/specification/2.0/README.md#reference-material
+struct GltfMaterial
+{
+  // pbrMetallicRoughness
+  glm::vec4 baseColorFactor{1.f, 1.f, 1.f, 1.f};
+  int       baseColorTexture{-1};
+  float     metallicFactor{1.f};
+  float     roughnessFactor{1.f};
+  int       metallicRoughnessTexture{-1};
+
+  int       emissiveTexture{-1};
+  glm::vec3 emissiveFactor{0, 0, 0};
+  int       alphaMode{0};
+  float     alphaCutoff{0.5f};
+  int       doubleSided{0};
+
+  int   normalTexture{-1};
+  float normalTextureScale{1.f};
+  int   occlusionTexture{-1};
+  float occlusionTextureStrength{1};
+
+  // Extensions
+  KHR_materials_specular              specular;
+  KHR_texture_transform               textureTransform;
+  KHR_materials_clearcoat             clearcoat;
+  KHR_materials_sheen                 sheen;
+  KHR_materials_transmission          transmission;
+  KHR_materials_unlit                 unlit;
+  KHR_materials_anisotropy            anisotropy;
+  KHR_materials_ior                   ior;
+  KHR_materials_volume                volume;
+  KHR_materials_displacement          displacement;
+  KHR_materials_emissive_strength     emissiveStrength;
+
+  // Tiny Reference
+  const tinygltf::Material* tmaterial{nullptr};
+};
+
+
+struct GltfNode
+{
+  glm::mat4             worldMatrix{1};
+  int                   primMesh{0};
+  const tinygltf::Node* tnode{nullptr};
+};
+
+struct GltfPrimMesh
+{
+  uint32_t firstIndex{0};
+  uint32_t indexCount{0};
+  uint32_t vertexOffset{0};
+  uint32_t vertexCount{0};
+  int      materialIndex{0};
+
+  glm::vec3   posMin{0, 0, 0};
+  glm::vec3   posMax{0, 0, 0};
+  std::string name;
+
+  // Tiny Reference
+  const tinygltf::Mesh*      tmesh{nullptr};
+  const tinygltf::Primitive* tprim{nullptr};
+};
+
+struct GltfStats
+{
+  uint32_t nbCameras{0};
+  uint32_t nbImages{0};
+  uint32_t nbTextures{0};
+  uint32_t nbMaterials{0};
+  uint32_t nbSamplers{0};
+  uint32_t nbNodes{0};
+  uint32_t nbMeshes{0};
+  uint32_t nbLights{0};
+  uint32_t imageMem{0};
+  uint32_t nbUniqueTriangles{0};
+  uint32_t nbTriangles{0};
+};
+
+struct GltfCamera
+{
+  glm::mat4 worldMatrix{1};
+  glm::vec3 eye{0, 0, 0};
+  glm::vec3 center{0, 0, 0};
+  glm::vec3 up{0, 1, 0};
+
+  tinygltf::Camera cam;
+};
+
+// See: https://github.com/KhronosGroup/glTF/blob/master/extensions/2.0/Khronos/KHR_lights_punctual/README.md
+struct GltfLight
+{
+  glm::mat4       worldMatrix{1};
+  tinygltf::Light light;
+};
+
+
+enum class GltfAttributes : uint8_t
+{
+  NoAttribs  = 0,
+  Position   = 1,
+  Normal     = 2,
+  Texcoord_0 = 4,
+  Texcoord_1 = 8,
+  Tangent    = 16,
+  Color_0    = 32,
+  All        = 0xFF
+};
+
+using GltfAttributes_t = std::underlying_type_t<GltfAttributes>;
+
+inline GltfAttributes operator|(GltfAttributes lhs, GltfAttributes rhs)
+{
+  return static_cast<GltfAttributes>(static_cast<GltfAttributes_t>(lhs) | static_cast<GltfAttributes_t>(rhs));
+}
+
+inline GltfAttributes operator&(GltfAttributes lhs, GltfAttributes rhs)
+{
+  return static_cast<GltfAttributes>(static_cast<GltfAttributes_t>(lhs) & static_cast<GltfAttributes_t>(rhs));
+}
+
+//--------------------------------------------------------------------------------------------------
+// Converts a glTF scene (tinygltf::Model) into a simple drawable format
+//
+struct GltfScene
+{
+  // Imports all materials into a vector of GltfMaterial
+  void importMaterials(const tinygltf::Model& tmodel);
+
+  // Imports all meshes and primitives into a vector of GltfPrimMesh.
+  // - Reads all attributes in `requestedAttributes`; a requested attribute that is missing
+  //   from a primitive is generated if it is also set in `forceRequested`.
+  // - Creates the vectors of GltfNode, GltfLight and GltfCamera.
+  void importDrawableNodes(const tinygltf::Model& tmodel,
+                           GltfAttributes         requestedAttributes,
+                           GltfAttributes         forceRequested = GltfAttributes::All);
+
+  void exportDrawableNodes(tinygltf::Model& tmodel, GltfAttributes requestedAttributes);
+
+  // Compute the scene bounding box
+  void computeSceneDimensions();
+
+  // Removes everything
+  void destroy();
+
+  static GltfStats getStatistics(const tinygltf::Model& tinyModel);
+
+  // Scene data
+  std::vector<GltfMaterial> m_materials;   // Material for shading
+  std::vector<GltfNode>     m_nodes;       // Drawable nodes, flat hierarchy
+  std::vector<GltfPrimMesh> m_primMeshes;  // Primitive promoted to meshes
+  std::vector<GltfCamera>   m_cameras;
+  std::vector<GltfLight>    m_lights;
+
+  // Attributes, all same length if valid
+  std::vector<glm::vec3> m_positions;
+  std::vector<uint32_t>  m_indices;
+  std::vector<glm::vec3> m_normals;
+  std::vector<glm::vec4> m_tangents;
+  std::vector<glm::vec2> m_texcoords0;
+  std::vector<glm::vec2> m_texcoords1;
+  std::vector<glm::vec4> m_colors0;
+
+  // #TODO - Adding support for Skinning
+  //using vec4us = vector4<unsigned short>;
+  //std::vector<vec4us>        m_joints0;
+  //std::vector<glm::vec4> m_weights0;
+
+  // Size of the scene
+  struct Dimensions
+  {
+    glm::vec3 min = glm::vec3(std::numeric_limits<float>::max());
+    glm::vec3 max = glm::vec3(std::numeric_limits<float>::lowest());  // lowest(), not min(): min() is the smallest positive float
+    glm::vec3 size{0.f};
+    glm::vec3 center{0.f};
+    float     radius{0};
+  } m_dimensions;
+
+
+private:
+  void processNode(const tinygltf::Model& tmodel, int& nodeIdx, const glm::mat4& parentMatrix);
+  void processMesh(const tinygltf::Model&     tmodel,
+                   const tinygltf::Primitive& tmesh,
+                   GltfAttributes             requestedAttributes,
+                   GltfAttributes             forceRequested,
+                   const std::string&         name);
+
+  void createNormals(GltfPrimMesh& resultMesh);
+  void createTexcoords(GltfPrimMesh& resultMesh);
+  void createTangents(GltfPrimMesh& resultMesh);
+  void createColors(GltfPrimMesh& resultMesh);
+
+  // Temporary data
+  std::unordered_map<int, std::vector<uint32_t>> m_meshToPrimMeshes;
+  std::vector<uint32_t>                          primitiveIndices32u;
+  std::vector<uint16_t>                          primitiveIndices16u;
+  std::vector<uint8_t>                           primitiveIndices8u;
+
+  std::unordered_map<std::string, GltfPrimMesh> m_cachePrimMesh;
+
+  void computeCamera();
+  void checkRequiredExtensions(const tinygltf::Model& tmodel);
+  void findUsedMeshes(const tinygltf::Model& tmodel, std::set<uint32_t>& usedMeshes, int nodeIdx);
+};
+
+glm::mat4 getLocalMatrix(const tinygltf::Node& tnode);
+
+// Return a vector of data for a tinygltf::Value
+template <typename T>
+static inline std::vector<T> getVector(const tinygltf::Value& value)
+{
+  std::vector<T> result{0};
+  if(!value.IsArray())
+    return result;
+  result.resize(value.ArrayLen());
+  for(int i = 0; i < value.ArrayLen(); i++)
+  {
+    result[i] = static_cast<T>(value.Get(i).IsNumber() ? value.Get(i).Get<double>() : value.Get(i).Get<int>());
+  }
+  return result;
+}
+
+static inline void getFloat(const tinygltf::Value& value, const std::string& name, float& val)
+{
+  if(value.Has(name))
+  {
+    val = static_cast<float>(value.Get(name).Get<double>());
+  }
+}
+
+static inline void getInt(const tinygltf::Value& value, const std::string& name, int& val)
+{
+  if(value.Has(name))
+  {
+    val = value.Get(name).Get<int>();
+  }
+}
+
+static inline void getVec2(const tinygltf::Value& value, const std::string& name, glm::vec2& val)
+{
+  if(value.Has(name))
+  {
+    auto s = getVector<float>(value.Get(name));
+    val    = glm::vec2{s[0], s[1]};
+  }
+}
+
+static inline void getVec3(const tinygltf::Value& value, const std::string& name, glm::vec3& val)
+{
+  if(value.Has(name))
+  {
+    auto s = getVector<float>(value.Get(name));
+    val    = glm::vec3{s[0], s[1], s[2]};
+  }
+}
+
+static inline void getVec4(const tinygltf::Value& value, const std::string& name, glm::vec4& val)
+{
+  if(value.Has(name))
+  {
+    auto s = getVector<float>(value.Get(name));
+    val    = glm::vec4{s[0], s[1], s[2], s[3]};
+  }
+}
+
+static inline void getTexId(const tinygltf::Value& value, const std::string& name, int& val)
+{
+  if(value.Has(name))
+  {
+    val = value.Get(name).Get("index").Get<int>();
+  }
+}
+
+// Calls a function (such as a lambda function) for each (index, value) pair in
+// a sparse accessor. It's only potentially called for indices from
+// accessorFirstElement through accessorFirstElement + numElementsToProcess - 1.
+template <class T>
+void forEachSparseValue(const tinygltf::Model&                            tmodel,
+                        const tinygltf::Accessor&                         accessor,
+                        size_t                                            accessorFirstElement,
+                        size_t                                            numElementsToProcess,
+                        std::function<void(size_t index, const T* value)> fn)
+{
+  if(!accessor.sparse.isSparse)
+  {
+    return;  // Nothing to do
+  }
+
+  const auto& idxs = accessor.sparse.indices;
+  if(!(idxs.componentType == TINYGLTF_COMPONENT_TYPE_UNSIGNED_BYTE      //
+       || idxs.componentType == TINYGLTF_COMPONENT_TYPE_UNSIGNED_SHORT  //
+       || idxs.componentType == TINYGLTF_COMPONENT_TYPE_UNSIGNED_INT))
+  {
+    assert(!"Unsupported sparse accessor index type.");
+    return;
+  }
+
+  const tinygltf::BufferView& idxBufferView = tmodel.bufferViews[idxs.bufferView];
+  const unsigned char*        idxBuffer     = &tmodel.buffers[idxBufferView.buffer].data[idxBufferView.byteOffset];
+  const size_t                idxBufferByteStride =
+      idxBufferView.byteStride ? idxBufferView.byteStride : tinygltf::GetComponentSizeInBytes(idxs.componentType);
+  if(idxBufferByteStride == size_t(-1))
+    return;  // Invalid
+
+  const auto&                 vals          = accessor.sparse.values;
+  const tinygltf::BufferView& valBufferView = tmodel.bufferViews[vals.bufferView];
+  const unsigned char*        valBuffer     = &tmodel.buffers[valBufferView.buffer].data[valBufferView.byteOffset];
+  const size_t                valBufferByteStride = accessor.ByteStride(valBufferView);
+  if(valBufferByteStride == size_t(-1))
+    return;  // Invalid
+
+  // Note that this could be faster for lots of small copies, since we could
+  // binary search for the first sparse accessor index to use (since the
+  // glTF specification requires the indices be sorted)!
+  for(size_t pairIdx = 0; pairIdx < accessor.sparse.count; pairIdx++)
+  {
+    // Read the index from the index buffer, converting its type
+    size_t               index = 0;
+    const unsigned char* pIdx  = idxBuffer + idxBufferByteStride * pairIdx;
+    switch(idxs.componentType)
+    {
+      case TINYGLTF_COMPONENT_TYPE_UNSIGNED_BYTE:
+        index = *reinterpret_cast<const uint8_t*>(pIdx);
+        break;
+      case TINYGLTF_COMPONENT_TYPE_UNSIGNED_SHORT:
+        index = *reinterpret_cast<const uint16_t*>(pIdx);
+        break;
+      case TINYGLTF_COMPONENT_TYPE_UNSIGNED_INT:
+        index = *reinterpret_cast<const uint32_t*>(pIdx);
+        break;
+    }
+
+    // If it's not in range, skip it
+    if(index < accessorFirstElement || (index - accessorFirstElement) >= numElementsToProcess)
+    {
+      continue;
+    }
+
+    fn(index, reinterpret_cast<const T*>(valBuffer + valBufferByteStride * pairIdx));
+  }
+}
+
+// Copies accessor elements accessorFirstElement through
+// accessorFirstElement + numElementsToCopy - 1 to outData elements
+// outFirstElement through outFirstElement + numElementsToCopy - 1.
+// This handles sparse accessors correctly! It's intended as a replacement for
+// what would be memcpy(..., &buffer.data[...], ...) calls.
+//
+// However, it performs no conversion: it assumes (but does not check) that
+// accessor's elements are of type T. For instance, T should be a struct of two
+// floats for a VEC2 float accessor.
+//
+// This is range-checked, so elements that would be out-of-bounds are not
+// copied. We assume size_t overflow does not occur.
+// Note that outDataSizeInElements is the number of elements in the outData
+// buffer, while numElementsToCopy is the number of elements to copy, not the
+// number of elements in accessor.
+template <class T>
+void copyAccessorData(T*                        outData,
+                      size_t                    outDataSizeInElements,
+                      size_t                    outFirstElement,
+                      const tinygltf::Model&    tmodel,
+                      const tinygltf::Accessor& accessor,
+                      size_t                    accessorFirstElement,
+                      size_t                    numElementsToCopy)
+{
+  if(outFirstElement >= outDataSizeInElements)
+  {
+    assert(!"Invalid outFirstElement!");
+    return;
+  }
+
+  if(accessorFirstElement >= accessor.count)
+  {
+    assert(!"Invalid accessorFirstElement!");
+    return;
+  }
+
+  const tinygltf::BufferView& bufferView = tmodel.bufferViews[accessor.bufferView];
+  const unsigned char* buffer = &tmodel.buffers[bufferView.buffer].data[accessor.byteOffset + bufferView.byteOffset];
+
+  const size_t maxSafeCopySize = std::min(accessor.count - accessorFirstElement, outDataSizeInElements - outFirstElement);
+  numElementsToCopy = std::min(numElementsToCopy, maxSafeCopySize);
+
+  if(bufferView.byteStride == 0)
+  {
+    memcpy(outData + outFirstElement, reinterpret_cast<const T*>(buffer) + accessorFirstElement, numElementsToCopy * sizeof(T));
+  }
+  else
+  {
+    // Must copy one-by-one
+    for(size_t i = 0; i < numElementsToCopy; i++)
+    {
+      outData[outFirstElement + i] = *reinterpret_cast<const T*>(buffer + bufferView.byteStride * (accessorFirstElement + i));
+    }
+  }
+
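+  // Handle sparse accessors by overwriting already copied elements.
+  // Sparse indices are relative to the accessor, so remap them into the output range.
+  forEachSparseValue<T>(tmodel, accessor, accessorFirstElement, numElementsToCopy,
+                        [&](size_t index, const T* value) {
+                          outData[outFirstElement + (index - accessorFirstElement)] = *value;
+                        });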
+}
+
+// Same as copyAccessorData(T*, ...), but taking a vector.
+template <class T>
+void copyAccessorData(std::vector<T>&           outData,
+                      size_t                    outFirstElement,
+                      const tinygltf::Model&    tmodel,
+                      const tinygltf::Accessor& accessor,
+                      size_t                    accessorFirstElement,
+                      size_t                    numElementsToCopy)
+{
+  copyAccessorData<T>(outData.data(), outData.size(), outFirstElement, tmodel, accessor, accessorFirstElement, numElementsToCopy);
+}
+
+// Appends all the values of \p accessor to \p attribVec.
+// Returns false if the accessor is invalid.
+// T must be glm::vec2, glm::vec3, or glm::vec4.
+template <typename T>
+static bool getAccessorData(const tinygltf::Model& tmodel, const tinygltf::Accessor& accessor, std::vector<T>& attribVec)
+{
+  // Retrieving the data of the accessor
+  const auto nbElems = accessor.count;
+
+  const size_t oldNumElements = attribVec.size();
+  attribVec.resize(oldNumElements + nbElems);
+
+  // Copying the attributes
+  if(accessor.componentType == TINYGLTF_COMPONENT_TYPE_FLOAT)
+  {
+    copyAccessorData<T>(attribVec, oldNumElements, tmodel, accessor, 0, accessor.count);
+  }
+  else
+  {
+    // The component type is smaller than float and needs to be converted
+    const auto&          bufView    = tmodel.bufferViews[accessor.bufferView];
+    const auto&          buffer     = tmodel.buffers[bufView.buffer];
+    const unsigned char* bufferByte = &buffer.data[accessor.byteOffset + bufView.byteOffset];
+
+    // 2, 3, 4 for VEC2, VEC3, VEC4
+    const int nbComponents = tinygltf::GetNumComponentsInType(accessor.type);
+    if(nbComponents == -1)
+      return false;  // Invalid
+
+    // Stride per element
+    const size_t byteStride = accessor.ByteStride(bufView);
+    if(byteStride == size_t(-1))
+      return false;  // Invalid
+
+    if(!(accessor.componentType == TINYGLTF_COMPONENT_TYPE_BYTE || accessor.componentType == TINYGLTF_COMPONENT_TYPE_UNSIGNED_BYTE
+         || accessor.componentType == TINYGLTF_COMPONENT_TYPE_SHORT || accessor.componentType == TINYGLTF_COMPONENT_TYPE_UNSIGNED_SHORT))
+    {
+      assert(!"Unhandled tinygltf component type!");
+      return false;
+    }
+
+    const auto& copyElementFn = [&](size_t elementIdx, const unsigned char* pElement) {
+      T vecValue;
+
+      for(int c = 0; c < nbComponents; c++)
+      {
+        switch(accessor.componentType)
+        {
+          case TINYGLTF_COMPONENT_TYPE_BYTE:
+            // Use signed char: the signedness of plain char is implementation-defined
+            vecValue[c] = float(*(reinterpret_cast<const signed char*>(pElement) + c));
+            if(accessor.normalized)
+            {
+              vecValue[c] = std::max(vecValue[c] / 127.f, -1.f);
+            }
+            break;
+          case TINYGLTF_COMPONENT_TYPE_UNSIGNED_BYTE:
+            vecValue[c] = float(*(reinterpret_cast<const unsigned char*>(pElement) + c));
+            if(accessor.normalized)
+            {
+              vecValue[c] = vecValue[c] / 255.f;
+            }
+            break;
+          case TINYGLTF_COMPONENT_TYPE_SHORT:
+            vecValue[c] = float(*(reinterpret_cast<const short*>(pElement) + c));
+            if(accessor.normalized)
+            {
+              vecValue[c] = std::max(vecValue[c] / 32767.f, -1.f);
+            }
+            break;
+          case TINYGLTF_COMPONENT_TYPE_UNSIGNED_SHORT:
+            vecValue[c] = float(*(reinterpret_cast<const unsigned short*>(pElement) + c));
+            if(accessor.normalized)
+            {
+              vecValue[c] = vecValue[c] / 65535.f;
+            }
+            break;
+        }
+      }
+
+      attribVec[oldNumElements + elementIdx] = vecValue;
+    };
+
+    for(size_t i = 0; i < nbElems; i++)
+    {
+      copyElementFn(i, bufferByte + byteStride * i);
+    }
+
+    forEachSparseValue<unsigned char>(tmodel, accessor, 0, nbElems, copyElementFn);
+  }
+
+  return true;
+}
+
+// Appends all the values of the attribute \p attribName to \p attribVec.
+// Returns false if the attribute is missing or invalid.
+// T must be glm::vec2, glm::vec3, or glm::vec4.
+template <typename T>
+static bool getAttribute(const tinygltf::Model& tmodel, const tinygltf::Primitive& primitive, std::vector<T>& attribVec, const std::string& attribName)
+{
+  const auto& it = primitive.attributes.find(attribName);
+  if(it == primitive.attributes.end())
+    return false;
+  const auto& accessor = tmodel.accessors[it->second];
+  return getAccessorData(tmodel, accessor, attribVec);
+}
+
+inline bool hasExtension(const tinygltf::ExtensionMap& extensions, const std::string& name)
+{
+  return extensions.find(name) != extensions.end();
+}
+
+// Appends the incoming data to the (single) binary buffer
+// and returns the number of bytes that were added.
+template <class T>
+uint32_t appendData(tinygltf::Buffer& buffer, const T& inData)
+{
+  auto*    pData = reinterpret_cast<const char*>(inData.data());
+  uint32_t len   = static_cast<uint32_t>(sizeof(inData[0]) * inData.size());
+  buffer.data.insert(buffer.data.end(), pData, pData + len);
+  return len;
+}
+
+
+}  // namespace nvh
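Review note on `copyAccessorData` above: its comment describes copying a window of accessor elements to `outData` starting at `outFirstElement`, then overlaying the sparse (index, value) pairs. The sketch below shows that intended mapping with plain vectors instead of tinygltf types. It assumes that sparse indices are absolute accessor indices. `SparsePair` and `copyWindowWithSparse` are hypothetical names introduced for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one sparse accessor entry.
struct SparsePair
{
  std::size_t index;  // absolute element index within the accessor
  float       value;  // replacement value for that element
};

// Copies dense[first, first + count) to out[outFirst, ...), clamped to what fits,
// then overwrites the copied elements that have a sparse replacement.
// Returns the number of elements actually copied.
inline std::size_t copyWindowWithSparse(std::vector<float>&            out,
                                        std::size_t                    outFirst,
                                        const std::vector<float>&      dense,
                                        const std::vector<SparsePair>& sparse,
                                        std::size_t                    first,
                                        std::size_t                    count)
{
  if(outFirst >= out.size() || first >= dense.size())
    return 0;

  // Range check: never read past `dense` or write past `out`.
  count = std::min(count, std::min(dense.size() - first, out.size() - outFirst));

  const auto src = dense.begin() + static_cast<std::ptrdiff_t>(first);
  std::copy(src, src + static_cast<std::ptrdiff_t>(count), out.begin() + static_cast<std::ptrdiff_t>(outFirst));

  for(const SparsePair& p : sparse)
  {
    if(p.index < first || p.index - first >= count)
      continue;  // outside the copied window
    out[outFirst + (p.index - first)] = p.value;  // remap accessor index to output index
  }
  return count;
}
```

The key step is the remapping `outFirst + (index - first)`: a sparse index addresses the accessor, not the output buffer, so the two coincide only when both offsets are zero.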
--- a/raytracer/nvpro_core/nvh/inputparser.h
+++ b/raytracer/nvpro_core/nvh/inputparser.h
@ -0,0 +1,118 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+//--------------------------------------------------------------------------------------------------
+/** @DOC_START
+  # class InputParser
+  > InputParser is a simple command line parser
+  
+  Example of usage for: test.exe -f name.txt -size 200 100
+  
+  Parsing the command line: mandatory '-f' for the filename of the scene
+
+  ```cpp
+  InputParser parser(argc, argv);
+  std::string filename = parser.getString("-f");
+  if(filename.empty())  filename = "default.txt";
+  if(parser.exist("-size"))
+  {
+    auto values = parser.getInt2("-size");
+  }
+  ```
+@DOC_END */
+
+#pragma once
+#include <algorithm>
+#include <array>
+#include <cstdint>
+#include <string>
+#include <vector>
+
+class InputParser
+{
+public:
+  InputParser(int& argc, char** argv)
+  {
+    for(int i = 1; i < argc; ++i)
+    {
+      if(argv[i])
+      {
+        m_tokens.emplace_back(argv[i]);
+      }
+    }
+  }
+
+  auto findOption(const std::string& option) const { return std::find(m_tokens.begin(), m_tokens.end(), option); }
+  const std::string getString(const std::string& option, std::string defaultString = "") const
+  {
+    if(exist(option))
+    {
+      auto itr = findOption(option);
+      if(itr != m_tokens.end() && ++itr != m_tokens.end())
+      {
+        return *itr;
+      }
+    }
+
+    return defaultString;
+  }
+
+  std::vector<std::string> getString(const std::string& option, uint32_t nbElem) const
+  {
+    auto                     itr = findOption(option);
+    std::vector<std::string> items;
+    while(itr != m_tokens.end() && ++itr != m_tokens.end() && nbElem-- > 0)
+    {
+      items.push_back((*itr));
+    }
+    return items;
+  }
+
+  int getInt(const std::string& option, int defaultValue = 0) const
+  {
+    if(exist(option))
+      return std::stoi(getString(option));
+    return defaultValue;
+  }
+
+  auto getInt2(const std::string& option, std::array<int, 2> defaultValues = {0, 0}) const
+  {
+    if(exist(option))
+    {
+      auto items = getString(option, 2);
+      if(items.size() == 2)
+      {
+        defaultValues[0] = std::stoi(items[0]);
+        defaultValues[1] = std::stoi(items[1]);
+      }
+    }
+
+    return defaultValues;
+  }
+
+  float getFloat(const std::string& option, float defaultValue = 0.0f) const
+  {
+    if(exist(option))
+      return std::stof(getString(option));
+
+    return defaultValue;
+  }
+
+  bool exist(const std::string& option) const { return findOption(option) != m_tokens.end(); }
+
+private:
+  std::vector<std::string> m_tokens;
+};
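Review note on `InputParser` above: each getter locates the option token with `std::find` and then reads the token or tokens that follow it. The standalone sketch below shows that lookup on the example command line from the doc comment. `valueAfter` is a hypothetical helper written for illustration, not a member of the class.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the token that follows `option`,
// or `fallback` if the option is absent or is the last token.
inline std::string valueAfter(const std::vector<std::string>& tokens,
                              const std::string&              option,
                              const std::string&              fallback = "")
{
  auto itr = std::find(tokens.begin(), tokens.end(), option);
  if(itr != tokens.end() && ++itr != tokens.end())
    return *itr;
  return fallback;
}
```

With the tokens `-f name.txt -size 200 100`, `valueAfter(tokens, "-f")` yields `name.txt`. Note that this kind of lookup cannot tell an option from a value: a value that happens to equal an option name is matched as well.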
--- a/raytracer/nvpro_core/nvh/misc.hpp
+++ b/raytracer/nvpro_core/nvh/misc.hpp
@ -0,0 +1,120 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifndef NV_MISC_INCLUDED
+#define NV_MISC_INCLUDED
+
+#include <algorithm>
+#include <assert.h>
+#include <math.h>
+#include <stdlib.h>
+#include <string>
+#include <vector>
+
+#include "nvprint.hpp"
+
+/** @DOC_START
+ # functions in nvh
+
+ - mipMapLevels : compute number of mip maps
+ - stringFormat : sprintf for std::string
+ - frand : random float using rand()
+ - permutation : fills uint vector with random permutation of values [0... vec.size-1]
+ @DOC_END */
+
+namespace nvh {
+
+inline std::string stringFormat(const char* msg, ...)
+{
+  va_list list;
+
+  if(msg == 0)
+    return std::string();
+
+  // Speculate needed string size and vsnprintf to std::string.
+  // If it was too small, we resize and try for the second (and final) time.
+  std::string str;
+  str.resize(64);
+
+  for(int i = 0; i < 2; ++i)
+  {
+    va_start(list, msg);
+    int charsNeeded = vsnprintf(&str[0], str.size(), msg, list);  // charsNeeded doesn't count \0
+    va_end(list);
+
+    if(charsNeeded < 0)
+    {
+      assert(!"encoding error");
+      return std::string();
+    }
+
+    if(static_cast<size_t>(charsNeeded) < str.size())
+    {  // Not <= due to \0 terminator (which we trim out of std::string)
+      str.resize(charsNeeded);
+      return str;
+    }
+    else
+    {
+      str.resize(charsNeeded + 1);  // Leave room for \0
+    }
+  }
+
+  assert(!"String should have been resized perfectly second try");
+  return std::string();
+}
+
+inline float frand()
+{
+  return float(rand() % RAND_MAX) / float(RAND_MAX);
+}
+
+inline int mipMapLevels(int size)
+{
+  int num = 0;
+  while(size)
+  {
+    num++;
+    size /= 2;
+  }
+  return num;
+}
+
+// permutation creates a random permutation of all integer values
+// 0..data.size-1 occurring once within data.
+
+inline void permutation(std::vector<unsigned int>& data)
+{
+  size_t size = data.size();
+  assert(size < RAND_MAX);
+
+  for(size_t i = 0; i < size; i++)
+  {
+    data[i] = (unsigned int)(i);
+  }
+
+  for(size_t i = size; i-- > 1;)  // visits size-1 down to 1; no underflow when data is empty
+  {
+    size_t other = rand() % (i + 1);
+    std::swap(data[i], data[other]);
+  }
+}
+}  // namespace nvh
+
+#endif
--- a/raytracer/nvpro_core/nvh/nsightevents.h
+++ b/raytracer/nvpro_core/nvh/nsightevents.h
@ -0,0 +1,79 @@
+#ifndef __NSIGHTEVENTS__
+#define __NSIGHTEVENTS__
+
+/// @DOC_SKIP (keyword to exclude this file from automatic README.md generation)
+
+//-----------------------------------------------------------------------------
+// NSIGHT
+//-----------------------------------------------------------------------------
+#ifdef NVP_SUPPORTS_NVTOOLSEXT
+// NSight perf markers - take everything needed from "C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\nvToolsExt"
+#include <nvtx3/nvToolsExt.h>
+
+typedef int(NVTX_API* nvtxRangePushEx_Pfn)(const nvtxEventAttributes_t* eventAttrib);
+typedef int(NVTX_API* nvtxRangePush_Pfn)(const char* message);
+typedef int(NVTX_API* nvtxRangePop_Pfn)();
+extern nvtxRangePushEx_Pfn   nvtxRangePushEx_dyn;
+extern nvtxRangePush_Pfn     nvtxRangePush_dyn;
+extern nvtxRangePop_Pfn      nvtxRangePop_dyn;
+extern nvtxEventAttributes_t eventAttr;
+
+#define NX_RANGE nvtxRangeId_t
+#define NX_MARK(name) nvtxMark(name)
+#define NX_RANGESTART(name) nvtxRangeStart(name)
+#define NX_RANGEEND(id) nvtxRangeEnd(id)
+#define NX_RANGEPUSH(name) nvtxRangePush(name)
+#define NX_RANGEPUSHCOL(name, c)                                                                                       \
+  {                                                                                                                    \
+    nvtxEventAttributes_t eventAttrib = {0};                                                                           \
+    eventAttrib.version               = NVTX_VERSION;                                                                  \
+    eventAttrib.size                  = NVTX_EVENT_ATTRIB_STRUCT_SIZE;                                                 \
+    eventAttrib.colorType             = NVTX_COLOR_ARGB;                                                               \
+    eventAttrib.color                 = c;                                                                             \
+    eventAttrib.messageType           = NVTX_MESSAGE_TYPE_ASCII;                                                       \
+    eventAttrib.message.ascii         = name;                                                                          \
+    nvtxRangePushEx(&eventAttrib);                                                                                     \
+  }
+#define NX_RANGEPOP() nvtxRangePop()
+struct NXProfileFunc
+{
+  NXProfileFunc(const char* name, uint32_t c, /*int64_t*/ uint32_t p = 0)
+  {
+    nvtxEventAttributes_t eventAttrib = {0};
+    // set the version and the size information
+    eventAttrib.version = NVTX_VERSION;
+    eventAttrib.size    = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
+    // configure the attributes.  0 is the default for all attributes.
+    eventAttrib.colorType       = NVTX_COLOR_ARGB;
+    eventAttrib.color           = c;
+    eventAttrib.messageType     = NVTX_MESSAGE_TYPE_ASCII;
+    eventAttrib.message.ascii   = name;
+    eventAttrib.payloadType     = NVTX_PAYLOAD_TYPE_INT64;
+    eventAttrib.payload.llValue = (int64_t)p;
+    eventAttrib.category        = (uint32_t)p;
+    nvtxRangePushEx(&eventAttrib);
+  }
+  ~NXProfileFunc() { nvtxRangePop(); }
+};
+#ifdef NXPROFILEFUNC
+#undef NXPROFILEFUNC
+#undef NXPROFILEFUNCCOL
+#undef NXPROFILEFUNCCOL2
+#endif
+#define NXPROFILEFUNC(name) NXProfileFunc nxProfileMe(name, 0xFF0000FF)
+#define NXPROFILEFUNCCOL(name, c) NXProfileFunc nxProfileMe(name, c)
+#define NXPROFILEFUNCCOL2(name, c, p) NXProfileFunc nxProfileMe(name, c, p)
+#else
+#define NX_RANGE int
+#define NX_MARK(name)
+#define NX_RANGESTART(name) 0
+#define NX_RANGEEND(id)
+#define NX_RANGEPUSH(name)
+#define NX_RANGEPUSHCOL(name, c)
+#define NX_RANGEPOP()
+#define NXPROFILEFUNC(name)
+#define NXPROFILEFUNCCOL(name, c)
+#define NXPROFILEFUNCCOL2(name, c, a)
+#endif
+
+#endif  //__NSIGHTEVENTS__
--- a/raytracer/nvpro_core/nvh/nvml_monitor.cpp
+++ b/raytracer/nvpro_core/nvh/nvml_monitor.cpp
@ -0,0 +1,609 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: LicenseRef-NvidiaProprietary
+ *
+ * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
+ * property and proprietary rights in and to this material, related
+ * documentation and any modifications thereto. Any use, reproduction,
+ * disclosure or distribution of this material and related documentation
+ * without an express license agreement from NVIDIA CORPORATION or
+ * its affiliates is strictly prohibited.
+ */
+
+
+#ifdef WIN32
+#include <windows.h>
+#endif
+
+#include "nvh/nvml_monitor.hpp"
+
+#include <iostream>
+#include <string>
+#include <vector>
+#include <chrono>
+
+#if defined(NVP_SUPPORTS_NVML)
+#define NVML_NO_UNVERSIONED_FUNC_DEFS
+#include <nvml.h>
+#ifdef _WIN32
+// The cfgmgr32 header is necessary for interrogating driver information in the registry.
+#include <cfgmgr32.h>
+// For convenience the library is also linked in automatically using the #pragma command.
+#pragma comment(lib, "Cfgmgr32.lib")
+#endif
+
+
+#define CHECK_NVML_CALL()                                                                                              \
+  if(res != NVML_SUCCESS)                                                                                              \
+  {                                                                                                                    \
+    LOGE("NVML Error %s\n", nvmlErrorString(res));                                                                     \
+  }
+
+#define CHECK_NVML(fun)                                                                                                \
+  {                                                                                                                    \
+    nvmlReturn_t res = fun;                                                                                            \
+    if(res != NVML_SUCCESS)                                                                                            \
+    {                                                                                                                  \
+      LOGE("NVML Error in %s: %s\n", #fun, nvmlErrorString(res));                                                      \
+    }                                                                                                                  \
+  }
+
+#define CHECK_NVML_SUPPORT(fun, field)                                                                                 \
+  {                                                                                                                    \
+    nvmlReturn_t res = fun;                                                                                            \
+    if(res != NVML_SUCCESS)                                                                                            \
+    {                                                                                                                  \
+      field.isSupported = false;                                                                                       \
+    }                                                                                                                  \
+    else                                                                                                               \
+    {                                                                                                                  \
+      field.isSupported = true;                                                                                        \
+    }                                                                                                                  \
+  }
+
+
+static const std::string brandToString(nvmlBrandType_t brand)
+{
+  switch(brand)
+  {
+    case NVML_BRAND_UNKNOWN:
+      return "Unknown";
+
+    case NVML_BRAND_QUADRO:
+      return "Quadro";
+    case NVML_BRAND_TESLA:
+      return "Tesla";
+    case NVML_BRAND_NVS:
+      return "NVS";
+    case NVML_BRAND_GRID:
+      return "Grid";
+    case NVML_BRAND_GEFORCE:
+      return "GeForce";
+    case NVML_BRAND_TITAN:
+      return "Titan";
+    case NVML_BRAND_NVIDIA_VAPPS:
+      return "NVIDIA Virtual Applications";
+
+    case NVML_BRAND_NVIDIA_VPC:
+      return "NVIDIA Virtual PC";
+    case NVML_BRAND_NVIDIA_VCS:
+      return "NVIDIA Virtual Compute Server";
+    case NVML_BRAND_NVIDIA_VWS:
+      return "NVIDIA RTX Virtual Workstation";
+    case NVML_BRAND_NVIDIA_CLOUD_GAMING:
+      return "NVIDIA Cloud Gaming";
+    case NVML_BRAND_QUADRO_RTX:
+      return "Quadro RTX";
+    case NVML_BRAND_NVIDIA_RTX:
+      return "NVIDIA RTX";
+    case NVML_BRAND_NVIDIA:
+      return "NVIDIA";
+    case NVML_BRAND_GEFORCE_RTX:
+      return "GeForce RTX";
+    case NVML_BRAND_TITAN_RTX:
+      return "Titan RTX";
+  }
+  return "Unknown";
+}
+
+static const std::string computeModeToString(nvmlComputeMode_t computeMode)
+{
+  switch(computeMode)
+  {
+    case NVML_COMPUTEMODE_DEFAULT:
+      return "Default";
+    case NVML_COMPUTEMODE_EXCLUSIVE_THREAD:
+      return "Exclusive thread";
+    case NVML_COMPUTEMODE_PROHIBITED:
+      return "Compute prohibited";
+    case NVML_COMPUTEMODE_EXCLUSIVE_PROCESS:
+      return "Exclusive process";
+    default:
+      return "Unknown";
+  }
+}
+#endif
+
+//-------------------------------------------------------------------------------------------------
+//
+//
+nvvkhl::NvmlMonitor::NvmlMonitor(uint32_t interval /*= 100*/, uint32_t limit /*= 100*/)
+    : m_maxElements(limit)     // limit : number of measures
+    , m_minInterval(interval)  // interval : ms between sampling
+{
+#if defined(NVP_SUPPORTS_NVML)
+
+  nvmlReturn_t result;
+  result = nvmlInit();
+  if(result != NVML_SUCCESS)
+    return;
+  if(nvmlDeviceGetCount(&m_physicalGpuCount) != NVML_SUCCESS)
+    return;
+
+  m_deviceInfo.resize(m_physicalGpuCount);
+  m_deviceMemory.resize(m_physicalGpuCount);
+  m_deviceUtilization.resize(m_physicalGpuCount);
+  m_devicePerformanceState.resize(m_physicalGpuCount);
+  m_devicePowerState.resize(m_physicalGpuCount);
+
+  // System Info
+  m_sysInfo.cpu.resize(m_maxElements);
+
+  // Get driver version
+  char driverVersion[80];
+  result = nvmlSystemGetDriverVersion(driverVersion, 80);
+  if(result == NVML_SUCCESS)
+    m_sysInfo.driverVersion = driverVersion;
+
+  // Loop over all GPUs
+  for(int i = 0; i < (int)m_physicalGpuCount; i++)
+  {
+    // Sizing the data
+    m_deviceMemory[i].init(m_maxElements);
+    m_deviceUtilization[i].init(m_maxElements);
+    m_devicePerformanceState[i].init(m_maxElements);
+    m_devicePowerState[i].init(m_maxElements);
+
+    // Retrieving general capabilities
+    nvmlDevice_t device;
+
+    if(nvmlDeviceGetHandleByIndex(i, &device) == NVML_SUCCESS)
+      m_deviceInfo[i].refresh(device);
+  }
+  m_valid = true;
+#endif
+}
+
+//-------------------------------------------------------------------------------------------------
+// Destructor: shutting down NVML
+//
+nvvkhl::NvmlMonitor::~NvmlMonitor()
+{
+#if defined(NVP_SUPPORTS_NVML)
+  nvmlShutdown();
+#endif
+}
+
+#if defined(NVP_SUPPORTS_NVML)
+
+//-------------------------------------------------------------------------------------------------
+// Returns the amount of memory currently used by the device
+static uint64_t getMemory(nvmlDevice_t device)
+{
+  try
+  {
+    nvmlMemory_t memory{};
+    nvmlDeviceGetMemoryInfo(device, &memory);
+    return memory.used;
+  }
+  catch(const std::exception&)
+  {
+    return 0ULL;
+  }
+}
+
+static float getLoad(nvmlDevice_t device)
+{
+  nvmlUtilization_t utilization{};
+  nvmlReturn_t      result = nvmlDeviceGetUtilizationRates(device, &utilization);
+  if(result != NVML_SUCCESS)
+    return 0.0f;
+  return static_cast<float>(utilization.gpu);
+}
+
+
+static float getCpuLoad()
+{
+#ifdef _WIN32
+  static uint64_t s_previousTotalTicks = 0;
+  static uint64_t s_previousIdleTicks  = 0;
+
+  FILETIME idleTime, kernelTime, userTime;
+  if(!GetSystemTimes(&idleTime, &kernelTime, &userTime))
+    return 0.0f;
+
+  auto fileTimeToInt64 = [](const FILETIME& ft) {
+    return (((uint64_t)(ft.dwHighDateTime)) << 32) | ((uint64_t)ft.dwLowDateTime);
+  };
+
+  auto totalTicks = fileTimeToInt64(kernelTime) + fileTimeToInt64(userTime);
+  auto idleTicks  = fileTimeToInt64(idleTime);
+
+  uint64_t totalTicksSinceLastTime = totalTicks - s_previousTotalTicks;
+  uint64_t idleTicksSinceLastTime  = idleTicks - s_previousIdleTicks;
+
+  float result = 1.0f - ((totalTicksSinceLastTime > 0) ? ((float)idleTicksSinceLastTime) / totalTicksSinceLastTime : 0);
+
+  s_previousTotalTicks = totalTicks;
+  s_previousIdleTicks  = idleTicks;
+
+  return result * 100.f;
+#else
+  return 0;
+#endif
+}
+
+#endif
+
+//-------------------------------------------------------------------------------------------------
+// Pulling the information from NVML and storing the data
+// Note: the interval matters, as NVML cannot be queried too frequently
+//
+void nvvkhl::NvmlMonitor::refresh()
+{
+#if defined(NVP_SUPPORTS_NVML)
+
+  static std::chrono::high_resolution_clock::time_point s_startTime;
+
+  if(!m_valid)
+    return;
+
+  // Sample only once the minimum interval has elapsed since the last sample
+  const auto now = std::chrono::high_resolution_clock::now();
+  const auto t   = std::chrono::duration_cast<std::chrono::milliseconds>(now - s_startTime).count();
+  if(t < m_minInterval)
+    return;
+  s_startTime = now;
+
+  // Advance the write position in the circular buffer
+  m_offset = (m_offset + 1) % m_maxElements;
+
+  // System
+  m_sysInfo.cpu[m_offset] = getCpuLoad();
+
+  // All GPUs
+  for(unsigned int gpu_id = 0; gpu_id < m_physicalGpuCount; gpu_id++)
+  {
+    nvmlDevice_t device;
+    if(nvmlDeviceGetHandleByIndex(gpu_id, &device) != NVML_SUCCESS)
+      continue;
+    m_deviceMemory[gpu_id].refresh(device, m_offset);
+    m_deviceUtilization[gpu_id].refresh(device, m_offset);
+    m_devicePerformanceState[gpu_id].refresh(device, m_offset);
+    m_devicePowerState[gpu_id].refresh(device, m_offset);
+  }
+
+#endif  //  NVP_SUPPORTS_NVML
+}
+
+void nvvkhl::NvmlMonitor::DeviceInfo::refresh(void* dev)
+{
+#if defined(NVP_SUPPORTS_NVML)
+  nvmlDevice_t device = reinterpret_cast<nvmlDevice_t>(dev);
+
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetBoardId(device, &boardId.get()), boardId);
+
+  partNumber.get().resize(NVML_DEVICE_PART_NUMBER_BUFFER_SIZE);
+  CHECK_NVML_SUPPORT(
+      nvmlDeviceGetBoardPartNumber(device, partNumber.get().data(), static_cast<uint32_t>(partNumber.get().size())), partNumber);
+
+  nvmlBrandType_t brandType = NVML_BRAND_UNKNOWN;
+  CHECK_NVML_SUPPORT(nvmlDeviceGetBrand(device, &brandType), brand);
+  brand.get() = brandToString(brandType);
+
+  nvmlBridgeChipHierarchy_t bridgeChipHierarchy{};
+  CHECK_NVML_SUPPORT(nvmlDeviceGetBridgeChipInfo(device, &bridgeChipHierarchy), bridgeHierarchy);
+  bridgeHierarchy.get().resize(bridgeChipHierarchy.bridgeCount);
+  for(int i = 0; i < bridgeChipHierarchy.bridgeCount; i++)
+  {
+    bridgeHierarchy.get()[i].first = ((bridgeChipHierarchy.bridgeChipInfo[i].type == NVML_BRIDGE_CHIP_PLX) ? "PLX" : "BRO4");
+    bridgeHierarchy.get()[i].second = fmt::format("#{}", bridgeChipHierarchy.bridgeChipInfo[i].fwVersion);
+  }
+
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetCpuAffinity(device, 1, (unsigned long*)&cpuAffinity.get()), cpuAffinity);
+
+  nvmlComputeMode_t cMode;
+  CHECK_NVML_SUPPORT(nvmlDeviceGetComputeMode(device, &cMode), computeMode);
+  computeMode = computeModeToString(cMode);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetCudaComputeCapability(device, &computeCapabilityMajor.get(), &computeCapabilityMinor.get()),
+                     computeCapabilityMajor);
+  computeCapabilityMinor.isSupported = computeCapabilityMajor.isSupported;
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetCurrPcieLinkGeneration(device, &pcieLinkGen.get()), pcieLinkGen);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetCurrPcieLinkWidth(device, &pcieLinkWidth.get()), pcieLinkWidth);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetDefaultApplicationsClock(device, NVML_CLOCK_GRAPHICS, &clockDefaultGraphics.get()),
+                     clockDefaultGraphics);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMaxClockInfo(device, NVML_CLOCK_GRAPHICS, &clockMaxGraphics.get()), clockMaxGraphics);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMaxCustomerBoostClock(device, NVML_CLOCK_GRAPHICS, &clockBoostGraphics.get()), clockBoostGraphics);
+
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetDefaultApplicationsClock(device, NVML_CLOCK_SM, &clockDefaultSM.get()), clockDefaultSM);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMaxClockInfo(device, NVML_CLOCK_SM, &clockMaxSM.get()), clockMaxSM);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMaxCustomerBoostClock(device, NVML_CLOCK_SM, &clockBoostSM.get()), clockBoostSM);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetDefaultApplicationsClock(device, NVML_CLOCK_MEM, &clockDefaultMem.get()), clockDefaultMem);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMaxClockInfo(device, NVML_CLOCK_MEM, &clockMaxMem.get()), clockMaxMem);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMaxCustomerBoostClock(device, NVML_CLOCK_MEM, &clockBoostMem.get()), clockBoostMem);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetDefaultApplicationsClock(device, NVML_CLOCK_VIDEO, &clockDefaultVideo.get()), clockDefaultVideo);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMaxClockInfo(device, NVML_CLOCK_VIDEO, &clockMaxVideo.get()), clockMaxVideo);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMaxCustomerBoostClock(device, NVML_CLOCK_VIDEO, &clockBoostVideo.get()), clockBoostVideo);
+
+#ifdef _WIN32
+  nvmlDriverModel_t currentDM, pendingDM;
+  CHECK_NVML_SUPPORT(nvmlDeviceGetDriverModel(device, &currentDM, &pendingDM), currentDriverModel);
+  currentDriverModel             = (currentDM == NVML_DRIVER_WDDM) ? "WDDM" : "TCC";
+  pendingDriverModel             = (pendingDM == NVML_DRIVER_WDDM) ? "WDDM" : "TCC";
+  pendingDriverModel.isSupported = currentDriverModel.isSupported;
+#endif
+
+  nvmlEnableState_t currentES, pendingES;
+  CHECK_NVML_SUPPORT(nvmlDeviceGetEccMode(device, &currentES, &pendingES), currentEccMode);
+  currentEccMode             = (currentES == NVML_FEATURE_ENABLED);
+  pendingEccMode             = (pendingES == NVML_FEATURE_ENABLED);
+  pendingEccMode.isSupported = currentEccMode.isSupported;
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetEncoderCapacity(device, NVML_ENCODER_QUERY_H264, &encoderCapacityH264.get()), encoderCapacityH264);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetEncoderCapacity(device, NVML_ENCODER_QUERY_HEVC, &encoderCapacityHEVC.get()), encoderCapacityHEVC);
+
+  infoROMImageVersion.get().resize(NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetInforomImageVersion(device, infoROMImageVersion.get().data(),
+                                                      static_cast<uint32_t>(infoROMImageVersion.get().size())),
+                     infoROMImageVersion);
+
+  infoROMOEMVersion.get().resize(NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE);
+  infoROMECCVersion.get().resize(NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE);
+  infoROMPowerVersion.get().resize(NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetInforomVersion(device, NVML_INFOROM_OEM, infoROMOEMVersion.get().data(),
+                                                 static_cast<uint32_t>(infoROMOEMVersion.get().size())),
+                     infoROMOEMVersion);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetInforomVersion(device, NVML_INFOROM_ECC, infoROMECCVersion.get().data(),
+                                                 static_cast<uint32_t>(infoROMECCVersion.get().size())),
+                     infoROMECCVersion);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetInforomVersion(device, NVML_INFOROM_POWER, infoROMPowerVersion.get().data(),
+                                                 static_cast<uint32_t>(infoROMPowerVersion.get().size())),
+                     infoROMPowerVersion);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMaxPcieLinkGeneration(device, &maxLinkGen.get()), maxLinkGen);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMaxPcieLinkWidth(device, &maxLinkWidth.get()), maxLinkWidth);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMinorNumber(device, &minorNumber.get()), minorNumber);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMultiGpuBoard(device, &multiGpuBool.get()), multiGpuBool);
+  deviceName.get().resize(NVML_DEVICE_NAME_V2_BUFFER_SIZE);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetName(device, deviceName.get().data(), static_cast<uint32_t>(deviceName.get().size())), deviceName);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetSupportedClocksThrottleReasons(device, reinterpret_cast<long long unsigned int*>(
+                                                                             &supportedClocksThrottleReasons.get())),
+                     supportedClocksThrottleReasons);
+
+  vbiosVersion.get().resize(NVML_DEVICE_VBIOS_VERSION_BUFFER_SIZE);
+  CHECK_NVML_SUPPORT(
+      nvmlDeviceGetVbiosVersion(device, vbiosVersion.get().data(), static_cast<uint32_t>(vbiosVersion.get().size())), vbiosVersion);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetTemperatureThreshold(device, NVML_TEMPERATURE_THRESHOLD_SHUTDOWN,
+                                                       &tempThresholdShutdown.get()),
+                     tempThresholdShutdown);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetTemperatureThreshold(device, NVML_TEMPERATURE_THRESHOLD_SLOWDOWN,
+                                                       &tempThresholdHWSlowdown.get()),
+                     tempThresholdHWSlowdown);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetTemperatureThreshold(device, NVML_TEMPERATURE_THRESHOLD_MEM_MAX,
+                                                       &tempThresholdSWSlowdown.get()),
+                     tempThresholdSWSlowdown);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetTemperatureThreshold(device, NVML_TEMPERATURE_THRESHOLD_GPU_MAX,
+                                                       &tempThresholdDropBelowBaseClock.get()),
+                     tempThresholdDropBelowBaseClock);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetPowerManagementLimit(device, &powerLimit.get()), powerLimit);
+  // Milliwatt to watt
+  powerLimit.get() /= 1000;
+
+  uint32_t supportedClockCount = 0;
+  if(nvmlDeviceGetSupportedMemoryClocks(device, &supportedClockCount, nullptr) == NVML_ERROR_INSUFFICIENT_SIZE)
+  {
+    supportedMemoryClocks.isSupported = true;
+    supportedMemoryClocks.get().resize(supportedClockCount);
+    nvmlDeviceGetSupportedMemoryClocks(device, &supportedClockCount, supportedMemoryClocks.get().data());
+  }
+
+  for(size_t i = 0; i < supportedMemoryClocks.get().size(); i++)
+  {
+    supportedClockCount = 0;
+    if(nvmlDeviceGetSupportedGraphicsClocks(device, supportedMemoryClocks.get()[i], &supportedClockCount, nullptr) == NVML_ERROR_INSUFFICIENT_SIZE)
+    {
+      supportedGraphicsClocks.isSupported = true;
+      auto& graphicsClocks                = supportedGraphicsClocks.get()[supportedMemoryClocks.get()[i]];
+      graphicsClocks.resize(supportedClockCount);
+      nvmlDeviceGetSupportedGraphicsClocks(device, supportedMemoryClocks.get()[i], &supportedClockCount, graphicsClocks.data());
+    }
+  }
+#endif
+}
+
+void nvvkhl::NvmlMonitor::DeviceMemory::init(uint32_t maxElements)
+{
+  memoryFree.get().resize(maxElements);
+  memoryUsed.get().resize(maxElements);
+
+  bar1Free.get().resize(maxElements);
+  bar1Used.get().resize(maxElements);
+}
+
+void nvvkhl::NvmlMonitor::DeviceMemory::refresh(void* dev, uint32_t offset)
+{
+#if defined(NVP_SUPPORTS_NVML)
+  nvmlDevice_t device = reinterpret_cast<nvmlDevice_t>(dev);
+
+  nvmlBAR1Memory_t bar1Memory{};
+  nvmlMemory_t     memory{};
+  CHECK_NVML_SUPPORT(nvmlDeviceGetBAR1MemoryInfo(device, &bar1Memory), bar1Total);
+
+  bar1Total              = bar1Memory.bar1Total;
+  bar1Used.get()[offset] = bar1Memory.bar1Used;
+  bar1Used.isSupported   = bar1Total.isSupported;
+
+  bar1Free.get()[offset] = bar1Memory.bar1Free;
+  bar1Free.isSupported   = bar1Total.isSupported;
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetMemoryInfo(device, &memory), memoryTotal);
+  memoryTotal              = memory.total;
+  memoryUsed.get()[offset] = memory.used;
+  memoryUsed.isSupported   = memoryTotal.isSupported;
+  memoryFree.get()[offset] = memory.free;
+  memoryFree.isSupported   = memoryTotal.isSupported;
+#endif
+}
+
+void nvvkhl::NvmlMonitor::DeviceUtilization::init(uint32_t maxElements)
+{
+  gpuUtilization.get().resize(maxElements);
+  memUtilization.get().resize(maxElements);
+  ;
+  computeProcesses.get().resize(maxElements);
+  ;
+  graphicsProcesses.get().resize(maxElements);
+  ;
+}
+
+void nvvkhl::NvmlMonitor::DeviceUtilization::refresh(void* dev, uint32_t offset)
+{
+#if defined(NVP_SUPPORTS_NVML)
+  nvmlDevice_t device = reinterpret_cast<nvmlDevice_t>(dev);
+
+  nvmlUtilization_t utilization{};
+  CHECK_NVML_SUPPORT(nvmlDeviceGetUtilizationRates(device, &utilization), gpuUtilization);
+  gpuUtilization.get()[offset] = utilization.gpu;
+  memUtilization.get()[offset] = utilization.memory;
+  memUtilization.isSupported   = gpuUtilization.isSupported;
+
+
+  computeProcesses.get()[offset]  = 0;
+  graphicsProcesses.get()[offset] = 0;
+  CHECK_NVML_SUPPORT(nvmlDeviceGetComputeRunningProcesses(device, &computeProcesses.get()[offset], nullptr), computeProcesses);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetGraphicsRunningProcesses(device, &graphicsProcesses.get()[offset], nullptr), graphicsProcesses);
+#endif
+}
+
+void nvvkhl::NvmlMonitor::DevicePerformanceState::init(uint32_t maxElements)
+{
+  clockGraphics.get().resize(maxElements);
+  clockSM.get().resize(maxElements);
+  clockMem.get().resize(maxElements);
+  clockVideo.get().resize(maxElements);
+  throttleReasons.get().resize(maxElements);
+}
+
+void nvvkhl::NvmlMonitor::DevicePerformanceState::refresh(void* dev, uint32_t offset)
+{
+#if defined(NVP_SUPPORTS_NVML)
+  nvmlDevice_t device = reinterpret_cast<nvmlDevice_t>(dev);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetClockInfo(device, NVML_CLOCK_GRAPHICS, &clockGraphics.get()[offset]), clockGraphics);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetClockInfo(device, NVML_CLOCK_SM, &clockSM.get()[offset]), clockSM);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetClockInfo(device, NVML_CLOCK_MEM, &clockMem.get()[offset]), clockMem);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetClockInfo(device, NVML_CLOCK_VIDEO, &clockVideo.get()[offset]), clockVideo);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetCurrentClocksThrottleReasons(device, reinterpret_cast<unsigned long long*>(
+                                                                           &throttleReasons.get()[offset])),
+                     throttleReasons);
+#endif
+}
+
+std::vector<std::string> nvvkhl::NvmlMonitor::DevicePerformanceState::getThrottleReasonStrings(uint64_t reason)
+{
+  std::vector<std::string> reasonStrings;
+#if defined(NVP_SUPPORTS_NVML)
+
+
+  if(reason & nvmlClocksThrottleReasonGpuIdle)
+  {
+    reasonStrings.push_back("Idle");
+  }
+
+  if(reason & nvmlClocksThrottleReasonApplicationsClocksSetting)
+  {
+    reasonStrings.push_back("App clock setting");
+  }
+  if(reason & nvmlClocksThrottleReasonSwPowerCap)
+  {
+    reasonStrings.push_back("SW power cap");
+  }
+  if(reason & nvmlClocksThrottleReasonHwSlowdown)
+  {
+    reasonStrings.push_back("HW slowdown");
+  }
+  if(reason & nvmlClocksThrottleReasonSyncBoost)
+  {
+    reasonStrings.push_back("Sync boost");
+  }
+  if(reason & nvmlClocksThrottleReasonSwThermalSlowdown)
+  {
+    reasonStrings.push_back("SW Thermal slowdown");
+  }
+  if(reason & nvmlClocksThrottleReasonHwThermalSlowdown)
+  {
+    reasonStrings.push_back("HW Thermal slowdown");
+  }
+  if(reason & nvmlClocksThrottleReasonHwPowerBrakeSlowdown)
+  {
+    reasonStrings.push_back("Power brake slowdown");
+  }
+  if(reasonStrings.empty())
+  {
+    reasonStrings.push_back("Full speed");
+  }
+#endif
+  return reasonStrings;
+}
+
+const std::vector<uint64_t>& nvvkhl::NvmlMonitor::DevicePerformanceState::getAllThrottleReasonList()
+{
+  static std::vector<uint64_t> s_reasonList =
+#if defined(NVP_SUPPORTS_NVML)
+      {nvmlClocksThrottleReasonGpuIdle,
+       nvmlClocksThrottleReasonApplicationsClocksSetting,
+       nvmlClocksThrottleReasonSwPowerCap,
+       nvmlClocksThrottleReasonHwSlowdown,
+       nvmlClocksThrottleReasonSyncBoost,
+       nvmlClocksThrottleReasonSwThermalSlowdown,
+       nvmlClocksThrottleReasonHwThermalSlowdown,
+       nvmlClocksThrottleReasonHwPowerBrakeSlowdown,
+       nvmlClocksThrottleReasonNone};
+
+#else
+      {};
+#endif
+  return s_reasonList;
+}
+
+void nvvkhl::NvmlMonitor::DevicePowerState::init(uint32_t maxElements)
+{
+  power.get().resize(maxElements);
+  temperature.get().resize(maxElements);
+  fanSpeed.get().resize(maxElements);
+}
+
+void nvvkhl::NvmlMonitor::DevicePowerState::refresh(void* dev, uint32_t offset)
+{
+#if defined(NVP_SUPPORTS_NVML)
+  nvmlDevice_t device = reinterpret_cast<nvmlDevice_t>(dev);
+
+  CHECK_NVML_SUPPORT(nvmlDeviceGetTemperature(device, NVML_TEMPERATURE_GPU, &temperature.get()[offset]), temperature);
+  CHECK_NVML_SUPPORT(nvmlDeviceGetPowerUsage(device, &power.get()[offset]), power);
+  // Milliwatt to watt
+  power.get()[offset] /= 1000;
+  CHECK_NVML_SUPPORT(nvmlDeviceGetFanSpeed(device, &fanSpeed.get()[offset]), fanSpeed);
+#endif
+}
--- a/raytracer/nvpro_core/nvh/nvml_monitor.hpp
+++ b/raytracer/nvpro_core/nvh/nvml_monitor.hpp
@ -0,0 +1,220 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: LicenseRef-NvidiaProprietary
+ *
+ * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
+ * property and proprietary rights in and to this material, related
+ * documentation and any modifications thereto. Any use, reproduction,
+ * disclosure or distribution of this material and related documentation
+ * without an express license agreement from NVIDIA CORPORATION or
+ * its affiliates is strictly prohibited.
+ */
+
+#pragma once
+
+#include <cstdint>
+#include <map>
+#include <string>
+#include <vector>
+
+/** @DOC_START
+
+Captures the GPU load and memory usage for all GPUs on the system.
+
+Usage:
+- There should be only one instance of NvmlMonitor
+- refresh() : call once per frame. It will not take measurements more often than the interval (ms)
+- isValid() : returns whether the monitor can be used
+- getGpuCount() : returns the number of GPUs in the computer
+- getDeviceInfo() : static info about the GPU
+- getDeviceMemory() : memory consumption info
+- getDeviceUtilization() : GPU and memory utilization
+- getDevicePerformanceState() : clock speeds and throttle reasons
+- getDevicePowerState() : power, temperature and fan speed
+
+Measurements:
+- Stored in circular buffers.
+- getOffset() returns the index of the most recent measurement
+
+@DOC_END */
+
+namespace nvvkhl {
+
+class NvmlMonitor
+{
+public:
+  NvmlMonitor(uint32_t interval = 100, uint32_t limit = 100);
+  ~NvmlMonitor();
+
+  template <typename T>
+  struct NVMLField
+  {
+    T    data;
+    bool isSupported;
+
+    operator T&() { return data; }
+    T&       get() { return data; }
+    const T& get() const { return data; }
+
+    T& operator=(const T& rhs)
+    {
+      data = rhs;
+      return data;
+    }
+  };
+
+  // Static device information
+  struct DeviceInfo
+  {
+    NVMLField<std::string> currentDriverModel;
+    NVMLField<std::string> pendingDriverModel;
+
+    NVMLField<uint32_t>    boardId;
+    NVMLField<std::string> partNumber;
+    NVMLField<std::string> brand;
+    // Ordered list of bridge chips, each with a type and firmware version strings
+    NVMLField<std::vector<std::pair<std::string, std::string>>> bridgeHierarchy;
+    NVMLField<uint64_t>                                         cpuAffinity;
+    NVMLField<std::string>                                      computeMode;
+    NVMLField<int32_t>                                          computeCapabilityMajor;
+    NVMLField<int32_t>                                          computeCapabilityMinor;
+    NVMLField<uint32_t>                                         pcieLinkGen;
+    NVMLField<uint32_t>                                         pcieLinkWidth;
+
+    NVMLField<uint32_t> clockDefaultGraphics;
+    NVMLField<uint32_t> clockDefaultSM;
+    NVMLField<uint32_t> clockDefaultMem;
+    NVMLField<uint32_t> clockDefaultVideo;
+
+    NVMLField<uint32_t> clockMaxGraphics;
+    NVMLField<uint32_t> clockMaxSM;
+    NVMLField<uint32_t> clockMaxMem;
+    NVMLField<uint32_t> clockMaxVideo;
+
+    NVMLField<uint32_t> clockBoostGraphics;
+    NVMLField<uint32_t> clockBoostSM;
+    NVMLField<uint32_t> clockBoostMem;
+    NVMLField<uint32_t> clockBoostVideo;
+
+
+    NVMLField<bool> currentEccMode;
+    NVMLField<bool> pendingEccMode;
+
+    NVMLField<uint32_t>    encoderCapacityH264;
+    NVMLField<uint32_t>    encoderCapacityHEVC;
+    NVMLField<std::string> infoROMImageVersion;
+    NVMLField<std::string> infoROMOEMVersion;
+    NVMLField<std::string> infoROMECCVersion;
+    NVMLField<std::string> infoROMPowerVersion;
+    NVMLField<uint64_t>    supportedClocksThrottleReasons;
+    NVMLField<std::string> vbiosVersion;
+    NVMLField<uint32_t>    maxLinkGen;
+    NVMLField<uint32_t>    maxLinkWidth;
+    NVMLField<uint32_t>    minorNumber;
+    NVMLField<uint32_t>    multiGpuBool;
+    NVMLField<std::string> deviceName;
+
+
+    NVMLField<uint32_t> tempThresholdShutdown;
+    NVMLField<uint32_t> tempThresholdHWSlowdown;
+    NVMLField<uint32_t> tempThresholdSWSlowdown;
+    NVMLField<uint32_t> tempThresholdDropBelowBaseClock;
+
+    NVMLField<uint32_t> powerLimit;
+
+    NVMLField<std::vector<uint32_t>>                     supportedMemoryClocks;
+    NVMLField<std::map<uint32_t, std::vector<uint32_t>>> supportedGraphicsClocks;
+
+
+    void refresh(void* device);
+  };
+
+  // Device memory usage
+  struct DeviceMemory
+  {
+    NVMLField<uint64_t>              bar1Total;
+    NVMLField<std::vector<uint64_t>> bar1Used;
+    NVMLField<std::vector<uint64_t>> bar1Free;
+
+    NVMLField<uint64_t>              memoryTotal;
+    NVMLField<std::vector<uint64_t>> memoryUsed;
+    NVMLField<std::vector<uint64_t>> memoryFree;
+
+    void init(uint32_t maxElements);
+    void refresh(void* device, uint32_t offset);
+  };
+
+  // Device utilization ratios
+  struct DeviceUtilization
+  {
+    NVMLField<std::vector<uint32_t>> gpuUtilization;
+    NVMLField<std::vector<uint32_t>> memUtilization;
+    NVMLField<std::vector<uint32_t>> computeProcesses;
+    NVMLField<std::vector<uint32_t>> graphicsProcesses;
+
+    void init(uint32_t maxElements);
+    void refresh(void* device, uint32_t offset);
+  };
+
+  // Device performance state: clocks and throttling
+  struct DevicePerformanceState
+  {
+    NVMLField<std::vector<uint32_t>> clockGraphics;
+    NVMLField<std::vector<uint32_t>> clockSM;
+    NVMLField<std::vector<uint32_t>> clockMem;
+    NVMLField<std::vector<uint32_t>> clockVideo;
+    NVMLField<std::vector<uint64_t>> throttleReasons;
+
+    void                            init(uint32_t maxElements);
+    void                            refresh(void* device, uint32_t offset);
+    static std::vector<std::string> getThrottleReasonStrings(uint64_t reason);
+
+    static const std::vector<uint64_t>& getAllThrottleReasonList();
+  };
+
+  // Device power and temperature
+  struct DevicePowerState
+  {
+    NVMLField<std::vector<uint32_t>> power;
+    NVMLField<std::vector<uint32_t>> temperature;
+    NVMLField<std::vector<uint32_t>> fanSpeed;
+
+    void init(uint32_t maxElements);
+    void refresh(void* device, uint32_t offset);
+  };
+
+  // Other information
+  struct SysInfo
+  {
+    std::vector<float> cpu;  // Load measurement [0, 100]
+    std::string        driverVersion;
+  };
+
+
+  void                          refresh();  // Take measurement
+  bool                          isValid() { return m_valid; }
+  uint32_t                      getGpuCount() { return m_physicalGpuCount; }
+  const DeviceInfo&             getDeviceInfo(int gpu) { return m_deviceInfo[gpu]; }
+  const DeviceMemory&           getDeviceMemory(int gpu) { return m_deviceMemory[gpu]; }
+  const DeviceUtilization&      getDeviceUtilization(int gpu) { return m_deviceUtilization[gpu]; }
+  const DevicePerformanceState& getDevicePerformanceState(int gpu) { return m_devicePerformanceState[gpu]; }
+  const DevicePowerState&       getDevicePowerState(int gpu) { return m_devicePowerState[gpu]; }
+  const SysInfo&                getSysInfo() { return m_sysInfo; }
+  int                           getOffset() { return m_offset; }
+
+
+private:
+  std::vector<DeviceInfo>             m_deviceInfo;
+  std::vector<DeviceMemory>           m_deviceMemory;
+  std::vector<DeviceUtilization>      m_deviceUtilization;
+  std::vector<DevicePerformanceState> m_devicePerformanceState;
+  std::vector<DevicePowerState>       m_devicePowerState;
+  SysInfo                             m_sysInfo;  // CPU and driver information
+  bool                                m_valid            = false;
+  uint32_t                            m_physicalGpuCount = 0;    // Number of NVIDIA GPUs
+  uint32_t                            m_offset           = 0;    // Index of the most recent sample in the circular buffers
+  uint32_t                            m_maxElements      = 100;  // Maximum number of stored measurements
+  uint32_t                            m_minInterval      = 100;  // Minimum interval between measurements (ms)
+};
+
+}  // namespace nvvkhl
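The header above keeps each time series in a fixed-size circular buffer, and getOffset() gives the index of the most recent sample. A minimal sketch of how a caller might unroll such a buffer into chronological order is shown here. `unrollHistory` is a hypothetical helper that is not part of nvpro_core, and it assumes the buffer is fully populated and that the slot after the offset holds the oldest sample.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of nvpro_core): returns the samples of a
// circular buffer ordered from oldest to newest, given the index of the most
// recent sample. Assumes new samples overwrite the slot after `offset`.
std::vector<uint32_t> unrollHistory(const std::vector<uint32_t>& ring, uint32_t offset)
{
  std::vector<uint32_t> ordered;
  ordered.reserve(ring.size());
  for(size_t i = 1; i <= ring.size(); i++)
  {
    // Start one slot past the newest sample and wrap around.
    ordered.push_back(ring[(offset + i) % ring.size()]);
  }
  return ordered;
}
```

With the real class, the inputs would presumably come from something like `getDevicePowerState(gpu).temperature.get()` and `getOffset()`, which is convenient when feeding a plot widget that expects oldest-to-newest data.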
--- a/raytracer/nvpro_core/nvh/nvprint.cpp
+++ b/raytracer/nvpro_core/nvh/nvprint.cpp
@ -0,0 +1,334 @@
+/*
+ * Copyright (c) 2014-2023, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#include "nvprint.hpp"
+
+#include <limits.h>
+#include <mutex>
+#include <vector>
+
+#ifdef _WIN32
+#include <io.h>
+#include <windows.h>
+#else
+#include <signal.h>
+#include <unistd.h>
+#endif
+
+enum class TriState
+{
+  eUnknown,
+  eFalse,
+  eTrue
+};
+
+static std::string         s_logFileName = "log_nvprosample.txt";
+static std::vector<char>   s_strBuffer;  // Persistent allocation for formatted text.
+static FILE*               s_fd                   = nullptr;
+static bool                s_bLogReady            = false;
+static bool                s_bPrintLogging        = true;
+static uint32_t            s_bPrintFileLogging    = LOGBITS_ALL;
+static uint32_t            s_bPrintConsoleLogging = LOGBITS_ALL;
+static uint32_t            s_bPrintBreakpoints    = 0;
+static int                 s_printLevel           = -1;  // <0 means no level prefix
+static PFN_NVPRINTCALLBACK s_printCallback        = nullptr;
+static TriState            s_consoleSupportsColor = TriState::eUnknown;
+// Lock this when modifying any static variables.
+// Because it is a recursive mutex, its owner can lock it multiple times.
+static std::recursive_mutex s_mutex;
+
+void nvprintSetLogFileName(const char* name) noexcept
+{
+  std::lock_guard<std::recursive_mutex> lockGuard(s_mutex);
+
+  if(name == NULL || s_logFileName == name)
+    return;
+
+  try
+  {
+    s_logFileName = name;
+  }
+  catch(const std::exception& e)
+  {
+    nvprintLevel(LOGLEVEL_ERROR, "nvprintSetLogFileName could not allocate space for new file name. Additional info below:\n");
+    nvprintLevel(LOGLEVEL_ERROR, e.what());
+  }
+
+  if(s_fd)
+  {
+    fclose(s_fd);
+    s_fd        = nullptr;
+    s_bLogReady = false;
+  }
+}
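+void nvprintSetCallback(PFN_NVPRINTCALLBACK callback)
+{
+  // nvprintLevel reads s_printCallback while holding s_mutex, so write it under the same lock.
+  std::lock_guard<std::recursive_mutex> lockGuard(s_mutex);
+  s_printCallback = callback;
+}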
+void nvprintSetLevel(int l)
+{
+  s_printLevel = l;
+}
+int nvprintGetLevel()
+{
+  return s_printLevel;
+}
+void nvprintSetLogging(bool b)
+{
+  s_bPrintLogging = b;
+}
+
+void nvprintSetFileLogging(bool state, uint32_t mask)
+{
+  std::lock_guard<std::recursive_mutex> lockGuard(s_mutex);
+
+  if(state)
+  {
+    s_bPrintFileLogging |= mask;
+  }
+  else
+  {
+    s_bPrintFileLogging &= ~mask;
+  }
+}
+
+void nvprintSetConsoleLogging(bool state, uint32_t mask)
+{
+  std::lock_guard<std::recursive_mutex> lockGuard(s_mutex);
+
+  if(state)
+  {
+    s_bPrintConsoleLogging |= mask;
+  }
+  else
+  {
+    s_bPrintConsoleLogging &= ~mask;
+  }
+}
+
+void nvprintSetBreakpoints(bool state, uint32_t mask)
+{
+  std::lock_guard<std::recursive_mutex> lockGuard(s_mutex);
+
+  if(state)
+  {
+    s_bPrintBreakpoints |= mask;
+  }
+  else
+  {
+    s_bPrintBreakpoints &= ~mask;
+  }
+}
+
+void nvprintfV(va_list& vlist, const char* fmt, int level) noexcept
+{
+  if(s_bPrintLogging == false)
+  {
+    return;
+  }
+
+  // Format the inputs into s_strBuffer.
+  std::lock_guard<std::recursive_mutex> lockGuard(s_mutex);
+  {
+    // Copy vlist as it may be modified by vsnprintf.
+    va_list vlistCopy;
+    va_copy(vlistCopy, vlist);
+    const int charactersNeeded = vsnprintf(s_strBuffer.data(), s_strBuffer.size(), fmt, vlistCopy);
+    va_end(vlistCopy);
+
+    // Check that:
+    // * vsnprintf did not return an error;
+    // * The string (plus null terminator) could fit in a vector.
+    if((charactersNeeded < 0) || (size_t(charactersNeeded) > s_strBuffer.max_size() - 1))
+    {
+      // Formatting error
+      nvprintLevel(LOGLEVEL_ERROR, "nvprintfV: Internal message formatting error.\n");
+      return;
+    }
+
+    // Increase the size of s_strBuffer as needed if there wasn't enough space.
+    if(size_t(charactersNeeded) >= s_strBuffer.size())
+    {
+      try
+      {
+        // Make sure to add 1, because vsnprintf doesn't count the terminating
+        // null character. This can potentially throw an exception.
+        s_strBuffer.resize(size_t(charactersNeeded) + 1, '\0');
+      }
+      catch(const std::exception& e)
+      {
+        nvprintLevel(LOGLEVEL_ERROR, "nvprintfV: Error resizing buffer to hold message. Additional info below:\n");
+        nvprintLevel(LOGLEVEL_ERROR, e.what());
+        return;
+      }
+
+      // Now format it; we know this will succeed.
+      (void)vsnprintf(s_strBuffer.data(), s_strBuffer.size(), fmt, vlist);
+    }
+  }
+
+  nvprintLevel(level, s_strBuffer.data());
+}
+
+void nvprintLevel(int level, const std::string& msg) noexcept
+{
+  nvprintLevel(level, msg.c_str());
+}
+
+void nvprintLevel(int level, const char* msg) noexcept
+{
+  std::lock_guard<std::recursive_mutex> lockGuard(s_mutex);
+
+#ifdef _WIN32
+  // Note: Maybe we could consider changing to a text encoding of UTF-8 in
+  // the future, bring in calls to Windows' MultiByteToWideChar, and call
+  // OutputDebugStringW.
+  OutputDebugStringA(msg);
+#endif
+
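+  // Mask the shift count: the default level is -1, and shifting by a negative count is undefined behavior.
+  if(s_bPrintFileLogging & (1u << (uint32_t(level) & 31u)))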
+  {
+    if(s_bLogReady == false)
+    {
+      s_fd        = fopen(s_logFileName.c_str(), "wt");
+      s_bLogReady = true;
+    }
+    if(s_fd)
+    {
+      fputs(msg, s_fd);
+    }
+  }
+
+  if(s_printCallback)
+  {
+    s_printCallback(level, msg);
+  }
+
+  if(s_bPrintConsoleLogging & (1u << (uint32_t(level) & 31u)))
+  {
+    // Determine if the output supports ANSI color sequences only once to avoid
+    // many calls to isatty.
+    if(TriState::eUnknown == s_consoleSupportsColor)
+    {
+      // Determining this perfectly is difficult; terminfo does it by storing
+      // a large table of all consoles it knows about. For now, we assume
+      // all consoles support colors, and all pipes do not.
+#ifdef _WIN32
+      bool supportsColor = _isatty(_fileno(stderr)) && _isatty(_fileno(stdout));
+      // This enables ANSI escape codes from the app side.
+      // We do this because on Windows 10, cmd.exe is a console, but only
+      // supports ANSI escape codes by default if the
+      // HKEY_CURRENT_USER\Console\VirtualTerminalLevel registry key is
+      // nonzero, which we don't want to assume.
+      // See https://github.com/nvpro-samples/vk_raytrace/issues/28.
+      // On failure, turn off colors.
+      if(supportsColor)
+      {
+        for(DWORD stdHandleIndex : {STD_OUTPUT_HANDLE, STD_ERROR_HANDLE})
+        {
+          const HANDLE consoleHandle = GetStdHandle(stdHandleIndex);
+          if(INVALID_HANDLE_VALUE == consoleHandle)
+          {
+            supportsColor = false;
+            break;
+          }
+          DWORD consoleMode = 0;
+          if(0 == GetConsoleMode(consoleHandle, &consoleMode))
+          {
+            supportsColor = false;
+            break;
+          }
+          SetConsoleMode(consoleHandle, consoleMode | ENABLE_VIRTUAL_TERMINAL_PROCESSING);
+        }
+      }
+#else
+      const bool supportsColor = isatty(fileno(stderr)) && isatty(fileno(stdout));
+#endif
+      s_consoleSupportsColor = (supportsColor ? TriState::eTrue : TriState::eFalse);
+    }
+
+    FILE* outStream = (((1u << (uint32_t(level) & 31u)) & LOGBITS_ERRORS) ? stderr : stdout);
+
+    if(TriState::eTrue == s_consoleSupportsColor)
+    {
+      // Set the foreground color depending on level:
+      // https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters
+      if(level == LOGLEVEL_OK)
+      {
+        fputs("\033[32m", outStream);  // Green
+      }
+      else if(level == LOGLEVEL_ERROR)
+      {
+        fputs("\033[31m", outStream);  // Red
+      }
+      else if(level == LOGLEVEL_WARNING)
+      {
+        fputs("\033[33m", outStream);  // Yellow
+      }
+      else if(level == LOGLEVEL_DEBUG)
+      {
+        fputs("\033[36m", outStream);  // Cyan
+      }
+    }
+
+    fputs(msg, outStream);
+
+    if(TriState::eTrue == s_consoleSupportsColor)
+    {
+      // Reset all attributes
+      fputs("\033[0m", outStream);
+    }
+  }
+
+  if(s_bPrintBreakpoints & (1u << (uint32_t(level) & 31u)))
+  {
+#ifdef _WIN32
+    DebugBreak();
+#else
+    raise(SIGTRAP);
+#endif
+  }
+}
+
+void nvprintf(
+#ifdef _MSC_VER
+    _Printf_format_string_
+#endif
+    const char* fmt,
+    ...) noexcept
+{
+  //    int r = 0;
+  va_list vlist;
+  va_start(vlist, fmt);
+  nvprintfV(vlist, fmt, s_printLevel);
+  va_end(vlist);
+}
+void nvprintfLevel(int level,
+#ifdef _MSC_VER
+                   _Printf_format_string_
+#endif
+                   const char* fmt,
+                   ...) noexcept
+{
+  va_list vlist;
+  va_start(vlist, fmt);
+  nvprintfV(vlist, fmt, level);
+  va_end(vlist);
+}
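nvprintfV above relies on vsnprintf reporting the number of characters it needs, then growing the buffer and formatting again. The same measure-then-write idea can be sketched in isolation as follows. `formatToString` is a hypothetical helper, not part of nvprint, and unlike nvprintfV it allocates a fresh buffer on every call instead of reusing a persistent one.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats printf-style arguments into a std::string,
// using one vsnprintf call to measure and a second one to write.
std::string formatToString(const char* fmt, ...)
{
  va_list args;
  va_start(args, fmt);

  // First pass on a copy of the argument list: a null buffer of size 0 only
  // computes the required length. vsnprintf may consume the list it is given.
  va_list argsCopy;
  va_copy(argsCopy, args);
  const int needed = std::vsnprintf(nullptr, 0, fmt, argsCopy);
  va_end(argsCopy);

  std::string result;
  if(needed > 0)  // A negative value signals a formatting error; return "" then.
  {
    // Second pass: +1 because the return value excludes the null terminator.
    std::vector<char> buffer(static_cast<size_t>(needed) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), fmt, args);
    result.assign(buffer.data(), static_cast<size_t>(needed));
  }
  va_end(args);
  return result;
}
```

Keeping the buffer persistent, as nvprintfV does with s_strBuffer, avoids repeated allocations but requires the lock to be held for as long as the buffer contents are in use.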
--- a/raytracer/nvpro_core/nvh/nvprint.hpp
+++ b/raytracer/nvpro_core/nvh/nvprint.hpp
@ -0,0 +1,241 @@
+/*
+ * Copyright (c) 2014-2023, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2023 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifndef __NVPRINT_H__
+#define __NVPRINT_H__
+
+#include <cstdarg>
+#include <fmt/format.h>
+#include <functional>
+#include <stdint.h>
+#include <string>
+
+/** @DOC_START
+  Multiple functions and macros that should be used for logging purposes,
+  rather than printf. These can print to multiple places at once (log file, console, debugger output, and a custom callback).
+  # Function nvprintf etc
+  
+  Configuration:
+  - nvprintSetLevel : sets default loglevel
+  - nvprintGetLevel : gets default loglevel
+  - nvprintSetLogFileName : sets log filename
+  - nvprintSetLogging : globally enables or disables nvprint output and logging
+  - nvprintSetCallback : sets custom callback
+
+  Printf-style functions and macros.
+  These take printf-style specifiers.
+  - nvprintf : prints at default loglevel
+  - nvprintfLevel : prints at a given loglevel
+  - LOGI : macro that does nvprintfLevel(LOGLEVEL_INFO)
+  - LOGW : macro that does nvprintfLevel(LOGLEVEL_WARNING)
+  - LOGE : macro that does nvprintfLevel(LOGLEVEL_ERROR)
+  - LOGE_FILELINE : macro that does nvprintfLevel(LOGLEVEL_ERROR) combined with filename/line
+  - LOGD : macro that does nvprintfLevel(LOGLEVEL_DEBUG) (only in debug builds)
+  - LOGOK : macro that does nvprintfLevel(LOGLEVEL_OK)
+  - LOGSTATS : macro that does nvprintfLevel(LOGLEVEL_STATS)
+
+  std::print-style functions and macros.
+  These take std::format-style specifiers
+  (https://en.cppreference.com/w/cpp/utility/format/formatter#Standard_format_specification).
+  - nvprintLevel : print at a certain loglevel
+  - PRINTI : macro that does nvprintLevel(LOGLEVEL_INFO)
+  - PRINTW : macro that does nvprintLevel(LOGLEVEL_WARNING)
+  - PRINTE : macro that does nvprintLevel(LOGLEVEL_ERROR)
+  - PRINTE_FILELINE : macro that does nvprintLevel(LOGLEVEL_ERROR) combined with filename/line
+  - PRINTD : macro that does nvprintLevel(LOGLEVEL_DEBUG) (only in debug builds)
+  - PRINTOK : macro that does nvprintLevel(LOGLEVEL_OK)
+  - PRINTSTATS : macro that does nvprintLevel(LOGLEVEL_STATS)
+
+  Safety:
+  On error, all functions print an error message.
+  All functions are thread-safe.
+  Printf-style functions have annotations that should produce warnings at
+  compile-time or when performing static analysis. Their format strings may be
+  dynamic - but this can be bad if an adversary can choose the content of the
+  format string.
+  std::print-style functions are safer: they produce compile-time errors for invalid format strings, and
+  their format strings must be compile-time constants. Dynamic formatting
+  should be performed outside of printing, like this:
+  ```cpp
+  ImGui::InputText("Enter a format string: ", userFormat, sizeof(userFormat));
+  try
+  {
+    // Print inside the try block so that `formatted` is still in scope.
+    std::string formatted = fmt::vformat(userFormat, ...);
+    PRINTI("{}", formatted);
+  }
+  catch (const std::exception& e)
+  {
+    (error handling...)
+  }
+  ```
+
+  Text encoding:
+  Printing to the Windows debug console is the only operation that assumes a
+  text encoding, which is ANSI. In all other cases, strings are copied into
+  the output.
+@DOC_END */
+
+
+// trick for pragma message so we can write:
+// #pragma message(__FILE__"("S__LINE__"): blah")
+#define S__(x) #x
+#define S_(x) S__(x)
+#define S__LINE__ S_(__LINE__)
+
+#ifndef LOGLEVEL_INFO
+#define LOGLEVEL_INFO 0
+#define LOGLEVEL_WARNING 1
+#define LOGLEVEL_ERROR 2
+#define LOGLEVEL_DEBUG 3
+#define LOGLEVEL_STATS 4
+#define LOGLEVEL_OK 7
+#define LOGBIT_INFO (1 << LOGLEVEL_INFO)
+#define LOGBIT_WARNING (1 << LOGLEVEL_WARNING)
+#define LOGBIT_ERROR (1 << LOGLEVEL_ERROR)
+#define LOGBIT_DEBUG (1 << LOGLEVEL_DEBUG)
+#define LOGBIT_STATS (1 << LOGLEVEL_STATS)
+#define LOGBIT_OK (1 << LOGLEVEL_OK)
+#define LOGBITS_ERRORS LOGBIT_ERROR
+#define LOGBITS_WARNINGS (LOGBITS_ERRORS | LOGBIT_WARNING)
+#define LOGBITS_INFO (LOGBITS_WARNINGS | LOGBIT_INFO)
+#define LOGBITS_DEBUG (LOGBITS_INFO | LOGBIT_DEBUG)
+#define LOGBITS_STATS (LOGBITS_DEBUG | LOGBIT_STATS)
+#define LOGBITS_OK (LOGBITS_WARNINGS | LOGBIT_OK)
+#define LOGBITS_ALL 0xffffffffu
+#endif
+
+// Set/get the default level for calls to nvprintf(). Use LOGLEVEL_*.
+void nvprintSetLevel(int l);
+int  nvprintGetLevel();
+
+void nvprintSetLogFileName(const char* name) noexcept;
+
+// Globally enable/disable all nvprint output and logging
+void nvprintSetLogging(bool b);
+
+// Update the bitmask of which levels receive file and console output, or
+// trigger breakpoints. `state` controls whether to enable or disable the bits
+// in `mask`. Use LOGBITS_*.
+void nvprintSetFileLogging(bool state, uint32_t mask = ~0);
+void nvprintSetConsoleLogging(bool state, uint32_t mask = ~0);
+void nvprintSetBreakpoints(bool state, uint32_t mask = LOGBITS_ERRORS);
+
+// Set a custom print handler. Called in addition to file and console logging.
+using PFN_NVPRINTCALLBACK = std::function<void(int level, const char* msg)>;
+void nvprintSetCallback(PFN_NVPRINTCALLBACK callback);
+
+// Printf-style macros and functions.
+#define LOGI(...)                                                                                                      \
+  {                                                                                                                    \
+    nvprintfLevel(LOGLEVEL_INFO, __VA_ARGS__);                                                                         \
+  }
+#define LOGW(...)                                                                                                      \
+  {                                                                                                                    \
+    nvprintfLevel(LOGLEVEL_WARNING, __VA_ARGS__);                                                                      \
+  }
+#define LOGE(...)                                                                                                      \
+  {                                                                                                                    \
+    nvprintfLevel(LOGLEVEL_ERROR, __VA_ARGS__);                                                                        \
+  }
+#define LOGE_FILELINE(...)                                                                                             \
+  {                                                                                                                    \
+    nvprintfLevel(LOGLEVEL_ERROR, __FILE__ "(" S__LINE__ "): **ERROR**:\n" __VA_ARGS__);                               \
+  }
+#ifndef NDEBUG
+#define LOGD(...)                                                                                                      \
+  {                                                                                                                    \
+    nvprintfLevel(LOGLEVEL_DEBUG, __FILE__ "(" S__LINE__ "): Debug Info:\n" __VA_ARGS__);                              \
+  }
+#else
+#define LOGD(...)
+#endif
+#define LOGOK(...)                                                                                                     \
+  {                                                                                                                    \
+    nvprintfLevel(LOGLEVEL_OK, __VA_ARGS__);                                                                           \
+  }
+#define LOGSTATS(...)                                                                                                  \
+  {                                                                                                                    \
+    nvprintfLevel(LOGLEVEL_STATS, __VA_ARGS__);                                                                        \
+  }
+
+void nvprintf(
+#ifdef _MSC_VER
+    _Printf_format_string_
+#endif
+    const char* fmt,
+    ...) noexcept
+#if defined(__GNUC__) || defined(__clang__)
+    __attribute__((format(printf, 1, 2)))
+#endif
+;
+
+void nvprintfLevel(int level,
+#ifdef _MSC_VER
+                   _Printf_format_string_
+#endif
+                   const char* fmt,
+                   ...) noexcept
+#if defined(__GNUC__) || defined(__clang__)
+    __attribute__((format(printf, 2, 3)))
+#endif
+;
+
+// std::print-style macros and functions.
+// Use fmt::format's built-in checking if the compiler supports consteval,
+// which cleans up how the macros appear in Intellisense. Otherwise, use
+// FMT_STRING; this will be messier. In either case, the last line of the
+// compiler error will point to the line with the incorrect print specifier.
+#ifdef FMT_HAS_CONSTEVAL
+#define PRINT_CHECK_FMT
+#else
+#define PRINT_CHECK_FMT FMT_STRING
+#endif
+// This macro catches exceptions from fmt::format. This gives us compile-time
+// checking, while still making these functions have the same noexcept
+// semantics as nvprintf.
+#define PRINT_CATCH(lvl, fmtstr, ...)                                                                                  \
+  {                                                                                                                    \
+    try                                                                                                                \
+    {                                                                                                                  \
+      nvprintLevel(lvl, fmt::format(PRINT_CHECK_FMT(fmtstr), __VA_ARGS__));                                            \
+    }                                                                                                                  \
+    catch(const std::exception&)                                                                                       \
+    {                                                                                                                  \
+      nvprintLevel(LOGLEVEL_ERROR, "PRINT_CATCH: Could not format string.\n");                                         \
+    }                                                                                                                  \
+  }
+#define PRINTI(fmtstr, ...) PRINT_CATCH(LOGLEVEL_INFO, fmtstr, __VA_ARGS__)
+#define PRINTW(fmtstr, ...) PRINT_CATCH(LOGLEVEL_WARNING, fmtstr, __VA_ARGS__)
+#define PRINTE(fmtstr, ...) PRINT_CATCH(LOGLEVEL_ERROR, fmtstr, __VA_ARGS__)
+#define PRINTE_FILELINE(fmtstr, ...)                                                                                   \
+  PRINT_CATCH(LOGLEVEL_ERROR, __FILE__ "(" S__LINE__ "): **ERROR**:\n" fmtstr, __VA_ARGS__)
+#ifndef NDEBUG
+#define PRINTD(fmtstr, ...) PRINT_CATCH(LOGLEVEL_DEBUG, __FILE__ "(" S__LINE__ "): Debug Info:\n" fmtstr, __VA_ARGS__)
+#else
+#define PRINTD(...)
+#endif
+#define PRINTOK(fmtstr, ...) PRINT_CATCH(LOGLEVEL_OK, fmtstr, __VA_ARGS__)
+#define PRINTSTATS(fmtstr, ...) PRINT_CATCH(LOGLEVEL_STATS, fmtstr, __VA_ARGS__)
+
+// Directly prints a message at the given level, without formatting.
+void nvprintLevel(int level, const std::string& msg) noexcept;
+void nvprintLevel(int level, const char* msg) noexcept;
+
+#endif
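The LOGLEVEL_* and LOGBIT_* definitions above encode each level as one bit, and the nvprintSet*Logging functions set or clear groups of those bits. The following is a small self-contained sketch of that filtering logic. The `k*` constants, `updateMask`, and `levelEnabled` are illustrative stand-ins rather than nvprint symbols, and rejecting levels outside [0, 31] is an assumption of this sketch, not documented nvprint behavior.

```cpp
#include <cstdint>

// Local stand-ins mirroring the LOGLEVEL_* values in nvprint.hpp.
constexpr int      kLevelInfo    = 0;
constexpr int      kLevelWarning = 1;
constexpr int      kLevelError   = 2;
constexpr uint32_t kBitsAll      = 0xffffffffu;
// Same composition as LOGBITS_WARNINGS: errors plus warnings.
constexpr uint32_t kBitsWarnings = (1u << kLevelError) | (1u << kLevelWarning);

// Sets (state == true) or clears (state == false) the bits of `mask` in
// `current` and returns the new bitmask.
constexpr uint32_t updateMask(uint32_t current, bool state, uint32_t mask)
{
  return state ? (current | mask) : (current & ~mask);
}

// True if messages at `level` pass the filter. Levels outside [0, 31] are
// rejected so that the shift is always well-defined.
constexpr bool levelEnabled(uint32_t enabledBits, int level)
{
  return level >= 0 && level < 32 && (enabledBits & (1u << level)) != 0;
}
```

With the real API, restricting console output to warnings and errors would correspond to `nvprintSetConsoleLogging(false, LOGBITS_ALL)` followed by `nvprintSetConsoleLogging(true, LOGBITS_WARNINGS)`.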
--- a/raytracer/nvpro_core/nvh/parallel_work.hpp
+++ b/raytracer/nvpro_core/nvh/parallel_work.hpp
@ -0,0 +1,152 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#pragma once
+
+#include <algorithm>
+#include <atomic>
+#include <cstdint>
+#include <functional>
+#include <thread>
+#include <vector>
+
+namespace nvh {
+/* @DOC_START
+Distributes a loop over numItems items across multiple threads, in batches of BATCHSIZE items.
+Runs serially on the calling thread if numThreads <= 1, numItems < numThreads, or numItems < BATCHSIZE.
+
+parallel_batches: fn (uint64_t itemIndex) or fn (uint64_t itemIndex, uint32_t threadIndex)
+                  callback processes a single item
+parallel_ranges:  fn (uint64_t itemBegin, uint64_t itemEnd, uint32_t threadIndex)
+                  callback does loop `for (uint64_t itemIndex = itemBegin; itemIndex < itemEnd; itemIndex++)`
+
+@DOC_END */
+
+template <uint64_t BATCHSIZE = 128>
+inline void parallel_batches(uint64_t numItems, std::function<void(uint64_t)> fn, uint32_t numThreads)
+{
+  if(numThreads <= 1 || numItems < numThreads || numItems < BATCHSIZE)
+  {
+    for(uint64_t idx = 0; idx < numItems; idx++)
+    {
+      fn(idx);
+    }
+  }
+  else
+  {
+    std::atomic_uint64_t counter = 0;
+
+    auto worker = [&]() {
+      uint64_t idx;
+      while((idx = counter.fetch_add(BATCHSIZE)) < numItems)
+      {
+        uint64_t last = std::min(numItems, idx + BATCHSIZE);
+        for(uint64_t i = idx; i < last; i++)
+        {
+          fn(i);
+        }
+      }
+    };
+
+    std::vector<std::thread> threads(numThreads);
+    for(uint32_t i = 0; i < numThreads; i++)
+    {
+      threads[i] = std::thread(worker);
+    }
+
+    for(uint32_t i = 0; i < numThreads; i++)
+    {
+      threads[i].join();
+    }
+  }
+}
+
+template <uint64_t BATCHSIZE = 128>
+inline void parallel_batches(uint64_t numItems, std::function<void(uint64_t, uint32_t threadIdx)> fn, uint32_t numThreads)
+{
+  if(numThreads <= 1 || numItems < numThreads || numItems < BATCHSIZE)
+  {
+    for(uint64_t idx = 0; idx < numItems; idx++)
+    {
+      fn(idx, 0);
+    }
+  }
+  else
+  {
+    std::atomic_uint64_t counter = 0;
+
+    auto worker = [&](uint32_t threadIdx) {
+      uint64_t idx;
+      while((idx = counter.fetch_add(BATCHSIZE)) < numItems)
+      {
+        uint64_t last = std::min(numItems, idx + BATCHSIZE);
+        for(uint64_t i = idx; i < last; i++)
+        {
+          fn(i, threadIdx);
+        }
+      }
+    };
+
+    std::vector<std::thread> threads(numThreads);
+    for(uint32_t i = 0; i < numThreads; i++)
+    {
+      threads[i] = std::thread(worker, i);
+    }
+
+    for(uint32_t i = 0; i < numThreads; i++)
+    {
+      threads[i].join();
+    }
+  }
+}
+
+template <uint64_t BATCHSIZE = 128>
+inline void parallel_ranges(uint64_t numItems, std::function<void(uint64_t idxBegin, uint64_t idxEnd, uint32_t threadIdx)> fn, uint32_t numThreads)
+{
+  if(numThreads <= 1 || numItems < numThreads || numItems < BATCHSIZE)
+  {
+    fn(0, numItems, 0);
+  }
+  else
+  {
+    std::atomic_uint64_t counter = 0;
+
+    auto worker = [&](uint32_t threadIdx) {
+      uint64_t idx;
+      while((idx = counter.fetch_add(BATCHSIZE)) < numItems)
+      {
+        uint64_t last = std::min(numItems, idx + BATCHSIZE);
+        fn(idx, last, threadIdx);
+      }
+    };
+
+    std::vector<std::thread> threads(numThreads);
+    for(uint32_t i = 0; i < numThreads; i++)
+    {
+      threads[i] = std::thread(worker, i);
+    }
+
+    for(uint32_t i = 0; i < numThreads; i++)
+    {
+      threads[i].join();
+    }
+  }
+}
+}  // namespace nvh
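The batch-claiming loop shared by `parallel_batches` and `parallel_ranges` above can be sketched in isolation. The following is a minimal standalone sketch, not the nvh implementation: `runInBatches` and `squares` are hypothetical names, and it assumes `batchSize > 0` and `numThreads >= 1`. Each worker claims the next batch through `fetch_add`, which returns the previous counter value, so no two threads ever receive the same start index.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of nvh): calls fn(i) once for every i in [0, numItems).
// Worker threads claim fixed-size batches from a shared atomic counter.
inline void runInBatches(uint64_t numItems, uint64_t batchSize, uint32_t numThreads,
                         const std::function<void(uint64_t)>& fn)
{
  std::atomic<uint64_t> next{0};

  auto worker = [&]() {
    uint64_t begin;
    // fetch_add returns the value before the addition, so every batch start is handed out exactly once
    while((begin = next.fetch_add(batchSize)) < numItems)
    {
      uint64_t end = std::min(numItems, begin + batchSize);  // the last batch may be shorter
      for(uint64_t i = begin; i < end; i++)
      {
        fn(i);
      }
    }
  };

  std::vector<std::thread> threads;
  threads.reserve(numThreads);
  for(uint32_t t = 0; t < numThreads; t++)
  {
    threads.emplace_back(worker);
  }
  for(auto& t : threads)
  {
    t.join();
  }
}

// Example workload: every index is written by exactly one thread, so no locking is needed.
inline std::vector<uint64_t> squares(uint64_t count, uint32_t numThreads)
{
  std::vector<uint64_t> out(count, 0);
  runInBatches(count, 128, numThreads, [&](uint64_t i) { out[i] = i * i; });
  return out;
}
```

Calling `squares(1000, 4)` fills the vector from four threads. Handing out batches dynamically, rather than splitting the range into `numThreads` fixed slices, keeps threads busy when some items take longer than others.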
--- a/raytracer/nvpro_core/nvh/parametertools.cpp
+++ b/raytracer/nvpro_core/nvh/parametertools.cpp
@ -0,0 +1,500 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#include "parametertools.hpp"
+#include "nvprint.hpp"
+
+#include <algorithm>
+#include <cstdlib>  // atof, atoi
+#include <cstring>  // strcmp
+
+namespace nvh {
+
+ParameterList::ParameterList()
+{
+  Callback helpCallback = [&](uint32_t) { print(); };
+
+  setHelp(add("help", helpCallback), "Print help");
+  setHelp(add("h", helpCallback), "Print help");
+}
+
+void ParameterList::tokenizeString(std::string& content, std::vector<const char*>& args)
+{
+  bool wasSpace  = true;
+  bool inQuotes  = false;
+  bool inComment = false;
+  bool wasQuote  = false;
+  bool wasEscape = false;
+
+  for(size_t i = 0; i < content.size(); i++)
+  {
+    char* token     = &content[i];
+    char  current   = content[i];
+    bool  isEndline = current == '\n';
+    bool  isSpace   = (current == ' ' || current == '\t' || current == '\n');
+    bool  isQuote   = current == '"';
+    bool  isComment = current == '#';
+    bool  isEscape  = current == '\\';
+
+    if(isEndline && inComment)
+    {
+      inComment = false;
+    }
+    if(isComment && !inQuotes)
+    {
+      content[i] = 0;
+      inComment  = true;
+    }
+
+    if(inComment)
+      continue;
+
+    if(inQuotes)
+    {
+      if(wasEscape && (current == 'n' || current == 't'))
+      {
+        content[i]     = current == 'n' ? '\n' : '\t';
+        content[i - 1] = ' ';
+      }
+    }
+
+    if(isQuote)
+    {
+      inQuotes = !inQuotes;
+      // treat as space
+      content[i] = 0;
+      isSpace    = true;
+    }
+    else if(isSpace)
+    {
+      // turn space to a terminator
+      if(!inQuotes)
+      {
+        content[i] = 0;
+      }
+    }
+    else if(wasSpace && (!inQuotes || wasQuote))
+    {
+      // start a new arg unless comment
+      args.push_back(token);
+    }
+
+    wasSpace  = isSpace;
+    wasQuote  = isQuote;
+    wasEscape = isEscape;
+  }
+}
+
+ParameterList::Parameter::Parameter(Type atype, const char* aname, Callback acallback, void* adestination, uint32_t areadLength, uint32_t awriteLength)
+{
+  type            = atype;
+  callback        = acallback;
+  readLength      = areadLength;
+  writeLength     = awriteLength;
+  destination.ptr = adestination;
+
+  // Set name and if specified, helptext
+  // Split at delimiter '|'
+  std::string sname        = std::string(aname);
+  size_t      delimiterPos = sname.find_first_of('|');
+  if(delimiterPos != std::string::npos)
+  {
+    name     = sname.substr(0, delimiterPos);
+    helptext = sname.substr(delimiterPos + 1);
+  }
+  else
+  {
+    name     = sname;
+    helptext = "";
+  }
+}
+
+uint32_t ParameterList::append(const ParameterList& list)
+{
+  uint32_t index = uint32_t(m_parameters.size());
+  m_parameters.insert(m_parameters.end(), list.m_parameters.begin(), list.m_parameters.end());
+  return index;
+}
+
+uint32_t ParameterList::add(const char* name, float* destination, Callback callback, uint32_t length /*= 1*/, float min /*= -FLT_MAX*/, float max /*= FLT_MAX*/)
+{
+  uint32_t  index = uint32_t(m_parameters.size());
+  Parameter param(TYPE_FLOAT, name, callback, destination, length, length);
+  param.minmax[0].f32 = min;
+  param.minmax[1].f32 = max;
+  m_parameters.push_back(param);
+  return index;
+}
+
+uint32_t ParameterList::add(const char* name, int32_t* destination, Callback callback, uint32_t length /*= 1*/, int32_t min /*= -INT_MAX*/, int32_t max /*= +INT_MAX*/)
+{
+  uint32_t  index = uint32_t(m_parameters.size());
+  Parameter param(TYPE_INT, name, callback, destination, length, length);
+  param.minmax[0].s32 = min;
+  param.minmax[1].s32 = max;
+  m_parameters.push_back(param);
+  return index;
+}
+
+uint32_t ParameterList::add(const char* name, uint32_t* destination, Callback callback, uint32_t length /*= 1*/, uint32_t min /*= 0*/, uint32_t max /*= 0xFFFFFFFF*/)
+{
+  uint32_t  index = uint32_t(m_parameters.size());
+  Parameter param(TYPE_UINT, name, callback, destination, length, length);
+  param.minmax[0].u32 = min;
+  param.minmax[1].u32 = max;
+  m_parameters.push_back(param);
+  return index;
+}
+
+uint32_t ParameterList::add(const char* name, bool* destination, Callback callback, uint32_t length /*= 1*/)
+{
+  uint32_t  index = uint32_t(m_parameters.size());
+  Parameter param(TYPE_BOOL, name, callback, destination, length, length);
+  m_parameters.push_back(param);
+  return index;
+}
+
+uint32_t ParameterList::add(const char* name, bool* destination, bool value, Callback callback /*= nullptr*/, uint32_t length /*= 1*/)
+{
+  uint32_t  index = uint32_t(m_parameters.size());
+  Parameter param(TYPE_BOOL_VALUE, name, callback, destination, 0, length);
+  param.minmax[0].b = value;
+  m_parameters.push_back(param);
+  return index;
+}
+
+uint32_t ParameterList::add(const char* name, std::string* destination, Callback callback, uint32_t length)
+{
+  uint32_t  index = uint32_t(m_parameters.size());
+  Parameter param(TYPE_STRING, name, callback, destination, length, length);
+  m_parameters.push_back(param);
+  return index;
+}
+
+uint32_t ParameterList::add(const char* name, Callback callback, uint32_t length)
+{
+  uint32_t  index = uint32_t(m_parameters.size());
+  Parameter param(TYPE_TRIGGER, name, callback, nullptr, length, length);
+  m_parameters.push_back(param);
+  return index;
+}
+
+uint32_t ParameterList::addFilename(const char* name, std::string* destination, Callback callback)
+{
+  uint32_t index = uint32_t(m_parameters.size());
+  // special case: a name starting with "." is matched against the file ending of an argument
+  Parameter param(TYPE_FILENAME, name, callback, destination, name[0] == '.' ? 0 : 1, 1);
+  m_parameters.push_back(param);
+  return index;
+}
+
+uint32_t ParameterList::setHelp(uint32_t parameterIndex, const char* helptext)
+{
+  this->m_parameters[parameterIndex].helptext = helptext;
+  return parameterIndex;
+}
+
+static bool endsWith(std::string const& s, std::string const& end)
+{
+  if(s.length() >= end.length())
+  {
+    return (0 == s.compare(s.length() - end.length(), end.length(), end));
+  }
+  else
+  {
+    return false;
+  }
+}
+
+bool ParameterList::applyParameters(uint32_t     argc,
+                                    const char** argv,
+                                    uint32_t&    a,
+                                    const char*  paramPrefix /*= nullptr*/,
+                                    const char*  defaultFilePath /*= nullptr*/) const
+{
+  std::string prefixStr(paramPrefix ? paramPrefix : "");
+  std::string defaultPathStr(defaultFilePath ? defaultFilePath : "");
+
+  for(uint32_t p = 0; p < uint32_t(m_parameters.size()); p++)
+  {
+    const Parameter& param            = m_parameters[p];
+    std::string      combined         = prefixStr + param.name;
+    bool             searchFileEnding = (param.type == TYPE_FILENAME) && (param.readLength == 0);
+    bool matched = searchFileEnding ? endsWith(argv[a], param.name) : (strcmp(argv[a], combined.c_str()) == 0);
+
+    if(matched && a + param.readLength < argc)
+    {
+      switch(param.type)
+      {
+        case TYPE_FLOAT: {
+          for(uint32_t i = 0; i < param.writeLength; i++)
+          {
+            param.destination.f32[i] =
+                std::min(std::max(float(atof(argv[a + i + 1])), param.minmax[0].f32), param.minmax[1].f32);
+          }
+        }
+        break;
+        case TYPE_UINT: {
+          for(uint32_t i = 0; i < param.writeLength; i++)
+          {
+            param.destination.u32[i] =
+                std::min(std::max(uint32_t(atoi(argv[a + i + 1])), param.minmax[0].u32), param.minmax[1].u32);
+          }
+        }
+        break;
+        case TYPE_INT: {
+          for(uint32_t i = 0; i < param.writeLength; i++)
+          {
+            param.destination.s32[i] =
+                std::min(std::max(int32_t(atoi(argv[a + i + 1])), param.minmax[0].s32), param.minmax[1].s32);
+          }
+        }
+        break;
+        case TYPE_BOOL: {
+          for(uint32_t i = 0; i < param.writeLength; i++)
+          {
+            param.destination.b[i] = atoi(argv[a + i + 1]) != 0;
+          }
+        }
+        break;
+        case TYPE_BOOL_VALUE: {
+          for(uint32_t i = 0; i < param.writeLength; i++)
+          {
+            param.destination.b[i] = param.minmax[0].b;
+          }
+        }
+        break;
+        case TYPE_STRING: {
+          for(uint32_t i = 0; i < param.writeLength; i++)
+          {
+            param.destination.str[i] = std::string(argv[a + i + 1]);
+          }
+        }
+        break;
+        case TYPE_FILENAME: {
+          std::string filename(argv[a + param.readLength]);
+
+          if(
+#ifdef _WIN32
+              filename.find(':') != std::string::npos
+#else
+              !filename.empty() && filename[0] == '/'
+#endif
+          )
+          {
+            param.destination.str[0] = filename;
+          }
+          else
+          {
+            param.destination.str[0] = defaultPathStr + "/" + filename;
+          }
+        }
+        break;
+        case TYPE_TRIGGER: {
+        }
+        break;
+      }
+
+      if(param.callback)
+      {
+        param.callback(p);
+      }
+
+      if(searchFileEnding)
+      {
+        LOGI("  %s \"%s\"\n", param.name.c_str(), argv[a]);
+      }
+      else
+      {
+        LOGI(" ");
+        for(uint32_t i = 0; i < param.readLength + 1; i++)
+        {
+          bool isString = i > 0 && (param.type == TYPE_FILENAME || param.type == TYPE_STRING);
+          if(isString)
+          {
+            LOGI(" \"%s\"", argv[a + i]);
+          }
+          else
+          {
+            LOGI(" %s", argv[a + i]);
+          }
+        }
+        LOGI("\n");
+      }
+
+      a += param.readLength;
+      return true;
+    }
+  }
+  return false;
+}
+
+const char* ParameterList::toString(Type typ)
+{
+  switch(typ)
+  {
+    case TYPE_FLOAT:
+      return "float   ";
+    case TYPE_INT:
+      return "int     ";
+    case TYPE_UINT:
+      return "uint    ";
+    case TYPE_BOOL:
+      return "bool    ";
+    case TYPE_BOOL_VALUE:
+      return "value   ";
+    case TYPE_STRING:
+      return "string  ";
+    case TYPE_FILENAME:
+      return "filename";
+    case TYPE_TRIGGER:
+      return "trigger ";
+  }
+
+  return "unknown";
+}
+
+void ParameterList::print() const
+{
+  // Get maximum parameter name length
+  uint32_t maxParamNameLength = 0;
+  for(const auto& it : m_parameters)
+  {
+    maxParamNameLength = std::max(uint32_t(it.name.size()), maxParamNameLength);
+  }
+
+  // Print header
+  LOGI("parameterlist:\n");
+  LOGI(" type [args] %-*s   helptext\n", maxParamNameLength, "name");  // Pad name column with blanks
+
+  // Print underline. Format: -----
+  LOGI(" ");
+  for(uint32_t i = 0; i < maxParamNameLength + 23; i++)
+  {
+    LOGI("-");
+  }
+  LOGI("\n");
+
+  // Print command line arguments
+  for(const auto& it : m_parameters)
+  {
+
+    // Log param type, [args], name
+    LOGI(" %s[%d] %-*s", toString(it.type), it.readLength, maxParamNameLength, it.name.c_str());
+    if(it.helptext != "")
+    {
+      // Log helptext
+      LOGI(" - %s", it.helptext.c_str());
+    }
+    // Newline
+    LOGI("\n");
+  }
+  LOGI("\n");
+}
+
+uint32_t ParameterList::applyTokens(uint32_t argc, const char** argv, const char* prefix, const char* defaultPath) const
+{
+  uint32_t found = 0;
+
+  for(uint32_t a = 0; a < argc; a++)
+  {
+    if(applyParameters(argc, argv, a, prefix, defaultPath))
+    {
+      found++;
+    }
+    else
+    {
+      LOGI("  unhandled argument: %s\n", argv[a]);
+    }
+  }
+
+  return found;
+}
+
+bool ParameterSequence::advanceIteration(const char* separator, uint32_t separatorArgLength, uint32_t& argBegin, uint32_t& argCount)
+{
+  if(!m_list || m_index >= m_tokens.size())
+    return true;
+
+  size_t begin = m_index;
+  size_t end   = begin;
+
+  m_separator = ~0;
+
+  for(size_t i = m_index; i < m_tokens.size(); i++)
+  {
+    if(strcmp(m_tokens[i], separator) == 0 && i + separatorArgLength < m_tokens.size())
+    {
+      end         = i - 1;
+      m_separator = i;
+      m_index     = i + separatorArgLength + 1;
+      break;
+    }
+    end = i;
+  }
+
+  if(m_separator == ~0)
+    return true;
+
+  uint32_t count = uint32_t(1 + end - begin);
+  if(count)
+  {
+    argCount = count;
+    argBegin = uint32_t(begin);
+
+    m_iteration++;
+    return false;
+  }
+  else
+  {
+    return true;
+  }
+}
+
+bool ParameterSequence::applyIteration(const char* separator,
+                                       uint32_t    separatorLength,
+                                       const char* paramPrefix /*= nullptr*/,
+                                       const char* defaultFilePath /*= nullptr*/)
+{
+  uint32_t argBegin;
+  uint32_t argCount;
+  if(!advanceIteration(separator, separatorLength, argBegin, argCount))
+  {
+    m_list->applyTokens(argCount, (const char**)&m_tokens[argBegin], paramPrefix, defaultFilePath);
+    return false;
+  }
+  else
+  {
+    // check if there are any parameters left
+    if(m_index < m_tokens.size())
+    {
+      uint32_t argBegin = uint32_t(m_index);
+      uint32_t argCount = uint32_t(m_tokens.size() - m_index);
+      m_list->applyTokens(argCount, (const char**)&m_tokens[argBegin], paramPrefix, defaultFilePath);
+    }
+    return true;
+  }
+}
+
+void ParameterSequence::resetIteration()
+{
+  m_index     = 0;
+  m_iteration = 0;
+}
+
+}  // namespace nvh
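The rules that `ParameterList::tokenizeString` applies (whitespace separates tokens, `"..."` groups text into one token, `#` starts a line comment) can be illustrated with a simplified standalone sketch. This is an assumption-labelled variant, not the library code: `tokenizeConfig` is a hypothetical name, it copies tokens into `std::string` objects instead of writing terminators into the input buffer, and it does not handle backslash escapes.

```cpp
#include <string>
#include <vector>

// Hypothetical simplified tokenizer (not part of nvh).
// Splits on space, tab and newline, keeps quoted text together, and skips '#' line comments.
inline std::vector<std::string> tokenizeConfig(const std::string& text)
{
  std::vector<std::string> tokens;
  std::string              current;
  bool                     inQuotes  = false;
  bool                     inComment = false;

  // Stores the pending token, if any, and starts a new one
  auto flush = [&]() {
    if(!current.empty())
    {
      tokens.push_back(current);
    }
    current.clear();
  };

  for(char c : text)
  {
    if(inComment)
    {
      if(c == '\n')
      {
        inComment = false;  // a line comment ends at the newline
      }
      continue;
    }

    if(c == '"')
    {
      flush();  // a quote always ends the pending token
      inQuotes = !inQuotes;
    }
    else if(inQuotes)
    {
      current += c;  // whitespace and '#' are kept verbatim inside quotes
    }
    else if(c == '#')
    {
      flush();
      inComment = true;
    }
    else if(c == ' ' || c == '\t' || c == '\n')
    {
      flush();
    }
    else
    {
      current += c;
    }
  }
  flush();

  return tokens;
}
```

The library version avoids these copies by patching the input string in place and returning `const char*` pointers into it, which is why the caller has to keep that string alive for as long as the tokens are used.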
--- a/raytracer/nvpro_core/nvh/parametertools.hpp
+++ b/raytracer/nvpro_core/nvh/parametertools.hpp
@ -0,0 +1,231 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifndef __NVPARMETERTOOLS_H__
+#define __NVPARMETERTOOLS_H__
+
+
+#include "platform.h"
+
+#include <cfloat>   // FLT_MAX
+#include <climits>  // INT_MAX
+#include <cstdint>
+#include <functional>
+#include <string>
+#include <vector>
+
+namespace nvh {
+
+//////////////////////////////////////////////////////////////////////////
+/** @DOC_START
+    # class nvh::ParameterList
+
+    The nvh::ParameterList helps parse command-line arguments, whether they are
+    passed directly or stored in ASCII config files.
+    
+    Parameters always update the values they point to, and optionally
+    can trigger a callback that can be provided per-parameter.
+    
+    ```cpp
+    ParameterList list;
+    std::string   modelFilename;
+    float         modelScale;
+    
+    list.addFilename(".gltf|model filename", &modelFilename);
+    list.add("scale|model scale", &modelScale);
+    
+    const char* args[] = {"blah.gltf", "-scale", "4"};
+    list.applyTokens(3, args, "-", "/assets/");
+    ```
+
+    Use in combination with the ParameterSequence class to iterate
+    sequences of parameter changes for benchmarking/automation.
+  @DOC_END */
+
+class ParameterList
+{
+public:
+  typedef std::function<void(uint32_t)> Callback;
+
+  enum Type
+  {
+    TYPE_FLOAT,
+    TYPE_INT,
+    TYPE_UINT,
+    TYPE_BOOL,
+    TYPE_BOOL_VALUE,
+    TYPE_STRING,
+    TYPE_FILENAME,
+    TYPE_TRIGGER,
+  };
+
+  ParameterList();
+
+  uint32_t append(const ParameterList& list);
+
+  // Add a parameter. The name can be given in format: name[|help text], for example: "winsize|Set window size"
+  uint32_t add(const char* name, float* destination, Callback callback = nullptr, uint32_t length = 1, float min = -FLT_MAX, float max = FLT_MAX);
+  uint32_t add(const char* name, int32_t* destination, Callback callback = nullptr, uint32_t length = 1, int32_t min = -INT_MAX, int32_t max = +INT_MAX);
+  uint32_t add(const char* name, uint32_t* destination, Callback callback = nullptr, uint32_t length = 1, uint32_t min = 0, uint32_t max = 0xFFFFFFFF);
+  uint32_t add(const char* name, bool* destination, Callback callback = nullptr, uint32_t length = 1);
+  uint32_t add(const char* name, bool* destination, bool value, Callback callback = nullptr, uint32_t length = 1);
+  uint32_t add(const char* name, std::string* destination, Callback callback = nullptr, uint32_t length = 1);
+  uint32_t add(const char* name, Callback callback, uint32_t length = 0);
+
+  // if the parameter "name" starts with "." then we test the variable against this file-ending rather than
+  // treating name as commandline option.  So an argument that ends with ".blah" will trigger this parameter
+  uint32_t addFilename(const char* name, std::string* destination, Callback callback = nullptr);
+
+
+  // Set help of a parameter, returns the parameterIndex
+  uint32_t setHelp(uint32_t parameterIndex, const char* helptext);
+
+  // returns number of tokens found
+  // paramPrefix is typically "-"
+  // relative filenames get the defaultFilePath prepended
+  uint32_t applyTokens(uint32_t argCount, const char** argv, const char* paramPrefix = nullptr, const char* defaultFilePath = nullptr) const;
+  // tests only single argument, increases arg by appropriate length on success (returns true)
+  bool applyParameters(uint32_t argCount, const char** argv, uint32_t& arg, const char* paramPrefix = nullptr, const char* defaultFilePath = nullptr) const;
+
+  // prints all registered parameters and optional help strings
+  void print() const;
+
+  // separators are all space (tab, newline etc.) characters
+  // text within "" is kept as one token, \n and \t escapes inside quotes are converted, # starts a line comment
+  // modifies content string by setting 0 at separators
+  static void        tokenizeString(std::string& content, std::vector<const char*>& args);
+  static const char* toString(Type typ);
+
+private:
+  struct Parameter
+  {
+    Type        type = TYPE_FLOAT;
+    std::string name;
+    uint32_t    readLength  = 0;
+    uint32_t    writeLength = 0;
+    union
+    {
+      uint32_t u32;
+      int32_t  s32;
+      float    f32;
+      bool     b;
+    } minmax[2] = {0, 0};
+    union
+    {
+      uint32_t*    u32;
+      int32_t*     s32;
+      float*       f32;
+      bool*        b;
+      std::string* str;
+      void*        ptr;
+    } destination        = {nullptr};
+    Callback    callback = nullptr;
+    std::string helptext;
+
+    Parameter() {}
+    Parameter(Type type, const char* name, Callback callback, void* destination, uint32_t readLength, uint32_t writeLength);
+  };
+
+  std::vector<Parameter> m_parameters;
+
+  Parameter makeParam(Type type, const char* name, Callback callback, void* destination, uint32_t readLength, uint32_t writeLength);
+};
+
+//////////////////////////////////////////////////////////////////////////
+/** @DOC_START
+    # class nvh::ParameterSequence
+
+    The nvh::ParameterSequence processes provided tokens in sequences.
+    The sequences are terminated by a special "separator" token.
+    All tokens between the last iteration and the separator are applied
+    to the provided ParameterList.
+    Useful to process commands in sequences (automation, benchmarking etc.).
+  
+    Example:
+  
+    ```cpp
+    ParameterSequence sequence;
+    ParameterList     list;
+    int               mode;
+    list.add("mode", &mode);
+  
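+    // tokenizeString modifies its input and the tokens point into it, so keep the string alive
+    std::string              content = "benchmark simple -mode 10 benchmark complex -mode 20";
+    std::vector<const char*> tokens;
+    ParameterList::tokenizeString(content, tokens);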
+    sequence.init(&list, tokens);
+  
+       // 1 means our separator is followed by one argument (simple/complex)
+       // "-" as parameters in the string are prefixed with -
+  
+    while(!sequence.applyIteration("benchmark", 1, "-")) {
+      printf("%d %s mode %d\n", sequence.getIteration(), sequence.getSeparatorArg(0), mode);
+    }
+
+    // would print:
+    //   0 simple mode 10
+    //   1 complex mode 20
+    ```
+@DOC_END  */
+
+
+class ParameterSequence
+{
+public:
+  ParameterSequence()
+      : m_list(nullptr)
+      , m_index(0)
+      , m_separator(0)
+      , m_iteration(0)
+  {
+  }
+
+  void init(const ParameterList* list, const std::vector<const char*>& tokens)
+  {
+    m_tokens = tokens;
+    m_list   = list;
+  }
+
+  // returns true if finished with all tokens, otherwise processes until next separator token is found
+  bool advanceIteration(const char* separator, uint32_t separatorArgLength, uint32_t& argBegin, uint32_t& argCount);
+  // also applies parameterlist
+  bool applyIteration(const char* separator,
+                      uint32_t    separatorArgLength = 0,
+                      const char* paramPrefix        = nullptr,
+                      const char* defaultFilePath    = nullptr);
+  // sets iteration to beginning
+  void resetIteration();
+
+  bool isActive() const { return m_list && m_index && m_iteration; }
+
+  uint32_t getIteration() const { return m_iteration; }
+
+  const char* getSeparatorArg(uint32_t offset) const
+  {
+    return m_separator != ~0ULL ? m_tokens[m_separator + offset + 1] : "";
+  }
+
+private:
+  const ParameterList*     m_list;
+  std::vector<const char*> m_tokens;
+  size_t                   m_index;
+  size_t                   m_separator;
+  uint32_t                 m_iteration;
+};
+
+}  // namespace nvh
+
+
+#endif
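The `name[|help text]` convention accepted by the `add` overloads above can be shown with a small standalone sketch. `splitNameAndHelp` is a hypothetical helper written for illustration; it mirrors the split at the first `|` that the `Parameter` constructor performs, but it is not part of the nvh API.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not part of nvh): splits "name|help text" at the first '|'.
// Returns {name, help}; help is empty when the spec contains no '|'.
inline std::pair<std::string, std::string> splitNameAndHelp(const std::string& spec)
{
  const std::string::size_type pos = spec.find('|');
  if(pos == std::string::npos)
  {
    return {spec, std::string()};
  }
  return {spec.substr(0, pos), spec.substr(pos + 1)};
}
```

With this convention a single string literal at the registration site carries both the option name and its documentation, so `"winsize|Set window size"` registers the option `winsize` and the help text `Set window size` without a separate `setHelp` call.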
--- a/raytracer/nvpro_core/nvh/primitives.cpp
+++ b/raytracer/nvpro_core/nvh/primitives.cpp
@ -0,0 +1,801 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2022 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#include <array>
+
+#define _USE_MATH_DEFINES
+#include <math.h>
+#include <unordered_map>
+#include <unordered_set>
+#include <random>
+#include "primitives.hpp"
+#include "container_utils.hpp"
+
+
+namespace nvh {
+static uint32_t addPos(PrimitiveMesh& mesh, glm::vec3 p)
+{
+  PrimitiveVertex v{};
+  v.p = p;
+  mesh.vertices.emplace_back(v);
+  return static_cast<uint32_t>(mesh.vertices.size()) - 1;
+}
+
+static void addTriangle(PrimitiveMesh& mesh, uint32_t a, uint32_t b, uint32_t c)
+{
+  mesh.triangles.push_back({{a, b, c}});
+}
+
+static void addTriangle(PrimitiveMesh& mesh, glm::vec3 a, glm::vec3 b, glm::vec3 c)
+{
+  mesh.triangles.push_back({{addPos(mesh, a), addPos(mesh, b), addPos(mesh, c)}});
+}
+
+static void generateFacetedNormals(PrimitiveMesh& mesh)
+{
+  auto num_indices = static_cast<int>(mesh.triangles.size());
+  for(int i = 0; i < num_indices; i++)
+  {
+    auto& v0 = mesh.vertices[mesh.triangles[i].v[0]];
+    auto& v1 = mesh.vertices[mesh.triangles[i].v[1]];
+    auto& v2 = mesh.vertices[mesh.triangles[i].v[2]];
+
+    glm::vec3 n = glm::normalize(glm::cross(glm::normalize(v1.p - v0.p), glm::normalize(v2.p - v0.p)));
+
+    v0.n = n;
+    v1.n = n;
+    v2.n = n;
+  }
+}
+
+// Generates texture coordinates by mapping each vertex direction to spherical (longitude/latitude) UVs
+static void generateTexCoords(PrimitiveMesh& mesh)
+{
+  for(auto& vertex : mesh.vertices)
+  {
+    glm::vec3 n = normalize(vertex.p);
+    float     u = 0.5f + std::atan2(n.z, n.x) / (2.0F * float(M_PI));
+    float     v = 0.5f - std::asin(n.y) / float(M_PI);
+    vertex.t    = {u, v};
+  }
+}
+
+// Generates a tetrahedron mesh (four triangular faces)
+PrimitiveMesh createTetrahedron()
+{
+  PrimitiveMesh mesh;
+
+  // choose coordinates on the unit sphere
+  float a = 1.0F / 3.0F;
+  float b = sqrt(8.0F / 9.0F);
+  float c = sqrt(2.0F / 9.0F);
+  float d = sqrt(2.0F / 3.0F);
+
+  // 4 vertices
+  glm::vec3 v0 = glm::vec3{0.0F, 1.0F, 0.0F} * 0.5F;
+  glm::vec3 v1 = glm::vec3{-c, -a, d} * 0.5F;
+  glm::vec3 v2 = glm::vec3{-c, -a, -d} * 0.5F;
+  glm::vec3 v3 = glm::vec3{b, -a, 0.0F} * 0.5F;
+
+  // 4 triangles
+  addTriangle(mesh, v0, v2, v1);
+  addTriangle(mesh, v0, v3, v2);
+  addTriangle(mesh, v0, v1, v3);
+  addTriangle(mesh, v3, v1, v2);
+
+  generateFacetedNormals(mesh);
+  generateTexCoords(mesh);
+
+  return mesh;
+}
+
+// Generates an icosahedron mesh (twenty equilateral triangular faces)
+PrimitiveMesh createIcosahedron()
+{
+  PrimitiveMesh mesh;
+
+  float sq5 = sqrt(5.0F);
+  float a   = 2.0F / (1.0F + sq5);
+  float b   = sqrt((3.0F + sq5) / (1.0F + sq5));
+  a /= b;
+  float r = 0.5F;
+
+  std::vector<glm::vec3> v;
+  v.emplace_back(0.0F, r * a, r / b);
+  v.emplace_back(0.0F, r * a, -r / b);
+  v.emplace_back(0.0F, -r * a, r / b);
+  v.emplace_back(0.0F, -r * a, -r / b);
+  v.emplace_back(r * a, r / b, 0.0F);
+  v.emplace_back(r * a, -r / b, 0.0F);
+  v.emplace_back(-r * a, r / b, 0.0F);
+  v.emplace_back(-r * a, -r / b, 0.0F);
+  v.emplace_back(r / b, 0.0F, r * a);
+  v.emplace_back(r / b, 0.0F, -r * a);
+  v.emplace_back(-r / b, 0.0F, r * a);
+  v.emplace_back(-r / b, 0.0F, -r * a);
+
+  addTriangle(mesh, v[1], v[6], v[4]);
+  addTriangle(mesh, v[0], v[4], v[6]);
+  addTriangle(mesh, v[0], v[10], v[2]);
+  addTriangle(mesh, v[0], v[2], v[8]);
+  addTriangle(mesh, v[1], v[9], v[3]);
+  addTriangle(mesh, v[1], v[3], v[11]);
+  addTriangle(mesh, v[2], v[7], v[5]);
+  addTriangle(mesh, v[3], v[5], v[7]);
+  addTriangle(mesh, v[6], v[11], v[10]);
+  addTriangle(mesh, v[7], v[10], v[11]);
+  addTriangle(mesh, v[4], v[8], v[9]);
+  addTriangle(mesh, v[5], v[9], v[8]);
+  addTriangle(mesh, v[0], v[6], v[10]);
+  addTriangle(mesh, v[0], v[8], v[4]);
+  addTriangle(mesh, v[1], v[11], v[6]);
+  addTriangle(mesh, v[1], v[4], v[9]);
+  addTriangle(mesh, v[3], v[7], v[11]);
+  addTriangle(mesh, v[3], v[9], v[5]);
+  addTriangle(mesh, v[2], v[10], v[7]);
+  addTriangle(mesh, v[2], v[5], v[8]);
+
+  generateFacetedNormals(mesh);
+  generateTexCoords(mesh);
+
+  return mesh;
+}
+
+// Generates an octahedron mesh (eight faces): two four-sided pyramids placed base to base.
+PrimitiveMesh createOctahedron()
+{
+  PrimitiveMesh mesh;
+
+  std::vector<glm::vec3> v;
+  v.emplace_back(0.5F, 0.0F, 0.0F);
+  v.emplace_back(-0.5F, 0.0F, 0.0F);
+  v.emplace_back(0.0F, 0.5F, 0.0F);
+  v.emplace_back(0.0F, -0.5F, 0.0F);
+  v.emplace_back(0.0F, 0.0F, 0.5F);
+  v.emplace_back(0.0F, 0.0F, -0.5F);
+
+  addTriangle(mesh, v[0], v[2], v[4]);
+  addTriangle(mesh, v[0], v[4], v[3]);
+  addTriangle(mesh, v[0], v[5], v[2]);
+  addTriangle(mesh, v[0], v[3], v[5]);
+  addTriangle(mesh, v[1], v[4], v[2]);
+  addTriangle(mesh, v[1], v[3], v[4]);
+  addTriangle(mesh, v[1], v[5], v[3]);
+  addTriangle(mesh, v[2], v[5], v[1]);
+
+  generateFacetedNormals(mesh);
+  generateTexCoords(mesh);
+
+  return mesh;
+}
+
+// Generates a flat plane mesh with the specified number of steps, width, and depth.
+// The plane is essentially a grid with the specified number of subdivisions (steps)
+// in both the X and Z directions. It creates vertices, normals, and texture coordinates
+// for each point on the grid and forms triangles to create the plane's surface.
+PrimitiveMesh createPlane(int steps, float width, float depth)
+{
+  PrimitiveMesh mesh;
+
+  float increment = 1.0F / static_cast<float>(steps);
+  for(int sz = 0; sz <= steps; sz++)
+  {
+    for(int sx = 0; sx <= steps; sx++)
+    {
+      PrimitiveVertex v{};
+
+      v.p = glm::vec3(-0.5F + (static_cast<float>(sx) * increment), 0.0F, -0.5F + (static_cast<float>(sz) * increment));
+      v.p *= glm::vec3(width, 1.0F, depth);
+      v.n = glm::vec3(0.0F, 1.0F, 0.0F);
+      v.t = glm::vec2(static_cast<float>(sx) / static_cast<float>(steps),
+                      static_cast<float>(steps - sz) / static_cast<float>(steps));
+      mesh.vertices.emplace_back(v);
+    }
+  }
+
+  for(int sz = 0; sz < steps; sz++)
+  {
+    for(int sx = 0; sx < steps; sx++)
+    {
+      addTriangle(mesh, sx + sz * (steps + 1), sx + 1 + (sz + 1) * (steps + 1), sx + 1 + sz * (steps + 1));
+      addTriangle(mesh, sx + sz * (steps + 1), sx + (sz + 1) * (steps + 1), sx + 1 + (sz + 1) * (steps + 1));
+    }
+  }
+
+  return mesh;
+}
+
+// Generates a cube mesh with the specified width, height, and depth
+// Starts from 8 corner positions, 6 face normals and 4 UVs, and emits 12 triangles
+// that reference 24 unique PrimitiveVertex entries (4 per face)
+PrimitiveMesh createCube(float width /*= 1*/, float height /*= 1*/, float depth /*= 1*/)
+{
+  PrimitiveMesh mesh;
+
+  glm::vec3              s   = glm::vec3(width, height, depth) * 0.5F;
+  std::vector<glm::vec3> pnt = {{-s.x, -s.y, -s.z}, {-s.x, -s.y, s.z}, {-s.x, s.y, -s.z}, {-s.x, s.y, s.z},
+                                {s.x, -s.y, -s.z},  {s.x, -s.y, s.z},  {s.x, s.y, -s.z},  {s.x, s.y, s.z}};
+  std::vector<glm::vec3> nrm = {{-1.0F, 0.0F, 0.0F}, {0.0F, 0.0F, 1.0F},  {1.0F, 0.0F, 0.0F},
+                                {0.0F, 0.0F, -1.0F}, {0.0F, -1.0F, 0.0F}, {0.0F, 1.0F, 0.0F}};
+  std::vector<glm::vec2> uv  = {{0.0F, 0.0F}, {0.0F, 1.0F}, {1.0F, 1.0F}, {1.0F, 0.0F}};
+
+  // cube topology
+  std::vector<std::vector<int>> cube_polygons = {{0, 1, 3, 2}, {1, 5, 7, 3}, {5, 4, 6, 7},
+                                                 {4, 0, 2, 6}, {4, 5, 1, 0}, {2, 3, 7, 6}};
+
+  for(int i = 0; i < 6; ++i)
+  {
+    auto index = static_cast<int>(mesh.vertices.size());
+    for(int j = 0; j < 4; ++j)
+      mesh.vertices.push_back({pnt[cube_polygons[i][j]], nrm[i], uv[j]});
+    addTriangle(mesh, index, index + 1, index + 2);
+    addTriangle(mesh, index, index + 2, index + 3);
+  }
+
+  return mesh;
+}
+
+// Generates a UV-sphere mesh with the specified radius, number of sectors (horizontal subdivisions)
+// and stacks (vertical subdivisions). It uses latitude-longitude grid generation to create vertices
+// with proper positions, normals, and texture coordinates.
+PrimitiveMesh createSphereUv(float radius, int sectors, int stacks)
+{
+  PrimitiveMesh mesh;
+
+  float omega{0.0F};                 // height (y) of the current stack ring
+  float phi{0.0F};                   // radius of the current stack ring in the XZ plane
+  float length_inv = 1.0F / radius;  // scales a position to a unit-length normal
+
+  const float math_pi     = static_cast<float>(M_PI);
+  float       sector_step = 2.0F * math_pi / static_cast<float>(sectors);
+  float       stack_step  = math_pi / static_cast<float>(stacks);
+  float       sector_angle{0.0F};
+  float       stack_angle{0.0F};
+
+  for(int i = 0; i <= stacks; ++i)
+  {
+    stack_angle = math_pi / 2.0F - static_cast<float>(i) * stack_step;  // starting from pi/2 to -pi/2
+    phi         = radius * cosf(stack_angle);                           // r * cos(u)
+    omega       = radius * sinf(stack_angle);                           // r * sin(u)
+
+    // add (sectors + 1) vertices per stack
+    // the first and last vertices of a stack share position and normal, but have different tex coords
+    for(int j = 0; j <= sectors; ++j)
+    {
+      PrimitiveVertex v{};
+
+      sector_angle = static_cast<float>(j) * sector_step;  // starting from 0 to 2pi
+
+      // vertex position (x, y, z)
+      v.p.x = phi * cosf(sector_angle);  // r * cos(u) * cos(v)
+      v.p.z = phi * sinf(sector_angle);  // r * cos(u) * sin(v)
+      v.p.y = omega;
+
+      // normalized vertex normal
+      v.n = v.p * length_inv;
+
+      // vertex tex coord (s, t) range between [0, 1]
+      v.t.x = 1.0F - static_cast<float>(j) / static_cast<float>(sectors);
+      v.t.y = static_cast<float>(i) / static_cast<float>(stacks);
+
+      mesh.vertices.emplace_back(v);
+    }
+  }
+
+  // indices
+  //  k2---k2+1
+  //  | \  |
+  //  |  \ |
+  //  k1---k1+1
+  int k1{0};
+  int k2{0};
+  for(int i = 0; i < stacks; ++i)
+  {
+    k1 = i * (sectors + 1);  // beginning of current stack
+    k2 = k1 + sectors + 1;   // beginning of next stack
+
+    for(int j = 0; j < sectors; ++j, ++k1, ++k2)
+    {
+      // 2 triangles per sector excluding 1st and last stacks
+      if(i != 0)
+      {
+        addTriangle(mesh, k1, k1 + 1, k2);  // k1---k1+1---k2
+      }
+
+      if(i != (stacks - 1))
+      {
+        addTriangle(mesh, k1 + 1, k2 + 1, k2);  // k1+1---k2+1---k2
+      }
+    }
+  }
+
+  return mesh;
+}
+
+// Creates a cone with its base at y = -height/2 and its tip at y = +height/2
+// radius   : radius of the base circle
+// height   : distance from the base to the tip
+// segments : number of segments forming the base circle
+PrimitiveMesh createConeMesh(float radius, float height, int segments)
+{
+  PrimitiveMesh mesh;
+
+  float halfHeight = height * 0.5f;
+
+  const float math_pi     = static_cast<float>(M_PI);
+  float       sector_step = 2.0F * math_pi / static_cast<float>(segments);
+  float       sector_angle{0.0F};
+
+  // length of the flank of the cone
+  float flank_len = sqrtf(radius * radius + height * height);
+  // unit vector along the flank of the cone, pointing from the tip to the base
+  float cone_x = radius / flank_len;
+  float cone_y = -height / flank_len;
+
+  glm::vec3 tip = {0.0F, halfHeight, 0.0F};
+
+  // Sides
+  for(int i = 0; i <= segments; ++i)
+  {
+    PrimitiveVertex v{};
+    sector_angle = static_cast<float>(i) * sector_step;
+
+    // Position
+    v.p.x = radius * cosf(sector_angle);  // r * cos(angle)
+    v.p.z = radius * sinf(sector_angle);  // r * sin(angle)
+    v.p.y = -halfHeight;
+    // Normal
+    v.n.x = -cone_y * cosf(sector_angle);
+    v.n.y = cone_x;
+    v.n.z = -cone_y * sinf(sector_angle);
+    // TexCoord
+    v.t.x = static_cast<float>(i) / static_cast<float>(segments);
+    v.t.y = 0.0F;
+    mesh.vertices.emplace_back(v);
+
+    // Tip point
+    v.p = tip;
+    // Normal
+    sector_angle += 0.5F * sector_step;  // Half way to next triangle
+    v.n.x = -cone_y * cosf(sector_angle);
+    v.n.y = cone_x;
+    v.n.z = -cone_y * sinf(sector_angle);
+    // TexCoord
+    v.t.x += 0.5F / static_cast<float>(segments);
+    v.t.y = 1.0F;
+
+    mesh.vertices.emplace_back(v);
+  }
+
+  for(int j = 0; j < segments; ++j)
+  {
+    int k1 = j * 2;
+    addTriangle(mesh, k1, k1 + 1, k1 + 2);
+  }
+
+  // Bottom plate (its normals differ from those of the sides)
+  for(int i = 0; i <= segments; ++i)
+  {
+    PrimitiveVertex v{};
+    sector_angle = static_cast<float>(i) * sector_step;  // starting from 0 to 2pi
+
+    v.p.x = radius * cosf(sector_angle);  // r * cos(angle)
+    v.p.z = radius * sinf(sector_angle);  // r * sin(angle)
+    v.p.y = -halfHeight;
+    // Normal: the bottom plate faces down
+    v.n = {0.0F, -1.0F, 0.0F};
+    // TexCoord
+    v.t.x = static_cast<float>(i) / static_cast<float>(segments);
+    v.t.y = 0.0F;
+    mesh.vertices.emplace_back(v);
+
+    v.p = -tip;
+    v.t.x += 0.5F / static_cast<float>(segments);
+    v.t.y = 1.0F;
+    mesh.vertices.emplace_back(v);
+  }
+
+  for(int j = 0; j < segments; ++j)
+  {
+    int k1 = (j + segments + 1) * 2;
+    addTriangle(mesh, k1, k1 + 2, k1 + 1);
+  }
+
+
+  return mesh;
+}
+
+// Generates a sphere mesh with the specified radius and subdivisions (level of detail).
+// It uses the icosahedron subdivision technique to iteratively refine the mesh by
+// subdividing triangles into smaller triangles to approximate a more spherical shape.
+// It calculates vertex positions, normals, and texture coordinates for each vertex
+// and constructs triangles accordingly.
+// Note: There will be duplicated vertices with this method.
+//       Use removeDuplicateVertices to avoid duplicated vertices.
+PrimitiveMesh createSphereMesh(float radius, int subdivisions)
+{
+
+  const float            t        = (1.0F + std::sqrt(5.0F)) / 2.0F;  // Golden ratio
+  std::vector<glm::vec3> vertices = {{-1, t, 0},  {1, t, 0},  {-1, -t, 0}, {1, -t, 0}, {0, -1, t},  {0, 1, t},
+                                     {0, -1, -t}, {0, 1, -t}, {t, 0, -1},  {t, 0, 1},  {-t, 0, -1}, {-t, 0, 1}};
+
+  // Function to calculate the midpoint between two vertices
+  auto midpoint = [](const glm::vec3& v1, const glm::vec3& v2) { return (v1 + v2) * 0.5f; };
+
+  auto texCoord = [](const glm::vec3& v1) {
+    return glm::vec2{0.5f + std::atan2(v1.z, v1.x) / (2 * M_PI), 0.5f - std::asin(v1.y) / M_PI};
+  };
+
+  std::vector<PrimitiveVertex> primitiveVertices;
+  for(const auto& vertex : vertices)
+  {
+    glm::vec3 n = normalize(vertex);
+    primitiveVertices.push_back({n * radius, n, texCoord(n)});
+  }
+
+  std::vector<PrimitiveTriangle> triangles = {{{0, 11, 5}}, {{0, 5, 1}},  {{0, 1, 7}},   {{0, 7, 10}}, {{0, 10, 11}},
+                                              {{1, 5, 9}},  {{5, 11, 4}}, {{11, 10, 2}}, {{10, 7, 6}}, {{7, 1, 8}},
+                                              {{3, 9, 4}},  {{3, 4, 2}},  {{3, 2, 6}},   {{3, 6, 8}},  {{3, 8, 9}},
+                                              {{4, 9, 5}},  {{2, 4, 11}}, {{6, 2, 10}},  {{8, 6, 7}},  {{9, 8, 1}}};
+
+
+  for(int i = 0; i < subdivisions; ++i)
+  {
+    std::vector<PrimitiveTriangle> subTriangles;
+    for(const auto& tri : triangles)
+    {
+      // Subdivide each triangle into 4 sub-triangles
+      glm::vec3 mid1 = midpoint(primitiveVertices[tri.v[0]].p, primitiveVertices[tri.v[1]].p);
+      glm::vec3 mid2 = midpoint(primitiveVertices[tri.v[1]].p, primitiveVertices[tri.v[2]].p);
+      glm::vec3 mid3 = midpoint(primitiveVertices[tri.v[2]].p, primitiveVertices[tri.v[0]].p);
+
+      glm::vec3 mid1Normalized = normalize(mid1);
+      glm::vec3 mid2Normalized = normalize(mid2);
+      glm::vec3 mid3Normalized = normalize(mid3);
+
+      glm::vec2 mid1Uv = texCoord(mid1Normalized);
+      glm::vec2 mid2Uv = texCoord(mid2Normalized);
+      glm::vec2 mid3Uv = texCoord(mid3Normalized);
+
+      primitiveVertices.push_back({mid1Normalized * radius, mid1Normalized, mid1Uv});
+      primitiveVertices.push_back({mid2Normalized * radius, mid2Normalized, mid2Uv});
+      primitiveVertices.push_back({mid3Normalized * radius, mid3Normalized, mid3Uv});
+
+      uint32_t m1 = static_cast<uint32_t>(primitiveVertices.size()) - 3U;
+      uint32_t m2 = m1 + 1U;
+      uint32_t m3 = m2 + 1U;
+
+      // Create 4 new triangles from the subdivided triangle
+      subTriangles.push_back({{tri.v[0], m1, m3}});
+      subTriangles.push_back({{m1, tri.v[1], m2}});
+      subTriangles.push_back({{m2, tri.v[2], m3}});
+      subTriangles.push_back({{m1, m2, m3}});
+    }
+
+    triangles = subTriangles;
+  }
+
+  return {primitiveVertices, triangles};
+}
+
+
+// Generates a torus mesh, which is a 3D geometric shape resembling a donut
+// majorRadius: This represents the distance from the center of the torus to the center of the tube (the larger circle's radius).
+// minorRadius: This represents the radius of the tube (the smaller circle's radius).
+// majorSegments: The number of segments used to approximate the larger circle that forms the torus.
+// minorSegments: The number of segments used to approximate the smaller circle (tube) within the torus.
+nvh::PrimitiveMesh createTorusMesh(float majorRadius, float minorRadius, int majorSegments, int minorSegments)
+{
+  nvh::PrimitiveMesh mesh;
+
+  float majorStep = 2.0f * float(M_PI) / float(majorSegments);
+  float minorStep = 2.0f * float(M_PI) / float(minorSegments);
+
+  for(int i = 0; i <= majorSegments; ++i)
+  {
+    float     angle1 = i * majorStep;
+    glm::vec3 center = {majorRadius * std::cos(angle1), 0.0f, majorRadius * std::sin(angle1)};
+
+    for(int j = 0; j <= minorSegments; ++j)
+    {
+      float angle2 = j * minorStep;
+      glm::vec3 position = {center.x + minorRadius * std::cos(angle2) * std::cos(angle1), minorRadius * std::sin(angle2),
+                            center.z + minorRadius * std::cos(angle2) * std::sin(angle1)};
+
+      glm::vec3 normal = {std::cos(angle2) * std::cos(angle1), std::sin(angle2), std::cos(angle2) * std::sin(angle1)};
+
+      glm::vec2 texCoord = {static_cast<float>(i) / majorSegments, static_cast<float>(j) / minorSegments};
+      mesh.vertices.push_back({position, normal, texCoord});
+    }
+  }
+
+  for(int i = 0; i < majorSegments; ++i)
+  {
+    for(int j = 0; j < minorSegments; ++j)
+    {
+      uint32_t idx1 = i * (minorSegments + 1) + j;
+      uint32_t idx2 = (i + 1) * (minorSegments + 1) + j;
+      uint32_t idx3 = idx1 + 1;
+      uint32_t idx4 = idx2 + 1;
+
+      mesh.triangles.push_back({{idx1, idx3, idx2}});
+      mesh.triangles.push_back({{idx3, idx4, idx2}});
+    }
+  }
+
+  return mesh;
+}
+
+//------------------------------------------------------------------------
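+// Create a vector of nodes that represent the Menger Sponge
+// Each node has its own translation and scale, so the nodes can be used with
+// different objects.
+// - level: number of recursive subdivisions
+// - probability: if negative, the regular Menger rule is used; otherwise each of the
+//   27 sub-cubes is kept when a random sample in [0, 1] is not greater than this value
+// - seed: seed passed to srand()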
+std::vector<nvh::Node> mengerSpongeNodes(int level, float probability, int seed)
+{
+  srand(seed);
+
+  struct MengerSponge
+  {
+    glm::vec3 m_topLeftFront;
+    float     m_size;
+
+    void split(std::vector<MengerSponge>& cubes)
+    {
+      float     size         = m_size / 3.f;
+      glm::vec3 topLeftFront = m_topLeftFront;
+      for(int x = 0; x < 3; x++)
+      {
+        topLeftFront[0] = m_topLeftFront[0] + static_cast<float>(x) * size;
+        for(int y = 0; y < 3; y++)
+        {
+          if(x == 1 && y == 1)
+            continue;
+          topLeftFront[1] = m_topLeftFront[1] + static_cast<float>(y) * size;
+          for(int z = 0; z < 3; z++)
+          {
+            if(x == 1 && z == 1)
+              continue;
+            if(y == 1 && z == 1)
+              continue;
+
+            topLeftFront[2] = m_topLeftFront[2] + static_cast<float>(z) * size;
+            cubes.push_back({topLeftFront, size});
+          }
+        }
+      }
+    }
+
+    void splitProb(std::vector<MengerSponge>& cubes, float prob)
+    {
+      float     size         = m_size / 3.f;
+      glm::vec3 topLeftFront = m_topLeftFront;
+      for(int x = 0; x < 3; x++)
+      {
+        topLeftFront[0] = m_topLeftFront[0] + static_cast<float>(x) * size;
+        for(int y = 0; y < 3; y++)
+        {
+          topLeftFront[1] = m_topLeftFront[1] + static_cast<float>(y) * size;
+          for(int z = 0; z < 3; z++)
+          {
+            float sample = rand() / static_cast<float>(RAND_MAX);
+            if(sample > prob)
+              continue;
+            topLeftFront[2] = m_topLeftFront[2] + static_cast<float>(z) * size;
+            cubes.push_back({topLeftFront, size});
+          }
+        }
+      }
+    }
+  };
+
+  // Starting element
+  MengerSponge element = {glm::vec3(-0.5, -0.5, -0.5), 1.f};
+
+  std::vector<MengerSponge> elements1 = {element};
+  std::vector<MengerSponge> elements2 = {};
+
+  auto previous = &elements1;
+  auto next     = &elements2;
+
+  for(int i = 0; i < level; i++)
+  {
+    for(MengerSponge& c : *previous)
+    {
+      if(probability < 0.f)
+        c.split(*next);
+      else
+        c.splitProb(*next, probability);
+    }
+    auto temp = previous;
+    previous  = next;
+    next      = temp;
+    next->clear();
+  }
+
+  std::vector<nvh::Node> nodes;
+  for(MengerSponge& c : *previous)
+  {
+    nvh::Node node{};
+    node.translation = c.m_topLeftFront;
+    node.scale       = glm::vec3(c.m_size);
+    node.mesh        = 0;  // default to the first mesh
+    nodes.push_back(node);
+  }
+
+  return nodes;
+}
+
+//-------------------------------------------------------------------------------------------------
+// Create a list of nodes whose positions follow the seed arrangement of a sunflower;
+// the seeds grow slightly the further they are from the center.
+std::vector<nvh::Node> sunflower(int seeds)
+{
+  constexpr double goldenRatio = glm::golden_ratio<double>();
+
+  std::vector<nvh::Node> flower;
+  for(int i = 1; i <= seeds; ++i)
+  {
+    double r     = pow(i, goldenRatio) / seeds;
+    double theta = 2 * glm::pi<double>() * goldenRatio * i;
+
+    nvh::Node seed;
+    seed.translation = glm::vec3(r * sin(theta), 0, r * cos(theta));
+    seed.scale       = glm::vec3(10.0f * i / (1.0f * seeds));
+    seed.mesh        = 0;
+
+    flower.push_back(seed);
+  }
+  return flower;
+}
+
+//---------------------------------------------------------------------------
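+// Merge the meshes of all nodes into a single mesh
+// Only the vertex positions are transformed by the node matrix; normals are copied unchanged.
+// - nodes: the nodes to merge
+// - meshes: the mesh array that the nodes refer to (indexed by Node::mesh)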
+nvh::PrimitiveMesh mergeNodes(const std::vector<nvh::Node>& nodes, const std::vector<nvh::PrimitiveMesh> meshes)
+{
+  nvh::PrimitiveMesh resultMesh;
+
+  // Find how many triangles and vertices the merged mesh will have
+  size_t nb_triangles = 0;
+  size_t nb_vertices  = 0;
+  for(const auto& n : nodes)
+  {
+    nb_triangles += meshes[n.mesh].triangles.size();
+    nb_vertices += meshes[n.mesh].vertices.size();
+  }
+  resultMesh.triangles.reserve(nb_triangles);
+  resultMesh.vertices.reserve(nb_vertices);
+
+  // Merge all nodes meshes into a single one
+  for(const auto& n : nodes)
+  {
+    const glm::mat4 mat = n.localMatrix();
+
+    uint32_t                  tIndex = static_cast<uint32_t>(resultMesh.vertices.size());
+    const nvh::PrimitiveMesh& mesh   = meshes[n.mesh];
+
+    for(auto v : mesh.vertices)
+    {
+      v.p = glm::vec3(mat * glm::vec4(v.p, 1));
+      resultMesh.vertices.push_back(v);
+    }
+    for(auto t : mesh.triangles)
+    {
+      t.v += tIndex;
+      resultMesh.triangles.push_back(t);
+    }
+  }
+
+  return resultMesh;
+}
+
+
+// Takes a 3D mesh as input and modifies its vertices by adding random displacements within a
+// specified `amplitude` range to create a wobbling effect. The intensity of the wobbling effect
+// can be controlled by adjusting the `amplitude` parameter.
+// The function returns the modified mesh.
+nvh::PrimitiveMesh wobblePrimitive(const nvh::PrimitiveMesh& mesh, float amplitude)
+{
+  // Seed the random number generator with a random device
+  std::random_device rd;
+  std::mt19937       gen(rd());
+
+  // Define the range for the random number generation (-1.0 to 1.0)
+  std::uniform_real_distribution<float> distribution(-1.0, 1.0);
+
+  // Our random function
+  auto rand = [&] { return distribution(gen); };
+
+  std::vector<PrimitiveVertex> newVertices;
+  for(auto& vertex : mesh.vertices)
+  {
+    glm::vec3 originalPosition = vertex.p;
+    glm::vec3 displacement     = glm::vec3(rand(), rand(), rand());
+    displacement *= amplitude;
+    glm::vec3 newPosition = originalPosition + displacement;
+
+    newVertices.push_back({newPosition, vertex.n, vertex.t});
+  }
+
+  return {newVertices, mesh.triangles};
+}
+
+// Takes a 3D mesh as input and returns a new mesh with duplicate vertices removed.
+// This function iterates through each triangle in the original PrimitiveMesh,
+// compares its vertices, and creates a new set of unique vertices in uniqueVertices.
+// We use an unordered_map called vertexIndexMap to keep track of the mapping between
+// the original vertices and their corresponding indices in the uniqueVertices vector.
+PrimitiveMesh removeDuplicateVertices(const PrimitiveMesh& mesh, bool testNormal, bool testUv)
+{
+  auto hash = [&](const PrimitiveVertex& v) {
+    if(testNormal)
+    {
+      if(testUv)
+        return nvh::hashVal(v.p.x, v.p.y, v.p.z, v.n.x, v.n.y, v.n.z, v.t.x, v.t.y);
+      else
+        return nvh::hashVal(v.p.x, v.p.y, v.p.z, v.n.x, v.n.y, v.n.z);
+    }
+    else if(testUv)
+      return nvh::hashVal(v.p.x, v.p.y, v.p.z, v.t.x, v.t.y);
+    return nvh::hashVal(v.p.x, v.p.y, v.p.z);
+  };
+  auto equal = [&](const PrimitiveVertex& l, const PrimitiveVertex& r) {
+    return (l.p == r.p) && (testNormal ? l.n == r.n : true) && (testUv ? l.t == r.t : true);
+  };
+  std::unordered_map<PrimitiveVertex, uint32_t, decltype(hash), decltype(equal)> vertexIndexMap(0, hash, equal);
+
+  std::vector<PrimitiveVertex>   uniqueVertices;
+  std::vector<PrimitiveTriangle> uniqueTriangles;
+
+  for(const auto& triangle : mesh.triangles)
+  {
+    PrimitiveTriangle uniqueTriangle = {};
+    for(int i = 0; i < 3; i++)
+    {
+      const PrimitiveVertex& vertex = mesh.vertices[triangle.v[i]];
+
+      // Check if the vertex is already in the uniqueVertices list
+      auto it = vertexIndexMap.find(vertex);
+      if(it == vertexIndexMap.end())
+      {
+        // Vertex not found, add it to uniqueVertices and update the index map
+        uint32_t newIndex      = static_cast<uint32_t>(uniqueVertices.size());
+        vertexIndexMap[vertex] = newIndex;
+        uniqueVertices.push_back(vertex);
+        uniqueTriangle.v[i] = newIndex;
+      }
+      else
+      {
+        // Vertex found, use its index in uniqueVertices
+        uniqueTriangle.v[i] = it->second;
+      }
+    }
+    uniqueTriangles.push_back(uniqueTriangle);
+  }
+
+  // nvprintf("Before: %d vertex, %d triangles\n", mesh.vertices.size(), mesh.triangles.size());
+  // nvprintf("After: %d vertex, %d triangles\n", uniqueVertices.size(), uniqueTriangles.size());
+
+  return {uniqueVertices, uniqueTriangles};
+}
+
+
+}  // namespace nvh
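
The subdivision rule used by `mengerSpongeNodes` in primitives.cpp above can be sketched on its own, without glm. This is a minimal illustration under stated assumptions, not the nvh code: `Cell`, `splitCell` and `mengerCells` are hypothetical names, and only the regular (non-probabilistic) rule is covered. A sub-cube is dropped when at least two of its three grid coordinates equal 1, which is equivalent to the three `continue` tests in `MengerSponge::split` and leaves 20 of the 27 sub-cubes per split.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the MengerSponge struct: minimum corner plus edge length.
struct Cell
{
  std::array<float, 3> corner;
  float                size;
};

// Split one cell into the sub-cells kept by the Menger rule.
// A sub-cell is dropped when at least two of its grid coordinates equal 1
// (the 6 face centers and the cube center), so 20 of 27 remain.
inline void splitCell(const Cell& c, std::vector<Cell>& out)
{
  const float s = c.size / 3.0F;
  for(int x = 0; x < 3; ++x)
  {
    for(int y = 0; y < 3; ++y)
    {
      for(int z = 0; z < 3; ++z)
      {
        const int centered = (x == 1) + (y == 1) + (z == 1);
        if(centered >= 2)
          continue;
        out.push_back({{c.corner[0] + x * s, c.corner[1] + y * s, c.corner[2] + z * s}, s});
      }
    }
  }
}

// Apply the split `level` times, starting from a unit cube centered at the origin.
inline std::vector<Cell> mengerCells(int level)
{
  std::vector<Cell> current{Cell{{-0.5F, -0.5F, -0.5F}, 1.0F}};
  for(int i = 0; i < level; ++i)
  {
    std::vector<Cell> next;
    next.reserve(current.size() * 20);
    for(const Cell& c : current)
      splitCell(c, next);
    current.swap(next);
  }
  return current;
}
```

Because the cell count grows by a factor of 20 per level, `mengerCells(level).size()` gives a quick estimate of how many nodes a given `level` produces before any meshes are merged.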
--- a/raytracer/nvpro_core/nvh/primitives.hpp
+++ b/raytracer/nvpro_core/nvh/primitives.hpp
@ -0,0 +1,112 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2022 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#pragma once
+#include <vector>
+#include <cstdint>
+#include <glm/glm.hpp>
+#include <glm/gtc/quaternion.hpp>
+
+/* @DOC_START
+# struct `nvh::PrimitiveMesh`
+  - Common primitive type, made of vertices: position, normal and texture coordinates.
+  - All primitives are made of triangles; every three indices form one triangle.
+
+# struct `nvh::Node`
+  - Structure to hold a reference to a mesh, with a material and transformation.
+
+Primitives that can be created:
+* Tetrahedron
+* Icosahedron
+* Octahedron
+* Plane
+* Cube
+* SphereUv
+* Cone
+* SphereMesh
+* Torus
+
+Node creators: return a list of nodes, each holding a mesh reference and a transformation
+* MengerSponge
+* SunFlower
+
+Other utilities
+* mergeNodes
+* removeDuplicateVertices
+* wobblePrimitive
+
+@DOC_END */
+
+namespace nvh {
+struct PrimitiveVertex
+{
+  glm::vec3 p;  // Position
+  glm::vec3 n;  // Normal
+  glm::vec2 t;  // Texture Coordinates
+};
+
+struct PrimitiveTriangle
+{
+  glm::uvec3 v;  // vertex indices
+};
+
+struct PrimitiveMesh
+{
+  std::vector<PrimitiveVertex>   vertices;   // Array of all vertices
+  std::vector<PrimitiveTriangle> triangles;  // Indices forming triangles
+};
+
+struct Node
+{
+  glm::vec3 translation{};  // Translation of the node
+  glm::quat rotation{};     // Rotation of the node
+  glm::vec3 scale{1.0F};    // Scale of the node
+  glm::mat4 matrix{1};      // Extra matrix, applied before the scale/rotation/translation above
+  int       material{0};
+  int       mesh{-1};
+
+  glm::mat4 localMatrix() const
+  {
+    glm::mat4 translationMatrix = glm::translate(glm::mat4(1.0f), translation);
+    glm::mat4 rotationMatrix    = glm::mat4_cast(rotation);
+    glm::mat4 scaleMatrix       = glm::scale(glm::mat4(1.0f), scale);
+    glm::mat4 combinedMatrix    = translationMatrix * rotationMatrix * scaleMatrix * matrix;
+    return combinedMatrix;
+  }
+};
+
+PrimitiveMesh createTetrahedron();
+PrimitiveMesh createIcosahedron();
+PrimitiveMesh createOctahedron();
+PrimitiveMesh createPlane(int steps = 1, float width = 1.0F, float depth = 1.0F);
+PrimitiveMesh createCube(float width = 1.0F, float height = 1.0F, float depth = 1.0F);
+PrimitiveMesh createSphereUv(float radius = 0.5F, int sectors = 20, int stacks = 20);
+PrimitiveMesh createConeMesh(float radius = 0.5F, float height = 1.0F, int segments = 16);
+PrimitiveMesh createSphereMesh(float radius = 0.5F, int subdivisions = 3);
+PrimitiveMesh createTorusMesh(float majorRadius = 0.5F, float minorRadius = 0.25F, int majorSegments = 32, int minorSegments = 16);
+
+std::vector<Node> mengerSpongeNodes(int level = 3, float probability = -1.f, int seed = 1);
+std::vector<Node> sunflower(int seeds = 3000);
+
+// Utilities
+PrimitiveMesh mergeNodes(const std::vector<Node>& nodes, const std::vector<PrimitiveMesh> meshes);
+PrimitiveMesh removeDuplicateVertices(const PrimitiveMesh& mesh, bool testNormal = true, bool testUv = true);
+PrimitiveMesh wobblePrimitive(const PrimitiveMesh& mesh, float amplitude = 0.05F);
+
+}  // namespace nvh
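
The index remapping behind `removeDuplicateVertices`, declared in primitives.hpp above, can be sketched as follows. This is an illustrative stand-in rather than the nvh implementation: it compares positions only, and it uses a `std::map` keyed on the coordinates in place of the `std::unordered_map` with a custom hash. `Vec3`, `IndexedMesh` and `weldPositions` are assumed names that do not exist in nvh.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical minimal types, standing in for PrimitiveVertex / PrimitiveMesh.
struct Vec3
{
  float x, y, z;
};

struct IndexedMesh
{
  std::vector<Vec3>     positions;
  std::vector<uint32_t> indices;  // every three indices form one triangle
};

// Walk the index buffer, keep the first occurrence of each position and
// remap every index to that first occurrence.
inline IndexedMesh weldPositions(const IndexedMesh& in)
{
  using Key = std::tuple<float, float, float>;
  std::map<Key, uint32_t> remap;  // position -> index in out.positions

  IndexedMesh out;
  out.indices.reserve(in.indices.size());
  for(uint32_t oldIndex : in.indices)
  {
    const Vec3&    p        = in.positions[oldIndex];
    const uint32_t newIndex = static_cast<uint32_t>(out.positions.size());
    // emplace only inserts when the key is not present yet
    const auto result = remap.emplace(Key(p.x, p.y, p.z), newIndex);
    if(result.second)
      out.positions.push_back(p);
    out.indices.push_back(result.first->second);
  }
  return out;
}
```

Exact floating-point comparison is used here, as in the `equal` lambda of the original. Vertices that differ only by rounding error are therefore kept as separate entries.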
--- a/raytracer/nvpro_core/nvh/profiler.cpp
+++ b/raytracer/nvpro_core/nvh/profiler.cpp
@ -0,0 +1,459 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#include "profiler.hpp"
+
+#include <assert.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+
+
+//////////////////////////////////////////////////////////////////////////
+
+namespace nvh {
+
+const uint32_t Profiler::CONFIG_DELAY;
+const uint32_t Profiler::FRAME_DELAY;
+const uint32_t Profiler::START_SECTIONS;
+const uint32_t Profiler::MAX_NUM_AVERAGE;
+
+Profiler::Profiler(Profiler* master)
+{
+  m_data = master ? master->m_data : std::shared_ptr<Data>(new Data);
+  grow(START_SECTIONS);
+}
+
+Profiler::Profiler(uint32_t startSections)
+{
+  m_data = std::shared_ptr<Data>(new Data);
+  grow(startSections);
+}
+
+void Profiler::setAveragingSize(uint32_t num)
+{
+  assert(num <= MAX_NUM_AVERAGE);
+  m_data->numAveraging = num;
+
+  for(size_t i = 0; i < m_data->entries.size(); i++)
+  {
+    m_data->entries[i].cpuTime.init(num);
+    m_data->entries[i].gpuTime.init(num);
+  }
+  m_data->cpuTime.init(num);
+}
+
+void Profiler::beginFrame()
+{
+  m_data->level       = 0;
+  m_data->nextSection = 0;
+  m_data->frameSections.clear();
+
+  m_data->cpuCurrentTime = -m_clock.getMicroSeconds();
+}
+
+void Profiler::endFrame()
+{
+  assert(m_data->level == 0);
+
+  m_data->cpuCurrentTime += m_clock.getMicroSeconds();
+
+  if(!m_data->frameSections.empty() && ((uint32_t)m_data->frameSections.size() != m_data->numLastEntries))
+  {
+    m_data->numLastEntries  = (uint32_t)m_data->frameSections.size();
+    m_data->numLastSections = m_data->frameSections.back() + 1;
+    m_data->resetDelay      = CONFIG_DELAY;
+  }
+
+  if(m_data->resetDelay)
+  {
+    m_data->resetDelay--;
+    for(uint32_t i = 0; i < m_data->entries.size(); i++)
+    {
+      Entry& entry = m_data->entries[i];
+      if(entry.level != LEVEL_SINGLESHOT)
+      {
+        entry.numTimes = 0;
+        entry.cpuTime.reset();
+        entry.gpuTime.reset();
+      }
+    }
+    m_data->cpuTime.reset();
+    m_data->numFrames = 0;
+  }
+
+  if(m_data->numFrames > FRAME_DELAY)
+  {
+    for(uint32_t i : m_data->frameSections)
+    {
+      Entry& entry = m_data->entries[i];
+
+      if(entry.splitter)
+        continue;
+
+      uint32_t queryFrame = (m_data->numFrames + 1) % FRAME_DELAY;
+      bool     available  = entry.api.empty() || entry.gpuTimeProvider(i, queryFrame, entry.gpuTimes[queryFrame]);
+
+      if(available)
+      {
+        entry.cpuTime.add(entry.cpuTimes[queryFrame]);
+        entry.gpuTime.add(entry.gpuTimes[queryFrame]);
+        entry.numTimes++;
+      }
+    }
+
+    for(uint32_t i : m_data->singleSections)
+    {
+      Entry&   entry      = m_data->entries[i];
+      uint32_t queryFrame = entry.subFrame;
+
+      // query once
+      bool available = entry.cpuTime.numValid == 0
+                       && (entry.api.empty() || entry.gpuTimeProvider(i, queryFrame, entry.gpuTimes[queryFrame]));
+
+      if(available)
+      {
+        entry.cpuTime.add(entry.cpuTimes[queryFrame]);
+        entry.gpuTime.add(entry.gpuTimes[queryFrame]);
+        entry.numTimes++;
+      }
+    }
+
+    m_data->cpuTime.add(m_data->cpuCurrentTime);
+  }
+
+  m_data->numFrames++;
+}
+
+
+void Profiler::grow(uint32_t newsize)
+{
+  size_t oldsize = m_data->entries.size();
+
+  if(oldsize == newsize)
+  {
+    return;
+  }
+
+  m_data->entries.resize(newsize);
+
+  for(size_t i = oldsize; i < newsize; i++)
+  {
+    m_data->entries[i].cpuTime.init(m_data->numAveraging);
+    m_data->entries[i].gpuTime.init(m_data->numAveraging);
+  }
+}
+
+void Profiler::clear()
+{
+  m_data->entries.clear();
+  m_data->singleSections.clear();
+}
+
+void Profiler::reset(uint32_t delay)
+{
+  m_data->resetDelay = delay;
+}
+
+static std::string format(const char* msg, ...)
+{
+  std::size_t const STRING_BUFFER(8192);
+  char              text[STRING_BUFFER];
+  va_list           list;
+
+  if(msg == 0)
+    return std::string();
+
+  va_start(list, msg);
+  // vsnprintf bounds the write to the buffer size and always null-terminates
+  vsnprintf(text, sizeof(text), msg, list);
+  va_end(list);
+
+  return std::string(text);
+}
+
+bool Profiler::getTimerInfo(uint32_t i, TimerInfo& info)
+{
+  Entry& entry = m_data->entries[i];
+
+  if(!entry.numTimes || entry.accumulated)
+  {
+    return false;
+  }
+
+  info.gpu.average     = entry.gpuTime.getAveraged();
+  info.cpu.average     = entry.cpuTime.getAveraged();
+  info.cpu.absMinValue = entry.cpuTime.absMinValue;
+  info.cpu.absMaxValue = entry.cpuTime.absMaxValue;
+  info.gpu.absMinValue = entry.gpuTime.absMinValue;
+  info.gpu.absMaxValue = entry.gpuTime.absMaxValue;
+  bool found           = false;
+  for(uint32_t n = i + 1; n < m_data->numLastSections; n++)
+  {
+    Entry& otherentry = m_data->entries[n];
+    if(otherentry.name == entry.name && otherentry.level == entry.level && otherentry.api == entry.api && !otherentry.accumulated)
+    {
+      found = true;
+      info.gpu.average += otherentry.gpuTime.getAveraged();
+      info.cpu.average += otherentry.cpuTime.getAveraged();
+      info.cpu.absMinValue += otherentry.cpuTime.absMinValue;
+      info.cpu.absMaxValue += otherentry.cpuTime.absMaxValue;
+      info.gpu.absMinValue += otherentry.gpuTime.absMinValue;
+      info.gpu.absMaxValue += otherentry.gpuTime.absMaxValue;
+      otherentry.accumulated = true;
+    }
+
+    if(otherentry.splitter && otherentry.level <= entry.level)
+      break;
+  }
+
+  info.accumulated = found;
+  info.numAveraged = entry.cpuTime.numValid;
+
+  return true;
+}
+
+bool Profiler::getTimerInfo(const char* name, TimerInfo& info)
+{
+  if(name == nullptr)
+  {
+    info = TimerInfo();
+    if(!m_data->cpuTime.numValid)
+    {
+      return false;
+    }
+    info.cpu.average     = m_data->cpuTime.getAveraged();
+    info.cpu.absMaxValue = m_data->cpuTime.absMaxValue;
+    info.cpu.absMinValue = m_data->cpuTime.absMinValue;
+    info.numAveraged     = m_data->cpuTime.numValid;
+
+    return true;
+  }
+
+  for(uint32_t i = 0; i < m_data->numLastSections; i++)
+  {
+    Entry& entry = m_data->entries[i];
+
+    entry.accumulated = false;
+  }
+
+  for(uint32_t i = 0; i < (uint32_t)m_data->entries.size(); i++)
+  {
+    Entry& entry = m_data->entries[i];
+
+    if(entry.name.empty())
+      continue;
+
+    if(name != entry.name)
+      continue;
+
+    return getTimerInfo(i, info);
+  }
+
+  return false;
+}
+
+void Profiler::print(std::string& stats)
+{
+  stats.clear();
+
+  for(uint32_t i = 0; i < m_data->numLastSections; i++)
+  {
+    Entry& entry      = m_data->entries[i];
+    entry.accumulated = false;
+  }
+
+  printf("Timer null;\t N/A %6d; CPU %6d;\n", 0, (uint32_t)m_data->cpuTime.getAveraged());
+
+  for(uint32_t i = 0; i < m_data->numLastSections; i++)
+  {
+    static const char* spaces = "        ";  // 8
+    Entry&             entry  = m_data->entries[i];
+
+    if(entry.level == LEVEL_SINGLESHOT)
+      continue;
+
+    uint32_t level = 7 - (entry.level > 7 ? 7 : entry.level);
+
+    TimerInfo info;
+    if(!getTimerInfo(i, info))
+      continue;
+
+    const char* gpuname   = !entry.api.empty() ? entry.api.c_str() : "N/A";
+    const char* entryname = !entry.name.empty() ? entry.name.c_str() : "N/A";
+
+    if(info.accumulated)
+    {
+      stats += format("%sTimer %s;\t %s %6d; CPU %6d; (microseconds, accumulated loop)\n", &spaces[level], entryname,
+                      gpuname, (uint32_t)(info.gpu.average), (uint32_t)(info.cpu.average));
+    }
+    else
+    {
+      stats += format("%sTimer %s;\t %s %6d; CPU %6d; (microseconds, avg %d)\n", &spaces[level], entryname, gpuname,
+                      (uint32_t)(info.gpu.average), (uint32_t)(info.cpu.average), (uint32_t)entry.cpuTime.numValid);
+    }
+  }
+}
+
+uint32_t Profiler::getTotalFrames() const
+{
+  return m_data->numFrames;
+}
+
+void Profiler::accumulationSplit()
+{
+  SectionID sec = getSectionID(false, nullptr);
+  if(sec >= m_data->entries.size())
+  {
+    grow((uint32_t)(m_data->entries.size() * 2));
+  }
+
+  m_data->entries[sec].level    = m_data->level;
+  m_data->entries[sec].splitter = true;
+}
+
+Profiler::SectionID Profiler::getSectionID(bool singleShot, const char* name)
+{
+  uint32_t numEntries = (uint32_t)m_data->entries.size();
+
+  if(singleShot)
+  {
+    // find empty slot or with same name
+    for(uint32_t i = 0; i < numEntries; i++)
+    {
+      Entry& entry = m_data->entries[i];
+      if(entry.name == name || entry.name.empty())
+      {
+        m_data->singleSections.push_back(i);
+        return i;
+      }
+    }
+    m_data->singleSections.push_back(numEntries);
+    return numEntries;
+  }
+  else
+  {
+    // find non-single shot slot
+    while(m_data->nextSection < numEntries && m_data->entries[m_data->nextSection].level == LEVEL_SINGLESHOT)
+    {
+      m_data->nextSection++;
+    }
+
+    m_data->frameSections.push_back(m_data->nextSection);
+    return m_data->nextSection++;
+  }
+}
+
+Profiler::SectionID Profiler::beginSection(const char* name, const char* api, gpuTimeProvider_fn gpuTimeProvider, bool singleShot)
+{
+  uint32_t  subFrame = m_data->numFrames % FRAME_DELAY;
+  SectionID sec      = getSectionID(singleShot, name);
+
+  if(sec >= m_data->entries.size())
+  {
+    grow((uint32_t)(m_data->entries.size() * 2));
+  }
+
+  Entry&   entry = m_data->entries[sec];
+  uint32_t level = singleShot ? LEVEL_SINGLESHOT : (m_data->level++);
+
+  const std::string name_str = (name ? name : "");
+  const std::string api_str  = (api ? api : "");
+  if(entry.name != name_str || entry.api != api_str || entry.level != level)
+  {
+    entry.name = name_str;
+    entry.api  = api_str;
+
+    if(!singleShot)
+    {
+      m_data->resetDelay = CONFIG_DELAY;
+    }
+  }
+
+  entry.subFrame        = subFrame;
+  entry.level           = level;
+  entry.splitter        = false;
+  entry.gpuTimeProvider = gpuTimeProvider;
+
+#ifdef NVP_SUPPORTS_NVTOOLSEXT
+  {
+    nvtxEventAttributes_t eventAttrib = {0};
+    eventAttrib.version               = NVTX_VERSION;
+    eventAttrib.size                  = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
+    eventAttrib.colorType             = NVTX_COLOR_ARGB;
+
+    unsigned char color[4];
+    color[0] = 255;
+    color[1] = 0;
+    color[2] = sec % 2 ? 127 : 255;
+    color[3] = 255;
+
+    color[2] -= level * 16;
+    color[3] -= level * 16;
+
+    eventAttrib.color         = *(uint32_t*)(color);
+    eventAttrib.messageType   = NVTX_MESSAGE_TYPE_ASCII;
+    eventAttrib.message.ascii = name;
+    nvtxRangePushEx(&eventAttrib);
+  }
+#endif
+
+  entry.cpuTimes[subFrame] = -getMicroSeconds();
+  entry.gpuTimes[subFrame] = 0;
+
+  if(singleShot)
+  {
+    entry.cpuTime.init(1);
+    entry.gpuTime.init(1);
+  }
+
+  return sec;
+}
+
+void Profiler::endSection(SectionID sec)
+{
+  Entry& entry = m_data->entries[sec];
+
+  entry.cpuTimes[entry.subFrame] += getMicroSeconds();
+
+#ifdef NVP_SUPPORTS_NVTOOLSEXT
+  nvtxRangePop();
+#endif
+
+  if(entry.level != LEVEL_SINGLESHOT)
+  {
+    m_data->level--;
+  }
+}
+
+Profiler::Clock::Clock()
+{
+  m_init = std::chrono::high_resolution_clock::now();
+}
+
+double Profiler::Clock::getMicroSeconds() const
+{
+  return double(std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - m_init).count())
+         / double(1000);
+}
+}  // namespace nvh
--- a/raytracer/nvpro_core/nvh/profiler.hpp
+++ b/raytracer/nvpro_core/nvh/profiler.hpp
@ -0,0 +1,369 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifndef NV_PROFILER_INCLUDED
+#define NV_PROFILER_INCLUDED
+
+
+#include <algorithm>
+#include <chrono>
+#include <float.h>  // DBL_MAX
+#include <functional>
+#include <memory>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>  //memset
+#include <string>
+#include <vector>
+
+#ifdef NVP_SUPPORTS_NVTOOLSEXT
+#define NVTX_STDINT_TYPES_ALREADY_DEFINED
+#include <nvtx3/nvToolsExt.h>
+#endif
+
+namespace nvh {
+
+//////////////////////////////////////////////////////////////////////////
+/** @DOC_START
+    # class nvh::Profiler
+
+    > The nvh::Profiler class is designed to measure timed sections.
+
+    Each section has a CPU and a GPU time. GPU times are typically provided
+    by derived classes for each individual API (e.g. OpenGL, Vulkan etc.).
+    
+    There is functionality to pretty print the sections with their nesting level.
+    Multiple profilers can reference the same database, so one profiler
+    can serve as the master that the others contribute to. Typically the
+    base class measuring only CPU time could be the master, and the API-specific
+    derived classes reference it to share the same database.
+
+    Profiler::Clock can be used standalone for time measuring.
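+
+    A minimal standalone sketch of the same idea, assuming nothing beyond
+    the C++ standard library: `Stopwatch` is a hypothetical stand-in for
+    Profiler::Clock (it is not part of nvh) and, like the clock above,
+    reports the microseconds elapsed since construction.
+
+    ```cpp
+    #include <chrono>
+
+    // Hypothetical stand-in for nvh::Profiler::Clock, for illustration only.
+    class Stopwatch
+    {
+    public:
+      Stopwatch()
+          : m_init(std::chrono::steady_clock::now())
+      {
+      }
+
+      // Microseconds elapsed since construction, including the fractional part.
+      double getMicroSeconds() const
+      {
+        const auto elapsed = std::chrono::steady_clock::now() - m_init;
+        return std::chrono::duration<double, std::micro>(elapsed).count();
+      }
+
+    private:
+      std::chrono::steady_clock::time_point m_init;
+    };
+    ```
+
+    The sketch uses steady_clock because it is guaranteed to be monotonic;
+    Profiler::Clock itself is built on high_resolution_clock.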
+@DOC_END  */
+
+class Profiler
+{
+public:
+  /// if we detect a change in timers (api/name change), we trigger a reset after this many frames
+  static const uint32_t CONFIG_DELAY = 8;
+  /// gpu times are queried after this many frames
+  static const uint32_t FRAME_DELAY = 4;
+  /// by default we start with space for this many begin/end sections per frame
+  static const uint32_t START_SECTIONS = 64;
+  /// cyclic window for averaging
+  static const uint32_t MAX_NUM_AVERAGE = 128;
+
+public:
+  typedef uint32_t SectionID;
+  typedef uint32_t OnceID;
+
+  class Clock
+  {
+    // generic utility class for measuring time
+    // uses high resolution timer provided by OS
+  public:
+    Clock();
+    double getMicroSeconds() const;
+
+  private:
+    std::chrono::time_point<std::chrono::high_resolution_clock> m_init;
+  };
+
+  //////////////////////////////////////////////////////////////////////////
+
+  // utility class for automatic calling of begin/end within a local scope
+  class Section
+  {
+  public:
+    Section(Profiler& profiler, const char* name, bool singleShot = false)
+        : m_profiler(profiler)
+    {
+      m_id = profiler.beginSection(name, nullptr, nullptr, singleShot);
+    }
+    ~Section() { m_profiler.endSection(m_id); }
+
+  private:
+    SectionID m_id;
+    Profiler& m_profiler;
+  };
+
+  // recurring, must be within beginFrame/endFrame
+  Section timeRecurring(const char* name) { return Section(*this, name, false); }
+
+  // single shot, results are available after FRAME_DELAY calls to endFrame
+  Section timeSingle(const char* name) { return Section(*this, name, true); }
+
+  //////////////////////////////////////////////////////////////////////////
+
+  // num <= MAX_NUM_AVERAGE
+  void setAveragingSize(uint32_t num);
+
+  //////////////////////////////////////////////////////////////////////////
+
+  // gpu times for a section are queried at "endFrame" with the use of this optional function.
+  // It returns true if the queried result was available, and writes the microseconds into gpuTime.
+  typedef std::function<bool(SectionID, uint32_t subFrame, double& gpuTime)> gpuTimeProvider_fn;
+
+  // must be called every frame
+  void beginFrame();
+  void endFrame();
+
+  // there are two types of sections
+  //  singleShot = true:  the timer can exist outside begin/endFrame and is non-recurring;
+  //                      results of a previous singleShot with the same name are overwritten.
+  //  singleShot = false: sections can be nested, but must be within begin/endFrame
+  //
+
+  SectionID beginSection(const char* name, const char* api = nullptr, gpuTimeProvider_fn gpuTimeProvider = nullptr, bool singleShot = false);
+  void endSection(SectionID slot);
+
+  // When a section is used within a loop (same nesting level), and the same arguments for name and api are
+  // passed, we normally average the results of those sections together when printing the stats or using the
+  // getAveraged functions below.
+  // Calling the splitter (outside of a section) means we insert a split point that the averaging will not
+  // pass.
+  void accumulationSplit();
+
+
+  inline double getMicroSeconds() const { return m_clock.getMicroSeconds(); }
+
+  //////////////////////////////////////////////////////////////////////////
+
+  // resets all stats
+  void clear();
+
+  // resets recurring sections
+  // useful when averaging should restart after a few frames (warm-up of caches, hiding early heavier
+  // frames after configuration changes)
+  // implicit resets are triggered if the frame's configuration of timer sections changes compared to
+  // the previous frame.
+  void reset(uint32_t delay = CONFIG_DELAY);
+
+  // pretty print current averaged timers
+  void print(std::string& stats);
+
+  // returns number of frames since reset
+  uint32_t getTotalFrames() const;
+
+  struct TimerStats
+  {
+    // time in microseconds
+    double average     = 0;
+    double absMinValue = DBL_MAX;
+    double absMaxValue = 0;
+  };
+
+  struct TimerInfo
+  {
+    // number of averaged values, <= MAX_NUM_AVERAGE
+    uint32_t numAveraged = 0;
+
+    // accumulation happens for example in loops:
+    //   for (..) { auto scopeTimer = timeRecurring("blah"); ... }
+    // then the reported values are the accumulated sum of all those timers.
+    bool accumulated = false;
+
+    TimerStats cpu;
+    TimerStats gpu;
+  };
+
+  // query functions for current gathered cyclic averages ( <= MAX_NUM_AVERAGE)
+  // use nullptr name to get the cpu timing of the outermost scope (beginFrame/endFrame)
+  // returns true if the timer was found and had valid values
+  bool getTimerInfo(const char* name, TimerInfo& info);
+
+  // simplified wrapper
+  bool getAveragedValues(const char* name, double& cpuTime, double& gpuTime)
+  {
+    TimerInfo info;
+
+    if(getTimerInfo(name, info))
+    {
+      cpuTime = info.cpu.average;
+      gpuTime = info.gpu.average;
+      return true;
+    }
+    else
+    {
+      cpuTime = 0;
+      gpuTime = 0;
+      return false;
+    }
+  }
+
+  //////////////////////////////////////////////////////////////////////////
+
+  // if a master is provided we use its database
+  // otherwise our own
+  Profiler(Profiler* master = nullptr);
+
+  Profiler(uint32_t startSections);
+
+protected:
+  //////////////////////////////////////////////////////////////////////////
+
+  // Utility functions for derived classes that provide gpu times.
+  // We assume most apis use a big pool of api-specific events/timers,
+  // the functions below help manage such pool.
+
+  inline uint32_t getSubFrame(SectionID slot) const { return m_data->entries[slot].subFrame; }
+  inline uint32_t getRequiredTimers() const { return (uint32_t)(m_data->entries.size() * FRAME_DELAY * 2); }
+
+  static inline uint32_t getTimerIdx(SectionID slot, uint32_t subFrame, bool begin)
+  {
+    // must not change order of begin/end
+    return ((slot * FRAME_DELAY) + subFrame) * 2 + (begin ? 0 : 1);
+  }
+
+  inline bool isSectionRecurring(SectionID slot) const { return m_data->entries[slot].level != LEVEL_SINGLESHOT; }
+
+protected:
+  //////////////////////////////////////////////////////////////////////////
+
+  static const uint32_t LEVEL_SINGLESHOT = ~0;
+
+  struct TimeValues
+  {
+    double times[MAX_NUM_AVERAGE] = {0};
+    double valueTotal             = 0;
+    double absMinValue            = DBL_MAX;
+    double absMaxValue            = 0;
+
+    uint32_t index    = 0;
+    uint32_t numCycle = MAX_NUM_AVERAGE;
+    uint32_t numValid = 0;
+
+    TimeValues(uint32_t cycleSize = MAX_NUM_AVERAGE) { init(cycleSize); }
+
+    void init(uint32_t cycleSize)
+    {
+      numCycle = std::min(cycleSize, MAX_NUM_AVERAGE);
+      reset();
+    }
+
+    void reset()
+    {
+      valueTotal  = 0;
+      absMinValue = DBL_MAX;
+      absMaxValue = 0;
+      index       = 0;
+      numValid    = 0;
+      memset(times, 0, sizeof(times));
+    }
+
+    void add(double time)
+    {
+      valueTotal += time - times[index];
+      times[index] = time;
+
+      index    = (index + 1) % numCycle;
+      numValid = std::min(numValid + 1, numCycle);
+
+      absMinValue = std::min(time, absMinValue);
+      absMaxValue = std::max(time, absMaxValue);
+    }
+
+    double getAveraged()
+    {
+      if(numValid)
+      {
+        return valueTotal / double(numValid);
+      }
+      else
+      {
+        return 0;
+      }
+    }
+  };
+
+  struct Entry
+  {
+    std::string        name            = {};
+    std::string        api             = {};
+    gpuTimeProvider_fn gpuTimeProvider = nullptr;
+
+    // level == ~0 used for "singleShot"
+    uint32_t level    = 0;
+    uint32_t subFrame = 0;
+
+#ifdef NVP_SUPPORTS_NVTOOLSEXT
+    nvtxRangeId_t m_nvrange;
+#endif
+    double cpuTimes[FRAME_DELAY] = {0};
+    double gpuTimes[FRAME_DELAY] = {0};
+
+    // number of times summed since last reset
+    uint32_t numTimes = 0;
+
+    TimeValues gpuTime;
+    TimeValues cpuTime;
+
+    // splitter is used to prevent accumulated case below
+    // when same depth level is used
+    // {section("BLAH"); ... }
+    // splitter
+    // {section("BLAH"); ...}
+    // now the result of "BLAH" is not accumulated
+
+    bool splitter = false;
+
+    // if the same timer name is used within a loop (same
+    // depth level), e.g.:
+    //
+    // for () { section("BLAH"); ... }
+    //
+    // we accumulate the timing values of all of them
+
+    bool accumulated = false;
+  };
+
+  struct Data
+  {
+    uint32_t numAveraging = MAX_NUM_AVERAGE;
+    uint32_t resetDelay   = 0;
+    uint32_t numFrames    = 0;
+
+    uint32_t level       = 0;
+    uint32_t nextSection = 0;
+
+    uint32_t numLastSections = 0;
+    uint32_t numLastEntries  = 0;
+
+    std::vector<uint32_t> frameSections;
+    std::vector<uint32_t> singleSections;
+
+    double     cpuCurrentTime = 0;
+    TimeValues cpuTime;
+
+    std::vector<Entry> entries;
+  };
+
+
+  std::shared_ptr<Data> m_data = nullptr;
+  Clock                 m_clock;
+
+  SectionID getSectionID(bool singleShot, const char* name);
+
+  bool getTimerInfo(uint32_t i, TimerInfo& info);
+  void grow(uint32_t newsize);
+};
+}  // namespace nvh
+
+#endif
--- a/raytracer/nvpro_core/nvh/radixsort.hpp
+++ b/raytracer/nvpro_core/nvh/radixsort.hpp
@ -0,0 +1,108 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifndef NV_RADIXSORT_INCLUDED
+#define NV_RADIXSORT_INCLUDED
+
+#include <assert.h>  // assert
+#include <stdint.h>  // uint8_t, uint32_t
+
+namespace nvh {
+
+/** @DOC_START
+      # function nvh::radixsort
+
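+      The radixsort function sorts the provided keys based on
+      BYTES many bytes stored inside TKey starting at BYTEOFFSET.
+      The bytes are processed in memory order, lowest address first, so
+      multi-byte integer keys sort numerically only on little-endian hosts.
+      The sorting result is returned as indices into the keys array;
+      indicesIn must provide the numIndices indices to be sorted.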
+      
+      For example:
+      
+      ```cpp
+      struct MyData {
+        uint32_t objectIdentifier;
+        uint16_t objectSortKey;
+      };
+      
+      
+      // 4-byte offset of objectSortKey within MyData
+      // 2-byte size of sorting key
+      
+      result = radixsort<4,2>(numIndices, keys, indicesIn, indicesTemp);
+      
+      // after sorting the following is true (equal keys keep their input order)
+      
+      keys[result[i]].objectSortKey <= keys[result[i + 1]].objectSortKey
+
+      // result can point either to indicesIn or indicesTemp (we swap the arrays
+      // after each byte iteration)
+      ```
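+
+      The individual passes can also be followed on a reduced, self-contained
+      sketch. `sortIndicesByKey16` below is a hypothetical helper (not part
+      of nvh) that hard-codes a 16-bit key and standard containers; it only
+      illustrates the histogram, prefix-sum and scatter steps and is not
+      meant to replace the template.
+
+      ```cpp
+      #include <array>
+      #include <cstdint>
+      #include <utility>
+      #include <vector>
+
+      // Returns indices into `keys`, ordered by ascending key value.
+      // The least-significant byte is processed first; each pass is stable.
+      inline std::vector<uint32_t> sortIndicesByKey16(const std::vector<uint16_t>& keys)
+      {
+        std::vector<uint32_t> in(keys.size());
+        std::vector<uint32_t> out(keys.size());
+        for(uint32_t i = 0; i < in.size(); i++)
+          in[i] = i;
+
+        for(uint32_t pass = 0; pass < 2; pass++)
+        {
+          const uint32_t shift = pass * 8;
+
+          // 1. histogram: count how often each byte value occurs
+          std::array<uint32_t, 256> bins{};
+          for(uint32_t idx : in)
+            bins[(keys[idx] >> shift) & 0xFF]++;
+
+          // 2. exclusive prefix sum: turn the counts into start offsets
+          uint32_t offset = 0;
+          for(uint32_t& bin : bins)
+          {
+            const uint32_t count = bin;
+            bin                  = offset;
+            offset += count;
+          }
+
+          // 3. scatter: place each index at the next free slot of its bin
+          for(uint32_t idx : in)
+            out[bins[(keys[idx] >> shift) & 0xFF]++] = idx;
+
+          std::swap(in, out);
+        }
+        return in;
+      }
+      ```
+
+      Extracting the byte with a shift keeps the sketch independent of the
+      host byte order, whereas radixsort reads the key bytes from memory.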
+@DOC_END    */
+
+template <uint32_t BYTEOFFSET, uint32_t BYTES, typename TKey>
+uint32_t* radixsort(uint32_t numIndices, const TKey* keys, uint32_t* indicesIn, uint32_t* indicesTemp)
+{
+  uint32_t histogram[BYTES][256] = {0};
+
+  for(uint32_t i = 0; i < numIndices; i++)
+  {
+    uint32_t       idx   = indicesIn[i];
+    const uint8_t* bytes = (const uint8_t*)&keys[idx];
+    for(uint32_t p = 0; p < BYTES; p++)
+    {
+      uint8_t curbyte = bytes[BYTEOFFSET + p];
+      histogram[p][curbyte]++;
+    }
+  }
+
+  uint32_t* tempIn  = indicesIn;
+  uint32_t* tempOut = indicesTemp;
+
+  for(uint32_t p = 0; p < BYTES; p++)
+  {
+    uint32_t offset = 0;
+    for(int32_t i = 0; i < 256; i++)
+    {
+      uint32_t numBin = histogram[p][i];
+      histogram[p][i] = offset;
+      offset += numBin;
+    }
+
+    for(uint32_t i = 0; i < numIndices; i++)
+    {
+      uint32_t       idx     = tempIn[i];
+      const uint8_t* bytes   = (const uint8_t*)&keys[idx];
+      uint8_t        curbyte = bytes[BYTEOFFSET + p];
+      uint32_t       pos     = histogram[p][curbyte]++;
+      tempOut[pos]           = idx;
+    }
+
+    assert(histogram[p][255] == offset);
+
+    // swap
+    uint32_t* temp = tempIn;
+    tempIn         = tempOut;
+    tempOut        = temp;
+  }
+
+  // post swap tempIn is last tempOut
+  return tempIn;
+}
+
+}  // namespace nvh
+
+#endif
--- a/raytracer/nvpro_core/nvh/shaderfilemanager.cpp
+++ b/raytracer/nvpro_core/nvh/shaderfilemanager.cpp
@ -0,0 +1,320 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+/*
+ * This file contains code derived from glf by Christophe Riccio, www.g-truc.net
+ * Copyright (c) 2005 - 2015 G-Truc Creation (www.g-truc.net)
+ * https://github.com/g-truc/ogl-samples/blob/master/framework/compiler.cpp
+ */
+
+#include "shaderfilemanager.hpp"
+#include <algorithm>
+#include <assert.h>
+#include <fstream>
+#include <iostream>
+#include <sstream>
+#include <stdarg.h>
+#include <stdio.h>
+
+#include "fileoperations.hpp"
+
+
+namespace nvh {
+
+std::string ShaderFileManager::format(const char* msg, ...)
+{
+  char    text[8192];
+  va_list list;
+
+  if(msg == 0)
+    return std::string();
+
+  va_start(list, msg);
+  vsnprintf(text, sizeof(text), msg, list);
+  va_end(list);
+
+  return std::string(text);
+}
+
+std::string ShaderFileManager::markerString(int line, std::string const& filename, int fileid)
+{
+  if(m_supportsExtendedInclude || m_forceLineFilenames)
+  {
+#if defined(_WIN32) && 1
+    std::string fixedname;
+    for(size_t i = 0; i < filename.size(); i++)
+    {
+      char c = filename[i];
+      if(c == '/' || c == '\\')
+      {
+        fixedname.append("\\\\");
+      }
+      else
+      {
+        fixedname.append(1, c);
+      }
+    }
+#else
+    std::string fixedname = filename;
+#endif
+    return ShaderFileManager::format("#line %d \"", line) + fixedname + std::string("\"\n");
+  }
+  else
+  {
+    return ShaderFileManager::format("#line %d %d\n", line, fileid);
+  }
+}
+
+std::string ShaderFileManager::getIncludeContent(IncludeID idx, std::string& filename)
+{
+  IncludeEntry& entry = m_includes[idx];
+
+  filename = entry.filename;
+
+  if(m_forceIncludeContent)
+  {
+    return entry.content;
+  }
+
+  if(!entry.content.empty() && !findFile(entry.filename, m_directories).empty())
+  {
+    return entry.content;
+  }
+
+  std::string content = loadFile(entry.filename, false, m_directories, filename, true);
+  return content.empty() ? entry.content : content;
+}
+
+std::string ShaderFileManager::getContent(std::string const& filename, std::string& filenameFound)
+{
+  if(filename.empty())
+  {
+    return std::string();
+  }
+
+  IncludeID idx = findInclude(filename);
+
+  if(idx.isValid())
+  {
+    return getIncludeContent(idx, filenameFound);
+  }
+
+  // fall back
+  filenameFound = filename;
+  return loadFile(filename, false, m_directories, filenameFound, true);
+}
+
+std::string ShaderFileManager::getContentWithRequestingSourceDirectory(std::string const& filename,
+                                                                       std::string&       filenameFound,
+                                                                       std::string const& requestingSource)
+{
+  if(filename.empty())
+  {
+    return std::string();
+  }
+
+  IncludeID idx = findInclude(filename);
+
+  if(idx.isValid())
+  {
+    return getIncludeContent(idx, filenameFound);
+  }
+
+  // fall back; check requestingSource's directory first.
+  filenameFound = filename;
+  m_extendedDirectories.resize(m_directories.size() + 1);
+  m_extendedDirectories[0] = getDirectoryComponent(requestingSource);
+  for(size_t i = 0; i < m_directories.size(); ++i)
+  {
+    m_extendedDirectories[i + 1] = m_directories[i];
+  }
+  return loadFile(filename, false, m_extendedDirectories, filenameFound, true);
+}
+
+std::string ShaderFileManager::getDirectoryComponent(std::string filename)
+{
+  while(!filename.empty())
+  {
+    auto popped = filename.back();
+    filename.pop_back();
+    switch(popped)
+    {
+      case '/':
+        goto exitLoop;
+#if defined(_WIN32)
+      case '\\':
+        goto exitLoop;
+#endif
+    }
+  }
+exitLoop:
+  if(filename.empty())
+    filename.push_back('.');
+  return filename;
+}
+
+std::string ShaderFileManager::manualInclude(std::string const& filename, std::string& filenameFound, std::string const& prepend, bool foundVersion)
+{
+  std::string source = getContent(filename, filenameFound);
+  return manualIncludeText(source, filenameFound, prepend, foundVersion);
+}
+
+std::string ShaderFileManager::manualIncludeText(std::string const& sourceText,
+                                                 std::string const& textFilename,
+                                                 std::string const& prepend,
+                                                 bool               foundVersion)
+{
+  if(sourceText.empty())
+  {
+    return std::string();
+  }
+
+  std::stringstream stream;
+  stream << sourceText;
+  std::string line, text;
+
+  // Handle command line defines
+  text += prepend;
+  if(m_lineMarkers)
+  {
+    text += markerString(1, textFilename, 0);
+  }
+
+  int lineCount = 0;
+  while(std::getline(stream, line))
+  {
+    std::size_t offset = 0;
+    lineCount++;
+
+    // Version
+    offset = line.find("#version");
+    if(offset != std::string::npos)
+    {
+      std::size_t commentOffset = line.find("//");
+      if(commentOffset != std::string::npos && commentOffset < offset)
+        continue;
+
+      if(foundVersion)
+      {
+        // someone else already set the version, so just comment out
+        text += std::string("//") + line + std::string("\n");
+      }
+      else
+      {
+        // Reorder so that the #version line is always the first of a shader text
+        text         = line + std::string("\n") + text + std::string("//") + line + std::string("\n");
+        foundVersion = true;
+      }
+      continue;
+    }
+
+    // Handle replacing #include with text if configured to do so.
+    // Otherwise just insert the #include command verbatim, for shaderc to handle.
+    if(m_handleIncludePasting)
+    {
+      offset = line.find("#include");
+      if(offset != std::string::npos)
+      {
+        std::size_t commentOffset = line.find("//");
+        if(commentOffset != std::string::npos && commentOffset < offset)
+          continue;
+
+        size_t firstQuote  = line.find("\"", offset);
+        size_t secondQuote = line.find("\"", firstQuote + 1);
+
+        std::string include = line.substr(firstQuote + 1, secondQuote - firstQuote - 1);
+
+        std::string includeFound;
+        std::string includeContent = manualInclude(include, includeFound, std::string(), foundVersion);
+
+        if(!includeContent.empty())
+        {
+          text += includeContent;
+          if(m_lineMarkers)
+          {
+            text += std::string("\n") + markerString(lineCount + 1, textFilename, 0);
+          }
+        }
+        continue;  // Skip adding the original #include line.
+      }
+    }
+
+    text += line + "\n";
+  }
+
+  return text;
+}
+
+
+ShaderFileManager::IncludeID ShaderFileManager::registerInclude(std::string const& name, std::string const& filename, std::string const& content)
+{
+  // find if already registered
+  for(size_t i = 0; i < m_includes.size(); i++)
+  {
+    if(m_includes[i].name == name)
+    {
+      m_includes[i].content = content;
+      return i;
+    }
+  }
+
+  IncludeEntry entry;
+  entry.name     = name;
+  entry.filename = filename.empty() ? name : filename;
+  entry.content  = content;
+
+  m_includes.push_back(entry);
+
+  return m_includes.size() - 1;
+}
+
+
+ShaderFileManager::IncludeID ShaderFileManager::findInclude(std::string const& name) const
+{
+  // check registered includes first
+  for(std::size_t i = 0; i < m_includes.size(); ++i)
+  {
+    if(m_includes[i].name == name)
+    {
+      return IncludeID(i);
+    }
+  }
+
+  return IncludeID();
+}
+
+bool ShaderFileManager::loadIncludeContent(IncludeID idx)
+{
+  std::string filenameFound;
+  m_includes[idx].content = getIncludeContent(idx, filenameFound);
+  return !m_includes[idx].content.empty();
+}
+
+const ShaderFileManager::IncludeEntry& ShaderFileManager::getIncludeEntry(IncludeID idx) const
+{
+  return m_includes[idx];
+}
+
+std::string ShaderFileManager::getProcessedContent(std::string const& filename, std::string& filenameFound)
+{
+  return manualInclude(filename, filenameFound, "", false);
+}
+
+}  // namespace nvh
--- a/raytracer/nvpro_core/nvh/shaderfilemanager.hpp
+++ b/raytracer/nvpro_core/nvh/shaderfilemanager.hpp
@ -0,0 +1,204 @@
+/*
+ * Copyright (c) 2014-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#ifndef NV_SHADERFILEMANAGER_INCLUDED
+#define NV_SHADERFILEMANAGER_INCLUDED
+
+
+#include <stdint.h>
+#include <stdio.h>
+#include <string>
+#include <vector>
+
+namespace nvh {
+
+class ShaderFileManager
+{
+
+  //////////////////////////////////////////////////////////////////////////
+  /** @DOC_START
+    # class nvh::ShaderFileManager
+
+    The nvh::ShaderFileManager class is meant to be derived from to create the actual api-specific
+    shader/program managers.
+
+    The ShaderFileManager provides a system to find/load shader files.
+    It also allows resolving #include instructions in HLSL/GLSL source files.
+    Includes can be registered beforehand and may point to strings in memory.
+
+    If m_handleIncludePasting is true, then `#include`s are replaced by
+    the include file contents (recursively) before presenting the
+    loaded shader source code to the caller. Otherwise, the include file
+    loader is still available but `#include`s are left unchanged.
+
+    Furthermore it handles injecting prepended strings (typically used
+    for #defines) after the #version statement of GLSL files,
+    regardless of m_handleIncludePasting's value.
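+
+    The include pasting can be pictured with a reduced, self-contained
+    sketch. `IncludeMap` and `pasteIncludes` are hypothetical names (not
+    part of nvh); the sketch assumes that includes live only in memory and
+    leaves out line markers, #version handling and file lookup.
+
+    ```cpp
+    #include <cstddef>
+    #include <map>
+    #include <sstream>
+    #include <string>
+
+    // Maps the name used after #include to the text that replaces it.
+    using IncludeMap = std::map<std::string, std::string>;
+
+    // Returns `source` with every `#include "name"` line replaced by the
+    // registered text, recursively. Lines with unknown names are kept as-is.
+    inline std::string pasteIncludes(const std::string& source, const IncludeMap& includes, int depth = 0)
+    {
+      const std::size_t npos = std::string::npos;
+
+      std::istringstream stream(source);
+      std::string        line, text;
+      while(std::getline(stream, line))
+      {
+        const std::size_t directive   = line.find("#include");
+        const std::size_t firstQuote  = (directive == npos) ? npos : line.find('"', directive);
+        const std::size_t secondQuote = (firstQuote == npos) ? npos : line.find('"', firstQuote + 1);
+
+        if(secondQuote != npos && depth < 16)  // depth cap guards against include cycles
+        {
+          const auto it = includes.find(line.substr(firstQuote + 1, secondQuote - firstQuote - 1));
+          if(it != includes.end())
+          {
+            text += pasteIncludes(it->second, includes, depth + 1);
+            continue;  // skip the original #include line
+          }
+        }
+        text += line + "\n";
+      }
+      return text;
+    }
+    ```
+
+    Keeping unknown includes verbatim is a choice of this sketch, so that a
+    later compiler stage can still report or resolve them.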
+
+@DOC_END  */
+
+public:
+  enum FileType
+  {
+    FILETYPE_DEFAULT,
+    FILETYPE_GLSL,
+    FILETYPE_HLSL,
+    FILETYPE_SPIRV,
+  };
+
+  struct IncludeEntry
+  {
+    std::string name;
+    std::string filename;
+    std::string content;
+  };
+
+  typedef std::vector<IncludeEntry> IncludeRegistry;
+
+  static std::string format(const char* msg, ...);
+
+public:
+  class IncludeID
+  {
+  public:
+    size_t m_value;
+
+    IncludeID()
+        : m_value(size_t(~0))
+    {
+    }
+
+    IncludeID(size_t b)
+        : m_value(b)
+    {
+    }
+
+    IncludeID& operator=(size_t b)
+    {
+      m_value = b;
+      return *this;
+    }
+
+    bool isValid() const { return m_value != size_t(~0); }
+
+    operator bool() const { return isValid(); }
+    operator size_t() const { return m_value; }
+
+    friend bool operator==(const IncludeID& lhs, const IncludeID& rhs) { return rhs.m_value == lhs.m_value; }
+  };
+
+  struct Definition
+  {
+    Definition() {}
+    Definition(uint32_t type, std::string const& prepend, std::string const& filename)
+        : type(type)
+        , filename(filename)
+        , prepend(prepend)
+    {
+    }
+    Definition(uint32_t type, std::string const& filename)
+        : type(type)
+        , filename(filename)
+    {
+    }
+
+    uint32_t    type = 0;
+    std::string filename;
+    std::string prepend;
+    std::string entry    = "main";
+    FileType    filetype = FILETYPE_DEFAULT;
+    std::string filenameFound;
+    std::string content;
+  };
+
+
+  // optionally register files to be included, optionally provide content directly rather than from disk
+  //
+  // name     = name used within shader files
+  // diskname = filename on disk (defaults to name if not set)
+  // content  = provide content as string rather than loading from disk
+
+  IncludeID registerInclude(std::string const& name,
+                            std::string const& diskname = std::string(),
+                            std::string const& content  = std::string());
+
+  // Use m_prepend to pass global #defines
+  // Derived api classes will use this as global prepend to the per-definition prepends in combination
+  // with the source files
+  // actualSource = m_prepend + definition.prepend + definition.content
+  std::string m_prepend;
+
+  // per file state, used when FILETYPE_DEFAULT is provided in the Definition
+  FileType m_filetype;
+
+  // add search directories
+  void addDirectory(const std::string& dir) { m_directories.push_back(dir); }
+
+  ShaderFileManager(bool handleIncludePasting = true)
+      : m_filetype(FILETYPE_GLSL)
+      , m_lineMarkers(true)
+      , m_forceLineFilenames(false)
+      , m_forceIncludeContent(false)
+      , m_supportsExtendedInclude(false)
+      , m_handleIncludePasting(handleIncludePasting)
+  {
+    m_directories.push_back(".");
+  }
+
+  //////////////////////////////////////////////////////////////////////////
+
+  // in rare cases you may want to access the included content in detail yourself
+
+  IncludeID           findInclude(std::string const& name) const;
+  bool                loadIncludeContent(IncludeID);
+  const IncludeEntry& getIncludeEntry(IncludeID idx) const;
+
+  std::string getProcessedContent(std::string const& filename, std::string& filenameFound);
+
+protected:
+  std::string markerString(int line, std::string const& filename, int fileid);
+  std::string getIncludeContent(IncludeID idx, std::string& filenameFound);
+  std::string getContent(std::string const& filename, std::string& filenameFound);
+  std::string getContentWithRequestingSourceDirectory(std::string const& filename,
+                                                      std::string&       filenameFound,
+                                                      std::string const& requestingSource);
+
+  static std::string getDirectoryComponent(std::string filename);
+
+  std::string manualInclude(std::string const& filename, std::string& filenameFound, std::string const& prepend, bool foundVersion);
+  std::string manualIncludeText(std::string const& sourceText, std::string const& textFilename, std::string const& prepend, bool foundVersion);
+
+  bool m_lineMarkers;
+  bool m_forceLineFilenames;
+  bool m_forceIncludeContent;
+  bool m_supportsExtendedInclude;
+  bool m_handleIncludePasting;
+
+  std::vector<std::string> m_directories;
+  IncludeRegistry          m_includes;
+
+  // Used as temporary storage in getContentWithRequestingSourceDirectory; saves on dynamic allocation.
+  std::vector<std::string> m_extendedDirectories;
+};
+
+}  // namespace nvh
+
+
+#endif  //NV_PROGRAM_INCLUDED
--- a/raytracer/nvpro_core/nvh/threading.hpp
+++ b/raytracer/nvpro_core/nvh/threading.hpp
@ -0,0 +1,172 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2014-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+#pragma once
+
+#include <chrono>
+#include <condition_variable>
+#include <memory>
+#include <mutex>
+#include <thread>
+#include <type_traits>
+#include <utility>
+
+namespace nvh {
+
+using DefaultDelayClock    = std::chrono::steady_clock;
+using DefaultDelayDuration = std::chrono::nanoseconds;
+
+/** @DOC_START
+ # class nvh::delayed_call 
+ Class returned by delay_noreturn_for to track the thread created and possibly reset the
+ delay timer.
+@DOC_END */
+template <class Clock = DefaultDelayClock, class Duration = std::chrono::duration<double>>
+class delayed_call
+{
+  template <class ClockT, class DurationT, class Function, class... Args>
+  friend delayed_call<ClockT, DurationT> delay_noreturn_for(const DurationT& sleep_duration, Function&& f, Args&&... args);
+
+public:
+  /** Update the thread to make the call sleep_duration from now
+     * 
+     * \return True if the delay was updated before the callback was called. False otherwise.
+     */
+  bool delay_for(const Duration& sleep_duration)
+  {
+    bool result = false;
+    if(m_delay)
+    {
+      std::lock_guard<std::mutex> lock(m_delay->mutex);
+      if(!m_delay->started)
+      {
+        auto prevUntil = m_delay->until;
+        m_delay->until = Clock::now() + sleep_duration;
+
+        // No need to wake up the other thread if the delay is longer. It'll keep looping while dirty is set.
+        if(prevUntil < m_delay->until)
+          m_delay->dirty = true;
+        else
+          m_delay->cv.notify_all();
+      }
+      result = !m_delay->started;
+    }
+    return result;
+  }
+
+  /** Cancel a delayed call
+     * 
+     * \return True if the call was cancelled before running. False otherwise.
+     */
+  bool cancel()
+  {
+    bool result = false;
+    if(m_delay)
+    {
+      std::lock_guard<std::mutex> lock(m_delay->mutex);
+      if(!m_delay->started)
+      {
+        m_delay->cancelled = true;
+        m_delay->cv.notify_all();
+      }
+      result = !m_delay->started;
+    }
+    return result;
+  }
+
+  delayed_call() = default;
+  delayed_call(delayed_call&& other) { *this = std::move(other); }
+  ~delayed_call() = default;
+
+  delayed_call& operator=(delayed_call&& other)
+  {
+    m_delay = std::move(other.m_delay);
+    return *this;
+  }
+
+  // This class is movable only
+  delayed_call(const delayed_call& other)            = delete;
+  delayed_call& operator=(const delayed_call& other) = delete;
+
+private:
+  struct DelayData
+  {
+    std::chrono::time_point<Clock, Duration> until;
+    std::thread                              thread;
+    std::mutex                               mutex;
+    std::condition_variable                  cv;
+    bool                                     dirty     = true;
+    bool                                     cancelled = false;
+    bool                                     started   = false;
+
+    ~DelayData()
+    {
+      if(thread.joinable())
+        thread.join();
+    }
+  };
+
+  template <class Function, class... Args>
+  static void delayEntry(DelayData* delay, Function&& f, Args&&... args)
+  {
+    {
+      std::unique_lock<std::mutex> lock(delay->mutex);
+      std::cv_status               status = std::cv_status::no_timeout;
+      while(!delay->cancelled && (delay->dirty || status == std::cv_status::no_timeout))
+      {
+        delay->dirty = false;
+        status       = delay->cv.wait_until(lock, delay->until);
+      }
+      if(delay->cancelled)
+        return;
+      delay->started = true;
+    }
+
+    // Ignore the return value; propagating it would require keeping a std::future object.
+    (void)f(std::forward<Args>(args)...);
+  }
+
+  std::unique_ptr<DelayData> m_delay;
+
+  template <class Function, class... Args>
+  delayed_call(const Duration& sleep_duration, Function&& f, Args&&... args)
+      : m_delay(std::make_unique<DelayData>())
+  {
+    m_delay->until = Clock::now() + sleep_duration;
+    m_delay->thread =
+        std::thread(delayed_call::delayEntry<std::remove_reference_t<Function>&&, std::remove_reference_t<Args>&&...>,
+                    m_delay.get(), std::forward<Function>(f), std::forward<Args>(args)...);
+  }
+};
+
+/** @DOC_START
+ Delay a call to a void function for sleep_duration.
+ 
+ `return`: A delayed_call object that holds the running thread.
+
+Example:
+ ```cpp
+ // Create or update a delayed call to callback. Useful to consolidate multiple events into one call.
+ if(!m_delayedCall.delay_for(delay))
+   m_delayedCall = nvh::delay_noreturn_for(delay, callback);
+ ```
+@DOC_END */
+template <class Clock = DefaultDelayClock, class Duration = DefaultDelayDuration, class Function, class... Args>
+delayed_call<Clock, Duration> delay_noreturn_for(const Duration& sleep_duration, Function&& f, Args&&... args)
+{
+  return delayed_call<Clock, Duration>(sleep_duration, std::forward<Function>(f), std::forward<Args>(args)...);
+}
+
+}  // namespace nvh
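The wait loop in `delayed_call::delayEntry` above can be illustrated in isolation. The following is a minimal, self-contained sketch and not part of nvh: `Debouncer`, `postpone`, `demoConsolidate`, and `demoCancel` are hypothetical names chosen for illustration. As a simplification, the sketch always wakes the worker when the deadline changes instead of using the `dirty` flag. It assumes C++14 or later.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-alone debouncer (not part of nvh). It runs `callback` once,
// after the deadline has passed without being postponed or cancelled.
class Debouncer
{
public:
  using Clock = std::chrono::steady_clock;

  Debouncer(Clock::duration delay, std::function<void()> callback)
      : m_until(Clock::now() + delay)
      , m_thread([this, cb = std::move(callback)] { run(cb); })
  {
  }

  // Like DelayData's destructor: wait for the worker thread to finish.
  ~Debouncer()
  {
    if(m_thread.joinable())
      m_thread.join();
  }

  // Move the deadline to `delay` from now. Returns false if the callback already started.
  bool postpone(Clock::duration delay)
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    if(m_started)
      return false;
    m_until = Clock::now() + delay;
    m_cv.notify_all();  // wake the worker so it picks up the new deadline
    return true;
  }

  // Returns true if the call was cancelled before running.
  bool cancel()
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    if(m_started)
      return false;
    m_cancelled = true;
    m_cv.notify_all();
    return true;
  }

private:
  void run(const std::function<void()>& callback)
  {
    {
      std::unique_lock<std::mutex> lock(m_mutex);
      // Re-check after every wake-up: the deadline may have moved or the call
      // may have been cancelled while this thread was waiting.
      while(!m_cancelled && Clock::now() < m_until)
      {
        const Clock::time_point deadline = m_until;  // copy taken under the lock
        m_cv.wait_until(lock, deadline);
      }
      if(m_cancelled)
        return;
      m_started = true;
    }
    callback();  // invoked outside the lock
  }

  std::mutex              m_mutex;
  std::condition_variable m_cv;
  Clock::time_point       m_until;
  bool                    m_cancelled = false;
  bool                    m_started   = false;
  std::thread             m_thread;  // declared last: all other members exist before the worker starts
};

// Several events are consolidated into one call; returns the number of callback invocations.
int demoConsolidate()
{
  std::atomic<int> calls{0};
  {
    Debouncer d(std::chrono::milliseconds(20), [&] { ++calls; });
    d.postpone(std::chrono::milliseconds(20));
    d.postpone(std::chrono::milliseconds(20));
  }  // destructor joins the worker
  return calls.load();
}

// A cancelled call never runs; returns the number of callback invocations.
int demoCancel()
{
  std::atomic<int> calls{0};
  {
    Debouncer d(std::chrono::seconds(30), [&] { ++calls; });
    d.cancel();
  }
  return calls.load();
}
```

In this sketch the shared state is only touched while the mutex is held and the callback runs outside the lock, which matches the structure of `delayEntry`. Building it may require linking the platform thread library (for example `-pthread` with GCC or Clang).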
--- a/raytracer/nvpro_core/nvh/timesampler.hpp
+++ b/raytracer/nvpro_core/nvh/timesampler.hpp
@ -0,0 +1,207 @@
+/*
+ * Copyright (c) 2013-2023, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2013 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+//--------------------------------------------------------------------
+#pragma once
+#include <chrono>
+#include <string>
+#include <cstdarg>
+#include <cassert>
+#include "nvprint.hpp"
+
+/* @DOC_START -----------------------------------------------------------------------------
+# struct TimeSampler
+TimeSampler averages the frame time over a window of frames and derives the frame rate (FPS) from it.
+@DOC_END ----------------------------------------------------------------------------- */
+struct TimeSampler
+{
+  using Clock     = std::chrono::steady_clock;
+  using TimePoint = typename Clock::time_point;
+  bool      bNonStopRendering;
+  int       renderCnt;
+  TimePoint start_time, end_time;
+  int       timing_counter;
+  int       maxTimeSamples;
+  int       frameFPS;
+  double    frameDT;
+  TimeSampler()
+  {
+    bNonStopRendering = true;
+    renderCnt         = 1;
+    timing_counter    = 0;
+    maxTimeSamples    = 60;
+    frameDT           = 1.0 / 60.0;
+    frameFPS          = 0;
+    start_time = end_time = Clock::now();
+  }
+  inline double getFrameDT() { return frameDT; }
+  inline int    getFPS() { return frameFPS; }
+  void          resetSampling(int i = 10) { maxTimeSamples = i; }
+  bool          update(bool bContinueToRender, bool* glitch = nullptr)
+  {
+    if(glitch)
+      *glitch = false;
+    bool updated = false;
+
+
+    if((timing_counter >= maxTimeSamples) && (maxTimeSamples > 0))
+    {
+      timing_counter = 0;
+      end_time       = Clock::now();
+
+      // Get delta in seconds
+      frameDT = std::chrono::duration_cast<std::chrono::duration<double>>(end_time - start_time).count();
+
+      // Average over the number of sampled frames
+      frameDT /= maxTimeSamples;
+#define MAXDT (1.0 / 40.0)
+#define MINDT (1.0 / 3000.0)
+      if(frameDT < MINDT)
+      {
+        frameDT = MINDT;
+      }
+      else if(frameDT > MAXDT)
+      {
+        frameDT = MAXDT;
+        if(glitch)
+          *glitch = true;
+      }
+      frameFPS = (int)(1.0 / frameDT);
+      // update the amount of samples to average, depending on the speed of the scene
+      maxTimeSamples = (int)(0.15 / (frameDT));
+      if(maxTimeSamples > 50)
+        maxTimeSamples = 50;
+      updated = true;
+    }
+    if(bContinueToRender || bNonStopRendering)
+    {
+      if(timing_counter == 0)
+        start_time = Clock::now();
+      timing_counter++;
+    }
+    return updated;
+  }
+};
+
+
+/** @DOC_START
+# struct nvh::Stopwatch
+> Timer in milliseconds. 
+
+Starts the timer at creation and the elapsed time is retrieved by calling `elapsed()`. 
+The timer can be reset if it needs to start timing later in the code execution.
+
+Usage:
+````cpp
+{
+  nvh::Stopwatch sw;
+  ... work ...
+  LOGI("Elapsed: %.3f ms\n", sw.elapsed()); // --> Elapsed: 128.157 ms
+}
+````
+@DOC_END */
+
+namespace nvh {
+struct Stopwatch
+{
+  Stopwatch() { reset(); }
+  void   reset() { startTime = std::chrono::steady_clock::now(); }
+  double elapsed()
+  {
+    return std::chrono::duration<double>(std::chrono::steady_clock::now() - startTime).count() * 1000.;
+  }
+  std::chrono::time_point<std::chrono::steady_clock> startTime;
+};
+
+// Logging the time spent while alive in a scope.
+// Usage: at beginning of a function:
+//   auto stimer = ScopedTimer("Time for doing X");
+// Nesting timers is handled, but since the time is printed when it goes out of
+// scope, printing anything else will break the output formatting.
+struct ScopedTimer
+{
+  ScopedTimer(const std::string& str) { init_(str); }
+  ScopedTimer(const char* fmt, ...)
+  {
+    std::string str(256, '\0');  // initial guess. ideally the first try fits
+    va_list     args1, args2;
+    va_start(args1, fmt);
+    va_copy(args2, args1);  // make a backup as vsnprintf may consume args1
+    int rc = vsnprintf(str.data(), str.size(), fmt, args1);
+    if(rc >= 0 && static_cast<size_t>(rc + 1) > str.size())
+    {
+      str.resize(rc + 1);  // include storage for '\0'
+      rc = vsnprintf(str.data(), str.size(), fmt, args2);
+    }
+    va_end(args1);
+    va_end(args2);  // every va_copy must be matched by a va_end
+    assert(rc >= 0 && "vsnprintf error");
+    str.resize(rc >= 0 ? static_cast<size_t>(rc) : 0);
+    init_(str);
+  }
+  void init_(const std::string& str)
+  {
+    // If nesting timers, break the newline of the previous one
+    if(s_openNewline)
+    {
+      assert(s_nesting > 0);
+      LOGI("\n");
+    }
+
+    m_manualIndent = !str.empty() && (str[0] == ' ' || str[0] == '-' || str[0] == '|');
+
+    // Add indentation automatically if not already in str.
+    if(s_nesting > 0 && !m_manualIndent)
+    {
+      LOGI("%s", indent().c_str());
+    }
+
+    LOGI("%s", str.c_str());
+    s_openNewline = str.empty() || str[str.size() - 1] != '\n';
+    ++s_nesting;
+  }
+  ~ScopedTimer()
+  {
+    --s_nesting;
+    // If nesting timers and this is the second destructor in a row, indent and
+    // print "Total" as it won't be on the same line.
+    if(!s_openNewline && !m_manualIndent)
+    {
+      LOGI("%s|", indent().c_str());
+    }
+    else
+    {
+      LOGI(" ");
+    }
+    LOGI("-> %.3f ms\n", m_stopwatch.elapsed());
+    s_openNewline = false;
+  }
+  static std::string indent()
+  {
+    std::string result(static_cast<size_t>(s_nesting * 2), ' ');
+    for(int i = 0; i < s_nesting * 2; i += 2)
+      result[i] = '|';
+    return result;
+  }
+  nvh::Stopwatch                  m_stopwatch;
+  bool                            m_manualIndent = false;
+  static inline thread_local int  s_nesting      = 0;
+  static inline thread_local bool s_openNewline  = false;
+};
+
+}  // namespace nvh
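The averaging step inside `TimeSampler::update` above can be read as a pure function of the elapsed time and the number of sampled frames. The sketch below restates that arithmetic on its own so it can be checked without a clock; `FrameStats` and `averageFrames` are hypothetical names, not part of the header, and the constants are copied from the `MAXDT`/`MINDT` macros and the `0.15` / `50` values in the code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type for illustration only.
struct FrameStats
{
  double frameDT;      // clamped average frame time in seconds
  int    fps;          // frames per second derived from frameDT
  int    nextSamples;  // number of frames to average in the next window
  bool   glitch;       // true if the raw average exceeded the upper clamp
};

// Mirrors the arithmetic of TimeSampler::update for one completed window.
// Assumes samples > 0, as update() does via its (maxTimeSamples > 0) check.
FrameStats averageFrames(double elapsedSeconds, int samples)
{
  const double maxDT = 1.0 / 40.0;    // same value as MAXDT
  const double minDT = 1.0 / 3000.0;  // same value as MINDT

  const double rawDT = elapsedSeconds / samples;

  FrameStats s{};
  s.glitch  = rawDT > maxDT;
  s.frameDT = std::max(minDT, std::min(rawDT, maxDT));
  s.fps     = static_cast<int>(1.0 / s.frameDT);
  // Aim for a window of roughly 0.15 seconds, capped at 50 frames.
  s.nextSamples = std::min(static_cast<int>(0.15 / s.frameDT), 50);
  return s;
}
```

For example, 64 frames rendered in one second give an average of 1/64 s per frame, so the next window is sized to about 0.15 s worth of frames. A window that averages slower than 40 FPS is clamped and reported as a glitch.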
--- a/raytracer/nvpro_core/nvh/trangeallocator.hpp
+++ b/raytracer/nvpro_core/nvh/trangeallocator.hpp
@ -0,0 +1,553 @@
+/*
+ * Copyright (c) 2019-2021, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2019-2021 NVIDIA CORPORATION
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+
+#pragma once
+
+#include <algorithm>
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>  // for malloc, realloc, free
+#include <string.h>  // for memcpy, memmove
+
+#include <NvFoundation.h>  // for NV_X86 and NV_X64
+
+#if(defined(NV_X86) || defined(NV_X64)) && defined(_MSC_VER)
+#include <intrin.h>
+#endif
+
+namespace nvh {
+
+/** @DOC_START
+  # class nvh::TRangeAllocator
+
+  The nvh::TRangeAllocator<GRANULARITY> template sub-allocates ranges from a fixed
+  maximum size. Ranges are allocated in multiples of GRANULARITY and are merged back on freeing.
+  Its primary use is within allocators that sub-allocate from fixed-size blocks.
+
+  The implementation is based on [MakeID by Emil Persson](http://www.humus.name/3D/MakeID.h).
+
+  Example :
+
+  ```cpp
+  TRangeAllocator<256> range;
+
+  // initialize to a certain range
+  range.init(range.alignedSize(128 * 1024 * 1024));
+
+  ...
+
+  // allocate a sub range
+  // example
+  uint32_t size = vertexBufferSize;
+  uint32_t alignment = vertexAlignment;
+
+  uint32_t allocOffset;
+  uint32_t allocSize;
+  uint32_t alignedOffset;
+
+  if (range.subAllocate(size, alignment, allocOffset, alignedOffset, allocSize)) {
+    ... use the allocation space
+    // [alignedOffset, alignedOffset + size) is guaranteed to lie within [allocOffset, allocOffset + allocSize)
+  }
+
+  // give back the memory range for re-use
+  range.subFree(allocOffset, allocSize);
+
+  ...
+
+  // at the end cleanup
+  range.deinit();
+  ```
+@DOC_END */
+
+// GRANULARITY must be power of two
+template <uint32_t GRANULARITY = 256>
+class TRangeAllocator
+{
+private:
+  uint32_t m_size;
+  uint32_t m_used;
+
+public:
+  TRangeAllocator()
+      : m_size(0)
+      , m_used(0)
+  {
+  }
+  TRangeAllocator(uint32_t size) { init(size); }
+
+  ~TRangeAllocator() { deinit(); }
+
+  static uint32_t alignedSize(uint32_t size) { return (size + GRANULARITY - 1) & (~(GRANULARITY - 1)); }
+
+  void init(uint32_t size)
+  {
+    assert(size % GRANULARITY == 0 && "managed total size must be aligned to GRANULARITY");
+
+    uint32_t pages = ((size + GRANULARITY - 1) / GRANULARITY);
+    rangeInit(pages - 1);
+    m_used = 0;
+    m_size = size;
+  }
+  void deinit() { rangeDeinit(); }
+
+  bool isEmpty() const { return m_used == 0; }
+
+  bool isAvailable(uint32_t size, uint32_t align) const
+  {
+    uint32_t alignRest    = align - 1;
+    uint32_t sizeReserved = size;
+
+    if(m_used >= m_size)
+    {
+      return false;
+    }
+
+    if(m_used != 0 && align > GRANULARITY)
+    {
+      sizeReserved += alignRest;
+    }
+
+    uint32_t countReserved = (sizeReserved + GRANULARITY - 1) / GRANULARITY;
+    return isRangeAvailable(countReserved);
+  }
+
+  bool subAllocate(uint32_t size, uint32_t align, uint32_t& outOffset, uint32_t& outAligned, uint32_t& outSize)
+  {
+    if(align == 0)
+    {
+      align = 1;
+    }
+    uint32_t alignRest    = align - 1;
+    uint32_t sizeReserved = size;
+#if(defined(NV_X86) || defined(NV_X64)) && defined(_MSC_VER)
+    bool alignIsPOT = __popcnt(align) == 1;
+#else
+    bool alignIsPOT = __builtin_popcount(align) == 1;
+#endif
+
+    if(m_used >= m_size)
+    {
+      outSize    = 0;
+      outOffset  = 0;
+      outAligned = 0;
+      return false;
+    }
+
+    if(m_used != 0 && (alignIsPOT ? (align > GRANULARITY) : ((alignRest + size) > GRANULARITY)))
+    {
+      sizeReserved += alignRest;
+    }
+
+    uint32_t countReserved = (sizeReserved + GRANULARITY - 1) / GRANULARITY;
+
+    uint32_t startID;
+    if(createRangeID(startID, countReserved))
+    {
+      outOffset  = startID * GRANULARITY;
+      outAligned = ((outOffset + alignRest) / align) * align;
+
+      // due to custom alignment, we may be able to give
+      // pages back that we over-allocated
+      //
+      // reserved:   [     |     |     |     ] (GRANULARITY spacing)
+      // used:                [      ]         (custom alignment/size)
+      // corrected:        [     |     ]       (GRANULARITY spacing)
+
+      // correct start (warning could yield more fragmentation)
+
+      uint32_t skipFront = (outAligned - outOffset) / GRANULARITY;
+      if(skipFront)
+      {
+        destroyRangeID(startID, skipFront);
+        outOffset += skipFront * GRANULARITY;
+        startID += skipFront;
+        countReserved -= skipFront;
+      }
+
+      assert(outOffset <= outAligned);
+
+      // correct end
+      uint32_t outLast = alignedSize(outAligned + size);
+      outSize          = outLast - outOffset;
+
+      uint32_t usedCount = outSize / GRANULARITY;
+      assert(usedCount <= countReserved);
+
+      if(usedCount < countReserved)
+      {
+        destroyRangeID(startID + usedCount, countReserved - usedCount);
+      }
+
+      assert((outAligned + size) <= (outOffset + outSize));
+
+      m_used += outSize;
+
+      //checkRanges();
+
+      return true;
+    }
+    else
+    {
+      outSize    = 0;
+      outOffset  = 0;
+      outAligned = 0;
+      return false;
+    }
+  }
+
+  void subFree(uint32_t offset, uint32_t size)
+  {
+    assert(offset % GRANULARITY == 0);
+    assert(size % GRANULARITY == 0);
+
+    m_used -= size;
+    destroyRangeID(offset / GRANULARITY, size / GRANULARITY);
+
+    //checkRanges();
+  }
+
+  TRangeAllocator& operator=(const TRangeAllocator& other)
+  {
+    if(this == &other)
+    {
+      return *this;
+    }
+
+    // Release the range list currently owned by this allocator before copying
+    rangeDeinit();
+
+    m_size = other.m_size;
+    m_used = other.m_used;
+
+    m_Count    = other.m_Count;
+    m_Capacity = other.m_Capacity;
+    m_MaxID    = other.m_MaxID;
+
+    if(other.m_Ranges)
+    {
+      m_Ranges = static_cast<Range*>(::malloc(m_Capacity * sizeof(Range)));
+      assert(m_Ranges);  // Make sure allocation succeeded
+      memcpy(m_Ranges, other.m_Ranges, m_Capacity * sizeof(Range));
+    }
+
+    return *this;
+  }
+
+  TRangeAllocator(const TRangeAllocator& other)
+  {
+    m_size = other.m_size;
+    m_used = other.m_used;
+
+    m_Ranges   = other.m_Ranges;
+    m_Count    = other.m_Count;
+    m_Capacity = other.m_Capacity;
+    m_MaxID    = other.m_MaxID;
+
+    if(m_Ranges)
+    {
+      m_Ranges = static_cast<Range*>(::malloc(m_Capacity * sizeof(Range)));
+      assert(m_Ranges);  // Make sure allocation succeeded
+      memcpy(m_Ranges, other.m_Ranges, m_Capacity * sizeof(Range));
+    }
+  }
+
+  TRangeAllocator& operator=(TRangeAllocator&& other)
+  {
+    if(this != &other)
+    {
+      // Release the range list currently owned by this allocator before taking over other's
+      rangeDeinit();
+
+      m_size = other.m_size;
+      m_used = other.m_used;
+
+      m_Ranges   = other.m_Ranges;
+      m_Count    = other.m_Count;
+      m_Capacity = other.m_Capacity;
+      m_MaxID    = other.m_MaxID;
+
+      other.m_Ranges = nullptr;
+    }
+
+    return *this;
+  }
+
+  TRangeAllocator(TRangeAllocator&& other)
+  {
+    m_size = other.m_size;
+    m_used = other.m_used;
+
+    m_Ranges   = other.m_Ranges;
+    m_Count    = other.m_Count;
+    m_Capacity = other.m_Capacity;
+    m_MaxID    = other.m_MaxID;
+
+    other.m_Ranges = nullptr;
+  }
+
+private:
+  //////////////////////////////////////////////////////////////////////////
+  // most of the following code is taken from Emil Persson's MakeID
+  // http://www.humus.name/3D/MakeID.h (v1.02)
+
+  struct Range
+  {
+    uint32_t m_First;
+    uint32_t m_Last;
+  };
+
+  Range*   m_Ranges   = nullptr;  // Sorted array of ranges of free IDs
+  uint32_t m_Count    = 0;        // Number of ranges in list
+  uint32_t m_Capacity = 0;        // Total capacity of range list
+  uint32_t m_MaxID    = 0;
+
+public:
+  void rangeInit(const uint32_t max_id)
+  {
+    // Start with a single range, from 0 to max allowed ID (specified)
+    m_Ranges = static_cast<Range*>(::malloc(sizeof(Range)));
+    assert(m_Ranges != nullptr);  // Make sure allocation succeeded
+    m_Ranges[0].m_First = 0;
+    m_Ranges[0].m_Last  = max_id;
+    m_Count             = 1;
+    m_Capacity          = 1;
+    m_MaxID             = max_id;
+  }
+
+  void rangeDeinit()
+  {
+    if(m_Ranges)
+    {
+      ::free(m_Ranges);
+      m_Ranges = nullptr;
+    }
+  }
+
+  bool createID(uint32_t& id)
+  {
+    if(m_Ranges[0].m_First <= m_Ranges[0].m_Last)
+    {
+      id = m_Ranges[0].m_First;
+
+      // If current range is full and there is another one, that will become the new current range
+      if(m_Ranges[0].m_First == m_Ranges[0].m_Last && m_Count > 1)
+      {
+        destroyRange(0);
+      }
+      else
+      {
+        ++m_Ranges[0].m_First;
+      }
+      return true;
+    }
+
+    // No available ID left
+    return false;
+  }
+
+  bool createRangeID(uint32_t& id, const uint32_t count)
+  {
+    uint32_t i = 0;
+    do
+    {
+      const uint32_t range_count = 1 + m_Ranges[i].m_Last - m_Ranges[i].m_First;
+      if(count <= range_count)
+      {
+        id = m_Ranges[i].m_First;
+
+        // If current range is full and there is another one, that will become the new current range
+        if(count == range_count && i + 1 < m_Count)
+        {
+          destroyRange(i);
+        }
+        else
+        {
+          m_Ranges[i].m_First += count;
+        }
+        return true;
+      }
+      ++i;
+    } while(i < m_Count);
+
+    // No range of free IDs was large enough to create the requested continuous ID sequence
+    return false;
+  }
+
+  bool destroyID(const uint32_t id) { return destroyRangeID(id, 1); }
+
+  bool destroyRangeID(const uint32_t id, const uint32_t count)
+  {
+    const uint32_t end_id = id + count;
+
+    assert(end_id <= m_MaxID + 1);
+
+    // Binary search of the range list
+    uint32_t i0 = 0;
+    uint32_t i1 = m_Count - 1;
+
+    for(;;)
+    {
+      const uint32_t i = (i0 + i1) / 2;
+
+      if(id < m_Ranges[i].m_First)
+      {
+        // Before current range, check if neighboring
+        if(end_id >= m_Ranges[i].m_First)
+        {
+          if(end_id != m_Ranges[i].m_First)
+            return false;  // Overlaps a range of free IDs, thus (at least partially) invalid IDs
+
+          // Neighbor id, check if neighboring previous range too
+          if(i > i0 && id - 1 == m_Ranges[i - 1].m_Last)
+          {
+            // Merge with previous range
+            m_Ranges[i - 1].m_Last = m_Ranges[i].m_Last;
+            destroyRange(i);
+          }
+          else
+          {
+            // Just grow range
+            m_Ranges[i].m_First = id;
+          }
+          return true;
+        }
+        else
+        {
+          // Non-neighbor id
+          if(i != i0)
+          {
+            // Cull upper half of list
+            i1 = i - 1;
+          }
+          else
+          {
+            // Found our position in the list, insert the deleted range here
+            insertRange(i);
+            m_Ranges[i].m_First = id;
+            m_Ranges[i].m_Last  = end_id - 1;
+            return true;
+          }
+        }
+      }
+      else if(id > m_Ranges[i].m_Last)
+      {
+        // After current range, check if neighboring
+        if(id - 1 == m_Ranges[i].m_Last)
+        {
+          // Neighbor id, check if neighboring next range too
+          if(i < i1 && end_id == m_Ranges[i + 1].m_First)
+          {
+            // Merge with next range
+            m_Ranges[i].m_Last = m_Ranges[i + 1].m_Last;
+            destroyRange(i + 1);
+          }
+          else
+          {
+            // Just grow range
+            m_Ranges[i].m_Last += count;
+          }
+          return true;
+        }
+        else
+        {
+          // Non-neighbor id
+          if(i != i1)
+          {
+            // Cull bottom half of list
+            i0 = i + 1;
+          }
+          else
+          {
+            // Found our position in the list, insert the deleted range here
+            insertRange(i + 1);
+            m_Ranges[i + 1].m_First = id;
+            m_Ranges[i + 1].m_Last  = end_id - 1;
+            return true;
+          }
+        }
+      }
+      else
+      {
+        // Inside a free block, not a valid ID
+        return false;
+      }
+    }
+  }
+
+  bool isRangeAvailable(uint32_t searchCount) const
+  {
+    uint32_t i = 0;
+    do
+    {
+      uint32_t count = m_Ranges[i].m_Last - m_Ranges[i].m_First + 1;
+      if(count >= searchCount)
+        return true;
+
+      ++i;
+    } while(i < m_Count);
+
+    return false;
+  }
+
+  void printRanges() const
+  {
+    uint32_t i = 0;
+    for(;;)
+    {
+      if(m_Ranges[i].m_First < m_Ranges[i].m_Last)
+        printf("%u-%u", m_Ranges[i].m_First, m_Ranges[i].m_Last);
+      else if(m_Ranges[i].m_First == m_Ranges[i].m_Last)
+        printf("%u", m_Ranges[i].m_First);
+      else
+        printf("-");
+
+      ++i;
+      if(i >= m_Count)
+      {
+        printf("\n");
+        return;
+      }
+
+      printf(", ");
+    }
+  }
+
+  void checkRanges() const
+  {
+    for(uint32_t i = 0; i < m_Count; i++)
+    {
+      assert(m_Ranges[i].m_Last <= m_MaxID);
+
+      if(m_Ranges[i].m_First == m_Ranges[i].m_Last + 1)
+      {
+        continue;
+      }
+      assert(m_Ranges[i].m_First <= m_Ranges[i].m_Last);
+      assert(m_Ranges[i].m_First <= m_MaxID);
+    }
+  }
+
+  void insertRange(const uint32_t index)
+  {
+    if(m_Count >= m_Capacity)
+    {
+      m_Capacity += m_Capacity;
+      m_Ranges = (Range*)realloc(m_Ranges, m_Capacity * sizeof(Range));
+      assert(m_Ranges);  // Make sure reallocation succeeded
+    }
+
+    ::memmove(m_Ranges + index + 1, m_Ranges + index, (m_Count - index) * sizeof(Range));
+    ++m_Count;
+  }
+
+  void destroyRange(const uint32_t index)
+  {
+    --m_Count;
+    ::memmove(m_Ranges + index, m_Ranges + index + 1, (m_Count - index) * sizeof(Range));
+  }
+};
+
+}  // namespace nvh
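The offset arithmetic in `TRangeAllocator::subAllocate` above (round up to the requested alignment, hand back whole pages in front of the aligned address, then round the end up to GRANULARITY) can be followed with concrete numbers. The sketch below isolates only that arithmetic and leaves out the free-range list entirely; `kGranularity`, `SubRange`, and `trimToAlignment` are hypothetical names for illustration, and `startPage` stands in for the ID that `createRangeID` would return.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kGranularity = 256;  // must be a power of two, like GRANULARITY

// Round size up to the next multiple of kGranularity (same formula as alignedSize).
constexpr uint32_t alignedSize(uint32_t size)
{
  return (size + kGranularity - 1) & ~(kGranularity - 1);
}

// Hypothetical result type: the three outputs of subAllocate.
struct SubRange
{
  uint32_t offset;   // start of the reserved pages (multiple of kGranularity)
  uint32_t aligned;  // first address that satisfies the requested alignment
  uint32_t size;     // reserved size starting at offset (multiple of kGranularity)
};

// Computes the final sub-range for an allocation whose reserved pages start at startPage.
// Assumes align > 0; subAllocate maps align == 0 to 1 before this step.
SubRange trimToAlignment(uint32_t startPage, uint32_t size, uint32_t align)
{
  uint32_t       offset  = startPage * kGranularity;
  const uint32_t aligned = ((offset + align - 1) / align) * align;

  // Whole pages in front of the aligned address are not needed and can be given back.
  const uint32_t skipFront = (aligned - offset) / kGranularity;
  offset += skipFront * kGranularity;

  // The end of the used region is rounded up to the page granularity.
  const uint32_t last = alignedSize(aligned + size);
  return SubRange{offset, aligned, last - offset};
}
```

With a 1024-byte alignment and reserved pages starting at page 1 (byte 256), three leading pages are skipped and the allocation ends up occupying exactly one page at byte 1024. With a non-power-of-two alignment such as 192, no full page can be skipped, so the aligned address lies inside the first reserved page.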