Depth estimation

Depth estimation with event cameras is possible by applying the usual disparity calculation approach to a calibrated stereo camera rig. The most straightforward method is to accumulate frames from the events of both cameras and run a conventional disparity estimation algorithm on them. This approach has limitations, since accumulated frames can contain little texture, which degrades the quality of the estimated disparity.
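On a rectified stereo pair, disparity relates to metric depth through the pinhole model: depth is the focal length times the stereo baseline divided by disparity. A minimal sketch of that relation follows; focalLengthPx and baselineMeters are placeholders standing in for values taken from the calibration.

// Depth from disparity on a rectified stereo pair: Z = f * B / d.
// focalLengthPx (pixels) and baselineMeters (meters) are placeholders
// for values that come from the stereo calibration.
float depthFromDisparity(const float disparityPx, const float focalLengthPx, const float baselineMeters) {
    // A zero or negative disparity carries no depth information
    if (disparityPx <= 0.0f) {
        return -1.0f; // no valid depth
    }
    return (focalLengthPx * baselineMeters) / disparityPx;
}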

The dv-processing library provides the dv::camera::StereoGeometry class and a few disparity estimation algorithms that, in combination, can be used to build a depth estimation pipeline.

Semi-dense stereo block matching

Dense block matching here refers to the most straightforward approach: accumulating full frames and running conventional disparity estimation on top of them. Since accumulated frames contain only limited texture, because pixels react only to brightness changes, this approach is referred to as semi-dense. The SemiDenseStereoMatcher class wraps the disparity estimation part; the estimated disparity can then be used to calculate depth with dv::camera::StereoGeometry.
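Stripped of the library wrapper, the underlying idea can be sketched with plain OpenCV: two time-synchronized frames accumulated from the left and right event streams are fed to a conventional stereo matcher. This is an illustration only, not the library's internal implementation, and the matcher parameters below are illustrative assumptions.

#include <opencv2/calib3d.hpp>

// Sketch: conventional disparity estimation on two accumulated frames.
// Parameter values are illustrative, not tuned recommendations.
cv::Mat disparityFromAccumulatedFrames(const cv::Mat &leftFrame, const cv::Mat &rightFrame) {
    // 48 disparity levels, 11x11 matching blocks (illustrative values)
    auto sgbm = cv::StereoSGBM::create(0, 48, 11);
    cv::Mat disparity;
    // Output is 16-bit fixed-point disparity with 4 fractional bits
    sgbm->compute(leftFrame, rightFrame, disparity);
    return disparity;
}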

The following sample code shows the use of SemiDenseStereoMatcher together with dv::camera::StereoGeometry to run a real-time depth estimation pipeline on a calibrated stereo camera.

#include <dv-processing/camera/calibration_set.hpp>
#include <dv-processing/core/stereo_event_stream_slicer.hpp>
#include <dv-processing/depth/semi_dense_stereo_matcher.hpp>
#include <dv-processing/io/stereo_capture.hpp>
#include <dv-processing/noise/background_activity_noise_filter.hpp>

#include <opencv2/highgui.hpp>

int main() {
    using namespace std::chrono_literals;

    // Path to a stereo calibration file, replace with a file path on your local file system
    const std::string calibrationFilePath = "path/to/calibration.json";

    // Load the calibration file
    auto calibration = dv::camera::CalibrationSet::LoadFromFile(calibrationFilePath);

    // It is expected that the calibration file will have "C0" as the left camera
    auto leftCamera = calibration.getCameraCalibration("C0").value();

    // The second camera is assumed to be the right-side camera
    auto rightCamera = calibration.getCameraCalibration("C1").value();

    // Open the stereo camera with camera names from calibration
    dv::io::StereoCapture capture(leftCamera.name, rightCamera.name);

    // Make sure both cameras support event stream output, throw an error otherwise
    if (!capture.left.isEventStreamAvailable() || !capture.right.isEventStreamAvailable()) {
        throw dv::exceptions::RuntimeError("Input camera does not provide an event stream.");
    }

    // Initialize a stereo block matcher with a stereo geometry from calibration and the preconfigured SGBM instance
    dv::SemiDenseStereoMatcher blockMatcher(std::make_unique<dv::camera::StereoGeometry>(leftCamera, rightCamera));

    // Initialization of a stereo event stream slicer
    dv::StereoEventStreamSlicer slicer;

    // Initialize a window to show previews of the output
    cv::namedWindow("Preview", cv::WINDOW_NORMAL);

    // Local event buffers to implement an overlapping window of events for accumulation
    dv::EventStore leftEventBuffer, rightEventBuffer;

    // Use one third of the resolution as the count of events per accumulated frame
    const size_t eventCount = static_cast<size_t>(leftCamera.resolution.area()) / 3;

    // Register a callback to be called every 33 milliseconds (~30Hz)
    slicer.doEveryTimeInterval(33ms, [&blockMatcher, &leftEventBuffer, &rightEventBuffer, eventCount](
                                         const auto &leftEvents, const auto &rightEvents) {
        // Push input events into the local buffers
        leftEventBuffer.add(leftEvents);
        rightEventBuffer.add(rightEvents);

        // If the number of events is above the count, just keep the latest events
        if (leftEventBuffer.size() > eventCount) {
            leftEventBuffer = leftEventBuffer.sliceBack(eventCount);
        }
        if (rightEventBuffer.size() > eventCount) {
            rightEventBuffer = rightEventBuffer.sliceBack(eventCount);
        }

        // Pass these events into the block matcher and estimate disparity; the matcher accumulates
        // frames internally. The disparity output is a 16-bit integer with sub-pixel precision.
        const auto disparity = blockMatcher.computeDisparity(leftEventBuffer, rightEventBuffer);

        // Convert disparity into 8-bit integers with scaling and normalize the output for a nice preview.
        // This loses the actual numeric value of the disparity, but it's a nice way to visualize it.
        cv::Mat disparityU8;
        disparity.convertTo(disparityU8, CV_8UC1, 1.0 / 16.0);
        cv::normalize(disparityU8, disparityU8, 0, 255, cv::NORM_MINMAX);

        // Convert the accumulated frames into colored images for preview.
        std::vector<cv::Mat> images(3);
        cv::cvtColor(blockMatcher.getLeftFrame().image, images[0], cv::COLOR_GRAY2BGR);
        cv::cvtColor(blockMatcher.getRightFrame().image, images[1], cv::COLOR_GRAY2BGR);

        // Apply color-mapping to the disparity image; this encodes depth with color: red is close, blue is far.
        cv::applyColorMap(disparityU8, images[2], cv::COLORMAP_JET);

        // Concatenate images and show them in a window
        cv::Mat preview;
        cv::hconcat(images, preview);
        cv::imshow("Preview", preview);
    });

    // Buffer input events in these variables to synchronize the inputs
    std::optional<dv::EventStore> leftEvents  = std::nullopt;
    std::optional<dv::EventStore> rightEvents = std::nullopt;

    // Run the processing loop while both cameras are connected
    while (capture.left.isRunning() && capture.right.isRunning()) {
        // Read events from the respective left / right cameras
        if (!leftEvents.has_value()) {
            leftEvents = capture.left.getNextEventBatch();
        }
        if (!rightEvents.has_value()) {
            rightEvents = capture.right.getNextEventBatch();
        }

        // Feed the data into the slicer and reset the buffers
        if (leftEvents && rightEvents) {
            slicer.accept(*leftEvents, *rightEvents);
            leftEvents  = std::nullopt;
            rightEvents = std::nullopt;
        }

        // Wait for a small amount of time to avoid CPU overload
        cv::waitKey(1);
    }

    return 0;
}
[Figure: _images/semi-dense.png]

Expected result of semi-dense disparity estimation. The output shows the two accumulated frames and a color-coded disparity map.

Note

The disparity map yields results only in areas with visible texture; areas without texture contain speckle noise.
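One common way to suppress this speckle noise is OpenCV's cv::filterSpeckles, which invalidates small connected components in a 16-bit disparity map. A minimal sketch follows, assuming the 16-bit fixed-point disparity produced by the matcher above; the size and difference thresholds are illustrative values, not tuned recommendations.

#include <opencv2/calib3d.hpp>

// 'disparity' is the 16-bit fixed-point disparity map from the sample above
// (4 fractional bits, i.e. a scale factor of 16)
cv::Mat filtered = disparity.clone();

// Value used to mark invalidated speckle pixels (an arbitrary choice here)
constexpr int16_t invalidDisparity = -16;

// Invalidate connected components smaller than 50 pixels whose internal
// disparity varies by more than 2 pixels (2 * 16 in fixed-point units)
cv::filterSpeckles(filtered, invalidDisparity, 50, 2 * 16);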

Sparse disparity estimation

The semi-dense approach is the most straightforward way to do stereo disparity estimation. An alternative is to perform disparity estimation only on selected sparse regions within the accumulated image. The sparse approach lets the implementation select regions with enough texture for disparity matching, reducing computational complexity and improving quality. It takes the coordinates of the points where disparity needs to be estimated, performs sparse accumulation only in the regions where disparity matching actually needs to happen, and runs correlation-based template matching of left-image patches on the right camera image. Each template is matched against the other image along a horizontal line using the normalized correlation coefficient (Pearson correlation); the best-scoring match is considered the correct one, and the corresponding disparity is assigned to that point.
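Conceptually, the per-point matching boils down to the following sketch (not the library's actual implementation): a patch around the point of interest in the left image is scored against a horizontal strip of the right image using OpenCV's normalized correlation coefficient, and the best-scoring location wins.

#include <opencv2/imgproc.hpp>

// Sketch of the per-point matching idea: 'templatePatch' is a patch cut out
// of the left image around a point of interest; 'searchStrip' is a strip of
// the right image on the same rectified row, spanning the disparity range.
cv::Point bestMatch(const cv::Mat &templatePatch, const cv::Mat &searchStrip) {
    cv::Mat scores;
    // TM_CCOEFF_NORMED computes the normalized (Pearson) correlation
    // coefficient between the template and every candidate location
    cv::matchTemplate(searchStrip, templatePatch, scores, cv::TM_CCOEFF_NORMED);

    // The highest-scoring location is taken as the match
    cv::Point maxLoc;
    cv::minMaxLoc(scores, nullptr, nullptr, nullptr, &maxLoc);
    return maxLoc;
}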

The following sample code shows the use of sparse disparity block matcher with a live calibrated stereo camera:

#include <dv-processing/camera/calibration_set.hpp>
#include <dv-processing/cluster/mean_shift/event_store_adaptor.hpp>
#include <dv-processing/core/stereo_event_stream_slicer.hpp>
#include <dv-processing/data/utilities.hpp>
#include <dv-processing/depth/sparse_event_block_matcher.hpp>
#include <dv-processing/io/stereo_capture.hpp>
#include <dv-processing/visualization/colors.hpp>

#include <opencv2/highgui.hpp>

int main() {
    using namespace std::chrono_literals;

    // Path to a stereo calibration file, replace with a file path on your local file system
    const std::string calibrationFilePath = "path/to/calibration.json";

    // Load the calibration file
    auto calibration = dv::camera::CalibrationSet::LoadFromFile(calibrationFilePath);

    // It is expected that the calibration file will have "C0" as the left camera
    auto leftCamera = calibration.getCameraCalibration("C0").value();

    // The second camera is assumed to be the right-side camera
    auto rightCamera = calibration.getCameraCalibration("C1").value();

    // Open the stereo camera with camera names from calibration
    dv::io::StereoCapture capture(leftCamera.name, rightCamera.name);

    // Make sure both cameras support event stream output, throw an error otherwise
    if (!capture.left.isEventStreamAvailable() || !capture.right.isEventStreamAvailable()) {
        throw dv::exceptions::RuntimeError("Input camera does not provide an event stream.");
    }

    // Matching window size for the block matcher
    const cv::Size window(24, 24);
    // Minimum disparity value to measure
    const int minDisparity = 0;
    // Maximum disparity value
    const int maxDisparity = 40;
    // Minimum z-score value that a valid match can have
    const float minScore = 0.0f;

    // Initialize the block matcher with rectification
    auto matcher = dv::SparseEventBlockMatcher(std::make_unique<dv::camera::StereoGeometry>(leftCamera, rightCamera),
        window, maxDisparity, minDisparity, minScore);

    // Initialization of a stereo event stream slicer
    dv::StereoEventStreamSlicer slicer;

    // Initialize a window to show previews of the output
    cv::namedWindow("Preview", cv::WINDOW_NORMAL);

    // Local event buffers to implement an overlapping window of events for accumulation
    dv::EventStore leftEventBuffer, rightEventBuffer;

    // Use one third of the resolution as the count of events per accumulated frame
    const size_t eventCount = static_cast<size_t>(leftCamera.resolution.area()) / 3;

    // Register a callback to be called every 20 milliseconds (50Hz)
    slicer.doEveryTimeInterval(20ms, [&matcher, &leftEventBuffer, &rightEventBuffer, eventCount, &window](
                                         const auto &leftEvents, const auto &rightEvents) {
        // Push input events into the local buffers
        leftEventBuffer.add(leftEvents);
        rightEventBuffer.add(rightEvents);

        // If the number of events is above the count, just keep the latest events
        if (leftEventBuffer.size() > eventCount) {
            leftEventBuffer = leftEventBuffer.sliceBack(eventCount);
        }
        if (rightEventBuffer.size() > eventCount) {
            rightEventBuffer = rightEventBuffer.sliceBack(eventCount);
        }

        // Number of clusters to extract
        constexpr int numClusters = 100;

        // Initialize the mean-shift clustering algorithm
        dv::cluster::mean_shift::MeanShiftEventStoreAdaptor meanShift(leftEventBuffer, 10.f, 1.0f, 20, numClusters);

        // Find cluster centers which are going to be used for disparity estimation
        auto centers = meanShift.findClusterCentres<dv::cluster::mean_shift::kernel::Epanechnikov>();

        // Run disparity estimation; the output will contain a disparity estimate for each of the given points.
        const std::vector<dv::SparseEventBlockMatcher::PixelDisparity> estimates
            = matcher.computeDisparitySparse(leftEventBuffer, rightEventBuffer, dv::data::convertToCvPoints(centers));

        // Convert the accumulated frames into colored images for preview.
        std::vector<cv::Mat> images(2);
        cv::cvtColor(matcher.getLeftFrame().image, images[0], cv::COLOR_GRAY2BGR);
        cv::cvtColor(matcher.getRightFrame().image, images[1], cv::COLOR_GRAY2BGR);

        // Visualize the matched blocks
        int32_t index = 0;
        for (const auto &point : estimates) {
            // If the point estimation is invalid, do not show a preview of it
            if (!point.valid) {
                continue;
            }

            // The rest of the code draws the match according to the disparity value on the
            // preview images.
            const cv::Scalar color = dv::visualization::colors::someNeonColor(index++);
            // Draw some nicely colored markers and rectangles.
            cv::drawMarker(images[1], *point.matchedPosition, color, cv::MARKER_CROSS, 7);
            cv::rectangle(images[1],
                cv::Rect(point.matchedPosition->x - (window.width / 2), point.matchedPosition->y - (window.height / 2),
                    window.width, window.height),
                color);
            cv::rectangle(images[0],
                cv::Rect(point.templatePosition->x - (window.width / 2),
                    point.templatePosition->y - (window.height / 2), window.width, window.height),
                color);
        }

        // Concatenate images and show them in a window
        cv::Mat preview;
        cv::hconcat(images, preview);
        cv::imshow("Preview", preview);
    });

    // Buffer input events in these variables to synchronize the inputs
    std::optional<dv::EventStore> leftEvents  = std::nullopt;
    std::optional<dv::EventStore> rightEvents = std::nullopt;

    // Run the processing loop while both cameras are connected
    while (capture.left.isRunning() && capture.right.isRunning()) {
        // Read events from the respective left / right cameras
        if (!leftEvents.has_value()) {
            leftEvents = capture.left.getNextEventBatch();
        }
        if (!rightEvents.has_value()) {
            rightEvents = capture.right.getNextEventBatch();
        }

        // Feed the data into the slicer and reset the buffers
        if (leftEvents && rightEvents) {
            slicer.accept(*leftEvents, *rightEvents);
            leftEvents  = std::nullopt;
            rightEvents = std::nullopt;
        }

        // Wait for a small amount of time to avoid CPU overload
        cv::waitKey(1);
    }

    return 0;
}
[Figure: _images/sparse-disparity.png]

Expected result of sparse disparity estimation. The colored rectangles represent sparse blocks that are matched on the right-side image; block colors correspond across both images. Note that the frames are sparse as well: accumulation happens only in relevant areas around the points of interest, which are selected in areas of high event density by mean-shift cluster extraction.
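To turn these sparse matches into metric depth, the horizontal offset between the template position and the matched position can be treated as disparity and triangulated with the calibrated focal length and baseline. The helper below is a hypothetical sketch: it assumes both positions are expressed in rectified image coordinates, and focalLengthPx and baselineMeters are placeholders for values taken from the calibration.

#include <dv-processing/depth/sparse_event_block_matcher.hpp>

#include <optional>

// Hypothetical helper: triangulate depth for one sparse match. Assumes both
// positions are in rectified image coordinates; focalLengthPx (pixels) and
// baselineMeters (meters) stand in for values from the stereo calibration.
std::optional<float> depthOfMatch(const dv::SparseEventBlockMatcher::PixelDisparity &point,
    const float focalLengthPx, const float baselineMeters) {
    if (!point.valid) {
        return std::nullopt;
    }

    // Disparity is the horizontal offset between the left-image template
    // position and its match in the right image
    const float disparityPx = point.templatePosition->x - point.matchedPosition->x;
    if (disparityPx <= 0.f) {
        return std::nullopt;
    }

    // Pinhole model: Z = f * B / d
    return (focalLengthPx * baselineMeters) / disparityPx;
}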