BRDF-NeRF: Neural Radiance Fields with
Optical Satellite Images and BRDF Modelling
Abstract
Understanding the anisotropic reflectance of complex Earth surfaces from satellite images is essential for many applications. Neural radiance fields (NeRF) have gained popularity as a learning technique capable of inferring the bidirectional reflectance distribution function (BRDF) of a scene from a set of images. However, previous research has mainly focused on applying NeRF to close-range images and on estimating simple microfacet BRDF models, which are inadequate for most Earth surfaces. Additionally, high-quality NeRFs typically require dozens of images acquired simultaneously, a rare scenario in satellite imagery. To overcome these challenges, we introduce BRDF-NeRF, designed to explicitly estimate the Rahman-Pinty-Verstraete (RPV) model, a semi-empirical BRDF model widely used in remote sensing. We evaluate our method on two datasets: (1) Djibouti, captured in a single epoch at different viewing angles and for a fixed Sun position, and (2) Lanzhou, captured at different epochs with varying viewing angles and Sun positions. Using only three or four satellite images for training, BRDF-NeRF can successfully synthesize new views from directions far from those of the training set and generate high-quality digital surface models (DSMs).
keywords:
Neural radiance fields, Satellite images, BRDF, Parametric RPV model, Digital surface model
1 Introduction
Over the past two decades, significant progress has been made in image processing algorithms. In particular, 3D surface reconstruction has benefited from high-resolution spatial data and algorithms such as semi-global stereo matching (SGM), which can generate detailed surface maps of urban and natural environments (hirschmuller:08:sgm; rosu2015measurement; mpd:06:sgm). However, state-of-the-art approaches still face challenges, particularly with radiometrically heterogeneous surfaces, complex reflectance functions, or diachronic acquisitions. Recent research has attempted to address these challenges by leveraging learning algorithms capable of modelling complex resemblance functions, given appropriate architectures and sufficient training datasets (PSMNet; chebbi2023deepsim; wu2024evaluation). At the same time, a new approach to surface reconstruction has emerged with Neural Radiance Fields (NeRF), which differs from other learning-based methods by operating in a self-supervised manner. It works on single pixels rather than patches, treats non-Lambertian surfaces and generates new synthetic views (Mildenhall20eccv_nerf). Besides, NeRF is capable of estimating the Bidirectional Reflectance Distribution Function (BRDF) of the surface at the same time. Understanding the BRDF of continental surfaces is crucial for a variety of applications, including land cover mapping, assessment of Earth’s radiation budget, climate change studies, vegetation density analysis, and intercalibration of spaceborne sensors (dumont2010high). However, due to the anisotropic nature of the Earth’s reflectance, estimation of the BRDF generally requires numerous angular measurements, which is challenging with a few satellite images.
The aim of this paper is to model the surface’s BRDF explicitly from sparse satellite views, and improve surface reconstruction, particularly over landscapes with anisotropic reflectance characteristics (e.g., bare soil, vegetation). We select NeRF as the algorithmic framework for its capacity to model angle dependent surface reflectance. Our BRDF-NeRF workflow (Figure 1) is designed for satellite acquisitions with only three synchronous views and incorporates the semi-empirical Rahman-Pinty-Verstraete (RPV) BRDF model, widely used in the remote sensing community to represent the BRDF of natural surfaces (rahman1993coupled). To the best of our knowledge, this work is the first to integrate a BRDF model into neural radiance fields for land surfaces. It extends our previous work on neural radiance fields with sparse optical satellite images (zhang2023spsnerf1). The open-source code is available at github.com/LulinZhang/BRDF-NeRF. The terms DSM and surface, as well as SGM and stereo matching are used interchangeably throughout this article.
2 Related works
2.1 Neural Radiance Fields
The vanilla NeRF (Mildenhall20eccv_nerf) leverages a large number of images captured with a pinhole camera to represent small-size scenes as particles that emit light, instead of reflecting light. Subsequent NeRF variants proposed to relax some of those defining constraints, without compromising the quality of the outputs, i.e., the synthesised images and the 3D model. In the following paragraphs we briefly discuss the state-of-the-art approaches relevant to our work.
NeRF from few views
Since NeRF relies solely on pixel values for network training, a large number of input images is essential for generating photo-realistic novel views. Attempts to train NeRF with sparse input images often result in overfitting and inaccurate estimation of scene depth, leading to artifacts in the rendered novel views. This limitation restricts the applicability of NeRF and prolongs training time. To address this challenge, efforts have been made to adapt NeRF to sparse input images by introducing various regularization priors. A common approach is to incorporate depth supervision, including sparse depth (deng2022depth; wang2022sparsenerf; Somraj_2023; guo2024depthguided) and dense depth (wei2021nerfingmvs; roessle2022dense; zhang2023spsnerf1). In addition, methods such as image features (yu2021pixelnerf) or semantic regularization (xu2022sinnerf) have also been explored.
NeRF and BRDF
While vanilla NeRF excels in view synthesis, it cannot relight or edit materials, due to its inability to decompose outgoing radiance into incoming radiance and surface material reflectance. Some researchers proposed to extend NeRF to incorporate information on the Bidirectional Reflectance Distribution Function (BRDF), which characterises how materials reflect light under different viewing and lighting conditions. The majority of BRDF-compatible NeRF variants, such as those proposed by (bi2020neural; srinivasan2020nerv; boss2021nerd; yang2022psnerf; verbin2022refnerf; mai2023neural), adopt some version of the microfacet BRDF model (walter2007microfacet). This model represents reflectance as the superposition of diffuse and specular components and typically includes a surface roughness parameter, influencing the appearance of the surface through a distribution of microfacet orientations. However, while microfacet BRDF models offer effective parameterization, they poorly model BRDF of natural surfaces (e.g., soil, vegetation) which exhibit a more or less marked backscattering behavior (hotspot effect). Additionally, data-driven BRDFs pre-trained on BRDF databases have been explored in NeRF (Zhang_2021). However, the databases consist of artificial materials and have been built in controlled environments. The spectral and directional optical properties of natural materials are often very different. Figure 2 shows the scattering patterns of the Lambertian, Microfacet and RPV models. The latter is an anisotropic BRDF model widely used in remote sensing and which we will use in this work.
NeRF in Earth Observations
The Earth observation community has mainly focused on adapting the initial NeRF design to meet the specificities of space imagery: changing shadows, dynamic scenes due to asynchronous acquisitions, and sparse views. Shadow-NeRF (derksen2021shadow) pioneered the application of NeRFs to satellite images by explicitly modelling the shadows of the scene using the Sun’s direction. Sat-NeRF (mari2022sat) extends Shadow-NeRF by replacing the pinhole camera model with an empirical push broom model (i.e., Rational Polynomial Coefficients) and modelling transient objects in the scene such as moving cars. EO-NeRF (Mari_2023_CVPR) employs a novel geometry-based shadow rendering, resulting in more accurate digital surface models (DSMs). SpS-NeRF (zhang2023spsnerf1) further adapted NeRF for scenarios with few satellite views by introducing spatial guidance within NeRF sampling, conditioned on low-resolution input depths. Sat-Mesh (qu2023sat) used a latent vector to deal with inconsistent appearances in satellite imagery, while SUNDIAL (behari2024sundial) proposed a secondary shadow ray casting technique to jointly learn satellite scene geometry, illumination components and Sun direction. SatensoRF (zhang2024satensorf) decomposed colour into ambient, diffuse and specular light. Season-NeRF (gableman2024incorporating) learned and rendered seasonal variations by incorporating time as an additional input variable. GC-NeRF (wan2024constraining) proposed a geometric loss to create a compact weight distribution around the surface. In addition, RS-NeRF (xie2023remote), SAT-NGP (billouard2024sat) and SatensoRF (zhang2024satensorf) addressed the computational efficiency of NeRF by accelerating the runtimes with hash encoding and voxel occupancy grids to sample points near the surface, as well as through tensor decomposition. Last but not least, Radar fields (ehret2023radar) extended NeRF to spaceborne synthetic aperture radar (SAR) images.
2.2 BRDF in remote sensing
Existing BRDF models
Numerous BRDF models have been developed to describe the spectral and directional reflectance of natural and artificial surfaces. They can be classified into physical, empirical and semi-empirical models. Physical models (pinty1991extracting) are based on rigorously defined physical parameters and offer the most accurate descriptions of observed scenes. However, a large number of multiangular observations are required to retrieve these parameters by model inversion, making them impractical for optically complex surfaces. Empirical models (walthall1985simple; minnaert1941reciprocity; shibayama1985view) are derived as simple statistical fits to observed data, and provide no additional insights into the surface type or structure. Semi-empirical models (rahman1993coupled; wanner1995derivation; hapke1981bidirectional; roujean1992bidirectional; lucht2000algorithm) employ specific mathematical functions to best represent the physical interactions between the radiation field and the surface. They accept a reduced number of parameters, which facilitate their inversion. The semi-empirical RPV model (rahman1993coupled) is among the most commonly used. It is capable of representing the reflectance of various natural surfaces with just four parameters (Figure 2), and has been used to address atmospheric radiation transfer problems (martonchik1998techniques; martonchik1998determination); classify forest types (koukal2014evaluation); simulate plant leaves reflectance (biliouris2009rpv); estimate BRF values under unmeasured illumination and viewing angles (lattanzio2007consistency); estimate surface albedo (martonchik1998techniques; martonchik1998determination; privette2002first); and identify surface properties (widlowski2001characterization; gao2003detecting).
Deriving BRDF
Most of the parameters controlling BRDF cannot be measured in the field, but are obtained by inversion of surface reflectance models on observations. To guarantee reliable estimates, the surface must be observed over a wide range of illumination/viewing angles. Laboratory and field measurements using goniophotometers have traditionally been used to measure reflectance (lv2016multi; sandmeier2000brdf; combes2007new). Over the last few decades, several spaceborne instruments have been designed to carry out multiangular observations, such as MISR, POLDER, MODIS, CHRIS/Proba, and VIIRS. These instruments have limited spatial resolutions, ranging from a few tens to a few hundreds of meters. (labarre2019retrieving) have inverted the Hapke model on a set of 21 multiangular Pleiades images, acquired at a spatial resolution of 2 m. However, such acquisitions are rarely available and inversion with three or four images is ill-posed.
In this paper, we explore the potential for estimating the BRDF of natural surfaces using as few as three high-resolution multispectral optical images. Our approach offers new possibilities for studying solar radiation reflected from the Earth’s surface, taking advantage of the multiplicity, temporal coverage, and high spatial resolution of optical satellite imagery.
3 Radiance Fields with RPV Reflectance
We briefly introduce the vanilla NeRF architecture and discuss two key ingredients of our BRDF-NeRF: geometry modelling (depths and normals) as well as the radiometric rendering (RPV BRDF model). The BRDF-NeRF workflow is described in Figure 1.
Preliminaries
NeRF (Mildenhall20eccv_nerf) represents a static scene as a continuous volumetric field that emits light, optimized with a fully connected deep network. Given a 3D point $\mathbf{x}$ accompanied by a viewing direction $\mathbf{d}$, NeRF predicts a volume density $\sigma$ and a colour $\mathbf{c}$. NeRF renders images by sampling query points along each camera ray and accumulating the colours with weights defined by density, and constrains the rendered images to be close to the training images. Each camera ray r is defined by an origin point o and a viewing direction vector $\mathbf{d}$ such that $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$. Each query point in r is defined as $\mathbf{x}_i = \mathbf{o} + t_i\mathbf{d}$, where $t_i$ lies between the near and far bounds of the scene, $t_n$ and $t_f$. The rendered pixel value of ray r is calculated as follows:
$$\mathbf{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i\,\alpha_i\,\mathbf{c}_i \tag{1}$$
with $\alpha_i = 1 - \exp(-\sigma_i \delta_i)$, $T_i = \prod_{j=1}^{i-1}(1 - \alpha_j)$, and $\delta_i = t_{i+1} - t_i$. $\alpha_i$ represents the opacity of the current query point and $T_i$ is the transmittance. The contribution of colour $\mathbf{c}_i$ to the accumulated colour increases with opacity and transmittance.
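The accumulation in Equation 1 can be illustrated with a minimal NumPy sketch. It assumes the standard NeRF compositing weights (opacity times transmittance); the function name `render_ray` and the large padding value used for the last interval are illustrative choices, not taken from the BRDF-NeRF code.

```python
import numpy as np

def render_ray(sigma, colour, t):
    """Composite per-sample colours along one ray (sketch of Equation 1).

    sigma  : (N,)   volume densities at the query points
    colour : (N, 3) colours predicted at the query points
    t      : (N,)   sorted sample positions along the ray
    """
    # delta_i = t_{i+1} - t_i; the last interval is padded with a large value
    # (an assumption borrowed from common NeRF implementations).
    delta = np.diff(t, append=t[-1] + 1e10)
    alpha = 1.0 - np.exp(-sigma * delta)            # opacity of each sample
    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    weights = trans * alpha
    return weights @ colour, weights

# Toy ray: empty space, then an opaque sample, then something hidden behind it.
sigma = np.array([0.0, 1e6, 1.0])
colour = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rgb, weights = render_ray(sigma, colour, np.array([1.0, 2.0, 3.0]))
```

In this toy case the opaque second sample receives all the weight, so the rendered colour equals that sample's colour and the sample behind it contributes nothing.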
3.1 Geometric modelling
We incorporate geometric information to extend the applicability of BRDF-NeRF to sparse view acquisitions and to predict surface normals that are essential for accurate estimates of BRDF, as detailed in Section 3.2.
Depth supervision
Instead of querying ray points crossing the entire volume of the scene, as is the case in the vanilla NeRF, we narrow it down to a buffer space defined around the location of an approximately known surface. This tactic reduces ambiguity and enables reliable volume densities to be estimated with fewer images. We further encourage the depths predicted by NeRF to remain close to the input surface by using the following loss term introduced in (zhang2023spsnerf1):
$$\mathcal{L}_{depth} = \sum_{\mathbf{r} \in R_{sub}} corr(\mathbf{r})\,\left(D(\mathbf{r}) - \bar{D}(\mathbf{r})\right)^2 \tag{2}$$
where $D(\mathbf{r})$ are the predicted depths calculated as $D(\mathbf{r}) = \sum_{i=1}^{N} T_i \alpha_i t_i$, while the $\bar{D}(\mathbf{r})$ are the input depths obtained from stereo matching on low-resolution images. We have observed that the performance of stereo matching on low-resolution images is only marginally affected by a change in surface BRDF and can therefore provide sufficiently good depth initialisations for our radiance fields. The parameter corr(r), which corresponds to the similarity score obtained by stereo matching, acts as a weight or confidence. It adjusts the level of supervision, having a strong impact where confidence is high and a minimal impact where input depths are uncertain. $R_{sub}$ is a subset of rays that satisfy at least one of the following two conditions: (1) $S(\mathbf{r}) > \bar{S}(\mathbf{r})$; (2) $|D(\mathbf{r}) - \bar{D}(\mathbf{r})| > \bar{S}(\mathbf{r})$, where $S(\mathbf{r})$ represents the uncertainty of the predicted depth, and $\bar{S}(\mathbf{r})$ represents the uncertainty of the input depth. In other words, depth supervision is only applied to rays for which the predicted depths are more uncertain than the input depths.
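As a rough illustration of the masked depth supervision, the sketch below assumes a squared depth difference weighted by corr(r), a predicted uncertainty computed as the weighted standard deviation of the sample depths, and a ray subset selected by the two conditions described above. These forms follow the usual dense-depth-supervision recipe and are assumptions here; the names `depth_loss`, `depth_in` and `sigma_in` are hypothetical.

```python
import numpy as np

def depth_loss(weights, t, depth_in, sigma_in, corr):
    """Sketch of a confidence-weighted depth loss on a ray subset.

    weights  : (R, N) rendering weights T_i * alpha_i of each ray
    t        : (R, N) sample depths along each ray
    depth_in : (R,)   input depths from stereo matching
    sigma_in : (R,)   uncertainty of the input depths
    corr     : (R,)   similarity score used as confidence
    """
    depth = (weights * t).sum(axis=1)                       # predicted depth
    # Assumed definition of the predicted uncertainty: weighted std of depths.
    std = np.sqrt((weights * (t - depth[:, None]) ** 2).sum(axis=1))
    # A ray is supervised if at least one of the two conditions holds.
    mask = (std > sigma_in) | (np.abs(depth - depth_in) > sigma_in)
    loss = (corr * (depth - depth_in) ** 2)[mask].sum()
    return loss, mask

# Ray 0 already agrees with the input depth, ray 1 is one unit away from it.
weights = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
t = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
loss, mask = depth_loss(weights, t, np.array([2.0, 2.0]),
                        np.array([0.1, 0.1]), np.array([0.5, 0.5]))
```

Only the second ray is supervised in this example, so the loss reduces to its confidence times the squared depth error.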
Surface normal
BRDF is a function that depends on both the incident and viewing angles, which are defined relative to the surface normal. Therefore, the surface normal is crucial to accurately recovering the BRDF. In NeRF, it can be either analytical or learned. The analytical normal is calculated as the negative of the normalized gradient of the density field with respect to the spatial location x, i.e., $\mathbf{n} = -\nabla_{\mathbf{x}}\sigma(\mathbf{x}) / \lVert\nabla_{\mathbf{x}}\sigma(\mathbf{x})\rVert$ (srinivasan2020nerv). The learned normal is predicted from a spatial MLP and can be supervised implicitly (bi2020neural) or with the analytical normal (verbin2022refnerf; li2022neural). Approaches relying on learned normals led to smooth surfaces and a loss of detail in our case studies. Consequently, we chose to incorporate the analytical normal into our architecture because, despite its computational cost, it provides more accurate and better resolved normal vectors (srinivasan2020nerv).
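The definition of the analytical normal can be checked on a toy density field. In a NeRF the gradient would come from automatic differentiation of the density MLP; the sketch below swaps in central finite differences so that it stays self-contained. The function `analytical_normal` and the toy field `toy_density` are hypothetical.

```python
import numpy as np

def analytical_normal(density_fn, x, eps=1e-4):
    """Negative normalised density gradient at x.

    Finite differences stand in for the automatic differentiation
    that a NeRF implementation would normally use.
    """
    grad = np.zeros(3)
    for axis in range(3):
        step = np.zeros(3)
        step[axis] = eps
        grad[axis] = (density_fn(x + step) - density_fn(x - step)) / (2.0 * eps)
    return -grad / max(np.linalg.norm(grad), 1e-12)

def toy_density(x):
    # Toy terrain: dense below the plane z = 0, empty above it.
    return 1.0 / (1.0 + np.exp(10.0 * x[2]))

normal = analytical_normal(toy_density, np.array([0.3, -0.2, 0.0]))
```

Because the density decreases with height, the gradient points downwards and the normal points up, away from the dense region.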
3.2 Radiometric rendering
The geometric approach presented above guarantees decent 3D reconstructions of Lambertian scenes. Next, we adapt this approach to handle non-Lambertian natural surfaces by estimating a BRDF and incorporating it into the rendering Equation 1.
RPV equation
We estimate the reflectance of natural surfaces using the Rahman-Pinty-Verstraete (RPV) model (rahman1993coupled), a semi-empirical model well suited to satellite images (see Equation 3). We chose this model for its simplicity, its physics-based parameters and its ability to represent asymmetric BRDF, including the hotspot effect. The latter corresponds to a sharp increase in reflectance, which becomes maximum when the illumination and viewing directions are coincident.
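In this model, the colour c of a surface point, defined by the normal vector n, the illumination direction $\boldsymbol{\omega}_i$ and the viewing direction $\boldsymbol{\omega}_v$ (Figure 3), is calculated as the product of the incoming light $L(\boldsymbol{\omega}_i)$, the cosine of the incident angle $\cos\theta_i$, and the bidirectional reflectance factor simulated by $\rho_{RPV}$:
$$\mathbf{c} = L(\boldsymbol{\omega}_i)\,\cos\theta_i\,\rho_{RPV}(\theta_i, \theta_v, \phi) \tag{3}$$
$L(\boldsymbol{\omega}_i)$ is set to a unit vector, and $\cos\theta_i$ is approximated rather than computed exactly because the analytical normal n is not sufficiently smooth. The term $\rho_{RPV}$ can be broken down into an amplitude parameter $\rho_0$ and three angle-dependent functions: the modified Minnaert function $M$, the Henyey-Greenstein function $F_{HG}$, and the backscatter function $H$:
$$\rho_{RPV}(\theta_i, \theta_v, \phi) = \rho_0\, M(\theta_i, \theta_v, k)\, F_{HG}(g, \Theta)\, H(\rho_c, G) \tag{4}$$
with $M = \left[\cos\theta_i \cos\theta_v (\cos\theta_i + \cos\theta_v)\right]^{k-1}$, $F_{HG} = (1 - \Theta^2) / (1 + 2\Theta\cos g + \Theta^2)^{3/2}$, $H = 1 + (1 - \rho_c)/(1 + G)$, and the geometric factor $G = (\tan^2\theta_i + \tan^2\theta_v - 2\tan\theta_i \tan\theta_v \cos\phi)^{1/2}$. The illumination and viewing directions are decomposed into zenith angles $\theta_i$ and $\theta_v$, azimuth angles $\phi_i$ and $\phi_v$, relative azimuth angle $\phi$ and phase angle $g$ (with $\cos g = \cos\theta_i\cos\theta_v + \sin\theta_i\sin\theta_v\cos\phi$), all defined in a spherical coordinate system determined by the surface normal n (Figure 3).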
Detailed Description of RPV Model Input Parameters
The RPV parameter $\rho_0$ in Equation 4 plays the role of pseudo-albedo. The modified Minnaert function controls the anisotropic behaviour of the surface using the parameter k. If $k \approx 1$, the surface is quasi-Lambertian; if $k < 1$, a bowl-shaped pattern dominates (reflectance values increase with the viewing zenith angle); and if $k > 1$, a bell-shaped pattern dominates (reflectance values decrease with the viewing zenith angle) (widlowski2004canopy). The parameter $\Theta$ of the Henyey-Greenstein function controls the amount of radiation scattered in the forward ($0 \le \Theta \le 1$) or backward ($-1 \le \Theta \le 0$) directions. The backscatter function is written as a function of a geometric factor $G$ and a parameter $\rho_c$, which represents the sharp increase in reflectance in the hotspot direction. When $\theta_i = \theta_v$ and $\phi = 0$, the geometric factor $G$ vanishes and the backscatter function reaches its maximum value, increasing the total reflectance. When estimating the RPV model, the ranges of variation of the parameters $\rho_0$, k, $\Theta$ and $\rho_c$ are fixed to [0, 1], [0, 2], [-1, 1] and [0, 1] (koukal2014evaluation).
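To make the role of each parameter concrete, here is a minimal NumPy sketch of the RPV bidirectional reflectance factor. It assumes the commonly published form of the model (modified Minnaert term, Henyey-Greenstein term with negative values giving backward scattering, and a hotspot term that peaks when the geometric factor is zero). The function name `rpv_brf` and the argument names are illustrative and not taken from the BRDF-NeRF code.

```python
import numpy as np

def rpv_brf(theta_i, theta_v, phi, rho0, k, big_theta, rho_c):
    """RPV reflectance factor; angles in radians, phi is the relative azimuth."""
    ci, cv = np.cos(theta_i), np.cos(theta_v)
    si, sv = np.sin(theta_i), np.sin(theta_v)
    # Modified Minnaert function: bowl shape for k < 1, bell shape for k > 1.
    minnaert = (ci * cv * (ci + cv)) ** (k - 1.0)
    # Henyey-Greenstein function of the phase angle g.
    cos_g = ci * cv + si * sv * np.cos(phi)
    f_hg = (1.0 - big_theta**2) / (1.0 + 2.0 * big_theta * cos_g + big_theta**2) ** 1.5
    # Geometric factor G, zero when viewing and illumination directions coincide.
    ti, tv = np.tan(theta_i), np.tan(theta_v)
    geom = np.sqrt(np.maximum(ti**2 + tv**2 - 2.0 * ti * tv * np.cos(phi), 0.0))
    hotspot = 1.0 + (1.0 - rho_c) / (1.0 + geom)
    return rho0 * minnaert * f_hg * hotspot

sun = np.deg2rad(30.0)
# Backward-scattering parameters (negative Theta), as in the first column of Table 1.
back = rpv_brf(sun, sun, 0.0, 0.2, 0.996, -0.174, 0.979)
fwd = rpv_brf(sun, sun, np.pi, 0.2, 0.996, -0.174, 0.979)
# With k = 1, Theta = 0 and rho_c = 1 the model reduces to a constant rho0.
flat = rpv_brf(0.3, 0.7, 1.0, 0.2, 1.0, 0.0, 1.0)
```

With these settings the reflectance in the backscatter direction exceeds the one in the forward direction, and the last call shows the quasi-Lambertian limit where only the pseudo-albedo remains.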
To give the reader an intuition about how RPV parameters affect reflectance, we analyse the bidirectional reflectance function (BRF) of selected points in one of our datasets (Figure 4). The BRFs are plotted by varying the viewing direction (zenith angle in [0∘, 90∘] and azimuth angle in [0∘, 360∘]) while keeping the Sun’s direction fixed. The pseudo-albedo $\rho_0$ is the value estimated by BRDF-NeRF at the selected surface point, and the normal is n=[0, 0, 1]. Among the six combinations of ($k$, $\Theta$, $\rho_c$) given in Table 1, backward scattering (Figure 4 (b)) was predicted by BRDF-NeRF. Note that this BRF is consistent with a result generated independently over the same area with 21 Pleiades views (labarre2019retrieving).
Param. | Backward | Forward | Bowl shape | Bell shape | Hotspot effect | Hotspot effect |
---|---|---|---|---|---|---|
$k$ | 0.996 | 0.996 | 0.500 | 1.500 | 0.996 | 0.996 |
$\Theta$ | -0.174 | 0.174 | -0.174 | -0.174 | -0.174 | -0.174 |
$\rho_c$ | 0.979 | 0.979 | 0.979 | 0.979 | 0.500 | 0 |
3.3 Network architecture
The network architecture presented in Figure 5 consists of two progressively trained components: the geometric part and the RPV part. Initially, the geometric part is pre-trained under the assumption of a Lambertian surface. After this pre-training phase (i.e., 20% of the total training time), the RPV part is introduced. Three MLPs predicting k, $\Theta$ and $\rho_c$, together with the analytical normal n, then take part in training, while the albedo is fine-tuned to match the pseudo-albedo $\rho_0$ of our new rendering equation (see Equation 3). Separating the geometric part from the RPV part, which handles non-Lambertian surfaces, ensures that the final training stage works on well-initialised normal vectors.
The input spatial locations x are transformed by positional encoding, the activations for $\rho_0$, k, $\Theta$ and $\rho_c$ are sigmoid functions, and k and $\Theta$ are scaled to [0, 2] and [-1, 1] to match their real value ranges (koukal2014evaluation). Ultimately, all the parameters of BRDF-NeRF are optimized to minimise the combination of (1) the colour loss between the ground truth pixel colours $\mathbf{C}_{GT}(\mathbf{r})$ and the predicted pixel colours $\mathbf{C}(\mathbf{r})$, and (2) the depth loss (Equation 2):
$$\mathcal{L} = \sum_{\mathbf{r} \in R} \left\lVert \mathbf{C}(\mathbf{r}) - \mathbf{C}_{GT}(\mathbf{r}) \right\rVert_2^2 + \lambda\,\mathcal{L}_{depth} \tag{5}$$
where $\lambda$ is a weight balancing the contribution of colour and depth. See our experiment on finding the optimal $\lambda$ in Section 4.6.
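The output activations and the combined objective can be sketched as follows. The sigmoid rescaling follows the parameter ranges stated above; the squared-error colour term is the usual NeRF choice and is an assumption here, as are the names `rpv_parameters`, `total_loss` and the dictionary keys.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rpv_parameters(raw):
    """Map unbounded MLP outputs to the RPV parameter ranges.

    raw: dict with keys 'rho0', 'k', 'theta', 'rho_c' (hypothetical names).
    """
    s = {name: sigmoid(np.asarray(value, dtype=float)) for name, value in raw.items()}
    return {
        "rho0": s["rho0"],                 # [0, 1]
        "k": 2.0 * s["k"],                 # rescaled to [0, 2]
        "theta": 2.0 * s["theta"] - 1.0,   # rescaled to [-1, 1]
        "rho_c": s["rho_c"],               # [0, 1]
    }

def total_loss(pred_rgb, gt_rgb, depth_term, lam):
    """Colour loss plus weighted depth loss (squared error assumed for colour)."""
    colour_term = np.sum((pred_rgb - gt_rgb) ** 2)
    return colour_term + lam * depth_term

params = rpv_parameters({"rho0": 0.0, "k": 0.0, "theta": 0.0, "rho_c": 0.0})
loss = total_loss(np.ones((2, 3)), np.zeros((2, 3)), depth_term=2.0, lam=0.5)
```

A zero pre-activation therefore corresponds to the quasi-Lambertian midpoint of the ranges (k = 1, no preferred scattering direction), which is a convenient neutral starting point.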
4 Numerical experiments
We conduct a series of experiments to evaluate our BRDF-NeRF on novel view synthesis (Section 4.4) and altitude estimation (Section 4.5) tasks. In addition, we examine the influence of atmospheric correction from TOA (Top Of Atmosphere) to TOC (Top Of Canopy) (Section 4.3) and carry out ablation studies to determine the best training strategy and the best way of rendering (Section 4.6).
Evaluation Metrics
Precision metrics are Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) (wang2004image) for view synthesis, and Mean Altitude Error (MAE) for altitude extraction. Ground truth (GT) images are true images not seen during training, while the GT surface is a DSM generated with high-resolution Pleiades panchromatic images, with a ground sampling distance (GSD) of 0.5 m, and stereo image matching (mpd:06:sgm). BRDF-NeRF is also compared with the competing methods Sat-NeRF (mari2022sat) and SpS-NeRF (zhang2023spsnerf1), as well as with DSMs generated with SGM using full-resolution images (i.e., with m). The source code for EO-NeRF (Mari_2023_CVPR) was not available for comparison. The RPV parameters are indirectly validated by examining quantitative metrics for novel view synthesis. Additionally, one can compare the BRF in Figure 4 (b) with an independent result derived from 21 Pléiades images in (labarre2019retrieving).
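For reference, two of these metrics can be computed as in the sketch below. It assumes images normalised to [0, 1] and interprets the altitude error as a mean absolute difference over valid DSM cells; both points are assumptions rather than details given in the text, and the function names are illustrative. SSIM is omitted because it is usually taken from an image-processing library.

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)

def mean_altitude_error(dsm_pred, dsm_gt):
    """Mean absolute altitude difference over cells valid in both DSMs."""
    valid = np.isfinite(dsm_pred) & np.isfinite(dsm_gt)
    return np.mean(np.abs(dsm_pred[valid] - dsm_gt[valid]))

# A uniform error of 0.1 on a [0, 1] image corresponds to 20 dB.
score = psnr(np.full((4, 4, 3), 0.1), np.zeros((4, 4, 3)))
# The NaN cell (no data) is ignored in the altitude error.
mae = mean_altitude_error(np.array([[1.0, 2.0], [np.nan, 4.0]]), np.zeros((2, 2)))
```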
4.1 Implementation Details
Training
Our network is trained with the Adam optimizer (lr=5e-4, decay=0.9, batch size=1024). We use SpS-NeRF’s ray sampling strategy to sample 64 stratified points along each ray, accompanied by 64 guided points following a Gaussian distribution. We optimize BRDF-NeRF for 100k iterations, which takes 10 hours on an NVIDIA GPU with 40 GB of RAM. For a fair comparison, the competing methods (i.e., Sat-NeRF and SpS-NeRF) are also trained for 100k iterations with 64 + 64 points along each ray, which takes 6 hours. BRDF-NeRF is less efficient in training than the competing methods, mainly due to the computation of the analytical normal. Computational efficiency was not the aim of this work and could be improved in the future with techniques such as tensor decomposition.
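The 64 + 64 sampling scheme can be pictured with the following sketch, which combines stratified samples over the whole ray with samples drawn from a Gaussian centred on the input depth. The clipping to the ray bounds, the use of the input depth uncertainty as standard deviation and the name `sample_ray` are assumptions made for illustration, not a description of SpS-NeRF's exact implementation.

```python
import numpy as np

def sample_ray(near, far, depth_in, sigma_in, n_strat=64, n_guided=64, rng=None):
    """Return sorted sample depths: stratified plus depth-guided Gaussian samples."""
    rng = np.random.default_rng() if rng is None else rng
    # Stratified sampling: one uniform draw inside each of n_strat equal bins.
    edges = np.linspace(near, far, n_strat + 1)
    strat = edges[:-1] + rng.uniform(size=n_strat) * np.diff(edges)
    # Guided sampling: concentrate samples around the approximately known surface.
    guided = np.clip(rng.normal(depth_in, sigma_in, size=n_guided), near, far)
    return np.sort(np.concatenate([strat, guided]))

t = sample_ray(0.0, 10.0, depth_in=5.0, sigma_in=0.2, rng=np.random.default_rng(0))
```

Most of the 128 samples end up close to the input depth, which is what allows reliable densities to be estimated near the surface with few images.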
Light visibility
It is important to take into account the visibility of samples when scenes contain occlusions, for example when acquiring images in mountainous or urban areas. Visibility describes the transmission of a sample between the light source and the query point. Brute-force computation of the light visibility is computationally expensive, as it requires marching rays from all the query points along the camera ray to the light source. Previous work treats the light visibility with different strategies: (1) assuming that light visibility is the same everywhere (mai2023neural; boss2021nerd); in this scenario, a shadow is embedded into the albedo colour; (2) with a fully analytical approach provided that the camera-light configuration is collocated, i.e., the light rays are aligned with the camera rays (bi2020neural); (3) with a semi-analytical approach where visibility is calculated by ray tracing from the light source to the estimated surface (Zhang_2021; li2022neural; yang2022s; Mari_2023_CVPR); (4) with MLP-learnt visibility, supervised by photometric loss (verbin2022refnerf), or by light visibility calculated by ray tracing (srinivasan2020nerv; yang2022psnerf; derksen2021shadow; mari2022sat).
The fully analytical strategy is intractable in situations where light and view are not collocated, while the semi-analytical strategy can lead to noisy results, especially when few input views are used. The learning-based approach gives the network superfluous freedom that could confuse light visibility with other phenomena, such as albedo colour. Since our main objective is to model natural surfaces, such as bare soil, from aerial images, we can safely assume that the light visibility is constant everywhere, i.e., equal to 1.
4.2 Dataset
Evaluations are carried out on two sites (Djibouti, Lanzhou). We extract two regions of interest (approximately 1.5 km × 1.5 km) in each site, which we refer to as A and B (e.g., Dji-A and Dji-B). In each dataset, we distinguish three test scenarios ranging from easy (novel view interpolation) to very hard (novel view extrapolation). The input images are RGB with a GSD of 2 m and undergo atmospheric correction prior to processing (see Section 4.3). We could not perform experiments on the open DFC dataset (bosch2019semantic), which represents urban scenes, since RPV is designed for natural surfaces.
Djibouti Dataset
It is located in the Asal-Ghoubbet rift (Republic of Djibouti) (labarre2019retrieving) and consists of 21 multiangular Pleiades 1B images collected in a single flyby on January 26, 2013. Three quasi-nadir images are chosen for training, and three other images for testing, including interpolation and extrapolation scenarios (Figure 6 (a)).
Lanzhou Dataset
It is located in Lanzhou (China) and consists of 3 Pleiades 1B images acquired on April 23, 2013 and 3 Pleiades 1A images acquired on June 29, 2013. This is a multi-date dataset where the position of the Sun changes between acquisitions (Figure 6 (b)).
4.3 The effect of atmospheric correction
Atmospheric correction is important for two reasons. Firstly, the atmosphere affects the signal received by the satellite sensor. Secondly, the RPV model, like all BRDF models, describes TOC reflectances whereas Pleiades images are typically supplied as calibrated at TOA. We apply the atmospheric correction using the Orfeo ToolBox (OTB) software library (OTB2002), which is based on the 6S radiative transfer code (vermote1997second). The correction model requires four atmospheric parameters: ozone content (cm-atm), water vapor content (g/cm2), aerosol optical thickness (unitless) and atmospheric pressure (hPa). The first three parameters are estimated from ancillary datasets corresponding to the day of image acquisition, available on NASA’s Earth observation website (https://neo.gsfc.nasa.gov/). The atmospheric pressure is approximated from the surface altitude, expressed in meters. The four parameters, along with the adjacency radius for both epochs in the Lanzhou dataset, are shown in Table 2. The Djibouti dataset had already been corrected for atmospheric effects by the satellite image provider.
Figure 7 shows an example comparison between input images with and without atmospheric correction. Image tones of different epochs become more similar after atmospheric correction. Figure 8 and Table 3 compare results obtained from images with and without atmospheric correction on novel view synthesis and altitude estimation. Results without atmospheric correction generally yield worse metrics and display artifacts in the synthetic images.
Epoch | Ozone content (cm-atm) | Water vapor content (g/cm2) | Aerosol optical thickness (unitless) | Atmospheric pressure (hPa) | Adjacency radius (-) |
---|---|---|---|---|---|
23/04/2013 | 0.3220 | 1.7333 | 0.4665 | 783 | 1.0 |
29/06/2013 | 0.2969 | 2.5625 | 0.0980 | 783 | 1.0 |
Method | AC | MAE | PSNR (Easy) | PSNR (Hard) | SSIM (Easy) | SSIM (Hard) |
---|---|---|---|---|---|---|
SpS-NeRF | ✘ | 3.697 | 28.282 | 26.131 | 0.880 | 0.858 |
SpS-NeRF | ✔ | 3.558 | 29.548 | 24.755 | 0.965 | 0.900 |
BRDF-NeRF | ✘ | 3.439 | 33.315 | 29.857 | 0.946 | 0.923 |
BRDF-NeRF | ✔ | 3.420 | 32.196 | 28.420 | 0.979 | 0.94 |
4.4 Novel view synthesis
Quantitative metrics are presented in Table 4 while qualitative visualisations are provided in Figures 9 and 10. The visualisations here are limited to the most challenging scenario (very hard for Djibouti and hard for Lanzhou) due to space limitation. Our BRDF-NeRF outperforms Sat-NeRF and SpS-NeRF. Among competitive methods, SpS-NeRF produces less blurry renderings than Sat-NeRF (compare Figure 9 (a) and (c) as well as Figure 10 (b) and (d)). Nevertheless, both methods produce minimal hallucination effects (Figure 9 (a), Figure 10 (b,d)). The two competitive methods remain less sharp and far from the colour tone of the NeRF which includes our realistic RPV-based BRDF (compare Figure 9 (c) and (e)). PSNR and SSIM metrics are best for BRDF-NeRF, followed by SpS-NeRF in second place. The margins between BRDF-NeRF and competitive approaches increase from easy to very hard mode, indicating greater robustness of BRDF-NeRF. From single to multiple epoch datasets (i.e., from Djibouti to Lanzhou), the image quality rendered by Sat-NeRF and SpS-NeRF decreased significantly, while BRDF-NeRF recovered photorealistic images in both cases.
Metric | Method | Dji-A | Dji-B | Lzh-A | Lzh-B |
---|---|---|---|---|---|
PSNR (Easy) | Sat-NeRF | 32.747 | 31.818 | 29.979 | 25.470 |
PSNR (Easy) | SpS-NeRF | 38.832 | 38.174 | 29.548 | 30.904 |
PSNR (Easy) | BRDF-NeRF | 41.844 | 40.823 | 32.196 | 32.165 |
PSNR (Hard) | Sat-NeRF | 25.542 | 23.699 | 25.963 | 20.811 |
PSNR (Hard) | SpS-NeRF | 28.348 | 27.468 | 24.755 | 24.148 |
PSNR (Hard) | BRDF-NeRF | 36.232 | 35.448 | 28.420 | 27.814 |
PSNR (VHard) | Sat-NeRF | 23.581 | 21.288 | / | / |
PSNR (VHard) | SpS-NeRF | 23.144 | 22.31 | / | / |
PSNR (VHard) | BRDF-NeRF | 33.35 | 32.376 | / | / |
SSIM (Easy) | Sat-NeRF | 0.927 | 0.950 | 0.962 | 0.925 |
SSIM (Easy) | SpS-NeRF | 0.975 | 0.979 | 0.965 | 0.970 |
SSIM (Easy) | BRDF-NeRF | 0.985 | 0.988 | 0.979 | 0.975 |
SSIM (Hard) | Sat-NeRF | 0.766 | 0.825 | 0.909 | 0.760 |
SSIM (Hard) | SpS-NeRF | 0.840 | 0.887 | 0.900 | 0.928 |
SSIM (Hard) | BRDF-NeRF | 0.957 | 0.965 | 0.94 | 0.953 |
SSIM (VHard) | Sat-NeRF | 0.676 | 0.768 | / | / |
SSIM (VHard) | SpS-NeRF | 0.614 | 0.72 | / | / |
SSIM (VHard) | BRDF-NeRF | 0.918 | 0.942 | / | / |
MAE | Sat-NeRF | 12.85 | 18.059 | 61.299 | 27.489 |
MAE | SpS-NeRF | 1.438 | 1.761 | 3.558 | 3.235 |
MAE | BRDF-NeRF | 1.378 | 1.614 | 3.42 | 2.941 |
MAE | SGMZ1 | 1.061 | 1.052 | 1.409 | 1.220 |
4.5 Altitude estimation
Quantitative metrics are presented in Table 4, whereas qualitative visualisations are provided in Figures 11 and 12. Sat-NeRF, which was not designed for scenarios with few images, estimates surface altitudes that are tens of meters away from the ground truth surface. SpS-NeRF performs better, thanks to the dense depth supervision, as opposed to supervision with sparse points in Sat-NeRF. Visual assessment reveals that altitudes predicted by Sat-NeRF are either flat (Figure 11 (a-b)), or contain a made up pattern (Figure 12 (a)). SpS-NeRF altitudes are more faithful to ground truth but remain noisy. Our BRDF-NeRF outperforms both versions of NeRF, producing less noisy surfaces while retaining detail.
Compared with surfaces obtained with the classical stereo matching (mpd:06:sgm), BRDF-NeRF appears smoother and less detailed (compare Figure 12 (f) and (h)) in areas with good texture. However, on poorly textured areas where stereo matching is challenging, our BRDF-NeRF predicts coherent altitudes (compare Figure 11 (e) and (g)). Note also that our ground truth surface was generated with stereo matching algorithms, thus the comparison is possibly slightly biasing the MAE metric in favour of the stereo matching surface.
4.6 Ablations
Training strategy
Our BRDF-NeRF model is trained progressively to ensure proper initialisation of the geometry (i.e., density weights in NeRF) before learning the RPV parameters. We perform an ablation with different pre-training strategies to determine the point at which the transition from the Lambertian to the RPV model should occur. In the Pre$_{\text{no}}$ approach, the entire network is trained with the RPV model from the start, without pre-training. In the Pre$_{\text{sho}}$, Pre$_{\text{med}}$ and Pre$_{\text{lon}}$ approaches, we start from the Lambertian assumption and switch to the RPV model at different training steps, as shown in Figure 13. The initial learning rate is set to $5\times10^{-4}$ and decreases to $3.65\times10^{-5}$, $2.15\times10^{-5}$ and $1.27\times10^{-5}$, respectively.
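The progressive strategy can be pictured with the short sketch below. It is not the training code used in this work: the function names, the `switch_step` argument and the exponential shape of the learning-rate decay are assumptions chosen for illustration; only the initial and final learning-rate values are taken from the text.

```python
def learning_rate(step, total_steps, lr_start, lr_end):
    """Exponentially interpolate the learning rate from lr_start to lr_end.

    The exponential decay shape is an assumption of this sketch.
    """
    t = min(max(step / total_steps, 0.0), 1.0)
    return lr_start * (lr_end / lr_start) ** t


def reflectance_model(step, switch_step):
    """Return which reflectance model is active at a given training step.

    switch_step = 0 corresponds to training without pre-training;
    larger values lengthen the Lambertian pre-training phase.
    """
    return "lambertian" if step < switch_step else "rpv"
```

For example, `learning_rate(step, total_steps, 5e-4, 3.65e-5)` starts at the initial rate quoted above and reaches one of the quoted final rates at the last step, while `reflectance_model(step, switch_step)` selects the Lambertian branch until the hypothetical `switch_step` is reached.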
Qualitative results are presented in Figures 14 and 15, while quantitative metrics are provided in Table 5. Without pre-training, the synthetic images are blurry and the altitude estimates are noisy. Performance differences between Pre$_{\text{sho}}$, Pre$_{\text{med}}$ and Pre$_{\text{lon}}$ are minor, with Pre$_{\text{med}}$ achieving the best score on every metric.
Depth loss weighting
We perform an ablation experiment to evaluate the contribution of the depth loss term. Removing the term entirely results in very poor altitude predictions, confirming that a simple NeRF architecture is unable to learn from just three views. By increasing the weight in the range , altitude metrics improve consistently (see Figure 15). The PSNR and SSIM metrics corresponding to the synthetic image quality reach a maximum at , suggesting that assigning greater importance to depths compromises the rendering quality. We set the in all our experiments.
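To make the role of the depth weight concrete, the sketch below combines a colour term and a depth term into a single objective. The squared-error colour loss, the absolute-error depth loss and all function names are assumptions made for illustration; the actual depth term used in BRDF-NeRF may take a different form.

```python
def color_loss(pred_rgb, true_rgb):
    """Mean squared error over all rays and colour channels."""
    errors = [(p - t) ** 2
              for pred, true in zip(pred_rgb, true_rgb)
              for p, t in zip(pred, true)]
    return sum(errors) / len(errors)


def depth_loss(pred_depth, ref_depth):
    """Mean absolute difference between rendered and reference depths."""
    errors = [abs(p - r) for p, r in zip(pred_depth, ref_depth)]
    return sum(errors) / len(errors)


def total_loss(pred_rgb, true_rgb, pred_depth, ref_depth, depth_weight):
    """Colour loss plus the depth loss scaled by depth_weight."""
    if depth_weight < 0:
        raise ValueError("depth_weight must be non-negative")
    return (color_loss(pred_rgb, true_rgb)
            + depth_weight * depth_loss(pred_depth, ref_depth))
```

Setting `depth_weight` to zero removes the depth supervision entirely, whereas large values let the depth term dominate the colour term, which is the trade-off examined in this ablation.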
Surface or volume rendering
The rendering equation in Equation 3 can be applied as surface rendering or as volume rendering. In surface rendering, the RPV parameters are accumulated along the ray to obtain their values at the surface, and the rendering equation is applied once per ray. In volume rendering, the rendering equation is applied to each sample individually, associating it with a colour c, and accumulation is performed on the sample colours instead of the RPV parameters. Surface rendering is more rigorous than volume rendering: the latter assumes that every point along the ray follows the RPV reflectance, whereas the former assumes this only for points on the surface, which is consistent with the definition of the RPV model. Table 6 and Figure 16 show that surface rendering outperforms volume rendering, so we adopt it in our experiments.
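The difference between the two rendering modes can be illustrated with a toy example. The sketch below uses the standard NeRF compositing weights and, as a stand-in for the full RPV model, only its modified Minnaert factor with two parameters (an amplitude `rho0` and an exponent `k`). This simplification, as well as all function and argument names, are assumptions of the example rather than the implementation used in this work.

```python
import math


def minnaert_term(rho0, k, cos_i, cos_v):
    """Modified Minnaert factor: rho0 * (cos_i * cos_v * (cos_i + cos_v))**(k - 1)."""
    return rho0 * (cos_i * cos_v * (cos_i + cos_v)) ** (k - 1.0)


def compositing_weights(sigmas, deltas):
    """NeRF weights w_i = T_i * (1 - exp(-sigma_i * delta_i)) along one ray."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def surface_rendering(weights, rho0s, ks, cos_i, cos_v):
    """Accumulate the parameters along the ray, then apply the model once."""
    rho0 = sum(w * r for w, r in zip(weights, rho0s))
    k = sum(w * kk for w, kk in zip(weights, ks))
    return minnaert_term(rho0, k, cos_i, cos_v)


def volume_rendering(weights, rho0s, ks, cos_i, cos_v):
    """Apply the model to every sample, then accumulate the results."""
    return sum(w * minnaert_term(r, kk, cos_i, cos_v)
               for w, r, kk in zip(weights, rho0s, ks))
```

Because the reflectance model is non-linear in its parameters, the two functions generally return different values when the parameters vary along the ray, and coincide when all samples share the same parameters and the weights sum to one.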
Pre | Depth weight | MAE | PSNR (Easy) | PSNR (Hard) | PSNR (VHard) | SSIM (Easy) | SSIM (Hard) | SSIM (VHard)
---|---|---|---|---|---|---|---|---
no | | 1.632 | 38.057 | 33.446 | 31.186 | 0.966 | 0.93 | 0.884
sho | | 1.449 | 41.166 | 35.36 | 32.999 | 0.983 | 0.951 | 0.913
med | | 1.378 | 41.844 | 36.232 | 33.35 | 0.985 | 0.957 | 0.918
lon | | 1.39 | 41.415 | 35.645 | 31.663 | 0.984 | 0.947 | 0.88
med | | 9.432 | 39.408 | 34.755 | 31.870 | 0.973 | 0.941 | 0.900
med | | 1.877 | 40.894 | 35.822 | 33.318 | 0.982 | 0.951 | 0.914
med | | 1.353 | 41.109 | 34.915 | 32.107 | 0.983 | 0.949 | 0.908
Rendering | MAE | PSNR (Easy) | PSNR (Hard) | PSNR (VHard) | SSIM (Easy) | SSIM (Hard) | SSIM (VHard)
---|---|---|---|---|---|---|---
Volume | 1.399 | 41.743 | 35.867 | 31.770 | 0.985 | 0.955 | 0.903
Surface | 1.378 | 41.844 | 36.232 | 33.350 | 0.985 | 0.957 | 0.918
5 Conclusion
We presented BRDF-NeRF, an extension of NeRF adapted to sparse satellite imagery, capable of estimating realistic BRDFs for natural surfaces. By incorporating the semi-empirical Rahman-Pinty-Verstraete (RPV) model, BRDF-NeRF enhances the rendering of anisotropic surface reflectance, leading to improved quality of both synthetic images and recovered surface altitudes. Although our experiments show promising results, certain limitations remain. At present, our method does not explicitly model shading effects, and although our surface reconstructions outperform other NeRF-based approaches, they still lack the regularity achieved by SGM. Future work will address these challenges.
6 Acknowledgements
This research was funded by CNES (Centre national d’études spatiales) through a PostDoc scholarship as well as CAROLInA/SURFACEs projects. The Djibouti dataset was obtained through the CNES ISIS framework. Numerical computations were carried out on the SCAPAD cluster at the Institut de Physique du Globe de Paris. We thank Arthur Delorme for the Lanzhou dataset. We also thank Manchun Lei and Antoine Lucas for fruitful discussions.