BRDF-NeRF: Neural Radiance Fields with Optical Satellite Images and BRDF Modelling

1 Introduction

Over the past two decades, significant progress has been made in image processing algorithms. In particular, 3D surface reconstruction has benefited from high-resolution spatial data and algorithms such as semi-global stereo matching (SGM), which can generate detailed surface maps of urban and natural environments (hirschmuller:08:sgm; rosu2015measurement; mpd:06:sgm). However, state-of-the-art approaches still face challenges, particularly with radiometrically heterogeneous surfaces, complex reflectance functions, or diachronic acquisitions. Recent research has attempted to address these challenges by leveraging learning algorithms capable of modelling complex resemblance functions, given appropriate architectures and sufficient training datasets (PSMNet; chebbi2023deepsim; wu2024evaluation). At the same time, a new approach to surface reconstruction has emerged with Neural Radiance Fields (NeRF), which differs from other learning-based methods by operating in a self-supervised manner. It works on single pixels rather than patches, treats non-Lambertian surfaces and generates new synthetic views (Mildenhall20eccv_nerf). Besides, NeRF is capable of estimating the Bidirectional Reflectance Distribution Function (BRDF) of the surface at the same time. Understanding the BRDF of continental surfaces is crucial for a variety of applications, including land cover mapping, assessment of Earth’s radiation budget, climate change studies, vegetation density analysis, and intercalibration of spaceborne sensors (dumont2010high). However, due to the anisotropic nature of the Earth’s reflectance, estimation of the BRDF generally requires numerous angular measurements, which is challenging with a few satellite images.

The aim of this paper is to model the surface’s BRDF explicitly from sparse satellite views, and improve surface reconstruction, particularly over landscapes with anisotropic reflectance characteristics (e.g., bare soil, vegetation). We select NeRF as the algorithmic framework for its capacity to model angle dependent surface reflectance. Our BRDF-NeRF workflow (Figure 1) is designed for satellite acquisitions with only three synchronous views and incorporates the semi-empirical Rahman-Pinty-Verstraete (RPV) BRDF model, widely used in the remote sensing community to represent the BRDF of natural surfaces (rahman1993coupled). To the best of our knowledge, this work is the first to integrate a BRDF model into neural radiance fields for land surfaces. It extends our previous work on neural radiance fields with sparse optical satellite images (zhang2023spsnerf1). The open-source code is available at github.com/LulinZhang/BRDF-NeRF. The terms DSM and surface, as well as SGM and stereo matching are used interchangeably throughout this article.

Figure 1: BRDF-NeRF Workflow. A few satellite RGB images and the corresponding low-resolution depth maps, calculated using a classical stereo matching algorithm, are fed into BRDF-NeRF to predict the normals n and the RPV parameters $\bm{\rho_{0}}$ , k, $\bm{\Theta}$ and $\bm{\rho_{c}}$ , which describe the surface reflectance. $\bm{\rho_{0}}$ represents the amplitude component, k controls the overall shape of the anisotropic behaviour, $\bm{\Theta}$ establishes the degree of forward or backward scattering, and $\bm{\rho_{c}}$ models the hotspot effect. The five parameters are integrated into an RPV renderer to generate the synthetic image. In the meantime, high-resolution depths are obtained by accumulating the weights in the volume estimated by BRDF-NeRF.

2 Related works

2.1 Neural Radiance Fields

The vanilla NeRF (Mildenhall20eccv_nerf) leverages a large number of images captured with a pinhole camera to represent small-size scenes as particles that emit light, instead of reflecting light. Subsequent NeRF variants proposed to relax some of those defining constraints, without compromising the quality of the outputs, i.e., the synthesised images and the 3D model. In the following paragraphs we briefly discuss the state-of-the-art approaches relevant to our work.

NeRF from few views

Since NeRF relies solely on pixel values for network training, a large number of input images is essential for generating photo-realistic novel views. Attempts to train NeRF with sparse input images often result in overfitting and inaccurate estimation of scene depth, leading to artifacts in the rendered novel views. This limitation restricts the applicability of NeRF and prolongs training time. To address this challenge, efforts have been made to adapt NeRF to sparse input images by introducing various regularization priors. A common approach is to incorporate depth supervision, including sparse depth (deng2022depth; wang2022sparsenerf; Somraj_2023; guo2024depthguided) and dense depth (wei2021nerfingmvs; roessle2022dense; zhang2023spsnerf1). In addition, methods such as image features (yu2021pixelnerf) or semantic regularization (xu2022sinnerf) have also been explored.

NeRF and BRDF

While vanilla NeRF excels in view synthesis, it cannot relight or edit materials, due to its inability to decompose outgoing radiance into incoming radiance and surface material reflectance. Some researchers proposed to extend NeRF to incorporate information on the Bidirectional Reflectance Distribution Function (BRDF), which characterises how materials reflect light under different viewing and lighting conditions. The majority of BRDF-compatible NeRF variants, such as those proposed by (bi2020neural; srinivasan2020nerv; boss2021nerd; yang2022psnerf; verbin2022refnerf; mai2023neural), adopt some version of the microfacet BRDF model (walter2007microfacet). This model represents reflectance as the superposition of diffuse and specular components and typically includes a surface roughness parameter, influencing the appearance of the surface through a distribution of microfacet orientations. However, while microfacet BRDF models offer effective parameterization, they poorly model BRDF of natural surfaces (e.g., soil, vegetation) which exhibit a more or less marked backscattering behavior (hotspot effect). Additionally, data-driven BRDFs pre-trained on BRDF databases have been explored in NeRF (Zhang_2021). However, the databases consist of artificial materials and have been built in controlled environments. The spectral and directional optical properties of natural materials are often very different. Figure 2 shows the scattering patterns of the Lambertian, Microfacet and RPV models. The latter is an anisotropic BRDF model widely used in remote sensing and which we will use in this work.

NeRF in Earth Observations

The Earth observation community has mainly focused on adapting the initial NeRF design to the specificities of space imagery: changing shadows, dynamic scenes due to asynchronous acquisitions, and sparse views. Shadow-NeRF (derksen2021shadow) pioneered the application of NeRFs to satellite images by explicitly modelling the shadows of the scene from the Sun's direction. Sat-NeRF (mari2022sat) extends Shadow-NeRF by replacing the pinhole camera model with an empirical push broom model (i.e., Rational Polynomial Coefficients) and modelling transient objects in the scene such as moving cars. EO-NeRF (Mari_2023_CVPR) employs a novel geometry-based shadow rendering, resulting in more accurate digital surface models (DSMs). SpS-NeRF (zhang2023spsnerf1) further adapted NeRF for scenarios with few satellite views by introducing spatial guidance within NeRF sampling, conditioned on low-resolution input depths. Sat-Mesh (qu2023sat) used a latent vector to deal with inconsistent appearances in satellite imagery, while SUNDIAL (behari2024sundial) proposed a secondary shadow ray casting technique to jointly learn satellite scene geometry, illumination components and Sun direction. SatensoRF (zhang2024satensorf) decomposed colour into ambient, diffuse and specular light. Season-NeRF (gableman2024incorporating) learned and rendered seasonal variations by incorporating time as an additional input variable. GC-NeRF (wan2024constraining) proposed a geometric loss to create a compact weight distribution around the surface. In addition, RS-NeRF (xie2023remote), SAT-NGP (billouard2024sat) and SatensoRF (zhang2024satensorf) addressed the computational efficiency of NeRF by accelerating the runtimes with hash encoding and voxel occupancy grids to sample points near the surface, as well as through tensor decomposition. Last but not least, Radar fields (ehret2023radar) extended NeRF to spaceborne synthetic aperture radar (SAR) images.

2.2 BRDF in remote sensing

Existing BRDF models

Numerous BRDF models have been developed to describe the spectral and directional reflectance of natural and artificial surfaces. They can be classified into physical, empirical and semi-empirical models. Physical models (pinty1991extracting) are based on rigorously defined physical parameters and offer the most accurate descriptions of observed scenes. However, a large number of multiangular observations are required to retrieve these parameters by model inversion, making them impractical for optically complex surfaces. Empirical models (walthall1985simple; minnaert1941reciprocity; shibayama1985view) are derived as simple statistical fits to observed data, and provide no additional insights into the surface type or structure. Semi-empirical models (rahman1993coupled; wanner1995derivation; hapke1981bidirectional; roujean1992bidirectional; lucht2000algorithm) employ specific mathematical functions to best represent the physical interactions between the radiation field and the surface. They require only a small number of parameters, which facilitates their inversion. The semi-empirical RPV model (rahman1993coupled) is among the most commonly used. It is capable of representing the reflectance of various natural surfaces with just four parameters (Figure 2), and has been used to address atmospheric radiation transfer problems (martonchik1998techniques; martonchik1998determination); classify forest types (koukal2014evaluation); simulate plant leaf reflectance (biliouris2009rpv); estimate BRF values under unmeasured illumination and viewing angles (lattanzio2007consistency); estimate surface albedo (martonchik1998techniques; martonchik1998determination; privette2002first); and identify surface properties (widlowski2001characterization; gao2003detecting).

Deriving BRDF

Most of the parameters controlling BRDF cannot be measured in the field, but are obtained by inversion of surface reflectance models on observations. To guarantee reliable estimates, the surface must be observed over a wide range of illumination/viewing angles. Laboratory and field measurements using goniophotometers have traditionally been used to measure reflectance (lv2016multi; sandmeier2000brdf; combes2007new). Over the last few decades, several spaceborne instruments have been designed to carry out multiangular observations, such as MISR, POLDER, MODIS, CHRIS/Proba, and VIIRS. These instruments have limited spatial resolutions, ranging from a few tens to a few hundreds of meters. (labarre2019retrieving) inverted the Hapke model on a set of 21 multiangular Pleiades images, acquired at a spatial resolution of 2 m. However, such acquisitions are rarely available, and inversion with three or four images is ill-posed.

In this paper, we explore the potential for estimating the BRDF of natural surfaces using as few as three high-resolution multispectral optical images. Our approach offers new possibilities for studying solar radiation reflected from the Earth’s surface, taking advantage of the multiplicity, temporal coverage, and high spatial resolution of optical satellite imagery.

3 Radiance Fields with RPV Reflectance

We briefly introduce the vanilla NeRF architecture and discuss two key ingredients of our BRDF-NeRF: geometry modelling (depths and normals) as well as the radiometric rendering (RPV BRDF model). The BRDF-NeRF workflow is described in Figure 1.

Preliminaries

NeRF (Mildenhall20eccv_nerf) represents a continuous volumetric field of a static scene that emits light, optimized with a fully connected deep network. Given a 3D point $\textbf{x}=(x,y,z)$ accompanied with a viewing angle $\textbf{w}_{r}=(d_{x},d_{y},d_{z})$ , NeRF predicts a volume density $\sigma$ and a colour $\textbf{c}=(r,g,b)$ . NeRF renders images by sampling $N$ query points along each camera ray and accumulating the colours with weights defined by density, and imposes the rendered images to be close to the training images. Each camera ray r is defined by an origin point o and a viewing direction vector $\textbf{w}_{r}$ such that $\textbf{r}(t)=\textbf{o}+t\textbf{w}_{r}$ . Each query point $\textbf{x}_{i}$ in r is defined as $\textbf{x}_{i}=\textbf{o}+t_{i}\textbf{w}_{r}$ , where $t_{i}$ lies between the near and far bounds of the scene, $t_{n}$ and $t_{f}$ . The rendered pixel value $\textbf{C}(\textbf{r})$ of ray r is calculated as follows:

$$\textbf{C}(\textbf{r})=\sum_{i=1}^{N}T_{i}\alpha_{i}\textbf{c}_{i}~, \tag{1}$$

with ${\alpha}_{i}=1-e^{-{\sigma}_{i}{\delta}_{i}}$ , $T_{i}=\prod_{j=1}^{i-1}{(1-{\alpha}_{j})}$ and ${\delta}_{i}=t_{i+1}-t_{i}$ . $\alpha_{i}$ represents the opacity of the current query point $\textbf{x}_{i}$ and $T_{i}$ is the transmittance. The contribution of colour $\textbf{c}_{i}$ to the accumulated colour $\textbf{C}(\textbf{r})$ increases with opacity and transmittance.
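As an illustration of Equation 1, the following minimal sketch composites the samples of a single ray in plain Python. The function name `render_ray` and the convention of passing $N+1$ distances (so that every ${\delta}_{i}$ is defined) are assumptions made for this example and are not taken from the released code.

```python
import math

def render_ray(sigmas, colours, ts):
    """Composite per-sample colours along one ray (Equation 1).

    sigmas  : N volume densities
    colours : N (r, g, b) tuples
    ts      : N + 1 sample distances, so that delta_i = t_{i+1} - t_i
    Returns the rendered pixel and the per-sample weights T_i * alpha_i.
    """
    transmittance = 1.0  # T_1 = 1: nothing occludes the first sample
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for i, (sigma, colour) in enumerate(zip(sigmas, colours)):
        delta = ts[i + 1] - ts[i]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of sample i
        weight = transmittance * alpha
        weights.append(weight)
        for ch in range(3):
            pixel[ch] += weight * colour[ch]
        transmittance *= 1.0 - alpha  # update T for the next sample
    return pixel, weights
```

For a ray that crosses empty space and then hits an opaque sample, all the weight concentrates on that sample, and the samples behind it contribute nothing.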

3.1 Geometric modelling

We incorporate geometric information to extend the applicability of BRDF-NeRF to sparse view acquisitions and to predict surface normals that are essential for accurate estimates of BRDF, as detailed in Section 3.2.

Depth supervision

Instead of querying ray points crossing the entire volume of the scene, as is the case in the vanilla NeRF, we narrow it down to a buffer space defined around the location of an approximately known surface. This tactic reduces ambiguity and enables reliable volume densities to be estimated with fewer images. We further encourage the depths predicted by NeRF to remain close to the input surface by using the following loss term introduced in (zhang2023spsnerf1):

$$\mathcal{L}_{depth}(\textbf{r})=\sum_{\textbf{r}\in R_{sub}}corr(\textbf{r})\left(D(\textbf{r})-\overline{D}(\textbf{r})\right)^{2}~, \tag{2}$$

where ${D}(\textbf{r})$ are the predicted depths calculated as ${D}(\textbf{r})=\sum_{i=1}^{N}{T_{i}{\alpha}_{i}t_{i}}$ , while the $\overline{D}(\textbf{r})$ are the input depths obtained from stereo matching on low-resolution images. We have observed that the performance of stereo matching on low resolution images is marginally affected by a change in surface BRDF and can therefore provide sufficiently good depth initialisations for our radiance fields. The parameter corr(r), which corresponds to the similarity score obtained by stereo matching, acts as a weight or confidence. It adjusts the level of supervision, having a strong impact where confidence is high and a minimal impact where input depths are uncertain. $R_{sub}$ is a subset of rays that satisfy at least one of the following two conditions: (1) $S(\mathbf{r})>\Sigma(\mathbf{r})$ ; (2) $\left|(D(\textbf{r})-\overline{D}(\textbf{r}))\right|>\Sigma(\mathbf{r})$ , where $S(\textbf{r})^{2}=\sum_{i=1}^{N}{T_{i}{\alpha}_{i}(t_{i}-D(\textbf{r}))^{2}}$ represents the uncertainty of the predicted depth, and $\Sigma(\textbf{r})=1-corr(\textbf{r})$ represents the uncertainty of the input depth. In other words, depth supervision is only applied to rays for which the predicted depths are more uncertain than the input depths.
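The selection of $R_{sub}$ and the weighted depth penalty can be sketched as follows. This is a simplified per-batch version under stated assumptions: the helper names `depth_stats` and `depth_loss` and the tuple layout of a ray are hypothetical, and the per-sample weights $T_{i}{\alpha}_{i}$ are assumed to be already available from the renderer.

```python
import math

def depth_stats(weights, ts):
    """Return the predicted depth D(r) and its uncertainty S(r)."""
    depth = sum(w * t for w, t in zip(weights, ts))
    variance = sum(w * (t - depth) ** 2 for w, t in zip(weights, ts))
    return depth, math.sqrt(variance)

def depth_loss(rays):
    """Depth term of Equation 2.

    Each ray is a tuple (weights, ts, input_depth, corr), where corr is the
    stereo-matching similarity score used as a confidence.
    """
    loss = 0.0
    for weights, ts, input_depth, corr in rays:
        depth, s = depth_stats(weights, ts)
        sigma_in = 1.0 - corr  # uncertainty of the input depth
        # A ray belongs to R_sub if at least one of the two conditions holds.
        in_subset = s > sigma_in or abs(depth - input_depth) > sigma_in
        if in_subset:
            loss += corr * (depth - input_depth) ** 2
    return loss
```

A ray whose prediction is both confident and close to the input depth is left unsupervised, whereas a ray that deviates from the input depth by more than $\Sigma(\textbf{r})$ is pulled back towards it with strength $corr(\textbf{r})$.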

Surface normal

BRDF is a function that depends on both the incident and viewing angles, which are defined relative to the surface normal. Therefore, the surface normal is crucial to accurately recovering the BRDF. In NeRF, it can be derived as analytical or learned. The analytical normal is calculated as the negative of the normalized gradient of the density field $\sigma$ with respect to the spatial location x as $\textbf{n(x)}=-\frac{\nabla_{x}(\sigma)}{\|\nabla_{x}(\sigma)\|_{2}}$ (srinivasan2020nerv). The learned normal is predicted from a spatial MLP and can be supervised implicitly (bi2020neural) or with the analytical normal (verbin2022refnerf; li2022neural). Approaches relying on learnt normals led to smooth surfaces and a loss of detail in our case studies. Consequently, we chose to incorporate the analytical normal into our architecture because, despite its computational cost, it provides more accurate and better resolved normal vectors (srinivasan2020nerv).
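To make the definition of the analytical normal concrete, the sketch below evaluates $-\nabla_{x}\sigma/\|\nabla_{x}\sigma\|_{2}$ with central finite differences. This is only an illustration: in a neural implementation the gradient would more likely come from automatic differentiation of the density network (an assumption, not stated in this section), and `density` is a hypothetical callable standing in for that network.

```python
import math

def analytical_normal(density, x, eps=1e-4):
    """Approximate n(x) = -grad(sigma) / ||grad(sigma)|| by finite differences.

    density : callable mapping a 3D point [x, y, z] to a scalar density
    x       : 3D point at which the normal is evaluated
    """
    grad = []
    for axis in range(3):
        hi, lo = list(x), list(x)
        hi[axis] += eps
        lo[axis] -= eps
        grad.append((density(hi) - density(lo)) / (2.0 * eps))
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return (0.0, 0.0, 0.0)  # flat density: the normal is undefined
    return tuple(-g / norm for g in grad)
```

For a density that decreases with altitude (dense ground below, empty space above), the gradient points downwards and the resulting normal points upwards, as expected for a horizontal surface.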

3.2 Radiometric rendering

The geometric approach presented above guarantees decent 3D reconstructions of Lambertian scenes. Next, we adapt this approach to handle non-Lambertian natural surfaces by estimating a BRDF and incorporating it into the rendering Equation 1.

RPV equation

We estimate the reflectance of natural surfaces using the Rahman-Pinty-Verstraete (RPV) model (rahman1993coupled), a semi-empirical model well suited to satellite images (see Equation 3). We chose this model for its simplicity, its physics-based parameters and its ability to represent asymmetric BRDF, including the hotspot effect. The latter corresponds to a sharp increase in reflectance, which becomes maximum when the illumination and viewing directions are coincident.

In this model, the colour c of a surface point, defined by the normal vector n, the illumination direction $\textbf{w}_{ir}$ and the viewing direction $\textbf{w}_{r}$ (Figure 3), is calculated as the product of the incoming light $L_{ir}$ , the cosine of the incident angle $|\textbf{w}_{ir}\cdot\textbf{n}|$ , and the bidirectional reflectance factor simulated by $RPV$ :

$$\textbf{c}(\textbf{n},\textbf{w}_{ir},\textbf{w}_{r})=L_{ir}\cdot|\textbf{w}_{ir}\cdot\textbf{n}|\cdot RPV(\textbf{n},\textbf{w}_{ir},\textbf{w}_{r}), \tag{3}$$

$L_{ir}$ is set to a unit vector, and $|\textbf{w}_{ir}\cdot\textbf{n}|$ is approximated by $|\textbf{w}_{ir}\cdot[0,0,1]|$ because the analytical normal n is not sufficiently smooth. The $RPV$ term can be broken down into an amplitude parameter $\bm{\rho_{0}}$ and three angle-dependent functions: the modified Minnaert function $M$ , the Henyey-Greenstein function $F_{HG}$ , and the backscatter function $H$ :

$$RPV(\textbf{n},\textbf{w}_{ir},\textbf{w}_{r})=\bm{\rho_{0}}\cdot M({\theta}_{ir},{\theta}_{r},\textbf{k})\cdot F_{HG}(g,\bm{\Theta})\cdot H(\bm{\rho_{c}},G), \tag{4}$$

with $M({\theta}_{ir},{\theta}_{r},\textbf{k})=(\cos\theta_{ir}\cos\theta_{r}(\cos\theta_{ir}+\cos\theta_{r}))^{\textbf{k}-1}$ , $F_{HG}(g,\bm{\Theta})=(1-\bm{\Theta}^{2})\cdot(1+2\bm{\Theta}\cos g+\bm{\Theta}^{2})^{-3/2}$ , $H(\bm{\rho_{c}},G)=1+(1-\bm{\rho_{c}})/(1+G)$ , and the geometric factor $G=(\tan^{2}\theta_{ir}+\tan^{2}\theta_{r}-2\tan\theta_{ir}\tan\theta_{r}\cos\Phi)^{1/2}$ . The illumination $\textbf{w}_{ir}$ and viewing $\textbf{w}_{r}$ directions are decomposed into zenith angles ${\theta}_{ir}$ and ${\theta}_{r}$ , azimuth angles ${\Phi}_{ir}$ and ${\Phi}_{r}$ , relative azimuth angle $\Phi$ and phase angle $g$ , all defined in a spherical coordinate system determined by the surface normal n (Figure 3).
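The three angular functions of Equation 4 can be evaluated directly, as in the following single-channel sketch. Two points are assumptions of this example rather than statements from the text: the phase angle is obtained from $\cos g=\cos\theta_{ir}\cos\theta_{r}+\sin\theta_{ir}\sin\theta_{r}\cos\Phi$ (a common convention in which $g=0$ in the hotspot direction), and the function name `rpv` and its scalar signature are hypothetical.

```python
import math

def rpv(rho0, k, theta_hg, rho_c, theta_ir, theta_r, phi):
    """Bidirectional reflectance factor of Equation 4 for one channel.

    rho0, k, theta_hg, rho_c : RPV parameters (theta_hg is the
                               Henyey-Greenstein asymmetry parameter)
    theta_ir, theta_r        : illumination and viewing zenith angles (radians,
                               strictly below pi / 2)
    phi                      : relative azimuth angle (radians)
    """
    ci, cr = math.cos(theta_ir), math.cos(theta_r)
    si, sr = math.sin(theta_ir), math.sin(theta_r)
    ti, tr = math.tan(theta_ir), math.tan(theta_r)
    # Assumed convention for the phase angle g (zero in the hotspot direction).
    cos_g = ci * cr + si * sr * math.cos(phi)
    # Modified Minnaert function M.
    minnaert = (ci * cr * (ci + cr)) ** (k - 1.0)
    # Henyey-Greenstein function F_HG.
    f_hg = (1.0 - theta_hg ** 2) * (1.0 + 2.0 * theta_hg * cos_g + theta_hg ** 2) ** -1.5
    # Geometric factor G, clamped at zero to absorb rounding errors.
    g_factor = math.sqrt(max(ti * ti + tr * tr - 2.0 * ti * tr * math.cos(phi), 0.0))
    # Backscatter (hotspot) function H.
    hotspot = 1.0 + (1.0 - rho_c) / (1.0 + g_factor)
    return rho0 * minnaert * f_hg * hotspot
```

With $\textbf{k}=1$, $\bm{\Theta}=0$ and $\bm{\rho_{c}}=1$ the three functions all equal one and the model reduces to the constant $\bm{\rho_{0}}$, which is the Lambertian case. With $\bm{\Theta}<0$ and $\bm{\rho_{c}}<1$, the value in the hotspot direction exceeds the value in the opposite azimuth.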

Detailed Description of RPV Model Input Parameters

The RPV parameter $\bm{\rho_{0}}$ in Equation 4 plays the role of pseudo-albedo. The modified Minnaert function controls the anisotropic behaviour of the surface using the parameter k. If $\textbf{k}\approx 1$ the surface is quasi-Lambertian; if $\textbf{k}<1$ a bowl-shaped pattern dominates (reflectance values increase with the viewing zenith angle); and if $\textbf{k}>1$ a bell-shaped pattern dominates (reflectance values decrease with the viewing zenith angle) (widlowski2004canopy). The parameter $\bm{\Theta}$ of the Henyey-Greenstein function controls the amount of radiation scattered in the forward ( $0\leq\bm{\Theta}\leq 1$ ) or backward ( $-1\leq\bm{\Theta}\leq 0$ ) directions. The backscatter function $H$ is written as a function of a geometric factor $G$ and a parameter $\bm{\rho_{c}}$ , which represents the sharp increase in reflectance in the hotspot direction. When $\theta_{ir}=\theta_{r}$ and $\Phi_{ir}=\Phi_{r}$ , the geometric factor vanishes and $H$ reaches its maximum value, which increases the total reflectance. When estimating the RPV model, the ranges of variation of the parameters $\bm{\rho_{0}}$ , k, $\bm{\Theta}$ and $\bm{\rho_{c}}$ are fixed to [0, 1], [0, 2], [-1, 1] and [0, 1] (koukal2014evaluation).

To give the reader an intuition about how RPV parameters affect reflectance, we analyse the bidirectional reflectance factor (BRF) of selected points in one of our datasets (Figure 4). The BRFs are plotted by varying the viewing directions (zenith and azimuth angles in $[0^{\circ},90^{\circ}]$ and $[0^{\circ},360^{\circ}]$ ) and fixing the Sun’s direction to $\theta_{ir}=52.1^{\circ}$ and $\Phi_{ir}=142.5^{\circ}$ . The pseudo-albedo of the selected surface point estimated by BRDF-NeRF is $\bm{\rho_{0}}=[0.122,0.105,0.091]$ , and its normal is n=[0, 0, 1]. Among the six combinations of ( $\bm{\Theta},\mathbf{k},\bm{\rho_{c}}$ ) given in Table 1, backward scattering (Figure 4 (b)) was predicted by BRDF-NeRF. Note that this BRF is consistent with a result generated independently over the same area with 21 Pleiades views (labarre2019retrieving).

Param.	Backward	Forward	Bowl shape	Bell shape	Hotspot effect	Hotspot effect
k	0.996	0.996	0.500	1.500	0.996	0.996
$\bm{\Theta}$	-0.174	0.174	-0.174	-0.174	-0.174	-0.174
$\bm{\rho_{c}}$	0.979	0.979	0.979	0.979	0.500	0

Table 1: RPV Model Parameter Sets. Different RPV parameters k, $\bm{\Theta}$ and $\bm{\rho_{c}}$ corresponding to the reflectance spectra in Figure 4 (b-f).

3.3 Network architecture

The network architecture presented in Figure 5 consists of two progressively trained components: the geometric part and the RPV part. Initially, the geometric part is pre-trained on the assumption of a Lambertian surface. After this pre-training phase (i.e., $\sim$ 20% of the total duration of the training time), the RPV part is introduced. Three MLPs predicting k, $\bm{\Theta}$ and $\bm{\rho_{c}}$ as well as the analytical normal n engage in training, while the albedo $\bm{\rho_{0}}$ is finetuned to match the pseudo-albedo of our new rendering equation (see Equation 3). The separation of the geometry part from the RPV part, which handles the case of non-Lambertian surfaces, ensures that the final training stage works on well-initialised normal vectors.

The input spatial locations x are transformed by positional encoding. The activations for $\bm{\rho_{0}}$ , k, $\bm{\Theta}$ and $\bm{\rho_{c}}$ are sigmoid functions, and k and $\bm{\Theta}$ are rescaled to [0, 2] and [-1, 1] to match their real value ranges (koukal2014evaluation). Ultimately, all the parameters of BRDF-NeRF are optimized to minimise the combination of (1) the colour loss between the ground truth pixel colours $\overline{\mathbf{C}}(\textbf{r})$ and the predicted pixel colours $\textbf{C}(\textbf{r})$ , and (2) the depth loss $\mathcal{L}_{depth}(\textbf{r})$ (Equation 2):

$$\mathcal{L}=\sum_{\textbf{r}\in R}\left\|\textbf{C}(\textbf{r})-\overline{\mathbf{C}}(\textbf{r})\right\|_{2}^{2}+\lambda\mathcal{L}_{depth}(\textbf{r})~, \tag{5}$$

where $\lambda$ is a weight balancing the contribution of colour and depth. See our experiment on finding optimal $\lambda$ in Section 4.6.
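The output activations and the combined objective of Equation 5 can be illustrated with the short sketch below. It is a hedged example: the affine rescaling $2s$ and $2s-1$ of the sigmoid output $s$ is one natural way to reach the ranges [0, 2] and [-1, 1] and is assumed here, k, $\bm{\Theta}$ and $\bm{\rho_{c}}$ are treated as scalars for brevity, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def rpv_parameters(raw_rho0, raw_k, raw_theta, raw_rho_c):
    """Map raw MLP outputs to the valid RPV parameter ranges."""
    rho0 = [sigmoid(v) for v in raw_rho0]   # per-channel pseudo-albedo in [0, 1]
    k = 2.0 * sigmoid(raw_k)                # assumed rescaling to [0, 2]
    theta = 2.0 * sigmoid(raw_theta) - 1.0  # assumed rescaling to [-1, 1]
    rho_c = sigmoid(raw_rho_c)              # hotspot parameter in [0, 1]
    return rho0, k, theta, rho_c

def total_loss(pred_colours, gt_colours, depth_term, lam):
    """Equation 5: squared colour error plus the weighted depth loss."""
    colour_term = sum(
        (p - g) ** 2
        for pred, gt in zip(pred_colours, gt_colours)
        for p, g in zip(pred, gt)
    )
    return colour_term + lam * depth_term
```

Under this rescaling, zero pre-activations map to $\textbf{k}=1$ and $\bm{\Theta}=0$, i.e., a quasi-Lambertian surface without preferential forward or backward scattering.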

4 Numerical experiments

We conduct a series of experiments to evaluate our BRDF-NeRF on novel view synthesis (Section 4.4) and altitude estimation (Section 4.5) tasks. In addition, we examine the influence of atmospheric correction from TOA (Top Of Atmosphere) to TOC (Top Of Canopy) (Section 4.3) and carry out ablation studies to determine the best training strategy (Section 4.6) and the best rendering strategy (Section 4.6).

Evaluation Metrics

Precision metrics are Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) (wang2004image) for view synthesis, and Mean Altitude Error (MAE) for altitude extraction. Ground truth (GT) images are true images not seen during training, while the GT surface is a DSM generated by stereo image matching (mpd:06:sgm) from high-resolution Pleiades panchromatic images with a ground sampling distance ( $GSD$ ) of 0.5 m. BRDF-NeRF is compared against the competing methods Sat-NeRF (mari2022sat) and SpS-NeRF (zhang2023spsnerf1), as well as against DSMs generated with SGM using full-resolution images (i.e., $SGM_{Z1}$ with $GSD=2$ m). The source code for EO-NeRF (Mari_2023_CVPR) was not available for comparison. The RPV parameters are indirectly validated by examining quantitative metrics for novel view synthesis. Additionally, one can compare the BRF in Figure 4 (b) with an independent result derived from 21 Pléiades images in (labarre2019retrieving).
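For reference, two of these metrics can be written in a few lines. The sketch below rests on assumptions that the text does not spell out: PSNR uses the standard definition $10\log_{10}(MAX^{2}/MSE)$ on intensities normalised to [0, 1], and the Mean Altitude Error is taken as the mean absolute altitude difference over valid DSM cells, with invalid cells marked as NaN. SSIM is omitted for brevity.

```python
import math

def psnr(pred, gt, max_value=1.0):
    """Peak Signal-to-Noise Ratio (dB) between two flattened images."""
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

def mean_altitude_error(pred_dsm, gt_dsm):
    """Mean absolute altitude difference, ignoring NaN (no-data) cells."""
    pairs = [
        (p, g) for p, g in zip(pred_dsm, gt_dsm)
        if not (math.isnan(p) or math.isnan(g))
    ]
    return sum(abs(p - g) for p, g in pairs) / len(pairs)
```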

4.1 Implementation Details

Training

Our network is trained with the Adam optimizer (lr=5e-4, decay=0.9, batch size=1024). We use SpS-NeRF’s ray sampling strategy to sample 64 stratified points along each ray, complemented by 64 guided points following a Gaussian distribution. We optimize BRDF-NeRF for 100k iterations, which takes $\sim$ 10 hours on an NVIDIA GPU with 40 GB of memory. For a fair comparison, the competing methods (i.e., Sat-NeRF and SpS-NeRF) are also trained for 100k iterations with 64 + 64 points along each ray, which takes $\sim$ 6 hours. BRDF-NeRF is slower to train than the competing methods, mainly because of the analytical normal computation. Computational efficiency was not the aim of this work and could be improved in the future with techniques such as tensor decomposition.
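The 64 + 64 sampling scheme can be pictured with the following simplified sketch. It is not SpS-NeRF's actual implementation: centring the Gaussian on the input depth, the choice of its standard deviation, and the clamping of guided samples to the ray bounds are all assumptions made for illustration, as are the function and argument names.

```python
import random

def sample_ray(t_near, t_far, depth, sigma, n_stratified=64, n_guided=64, rng=random):
    """Return sorted sample distances along one ray.

    t_near, t_far : bounds of the sampled interval
    depth, sigma  : assumed centre and spread of the guided Gaussian samples
    rng           : object providing uniform() and gauss(), e.g. random.Random
    """
    # Stratified sampling: one uniform draw inside each of the equal-width bins.
    width = (t_far - t_near) / n_stratified
    stratified = [
        rng.uniform(t_near + i * width, t_near + (i + 1) * width)
        for i in range(n_stratified)
    ]
    # Guided sampling: Gaussian draws around the input depth, kept in bounds.
    guided = [
        min(max(rng.gauss(depth, sigma), t_near), t_far)
        for _ in range(n_guided)
    ]
    return sorted(stratified + guided)
```

The stratified samples keep some coverage of the whole interval, while the guided samples concentrate the network's capacity near the approximately known surface.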

Light visibility

It is important to take into account the visibility of samples when scenes contain occlusions, for example when acquiring images in mountainous or urban areas. Visibility describes the transmission of a sample between the light source and the query point. Brute-force computation of the light visibility is computationally expensive, as it requires marching rays from all the query points along the camera ray to the light source. Previous work treats the light visibility with different strategies: (1) assuming that light visibility is the same everywhere (mai2023neural; boss2021nerd); in this scenario, a shadow is embedded into the albedo colour; (2) with a fully analytical approach provided that the camera-light configuration is collocated, i.e., the light rays are aligned with the camera rays (bi2020neural); (3) with a semi-analytical approach where visibility is calculated by ray tracing from the light source to the estimated surface (Zhang_2021; li2022neural; yang2022s; Mari_2023_CVPR); (4) with MLP-learnt visibility, supervised by photometric loss (verbin2022refnerf), or by light visibility calculated by ray tracing (srinivasan2020nerv; yang2022psnerf; derksen2021shadow; mari2022sat).

The fully analytical strategy is intractable in situations where light and view are not collocated, while the semi-analytical strategy can lead to noisy results, especially when few input views are used. The learning-based approach gives the network superfluous freedom that could confuse light visibility with other phenomena, such as albedo colour. Since our main objective is to model natural surfaces, such as bare soil, from aerial images, we can safely assume that the light visibility is constant everywhere, i.e., equal to 1.

4.2 Dataset

Evaluations are carried out on two sites (Djibouti, Lanzhou). We extract two regions of interest ( $\sim$ 1.5 km $\times$ 1.5 km) in each site, which we refer to as $A$ and $B$ (e.g., Dji-A and Dji-B). In each dataset, we distinguish three test scenarios ranging from easy (novel view interpolation) to very hard (novel view extrapolation). The input images are RGB with a GSD of 2 m and undergo atmospheric correction prior to processing (see Section 4.3). We did not perform experiments on the open DFC dataset (bosch2019semantic), which represents urban scenes, since RPV is designed for natural surfaces.

Djibouti Dataset

It is located in the Asal-Ghoubbet rift (Republic of Djibouti) (labarre2019retrieving) and consists of 21 multiangular Pleiades 1B images collected in a single flyby on January 26, 2013. Three quasi-nadir images are chosen for training, and three other images for testing, including interpolation and extrapolation scenarios (Figure 6 (a)).

Lanzhou Dataset

It is located in Lanzhou (China) and consists of 3 Pleiades 1B images acquired on April 23, 2013 and 3 Pleiades 1A images acquired on June 29, 2013. This is a multi-date dataset where the position of the Sun changes between acquisitions (Figure 6 (b)).

4.3 The effect of atmospheric correction

Atmospheric correction is important for two reasons. Firstly, the atmosphere affects the signal received by the satellite sensor. Secondly, the RPV model, like all BRDF models, describes TOC reflectances whereas Pleiades images are typically supplied as calibrated at TOA. We apply the atmospheric correction using the Orfeo ToolBox (OTB) software library (OTB2002), which is based on the 6S radiative transfer code (vermote1997second). The correction model requires four atmospheric parameters: ozone content $U_{O3}$ (cm-atm), water vapor content $U_{H2O}$ (g/cm²), aerosol optical thickness $\tau_{A}$ (unitless) and atmospheric pressure $P_{a}$ (hPa). The first three parameters are estimated from ancillary datasets corresponding to the day of image acquisition, available on NASA’s Earth observation website (https://neo.gsfc.nasa.gov/). The atmospheric pressure is approximated using the formula $P_{a}=1013.25\cdot(1-0.0065\cdot Z/288.15)^{5.31}$ , where $Z$ is the surface altitude expressed in meters. The four parameters, along with the adjacency radius for both epochs in the Lanzhou dataset, are shown in Table 2. The Djibouti dataset had already been corrected for atmospheric effects by the satellite image provider.
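The pressure approximation above is straightforward to evaluate; the short sketch below transcribes it directly. The function name is hypothetical, and the altitude used in the usage note is chosen for illustration only, not taken from the paper.

```python
def atmospheric_pressure_hpa(altitude_m):
    """Approximate atmospheric pressure (hPa) at a surface altitude in meters."""
    return 1013.25 * (1.0 - 0.0065 * altitude_m / 288.15) ** 5.31
```

At sea level the formula returns the standard pressure of 1013.25 hPa, and an altitude of about 2100 m gives a value close to the 783 hPa listed in Table 2.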

Figure 7 compares input images with and without atmospheric correction. Image tones across epochs become more similar after atmospheric correction. Figure 8 and Table 3 compare the results obtained from images with and without atmospheric correction on novel view synthesis and altitude estimation. Results without atmospheric correction generally yield worse metrics and exhibit artifacts in the synthetic images.

Epoch	$U_{O3}$	$U_{H2O}$	$\tau_{A}$	$P_{a}$	adjacency
Epoch	(cm-atm)	(g/cm²)	(unitless)	(hPa)	radius (-)
23/04/2013	0.3220	1.7333	0.4665	783	1.0
29/06/2013	0.2969	2.5625	0.0980	783	1.0

Table 2: Atmospheric Correction Parameters. Values for ozone content $U_{O3}$ , water vapor content $U_{H2O}$ , aerosol optical thickness $\tau_{A}$ and atmospheric pressure $P_{a}$ , as well as the adjacency radius used for atmospheric correction in the Lanzhou dataset.

Method	AC	MAE $\downarrow$	PSNR $\uparrow$		SSIM $\uparrow$
Method	AC	MAE $\downarrow$	Easy	Hard	Easy	Hard
SpS-NeRF	✘	3.697	28.282	26.131	0.880	0.858
SpS-NeRF	✔	3.558	29.548	24.755	0.965	0.900
BRDF-NeRF	✘	3.439	33.315	29.857	0.946	0.923
BRDF-NeRF	✔	3.420	32.196	28.420	0.979	0.94

Table 3: Ablation of Atmospheric Correction – Quantitative Evaluation on Lzh-A. Experiments with ✔ and without ✘ atmospheric correction (AC). BRDF-NeRF ✔ performs best, with the smallest MAE and highest SSIM. Although BRDF-NeRF ✘ shows slightly higher PSNR, the qualitative visualisation in Figure 8(b) reveals stripe artefacts, which are not present for BRDF-NeRF ✔ in Figure 10(e).

4.4 Novel view synthesis

Quantitative metrics are presented in Table 4, while qualitative visualisations are provided in Figures 9 and 10. Due to space limitations, the visualisations are restricted to the most challenging scenario (very hard for Djibouti and hard for Lanzhou). Our BRDF-NeRF outperforms Sat-NeRF and SpS-NeRF. Among the competing methods, SpS-NeRF produces less blurry renderings than Sat-NeRF (compare Figure 9 (a) and (c) as well as Figure 10 (b) and (d)). Nevertheless, both methods produce minimal hallucination effects (Figure 9 (a), Figure 10 (b,d)). The two competing methods remain less sharp and far from the colour tone of the NeRF that includes our realistic RPV-based BRDF (compare Figure 9 (c) and (e)). PSNR and SSIM metrics are best for BRDF-NeRF, followed by SpS-NeRF in second place. The margins between BRDF-NeRF and the competing approaches increase from easy to very hard mode, indicating greater robustness of BRDF-NeRF. From single- to multiple-epoch datasets (i.e., from Djibouti to Lanzhou), the quality of the images rendered by Sat-NeRF and SpS-NeRF decreases significantly, while BRDF-NeRF recovers photorealistic images in both cases.

Metric	Method	Dji-A	Dji-B	Lzh-A	Lzh-B
PSNR (Easy) $\uparrow$	Sat-NeRF	32.747	31.818	29.979	25.470
PSNR (Easy) $\uparrow$	SpS-NeRF	38.832	38.174	29.548	30.904
PSNR (Easy) $\uparrow$	BRDF-NeRF	41.844	40.823	32.196	32.165
PSNR (Hard) $\uparrow$	Sat-NeRF	25.542	23.699	25.963	20.811
PSNR (Hard) $\uparrow$	SpS-NeRF	28.348	27.468	24.755	24.148
PSNR (Hard) $\uparrow$	BRDF-NeRF	36.232	35.448	28.420	27.814
PSNR (VHard) $\uparrow$	Sat-NeRF	23.581	21.288	/	/
PSNR (VHard) $\uparrow$	SpS-NeRF	23.144	22.31	/	/
PSNR (VHard) $\uparrow$	BRDF-NeRF	33.35	32.376	/	/
SSIM (Easy) $\uparrow$	Sat-NeRF	0.927	0.950	0.962	0.925
SSIM (Easy) $\uparrow$	SpS-NeRF	0.975	0.979	0.965	0.970
SSIM (Easy) $\uparrow$	BRDF-NeRF	0.985	0.988	0.979	0.975
SSIM (Hard) $\uparrow$	Sat-NeRF	0.766	0.825	0.909	0.760
SSIM (Hard) $\uparrow$	SpS-NeRF	0.840	0.887	0.900	0.928
SSIM (Hard) $\uparrow$	BRDF-NeRF	0.957	0.965	0.94	0.953
SSIM (VHard) $\uparrow$	Sat-NeRF	0.676	0.768	/	/
SSIM (VHard) $\uparrow$	SpS-NeRF	0.614	0.72	/	/
SSIM (VHard) $\uparrow$	BRDF-NeRF	0.918	0.942	/	/
MAE $\downarrow$	Sat-NeRF	12.85	18.059	61.299	27.489
MAE $\downarrow$	SpS-NeRF	1.438	1.761	3.558	3.235
MAE $\downarrow$	BRDF-NeRF	1.378	1.614	3.42	2.941
MAE $\downarrow$	SGM_Z1	1.061	1.052	1.409	1.220

Table 4: Quantitative Evaluation. The best and second best performing metrics are in blue and magenta. For each dataset, we train a single BRDF-NeRF to render three images simultaneously, in easy, hard and (where available) very hard modes. BRDF-NeRF achieves better PSNR, SSIM and MAE than Sat-NeRF and SpS-NeRF. However, BRDF-NeRF has higher MAEs than SGM_Z1, which we attribute to NeRF’s design, which handles pixels individually without taking context into account.
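As a reading aid for Table 4, the sketch below shows how PSNR and altitude MAE are commonly computed. It is a minimal illustration and not the evaluation code used in this work: the function names, the `max_val` normalisation and the optional `valid_mask` argument are assumptions. SSIM is omitted because it is usually taken from an image-processing library.

```python
import numpy as np


def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    max_val is the assumed maximum pixel value (1.0 for normalised images).
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def altitude_mae(predicted, ground_truth, valid_mask=None):
    """Mean absolute altitude error, in the unit of the inputs.

    valid_mask (hypothetical) selects pixels with a valid ground truth.
    """
    predicted = np.asarray(predicted, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    err = np.abs(predicted - ground_truth)
    if valid_mask is not None:
        err = err[np.asarray(valid_mask, dtype=bool)]
    return float(err.mean())
```

Higher PSNR indicates a rendering closer to the reference image, whereas lower MAE indicates a surface closer to the reference altitudes.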

4.5 Altitude estimation

Quantitative metrics are presented in Table 4, whereas qualitative visualisations are provided in Figures 11 and 12. Sat-NeRF, which was not designed for scenarios with few images, estimates surface altitudes that are tens of metres away from the ground truth surface. SpS-NeRF performs better thanks to its dense depth supervision, as opposed to the sparse-point supervision in Sat-NeRF. Visual assessment reveals that altitudes predicted by Sat-NeRF are either flat (Figure 11 (a-b)) or contain a made-up pattern (Figure 12 (a)). SpS-NeRF altitudes are more faithful to the ground truth but remain noisy. Our BRDF-NeRF outperforms both of these NeRF variants, producing less noisy surfaces while retaining detail.

Compared with surfaces obtained with classical stereo matching (mpd:06:sgm), BRDF-NeRF surfaces appear smoother and less detailed in areas with good texture (compare Figure 12 (f) and (h)). However, in poorly textured areas where stereo matching is challenging, our BRDF-NeRF predicts coherent altitudes (compare Figure 11 (e) and (g)). Note also that our ground truth surface was generated with stereo matching algorithms, so the comparison may slightly bias the MAE metric in favour of the stereo matching surface.

4.6 Ablations

Training strategy

Our BRDF-NeRF model is trained progressively to ensure proper initialisation of the geometry (i.e., density weights in NeRF) before learning the RPV parameters. We perform an ablation with different pre-training strategies to determine the optimal point at which the transition from the Lambertian to the RPV model should occur. In the Pre_no approach, the entire network is trained without pre-training. In the Pre_sho, Pre_med and Pre_lon approaches, we start from the Lambertian assumption and switch to the RPV model at different training steps, as shown in Figure 13. The initial learning rate is set to 5e-4 and decreases to 3.65e-5, 2.15e-5 and 1.27e-5, respectively.
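The progressive strategy can be summarised by a switch step and a decaying learning rate. The sketch below is illustrative only: `switch_step`, `total_steps` and both function names are hypothetical, and the exponential form of the decay is an assumption, since the schedule itself is not specified here.

```python
def reflectance_model(step, switch_step):
    """Return the reflectance model active at a given training step.

    switch_step is hypothetical: 0 mimics training without pre-training,
    larger values mimic a longer Lambertian pre-training phase.
    """
    return "lambertian" if step < switch_step else "rpv"


def learning_rate(step, total_steps, lr_start, lr_end):
    """Exponential decay from lr_start (step 0) to lr_end (step total_steps).

    The exponential form is an assumption, not taken from the paper.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return lr_start * (lr_end / lr_start) ** frac
```

With `switch_step=0` the RPV model is active from the first step, which corresponds to the setting without pre-training, while increasing `switch_step` lengthens the Lambertian phase.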

Qualitative results are presented in Figures 14 and 15, while quantitative metrics are provided in Table 5. The absence of pre-training leads to blurry synthetic images and noisy altitude estimations. Performance differences between Pre_sho, Pre_med, and Pre_lon are minor, with Pre_med emerging as the best choice.

Depth Loss Weighting

We perform an ablation experiment to evaluate the contribution of the depth loss term. Removing the term entirely ($\lambda=0$) results in very poor altitude predictions, confirming that a simple NeRF architecture is unable to learn from just three views. As the weight increases over the range $[\frac{1}{3},\frac{50}{3}]$, altitude metrics improve consistently (see Figure 15). The PSNR and SSIM metrics, which measure synthetic image quality, reach a maximum at $\lambda=\frac{10}{3}$, suggesting that assigning greater importance to depth compromises rendering quality. We set $\lambda=\frac{10}{3}$ in all our experiments.
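The role of $\lambda$ can be illustrated with a generic weighted sum of a colour term and a depth term. This is only a sketch: the function name is hypothetical, the mean-squared colour error and mean-absolute depth error are assumed stand-ins, and the exact terms of Equation 5 are not reproduced.

```python
import numpy as np


def total_loss(pred_rgb, true_rgb, pred_depth, ref_depth, lam=10.0 / 3.0):
    """Weighted sum of a colour loss and a depth loss.

    Both loss terms are assumed stand-ins; lam plays the role of lambda.
    """
    pred_rgb = np.asarray(pred_rgb, dtype=np.float64)
    true_rgb = np.asarray(true_rgb, dtype=np.float64)
    pred_depth = np.asarray(pred_depth, dtype=np.float64)
    ref_depth = np.asarray(ref_depth, dtype=np.float64)
    colour_loss = np.mean((pred_rgb - true_rgb) ** 2)      # photometric term
    depth_loss = np.mean(np.abs(pred_depth - ref_depth))   # depth supervision term
    return float(colour_loss + lam * depth_loss)
```

Setting `lam=0` removes the depth supervision altogether, which corresponds to the $\lambda=0$ case of the ablation, while larger values let the depth term dominate the colour term.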

Surface or volume rendering

The rendering equation in Equation 3 can be applied as surface rendering ($Ren_{sur}$) or volume rendering ($Ren_{vol}$). In surface rendering, the RPV parameters n, $\bm{\rho_{0}}$, k, $\bm{\Theta}$ and $\bm{\rho_{c}}$ are estimated at the surface by accumulating $N$ points along the ray, and the rendering is applied once per ray. In volume rendering, the rendering is applied to each sample individually, associating it with a colour c, and accumulation is done on the sample colours instead of the RPV parameters. $Ren_{sur}$ is more rigorous than $Ren_{vol}$, as the latter assumes that every point along the ray follows the RPV reflectance, whereas $Ren_{sur}$ assumes this only for points on the surface, which is consistent with the RPV model definition. We demonstrate in Table 6 and Figure 16 that $Ren_{sur}$ outperforms $Ren_{vol}$ and adopt this rendering method in our experiments.
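The difference between $Ren_{sur}$ and $Ren_{vol}$ is the order of accumulation and reflectance evaluation. The sketch below makes this explicit under strong simplifications: `toy_reflectance` is a two-parameter nonlinear stand-in and not the RPV model of Equation 3, the compositing weights follow the standard NeRF formulation, and all function names are hypothetical.

```python
import numpy as np


def ray_weights(sigma, delta):
    """Standard NeRF compositing weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    sigma = np.asarray(sigma, dtype=np.float64)
    delta = np.asarray(delta, dtype=np.float64)
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance: probability that the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    return trans * alpha


def toy_reflectance(rho0, k, cos_i, cos_o):
    """Toy nonlinear reflectance standing in for the RPV model (NOT the full RPV)."""
    return rho0 * (cos_i * cos_o * (cos_i + cos_o)) ** (k - 1.0)


def render_surface(weights, rho0, k, cos_i, cos_o):
    """Accumulate the parameters along the ray, then evaluate reflectance once."""
    rho0_s = np.sum(weights * rho0)
    k_s = np.sum(weights * k)
    return float(toy_reflectance(rho0_s, k_s, cos_i, cos_o))


def render_volume(weights, rho0, k, cos_i, cos_o):
    """Evaluate reflectance at every sample, then accumulate the colours."""
    colours = toy_reflectance(rho0, k, cos_i, cos_o)
    return float(np.sum(weights * colours))
```

Because the reflectance is nonlinear in its parameters, the two orders generally give different pixel values. They coincide when the weights sum to one and all samples share the same parameters.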

Pre	$\lambda$	MAE $\downarrow$	PSNR $\uparrow$ (Easy)	PSNR $\uparrow$ (Hard)	PSNR $\uparrow$ (VHard)	SSIM $\uparrow$ (Easy)	SSIM $\uparrow$ (Hard)	SSIM $\uparrow$ (VHard)
no	$\frac{10}{3}$	1.632	38.057	33.446	31.186	0.966	0.93	0.884
sho	$\frac{10}{3}$	1.449	41.166	35.36	32.999	0.983	0.951	0.913
med	$\frac{10}{3}$	1.378	41.844	36.232	33.35	0.985	0.957	0.918
lon	$\frac{10}{3}$	1.39	41.415	35.645	31.663	0.984	0.947	0.88
med	$0$	9.432	39.408	34.755	31.870	0.973	0.941	0.900
med	$\frac{1}{3}$	1.877	40.894	35.822	33.318	0.982	0.951	0.914
med	$\frac{50}{3}$	1.353	41.109	34.915	32.107	0.983	0.949	0.908

Table 5: Training Strategies – Quantitative Evaluation. Pre refers to the various training settings shown in Figure 13, while $\lambda$ is the parameter that balances the contribution of colour and depth losses in Equation 5. The best and second best performing metrics are in blue and magenta. Pre_med achieved the best PSNR and SSIM and the second best MAE. $\lambda=\frac{50}{3}$ ranks the best for MAE, but has worse PSNR and SSIM than Pre_med. Tests correspond to Dji-A dataset.

Rendering	MAE $\downarrow$	PSNR $\uparrow$ (Easy)	PSNR $\uparrow$ (Hard)	PSNR $\uparrow$ (VHard)	SSIM $\uparrow$ (Easy)	SSIM $\uparrow$ (Hard)	SSIM $\uparrow$ (VHard)
$Ren_{vol}$	1.399	41.743	35.867	31.770	0.985	0.955	0.903
$Ren_{sur}$	1.378	41.844	36.232	33.350	0.985	0.957	0.918

Table 6: Rendering – Quantitative Evaluation of $Ren_{vol}$ and $Ren_{sur}$. $Ren_{vol}$ produces new views and surfaces close to $Ren_{sur}$ with slightly poorer metrics overall. Tests correspond to Dji-A dataset.

5 Conclusion

We presented BRDF-NeRF, an extension of NeRF adapted to sparse satellite imagery, capable of estimating realistic BRDFs for natural surfaces. By incorporating the semi-empirical Rahman-Pinty-Verstraete (RPV) model, BRDF-NeRF enhances the rendering of anisotropic surface reflectance, leading to improved quality of both synthetic images and recovered surface altitudes. Although our experiments show promising results, certain limitations remain. At present, our method does not explicitly model shading effects, and although our surface reconstructions outperform other NeRF-based approaches, they still lack the regularity achieved by SGM. Future work will address these challenges.

6 Acknowledgements

This research was funded by CNES (Centre national d’études spatiales) through a PostDoc scholarship as well as CAROLInA/SURFACEs projects. The Djibouti dataset was obtained through the CNES ISIS framework. Numerical computations were carried out on the SCAPAD cluster at the Institut de Physique du Globe de Paris. We thank Arthur Delorme for the Lanzhou dataset. We also thank Manchun Lei and Antoine Lucas for fruitful discussions.
