BRDF-NeRF: Neural Radiance Fields with
Optical Satellite Images and BRDF Modelling

Lulin Zhang Ewelina Rupnik Tri Dung Nguyen Stéphane Jacquemoud Yann Klinger Université de Paris, Institut de Physique du Globe de Paris, CNRS, Paris, France Université de Gustave Eiffel, IGN-ENSG, LaSTIG, Saint-Mandé, France
Abstract

Understanding the anisotropic reflectance of complex Earth’s surfaces from satellite images is essential for many applications. Neural radiance fields (NeRF) have gained popularity as a learning technique capable of inferring the bidirectional reflectance distribution function (BRDF) of a scene from a set of images. However, previous research has mainly focused on applying NeRF to close-range images, estimating simple Microfacet BRDF models, which are inadequate for most Earth surfaces. Additionally, high-quality NeRFs typically require dozens of images acquired simultaneously, a rare scenario in satellite imagery. To overcome these challenges, we introduce BRDF-NeRF, designed to explicitly estimate the Rahman-Pinty-Verstraete (RPV) model, a semi-empirical BRDF model widely used in remote sensing. We evaluate our method on two datasets: (1) Djibouti, captured in a single epoch at different viewing angles and for a fixed Sun position, and (2) Lanzhou, captured at different epochs with varying viewing angles and Sun positions. Our work, using only three or four satellite images for training, shows that BRDF-NeRF can successfully synthesize new views from directions far from those of the training set, and generate high-quality digital surface models (DSMs).

keywords:
Neural radiance fields, Satellite images, BRDF, Parametric RPV model, Digital surface model
journal: ISPRS Journal of Photogrammetry and Remote Sensing

1 Introduction

Over the past two decades, significant progress has been made in image processing algorithms. In particular, 3D surface reconstruction has benefited from high-resolution spatial data and algorithms such as semi-global stereo matching (SGM), which can generate detailed surface maps of urban and natural environments (hirschmuller:08:sgm; rosu2015measurement; mpd:06:sgm). However, state-of-the-art approaches still face challenges, particularly with radiometrically heterogeneous surfaces, complex reflectance functions, or diachronic acquisitions. Recent research has attempted to address these challenges by leveraging learning algorithms capable of modelling complex resemblance functions, given appropriate architectures and sufficient training datasets (PSMNet; chebbi2023deepsim; wu2024evaluation). At the same time, a new approach to surface reconstruction has emerged with Neural Radiance Fields (NeRF), which differs from other learning-based methods by operating in a self-supervised manner. It works on single pixels rather than patches, treats non-Lambertian surfaces and generates new synthetic views (Mildenhall20eccv_nerf). Besides, NeRF is capable of estimating the Bidirectional Reflectance Distribution Function (BRDF) of the surface at the same time. Understanding the BRDF of continental surfaces is crucial for a variety of applications, including land cover mapping, assessment of Earth’s radiation budget, climate change studies, vegetation density analysis, and intercalibration of spaceborne sensors (dumont2010high). However, due to the anisotropic nature of the Earth’s reflectance, estimation of the BRDF generally requires numerous angular measurements, which is challenging with a few satellite images.

The aim of this paper is to model the surface’s BRDF explicitly from sparse satellite views, and improve surface reconstruction, particularly over landscapes with anisotropic reflectance characteristics (e.g., bare soil, vegetation). We select NeRF as the algorithmic framework for its capacity to model angle dependent surface reflectance. Our BRDF-NeRF workflow (Figure 1) is designed for satellite acquisitions with only three synchronous views and incorporates the semi-empirical Rahman-Pinty-Verstraete (RPV) BRDF model, widely used in the remote sensing community to represent the BRDF of natural surfaces (rahman1993coupled). To the best of our knowledge, this work is the first to integrate a BRDF model into neural radiance fields for land surfaces. It extends our previous work on neural radiance fields with sparse optical satellite images (zhang2023spsnerf1). The open-source code is available at github.com/LulinZhang/BRDF-NeRF. The terms DSM and surface, as well as SGM and stereo matching are used interchangeably throughout this article.

Refer to caption
Figure 1: BRDF-NeRF Workflow. A few satellite RGB images, and the corresponding low-resolution depth maps calculated using a classical stereo matching algorithm are fed into BRDF-NeRF to predict the normals n, and the RPV parameters 𝝆𝟎subscript𝝆0\bm{\rho_{0}}bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT, k, 𝚯𝚯\bm{\Theta}bold_Θ and 𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT, describing the surface reflectance. 𝝆𝟎subscript𝝆0\bm{\rho_{0}}bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT represents the amplitude component, k controls the overall shape of the anisotropic behaviour, 𝚯𝚯\bm{\Theta}bold_Θ establishes the degree of forward or backward scattering, and 𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT allows to model the hotspot effect. The five parameters are integrated into a RPV renderer to generate the synthetic image. In the meantime, high resolution depths are obtained by accumulating the weights in the volume estimated by BRDF-NeRF.

2 Related works

2.1 Neural Radiance Fields

The vanilla NeRF (Mildenhall20eccv_nerf) leverages a large number of images captured with a pinhole camera to represent small-size scenes as particles that emit light, instead of reflecting light. Subsequent NeRF variants proposed to relax some of those defining constraints, without compromising the quality of the outputs, i.e., the synthesised images and the 3D model. In the following paragraphs we briefly discuss the state-of-the-art approaches relevant to our work.

NeRF from few views

Since NeRF relies solely on pixel values for network training, a large number of input images is essential for generating photo-realistic novel views. Attempts to train NeRF with sparse input images often result in overfitting and inaccurate estimation of scene depth, leading to artifacts in the rendered novel views. This limitation restricts the applicability of NeRF and prolongs training time. To address this challenge, efforts have been made to adapt NeRF to sparse input images by introducing various regularization priors. A common approach is to incorporate depth supervision, including sparse depth (deng2022depth; wang2022sparsenerf; Somraj_2023; guo2024depthguided) and dense depth (wei2021nerfingmvs; roessle2022dense; zhang2023spsnerf1). In addition, methods such as image features (yu2021pixelnerf) or semantic regularization (xu2022sinnerf) have also been explored.

NeRF and BRDF

While vanilla NeRF excels in view synthesis, it cannot relight or edit materials, due to its inability to decompose outgoing radiance into incoming radiance and surface material reflectance. Some researchers proposed to extend NeRF to incorporate information on the Bidirectional Reflectance Distribution Function (BRDF), which characterises how materials reflect light under different viewing and lighting conditions. The majority of BRDF-compatible NeRF variants, such as those proposed by (bi2020neural; srinivasan2020nerv; boss2021nerd; yang2022psnerf; verbin2022refnerf; mai2023neural), adopt some version of the microfacet BRDF model (walter2007microfacet). This model represents reflectance as the superposition of diffuse and specular components and typically includes a surface roughness parameter, influencing the appearance of the surface through a distribution of microfacet orientations. However, while microfacet BRDF models offer effective parameterization, they poorly model BRDF of natural surfaces (e.g., soil, vegetation) which exhibit a more or less marked backscattering behavior (hotspot effect). Additionally, data-driven BRDFs pre-trained on BRDF databases have been explored in NeRF (Zhang_2021). However, the databases consist of artificial materials and have been built in controlled environments. The spectral and directional optical properties of natural materials are often very different. Figure 2 shows the scattering patterns of the Lambertian, Microfacet and RPV models. The latter is an anisotropic BRDF model widely used in remote sensing and which we will use in this work.

Refer to caption
(a) Lambertian
Refer to caption
(b) RPV: back scattering
Refer to caption
(c) RPV: forward scattering
Refer to caption
(d) Microfacet
Refer to caption
(e) RPV: bell shape scattering
Refer to caption
(f) RPV: bowl shape scattering
Figure 2: Scattering Patterns. Lambertian (a), Microfacet (d) and RPV (b, c, e, f) BRDF models. RPV can be used to simulate complex scattering indices.

NeRF in Earth Observations

Earth observation community has mainly focused on adapting the initial NeRF’s design to meet the specificities of space imagery: changing shadows, dynamic scene due to asynchronous acquisitions, as well as sparse views. Shadow-NeRF (derksen2021shadow) pioneered the application of NeRFs to satellite images, where the authors have explicitly modelled the shadows of the scene by leveraging the Sun’s direction. Sat-NeRF(mari2022sat) extends Shadow-NeRF by replacing the pinhole camera model with an empirical push broom model (i.e., Rational Polynomial Coefficients) and modelling transient objects in the scene such as moving cars. EO-NeRF (Mari_2023_CVPR) employs a novel geometry-based shadow rendering, resulting in more accurate digital surface models (DSMs). SpS-NeRF (zhang2023spsnerf1) further adapted NeRF for scenarios with few satellite views by introducing spatial guidance within NeRF sampling, conditioned on low-resolution input depths. Sat-Mesh (qu2023sat) used a latent vector to deal with inconsistent appearances in satellite imagery, while SUNDIAL (behari2024sundial) proposed a secondary shadow ray casting technique to jointly learn satellite scene geometry, illumination components and Sun direction. SatensoRF (zhang2024satensorf) decomposed colour into ambient, diffuse and specular light. Season-NeRF (gableman2024incorporating) learned and rendered seasonal variations by incorporating time as an additional input variable. GC-NeRF (wan2024constraining) proposed a geometric loss to create a compact weight distribution around the surface. In addition, RS-NeRF (xie2023remote), SAT-NGP (billouard2024sat) and SatensoRF (zhang2024satensorf) addressed the computational efficiency of NeRF by accelerating the runtimes with hash encoding and voxel occupancy grids to sample points near the surface, as well as through tensor decomposition. Last but not least, Radar fields (ehret2023radar) extended NeRF to spaceborne synthetic aperture radar (SAR) images.

2.2 BRDF in remote sensing

Existing BRDF models

Numerous BRDF models have been developed to describe the spectral and directional reflectance of natural and artificial surfaces. They can be classified into physical, empirical and semi-empirical models. Physical models (pinty1991extracting) are based on rigorously defined physical parameters and offer the most accurate descriptions of observed scenes. However, a large number of multiangular observations are required to retrieve these parameters by model inversion, making them impractical for optically complex surfaces. Empirical models (walthall1985simple; minnaert1941reciprocity; shibayama1985view) are derived as simple statistical fits to observed data, and provide no additional insights into the surface type or structure. Semi-empirical models (rahman1993coupled; wanner1995derivation; hapke1981bidirectional; roujean1992bidirectional; lucht2000algorithm) employ specific mathematical functions to best represent the physical interactions between the radiation field and the surface. They accept a reduced number of parameters, which facilitate their inversion. The semi-empirical RPV model (rahman1993coupled) is among the most commonly used. It is capable of representing the reflectance of various natural surfaces with just four parameters (Figure 2), and has been used to address atmospheric radiation transfer problems (martonchik1998techniques; martonchik1998determination); classify forest types (koukal2014evaluation); simulate plant leaves reflectance (biliouris2009rpv); estimate BRF values under unmeasured illumination and viewing angles (lattanzio2007consistency); estimate surface albedo (martonchik1998techniques; martonchik1998determination; privette2002first); and identify surface properties (widlowski2001characterization; gao2003detecting).

Deriving BRDF

Most of the parameters controlling BRDF cannot be measured in the field, but are obtained by invertion of surface reflectance models on observations. To guarantee reliable estimates, the surface must be observed over a wide range of illumination/viewing angles. Laboratory and field measurements using goniophotometers have traditionally been used to measure reflectance (lv2016multi; sandmeier2000brdf; combes2007new). Over the last few decades, several spaceborne instruments have been designed to carry out multiangular observations, such as MISR, POLDER, MODIS, CHRIS/Proba, and VIIRS. These instruments have limited spatial resolutions, ranging from a few tens to a few hundreds of meters. (labarre2019retrieving) have inverted the Hapke model on a set of 21 multiangular Pleiades images, acquired at a spatial resolution of 2m. However, such acquisitions are rarely available and inversion with three or four images is ill-posed.

In this paper, we explore the potential for estimating the BRDF of natural surfaces using as few as three high-resolution multispectral optical images. Our approach offers new possibilities for studying solar radiation reflected from the Earth’s surface, taking advantage of the multiplicity, temporal coverage, and high spatial resolution of optical satellite imagery.

3 Radiance Fields with RPV Reflectance

We briefly introduce the vanilla NeRF architecture and discuss two key ingredients of our BRDF-NeRF: geometry modelling (depths and normals) as well as the radiometric rendering (RPV BRDF model). The BRDF-NeRF workflow is described in Figure 1.

Preliminaries

NeRF (Mildenhall20eccv_nerf) represents a continuous volumetric field of a static scene that emits light, optimized with a fully connected deep network. Given a 3D point x=(x,y,z)x𝑥𝑦𝑧\textbf{x}=(x,y,z)x = ( italic_x , italic_y , italic_z ) accompanied with a viewing angle wr=(dx,dy,dz)subscriptw𝑟subscript𝑑𝑥subscript𝑑𝑦subscript𝑑𝑧\textbf{w}_{r}=(d_{x},d_{y},d_{z})w start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT = ( italic_d start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT , italic_d start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT , italic_d start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ), NeRF predicts a volume density σ𝜎\sigmaitalic_σ and a colour c=(r,g,b)c𝑟𝑔𝑏\textbf{c}=(r,g,b)c = ( italic_r , italic_g , italic_b ). NeRF renders images by sampling N𝑁Nitalic_N query points along each camera ray and accumulating the colours with weights defined by density, and imposes the rendered images to be close to the training images. Each camera ray r is defined by an origin point o and a viewing direction vector wrsubscriptw𝑟\textbf{w}_{r}w start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT such that r(t)=o+twrr𝑡o𝑡subscriptw𝑟\textbf{r}(t)=\textbf{o}+t\textbf{w}_{r}r ( italic_t ) = o + italic_t w start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT. Each query point xisubscriptx𝑖\textbf{x}_{i}x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT in r is defined as xi=o+tiwrsubscriptx𝑖osubscript𝑡𝑖subscriptw𝑟\textbf{x}_{i}=\textbf{o}+t_{i}\textbf{w}_{r}x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = o + italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT w start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT, where tisubscript𝑡𝑖t_{i}italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT lies between the near and far bounds of the scene, tnsubscript𝑡𝑛t_{n}italic_t start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT and tfsubscript𝑡𝑓t_{f}italic_t start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT. The rendered pixel value C(r)Cr\textbf{C}(\textbf{r})C ( r ) of ray r is calculated as follows:

C(r)=i=1NTiαici,C(r)superscriptsubscript𝑖1𝑁subscript𝑇𝑖subscript𝛼𝑖subscriptc𝑖\displaystyle\textbf{C(r)}=\sum_{i=1}^{N}{T_{i}{\alpha}_{i}\textbf{c}_{i}}~{},C(r) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , (1)

whith αi=1eσiδisubscript𝛼𝑖1superscript𝑒subscript𝜎𝑖subscript𝛿𝑖{\alpha}_{i}=1-e^{-{\sigma}_{i}{\delta}_{i}}italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 1 - italic_e start_POSTSUPERSCRIPT - italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, Ti=j=1i1(1αj)subscript𝑇𝑖superscriptsubscriptproduct𝑗1𝑖11subscript𝛼𝑗T_{i}=\prod_{j=1}^{i-1}{(1-{\alpha}_{j})}italic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ∏ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i - 1 end_POSTSUPERSCRIPT ( 1 - italic_α start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) and δi=ti+1tisubscript𝛿𝑖subscript𝑡𝑖1subscript𝑡𝑖{\delta}_{i}=t_{i+1}-t_{i}italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_t start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. αisubscript𝛼𝑖\alpha_{i}italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT represents the opacity of the current query point xisubscriptx𝑖\textbf{x}_{i}x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and Tisubscript𝑇𝑖T_{i}italic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the transmittance. The contribution of colour cisubscriptc𝑖\textbf{c}_{i}c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT to the accumulated colour C(r)Cr\textbf{C}(\textbf{r})C ( r ) increases with opacity and transmittance.

3.1 Geometric modelling

We incorporate geometric information to extend the applicability of BRDF-NeRF to sparse view acquisitions and to predict surface normals that are essential for accurate estimates of BRDF, as detailed in Section 3.2.

Depth supervision

Instead of querying ray points crossing the entire volume of the scene, as is the case in the vanilla NeRF, we narrow it down to a buffer space defined around the location of an approximately known surface. This tactic reduces ambiguity and enables reliable volume densities to be estimated with fewer images. We further encourage the depths predicted by NeRF to remain close to the input surface by using the following loss term introduced in (zhang2023spsnerf1):

depth(r)=rRsub(corr(r)(D(r)D¯(r))2,\mathcal{L}_{depth}(\textbf{r})=\sum_{\textbf{r}\in R_{sub}}(corr(\textbf{r})(% D(\textbf{r})-\overline{D}(\textbf{r}))^{2}~{},caligraphic_L start_POSTSUBSCRIPT italic_d italic_e italic_p italic_t italic_h end_POSTSUBSCRIPT ( r ) = ∑ start_POSTSUBSCRIPT r ∈ italic_R start_POSTSUBSCRIPT italic_s italic_u italic_b end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_c italic_o italic_r italic_r ( r ) ( italic_D ( r ) - over¯ start_ARG italic_D end_ARG ( r ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (2)

where D(r)𝐷r{D}(\textbf{r})italic_D ( r ) are the predicted depths calculated as D(r)=i=1NTiαiti𝐷rsuperscriptsubscript𝑖1𝑁subscript𝑇𝑖subscript𝛼𝑖subscript𝑡𝑖{D}(\textbf{r})=\sum_{i=1}^{N}{T_{i}{\alpha}_{i}t_{i}}italic_D ( r ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, while the D¯(r)¯𝐷r\overline{D}(\textbf{r})over¯ start_ARG italic_D end_ARG ( r ) are the input depths obtained from stereo matching on low-resolution images. We have observed that the performance of stereo matching on low resolution images is marginally affected by a change in surface BRDF and can therefore provide sufficiently good depth initialisations for our radiance fields. The parameter corr(r), which corresponds to the similarity score obtained by stereo matching, acts as a weight or confidence. It adjusts the level of supervision, having a strong impact where confidence is high and a minimal impact where input depths are uncertain. Rsubsubscript𝑅𝑠𝑢𝑏R_{sub}italic_R start_POSTSUBSCRIPT italic_s italic_u italic_b end_POSTSUBSCRIPT is a subset of rays that satisfy at least one of the following two conditions: (1) S(𝐫)>Σ(𝐫)𝑆𝐫Σ𝐫S(\mathbf{r})>\Sigma(\mathbf{r})italic_S ( bold_r ) > roman_Σ ( bold_r ); (2) |(D(r)D¯(r))|>Σ(𝐫)𝐷r¯𝐷rΣ𝐫\left|(D(\textbf{r})-\overline{D}(\textbf{r}))\right|>\Sigma(\mathbf{r})| ( italic_D ( r ) - over¯ start_ARG italic_D end_ARG ( r ) ) | > roman_Σ ( bold_r ), where S(r)2=i=1NTiαi(tiD(r))2𝑆superscriptr2superscriptsubscript𝑖1𝑁subscript𝑇𝑖subscript𝛼𝑖superscriptsubscript𝑡𝑖𝐷r2S(\textbf{r})^{2}=\sum_{i=1}^{N}{T_{i}{\alpha}_{i}(t_{i}-D(\textbf{r}))^{2}}italic_S ( r ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_D ( r ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT represents the uncertainty of the predicted depth, and Σ(r)=1corr(r)Σr1𝑐𝑜𝑟𝑟r\Sigma(\textbf{r})=1-corr(\textbf{r})roman_Σ ( r ) = 1 - italic_c italic_o italic_r italic_r ( r ) represents the uncertainty of the input depth. In other words, depth supervision is only applied to rays for which the predicted depths are more uncertain than the input depths.

Surface normal

BRDF is a function that depends on both the incident and viewing angles, which are defined relative to the surface normal. Therefore, the surface normal is crucial to accurately recovering the BRDF. In NeRF, it can be derived as analytical or learned. The analytical normal is calculated as the negative of the normalized gradient of the density field σ𝜎\sigmaitalic_σ with respect to the spatial location x as n(x)=x(σ)x(σ)2n(x)subscript𝑥𝜎subscriptnormsubscript𝑥𝜎2\textbf{n(x)}=-\frac{\nabla_{x}(\sigma)}{\|\nabla_{x}(\sigma)\|_{2}}n(x) = - divide start_ARG ∇ start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT ( italic_σ ) end_ARG start_ARG ∥ ∇ start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT ( italic_σ ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG (srinivasan2020nerv). The learned normal is predicted from a spatial MLP and can be supervised implicitly (bi2020neural) or with the analytical normal (verbin2022refnerf; li2022neural). Approaches relying on learnt normals led to smooth surfaces and a loss of detail in our case studies. Consequently, we chose to incorporate the analytical normal into our architecture because, despite its computational cost, it provides more accurate and better resolved normal vectors (srinivasan2020nerv).

3.2 Radiometric rendering

The geometric approach presented above guarantees decent 3D reconstructions of Lambertian scenes. Next, we adapt this approach to handle non-Lambertian natural surfaces by estimating a BRDF and incorporating it into the rendering Equation 1.

RPV equation

We estimate the reflectance of natural surfaces using the Rahman-Pinty-Verstraete (RPV) model (rahman1993coupled), a semi-empirical model well suited to satellite images (see Equation 3). We chose this model for its simplicity, its physics-based parameters and its ability to represent asymmetric BRDF, including the hotspot effect. The latter corresponds to a sharp increase in reflectance, which becomes maximum when the illumination and viewing directions are coincident.

In this model, the colour c of a surface point, defined by the normal vector n, the illumination direction wirsubscriptw𝑖𝑟\textbf{w}_{ir}w start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT and the viewing direction wrsubscriptw𝑟\textbf{w}_{r}w start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT (Figure 3), is calculated as the product of the incoming light Lirsubscript𝐿𝑖𝑟L_{ir}italic_L start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT, the cosine of the incident angle |wirn|subscriptw𝑖𝑟n|\textbf{w}_{ir}\cdot\textbf{n}|| w start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT ⋅ n |, and the bidirectional reflectance factor simulated by RPV𝑅𝑃𝑉RPVitalic_R italic_P italic_V:

c(n,wir,wr)=Lir|wirn|RPV(n,wir,wr),cnsubscriptw𝑖𝑟subscriptw𝑟subscript𝐿𝑖𝑟subscriptw𝑖𝑟n𝑅𝑃𝑉nsubscriptw𝑖𝑟subscriptw𝑟\textbf{c}(\textbf{n},\textbf{w}_{ir},\textbf{w}_{r})=L_{ir}\cdot|\textbf{w}_{% ir}\cdot\textbf{n}|\cdot RPV(\textbf{n},\textbf{w}_{ir},\textbf{w}_{r}),c ( n , w start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT , w start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) = italic_L start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT ⋅ | w start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT ⋅ n | ⋅ italic_R italic_P italic_V ( n , w start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT , w start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) , (3)

Lirsubscript𝐿𝑖𝑟L_{ir}italic_L start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT is set to a unit vector, and |wirn|subscriptw𝑖𝑟n|\textbf{w}_{ir}\cdot\textbf{n}|| w start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT ⋅ n | is approximated by |wir[0,0,1]|subscriptw𝑖𝑟001|\textbf{w}_{ir}\cdot[0,0,1]|| w start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT ⋅ [ 0 , 0 , 1 ] | because the analytical normal n is not sufficiently smooth. The RPV𝑅𝑃𝑉RPVitalic_R italic_P italic_V term can be broken down into an amplitude parameter 𝝆𝟎subscript𝝆0\bm{\rho_{0}}bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT and three angle-dependent functions: modified Minnaert function M𝑀Mitalic_M, Henyey-Greensteon function FHGsubscript𝐹𝐻𝐺F_{HG}italic_F start_POSTSUBSCRIPT italic_H italic_G end_POSTSUBSCRIPT, and backscatter function H𝐻Hitalic_H:

RPV(n,wir,wr)=𝝆𝟎M(θir,θr,k)FHG(g,𝚯)H(𝝆𝒄,G),𝑅𝑃𝑉nsubscriptw𝑖𝑟subscriptw𝑟subscript𝝆0𝑀subscript𝜃𝑖𝑟subscript𝜃𝑟ksubscript𝐹𝐻𝐺𝑔𝚯𝐻subscript𝝆𝒄𝐺RPV(\textbf{n},\textbf{w}_{ir},\textbf{w}_{r})=\bm{\rho_{0}}\cdot M({\theta}_{% ir},{\theta}_{r},\textbf{k})\cdot F_{HG}(g,\bm{\Theta})\cdot H(\bm{\rho_{c}},G),italic_R italic_P italic_V ( n , w start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT , w start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) = bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT ⋅ italic_M ( italic_θ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT , italic_θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT , k ) ⋅ italic_F start_POSTSUBSCRIPT italic_H italic_G end_POSTSUBSCRIPT ( italic_g , bold_Θ ) ⋅ italic_H ( bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT , italic_G ) , (4)

with M(θir,θr,𝒌)=(cosθircosθr(cosθir+cosθr))𝒌1𝑀subscript𝜃𝑖𝑟subscript𝜃𝑟𝒌superscript𝑐𝑜𝑠subscript𝜃𝑖𝑟𝑐𝑜𝑠subscript𝜃𝑟𝑐𝑜𝑠subscript𝜃𝑖𝑟𝑐𝑜𝑠subscript𝜃𝑟𝒌1M({\theta}_{ir},{\theta}_{r},\bm{k})=(cos\theta_{ir}cos\theta_{r}(cos\theta_{% ir}+cos\theta_{r}))^{\bm{k}-1}italic_M ( italic_θ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT , italic_θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT , bold_italic_k ) = ( italic_c italic_o italic_s italic_θ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT italic_c italic_o italic_s italic_θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( italic_c italic_o italic_s italic_θ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT + italic_c italic_o italic_s italic_θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) ) start_POSTSUPERSCRIPT bold_italic_k - 1 end_POSTSUPERSCRIPT, FHG(g,𝚯)=(1𝚯2)(1+2𝚯cosg+𝚯2)3/2subscript𝐹𝐻𝐺𝑔𝚯1superscript𝚯2superscript12𝚯𝑐𝑜𝑠𝑔superscript𝚯232F_{HG}(g,\bm{\Theta})=(1-\bm{\Theta}^{2})\cdot(1+2\bm{\Theta}cosg+\bm{\Theta}^% {2})^{-3/2}italic_F start_POSTSUBSCRIPT italic_H italic_G end_POSTSUBSCRIPT ( italic_g , bold_Θ ) = ( 1 - bold_Θ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ⋅ ( 1 + 2 bold_Θ italic_c italic_o italic_s italic_g + bold_Θ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 3 / 2 end_POSTSUPERSCRIPT, H(𝝆𝒄,G)=1+(1𝝆𝒄)/(1+G)𝐻subscript𝝆𝒄𝐺11subscript𝝆𝒄1𝐺H(\bm{\rho_{c}},G)=1+(1-\bm{\rho_{c}})/(1+G)italic_H ( bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT , italic_G ) = 1 + ( 1 - bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT ) / ( 1 + italic_G ), and the geometric factor G=(tan2θir+tan2θr2tanθirtanθrcosΦ)1/2𝐺superscript𝑡𝑎superscript𝑛2subscript𝜃𝑖𝑟𝑡𝑎superscript𝑛2subscript𝜃𝑟2𝑡𝑎𝑛subscript𝜃𝑖𝑟𝑡𝑎𝑛subscript𝜃𝑟𝑐𝑜𝑠Φ12G=(tan^{2}\theta_{ir}+tan^{2}\theta_{r}-2tan\theta_{ir}tan\theta_{r}cos\Phi)^{% 1/2}italic_G = ( italic_t italic_a italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_θ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT + italic_t italic_a italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT - 2 italic_t italic_a italic_n italic_θ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT italic_t italic_a italic_n italic_θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_c italic_o italic_s roman_Φ ) start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT. The illumination wirsubscriptw𝑖𝑟\textbf{w}_{ir}w start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT and viewing wrsubscriptw𝑟\textbf{w}_{r}w start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT directions are decomposed into zenith angles θirsubscript𝜃𝑖𝑟{\theta}_{ir}italic_θ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT and θrsubscript𝜃𝑟{\theta}_{r}italic_θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT, azimuth angles ΦirsubscriptΦ𝑖𝑟{\Phi}_{ir}roman_Φ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT and ΦrsubscriptΦ𝑟{\Phi}_{r}roman_Φ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT, relative azimuth angle ΦΦ\Phiroman_Φ and phase angle g𝑔gitalic_g, all defined in a spherical coordinate system determined by the surface normal n (Figure 3).

Refer to caption
Figure 3: BRDF Defining Directions. Decomposition of the illumination wirsubscriptw𝑖𝑟\textbf{w}_{ir}w start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT and viewing wrsubscriptw𝑟\textbf{w}_{r}w start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT directions into θirsubscript𝜃𝑖𝑟{\theta}_{ir}italic_θ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT (solar zenith angle) and θrsubscript𝜃𝑟{\theta}_{r}italic_θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT (viewing zenith angle), ΦirsubscriptΦ𝑖𝑟\Phi_{ir}roman_Φ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT (solar azimuth angle) and ΦrsubscriptΦ𝑟\Phi_{r}roman_Φ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT (viewing azimuth angle), g (phase angle) and ΦΦ\Phiroman_Φ (relative azimuth angle).

Detailed Description of RPV Model Input Parameters

The RPV parameter 𝝆𝟎subscript𝝆0\bm{\rho_{0}}bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT in Equation 4 plays the role of pseudo-albedo. The modified Minnaert function controls the anisotropic behaviour of the surface using the parameter k. If k1k1\textbf{k}\approx 1k ≈ 1 the surface is quasi-Lambertian; if k<1k1\textbf{k}\textless 1k < 1 a bowl-shaped pattern dominates (reflectance values increase with the viewing zenith angle); and if k>1k1\textbf{k}\textgreater 1k > 1 a bell-shaped pattern dominates (reflectance values decrease with the viewing zenith angle) (widlowski2004canopy). The parameter 𝚯𝚯\bm{\Theta}bold_Θ of the Henyey-Greenstein function controls the amount of radiation scattered in the forward (0 𝚯absent𝚯absent\leq\bm{\Theta}\leq≤ bold_Θ ≤ 1) or backward (-1 𝚯absent𝚯absent\leq\bm{\Theta}\leq≤ bold_Θ ≤ 0) directions. The backscatter function H𝐻Hitalic_H is written as a function of a geometric factor G𝐺Gitalic_G and a parameter 𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT, which represents the sharp increase in reflectance in the hotspot direction. When θir=θrsubscript𝜃𝑖𝑟subscript𝜃𝑟\theta_{ir}=\theta_{r}italic_θ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT = italic_θ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT and Φir=ΦrsubscriptΦ𝑖𝑟subscriptΦ𝑟\Phi_{ir}=\Phi_{r}roman_Φ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT = roman_Φ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT, the geometric factor disappears and H𝐻Hitalic_H reaches its maximum value, contributing to increase total reflectance. When estimating the RPV model, the ranges of variation of the parameters 𝝆𝟎subscript𝝆0\bm{\rho_{0}}bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT, k, 𝚯𝚯\bm{\Theta}bold_Θ and 𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT are fixed to [0, 1], [0, 2], [-1, 1] and [0, 1] (koukal2014evaluation).

To give the reader an intuition about how RPV parameters affect reflectance, we analyse the bidirectional reflectance function (BRF) of selected points in one of our datasets (Figure 4). The BRFs are plotted by varying the viewing directions (zenith and azimuth angles between [0, 90] and [0, 360]) and fixing the Sun’s direction to θir=52.1subscript𝜃𝑖𝑟superscript52.1\theta_{ir}=52.1^{\circ}italic_θ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT = 52.1 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT and Φir=142.5subscriptΦ𝑖𝑟superscript142.5\Phi_{ir}=142.5^{\circ}roman_Φ start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT = 142.5 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT. The pseudo-albedo of the selected surface point estimated by BRDF-NeRF is 𝝆𝟎=[0.122,0.105,0.091]subscript𝝆00.1220.1050.091\bm{\rho_{0}}=[0.122,0.105,0.091]bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT = [ 0.122 , 0.105 , 0.091 ], the normal n=[0, 0, 1]. Among the six combinations of (𝚯,𝐤,𝝆𝒄𝚯𝐤subscript𝝆𝒄\bm{\Theta},\mathbf{k},\bm{\rho_{c}}bold_Θ , bold_k , bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT) given in Table 1, backward scattering (Figure 4 (b)) was predicted by BRDF-NeRF. Note that this BRF is consistent with a result generated independently over the same area with 21 Pleiades views (labarre2019retrieving).

Refer to caption
(a) Point location
Refer to caption
(b) Backward
Refer to caption
(c) Forward
Refer to caption
(d) Bowl shape
Refer to caption
(e) Bell shape
Refer to caption
Refer to caption
(f) Hotspot effect
Figure 4: RPV Model Parameters – BRF. Reflectances displayed in polar coordinates correspond to point A shown in (a). They are generated by evaluating the RPV model over a range of viewing directions, where 𝝆𝟎subscript𝝆0\bm{\rho_{0}}bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT and n are predicted by our BRDF-NeRF and fixed, while the Sun position is fixed and displayed as a white symbol. (b) backward scattering corresponding to the RPV parameters (𝚯,𝐤,𝝆𝒄𝚯𝐤subscript𝝆𝒄\bm{\Theta},\mathbf{k},\bm{\rho_{c}}bold_Θ , bold_k , bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT) found by BRDF-NeRF; (c) forward scattering obtained by modifying 𝚯𝚯\bm{\Theta}bold_Θ; (d-e) transition between bowl shaped and bell shaped scattering obtained by changing the 𝐤𝐤\mathbf{k}bold_k parameter; (f) hotspot effect with modified 𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT. The combinations of the six parameters are provided in Table 1.
Param. Backward Forward Bowl shape Bell shape Hotspot effect
k 0.996 0.996 0.500 1.500 0.996 0.996
𝚯𝚯\bm{\Theta}bold_Θ -0.174 0.174 -0.174 -0.174 -0.174 -0.174
𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT 0.979 0.979 0.979 0.979 0.500 0
Table 1: RPV Model Parameter Sets. Different RPV parameters k, 𝚯𝚯\bm{\Theta}bold_Θ and 𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT corresponding to the reflectance spectra in Figure 4 (b-f).

3.3 Network architecture

Refer to caption
Figure 5: BRDF-NeRF Architecture. The input 3D locations x are sampled along the camera rays and fed into BRDF-NeRF to query density and RPV parameters. BRDF-NeRF consists of shared 8-layer spatial MLPs, 1-layer feature MLP and four 1-layer RPV MLPs. The 8-layer MLPs are followed by a softplus function to predict the density σ𝜎\sigmaitalic_σ, which is further used to calculate the analytical normal n. The RPV MLPs are concatenated with a sigmoid function to predict 𝝆𝟎subscript𝝆0\bm{\rho_{0}}bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT, k, 𝚯𝚯\bm{\Theta}bold_Θ and 𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT. The elements within the dashed rectangle are pre-trained on the assumption of a Lambertian surface for the first 20% of the total training steps. The colour c is predicted with the RPV rendering equation in Equation 3, where Wirsubscript𝑊𝑖𝑟W_{ir}italic_W start_POSTSUBSCRIPT italic_i italic_r end_POSTSUBSCRIPT and Wrsubscript𝑊𝑟W_{r}italic_W start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT represent the Sun and camera directions, respectively.

The network architecture presented in Figure 5 consists of two progressively trained components: the geometric part and the RPV part. Initially, the geometric part is pre-trained on the assumption of a Lambertian surface. After this pre-training phase (i.e., similar-to\sim20% of the total duration of the training time), the RPV part is introduced. Three MLPs predicting k, 𝚯𝚯\bm{\Theta}bold_Θ and 𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT as well as the analytical normal n engage in training, while the albedo 𝝆𝟎subscript𝝆0\bm{\rho_{0}}bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT is finetuned to match the pseudo-albedo of our new rendering equation (see Equation 3). The separation of the geometry part from the RPV part, which handles the case of non-Lambertian surfaces, ensures that the final training stage works on well-initialised normal vectors.

The input spatial locations x are transformed by positional encoding, the activations for 𝝆𝟎subscript𝝆0\bm{\rho_{0}}bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT, k, 𝚯𝚯\bm{\Theta}bold_Θ and 𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT are sigmoid function, and the k and 𝚯𝚯\bm{\Theta}bold_Θ are scaled to [0, 2] and [-1, 1] to match their real value ranges (koukal2014evaluation). Ultimately, all the parameters of the BRDF-NeRF are optimized to minimise the combination of (1) the colour loss between the ground truth pixel colour 𝐂¯(r)¯𝐂r\textbf{$\overline{\mathbf{C}}$}(\textbf{r})over¯ start_ARG bold_C end_ARG ( r ) and the predicted pixel colours C(r)Cr\textbf{C}(\textbf{r})C ( r ), and (2) the depth loss depth(r)subscript𝑑𝑒𝑝𝑡r\mathcal{L}_{depth}(\textbf{r})caligraphic_L start_POSTSUBSCRIPT italic_d italic_e italic_p italic_t italic_h end_POSTSUBSCRIPT ( r ) (Equation 2):

=rRC(r)𝐂¯(r)22+λdepth(r),subscriptr𝑅superscriptsubscriptnormCr¯𝐂r22𝜆subscript𝑑𝑒𝑝𝑡r\mathcal{L}=\sum_{\textbf{r}\in R}\left\|\textbf{C}(\textbf{r})-\textbf{$% \overline{\mathbf{C}}$}(\textbf{r})\right\|_{2}^{2}+\lambda\mathcal{L}_{depth}% (\textbf{r})~{},caligraphic_L = ∑ start_POSTSUBSCRIPT r ∈ italic_R end_POSTSUBSCRIPT ∥ C ( r ) - over¯ start_ARG bold_C end_ARG ( r ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_λ caligraphic_L start_POSTSUBSCRIPT italic_d italic_e italic_p italic_t italic_h end_POSTSUBSCRIPT ( r ) , (5)

where λ𝜆\lambdaitalic_λ is a weight balancing the contribution of colour and depth. See our experiment on finding optimal λ𝜆\lambdaitalic_λ in Section 4.6.

4 Numerical experiments

We conduct a series of experiments to evaluate our BRDF-NeRF on novel view synthesis (Section 4.4) and altitude estimation (Section 4.5) tasks. In addition, we examine the influence of atmospheric correction from TOA (Top Of Atmosphere) to TOC (Top Of Canopy) (Section 4.3) and carry out ablation studies to determine the best training strategy (Section 4.6), and the most optimal way of rendering (Section 4.6).

Evaluation Metrics

Precision metrics are Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index measure (SSIM) (wang2004image) for view synthesis, and Mean Altitude Error (MAE) for altitude extraction. Ground truth (GT) images are true images not seen during training, while the GT surface is a DSM generated with high-resolution Pleiades panchromatic images, with a ground sampling distance (GSD𝐺𝑆𝐷GSDitalic_G italic_S italic_D) of 0.5 m, and stereo image matching (mpd:06:sgm). BRDF-NeRF is also compared to competitions Sat-NeRF (mari2022sat), SpS-NeRF (zhang2023spsnerf1) as well as DSMs generated with SGM using full-resolution images (i.e., SGMZ1𝑆𝐺subscript𝑀𝑍1SGM_{Z1}italic_S italic_G italic_M start_POSTSUBSCRIPT italic_Z 1 end_POSTSUBSCRIPT with GSD=2𝐺𝑆𝐷2GSD=2italic_G italic_S italic_D = 2m). The source code for EO-NeRF (Mari_2023_CVPR) was not available for comparison. The RPV parameters are indirectly validated by examining quantitative metrics for novel view synthesis. Additionally, one can compare the BRF in Figure 4 (b) with an independent result derived from 21 Pléiades images in (labarre2019retrieving).

4.1 Implementation Details

Training

Our network is trained with the Adam Optimizer (lr=5e-4, decay=0.9, batch size=1024). We use SpS-NeRF’s ray sampling strategy to sample 64 stratified points along each ray, accompanied with 64 guided points following a Gaussian distribution. We optimize BRDF-NeRF for 100k iterations, which takes similar-to\sim10 hours on NVIDIA GPU with 40GB RAM. For fair comparison, the competitive methods (i.e., Sat-NeRF and SpS-NeRF) are also trained for 100k iterations with 64 + 64 points along each ray, which takes similar-to\sim6 hours. BRDF-NeRF is less efficient in training than competitive methods, mainly due to the normal analytical computation. Computational efficiency was not the aim of this work and could be improved in the future with techniques such as tensor decomposition.

Light visibility

It is important to take into account the visibility of samples when scenes contain occlusions, for example when acquiring images in mountainous or urban areas. Visibility describes the transmission of a sample between the light source and the query point. Brute-force computation of the light visibility is computationally expensive, as it requires marching rays from all the query points along the camera ray to the light source. Previous work treats the light visibility with different strategies: (1) assuming that light visibility is the same everywhere (mai2023neural; boss2021nerd); in this scenario, a shadow is embedded into the albedo colour; (2) with a fully analytical approach provided that the camera-light configuration is collocated, i.e., the light rays are aligned with the camera rays (bi2020neural); (3) with a semi-analytical approach where visibility is calculated by ray tracing from the light source to the estimated surface (Zhang_2021; li2022neural; yang2022s; Mari_2023_CVPR); (4) with MLP-learnt visibility, supervised by photometric loss (verbin2022refnerf), or by light visibility calculated by ray tracing (srinivasan2020nerv; yang2022psnerf; derksen2021shadow; mari2022sat).

The fully analytical strategy is intractable in situations where light and view are not collocated, while the semi-analytical strategy can lead to noisy results, especially when few input views are used. The learning-based approach gives the network superfluous freedom that could confuse light visibility with other phenomena, such as albedo colour. Since our main objective is to model natural surfaces, such as bare soil, from aerial images, we can safely assume that the light visibility is constant everywhere, i.e., equal to 1.

4.2 Dataset

Evaluations are carried out on two sites (Djibouti, Lanzhou). We extract two regions of interest (similar-to\sim 1.5 km ×\times× 1.5 km) in each site, which we refer to as A𝐴Aitalic_A and B𝐵Bitalic_B (e.g., Dji-A and Dji-B). In each dataset, we distinguish three test scenarios ranging from easy (novel view interpolation) to very hard (novel view extrapolation). The input images are RGB with a GSD of 2m and undergo atmospheric correction prior to processing (see Section 4.3). Sadly, we could not perform experiments on the open DFC dataset (bosch2019semantic) representing urban scenes since RPV is designed for natural surfaces.

Djibouti Dataset

It is located in the Asal-Ghoubbet rift (Republic of Djibouti) (labarre2019retrieving) and consists of 21 multiangular Pleiades 1B images collected in a single flyby on January 26, 2013. Three quasi-nadir images are chosen for training, and three other images for testing, including interpolation and extrapolation scenarios (Figure 6 (a)).

Lanzhou Dataset

It is located in Lanzhou (China) and consists of 3 Pleiades 1B images acquired on April 23, 2013 and 3 Pleiades 1A images acquired on June 29, 2013. This is a multi-date dataset where the position of the Sun changes between acquisitions (Figure 6 (b)).

Refer to caption
(a) Djibouti
Refer to caption
(b) Lanzhou
Figure 6: Testing Scenarios – Sun and Viewing Angles. The angles are displayed in polar coordinate systems, where the radius and angular coordinate represent zenith and azimuth, respectively. We design up to three test scenarios, which differ in the viewing angle of the test images. Easy corresponds to a case where the test images are interpolated from the training images, while hard and very hard correspond to extrapolation. Three training images are used in Djibouti, four in Lanzhou (two of each epoch).

4.3 The effect of atmospheric correction

Atmospheric correction is important for two reasons. Firstly, the atmosphere affects the signal received by the satellite sensor. Secondly, the RPV model, like all BRDF models, describes TOC reflectances whereas Pleiades images are typically supplied as calibrated at TOA. We apply the atmospheric correction using an Orfeo ToolBox (OTB) software library (OTB2002) which is based on the 6S radiative transfer code (vermote1997second). The correction model requires four atmospheric parameters: ozone content UO3subscript𝑈𝑂3U_{O3}italic_U start_POSTSUBSCRIPT italic_O 3 end_POSTSUBSCRIPT (cm-atm), water vapor content UH2Osubscript𝑈𝐻2𝑂U_{H2O}italic_U start_POSTSUBSCRIPT italic_H 2 italic_O end_POSTSUBSCRIPT (g/cm2), aerosol optical thickness τAsubscript𝜏𝐴\tau_{A}italic_τ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT (unitless) and atmospheric pressure Pasubscript𝑃𝑎P_{a}italic_P start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT (hPa). The first three parameters are estimated from ancillary datasets corresponding to the day of image acquisition, available on NASA’s Earth observation website (https://neo.gsfc.nasa.gov/). The atmospheric pressure is approximated using the formula Pa=1013.25(10.0065Z/288.15)5.31subscript𝑃𝑎1013.25superscript10.0065𝑍288.155.31P_{a}=1013.25\cdot(1-0.0065\cdot Z/288.15)^{5.31}italic_P start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT = 1013.25 ⋅ ( 1 - 0.0065 ⋅ italic_Z / 288.15 ) start_POSTSUPERSCRIPT 5.31 end_POSTSUPERSCRIPT, where Z𝑍Zitalic_Z is the surface altitude expressed in meter. The four parameters, along with the adjacency radius for both epochs in the Lanzhou dataset, are shown in Table 2. The Djibouti dataset had been corrected for atmosphere by the satellite image provider.

Figure 7 shows an example of comparison between input images with and without performing atmospheric correction. Images tones between different epochs become more similar with atmospheric correction. Figure 8 and Table 3 compared results based on images with and without atmospheric correction on novel view synthesis and altitude estimation. Results without atmospheric correction gained generally worse metrics and displayed artifacts in synthetic images.

Epoch UO3subscript𝑈𝑂3U_{O3}italic_U start_POSTSUBSCRIPT italic_O 3 end_POSTSUBSCRIPT UH2Osubscript𝑈𝐻2𝑂U_{H2O}italic_U start_POSTSUBSCRIPT italic_H 2 italic_O end_POSTSUBSCRIPT τAsubscript𝜏𝐴\tau_{A}italic_τ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT Pasubscript𝑃𝑎P_{a}italic_P start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT adjacency
(cm-atm) (g/cm2) (unitless) (hPa) radius (-)
23/04/2013 0.3220 1.7333 0.4665 783 1.0
29/06/2013 0.2969 2.5625 0.0980 783 1.0
Table 2: Atmospheric Correction Parameters. Values for ozone content UO3subscript𝑈𝑂3U_{O3}italic_U start_POSTSUBSCRIPT italic_O 3 end_POSTSUBSCRIPT, water vapor content UH2Osubscript𝑈𝐻2𝑂U_{H2O}italic_U start_POSTSUBSCRIPT italic_H 2 italic_O end_POSTSUBSCRIPT, aerosol optical thickness τAsubscript𝜏𝐴\tau_{A}italic_τ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT and atmospheric pressure Pasubscript𝑃𝑎P_{a}italic_P start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT, as well as the adjacency radius used for atmospheric correction in the Lanzhou dataset.
Refer to captionRefer to caption
Refer to captionRefer to caption
(a) Without atmospheric correction
Refer to captionRefer to caption
Refer to captionRefer to caption
(b) With atmospheric correction
Figure 7: Atmospheric Correction – Visualisations. Comparison between input images without and with atmospheric correction for the Lzh-A multi-date dataset. The similarity of images tones between different epochs is improved after atmospheric correction.
Method AC MAE \downarrow PSNR \uparrow SSIM \uparrow
Easy Hard Easy Hard
SpS-NeRF 3.697 28.282 26.131 0.880 0.858
SpS-NeRF 3.558 29.548 24.755 0.965 0.900
BRDF-NeRF 3.439 33.315 29.857 0.946 0.923
BRDF-NeRF 3.420 32.196 28.420 0.979 0.94
Table 3: Ablation of Atmospheric Correction – Quantitative Evaluation on Lzh-A. Experiments with and without atmospheric correction (AC). BRDF-NeRF performs best, with the smallest MAE and biggest SSIM. Although BRDF-NeRF shows slightly higher PSNR, qualitative visualisation in Figure 8(b) reveal stripe artefacts, which are not present in BRDF-NeRF in Figure 10(e).
Refer to caption
(a) SpS-NeRF
Refer to caption
(b) BRDF-NeRF
Refer to caption
(c) GT
Refer to caption
(d) SpS-NeRF
Refer to caption
(e) BRDF-NeRF
Refer to caption
(f) GT
Figure 8: Ablation of Atmospheric Correction – View Synthesis and Altitude Estimation Visualisation. Results on the Lzh-A site using images without atmospheric correction (AC). The view synthesis is displayed in the first line, and the altitude estimation in the second. BRDF-NeRF synthesises images with fewer artifacts than SpS-NeRF and estimates surface closer to GT. Performance is further improved on images with AC (Figure 10 (e) and Figure 12 (e)).

4.4 Novel view synthesis

Quantitative metrics are presented in Table 4 while qualitative visualisations are provided in Figures 9 and 10. The visualisations here are limited to the most challenging scenario (very hard for Djibouti and hard for Lanzhou) due to space limitation. Our BRDF-NeRF outperforms Sat-NeRF and SpS-NeRF. Among competitive methods, SpS-NeRF produces less blurry renderings than Sat-NeRF (compare Figure 9 (a) and (c) as well as Figure 10 (b) and (d)). Nevertheless, both methods produce minimal hallucination effects (Figure 9 (a), Figure 10 (b,d)). The two competitive methods remain less sharp and far from the colour tone of the NeRF which includes our realistic RPV-based BRDF (compare Figure 9 (c) and (e)). PSNR and SSIM metrics are best for BRDF-NeRF, followed by SpS-NeRF in second place. The margins between BRDF-NeRF and competitive approaches increase from easy to very hard mode, indicating greater robustness of BRDF-NeRF. From single to multiple epoch datasets (i.e., from Djibouti to Lanzhou), the image quality rendered by Sat-NeRF and SpS-NeRF decreased significantly, while BRDF-NeRF  recovered photorealistic images in both cases.

Metric Method Dji-A Dji-B Lzh-A Lzh-B
PSNR Sat-NeRF 32.747 31.818 29.979 25.470
(Easy) SpS-NeRF 38.832 38.174 29.548 30.904
\uparrow BRDF-NeRF 41.844 40.823 32.196 32.165
PSNR Sat-NeRF 25.542 23.699 25.963 20.811
(Hard) SpS-NeRF 28.348 27.468 24.755 24.148
\uparrow BRDF-NeRF 36.232 35.448 28.420 27.814
PSNR Sat-NeRF 23.581 21.288 / /
(VHard) SpS-NeRF 23.144 22.31 / /
\uparrow BRDF-NeRF 33.35 32.376 / /
SSIM Sat-NeRF 0.927 0.950 0.962 0.925
(Easy) SpS-NeRF 0.975 0.979 0.965 0.970
\uparrow BRDF-NeRF 0.985 0.988 0.979 0.975
SSIM Sat-NeRF 0.766 0.825 0.909 0.760
(Hard) SpS-NeRF 0.840 0.887 0.900 0.928
\uparrow BRDF-NeRF 0.957 0.965 0.94 0.953
SSIM Sat-NeRF 0.676 0.768 / /
(VHard) SpS-NeRF 0.614 0.72 / /
\uparrow BRDF-NeRF 0.918 0.942 / /
Sat-NeRF 12.85 18.059 61.299 27.489
MAE SpS-NeRF 1.438 1.761 3.558 3.235
\downarrow BRDF-NeRF 1.378 1.614 3.42 2.941
SGMZ1 1.061 1.052 1.409 1.220
Table 4: Quantitative Evaluation. The best and second best performing metrics are in blue and magenta. For each dataset, we train a BRDF-NeRF to render three images in easy, hard (and very hard when existing) modes at the same time. BRDF-NeRF achieves better PSNR, SSIM and MAE than Sat-NeRF and SpS-NeRF. However, BRDF-NeRF has higher MAEs than SGMZ1, which we attribute to NeRF’s design that handles pixels individually without taking context into account.
Refer to caption
Refer to caption
(a) Sat-NeRF Dji-A
Refer to caption
Refer to caption
(b) Sat-NeRF Dji-B
Refer to caption
Refer to caption
(c) SpS-NeRF Dji-A
Refer to caption
Refer to caption
(d) SpS-NeRF Dji-B
Refer to caption
Refer to caption
(e) BRDF-NeRF Dji-A
Refer to caption
Refer to caption
(f) BRDF-NeRF Dji-B
Refer to caption
Refer to caption
(g) GT Dji-A
Refer to caption
Refer to caption
(h) GT Dji-B
Figure 9: Novel View Synthesis – Djibouti Dataset. Renderings of sites A and B of the Djibouti dataset in the very hard scenario (Figure 6). Sat-NeRF and SpS-NeRF renderings are less sharp and detailed than BRDF-NeRF. Note that the colour tone of BRDF-NeRF is closest to that of ground truth (GT).
Refer to caption
Refer to caption
(a) Sat-NeRF Lzh-A
Refer to caption
Refer to caption
(b) Sat-NeRF Lzh-B
Refer to caption
Refer to caption
(c) SpS-NeRF Lzh-A
Refer to caption
Refer to caption
(d) SpS-NeRF Lzh-B
Refer to caption
Refer to caption
(e) BRDF-NeRF Lzh-A
Refer to caption
Refer to caption
(f) BRDF-NeRF Lzh-B
Refer to caption
Refer to caption
(g) GT Lzh-A
Refer to caption
Refer to caption
(h) GT Lzh-B
Figure 10: Novel View Synthesis – Lanzhou Dataset. Renderings of sites A and B of the multi-date Lanzhou dataset in the hard scenario. Hallucination artefacts are revealed in both Sat-NeRF and SpS-NeRF but are more pronounced in the latter. Despite this, SpS-NeRF renders sharper surface details than Sat-NeRF. BRDF-NeRF generates images with fine details and much less artifacts.

4.5 Altitude estimation

Quantitative metrics are presented in Table 4, whereas qualitative visualisations are provided in Figures 11 and 12. Sat-NeRF, which was not designed for scenarios with few images, estimates surface altitudes that are tens of meters away from the ground truth surface. SpS-NeRF performs better, thanks to the dense depth supervision, as opposed to supervision with sparse points in Sat-NeRF. Visual assessment reveals that altitudes predicted by Sat-NeRF are either flat (Figure 11 (a-b)), or contain a made up pattern (Figure 12 (a)). SpS-NeRF altitudes are more faithful to ground truth but remain noisy. Our BRDF-NeRF outperforms both versions of NeRF, producing less noisy surfaces while retaining detail.

Compared with surfaces obtained with the classical stereo matching (mpd:06:sgm), BRDF-NeRF appears smoother and less detailed (compare Figure 12 (f) and (h)) in areas with good texture. However, on poorly textured areas where stereo matching is challenging, our BRDF-NeRF predicts coherent altitudes (compare Figure 11 (e) and (g)). Note also that our ground truth surface was generated with stereo matching algorithms, thus the comparison is possibly slightly biasing the MAE metric in favour of the stereo matching surface.

Refer to caption
Refer to caption
(a) Sat-NeRF Dji-A
Refer to caption
Refer to caption
(b) Sat-NeRF Dji-B
Refer to caption
Refer to caption
(c) SpS-NeRF Dji-A
Refer to caption
Refer to caption
(d) SpS-NeRF Dji-B
Refer to caption
Refer to caption
(e) BRDF-NeRF Dji-A
Refer to caption
Refer to caption
(f) BRDF-NeRF Dji-B
Refer to caption
Refer to caption
(g) SGMZ1 Dji-A
Refer to caption
Refer to caption
(h) SGMZ1 Dji-B
Refer to caption
Refer to caption
(i) GT Dji-A
Refer to caption
Refer to caption
(j) GT Dji-B
Figure 11: Altitude Estimation – Djibouti Dataset. Sat-NeRF fails to recover the altitudes. SpS-NeRF recovers noisy altitudes. The noise is suppressed to some extent in BRDF-NeRF. SGMZ1 generates generally more detailed surfaces, but underperforms in weakly textured areas (see green rectangle).
Refer to caption
Refer to caption
(a) Sat-NeRF Lzh-A
Refer to caption
Refer to caption
(b) Sat-NeRF Lzh-B
Refer to caption
Refer to caption
(c) SpS-NeRF Lzh-A
Refer to caption
Refer to caption
(d) SpS-NeRF Lzh-B
Refer to caption
Refer to caption
(e) BRDF-NeRF Lzh-A
Refer to caption
Refer to caption
(f) BRDF-NeRF Lzh-B
Refer to caption
Refer to caption
(g) SGMZ1 Lzh-A
Refer to caption
Refer to caption
(h) SGMZ1 Lzh-B
Refer to caption
Refer to caption
(i) GT Lzh-A
Refer to caption
Refer to caption
(j) GT Lzh-B
Figure 12: Altitude Estimation – Lanzhou Dataset. Sat-NeRF fails to recover the altitudes. Surfaces obtained by SpS-NeRF and BRDF-NeRF are similar, with SpS-NeRF manifesting slightly noisier altitude predictions and a hallucination artefact. SGMZ1 recovers surfaces with more detailed topography but more noise than SpS-NeRF and BRDF-NeRF.

4.6 Ablations

Training strategy

Our BRDF-NeRF model is trained progressively to ensure proper initialisation of the geometry (i.e., density weights in NeRF) before learning the RPV parameters. We perform an ablation with different pre-training strategies to determine the optimal point at which the transition from Lambertian to RPV model should occur. In the Preno approach, the entire network is trained without pre-training. In the Presho, Premed and Prelon approaches, we start from the Lambertian assumption and switch to the RPV model at different training steps, as shown in Figure 13. The initial learning rate is set to 5e4 and decreases to 3.65e5, 2.15e5 and 1.27e5, respectively.

Qualitative results are presented in Figures 14 and 15, while quantitative metrics are provided in Table 5. The absence of pre-training leads to blurry synthetic images and noisy altitude estimations. Performance differences between Presho, Premed, and Prelon are minor, with Premed emerging as the most optimal choice.

Depth Loss Weighting

We perform an ablation experiment to evaluate the contribution of the depth loss term. Removing the term entirely results in very poor altitude predictions, confirming that a simple NeRF architecture is unable to learn from just three views. By increasing the weight in the range [13,503]13503[\frac{1}{3},\frac{50}{3}][ divide start_ARG 1 end_ARG start_ARG 3 end_ARG , divide start_ARG 50 end_ARG start_ARG 3 end_ARG ], altitude metrics improve consistently (see Figure 15). The PSNR and SSIM metrics corresponding to the synthetic image quality reach a maximum at λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG, suggesting that assigning greater importance to depths compromises the rendering quality. We set the λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG in all our experiments.

Surface or volume rendering

The rendering equation in Equation 3 can be applied as surface rendering (Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT) or volume rendering (Renvol𝑅𝑒subscript𝑛𝑣𝑜𝑙Ren_{vol}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_v italic_o italic_l end_POSTSUBSCRIPT). In surface rendering, the RPV parameters n, 𝝆𝟎subscript𝝆0\bm{\rho_{0}}bold_italic_ρ start_POSTSUBSCRIPT bold_0 end_POSTSUBSCRIPT, k, 𝚯𝚯\bm{\Theta}bold_Θ and 𝝆𝒄subscript𝝆𝒄\bm{\rho_{c}}bold_italic_ρ start_POSTSUBSCRIPT bold_italic_c end_POSTSUBSCRIPT are estimated at the surface by accumulating N𝑁Nitalic_N points along the ray and applying the rendering once for each ray. In volume rendering, rendering is applied to each sample individually, associating it with a color c, and accumulation is done on the sample colours instead of RPV parameters. Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT is more rigorous than Renvol𝑅𝑒subscript𝑛𝑣𝑜𝑙Ren_{vol}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_v italic_o italic_l end_POSTSUBSCRIPT, as the latter assumes every point along the ray follow the RPV reflectance, while Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT assumes the same only for points on the surface which is concordant wit the RPV model definition. We demonstrate in Table 6 and Figure 16 that Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT outperforms Renvol𝑅𝑒subscript𝑛𝑣𝑜𝑙Ren_{vol}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_v italic_o italic_l end_POSTSUBSCRIPT and adopt this rendering method in our experiments.

Refer to caption
Figure 13: Training Strategies. Fours training strategies of BRDF-NeRF from no pre-training to long pre-training.
Refer to caption
Refer to caption
(a) Preno, λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(b) Presho, λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(c) Premed, λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(d) PreLon, λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(e) Premed, λ=0𝜆0\lambda=0italic_λ = 0
Refer to caption
Refer to caption
(f) Premed, λ=13𝜆13\lambda=\frac{1}{3}italic_λ = divide start_ARG 1 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(g) Premed, λ=503𝜆503\lambda=\frac{50}{3}italic_λ = divide start_ARG 50 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(h) GT
Figure 14: Training Strategies – View Synthesis Visualisation. Different pre-training strategies are tested on the very hard scenario of Dji-A. Preno generates a blurry image; Presho, Premed and Prelon synthesised images that are close to GT, while Premed shows advantages in restoring better details, particularly in the zoom-in view labeled in the orange rectangle. λ=0𝜆0\lambda=0italic_λ = 0 recovers blurry image, λ=13𝜆13\lambda=\frac{1}{3}italic_λ = divide start_ARG 1 end_ARG start_ARG 3 end_ARG and λ=503𝜆503\lambda=\frac{50}{3}italic_λ = divide start_ARG 50 end_ARG start_ARG 3 end_ARG synthesised novel views with unified image tone, but zoom-in view shows blurrier detail than λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG.
Refer to caption
Refer to caption
(a) Preno, λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(b) Presho, λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(c) Premed, λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(d) Prelon, λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(e) Premed, λ=0𝜆0\lambda=0italic_λ = 0
Refer to caption
Refer to caption
(f) Premed, λ=13𝜆13\lambda=\frac{1}{3}italic_λ = divide start_ARG 1 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(g) Premed, λ=503𝜆503\lambda=\frac{50}{3}italic_λ = divide start_ARG 50 end_ARG start_ARG 3 end_ARG
Refer to caption
Refer to caption
(h) GT
Figure 15: Training Strategies – Altitude Estimation Visualisation on Dji-A sites, using different training strategy for BRDF-NeRF. Preno’s surface is very noisy; Presho, Premed and Prelon generate similar surfaces, among which Presho is the noisiest. λ=0𝜆0\lambda=0italic_λ = 0 fails to recover topographic reliefs due to a lack of depth supervision. Performance is improved in λ=13𝜆13\lambda=\frac{1}{3}italic_λ = divide start_ARG 1 end_ARG start_ARG 3 end_ARG, with noisy details. λ=503𝜆503\lambda=\frac{50}{3}italic_λ = divide start_ARG 50 end_ARG start_ARG 3 end_ARG estimates surface smoother than λ=103𝜆103\lambda=\frac{10}{3}italic_λ = divide start_ARG 10 end_ARG start_ARG 3 end_ARG, thanks to the good quality of input depth.
Pre λ𝜆\lambdaitalic_λ MAE \downarrow PSNR \uparrow SSIM \uparrow
Easy Hard VHard Easy Hard VHard
no 103103\frac{10}{3}divide start_ARG 10 end_ARG start_ARG 3 end_ARG 1.632 38.057 33.446 31.186 0.966 0.93 0.884
sho 1.449 41.166 35.36 32.999 0.983 0.951 0.913
med 1.378 41.844 36.232 33.35 0.985 0.957 0.918
lon 1.39 41.415 35.645 31.663 0.984 0.947 0.88
med λ=0𝜆0\lambda=0italic_λ = 0 9.432 39.408 34.755 31.870 0.973 0.941 0.900
λ=13𝜆13\lambda=\frac{1}{3}italic_λ = divide start_ARG 1 end_ARG start_ARG 3 end_ARG 1.877 40.894 35.822 33.318 0.982 0.951 0.914
λ=503𝜆503\lambda=\frac{50}{3}italic_λ = divide start_ARG 50 end_ARG start_ARG 3 end_ARG 1.353 41.109 34.915 32.107 0.983 0.949 0.908
Table 5: Training Strategies – Quantitative Evaluation. Pre refers to the various training settings shown in Figure 13, while λ𝜆\lambdaitalic_λ is the parameter that balances the contribution of colour and depth losses in Equation 5. The best and second best performing metrics are in blue and magenta. Premed achieved the best PSNR and SSIM and the second best MAE. λ=503𝜆503\lambda=\frac{50}{3}italic_λ = divide start_ARG 50 end_ARG start_ARG 3 end_ARG ranks the best for MAE, but has worse PSNR and SSIM than Premed. Tests correspond to Dji-A dataset.
Refer to caption
Refer to caption
(a) Renvol𝑅𝑒subscript𝑛𝑣𝑜𝑙Ren_{vol}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_v italic_o italic_l end_POSTSUBSCRIPT novel view
Refer to caption
Refer to caption
(b) Renvol𝑅𝑒subscript𝑛𝑣𝑜𝑙Ren_{vol}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_v italic_o italic_l end_POSTSUBSCRIPT surface
Refer to caption
Refer to caption
(c) Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT novel view
Refer to caption
Refer to caption
(d) Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT surface
Refer to caption
Refer to caption
(e) GT novel view
Refer to caption
Refer to caption
(f) GT surface
Figure 16: Rendering – Visualisations. Renvol𝑅𝑒subscript𝑛𝑣𝑜𝑙Ren_{vol}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_v italic_o italic_l end_POSTSUBSCRIPT rendered novel view with similar quality to Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT, while zoom-in shows blurrier details. The surface qualities of Renvol𝑅𝑒subscript𝑛𝑣𝑜𝑙Ren_{vol}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_v italic_o italic_l end_POSTSUBSCRIPT and Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT are similar with no obvious differences.
MAE\downarrow PSNR\uparrow SSIM\uparrow
Easy Hard VHard Easy Hard VHard
Renvol𝑅𝑒subscript𝑛𝑣𝑜𝑙Ren_{vol}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_v italic_o italic_l end_POSTSUBSCRIPT 1.399 41.743 35.867 31.770 0.985 0.955 0.903
Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT 1.378 41.844 36.232 33.350 0.985 0.957 0.918
Table 6: Rendering – Quantitative Evaluation of Renvol𝑅𝑒subscript𝑛𝑣𝑜𝑙Ren_{vol}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_v italic_o italic_l end_POSTSUBSCRIPT and Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT. Renvol𝑅𝑒subscript𝑛𝑣𝑜𝑙Ren_{vol}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_v italic_o italic_l end_POSTSUBSCRIPT produces new views and surfaces close to Rensur𝑅𝑒subscript𝑛𝑠𝑢𝑟Ren_{sur}italic_R italic_e italic_n start_POSTSUBSCRIPT italic_s italic_u italic_r end_POSTSUBSCRIPT with slightly poorer metrics overall. Tests correspond to Dji-A dataset.

5 Conclusion

We presented BRDF-NeRF, an extension of NeRF adapted to sparse satellite imagery, capable of estimating realistic BRDFs for natural surfaces. By incorporating the semi-empirical Rahman-Pinty-Verstraete (RPV) model, BRDF-NeRF enhances the rendering of anisotropic surface reflectance, leading to improved quality of both synthetic images and recovered surface altitudes. Although our experiments show promising results, certain limitations remain. At present, our method does not explicitly model shading effects, and although our surface reconstructions outperform other NeRF-based approaches, they still lack the regularity achieved by SGM. Future work will address these challenges.

6 Acknowledgements

This research was funded by CNES (Centre national d’études spatiales) through a PostDoc scholarship as well as CAROLInA/SURFACEs projects. The Djibouti dataset was obtained through the CNES ISIS framework. Numerical computations were carried out on the SCAPAD cluster at the Institut de Physique du Globe de Paris. We thank Arthur Delorme for the Lanzhou dataset. We also thank Manchun Lei and Antoine Lucas for fruitful discussions.