Online Proximal ADMM for Graph Learning
from Streaming Smooth Signals thanks: This research was supported by the IEEE SPS ME-UYR program.

Hector Chahuara and Gonzalo Mateos Department of Electrical Engineering, Pontificia Universidad Católica del Perú, Lima, Perú
Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA
Abstract

Graph signal processing deals with algorithms and signal representations that leverage graph structures for multivariate data analysis. Often said graph topology is not readily available and may be time-varying, hence (dynamic) graph structure learning from nodal (e.g., sensor) observations becomes a critical first step. In this paper, we develop a novel algorithm for online graph learning using observation streams, assumed to be smooth on the latent graph. Unlike batch algorithms for topology identification from smooth signals, our modus operandi is to process graph signals sequentially and thus keep memory and computational costs in check. To solve the resulting smoothness-regularized, time-varying inverse problem, we develop online and lightweight iterations built upon the proximal variant of the alternating direction method of multipliers (ADMM), well known for its fast convergence in batch settings. The proximal term in the topology updates seamlessly implements a temporal-variation regularization, and we argue the online procedure exhibits sublinear static regret under some simplifying assumptions. Reproducible experiments with synthetic and real graphs demonstrate the effectiveness of our method in adapting to streaming signals and tracking slowly-varying network connectivity. The proposed approach also exhibits better tracking performance (in terms of suboptimality), when compared to state-of-the-art online graph learning baselines.

Index Terms:
Topology identification, dynamic network tracking, proximal ADMM, smooth signal, regret analysis

I Introduction

Graphs serve as powerful abstractions of the underlying relational structure among data entities (e.g., sensors, brain regions), thus becoming a viable modeling tool for a host of signal processing and machine learning applications. Accordingly, there has been growing interest in describing multivariate complex data, such as social networks, neural activity time series, and urban traffic flows, as graph signals; see e.g. [1]. However, this valuable (latent) graph structure may not be explicitly available, and so it needs to be estimated from nodal data before any meaningful downstream analysis can be conducted [2, 3]. Networked systems (as the examples above) are in constant flux, with increasing size and complexity, so there is a need for efficient topology inference algorithms that can process data streams generated by dynamic networks [4, 5].

Related work. The inverse problem of searching for a graph (encoded by e.g., an adjacency or Laplacia matrix) that in some sense best describes a given multivariate dataset is known as network topology inference or graph structure learning. Optimality criteria and constraints often arise from statistical priors, physical principles, interpretability, or downstream task goals, which result in models that tie the observations to the latent graph. These models can be probabilistic [6, 7, 8], assume graph signals are stationary or generated by network diffusion [9, 10, 11, 12, 13], or else they are smooth w.r.t. to the graph [14, 15, 16, 17, 18]. In this work, we rely on the latter class.

Specifically, graph learning from smooth signals is formulated as a non-smooth strongly convex optimization problem, and several batch solvers have been developed to date. For instance, a scalable primal-dual algorithm was proposed in [15], while alternating direction method of multipliers (ADMM) iterations were put forth in [19]. An accelerated dual-domain proximal gradient algorithm (DPG) was studied in [16], with the first convergence rate results for this problem. The best rates to date are reported in [17], for a proximal ADMM (PADMM) algorithm developed for time-varying graph learning. All these approaches operate in batch mode, they are non-recursive, and hence their computational complexity and memory storage grow linearly with time when used to estimate dynamic network topologies.

Recognizing these challenges, online graph learning methods have been recently developed to process data sequentially-in-time. The online proximal gradient (PG) [20] and DPG [21] approaches can track slowly-varying dynamic graph topologies from streams of smooth signal observations. We would be remiss not to mention other online graph learning methods that rely on assumptions other than signal smoothness. For instance, [22] introduces an online algorithm that uses observations from a Laplacian-based continuous-time graph process, while [23] leverages graph signal stationarity. An encompassing (model-independent) framework that exploits a prediction-correction strategy was proposed in [24].

Proposed approach and contributions. In this paper, our main contribution is to develop a novel online algorithm to track the topology of (potentially dynamic) undirected graphs using streaming smooth signals. Our method builds on PADMM [25], whose provably fast local convergence properties have been well documented in a batch graph learning setting [17]. Our technical contributions are significant for several reasons. The proposed online version of PADMM (henceforth dubbed OPADMM): (i) incorporates proximal terms in the topology updates which effectively serve as temporal-variation regularizers; (ii) yields entrywise separable and lightweight updates, with time-independent computational cost and memory footprint; and (iii) exhibits sublinear static regret under simplifying assumptions. Numerical tests with synthetic and real-world graphs demonstrate that PADMM outperforms state-of-the-art online graph learning baselines for this problem, and can seamlessly track slowly-varying dynamic network topologies. In support of reproducible research, we share the code used to generate all figures in this paper.

II Preliminaries and Problem Formulation

II-A Graph-theoretic notions and problem statement

Consider an unknown undirected graph denoted as 𝒢(𝒱,,𝐖)𝒢𝒱𝐖\mathcal{G}\left(\mathcal{V},\mathcal{E},\mathbf{W}\right)caligraphic_G ( caligraphic_V , caligraphic_E , bold_W ), with vertices 𝒱={1,,n}𝒱1𝑛\mathcal{V}=\left\{1,\dots,n\right\}caligraphic_V = { 1 , … , italic_n }, edges 𝒱×𝒱𝒱𝒱\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}caligraphic_E ⊆ caligraphic_V × caligraphic_V, and the symmetric adjacency matrix 𝐖+n×n𝐖superscriptsubscript𝑛𝑛\mathbf{W}\in\mathbb{R}_{+}^{n\times n}bold_W ∈ blackboard_R start_POSTSUBSCRIPT + end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT that collects the edge weights with 𝐖i,j=0subscript𝐖𝑖𝑗0\mathbf{W}_{i,j}=0bold_W start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = 0 for (i,j)𝑖𝑗(i,j)\notin\mathcal{E}( italic_i , italic_j ) ∉ caligraphic_E. Also, 𝐖i,i=0subscript𝐖𝑖𝑖0\mathbf{W}_{i,i}=0bold_W start_POSTSUBSCRIPT italic_i , italic_i end_POSTSUBSCRIPT = 0, i𝒱for-all𝑖𝒱\forall i\in\mathcal{V}∀ italic_i ∈ caligraphic_V, since we exclude self-loops. The graph Laplacian is 𝐋=diag(𝐝)𝐖𝐋diag𝐝𝐖\mathbf{L}=\textrm{diag}\left(\mathbf{d}\right)-\mathbf{W}bold_L = diag ( bold_d ) - bold_W, where 𝐝=𝐖𝟏n𝐝subscript𝐖𝟏𝑛\mathbf{d}=\mathbf{W}\mathbf{1}_{n}bold_d = bold_W1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT represents the vector of nodal degrees

Along with the graph 𝒢(𝒱,,𝐖)𝒢𝒱𝐖\mathcal{G}\left(\mathcal{V},\mathcal{E},\mathbf{W}\right)caligraphic_G ( caligraphic_V , caligraphic_E , bold_W ), the graph signal observations are denoted as 𝐱n𝐱superscript𝑛\mathbf{x}\in\mathbb{R}^{n}bold_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, where xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the value measured at node i𝒱𝑖𝒱i\in\mathcal{V}italic_i ∈ caligraphic_V. The Dirichlet energy or total variation (TV) quantifies the smoothness of graph signals supported on 𝒢𝒢\mathcal{G}caligraphic_G [26], and is given by

TV(𝐱)𝐱𝐋𝐱=12ijWij(𝐱i𝐱j)2,TV𝐱superscript𝐱top𝐋𝐱12subscript𝑖𝑗subscript𝑊𝑖𝑗superscriptsubscript𝐱𝑖subscript𝐱𝑗2\text{TV}\left(\mathbf{x}\right)\coloneqq\mathbf{x}^{\top}\mathbf{L}\mathbf{x}% =\frac{1}{2}\sum_{i\neq j}W_{ij}\left(\mathbf{x}_{i}-\mathbf{x}_{j}\right)^{2},TV ( bold_x ) ≔ bold_x start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_Lx = divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∑ start_POSTSUBSCRIPT italic_i ≠ italic_j end_POSTSUBSCRIPT italic_W start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ( bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (1)

where TV(𝐱)[0,λmax]TV𝐱0subscript𝜆max\text{TV}\left(\mathbf{x}\right)\in\left[0,\lambda_{\text{max}}\right]TV ( bold_x ) ∈ [ 0 , italic_λ start_POSTSUBSCRIPT max end_POSTSUBSCRIPT ], where λmaxsubscript𝜆max\lambda_{\text{max}}italic_λ start_POSTSUBSCRIPT max end_POSTSUBSCRIPT is the largest eigenvalue of the positive semidefinite (PSD) matrix 𝐋𝐋\mathbf{L}bold_L. We say a graph signal is smooth (or low-pass bandlimited) if it has a small total variation.

We have all the ingredients to state the topology inference problem. Our goal is to learn an undirected graph 𝒢(𝒱,,𝐖)𝒢𝒱𝐖\mathcal{G}\left(\mathcal{V},\mathcal{E},\mathbf{W}\right)caligraphic_G ( caligraphic_V , caligraphic_E , bold_W ) from a stream of graph signal measurements 𝐗{𝐱(k)}𝐗superscript𝐱𝑘\mathbf{X}\coloneqq\{\mathbf{x}^{(k)}\}bold_X ≔ { bold_x start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT } that are smooth on 𝒢𝒢\mathcal{G}caligraphic_G. We also consider tracking dynamic networks with slowly time-varying weight matrix 𝐖(k)superscript𝐖𝑘\mathbf{W}^{(k)}bold_W start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT, k=1,2,𝑘12k=1,2,\dotsitalic_k = 1 , 2 , … (𝒱𝒱\mathcal{V}caligraphic_V remains fixed throughout).

II-B Batch topology identification from smooth signals

Consider a batch of p𝑝pitalic_p graph signals 𝐱(k)superscript𝐱𝑘\mathbf{x}^{(k)}bold_x start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT, k=1,,p𝑘1𝑝k=1,\dots,pitalic_k = 1 , … , italic_p collected as columns of the data matrix 𝐗=[𝐱(1),,𝐱(p)]n×p𝐗superscript𝐱1superscript𝐱𝑝superscript𝑛𝑝\mathbf{X}=[\mathbf{x}^{(1)},\ldots,\mathbf{x}^{(p)}]\in\mathbb{R}^{n\times p}bold_X = [ bold_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , bold_x start_POSTSUPERSCRIPT ( italic_p ) end_POSTSUPERSCRIPT ] ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_p end_POSTSUPERSCRIPT. Then form the vertex pairwise dissimilarity matrix 𝐙+n×n𝐙superscriptsubscript𝑛𝑛\mathbf{Z}\in\mathbb{R}_{+}^{n\times n}bold_Z ∈ blackboard_R start_POSTSUBSCRIPT + end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT, with entries Zij=𝐗i,:𝐗j,:22subscript𝑍𝑖𝑗superscriptsubscriptnormsubscript𝐗𝑖:subscript𝐗𝑗:22Z_{ij}=\left\|\mathbf{X}_{i,:}-\mathbf{X}_{j,:}\right\|_{2}^{2}italic_Z start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = ∥ bold_X start_POSTSUBSCRIPT italic_i , : end_POSTSUBSCRIPT - bold_X start_POSTSUBSCRIPT italic_j , : end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, (i,j)𝒱𝑖𝑗𝒱(i,j)\in\mathcal{V}( italic_i , italic_j ) ∈ caligraphic_V. With these definitions, the aggregate TV measure over 𝐗𝐗\mathbf{X}bold_X can be expressed as

k=1pTV(𝐱(k))=tr(𝐗𝐋𝐗)=12𝐙𝐖1,1,superscriptsubscript𝑘1𝑝TVsuperscript𝐱𝑘trsuperscript𝐗top𝐋𝐗12subscriptnormdirect-product𝐙𝐖11\sum_{k=1}^{p}\text{TV}\left(\mathbf{x}^{(k)}\right)=\text{tr}\left(\mathbf{X}% ^{\top}\mathbf{L}\mathbf{X}\right)=\frac{1}{2}\left\|\mathbf{Z}\odot\mathbf{W}% \right\|_{1,1},∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT TV ( bold_x start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ) = tr ( bold_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_LX ) = divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∥ bold_Z ⊙ bold_W ∥ start_POSTSUBSCRIPT 1 , 1 end_POSTSUBSCRIPT , (2)

where direct-product\odot stands for the Hadamard product and 1,1\|\cdot\|_{1,1}∥ ⋅ ∥ start_POSTSUBSCRIPT 1 , 1 end_POSTSUBSCRIPT denotes the entrywise matrix 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT norm. Identity (2) shows that minimizing the aggregate TV measure w.r.t. 𝐖𝐖\mathbf{W}bold_W boils down to preferentially dropping edges (i,j)𝑖𝑗(i,j)( italic_i , italic_j ) with higher pairwise nodal dissimilarities Zi,jsubscript𝑍𝑖𝑗Z_{i,j}italic_Z start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT, thus sparsifying \mathcal{E}caligraphic_E. This link between signal smoothness and edge sparsity was first recognized in [15], which advocates graph structure learning by solving the following convex inverse problem

min𝐖n×n𝐙𝐖1,1α𝟏nlog(𝐖𝟏n)+β2𝐖F2subject todiag(𝐖)=𝟎n,Wij=Wji0,ij,formulae-sequenceformulae-sequencesubscriptmin𝐖superscript𝑛𝑛subscriptdelimited-∥∥direct-product𝐙𝐖11𝛼superscriptsubscript1𝑛topsubscript𝐖𝟏𝑛𝛽2superscriptsubscriptdelimited-∥∥𝐖𝐹2subject todiag𝐖subscript0𝑛subscript𝑊𝑖𝑗subscript𝑊𝑗𝑖0𝑖𝑗\begin{split}&\operatorname*{min}_{\mathbf{W}\in\mathbb{R}^{n\times n}}\enskip% \left\|\mathbf{Z}\odot\mathbf{W}\right\|_{1,1}-\alpha\mathbf{1}_{n}^{\top}\log% \left(\mathbf{W}\mathbf{1}_{n}\right)+\frac{\beta}{2}\left\|\mathbf{W}\right\|% _{F}^{2}\\ &\text{subject to}\enskip\textrm{diag}\left(\mathbf{W}\right)=\mathbf{0}_{n},W% _{ij}=W_{ji}\geq 0,i\neq j,\end{split}start_ROW start_CELL end_CELL start_CELL roman_min start_POSTSUBSCRIPT bold_W ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ∥ bold_Z ⊙ bold_W ∥ start_POSTSUBSCRIPT 1 , 1 end_POSTSUBSCRIPT - italic_α bold_1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_log ( bold_W1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) + divide start_ARG italic_β end_ARG start_ARG 2 end_ARG ∥ bold_W ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL subject to diag ( bold_W ) = bold_0 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_W start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = italic_W start_POSTSUBSCRIPT italic_j italic_i end_POSTSUBSCRIPT ≥ 0 , italic_i ≠ italic_j , end_CELL end_ROW (3)

where α,β>0𝛼𝛽0\alpha,\beta>0italic_α , italic_β > 0 are tunable regularization hyperparameters. The logarithmic barrier imposed over the nodal degrees 𝐝=𝐖𝟏n𝐝subscript𝐖𝟏𝑛\mathbf{d}=\mathbf{W}\mathbf{1}_{n}bold_d = bold_W1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT excludes isolated (null degree) vertices in the learned graph. The Frobenius-norm regularization on 𝐖𝐖\mathbf{W}bold_W controls edge sparsity via β𝛽\betaitalic_β.

Problem (3) can be further simplified. For instance, the free-decision variables can be arranged into a compact vector 𝐰vec(triu(𝐖))+r𝐰vectriu𝐖superscriptsubscript𝑟\mathbf{w}\coloneqq\textrm{vec}\left(\textrm{triu}\left(\mathbf{W}\right)% \right)\in\mathbb{R}_{+}^{r}bold_w ≔ vec ( triu ( bold_W ) ) ∈ blackboard_R start_POSTSUBSCRIPT + end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT, r=n(n1)2𝑟𝑛𝑛12r=\frac{n(n-1)}{2}italic_r = divide start_ARG italic_n ( italic_n - 1 ) end_ARG start_ARG 2 end_ARG, i.e. the upper-triangular elements of 𝐖𝐖\mathbf{W}bold_W (since it is symmetric and hollow). To impose Wij0subscript𝑊𝑖𝑗0W_{ij}\geq 0italic_W start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ≥ 0, we augment the cost with an indicator function ι𝟎(.)\iota_{\geq\mathbf{0}}\left(.\right)italic_ι start_POSTSUBSCRIPT ≥ bold_0 end_POSTSUBSCRIPT ( . ), which takes the value of zero when its argument is a vector with non-negative entries, otherwise it is equal to ++\infty+ ∞. Given these definitions, one can rewrite (3) as an equivalent unconstrained, non-differentiable problem

argmin𝐰r2𝐳𝐰+β𝐰22+ι𝟎(𝐰)α𝟏nlog(𝐒𝐰),subscriptargmin𝐰superscript𝑟2superscript𝐳top𝐰𝛽superscriptsubscriptnorm𝐰22subscript𝜄absent0𝐰𝛼superscriptsubscript1𝑛top𝐒𝐰\operatorname*{argmin}_{\mathbf{w}\in\mathbb{R}^{r}}\enskip 2\mathbf{z}^{\top}% \mathbf{w}+\beta\left\|\mathbf{w}\right\|_{2}^{2}+\iota_{\geq\mathbf{0}}\left(% \mathbf{w}\right)-\alpha\mathbf{1}_{n}^{\top}\log\left(\mathbf{S}\mathbf{w}% \right),roman_argmin start_POSTSUBSCRIPT bold_w ∈ blackboard_R start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_POSTSUBSCRIPT 2 bold_z start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_w + italic_β ∥ bold_w ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_ι start_POSTSUBSCRIPT ≥ bold_0 end_POSTSUBSCRIPT ( bold_w ) - italic_α bold_1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_log ( bold_Sw ) , (4)

where 𝐒{0,1}n×r𝐒superscript01𝑛𝑟\mathbf{S}\in\left\{0,1\right\}^{n\times r}bold_S ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_n × italic_r end_POSTSUPERSCRIPT maps vectorized edge weights to nodal degrees.

III Proximal ADMM for Graph Learning

Problem (4) has a structure amenable to ADMM-type solvers, as first recognized in [19]. Here we briefly review the batch PADMM iterations [17] that inspired our online algorithm in Section IV.

III-A Applying ADMM to the graph learning problem

Going back to (4), we introduce an auxiliary variable 𝐯n𝐯superscript𝑛\mathbf{v}\in\mathbb{R}^{n}bold_v ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT (a proxy for 𝐝𝐝\mathbf{d}bold_d) and the variable-splitting constraint 𝐯=𝐒𝐰𝐯𝐒𝐰\mathbf{v}=\mathbf{S}\mathbf{w}bold_v = bold_Sw, to yield

argmin𝐰r,𝐯nf(𝐰)+g(𝐯)subject to𝐒𝐰𝐯=𝟎n,subscriptargminformulae-sequence𝐰superscript𝑟𝐯superscript𝑛𝑓𝐰𝑔𝐯subject to𝐒𝐰𝐯subscript0𝑛\operatorname*{argmin}_{\mathbf{w}\in\mathbb{R}^{r},\mathbf{v}\in\mathbb{R}^{n% }}\enskip f\left(\mathbf{w}\right)+g\left(\mathbf{v}\right)\enskip\text{% subject to}\enskip\mathbf{S}\mathbf{w}-\mathbf{v}=\mathbf{0}_{n},roman_argmin start_POSTSUBSCRIPT bold_w ∈ blackboard_R start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , bold_v ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_f ( bold_w ) + italic_g ( bold_v ) subject to bold_Sw - bold_v = bold_0 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , (5)

where the functions f𝑓fitalic_f and g𝑔gitalic_g are defined as

f(𝐰)=𝑓𝐰absent\displaystyle f\left(\mathbf{w}\right)={}italic_f ( bold_w ) = 2𝐳𝐰+β𝐰22+ι𝟎(𝐰)2superscript𝐳top𝐰𝛽superscriptsubscriptnorm𝐰22subscript𝜄absent0𝐰\displaystyle 2\mathbf{z}^{\top}\mathbf{w}+\beta\left\|\mathbf{w}\right\|_{2}^% {2}+\iota_{\geq\mathbf{0}}\left(\mathbf{w}\right)2 bold_z start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_w + italic_β ∥ bold_w ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_ι start_POSTSUBSCRIPT ≥ bold_0 end_POSTSUBSCRIPT ( bold_w )
g(𝐯)=𝑔𝐯absent\displaystyle g\left(\mathbf{v}\right)={}italic_g ( bold_v ) = α𝟏nlog(𝐯).𝛼superscriptsubscript1𝑛top𝐯\displaystyle-\alpha\mathbf{1}_{n}^{\top}\log\left(\mathbf{v}\right).- italic_α bold_1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_log ( bold_v ) .

For ρ>0𝜌0\rho>0italic_ρ > 0, the augmented Lagrangian of the problem (5) is given by

ρ(𝐰,𝐯,𝝀)=f(𝐰)+g(𝐯)+𝝀(𝐒𝐰𝐯)+ρ2𝐒𝐰𝐯22.subscript𝜌𝐰𝐯𝝀𝑓𝐰𝑔𝐯superscript𝝀top𝐒𝐰𝐯𝜌2superscriptsubscriptnorm𝐒𝐰𝐯22\mathcal{L}_{\rho}\left(\mathbf{w},\mathbf{v},\boldsymbol{\lambda}\right)=f% \left(\mathbf{w}\right)+g\left(\mathbf{v}\right)+\boldsymbol{\lambda}^{\top}% \left(\mathbf{S}\mathbf{w}-\mathbf{v}\right)+\frac{\rho}{2}\left\|\mathbf{S}% \mathbf{w}-\mathbf{v}\right\|_{2}^{2}.caligraphic_L start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ( bold_w , bold_v , bold_italic_λ ) = italic_f ( bold_w ) + italic_g ( bold_v ) + bold_italic_λ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_Sw - bold_v ) + divide start_ARG italic_ρ end_ARG start_ARG 2 end_ARG ∥ bold_Sw - bold_v ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .

PADMM generalizes ADMM by incorporating quadratic proximity terms in the primal subproblems [25]. Given PSD matrices 𝐆𝐆\mathbf{G}bold_G and 𝐇𝐇\mathbf{H}bold_H, the k𝑘kitalic_k-th PADMM iteration consists of three updates

𝐰(k+1)=superscript𝐰𝑘1absent\displaystyle\mathbf{w}^{(k+1)}={}bold_w start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT = argmin𝐰rρ(𝐰,𝐯(k),𝝀(k))+𝐰𝐰(k)𝐆/22,subscriptargmin𝐰superscript𝑟subscript𝜌𝐰superscript𝐯𝑘superscript𝝀𝑘superscriptsubscriptnorm𝐰superscript𝐰𝑘𝐆22\displaystyle\operatorname*{argmin}_{\mathbf{w}\in\mathbb{R}^{r}}\enskip% \mathcal{L}_{\rho}(\mathbf{w},\mathbf{v}^{(k)},\boldsymbol{\lambda}^{(k)})+\|% \mathbf{w}-\mathbf{w}^{(k)}\|_{\mathbf{G}/2}^{2},roman_argmin start_POSTSUBSCRIPT bold_w ∈ blackboard_R start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_POSTSUBSCRIPT caligraphic_L start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ( bold_w , bold_v start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT , bold_italic_λ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ) + ∥ bold_w - bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT bold_G / 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (6)
𝐯(k+1)=superscript𝐯𝑘1absent\displaystyle\mathbf{v}^{(k+1)}={}bold_v start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT = argmin𝐯nρ(𝐰(k+1),𝐯,𝝀(k))+𝐯𝐯(k)𝐇/22,subscriptargmin𝐯superscript𝑛subscript𝜌superscript𝐰𝑘1𝐯superscript𝝀𝑘superscriptsubscriptnorm𝐯superscript𝐯𝑘𝐇22\displaystyle\operatorname*{argmin}_{\mathbf{v}\in\mathbb{R}^{n}}\enskip% \mathcal{L}_{\rho}(\mathbf{w}^{(k+1)},\mathbf{v},\boldsymbol{\lambda}^{(k)})+% \|\mathbf{v}-\mathbf{v}^{(k)}\|_{\mathbf{H}/2}^{2},roman_argmin start_POSTSUBSCRIPT bold_v ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT caligraphic_L start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ( bold_w start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT , bold_v , bold_italic_λ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ) + ∥ bold_v - bold_v start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT bold_H / 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (7)
𝝀(k+1)=superscript𝝀𝑘1absent\displaystyle\boldsymbol{\lambda}^{(k+1)}={}bold_italic_λ start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT = 𝝀(k)+ρ(𝐒𝐰(k+1)𝐯(k+1)).superscript𝝀𝑘𝜌superscript𝐒𝐰𝑘1superscript𝐯𝑘1\displaystyle\boldsymbol{\lambda}^{(k)}+\rho(\mathbf{S}\mathbf{w}^{(k+1)}-% \mathbf{v}^{(k+1)}).bold_italic_λ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT + italic_ρ ( bold_Sw start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT - bold_v start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT ) . (8)

Note that when 𝐆=𝟎r×r𝐆subscript0𝑟𝑟\mathbf{G}=\mathbf{0}_{r\times r}bold_G = bold_0 start_POSTSUBSCRIPT italic_r × italic_r end_POSTSUBSCRIPT and 𝐇=𝟎n×n𝐇subscript0𝑛𝑛\mathbf{H}=\mathbf{0}_{n\times n}bold_H = bold_0 start_POSTSUBSCRIPT italic_n × italic_n end_POSTSUBSCRIPT, the proximity terms vanish and the updates (6)-(8) reduce to those of the standard ADMM. Typical choices for these matrices are 𝐆=τ11𝐈rρ𝐒𝐒𝐆superscriptsubscript𝜏11subscript𝐈𝑟𝜌superscript𝐒top𝐒\mathbf{G}=\tau_{1}^{-1}\mathbf{I}_{r}-\rho\mathbf{S}^{\top}\mathbf{S}bold_G = italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_I start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT - italic_ρ bold_S start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_S and 𝐇=(τ21ρ)𝐈n𝐇superscriptsubscript𝜏21𝜌subscript𝐈𝑛\mathbf{H}=(\tau_{2}^{-1}-\rho)\mathbf{I}_{n}bold_H = ( italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT - italic_ρ ) bold_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT. Parameter values 0<τ1<1ρ𝐒220subscript𝜏11𝜌superscriptsubscriptnorm𝐒220<\tau_{1}<\frac{1}{\rho\left\|\mathbf{S}\right\|_{2}^{2}}0 < italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT < divide start_ARG 1 end_ARG start_ARG italic_ρ ∥ bold_S ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG and 0<τ2<1ρ0subscript𝜏21𝜌0<\tau_{2}<\frac{1}{\rho}0 < italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT < divide start_ARG 1 end_ARG start_ARG italic_ρ end_ARG, with 𝐒22=2(n1)superscriptsubscriptnorm𝐒222𝑛1\left\|\mathbf{S}\right\|_{2}^{2}=2(n-1)∥ bold_S ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 2 ( italic_n - 1 ) [16] , guarantee the PSD requirement and convergence of the PADMM algorithm [17].

III-B Efficient proximal updates

The primal updates (6) and (7) can be expressed in terms of proximal operators, as

𝐰(k+1)=proxτ1f(𝐰¯),𝐯(k+1)=proxτ2g(𝐯¯),formulae-sequencesuperscript𝐰𝑘1subscriptproxsubscript𝜏1𝑓¯𝐰superscript𝐯𝑘1subscriptproxsubscript𝜏2𝑔¯𝐯\mathbf{w}^{(k+1)}=\operatorname*{prox}_{\tau_{1}\cdot f}\left(\overline{% \mathbf{w}}\right),\quad\quad\mathbf{v}^{(k+1)}=\operatorname*{prox}_{\tau_{2}% \cdot g}\left(\overline{\mathbf{v}}\right),bold_w start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT = roman_prox start_POSTSUBSCRIPT italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_f end_POSTSUBSCRIPT ( over¯ start_ARG bold_w end_ARG ) , bold_v start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT = roman_prox start_POSTSUBSCRIPT italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⋅ italic_g end_POSTSUBSCRIPT ( over¯ start_ARG bold_v end_ARG ) ,

where the auxiliary variables 𝐰¯¯𝐰\overline{\mathbf{w}}over¯ start_ARG bold_w end_ARG and 𝐯¯¯𝐯\overline{\mathbf{v}}over¯ start_ARG bold_v end_ARG are given by

𝐰¯=¯𝐰absent\displaystyle\overline{\mathbf{w}}={}over¯ start_ARG bold_w end_ARG = 𝐰(k)τ1[ρ𝐒(𝐒𝐰(k)𝐯(k)+𝝀(k)ρ)],superscript𝐰𝑘subscript𝜏1delimited-[]𝜌superscript𝐒topsuperscript𝐒𝐰𝑘superscript𝐯𝑘superscript𝝀𝑘𝜌\displaystyle\mathbf{w}^{(k)}-\tau_{1}\cdot\left[\rho\cdot\mathbf{S}^{\top}% \left(\mathbf{S}\mathbf{w}^{(k)}-\mathbf{v}^{(k)}+\frac{\boldsymbol{\lambda}^{% (k)}}{\rho}\right)\right],bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT - italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ [ italic_ρ ⋅ bold_S start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_Sw start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT - bold_v start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT + divide start_ARG bold_italic_λ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT end_ARG start_ARG italic_ρ end_ARG ) ] ,
𝐯¯=¯𝐯absent\displaystyle\overline{\mathbf{v}}={}over¯ start_ARG bold_v end_ARG = 𝐯(k)τ2ρ(𝐒𝐰(k+1)𝐯(k)𝝀(k)ρ).superscript𝐯𝑘subscript𝜏2𝜌superscript𝐒𝐰𝑘1superscript𝐯𝑘superscript𝝀𝑘𝜌\displaystyle\mathbf{v}^{(k)}-\tau_{2}\cdot\rho\left(\mathbf{S}\mathbf{w}^{(k+% 1)}-\mathbf{v}^{(k)}-\frac{\boldsymbol{\lambda}^{(k)}}{\rho}\right).bold_v start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT - italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⋅ italic_ρ ( bold_Sw start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT - bold_v start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT - divide start_ARG bold_italic_λ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT end_ARG start_ARG italic_ρ end_ARG ) .

To arrive at the custom PADMM algorithm for graph learning in [17], we recognize both proximal operators admit simple expressions

proxτf(𝐰)=subscriptprox𝜏𝑓𝐰absent\displaystyle\operatorname*{prox}_{\tau\cdot f}\left(\mathbf{w}\right)={}roman_prox start_POSTSUBSCRIPT italic_τ ⋅ italic_f end_POSTSUBSCRIPT ( bold_w ) = max(𝐰2τ𝐳2τβ+1,𝟎r),𝐰2𝜏𝐳2𝜏𝛽1subscript0𝑟\displaystyle\max\left(\frac{\mathbf{w}-2\tau\mathbf{z}}{2\tau\beta+1},\mathbf% {0}_{r}\right),roman_max ( divide start_ARG bold_w - 2 italic_τ bold_z end_ARG start_ARG 2 italic_τ italic_β + 1 end_ARG , bold_0 start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) ,
proxτg(𝐯)=subscriptprox𝜏𝑔𝐯absent\displaystyle\operatorname*{prox}_{\tau\cdot g}\left(\mathbf{v}\right)={}roman_prox start_POSTSUBSCRIPT italic_τ ⋅ italic_g end_POSTSUBSCRIPT ( bold_v ) = 𝐯+𝐯2+4τα𝟏n2.𝐯superscript𝐯24𝜏𝛼subscript1𝑛2\displaystyle\frac{\mathbf{v}+\sqrt{\mathbf{v}^{2}+4\tau\alpha\mathbf{1}_{n}}}% {2}.divide start_ARG bold_v + square-root start_ARG bold_v start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 4 italic_τ italic_α bold_1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG end_ARG start_ARG 2 end_ARG .

which act entrywise on their vector arguments. All in all, the PADMM updates at the k𝑘kitalic_k-th iteration become

𝐰(k+1)=superscript𝐰𝑘1absent\displaystyle\mathbf{w}^{(k+1)}={}bold_w start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT = max(𝐰¯2τ1𝐳2τ1β+1,𝟎r),¯𝐰2subscript𝜏1𝐳2subscript𝜏1𝛽1subscript0𝑟\displaystyle\max\left(\frac{\overline{\mathbf{w}}-2\tau_{1}\mathbf{z}}{2\tau_% {1}\beta+1},\mathbf{0}_{r}\right),roman_max ( divide start_ARG over¯ start_ARG bold_w end_ARG - 2 italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT bold_z end_ARG start_ARG 2 italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_β + 1 end_ARG , bold_0 start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) , (9)
𝐯(k+1)=superscript𝐯𝑘1absent\displaystyle\mathbf{v}^{(k+1)}={}bold_v start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT = 12(𝐯¯+𝐯¯2+4τ2α𝟏n),12¯𝐯superscript¯𝐯24subscript𝜏2𝛼subscript1𝑛\displaystyle\frac{1}{2}\left(\overline{\mathbf{v}}+\sqrt{\overline{\mathbf{v}% }^{2}+4\tau_{2}\alpha\mathbf{1}_{n}}\right),divide start_ARG 1 end_ARG start_ARG 2 end_ARG ( over¯ start_ARG bold_v end_ARG + square-root start_ARG over¯ start_ARG bold_v end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 4 italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_α bold_1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG ) , (10)
𝝀(k+1)=superscript𝝀𝑘1absent\displaystyle\boldsymbol{\lambda}^{(k+1)}={}bold_italic_λ start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT = 𝝀(k)+ρ(𝐒𝐰(k+1)𝐯(k+1)).superscript𝝀𝑘𝜌superscript𝐒𝐰𝑘1superscript𝐯𝑘1\displaystyle\boldsymbol{\lambda}^{(k)}+\rho\left(\mathbf{S}\mathbf{w}^{(k+1)}% -\mathbf{v}^{(k+1)}\right).bold_italic_λ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT + italic_ρ ( bold_Sw start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT - bold_v start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT ) . (11)

PADMM enjoys a global convergence rate of 𝒪(1k)𝒪1𝑘\mathcal{O}(\frac{1}{k})caligraphic_O ( divide start_ARG 1 end_ARG start_ARG italic_k end_ARG ), and it has recently been shown to exhibit local convergence properties with a significantly faster rate of 𝒪(μk)𝒪superscript𝜇𝑘\mathcal{O}(\mu^{k})caligraphic_O ( italic_μ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ), 0<μ<10𝜇10<\mu<10 < italic_μ < 1 [27]; see also [17].

IV Online Graph Learning from Streaming Signals

IV-A Online PADMM algorithm

PADMM’s favorable convergence properties in the batch topology inference setting discussed so far, motivates well its adoption for online learning. Online estimation of 𝐖𝐖\mathbf{W}bold_W (or even tracking 𝐖(k)superscript𝐖𝑘\mathbf{W}^{(k)}bold_W start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT in dynamic settings) using streaming signals {𝐱(1),,𝐱(k),𝐱(k+1),}superscript𝐱1superscript𝐱𝑘superscript𝐱𝑘1\{\mathbf{x}^{(1)},\dots,\mathbf{x}^{(k)},\mathbf{x}^{(k+1)},\dots\}{ bold_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , bold_x start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT , … } can be cast as the following time-varying optimization problem

𝐰^(k)argmin𝐰n2𝐳1:k𝐰+β𝐰22+ι𝟎(𝐰)f(k)(𝐰)α𝟏nlog(𝐒𝐰)g(𝐒𝐰).superscript^𝐰𝑘subscriptargmin𝐰superscript𝑛superscript2superscriptsubscript𝐳:1𝑘top𝐰𝛽superscriptsubscriptnorm𝐰22subscript𝜄absent0𝐰superscript𝑓𝑘𝐰subscript𝛼superscriptsubscript1𝑛top𝐒𝐰𝑔𝐒𝐰\hat{\mathbf{w}}^{(k)}\in\operatorname*{argmin}_{\mathbf{w}\in\mathbb{R}^{n}}% \enskip\overbrace{2\mathbf{z}_{1:k}^{\top}\mathbf{w}+\beta\left\|\mathbf{w}% \right\|_{2}^{2}+\iota_{\geq\mathbf{0}}\left(\mathbf{w}\right)}^{f^{(k)}\left(% \mathbf{w}\right)}\underbrace{-\alpha\mathbf{1}_{n}^{\top}\log\left(\mathbf{S}% \mathbf{w}\right)}_{g\left(\mathbf{S}\mathbf{w}\right)}.over^ start_ARG bold_w end_ARG start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ∈ roman_argmin start_POSTSUBSCRIPT bold_w ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT over⏞ start_ARG 2 bold_z start_POSTSUBSCRIPT 1 : italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_w + italic_β ∥ bold_w ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_ι start_POSTSUBSCRIPT ≥ bold_0 end_POSTSUBSCRIPT ( bold_w ) end_ARG start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ( bold_w ) end_POSTSUPERSCRIPT under⏟ start_ARG - italic_α bold_1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_log ( bold_Sw ) end_ARG start_POSTSUBSCRIPT italic_g ( bold_Sw ) end_POSTSUBSCRIPT . (12)

Notice that the non-smooth, strongly convex component f(k)superscript𝑓𝑘f^{(k)}italic_f start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT is time-varying because of the vectorized dissimilarity matrix 𝐳1:ksubscript𝐳:1𝑘\mathbf{z}_{1:k}bold_z start_POSTSUBSCRIPT 1 : italic_k end_POSTSUBSCRIPT, which is formed using all signals {𝐱(k)}superscript𝐱𝑘\{\mathbf{x}^{(k)}\}{ bold_x start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT } acquired by time k𝑘kitalic_k; see also (13).

While (12) can be solved by sequentially running a batch algorithm every time a new graph signal arrives, such procedure would incur in high delay and computational burden – an unsatisfactory solution for delay-sensitive tasks. Furthermore, searching for a high-precision solution in dynamic network settings may not be worth the effort, since the new estimate 𝐰^(k+1)superscript^𝐰𝑘1\hat{\mathbf{w}}^{(k+1)}over^ start_ARG bold_w end_ARG start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT may substantially deviate from the prior estimate 𝐰^(k)superscript^𝐰𝑘\hat{\mathbf{w}}^{(k)}over^ start_ARG bold_w end_ARG start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT. Accordingly, we propose an online (recursive) algorithm to track the solution of the time-varying problem (12), which builds on the PADMM foundations in Section III.

Our online method consists on two steps per time instant k=1,2,𝑘12k=1,2,\ldotsitalic_k = 1 , 2 , …. First, we recursively update the upper-triangular entries of the Euclidean distance matrix 𝐳1:ksubscript𝐳:1𝑘\mathbf{z}_{1:k}bold_z start_POSTSUBSCRIPT 1 : italic_k end_POSTSUBSCRIPT, once a new datum 𝐱(k)superscript𝐱𝑘\mathbf{x}^{(k)}bold_x start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT is acquired at time k𝑘kitalic_k. Specifically, given 𝐱(k)superscript𝐱𝑘\mathbf{x}^{(k)}bold_x start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT we form the vectorized dissimilarity matrix 𝐳¯(k)superscript¯𝐳𝑘\overline{\mathbf{z}}^{(k)}over¯ start_ARG bold_z end_ARG start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT and update the running average

𝐳1:k=(1γ(k))𝐳1:k1+γ(k)𝐳¯(k),subscript𝐳:1𝑘1superscript𝛾𝑘subscript𝐳:1𝑘1superscript𝛾𝑘superscript¯𝐳𝑘\mathbf{z}_{1:k}=\left(1-\gamma^{(k)}\right)\mathbf{z}_{1:k-1}+\gamma^{(k)}% \overline{\mathbf{z}}^{(k)},bold_z start_POSTSUBSCRIPT 1 : italic_k end_POSTSUBSCRIPT = ( 1 - italic_γ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ) bold_z start_POSTSUBSCRIPT 1 : italic_k - 1 end_POSTSUBSCRIPT + italic_γ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT over¯ start_ARG bold_z end_ARG start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT , (13)

where γ(k)(0,1)superscript𝛾𝑘01\gamma^{(k)}\in(0,1)italic_γ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ∈ ( 0 , 1 ) is a forgetting factor. In stationary settings (the graph is static), we choose γ(k)=1ksuperscript𝛾𝑘1𝑘\gamma^{(k)}=\frac{1}{k}italic_γ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_k end_ARG, which renders (13) an infinite-memory running sample average. In dynamic environments we select γ(k)=2×103superscript𝛾𝑘2superscript103\gamma^{(k)}=2\times 10^{-3}italic_γ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT = 2 × 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT, so that (13) becomes an exponentially-weighted moving average with the ability to track topology changes. In the second step, our online algorithm updates the primal and dual vectors 𝐰(k)superscript𝐰𝑘\mathbf{w}^{(k)}bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT, 𝐯(k)superscript𝐯𝑘\mathbf{v}^{(k)}bold_v start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT and 𝝀(k)superscript𝝀𝑘\boldsymbol{\lambda}^{(k)}bold_italic_λ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT by running a single iteration of the batch PADMM updates (9), (10), (11), respectively; but employing 𝐳1:ksubscript𝐳:1𝑘\mathbf{z}_{1:k}bold_z start_POSTSUBSCRIPT 1 : italic_k end_POSTSUBSCRIPT instead of 𝐳𝐳\mathbf{z}bold_z in (9) to reflect the online nature of our method. The resulting topology estimate at time k𝑘kitalic_k is stored in 𝐰(k)superscript𝐰𝑘\mathbf{w}^{(k)}bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT, and the overall online (O)PADMM procedure is tabulated under Algorithm 1.

Input: Edge weight-to-node degree map 𝐒𝐒\mathbf{S}bold_S; new dissimilarity information 𝐳¯(k)superscript¯𝐳𝑘\overline{\mathbf{z}}^{(k)}over¯ start_ARG bold_z end_ARG start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT at time k𝑘kitalic_k ; regularization hyperparameters α𝛼\alphaitalic_α and β𝛽\betaitalic_β; PADMM hyperparameters ρ𝜌\rhoitalic_ρ, τ1subscript𝜏1\tau_{1}italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and τ2subscript𝜏2\tau_{2}italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT; forgetting factor γ𝛾\gammaitalic_γ; initial values 𝐰(0)superscript𝐰0\mathbf{w}^{(0)}bold_w start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT, 𝐯(0)superscript𝐯0\mathbf{v}^{(0)}bold_v start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT and 𝝀(0)superscript𝝀0\boldsymbol{\lambda}^{(0)}bold_italic_λ start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT
Output: Tracking solution available at the k𝑘kitalic_k-th iteration 𝐰(k)superscript𝐰𝑘\mathbf{w}^{(k)}bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT
for k=1,2,𝑘12italic-…k=1,2,\dotsitalic_k = 1 , 2 , italic_… do
      Updateγ(k)Updatesuperscript𝛾𝑘\text{Update}\enskip\gamma^{(k)}Update italic_γ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT
      𝐳1:k(1γ(k))𝐳1:k1+γ(k)𝐳¯(k)subscript𝐳:1𝑘1superscript𝛾𝑘subscript𝐳:1𝑘1superscript𝛾𝑘superscript¯𝐳𝑘\mathbf{z}_{1:k}\leftarrow\left(1-\gamma^{(k)}\right)\mathbf{z}_{1:k-1}+\gamma% ^{(k)}\overline{\mathbf{z}}^{(k)}bold_z start_POSTSUBSCRIPT 1 : italic_k end_POSTSUBSCRIPT ← ( 1 - italic_γ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ) bold_z start_POSTSUBSCRIPT 1 : italic_k - 1 end_POSTSUBSCRIPT + italic_γ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT over¯ start_ARG bold_z end_ARG start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT
      𝐰¯𝐰(k1)τ1ρ𝐒(𝐒𝐰(k1)𝐯(k1)+𝝀(k1)ρ)¯𝐰superscript𝐰𝑘1subscript𝜏1𝜌superscript𝐒topsuperscript𝐒𝐰𝑘1superscript𝐯𝑘1superscript𝝀𝑘1𝜌\overline{\mathbf{w}}\leftarrow\mathbf{w}^{(k-1)}-\tau_{1}\rho\mathbf{S}^{\top% }\left(\mathbf{S}\mathbf{w}^{(k-1)}-\mathbf{v}^{(k-1)}+\frac{\boldsymbol{% \lambda}^{(k-1)}}{\rho}\right)over¯ start_ARG bold_w end_ARG ← bold_w start_POSTSUPERSCRIPT ( italic_k - 1 ) end_POSTSUPERSCRIPT - italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_ρ bold_S start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_Sw start_POSTSUPERSCRIPT ( italic_k - 1 ) end_POSTSUPERSCRIPT - bold_v start_POSTSUPERSCRIPT ( italic_k - 1 ) end_POSTSUPERSCRIPT + divide start_ARG bold_italic_λ start_POSTSUPERSCRIPT ( italic_k - 1 ) end_POSTSUPERSCRIPT end_ARG start_ARG italic_ρ end_ARG );
      𝐰(k)12τ1β+1max(𝐰¯2τ1𝐳1:k,𝟎r)superscript𝐰𝑘12subscript𝜏1𝛽1¯𝐰2subscript𝜏1subscript𝐳:1𝑘subscript0𝑟\mathbf{w}^{(k)}\leftarrow\frac{1}{2\tau_{1}\beta+1}\max\left(\overline{% \mathbf{w}}-2\tau_{1}\mathbf{z}_{1:k},\mathbf{0}_{r}\right)bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ← divide start_ARG 1 end_ARG start_ARG 2 italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_β + 1 end_ARG roman_max ( over¯ start_ARG bold_w end_ARG - 2 italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT bold_z start_POSTSUBSCRIPT 1 : italic_k end_POSTSUBSCRIPT , bold_0 start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT );
      𝐯¯(1+ρτ2)𝐯(k1)ρτ2𝐒𝐰(k)+τ2𝝀(k1)¯𝐯1𝜌subscript𝜏2superscript𝐯𝑘1𝜌subscript𝜏2superscript𝐒𝐰𝑘subscript𝜏2superscript𝝀𝑘1\overline{\mathbf{v}}\leftarrow\left(1+\rho\tau_{2}\right)\mathbf{v}^{(k-1)}-% \rho\tau_{2}\mathbf{S}\mathbf{w}^{(k)}+\tau_{2}\boldsymbol{\lambda}^{(k-1)}over¯ start_ARG bold_v end_ARG ← ( 1 + italic_ρ italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) bold_v start_POSTSUPERSCRIPT ( italic_k - 1 ) end_POSTSUPERSCRIPT - italic_ρ italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT bold_Sw start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT + italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT bold_italic_λ start_POSTSUPERSCRIPT ( italic_k - 1 ) end_POSTSUPERSCRIPT;
      𝐯(k)12(𝐯¯+𝐯¯2+4τ2α𝟏n)superscript𝐯𝑘12¯𝐯superscript¯𝐯24subscript𝜏2𝛼subscript1𝑛\mathbf{v}^{(k)}\leftarrow\frac{1}{2}\left(\overline{\mathbf{v}}+\sqrt{% \overline{\mathbf{v}}^{2}+4\tau_{2}\alpha\mathbf{1}_{n}}\right)bold_v start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ← divide start_ARG 1 end_ARG start_ARG 2 end_ARG ( over¯ start_ARG bold_v end_ARG + square-root start_ARG over¯ start_ARG bold_v end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 4 italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_α bold_1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG );
      𝝀(k)𝝀(k1)+ρ(𝐒𝐰(k)𝐯(k))superscript𝝀𝑘superscript𝝀𝑘1𝜌superscript𝐒𝐰𝑘superscript𝐯𝑘\boldsymbol{\lambda}^{(k)}\leftarrow\boldsymbol{\lambda}^{(k-1)}+\rho\left(% \mathbf{S}\mathbf{w}^{(k)}-\mathbf{v}^{(k)}\right)bold_italic_λ start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ← bold_italic_λ start_POSTSUPERSCRIPT ( italic_k - 1 ) end_POSTSUPERSCRIPT + italic_ρ ( bold_Sw start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT - bold_v start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT )
      end for
Algorithm 1 OPADMM for graph learning

IV-B Discussion, computational complexity and regret analyses

It is worth noting that matrix-vector multiplications with 𝐒𝐒\mathbf{S}bold_S effectively boil down to tabular searches and additions of the vector entries. All other operations in Algorithm 1 are pointwise, so the computational cost per iteration is 𝒪(r)𝒪𝑟\mathcal{O}\left(r\right)caligraphic_O ( italic_r ) (recall r=n(n1)2𝑟𝑛𝑛12r=\frac{n(n-1)}{2}italic_r = divide start_ARG italic_n ( italic_n - 1 ) end_ARG start_ARG 2 end_ARG). This is on par with online PG [20] and DPG [21], and less expensive than the prediction-correction algorithm in [24]. Unlike batch algorithms for time-varying graph learning [17, 28], OPADMM’s memory storage and computational cost do not depend on the length of the data acquisition horizon. These batch algorithms augment the objective function (3) with a temporal-variation regularization, to enforce similarity between consecutive adjacency matrix estimates. Interestingly, this feature is seamlessly embedded in OPADMM by virtue of the proximity term in (6) – a significant improvement over the state-of-the-art online DPG algorithm.

To obtain performance guarantees in terms of static regret, we will henceforth assume that nodal degrees are in a compact set that is uniformly bounded away from zero (meaning dmin𝟏n𝐝dmax𝟏nprecedes-or-equalssubscript𝑑subscript1𝑛𝐝precedes-or-equalssubscript𝑑subscript1𝑛d_{\min}\mathbf{1}_{n}\preceq\mathbf{d}\preceq d_{\max}\mathbf{1}_{n}italic_d start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT bold_1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ⪯ bold_d ⪯ italic_d start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT bold_1 start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, for some 0<dmindmax0subscript𝑑subscript𝑑0<d_{\min}\leq d_{\max}0 < italic_d start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT ≤ italic_d start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT). This is a loose requirement that ensures the subgradient of the objective has bounded norm. In practice if degrees become arbitrarily small, it is prudent to apply a threshold and remove the loosely connected nodes from 𝒢(𝒱,,𝐖)𝒢𝒱𝐖\mathcal{G}(\mathcal{V},\mathcal{E},\mathbf{W})caligraphic_G ( caligraphic_V , caligraphic_E , bold_W ). With this extra requirement one can show that for τ2=ρsubscript𝜏2𝜌\tau_{2}=\rhoitalic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = italic_ρ so that 𝐇=𝟎n×n𝐇subscript0𝑛𝑛\mathbf{H}=\mathbf{0}_{n\times n}bold_H = bold_0 start_POSTSUBSCRIPT italic_n × italic_n end_POSTSUBSCRIPT, [29, Theorem 3] ensures that OPADMM attains a sublinear static regret, namely

k=1pf(k)(𝐰(k))+g(𝐯(k))min𝐒𝐰=𝐯k=1pf(k)(𝐰)+g(𝐯)𝒪(p).superscriptsubscript𝑘1𝑝superscript𝑓𝑘superscript𝐰𝑘𝑔superscript𝐯𝑘subscript𝐒𝐰𝐯superscriptsubscript𝑘1𝑝superscript𝑓𝑘𝐰𝑔𝐯𝒪𝑝\sum_{k=1}^{p}f^{(k)}(\mathbf{w}^{(k)})+g(\mathbf{v}^{(k)})-\min_{\mathbf{S}% \mathbf{w}=\mathbf{v}}\sum_{k=1}^{p}f^{(k)}(\mathbf{w})+g(\mathbf{v})\leq% \mathcal{O}(\sqrt{p}).∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ( bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ) + italic_g ( bold_v start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ) - roman_min start_POSTSUBSCRIPT bold_Sw = bold_v end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ( bold_w ) + italic_g ( bold_v ) ≤ caligraphic_O ( square-root start_ARG italic_p end_ARG ) .

The extension to the case where 𝐇𝟎n×n𝐇subscript0𝑛𝑛\mathbf{H}\neq\mathbf{0}_{n\times n}bold_H ≠ bold_0 start_POSTSUBSCRIPT italic_n × italic_n end_POSTSUBSCRIPT is left as future work.

Refer to caption
(a) Gaussian (stationary)
Refer to caption
(b) Erdös-Rényi (stationary)
Refer to caption
(c) Preferential attachment (stationary)
Refer to caption
(d) Gaussian (dynamic)
Refer to caption
(e) Erdös-Rényi (dynamic)
Refer to caption
(f) Preferential attachment (dynamic)
Figure 1: Convergence behavior illustrated via the evolution of suboptimality 𝐰(k)𝐰^2subscriptnormsuperscript𝐰𝑘^𝐰2\|\mathbf{w}^{(k)}-\hat{\mathbf{w}}\|_{2}∥ bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT - over^ start_ARG bold_w end_ARG ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT for synthetic random graph models with n=100𝑛100n=100italic_n = 100 nodes: (a)-(c) stationary graphs, and (d)-(f) time-varying graphs. For the dynamic networks, the topology changes at k=1000𝑘1000k=1000italic_k = 1000. In stationary settings, our proposed method converges faster than online PG and DPG algorithms, while in dynamic settings, it demonstrates superior adaptability to changes in network topology.
Refer to caption
(a) mesh1e1
Refer to caption
(b) bcspwr03
Refer to caption
(c) lshp_265
Figure 2: Convergence behavior illustrated via the evolution of suboptimality 𝐰(k)𝐰^2subscriptnormsuperscript𝐰𝑘^𝐰2\|\mathbf{w}^{(k)}-\hat{\mathbf{w}}\|_{2}∥ bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT - over^ start_ARG bold_w end_ARG ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT for real-world graphs from the datasets [30]. Our proposed method converges faster than online PG and DPG algorithms in all cases.

V Experimental Evaluation

Here we first test OPADMM using various graphs ensembles that were synthetically generated using different models. We evaluate the convergence of our proposed method by montoring the suboptimality (tracking error) 𝐰(k)𝐰^2subscriptnormsuperscript𝐰𝑘^𝐰2\|\mathbf{w}^{(k)}-\hat{\mathbf{w}}\|_{2}∥ bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT - over^ start_ARG bold_w end_ARG ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, where 𝐰(k)superscript𝐰𝑘\mathbf{w}^{(k)}bold_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT is the solution produced by Algorithm 1 at the k𝑘kitalic_k-th iteration, and 𝐰^^𝐰\hat{\mathbf{w}}over^ start_ARG bold_w end_ARG corresponds to the solution of the batch method using the values of hyperparameters α𝛼\alphaitalic_α and β𝛽\betaitalic_β that are the best in a grid search evaluation using the method in [15]. Finally, we perform a grid search for the hyperparameters ρ𝜌\rhoitalic_ρ, τ1subscript𝜏1\tau_{1}italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and τ2subscript𝜏2\tau_{2}italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT of OPADMM and compare the best case with the state-of-the-art-methods online PG and DPG. We also use real world datasets to validate our proposed method in a similar fashion to the tests with synthetic graphs. The code to generate all figures in this work is available in a public GitHub repository at https://github.com/hchahuara/ogl.

V-A Synthetic graphs

Stationary graphs. A set of p=1000𝑝1000p=1000italic_p = 1000 smooth graph signals, each one with n=100𝑛100n=100italic_n = 100 entries, were synthetically generated and corrupted with Gaussian noise (μ=0𝜇0\mu=0italic_μ = 0, σ2=0.01superscript𝜎20.01\sigma^{2}=0.01italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 0.01). Graphs realizations were drawn from three random graph models: random geometric with Gaussian kernel, Erdös-Rényi (ER), and preferential attachment (PA). Sparse connections were achieved with threshold 0.80.80.80.8 and scale 0.20.20.20.2 for the Gaussian model, edge probability 0.10.10.10.1 for the ER model, and 2222 connected nodes initially and then adding new nodes one at a time for the PA model. Numerical results for the aforementioned test cases are depicted in Figure 1 (a)-(c). For stationary graphs, OPADMM outperforms online PG and DPG in terms of convergence speed.

Time-varying graphs. We generated another set of p=2000𝑝2000p=2000italic_p = 2000 smooth graph signals, each with n=100𝑛100n=100italic_n = 100 elements, and similarly corrupted them with Gaussian noise (μ=0𝜇0\mu=0italic_μ = 0, σ2=0.01superscript𝜎20.01\sigma^{2}=0.01italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 0.01). These graph signals were designed to be smooth on time-varying (piecewise-stationary) graphs, with 10%percent1010\%10 % of the edges resampled after 1000 samples. Dynamic graphs were generated using the models with the same parameter values used for stationary graphs. The results are shown in Figure 1 (d)-(f), from where it is apparent that OPADMM outperforms PG in convergence. While online DPG initially tracks the solution marginally faster, ultimately, OPADMM adapts better to the network changes and ends up converging faster than DPG.

V-B Real world graphs

We also test the performance of our proposed algorithm using three datasets from the SuiteSparse Matrix Collection [30]. In particular, we use the structural engineering data mesh1e1 (n=48𝑛48n=48italic_n = 48 nodes), power network data bcspwr03 (n=118𝑛118n=118italic_n = 118 nodes), and thermal network data lshp_265 (n=265𝑛265n=265italic_n = 265 nodes). All experiments with these real graphs have been carried out by generating p=1000𝑝1000p=1000italic_p = 1000 synthetic smooth signals corrupted with Gaussian noise (μ=0𝜇0\mu=0italic_μ = 0, σ2=0.01superscript𝜎20.01\sigma^{2}=0.01italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 0.01). We report these results in Figure 2. It can be observed that OPADMM clearly yields a faster convergence behavior in comparison with online PG and DPG in all the performed tests. Our findings are thus fairly robust.

VI Conclusions

In this paper, we have developed an efficient online optimization method for graph learning. Leveraging the local linear convergence properties of the PADMM and the structure of the network topology inference problem, we derive an alternating minimization algorithm (termed OPADMM) to minimize a time-varying cost in an online fashion. The proximity term in the topology updates implements a temporal-variation regularization, and we show OPADMM exhibits sublinear static regret under simplifying assumptions. Computational experiments demonstrate the effectiveness of OPADMM in stationary or dynamic settings, and that it outperforms state-of-the-art online algorithms for synthetically generated instances. Furthermore, the proposed approach exhibits robust performance across a broad range of graphs, as evidenced via tests using real world datasets.

References

  • [1] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges, and applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, 2018.
  • [2] G. Mateos, S. Segarra, A. G. Marques, and A. Ribeiro, “Connecting the dots: Identifying network structure via graph signal processing,” IEEE Signal Process. Mag., vol. 36, no. 3, pp. 16–43, 2019.
  • [3] X. Dong, D. Thanou, M. Rabbat, and P. Frossard, “Learning graphs from data: A signal representation perspective,” IEEE Signal Process. Mag., vol. 36, no. 3, pp. 44–63, 2019.
  • [4] G. B. Giannakis, Y. Shen, and G. V. Karanikolas, “Topology identification and learning over graphs: Accounting for nonlinearities and dynamics,” vol. 106, no. 5, pp. 787–807, 2018.
  • [5] E. Dall’Anese, A. Simonetto, S. Becker, and L. Madden, “Optimization and learning with information streams: Time-varying algorithms and applications,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 71–83, 2020.
  • [6] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical lasso,” Biostatistics, vol. 9, no. 3, pp. 432–441, 2008.
  • [7] E. Pavez, H. E. Egilmez, and A. Ortega, “Learning graphs with monotone topology properties and multiple connected components,” IEEE Trans. Signal Process., vol. 66, no. 9, 2018.
  • [8] J. V. de Miranda Cardoso, J. Ying, and D. Palomar, “Graphical models in heavy-tailed markets,” in Proc. Adv. Neural. Inf. Process. Syst., 2021, pp. 1–13.
  • [9] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, “Network topology inference from spectral templates,” IEEE Trans. Signal Inf. Process. Netw., vol. 3, no. 3, pp. 467–483, 2017.
  • [10] B. Pasdeloup, V. Gripon, G. Mercier, D. Pastor, and M. G. Rabbat, “Characterization and inference of graph diffusion processes from observations of stationary signals,” IEEE Trans. Signal Inf. Process. Netw., vol. 4, no. 3, pp. 481–496, 2018.
  • [11] D. Thanou, X. Dong, D. Kressner, and P. Frossard, “Learning heat diffusion graphs,” IEEE Trans. Signal Inf. Process. Netw., vol. 3, no. 3, pp. 484–499, 2017.
  • [12] M. Wasserman, S. Sihag, G. Mateos, and A. Ribeiro, “Learning graph structure from convolutional mixtures,” Trans. Mach. Learn. Res., pp. 1–29, 2023.
  • [13] M. Navarro, S. Rey, A. Buciulea, A. G. Marques, and S. Segarra, “Joint network topology inference in the presence of hidden nodes,” IEEE Trans. Signal Process., vol. 72, pp. 2710–2725, 2024.
  • [14] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, “Learning Laplacian matrix in smooth graph signal representations,” IEEE Trans. Signal Process., vol. 64, no. 23, pp. 6160–6173, 2016.
  • [15] V. Kalofolias, “How to learn a graph from smooth signals,” in Proc. Int. Conf. Artif. Intell. Statist., 2016, pp. 920–929.
  • [16] S. S. Saboksayr and G. Mateos, “Accelerated graph learning from smooth signals,” IEEE Signal Process. Lett., vol. 28, pp. 2192–2196, 2021.
  • [17] X. Wang, C. Yao, and A. M.-C. So, “A linearly convergent optimization framework for learning graphs from smooth signals,” IEEE Trans. Signal Inf. Process. Netw., vol. 9, pp. 490–504, 2023.
  • [18] M. Wasserman and G. Mateos, “Graph structure learning with interpretable Bayesian neural networks,” Trans. Mach. Learn. Res., pp. 1–27, 2024.
  • [19] X. Wang, C. Yao, H. Lei, and A. M.-C. So, “An efficient alternating direction method for graph learning from smooth signals,” in Proc. Int. Conf. Acoustics, Speech, Signal Process., 2021, pp. 5380–5384.
  • [20] S. S. Saboksayr, G. Mateos, and M. Cetin, “Online graph learning under smoothness priors,” in Proc. of European Signal Process. Conf., 2021, pp. 1820–1824.
  • [21] S. S. Saboksayr and G. Mateos, “Dual-based online learning of dynamic network topologies,” in Proc. Int. Conf. Acoustics, Speech, Signal Process., 2023, pp. 1–5.
  • [22] S. Vlaski, H. P. Maretić, R. Nassif, P. Frossard, and A. H. Sayed, “Online graph learning from sequential data,” in Proc. IEEE Data Science Workshop, 2018, pp. 190–194.
  • [23] R. Shafipour and G. Mateos, “Online topology inference from streaming stationary graph signals with partial connectivity information,” Algorithms, vol. 13, no. 9, 2020.
  • [24] A. Natali, E. Isufi, M. Coutino, and G. Leus, “Learning time-varying graphs from online data,” IEEE Open J. Signal Process., vol. 3, pp. 212–228, 2022.
  • [25] A. Beck, First-Order Methods in Optimization.   Philadelphia, PA: Society for Industrial and Applied Mathematics, 2017.
  • [26] D. Zhou and B. Schölkopf, “A regularization framework for learning from graph data,” in ICML Workshop on Statistical Relational Learning and its Connections to Other Fields, 2004.
  • [27] D. Han, D. Sun, and L. Zhang, “Linear rate convergence of the alternating direction method of multipliers for convex composite programming,” Math. Oper. Res., vol. 43, no. 2, pp. 622–637, 2017.
  • [28] J. V. d. M. Cardoso and D. P. Palomar, “Learning undirected graphs in financial markets,” in Proc. Asilomar Conf. on Signals, Systems, Computers, 2020, pp. 741–745.
  • [29] H. Wang and A. Banerjee, “Online alternating direction method,” in Proc. Int. Conf. Mach. Learn., Madison, WI, USA, 2012, p. 1699–1706.
  • [30] T. A. Davis and Y. Hu, “The University of Florida sparse matrix collection,” ACM Trans. Math. Softw., vol. 38, no. 1, Dec. 2011.