Tiered Service Architecture for Remote Patient Monitoring

Siddharth Chandak¹, Isha Thapa², Nicholas Bambos¹,² and David Scheinker²,³
¹ Department of Electrical Engineering, Stanford University, USA.
² Department of Management Science & Engineering, Stanford University, USA.
³ School of Medicine, Stanford University, USA.
{chandaks, ishadt, bambos, dscheink}@stanford.edu

Abstract

We develop a remote patient monitoring (RPM) service architecture, which has two tiers of monitoring: ordinary and intensive. The patient’s health state improves or worsens in each time period according to certain probabilities, which depend on the monitoring tier. The patient incurs a “loss of quality of life” cost or an “invasiveness” cost, which is higher under intensive monitoring than under ordinary. On the other hand, their health improves faster under intensive monitoring than under ordinary. In each period, the service decides which monitoring tier to use, based on the health of the patient. We investigate the optimal policy for making that choice by formulating the problem using dynamic programming. We first provide analytic conditions for selecting ordinary vs intensive monitoring in the asymptotic regime where the number of health states is large. In the general case, we investigate the optimal policy numerically. We observe a threshold behavior: when the patient’s health drops below a certain threshold the service switches them to intensive monitoring, while ordinary monitoring is used when the patient’s health is adequately good. The modeling and analysis provide a general framework for managing RPM services for various health conditions with medically/clinically defined system parameters.

I Introduction

Remote Patient Monitoring (RPM) is increasingly receiving attention as a method for monitoring patients with certain medical conditions in their normal living/working environments to increase their quality of life and the level of delivered health care [1, 2, 3]. This is becoming feasible via advancements in wearable medical devices, for example, wearable glucose monitors [4], smart watches with vital sign monitoring capabilities (e.g., heart rate, ECG, pulse oximetry) [5, 6], and other such sensors. Further, such devices are increasingly networked and can transmit and receive data over the Internet and act as edge devices in communication with computation servers in the Cloud.

Studies have shown the effectiveness of RPM for various medical conditions. For example, continuous glucose monitoring has been shown to improve glycemic control in patients with diabetes [7, 8]. Smart watches have been effectively used to monitor stress, movement disorders, sleep patterns, blood pressure, heart disease, and COVID-19 [5]. Other RPM devices have also been used to manage and track cardiac conditions, such as heart failure, arrhythmia, and hypertension [3]. These studies highlight the potential of RPM to improve patient outcomes and quality of life by allowing for timely interventions and personalized care.

However, the question remains of how intensively to monitor patients. Intensive monitoring schemes could range from remotely collecting more data on the patient health state and administering more medical intervention remotely (e.g., alerting the patient to increase medication dosage) to calling the patient into an urgent care facility.

While aggressive monitoring may provide more comprehensive data, it can be resource intensive, draining the wearable device battery [6] faster and requiring the clinicians to review more RPM data. From the patient’s perspective, intensive monitoring carries a higher “loss of quality of life” cost or “invasiveness” cost, since it would normally be more invasive to their personal lifestyle and can result in treatment fatigue [9]. On the other hand, intensive monitoring (and correspondingly elevated medical intervention) would enable early detection and intervention for adverse events, and hence the patient’s health is expected to improve faster than under ordinary monitoring. Therefore, to account for this trade-off between the invasiveness cost and the possibility of an early intervention, there is an inherent need for a systematic approach to determining the appropriate level of monitoring, based on the patient’s health state.

In this paper we develop an RPM service architecture, where the patient is placed under less or more intensive levels or tiers of monitoring, based on their health state. We then study the optimal monitoring strategy for this model and how it varies with different parameters. One would intuitively expect that a patient would be placed under intensive monitoring when their health state deteriorates; on the other hand, they would be returned to ordinary monitoring when their health state improves enough. The decision to switch from ordinary to intensive monitoring (and/or vice versa) requires a systematic analysis and depends heavily on the various parameters of the service in a rather subtle and complicated way, as analyzed in the following sections.

Of course, the design of an RPM service for a specific medical condition is highly dependent on the specifics of that condition and requires specialized medical knowledge. The point of this paper is not to design a particular RPM service but to provide a general framework and systematic methodology for RPM services based on tunable parameters (e.g., health improvement or deterioration probabilities, monitoring options and invasiveness costs) so as to make justifiable monitoring choices. The parameters will have to be decided on and tuned by medical/clinical experts for condition-specific RPM services. While we work with a simplified model here, the intuition gained from the analysis can help clinicians take more informed monitoring decisions.

In section II, we develop the model of evolution of the patient health state under ordinary and intensive monitoring and demonstrate how it can be managed, using the methodology of dynamic programming. In section III, we provide some analytical results on the optimal management policy in the asymptotic regime of a large number of health states and provide conditions on the parameters for choosing ordinary vs intensive monitoring. In section IV, we numerically investigate the structure of optimal monitoring policies and demonstrate that they place the patient under intensive monitoring when the health state deteriorates below a certain threshold; otherwise they use ordinary monitoring. Finally, in section V, we discuss some extensions. Appendix A contains proofs for results presented in section III.

II The RPM Service Model

Consider a patient who can be in a health state $h_{t}\in\mathcal{H}\coloneqq\{0,1,2,3,...,H\}$ in each service time period $t\in\{0,1,2,3,...\}$ . The RPM service places the patient in a monitoring/intervention state $m_{t}\in\mathcal{M}=\{o,i\}$ in each time period $t$ , abbreviated to monitoring state, where $o$ denotes ordinary monitoring and $i$ intensive monitoring. Thus, one can view the monitoring-patient joint state $s_{t}\coloneqq(m_{t},h_{t})\in\mathcal{M}\times\mathcal{H}\eqqcolon\mathcal{S}$ as the system or service state at time $t$ .

A higher patient health state $h\in\{0,1,2,3,...,H\}$ corresponds to the patient having better health. In particular, the lowest health state $0$ is critical in the sense that, when the patient drifts into that state, they go beyond the scope of the current service model; at that point other emergency and/or more severe medical interventions are required, which are outside the scope of this service. Because of that, the states $(i,0)$ and $(o,0)$ are absorbing for the Markovian evolution of the health state, as explained below. Indeed, when the patient enters health state $0$ under any monitoring state $i$ or $o$ , the service evolution stops, as other medical measures/interventions are initiated.

As seen below, in defining costs incurred at the various states, we first take the patient’s quality of life point of view. Under ordinary monitoring, the patient incurs a constant cost $C_{o}\geq 0$ at any state $(o,h)$ with $h\in\{1,2,...,H\}$ . Correspondingly, under intensive monitoring, the patient incurs a constant cost $C_{i}\geq 0$ at any state $(i,h)$ with $h\in\{1,2,...,H\}$ . These costs reflect the invasiveness loss for the patient. One may argue about the costs in more elaborate ways, for example, including patient risk factors and operational considerations of the service. For simplicity, we focus here on the invasiveness argument mentioned above.

Of special interest is the critical health state $h=0$ , where this model ceases to apply. At either state $(o,0)$ or $(i,0)$ , the patient’s health is critical and a cost $C_{c}$ is incurred.

We finally define the transition costs $C_{o\to i}$ and $C_{i\to o}$ , which are incurred when the service transitions the patient from ordinary to intensive monitoring and from intensive to ordinary monitoring, respectively.

We model the system as a controlled Markov chain. Such models are commonly used for medical decision making [10, 11]. We try to stay as simple as possible, yet still capture the essence of the problem and get insights into its solution. At the beginning of every time period $t$ , the service takes the decision/action (control) to either keep the monitoring state the same (as in the previous time period) or switch it to the alternate monitoring state. Formally, the decision/action space is $\mathcal{A}=\{o,i\}$ and each (state, action) pair is associated with a cost given by the function $c:\mathcal{S}\times\mathcal{A}\mapsto\mathbb{R}^{+}$ . The transition probabilities are given by $p(s^{\prime}|s,a)$ where $s^{\prime},s\in\mathcal{S}$ and $a\in\mathcal{A}$ . The cost functions and transition probabilities are defined as follows.

1. At health state $\mathbf{h=0}$ :

No action is taken, as the service ceases operation. A cost of $C_{c}$ is incurred.

2. At health states $\mathbf{1\leq h\leq H}$ :

a)
Ordinary Monitoring ( $m=o$ ), no Switching ( $a=o$ ): Does not induce a monitoring change, and the system state transitions as follows:
  i) $(o,h)\xrightarrow{a=o}(o,\min\{h+1,H\})$ with prob. $\lambda_{o}$ ,
  ii) $(o,h)\xrightarrow{a=o}(o,h-1)$ with prob. $\mu_{o}=1-\lambda_{o}$ ,
and a cost $c\big((o,h),o\big)=C_{o}$ is incurred. Note that $\min\{h+1,H\}$ above is used to account for $(o,H)\to(o,H)$ with prob. $\lambda_{o}$ since $H$ is the highest health state.
b)
Ordinary Monitoring ( $m=o$ ) with Switching ( $a=i$ ): Induces a switch to intensive monitoring $m=i$ , and the system state transitions as follows:
  i) $(o,h)\xrightarrow{a=i}(i,\min\{h+1,H\})$ with prob. $\lambda_{i}$ ,
  ii) $(o,h)\xrightarrow{a=i}(i,h-1)$ with prob. $\mu_{i}=1-\lambda_{i}$ ,
and a cost $c\big((o,h),i\big)=C_{o\to i}+C_{i}$ is incurred.
c)
Intensive Monitoring ( $m=i$ ), no Switching ( $a=i$ ): Does not induce a monitoring change, and the system state transitions as follows:
  i) $(i,h)\xrightarrow{a=i}(i,\min\{h+1,H\})$ with prob. $\lambda_{i}$ ,
  ii) $(i,h)\xrightarrow{a=i}(i,h-1)$ with prob. $\mu_{i}=1-\lambda_{i}$ ,
and a cost $c\big((i,h),i\big)=C_{i}$ is incurred.
d)
Intensive Monitoring ( $m=i$ ) with Switching ( $a=o$ ): Induces a switch to ordinary monitoring $m=o$ , and the system state transitions as follows:
  i) $(i,h)\xrightarrow{a=o}(o,\min\{h+1,H\})$ with prob. $\lambda_{o}$ ,
  ii) $(i,h)\xrightarrow{a=o}(o,h-1)$ with prob. $\mu_{o}=1-\lambda_{o}$ ,
and a cost $c\big((i,h),o\big)=C_{i\to o}+C_{o}$ is incurred.

We can easily incorporate health state dependent costs and transition probabilities, but for simplicity we assume constant ones here. We make the following natural assumptions.

Assumption 1.

a)

The transition probabilities satisfy: $\lambda_{i}\geq\lambda_{o}$ .
b)

The costs satisfy: $0\leq C_{o}\leq C_{i}\leq C_{c}$ .

The first assumption 1.a) intuitively states that the patient’s health improves faster under intensive monitoring, rather than under ordinary. Regarding assumption 1.b), it is naturally expected that $C_{o}\leq C_{i}$ , as the patient’s “annoyance” is higher under intensive monitoring/intervention than under ordinary. Further, given the severity of entering the critical state $h=0$ , it is naturally expected that $C_{i}\leq C_{c}$ , and practically $C_{c}$ is expected to be much larger than $C_{i}$ .
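The model primitives above can be written down compactly in code. The following is a minimal sketch, not the authors' implementation: the names `RPMParams`, `step_cost`, and `transitions` are hypothetical, and the numbers in the usage note are illustrative values only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RPMParams:
    """Hypothetical container for the model parameters of Section II."""
    H: int            # highest health state
    lam_o: float      # improvement probability under ordinary monitoring
    lam_i: float      # improvement probability under intensive monitoring
    C_o: float        # per-period cost under ordinary monitoring
    C_i: float        # per-period cost under intensive monitoring
    C_c: float        # cost incurred at the critical state h = 0
    C_oi: float = 0.0  # switching cost, ordinary -> intensive
    C_io: float = 0.0  # switching cost, intensive -> ordinary

    def satisfies_assumption_1(self) -> bool:
        # Assumption 1: lam_i >= lam_o and 0 <= C_o <= C_i <= C_c.
        return self.lam_i >= self.lam_o and 0.0 <= self.C_o <= self.C_i <= self.C_c


def step_cost(p: RPMParams, m: str, a: str) -> float:
    """Cost c((m, h), a) for 1 <= h <= H: tier cost of a, plus a switching cost if a != m."""
    tier = p.C_o if a == "o" else p.C_i
    if a == m:
        return tier
    return tier + (p.C_oi if a == "i" else p.C_io)


def transitions(p: RPMParams, h: int, a: str):
    """List of (probability, next_state) pairs from health state h under action a."""
    if not 1 <= h <= p.H:
        raise ValueError("transitions are only defined for 1 <= h <= H")
    lam = p.lam_o if a == "o" else p.lam_i
    # Health improves (capped at H) with prob. lam, worsens with prob. 1 - lam.
    return [(lam, (a, min(h + 1, p.H))), (1.0 - lam, (a, h - 1))]
```

For example, with `p = RPMParams(H=5, lam_o=0.2, lam_i=0.3, C_o=0.0, C_i=1.0, C_c=20.0)`, the call `transitions(p, 5, "i")` keeps the patient at health state 5 on an improvement step, which is the `min{h+1, H}` cap in the transition rules.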

II-A Optimal Monitoring Control

We study this problem under the discounted cost setting of the dynamic programming methodology [12]: costs incurred $t$ time periods into the future (with respect to the present) are discounted by a factor of $\gamma^{t}$ with $0<\gamma<1$ . Starting from state $s_{0}=s\in\mathcal{M}\times\mathcal{H}$ , the total expected (discounted) cost to be incurred is

\mathbb{E}\Big[\sum_{t=0}^{T-1}\gamma^{t}c(s_{t},a_{t})+\gamma^{T}C_{c}\ \Big|\ s_{0}=s\Big]

when control action $a_{t}$ is taken by the service, at cost $c(s_{t},a_{t})$ introduced above, when its state is $s_{t}=(m_{t},h_{t})$ at time $t$ , until the patient enters the critical state $h=0$ at time $T$ and the service ceases operation. Hence, $T$ is the time the patient spends in the service, and at time $T$ the critical cost $C_{c}$ is incurred, discounted to $\gamma^{T}C_{c}$ . Thus, discounting by $\gamma$ implicitly reflects the patient’s desire to stay longer in the service and hence to incur the critical cost further in the future, where it is discounted more heavily.

A (stationary) monitoring policy $\pi(s)$ is a rule mapping each state $s=(m,h)\in\mathcal{M}\times\mathcal{H}$ to a control action $a\in\mathcal{A}=\{o,i\}$ to be taken at that state. The value function $V_{\pi}(s)$ of a policy $\pi(s)$ is the total expected (discounted) cost the system will incur until reaching the critical state $0$ and stopping, when it starts from state $s$ at time $t=0$ . That is,

V_{\pi}(s)=\mathbb{E}\left[\sum_{t=0}^{T-1}\gamma^{t}c\Big(s_{t},\pi(s_{t})\Big)+\gamma^{T}C_{c}\ \Big|\ s_{0}=s\right]

and satisfies the dynamic programming equation [12].

V_{\pi}(s)=c\Big(s,\pi(s)\Big)+\gamma\sum_{s^{\prime}\in\mathcal{S}}\mathbb{P}\Big(s^{\prime}\mid s,\pi(s)\Big)V_{\pi}(s^{\prime}),

for all $s\in\mathcal{S}=\mathcal{M}\times\mathcal{H}$ , given the Markovian evolution dynamics of the system, specified by the state transition probabilities defined above.
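The fixed-policy equation above can be solved by repeatedly applying its right-hand side until the values stop changing. The sketch below is one simple way to do this (in-place iterative policy evaluation); the function name `evaluate_policy`, its signature, and the tolerance are assumptions made for illustration, not part of the paper.

```python
def evaluate_policy(policy, H, lam_o, lam_i, C_o, C_i, C_c, gamma,
                    C_oi=0.0, C_io=0.0, tol=1e-12, max_iter=1_000_000):
    """Iteratively solve V_pi(s) = c(s, pi(s)) + gamma * E[V_pi(s')].

    `policy` is a callable (m, h) -> 'o' or 'i' (a stationary policy).
    Returns a dict mapping each state (m, h) to its value V_pi(m, h).
    """
    lam = {"o": lam_o, "i": lam_i}
    tier = {"o": C_o, "i": C_i}
    switch = {("o", "i"): C_oi, ("i", "o"): C_io}

    # Critical states are absorbing with value C_c; other states start at 0.
    V = {(m, h): (C_c if h == 0 else 0.0) for m in "oi" for h in range(H + 1)}

    for _ in range(max_iter):
        delta = 0.0
        for m in "oi":
            for h in range(1, H + 1):
                a = policy(m, h)
                cost = tier[a] + switch.get((m, a), 0.0)
                up = V[(a, min(h + 1, H))]   # health improves (capped at H)
                down = V[(a, h - 1)]         # health worsens
                new = cost + gamma * (lam[a] * up + (1.0 - lam[a]) * down)
                delta = max(delta, abs(new - V[(m, h)]))
                V[(m, h)] = new
        if delta < tol:
            break
    return V
```

Because $\gamma<1$ the update is a contraction, so the iteration converges from any starting values; a direct linear solve would work equally well for small $H$.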

The goal is to find an optimal policy $\pi^{*}$ which minimizes $V_{\pi}$ over all policies $\pi$ , i.e., $V_{\pi^{*}}(s)\leq V_{\pi}(s)$ for every $s\in\mathcal{S}$ over all policies $\pi$ . For simplicity, we define $V^{*}(s)\coloneqq V_{\pi^{*}}(s)$ which satisfies the following dynamic programming equation [12].

V^{*}(s)=\min_{a\in\{o,i\}}\Bigg\{c(s,a)+\gamma\sum_{s^{\prime}\in\mathcal{S}}\mathbb{P}\Big(s^{\prime}\mid s,a\Big)V^{*}(s^{\prime})\Bigg\},

and can be solved numerically to yield the optimal policy $\pi^{*}(s)=\pi^{*}(m,h)$ , that is, what optimal decision to take when the patient is in health state $h$ under monitoring $m$ . For the state transition probabilities and costs defined before, this dynamic programming equation unfolds into:

(i)

For health states $1\leq h\leq H-1$

	$\displaystyle V^{*}(i,h)=\min\bigg\{C_{i}+\gamma\Big[\lambda_{i}V^{*}(i,h{+}1)+\mu_{i}V^{*}(i,h{-}1)\Big],$
	$\displaystyle\;\;\;\;\;\;\;\;C_{i\to o}+C_{o}+\gamma\Big[\lambda_{o}V^{*}(o,h{+}1)+\mu_{o}V^{*}(o,h{-}1)\Big]\bigg\},$

	$\displaystyle V^{*}(o,h)=\min\bigg\{C_{o}+\gamma\Big[\lambda_{o}V^{*}(o,h{+}1)+\mu_{o}V^{*}(o,h{-}1)\Big],$
	$\displaystyle\;\;\;\;\;\;\;\;C_{o\to i}+C_{i}+\gamma\Big[\lambda_{i}V^{*}(i,h{+}1)+\mu_{i}V^{*}(i,h{-}1)\Big]\bigg\}.$

(ii)

At health state $H$ ,

	$\displaystyle V^{*}(i,H)=\min\bigg\{C_{i}+\gamma\Big[\lambda_{i}V^{*}(i,H)+\mu_{i}V^{*}(i,H{-}1)\Big],$
	$\displaystyle\;\;\;\;\;\;\;\;C_{i\to o}+C_{o}+\gamma\Big[\lambda_{o}V^{*}(o,H)+\mu_{o}V^{*}(o,H{-}1)\Big]\bigg\},$

	$\displaystyle V^{*}(o,H)=\min\bigg\{C_{o}+\gamma\Big[\lambda_{o}V^{*}(o,H)+\mu_{o}V^{*}(o,H{-}1)\Big],$
	$\displaystyle\;\;\;\;\;\;\;\;C_{o\to i}+C_{i}+\gamma\Big[\lambda_{i}V^{*}(i,H)+\mu_{i}V^{*}(i,H{-}1)\Big]\bigg\}.$

(iii)

At the critical health state $h=0$ ,

V^{*}(i,0)=V^{*}(o,0)=C_{c}.

Note that in every $\min\{\cdot,\cdot\}$ above, the first term corresponds to the control keeping the existing monitoring state, while the second corresponds to the control switching to the alternate monitoring state and incurring the switching cost.
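One standard way to solve the optimality equations (i)-(iii) numerically is value iteration. The following sketch works on the full state space $(m,h)$ and includes the switching costs; the function name `solve_rpm`, the default arguments, and the tie-breaking rule (ties go to action `o`) are assumptions of this illustration rather than choices stated in the paper.

```python
def solve_rpm(H, lam_o, lam_i, C_o, C_i, C_c, gamma,
              C_oi=0.0, C_io=0.0, tol=1e-10, max_iter=100_000):
    """Value iteration for V*(m, h); returns (V, policy) as dicts keyed by (m, h)."""
    lam = {"o": lam_o, "i": lam_i}
    tier = {"o": C_o, "i": C_i}
    switch = {("o", "i"): C_oi, ("i", "o"): C_io}

    # V*(m, 0) = C_c at the critical state; all other states start at 0.
    V = {(m, h): (C_c if h == 0 else 0.0) for m in "oi" for h in range(H + 1)}

    def q(m, h, a):
        """Cost of taking action a at state (m, h), then following the current V."""
        up = V[(a, min(h + 1, H))]
        down = V[(a, h - 1)]
        return (switch.get((m, a), 0.0) + tier[a]
                + gamma * (lam[a] * up + (1.0 - lam[a]) * down))

    for _ in range(max_iter):
        delta = 0.0
        for m in "oi":
            for h in range(1, H + 1):
                new = min(q(m, h, "o"), q(m, h, "i"))
                delta = max(delta, abs(new - V[(m, h)]))
                V[(m, h)] = new
        if delta < tol:
            break

    # Greedy policy with respect to the converged values (ties go to 'o').
    policy = {(m, h): min("oi", key=lambda a: q(m, h, a))
              for m in "oi" for h in range(1, H + 1)}
    return V, policy
```

With non-zero switching costs the returned policy can differ between `("o", h)` and `("i", h)` for the same health state, which is why the solver keeps the monitoring state in the key.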

II-B Simplified RPM Service

In order to reduce the number of parameters for tractability of the analysis, we make the following simplification.

Definition 1.

Simplified RPM Service.

a)

The cost for ordinary monitoring is set to zero: $C_{o}=0$ .
b)

The switching costs are set to zero: $C_{i\to o}=C_{o\to i}=0$ .

Assumption 1 is still satisfied.

Given the limited space of this short paper, we work with the simplified RPM service below (analysis and numerical results), where interesting insights emerge. The zero cost of invasiveness under ordinary monitoring has no significant impact on the results, and is merely a technical assumption to make our analysis easier. The zero transition cost assumption holds in several applications where intensive monitoring just involves a higher rate of collecting data about the patient’s health. We briefly comment on how non-zero transition costs affect our results in Section V.

The simplified RPM service is illustrated in Figure 1 below.

Figure 1: The Simplified RPM Service. The blue and red arrows represent the possible transitions from state $(o,3)$ for actions $o$ and $i$ respectively. The arrows are labeled with probability of transition and cost incurred.

The dynamic programming equations for $V^{*}$ given above reduce in this case to the following:

(i)

For health states $1\leq h\leq H-1$ ,

V^{*}(i,h)=V^{*}(o,h)=\min\Bigg\{C_{i}+\gamma\Big[\lambda_{i}V^{*}(i,h+1)+\mu_{i}V^{*}(i,h-1)\Big],\ C_{o}+\gamma\Big[\lambda_{o}V^{*}(o,h+1)+\mu_{o}V^{*}(o,h-1)\Big]\Bigg\}

(1)

(ii)

At health state $H$ ,

V^{*}(o,H)=V^{*}(i,H)=\min\Bigg\{C_{i}+\gamma\Big[\lambda_{i}V^{*}(i,H)+\mu_{i}V^{*}(i,H-1)\Big],\ C_{o}+\gamma\Big[\lambda_{o}V^{*}(o,H)+\mu_{o}V^{*}(o,H-1)\Big]\Bigg\}

(2)

(iii)

At the critical health state $h=0$ ,

V^{*}(i,0)=V^{*}(o,0)=C_{c}.

(3)

Note that, in the absence of switching costs (that is, $C_{o\to i}=C_{i\to o}=0$ ), for any health state $h$ the $V^{*}$ is the same under both ordinary and intensive monitoring.

III Asymptotic Analysis

The optimal policy ${\pi^{*}}$ and value function $V^{*}$ can be computed numerically (in general) from the dynamic programming equations (1)-(3) for any given set of system parameters. But to develop intuition and characterize the optimal policy, in this section, we analyse ${\pi^{*}}$ in the asymptotic regime of a large number of health states $H\gg 1$ , i.e., a dynamic range of health states $H\to\infty$ . This allows for analytic tractability of the optimal policy and closed-form conditions on our policies of interest. We make the following assumptions in this section.

Assumption 2.

Large $H$ Asymptotic Regime.

a)

The number of health states is very large, i.e., the system operates in the asymptotic regime of $H\to\infty$ .
b)

Under ordinary monitoring the patient’s health drifts downwards, i.e., the improvement probability is $\lambda_{o}<0.5$ and the worsening one is $\mu_{o}=1-\lambda_{o}>0.5$ .

Assumption 2.a) allows us to use tools from random walk analysis, as done in Lemma 1, making the analysis tractable. Assumption 2.b) allows for the Markov chain to remain stable (positive recurrent) in the asymptotic regime.

The results obtained in this asymptotic regime can be thought of as an approximation for the RPM service with finite $H$ , when $H$ grows large. In the next section, we numerically demonstrate that this asymptotic approximation tracks the optimal policy and the value function for our RPM for a number of health states as low as $H=5$ .

We define $\widetilde{\Pi}$ as the set of policies under which the service chooses the same action irrespective of the monitoring state, that is,

\widetilde{\Pi}=\{\pi\mid\pi(i,h)=\pi(o,h),\;\forall h\geq 1\}.

Based on (1) and (2), for all $h\geq 1$ , we then have

	$\displaystyle{\pi^{*}}(o,h)={\pi^{*}}(i,h)$
	$\displaystyle=\operatorname{arg\,min}_{\{i,o\}}\{C_{i}+\gamma\lambda_{i}V^{*}(i,h+1)+\gamma(1-\lambda_{i})V^{*}(i,h-1),$
	$\displaystyle\;\;\;\;\;\;C_{o}+\gamma\lambda_{o}V^{*}(o,h+1)+\gamma(1-\lambda_{o})V^{*}(o,h-1)\}$

This implies that ${\pi^{*}}\in\widetilde{\Pi}$ and we can restrict our attention to the set $\widetilde{\Pi}$ . For simplicity, we introduce notation $V^{*}(h)=V^{*}(o,h)=V^{*}(i,h)$ for $h\in\mathcal{H}$ and similarly the notation ${\pi^{*}}(h)$ .

We next define an important policy $\pi_{o}$ where the patient stays under ordinary monitoring at all health states, i.e., $\pi_{o}(i,h)=\pi_{o}(o,h)=o$ for all $h\geq 1$ . Note that $\pi_{o}\in\widetilde{\Pi}$ and we define the corresponding value function $V_{o}(h)$ . Our first lemma gives the value function for this policy and presents an important property about the optimal policy.

Lemma 1.

For the simplified RPM (Definition 1) and under Assumption 2:

a)

The value function $V_{o}$ for the policy $\pi_{o}$ is given by:

V_{o}(h)=\phi^{h}C_{c},

where

\phi=\frac{1-\sqrt{1-4\lambda_{o}\mu_{o}\gamma^{2}}}{2\lambda_{o}\gamma}.

Note that $\phi<1$ for $\gamma<1$ .

b)

For any choice of parameters, there exists $h^{\prime}$ such that, under the optimal policy, the patient prefers to stay in ordinary monitoring above health state $h^{\prime}$ , i.e., ${\pi^{*}}(h)=o$ , for all $h\geq h^{\prime}$ .

Proof.

See Appendix A for the proof. ∎

For the simplified RPM, the cost of invasiveness under ordinary monitoring is zero. Then $V_{o}(h)=\mathbb{E}[\gamma^{T}C_{c}|h_{0}=h]$ , where $T$ is the time taken to reach health state $0$ when the patient starts at health state $h$ and always stays under ordinary monitoring. $T$ here is precisely the hitting time of state $0$ for a random walk initiated at state $h$ . The proof of this lemma then follows from the moment generating function of the hitting time for an $\infty$ -state random walk.
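As a quick consistency check (a sketch, not a substitute for the proof in Appendix A), one can substitute the form $V_{o}(h)=\phi^{h}C_{c}$ into the fixed-policy recursion $V_{o}(h)=\gamma[\lambda_{o}V_{o}(h+1)+\mu_{o}V_{o}(h-1)]$ , which holds for $h\geq 1$ in the asymptotic regime since $C_{o}=0$ :

```latex
\begin{align*}
\phi^{h} C_c &= \gamma\left[\lambda_o \phi^{h+1} + \mu_o \phi^{h-1}\right] C_c \\
\Longrightarrow\quad \gamma\lambda_o \phi^{2} - \phi + \gamma\mu_o &= 0 \\
\Longrightarrow\quad \phi &= \frac{1 \pm \sqrt{1 - 4\lambda_o\mu_o\gamma^{2}}}{2\lambda_o\gamma}.
\end{align*}
```

The discriminant is positive because $4\lambda_{o}\mu_{o}\leq 1$ and $\gamma<1$ . The root with the plus sign exceeds $1$ , since its numerator is larger than $1$ while $2\lambda_{o}\gamma<1$ under Assumption 2.b); it would make $V_{o}(h)$ grow beyond $C_{c}$ , which is impossible for a discounted cost of the form $C_{c}\mathbb{E}[\gamma^{T}]$ . The root with the minus sign is therefore the one appearing in the lemma, and it also matches the boundary value $V_{o}(0)=\phi^{0}C_{c}=C_{c}$ .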

An important implication of the above lemma is that the policy where the patient chooses to stay in intensive monitoring for all health states is never optimal. Our next theorem shows that the policy $\pi_{o}$ is actually optimal for a wide range of parameters.

Theorem 1.

Under Assumption 2, the policy $\pi_{o}$ is optimal ( ${\pi^{*}}=\pi_{o}$ ) for the simplified RPM (Definition 1) when the parameters satisfy

\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq\frac{C_{i}}{C_{c}}.

Proof.

See Appendix A for the proof. ∎

We next define a threshold-policy $\pi_{t,\bar{h}}$ characterized by the health state ${\bar{h}}>0$ . These are the policies under which there exists a threshold ${\bar{h}}$ such that the patient stays in intensive monitoring when the patient’s health is below or at the threshold ${\bar{h}}$ and in ordinary monitoring when their health is better than ${\bar{h}}$ . Note that $\pi_{t,\bar{h}}\in\widetilde{\Pi}$ . So $\pi_{t,\bar{h}}(h)=i$ for $1\leq h\leq{\bar{h}}$ and $\pi_{t,\bar{h}}(h)=o$ for $h>{\bar{h}}$ . Our next theorem gives a set of conditions under which $\pi_{t,\bar{h}}$ is the optimal policy for some threshold ${\bar{h}}$ .

Theorem 2.

Under Assumption 2, the policy $\pi_{t,\bar{h}}$ is optimal ( ${\pi^{*}}=\pi_{t,\bar{h}}$ ) for some threshold ${\bar{h}}$ for the simplified RPM (Definition 1) when the following two conditions are satisfied:

a)

$\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})>\frac{C_{i}}{C_{c}}$ .
b)

$\frac{\gamma\mu_{o}(1+\gamma\mu_{o})}{1-\gamma^{2}\lambda_{o}\mu_{o}}\leq 1$ .

Proof.

See Appendix A for the proof. ∎

Condition a) above is the complement of the condition in Theorem 1. Condition b) is an additional condition which our proof requires for the threshold policy to be optimal. In the asymptotic regime, we strongly believe that condition b) is not necessary, and condition a) alone is sufficient. Hence we believe that in the asymptotic regime, condition a) alone dictates what the optimal policy is and that the optimal policy can only be of two forms - $\pi_{o}$ and $\pi_{t,\bar{h}}$ . This is reinforced by the numerical analysis, presented next.
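The conditions of Theorems 1 and 2 are closed-form and easy to check for a given parameter set. The helper below is an illustrative sketch: the names `phi` and `classify` and the three return labels are hypothetical, and the `"inconclusive"` label only means that neither theorem applies as stated, not that the optimal policy has some other form.

```python
import math


def phi(lam_o: float, gamma: float) -> float:
    """The constant phi of Lemma 1 for improvement probability lam_o and discount gamma."""
    mu_o = 1.0 - lam_o
    return (1.0 - math.sqrt(1.0 - 4.0 * lam_o * mu_o * gamma ** 2)) / (2.0 * lam_o * gamma)


def classify(lam_o: float, lam_i: float, C_i: float, C_c: float, gamma: float) -> str:
    """Report which asymptotic result (if any) covers the given parameters.

    Returns 'theorem1'     if the Theorem 1 condition holds (pi_o optimal),
            'theorem2'     if conditions a) and b) of Theorem 2 hold,
            'inconclusive' if a) holds but b) fails.
    """
    if not (0.0 < lam_o < 0.5 and 0.0 < gamma < 1.0):
        raise ValueError("requires 0 < lam_o < 0.5 (Assumption 2.b) and 0 < gamma < 1")
    mu_o = 1.0 - lam_o
    lhs = gamma * (lam_i - lam_o) * (1.0 - phi(lam_o, gamma) ** 2)
    if lhs <= C_i / C_c:
        return "theorem1"
    cond_b = gamma * mu_o * (1.0 + gamma * mu_o) / (1.0 - gamma ** 2 * lam_o * mu_o)
    return "theorem2" if cond_b <= 1.0 else "inconclusive"
```

For instance, `classify(0.4, 0.6, 1.0, 100.0, 0.6)` satisfies both conditions of Theorem 2; these particular numbers are made up for the example and do not come from the paper.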

IV Performance

In this section, we gain insight into the optimal policy by numerically solving the dynamic programming equations (1)-(3).
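A minimal sketch of such a numerical solution is given below, using value iteration on the collapsed value function $V^{*}(h)$ of the simplified service ( $C_{o}=0$ , no switching costs). The function names `solve_simplified` and `is_threshold`, the tolerance, and the choice $H=5$ in the usage note are assumptions made for illustration; the paper does not state which solver or which $H$ was used for each figure.

```python
def solve_simplified(H, lam_o, lam_i, C_i, C_c, gamma, tol=1e-12, max_iter=1_000_000):
    """Value iteration for equations (1)-(3); returns (V, policy).

    V[h] approximates V*(h) for h = 0..H, and policy[h-1] is the greedy
    action ('o' or 'i') at health state h = 1..H.
    """
    V = [C_c] + [0.0] * H   # V*(0) = C_c by equation (3)

    def q(h, lam, cost):
        # One-step cost plus discounted expected value; health is capped at H.
        return cost + gamma * (lam * V[min(h + 1, H)] + (1.0 - lam) * V[h - 1])

    for _ in range(max_iter):
        delta = 0.0
        for h in range(1, H + 1):
            new = min(q(h, lam_o, 0.0), q(h, lam_i, C_i))
            delta = max(delta, abs(new - V[h]))
            V[h] = new
        if delta < tol:
            break

    # Intensive monitoring is chosen only when it is strictly cheaper.
    policy = ["i" if q(h, lam_i, C_i) < q(h, lam_o, 0.0) else "o"
              for h in range(1, H + 1)]
    return V, policy


def is_threshold(policy):
    """True if the policy is a (possibly empty) run of 'i' followed only by 'o'."""
    return "oi" not in "".join(policy)
```

A call such as `solve_simplified(5, 0.2, 0.3, 1.0, 20.0, 0.9)` returns the value function and the per-state action list, and `is_threshold` can then be used to check whether the computed policy has the threshold structure discussed in Section III.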

Figure 2 depicts the two policies discussed in the last section and a sample set of parameters under which they are optimal. Figure 2(a) shows the policy $\pi_{o}$ , under which the patient stays in ordinary monitoring at all health states. Figure 2(b) shows the policy $\pi_{t,\bar{h}}$ with threshold $\bar{h}=3$ , where the patient stays in intensive monitoring for health states $h\leq 3$ and in ordinary monitoring for health states $h>3$ . Let model 2(a) use the set of parameters $\lambda_{o}=0.2,\lambda_{i}=0.3,C_{c}=20,C_{i}=1,C_{o}=0,\gamma=0.9$ and model 2(b) use the set of parameters $\lambda_{o}=0.2,\lambda_{i}=0.3,C_{c}=60,C_{i}=1,C_{o}=0,\gamma=0.9$ . Then $\pi_{o}$ and $\pi_{t,\bar{h}}$ with $\bar{h}=3$ are optimal for model 2(a) and 2(b), respectively.

Note that the parameters for model 2(a) satisfy $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq C_{i}/C_{c}$ , which is sufficient for Theorem 1 to hold. Similarly the parameters for model 2(b) satisfy $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})>C_{i}/C_{c}$ , which is condition (a) in Theorem 2. Note, however, that the parameters in model 2(b) do not satisfy condition (b) of Theorem 2, implying that the condition is not necessary.
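The arithmetic behind these statements can be reproduced directly from the stated parameters ( $\lambda_{o}=0.2$ , $\mu_{o}=0.8$ , $\lambda_{i}=0.3$ , $\gamma=0.9$ in both models); the values below are rounded:

```latex
\begin{align*}
\phi &= \frac{1-\sqrt{1-4(0.2)(0.8)(0.9)^{2}}}{2(0.2)(0.9)}
      = \frac{1-\sqrt{0.4816}}{0.36} \approx 0.8501, \\
\gamma(\lambda_i-\lambda_o)(1-\phi^{2}) &\approx 0.9 \times 0.1 \times 0.2774 \approx 0.0250, \\
\text{model 2(a):}\quad C_i/C_c &= 1/20 = 0.05 \;\geq\; 0.0250, \\
\text{model 2(b):}\quad C_i/C_c &= 1/60 \approx 0.0167 \;<\; 0.0250, \\
\frac{\gamma\mu_o(1+\gamma\mu_o)}{1-\gamma^{2}\lambda_o\mu_o}
  &= \frac{0.72 \times 1.72}{1-0.81 \times 0.16} = \frac{1.2384}{0.8704} \approx 1.42 \;>\; 1.
\end{align*}
```

The last line is the left-hand side of condition (b) in Theorem 2; it is the same for both models because it depends only on $\gamma$ and $\lambda_{o}$ .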

When $H$ is finite, there also exist instances where the optimal policy is $\pi_{i}$ , where the patient chooses to stay in intensive monitoring for all health states $h$ . But we observed that this policy is optimal only in extreme cases where $H$ is very small or $\gamma$ is very close to $1$ . Hence we do not further analyse this policy here.

Figure 3 shows how closely our asymptotic analysis in Section III relates to the actual solution of the dynamic programming equations. The parameters are chosen such that the optimal policy is $\pi_{o}$ . We calculate $V^{*}(h)$ for this model, and compare it with the value function $V_{o}(h)=\phi^{h}C_{c}$ obtained for the asymptotic case $H\to\infty$ (Lemma 1). As observed in the plot, the value function obtained for $H=5$ is almost identical to the asymptotic approximation $V_{o}(h)$ . The accuracy of the asymptotic approximation in predicting the optimal policy is further demonstrated by our next result.

We next study the impact of different parameters on the optimal policy ${\pi^{*}}$ . Figure 4(a) shows how the optimal policy (numerically computed) varies with the cost ratio $C_{c}/C_{i}$ . For $C_{c}/C_{i}<20.02$ the optimal policy is $\pi_{o}$ , while for $C_{c}/C_{i}>20.02$ it is $\pi_{t,\bar{h}}$ with varied thresholds. Note that $C_{c}/C_{i}=20.02$ satisfies the condition in Theorem 1 with equality (the value of $C_{c}/C_{i}$ which satisfies $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})=C_{i}/C_{c}$ ). This shows that the condition obtained under the asymptotic assumption is a good indicator for our original problem with finite health states. As the ratio $C_{c}/C_{i}$ grows, the cost incurred on reaching the critical health state increases, and it becomes optimal for the patient to stay under intensive monitoring until their health significantly improves.

Figure 4(b) shows how the optimal policy varies as the probability $\lambda_{i}$ increases. The optimal policy is $\pi_{o}$ for $\lambda_{i}<0.28$ and $\pi_{t,\bar{h}}$ with varied thresholds for $\lambda_{i}>0.28$ . Again, $\lambda_{i}=0.28$ satisfies the condition in Theorem 1 with equality. As $\lambda_{i}$ increases, the patient’s health is more likely to improve under intensive monitoring, incentivizing the patient to stay under intensive monitoring for longer. Finally, Figure 4(c) shows the impact of $\gamma$ on the optimal policy. As $\gamma$ increases, the patient incurs a higher discounted cost on reaching the critical state, and hence they stay under intensive monitoring for longer.

V Conclusions and Extensions

We have developed a two-tier service architecture for remote patient monitoring (RPM), where the service policy decides whether to place the patient under ordinary or intensive monitoring, given their health state. The optimal policy is first analyzed in asymptotic regimes and conditions are established for choosing ordinary vs intensive monitoring. The policy is then numerically computed and the dependence of its behavior on various key parameters is investigated.

An important extension would be to consider a more general model, which includes non-zero transition costs. Based on numerical experiments performed in the general case, the optimal policy in this case would be a threshold policy with two thresholds instead of the one observed in this paper. A patient under ordinary monitoring would be switched to intensive when their health state deteriorates below a certain lower health threshold, and a patient under intensive monitoring would switch to ordinary when their health state improves above an upper health threshold. There are also other direct extensions, e.g., the costs and probabilities of transitions could also be made dependent on the health state, allowing for a more realistic model.

References

[1] F. A. C. d. Farias, C. M. Dagostini, Y. d. A. Bicca, V. F. Falavigna, and A. Falavigna, “Remote patient monitoring: a systematic review,” Telemedicine and e-Health, vol. 26, no. 5, pp. 576–583, 2020.
[2] L. P. Malasinghe, N. Ramzan, and K. Dahal, “Remote patient monitoring: a comprehensive study,” Journal of Ambient Intelligence and Humanized Computing, vol. 10, pp. 57–76, 2019.
[3] A. Zinzuwadia, J. M. Goldberg, M. A. Hanson, and J. D. Wessler, “Continuous cardiology: the intersection of telehealth and remote patient monitoring,” in Emerging Practices in Telehealth. Elsevier, 2023, pp. 97–115.
[4] I. Lee, D. Probst, D. Klonoff, and K. Sode, “Continuous glucose monitoring systems-current status and future perspectives of the flagship technologies in biosensor research,” Biosensors and Bioelectronics, vol. 181, p. 113054, 2021.
[5] M. Masoumian Hosseini, S. T. Masoumian Hosseini, K. Qayumi, S. Hosseinzadeh, and S. S. Sajadi Tabar, “Smartwatches in healthcare medicine: assistance and monitoring; a scoping review,” BMC Medical Informatics and Decision Making, vol. 23, no. 1, p. 248, 2023.
[6] Y. B. David, T. Geller, I. Bistritz, I. Ben-Gal, N. Bambos, and E. Khmelnitsky, “Wireless body area network control policies for energy-efficient health monitoring,” Sensors, vol. 21, no. 12, p. 4245, 2021.
[7] M. I. Maiorino, S. Signoriello, A. Maio, P. Chiodini, G. Bellastella, L. Scappaticcio, M. Longo, D. Giugliano, and K. Esposito, “Effects of continuous glucose monitoring on metrics of glycemic control in diabetes: a systematic review with meta-analysis of randomized controlled trials,” Diabetes Care, vol. 43, no. 5, pp. 1146–1156, 2020.
[8] P. Prahalad, D. Scheinker, M. Desai, V. Y. Ding, F. K. Bishop, M. Y. Lee, J. Ferstad, D. P. Zaharieva, A. Addala, R. Johari et al., “Equitable implementation of a precision digital health program for glucose management in individuals with newly diagnosed type 1 diabetes,” Nature Medicine, pp. 1–9, 2024.
[9] B. W. Heckman, A. R. Mathew, and M. J. Carpenter, “Treatment burden and treatment fatigue as barriers to health,” Current Opinion in Psychology, vol. 5, pp. 31–36, Oct. 2015.
[10] L. N. Steimle and B. T. Denton, “Markov decision processes for screening and treatment of chronic diseases,” in Markov Decision Processes in Practice, pp. 189–222, 2017.
[11] O. Alagoz, et al., “Markov decision processes: a tool for sequential decision making under uncertainty,” Medical Decision Making, vol. 30, no. 4, pp. 474–483, 2010.
[12] D. Bertsekas, Dynamic programming and optimal control. Athena scientific, 2012, vol. II.
[13] W. Feller, An introduction to probability theory and its applications, Volume 2. John Wiley & Sons, 1991, vol. 81.

Appendix A Proofs

A-A Proof for Lemma 1

Proof.

For the policy $\pi_{o}$ , the value function is

V_{o}(h)=\mathbb{E}\left[\sum_{t=0}^{T-1}\gamma^{t}c(s_{t},\pi_{o}(s_{t}))+\gamma^{T}C_{c}\,\middle|\,h_{0}=h\right],

for all $h\geq 1$. Here $T$ denotes the time at which the patient reaches health state $0$. Since $\pi_{o}(s_{t})=o$, we have $c(s_{t},\pi_{o}(s_{t}))=0$ for all $t<T$, and therefore $V_{o}(h)=C_{c}\mathbb{E}[\gamma^{T}|h_{0}=h]$.

Consider an infinite-state one-dimensional random walk with probability $\mu_{o}$ of moving forward and probability $1-\mu_{o}=\lambda_{o}$ of moving backward. Let $t_{0,h}$ be the time taken to hit state $h$ for a random walk initialized at state $0$. The patient's walk from health state $h$ down to state $0$ mirrors this walk from $0$ up to $h$, so $T$ defined above has the same distribution as $t_{0,h}$ and $\mathbb{E}[\gamma^{T}|h_{0}=h]=\mathbb{E}[\gamma^{t_{0,h}}]$. Note that $t_{0,h}=t_{0,1}+t_{1,2}+\ldots+t_{h-1,h}$, where $t_{k,k+1}$ is the time taken to first hit state $k+1$ after first hitting state $k$. Now,

	$\displaystyle\mathbb{E}[\gamma^{t_{0,h}}]$	$\displaystyle=\mathbb{E}[\gamma^{t_{0,1}+t_{1,2}+\ldots+t_{h-1,h}}]$
		$\displaystyle=\mathbb{E}[\gamma^{t_{0,1}}\times\gamma^{t_{1,2}}\times\ldots\times\gamma^{t_{h-1,h}}]$
		$\displaystyle\stackrel{(a)}{=}\mathbb{E}[\gamma^{t_{0,1}}]\times\mathbb{E}[\gamma^{t_{1,2}}]\times\ldots\times\mathbb{E}[\gamma^{t_{h-1,h}}]$
		$\displaystyle\stackrel{(b)}{=}\mathbb{E}[\gamma^{t_{0,1}}]^{h}.$

Here equality (a) follows from the independence of $t_{0,1},\ldots,t_{h-1,h}$ and equality (b) follows from the fact that they are identically distributed. Since $\mu_{o}>\lambda_{o}$, the random walk started at state $0$ reaches state $1$ with probability $1$. Then [13, Chapter XIV, eqn. 4.8] gives us $\mathbb{E}[\gamma^{t_{0,1}}]=\phi$ and hence $V_{o}(h)=C_{c}\phi^{h}$, where

\phi=\frac{1-\sqrt{1-4\lambda_{o}\mu_{o}\gamma^{2}}}{2\lambda_{o}\gamma}.
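As a quick numerical sanity check of this closed form, the following minimal Python sketch computes $\phi$ and verifies that $V_{o}(h)=C_{c}\phi^{h}$ satisfies the one-step recursion $V_{o}(h)=\gamma(\lambda_{o}V_{o}(h+1)+\mu_{o}V_{o}(h-1))$ with $V_{o}(0)=C_{c}$. The function names and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math

def phi(gamma: float, lam_o: float) -> float:
    """Closed form for E[gamma^{t_{0,1}}] under ordinary monitoring."""
    mu_o = 1.0 - lam_o
    disc = 1.0 - 4.0 * lam_o * mu_o * gamma ** 2
    return (1.0 - math.sqrt(disc)) / (2.0 * lam_o * gamma)

def v_ordinary(h: int, gamma: float, lam_o: float, c_c: float) -> float:
    """V_o(h) = C_c * phi^h, the cost of always using ordinary monitoring."""
    return c_c * phi(gamma, lam_o) ** h

# Illustrative (assumed) parameters, not taken from the paper.
GAMMA, LAM_O, C_C = 0.9, 0.3, 100.0
MU_O = 1.0 - LAM_O

for h in range(1, 6):
    lhs = v_ordinary(h, GAMMA, LAM_O, C_C)
    rhs = GAMMA * (LAM_O * v_ordinary(h + 1, GAMMA, LAM_O, C_C)
                   + MU_O * v_ordinary(h - 1, GAMMA, LAM_O, C_C))
    # Prints h, V_o(h), and whether the one-step recursion holds numerically.
    print(h, round(lhs, 4), math.isclose(lhs, rhs, rel_tol=1e-9))
```

The recursion holds because $\phi$ is a root of $\lambda_{o}\gamma\phi^{2}-\phi+\mu_{o}\gamma=0$, which is the first-step equation for the hitting time $t_{0,1}$.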

Let $h^{\prime}=\lceil\log(C_{i}/C_{c})/\log(\phi)\rceil+1$ . Then

V_{o}(h^{\prime})=C_{c}\phi^{h^{\prime}}\leq C_{c}\phi\times(C_{i}/C_{c})=\phi C_{i}.

Since $\phi\leq 1$, we have $V_{o}(h)\leq V_{o}(h^{\prime})\leq\phi C_{i}$ for all $h\geq h^{\prime}$. Now consider any $h\geq h^{\prime}$ and a policy $\pi^{\prime}$ such that $\pi^{\prime}(h)=i$. Then $V_{\pi^{\prime}}(h)\geq C_{i}$, since the cost $C_{i}$ is incurred immediately. As $\phi<1$ for $\gamma<1$, it follows that $V_{o}(h)\leq\phi C_{i}<C_{i}\leq V_{\pi^{\prime}}(h)$. This implies that the policy $\pi^{\prime}$ cannot be optimal, and hence the optimal policy uses ordinary monitoring for all health states $h\geq h^{\prime}$. This completes the proof of Lemma 1.

∎
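To make the bound in the proof of Lemma 1 concrete, the sketch below evaluates $h^{\prime}=\lceil\log(C_{i}/C_{c})/\log(\phi)\rceil+1$ and checks that $V_{o}(h^{\prime})\leq\phi C_{i}$. The helper name `ordinary_horizon` and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def phi(gamma: float, lam_o: float) -> float:
    """Closed form for E[gamma^{t_{0,1}}] under ordinary monitoring."""
    mu_o = 1.0 - lam_o
    disc = 1.0 - 4.0 * lam_o * mu_o * gamma ** 2
    return (1.0 - math.sqrt(disc)) / (2.0 * lam_o * gamma)

def ordinary_horizon(gamma: float, lam_o: float, c_i: float, c_c: float) -> int:
    """h' = ceil(log(C_i / C_c) / log(phi)) + 1, as in the proof of Lemma 1."""
    p = phi(gamma, lam_o)
    return math.ceil(math.log(c_i / c_c) / math.log(p)) + 1

# Illustrative (assumed) parameters, not taken from the paper.
GAMMA, LAM_O, C_I, C_C = 0.9, 0.3, 1.0, 100.0

p = phi(GAMMA, LAM_O)
h_prime = ordinary_horizon(GAMMA, LAM_O, C_I, C_C)
print("h' =", h_prime)
# Check the bound V_o(h') = C_c * phi^h' <= phi * C_i used in the proof.
print("bound holds:", C_C * p ** h_prime <= p * C_I)
```

Note that $h^{\prime}$ is only a sufficient horizon: ordinary monitoring may already be optimal at health states below $h^{\prime}$.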

A-B Proof for Theorem 1

Proof.

We know that a policy $\pi$ is optimal if and only if

	$\displaystyle c(s,\pi(s))+\gamma\sum_{s^{\prime}}p(s^{\prime}\mid s,\pi(s))V_{\pi}(s^{\prime})$
	$\displaystyle=\min_{a}\left(c(s,a)+\gamma\sum_{s^{\prime}}p(s^{\prime}\mid s,a)V_{\pi}(s^{\prime})\right)$

holds for all states $s$ [12, Propositions 2.2 and 2.3]. Hence policy $\pi_{o}$ is optimal if and only if

	$\displaystyle c(s,o)+\gamma\sum_{s^{\prime}}p(s^{\prime}\mid s,o)V_{o}(s^{\prime})$
	$\displaystyle\leq c(s,i)+\gamma\sum_{s^{\prime}}p(s^{\prime}\mid s,i)V_{o}(s^{\prime}),$

for all states $s$ . This implies that $\pi_{o}$ is optimal if and only if

		$\displaystyle\gamma(\lambda_{o}V_{o}(h+1)+\mu_{o}V_{o}(h-1))$
		$\displaystyle\leq C_{i}+\gamma(\lambda_{i}V_{o}(h+1)+\mu_{i}V_{o}(h-1)),$		(4)

for all $h\geq 1$. Substituting the value of $V_{o}(\cdot)$ from Lemma 1, we have that $\pi_{o}$ is optimal if and only if

	$\displaystyle C_{c}\gamma(\lambda_{o}\phi^{h+1}+\mu_{o}\phi^{h-1})\leq C_{i}+C_{c}\gamma(\lambda_{i}\phi^{h+1}+\mu_{i}\phi^{h-1})$
	$\displaystyle\iff C_{c}\gamma\phi^{h-1}((\lambda_{o}-\lambda_{i})\phi^{2}+(\mu_{o}-\mu_{i}))\leq C_{i}$
	$\displaystyle\stackrel{(b)}{\iff}C_{c}\gamma\phi^{h-1}(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq C_{i},$

for all $h\geq 1$. Here (b) follows from the definitions $\mu_{o}=1-\lambda_{o}$ and $\mu_{i}=1-\lambda_{i}$, which give $\mu_{o}-\mu_{i}=\lambda_{i}-\lambda_{o}$.

Note that

	$\displaystyle\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq\frac{C_{i}}{C_{c}}$
	$\displaystyle\implies C_{c}\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq C_{i}$
	$\displaystyle\implies C_{c}\gamma\phi^{h-1}(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq C_{i}\;\;\forall h\geq 1$
	$\displaystyle\implies\gamma(\lambda_{o}V_{o}(h+1)+\mu_{o}V_{o}(h-1))\leq C_{i}+\gamma(\lambda_{i}V_{o}(h+1)+\mu_{i}V_{o}(h-1))\;\;\forall h\geq 1.$

Hence if $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq\frac{C_{i}}{C_{c}}$ is satisfied then the policy $\pi_{o}$ is optimal. This completes the proof for Theorem 1. ∎
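The sufficient condition of Theorem 1 is easy to evaluate numerically. The sketch below, under assumed parameter values, checks the condition $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq C_{i}/C_{c}$ and compares it against a direct evaluation of inequality (4) with $V_{o}(h)=C_{c}\phi^{h}$. The function names are hypothetical and introduced only for this illustration.

```python
import math

def phi(gamma: float, lam_o: float) -> float:
    """Closed form for E[gamma^{t_{0,1}}] under ordinary monitoring."""
    mu_o = 1.0 - lam_o
    disc = 1.0 - 4.0 * lam_o * mu_o * gamma ** 2
    return (1.0 - math.sqrt(disc)) / (2.0 * lam_o * gamma)

def theorem1_condition(gamma, lam_o, lam_i, c_i, c_c) -> bool:
    """True if gamma*(lam_i - lam_o)*(1 - phi^2) <= C_i / C_c."""
    p = phi(gamma, lam_o)
    return gamma * (lam_i - lam_o) * (1.0 - p ** 2) <= c_i / c_c

def inequality_4_holds(h, gamma, lam_o, lam_i, c_i, c_c) -> bool:
    """Directly evaluate inequality (4) at health state h using V_o(k) = C_c * phi^k."""
    p = phi(gamma, lam_o)
    mu_o, mu_i = 1.0 - lam_o, 1.0 - lam_i
    v_up, v_down = c_c * p ** (h + 1), c_c * p ** (h - 1)
    lhs = gamma * (lam_o * v_up + mu_o * v_down)
    rhs = c_i + gamma * (lam_i * v_up + mu_i * v_down)
    return lhs <= rhs

# Illustrative (assumed) parameters: expensive vs. cheap intensive monitoring.
for c_i in (50.0, 1.0):
    cond = theorem1_condition(0.9, 0.3, 0.6, c_i, 100.0)
    all_h = all(inequality_4_holds(h, 0.9, 0.3, 0.6, c_i, 100.0) for h in range(1, 51))
    print(f"C_i={c_i}: condition={cond}, (4) holds for h=1..50: {all_h}")
```

Because $\phi^{h-1}\leq 1$, inequality (4) is tightest at $h=1$, which is why a single scalar condition suffices.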

A-C Proof for Theorem 2

Proof.

Since $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})>\frac{C_{i}}{C_{c}}$, eqn. (4) is not satisfied for $h=1$, which implies that $\pi_{o}$ is not optimal. Hence, under the optimal policy, the patient is under intensive monitoring in at least one state. Define the Q-functions $Q^{*}(h,i)=C_{i}+\gamma(\lambda_{i}V^{*}(h{+}1)+\mu_{i}V^{*}(h{-}1))$ and $Q^{*}(h,o)=\gamma(\lambda_{o}V^{*}(h{+}1)+\mu_{o}V^{*}(h{-}1))$. Here $Q^{*}(h,a)$ denotes the cost incurred when action $a\in\{o,i\}$ is taken at the initial health state $h$ and the optimal policy is followed thereafter. Note that ${\pi^{*}}(h)=o$ if $Q^{*}(h,o)<Q^{*}(h,i)$, and ${\pi^{*}}(h)=i$ otherwise.

Suppose $Q^{*}(h,o)-Q^{*}(h,i)$ is monotonically decreasing as $h$ increases. Then, if action $o$ is optimal at some $h$, it is also optimal at $h+1$ and so on (i.e., $Q^{*}(h,o)-Q^{*}(h,i)\leq 0\implies Q^{*}(h+1,o)-Q^{*}(h+1,i)\leq 0$). As condition (a) already ensures that policy $\pi_{o}$ is not optimal, action $i$ has to be taken at some state under the optimal policy. Also, Lemma 1 part (b) shows that there exists $h^{\prime}$ such that the optimal action for all health states above $h^{\prime}$ is $o$, which implies that action $o$ is taken at some state. Hence the optimal policy has to be $\pi_{t,\bar{h}}$ for some $\bar{h}$.

Hence we just need to show that $Q^{*}(h,o)-Q^{*}(h,i)$ is monotonically decreasing as $h$ increases to prove that $\pi_{t,\bar{h}}$ is optimal.

We can show that

Q^{*}(h,o)-Q^{*}(h,i)=\gamma(\lambda_{i}-\lambda_{o})(V^{*}(h-1)-V^{*}(h+1))-C_{i}.

Now,

	$\displaystyle\big(Q^{*}(h+1,o)-Q^{*}(h+1,i)\big)-\big(Q^{*}(h,o)-Q^{*}(h,i)\big)$
	$\displaystyle=\gamma(\lambda_{i}-\lambda_{o})\Big(V^{*}(h)-V^{*}(h+2)-V^{*}(h-1)+V^{*}(h+1)\Big).$

Hence if $V^{*}(h)+V^{*}(h{+}1)\leq V^{*}(h{-}1)+V^{*}(h{+}2)$ is true for all $h\geq 1$ then $Q^{*}(h,o){-}Q^{*}(h,i)$ is monotonically decreasing with $h$ . Now we know using [12, Proposition 2.2] that

V^{*}(h)\leq\gamma\lambda_{o}V^{*}(h+1)+\gamma\mu_{o}V^{*}(h-1),

and

V^{*}(h+1)\leq\gamma\lambda_{o}V^{*}(h+2)+\gamma\mu_{o}V^{*}(h).

With some further manipulation, we can show that

	$\displaystyle V^{*}(h)+V^{*}(h+1)$	$\displaystyle\leq\frac{\gamma\lambda_{o}(1+\gamma\lambda_{o})}{1-\gamma^{2}\lambda_{o}\mu_{o}}V^{*}(h+2)$
		$\displaystyle+\frac{\gamma\mu_{o}(1+\gamma\mu_{o})}{1-\gamma^{2}\lambda_{o}\mu_{o}}V^{*}(h-1).$
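One way to carry out this manipulation is sketched below. The shorthand $a=\gamma\lambda_{o}$ and $b=\gamma\mu_{o}$ is introduced here only for readability and is not notation from the paper. Each of the two inequalities from [12, Proposition 2.2] is substituted into the other, using $a,b\geq 0$ and $1-ab>0$.

```latex
% Shorthand (assumed for this sketch): a = \gamma\lambda_o, \; b = \gamma\mu_o.
\begin{align*}
V^{*}(h) &\leq a V^{*}(h+1) + b V^{*}(h-1)
          \leq a\bigl(a V^{*}(h+2) + b V^{*}(h)\bigr) + b V^{*}(h-1) \\
\implies (1-ab)\,V^{*}(h) &\leq a^{2} V^{*}(h+2) + b V^{*}(h-1), \\[4pt]
V^{*}(h+1) &\leq a V^{*}(h+2) + b V^{*}(h)
            \leq a V^{*}(h+2) + b\bigl(a V^{*}(h+1) + b V^{*}(h-1)\bigr) \\
\implies (1-ab)\,V^{*}(h+1) &\leq a V^{*}(h+2) + b^{2} V^{*}(h-1).
\end{align*}
% Adding the two bounds and dividing by 1 - ab > 0:
\[
V^{*}(h)+V^{*}(h+1) \leq \frac{a(1+a)}{1-ab}\,V^{*}(h+2) + \frac{b(1+b)}{1-ab}\,V^{*}(h-1).
\]
```

Substituting back $a=\gamma\lambda_{o}$ and $b=\gamma\mu_{o}$ recovers the stated bound, with $1-ab=1-\gamma^{2}\lambda_{o}\mu_{o}$.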

Since $\lambda_{o}+\mu_{o}=1$, the coefficient $\frac{\gamma\lambda_{o}(1+\gamma\lambda_{o})}{1-\gamma^{2}\lambda_{o}\mu_{o}}$ is less than $1$ if and only if $\gamma\lambda_{o}(1+\gamma)<1$, which always holds for $\lambda_{o}\leq 0.5$ and $\gamma<1$. Hence if $\frac{\gamma\mu_{o}(1+\gamma\mu_{o})}{1-\gamma^{2}\lambda_{o}\mu_{o}}\leq 1$, then, as $V^{*}$ is non-negative, $V^{*}(h)+V^{*}(h{+}1)\leq V^{*}(h{-}1)+V^{*}(h{+}2)$ and hence $Q^{*}(h,o){-}Q^{*}(h,i)$ is monotonically decreasing with $h$. Under the additional condition that $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})>\frac{C_{i}}{C_{c}}$, this implies that $\pi_{t,\bar{h}}$ is the optimal policy for some threshold $\bar{h}$. ∎
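The threshold structure can also be explored numerically. The following is a minimal value-iteration sketch on a truncated state space $\{0,1,\ldots,H\}$, not the authors' implementation. It assumes that state $0$ is absorbing with terminal cost $C_{c}$ and, as a modeling assumption made only for this sketch, that an "improve" move at the top state $H$ leaves the state unchanged. All names and parameter values are hypothetical; the parameters are chosen so that both conditions of Theorem 2 appear to hold.

```python
def value_iteration(n_states, gamma, lam_o, lam_i, c_i, c_c, n_iters=2000):
    """Value iteration on health states 0..n_states (truncated model).

    Assumptions of this sketch: V(0) = c_c is fixed (critical state), and at
    the top state an upward move stays in place.
    Returns (values, policy) where policy[h-1] is 'o' or 'i' for h = 1..n_states.
    """
    mu_o, mu_i = 1.0 - lam_o, 1.0 - lam_i
    v = [0.0] * (n_states + 1)
    v[0] = c_c
    policy = ["o"] * (n_states + 1)
    for _ in range(n_iters):
        new_v = v[:]
        for h in range(1, n_states + 1):
            up = v[min(h + 1, n_states)]   # reflecting top boundary (assumed)
            down = v[h - 1]
            q_o = gamma * (lam_o * up + mu_o * down)
            q_i = c_i + gamma * (lam_i * up + mu_i * down)
            # Ordinary monitoring is chosen only if it is strictly cheaper.
            if q_o < q_i:
                new_v[h], policy[h] = q_o, "o"
            else:
                new_v[h], policy[h] = q_i, "i"
        v = new_v
    return v, policy[1:]

# Illustrative (assumed) parameters, not taken from the paper.
values, policy = value_iteration(
    n_states=60, gamma=0.9, lam_o=0.45, lam_i=0.7, c_i=1.0, c_c=100.0
)
# One character per health state h = 1..60.
print("".join(policy))
```

For parameters like these one would expect a run of `i` for low health states followed by `o` for higher ones, consistent with Theorem 2, although the truncation means this is only a numerical illustration of the infinite-state result.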