Unstable C2C Devices - Suggestions

iquix · July 2, 2024, 10:29am

We all know that 3rd party C2C (cloud-to-cloud) devices don’t work reliably on SmartThings.
This needs to be fixed.

1. Technologies of C2C

First, let’s review the technology underlying C2C, especially SmartThings Schema.

When a connection is established between the ST cloud and a 3rd party server, two separate OAuth tokens are created.

[token1] ST cloud → 3rd party server
[token2] 3rd party server → ST cloud

Take as an example a wall switch that is connected through a 3rd party server.

[token1] is used when we send a command to the device.
e.g. Pressing on/off in the SmartThings app sends the command to the 3rd party server, which in turn sends the ‘on’ command to the switch.

[token2] is used to make a callback to the ST cloud when the device status changes.
e.g. When someone presses the physical button on the wall switch, the device status changes, and this change has to be sent to the ST cloud by a callback.
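
To make the two directions concrete, here is a tiny sketch of the model described above. The names `C2CLink`, `Direction` and `token_for` are made up for illustration and are not part of any SmartThings API; the token strings are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    COMMAND = "ST cloud -> 3rd party server"    # uses [token1]
    CALLBACK = "3rd party server -> ST cloud"   # uses [token2]


@dataclass
class C2CLink:
    """Hypothetical record of one linked account: one token per direction."""
    token1: str  # presented by the ST cloud when it calls the 3rd party server
    token2: str  # presented by the 3rd party server when it calls back to ST

    def token_for(self, direction: Direction) -> str:
        """Return the token that authorizes a request in the given direction."""
        return self.token1 if direction is Direction.COMMAND else self.token2


link = C2CLink(token1="t1-example", token2="t2-example")
# On/off pressed in the app: ST cloud -> 3rd party server
command_token = link.token_for(Direction.COMMAND)
# Physical button pressed on the switch: 3rd party server -> ST cloud
callback_token = link.token_for(Direction.CALLBACK)
```

The point of the sketch is only that the two tokens are independent: one of them can break while the other keeps working.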

2. What makes C2C so unstable?

Both servers (the ST cloud and the 3rd party server) need to keep their OAuth tokens safe, and the OAuth server on each side has to respond correctly to OAuth refresh requests.
But this is not what happens with SmartThings C2C Schema in the real world.

Actually, sending commands to C2C devices [token1] is relatively okay; most of the problems come from the status callback [token2].

Since the discovery and state refresh requests are made every 24 hours, a broken [token1] is detected by the SmartThings cloud within 24 hours at the latest.

24 hours is such a LONG time for users, but at least the problem does get detected.

Then a push notification is sent to the user from the SmartThings app, and the user is forced to log in to the 3rd party account again, which fixes both [token1] and [token2].

…

However, when [token2] (= the callback token) is broken (whether the 3rd party server lost it through an error, or the ST cloud failed to respond to the [token2] refresh request), the ST cloud cannot easily detect it.

In this situation,

the wrong device status is shown in the ST app.
Worse, when a user creates an automation that is triggered by a C2C device, the automation won’t work.
Worst of all, there’s NO WAY to fix a [token2] error. The only solution is to REMOVE the C2C connection and RECREATE it, which removes the device from all the automations and scenes that use it.

3. Google’s case

Now, let’s see how other companies handle this.

We all know that Google Home integrates the most IoT devices and works stably, even though it relies heavily on C2C.
So let’s look inside Google Home.

Google Home allows a FIXED token to be used for the callback.
Below is a screen capture of the Service Account Key settings of Google’s Home Graph (Google Home C2C IoT) for my Home Assistant server.

They know a fixed key could cause security problems, so Google says they crawl the web for possible leaks of the fixed token, and disable the token if they find one.

Google also provides something called “Workload Identity Federation”.
It uses a short-lived token, like ST does, but it relies on Workload Identity Pool Providers from services like AWS, Azure etc., which are reliable key maintainers.
Google KNOWS that those small IoT companies are NOT capable of keeping an OAuth callback token healthy.

Anyway, for now, Google allows BOTH a FIXED token and Workload Identity Federation.
Which means they still allow a fixed token to be used for the status callback (= [token2]).
That’s because, even though Google knows about the security risk, they know even better how limited their small partners’ capabilities are, and moreover, they CARE ABOUT USERS’ EXPERIENCE.

4. My suggestions

Over the past several years, users of SmartThings C2C have been pissed off by these errors, especially errors with the callback token [token2].

The best solution is to keep the OAuth servers healthy on both sides (ST [token1] and 3rd party [token2]), BUT in the real world this seems impossible to achieve.

I suggest the following.

ST app should allow users to FORCE re-login to the partner’s server even if it seems OK from SmartThings side.


For now, a user can’t re-login to the partner server unless the ST app shows it as ‘disconnected’.
As I mentioned above, even if it looks okay from the SmartThings side, it may be broken in reality.
And the only way to fix this is to DELETE the connection and LOG IN AGAIN, but this removes all the device names, automations, scenes etc… and after logging in again, the user has to set all of those up again. VERY BAD USER EXPERIENCE.
If the ST app allowed users to FORCE a re-login, users could fix C2C problems more easily without losing all their settings.

This is only a temporary solution to the problem, but it can be applied IMMEDIATELY and would make the user experience better.

Poll frequently and detect errors

ST should not rely on callbacks from partners. We all know that these are not reliable at all.
The polling interval should be much shorter than once every 24 hours. As far as I know, Google Home polls its partners more frequently than SmartThings polls its C2C partners.
Also, ST could detect a broken command path by receiving errors while polling, and detect a broken callback path from discrepancies between the polled status and the callback status.
If ST detects an error on either side, it should mark that C2C connection as ‘disconnected’ and notify the user to re-login.
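
The detection logic described above could look roughly like the sketch below. This is only an illustration of the idea, not how SmartThings works internally: `check_link`, `LinkHealth` and `PollError` are hypothetical names, `poll` stands in for a state refresh request, and device states are simplified to plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class PollError(Exception):
    """Hypothetical error raised when the partner rejects or fails a poll."""


@dataclass
class LinkHealth:
    command_path_ok: bool             # [token1] direction
    callback_path_ok: Optional[bool]  # [token2] direction, None = could not be checked

    @property
    def needs_relogin(self) -> bool:
        """True when either direction is known to be broken."""
        return (not self.command_path_ok) or (self.callback_path_ok is False)


def check_link(
    poll: Callable[[], Dict[str, str]],
    last_callback_state: Dict[str, str],
) -> LinkHealth:
    """Poll the partner and compare the result with the last callback-reported state."""
    try:
        polled = poll()
    except PollError:
        # Polling itself failed, so the command direction is broken.
        # Nothing can be said about the callback direction in this case.
        return LinkHealth(command_path_ok=False, callback_path_ok=None)

    # Devices whose polled state differs from what the callbacks told us.
    mismatched = [
        device_id
        for device_id, state in polled.items()
        if last_callback_state.get(device_id) != state
    ]
    return LinkHealth(command_path_ok=True, callback_path_ok=not mismatched)


# Example: the switch is really on, but no callback ever reported it.
health = check_link(lambda: {"switch-1": "on"}, {"switch-1": "off"})
if health.needs_relogin:
    print("mark connection as 'disconnected' and notify the user")
```

A real implementation would probably need a grace period before treating a mismatch as a failure, because a callback may simply still be in flight when the poll happens.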

Change the callback token

The fundamental solution is to make the callback system robust.

A long-lived access token could be one solution. If only the expiration date of the callback token [token2] is extended, neither ST nor its partners need to change their code, so backward compatibility is kept.
I know this can be a security concern for ST, but USERS CAN’T TOLERATE THIS UNSTABLE C2C SYSTEM ANY LONGER.
ST could also adopt Google’s solution for short-lived tokens.

I hope SmartThings has the WILL to fix unstable C2C devices.

nayelyz · July 3, 2024, 2:57pm

Hi, @iquix

Thank you for the feedback. We understand it’s frustrating when third-party clouds lose the Reciprocal Access tokens, especially because it’s their responsibility to handle them correctly and ensure the callbacks don’t get interrupted and cause issues for users.
That said, the Schema developer can force the grantCallback interaction, which issues a new reciprocal token and fixes the callbacks, by sending requestGrantCallbackAccess set to True in the discovery response. The discovery request is executed every 24 hours, as you mentioned, or when we go to Linked services and select the schema to refresh it.
Here’s the info about it:

So, the third-party cloud should manage tokens and errors well enough to detect when a reciprocal access token’s refresh is consistently failing, and correct this by sending the discovery response described above (only when needed; overuse of this flag is strongly discouraged).
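
As a rough sketch of what that could look like on the third-party side: the helper names `should_request_grant` and `build_discovery_response` and the failure threshold of 3 are assumptions for illustration only. The header fields and the top-level placement of requestGrantCallbackAccess are written from memory of the ST Schema format and should be checked against the official reference before use.

```python
from typing import Any, Dict, List


def should_request_grant(consecutive_refresh_failures: int, threshold: int = 3) -> bool:
    """Only ask for a new grant when refresh is failing consistently.

    The threshold is an assumed value; the flag should not be sent routinely.
    """
    return consecutive_refresh_failures >= threshold


def build_discovery_response(
    request_id: str,
    devices: List[Dict[str, Any]],
    consecutive_refresh_failures: int,
) -> Dict[str, Any]:
    """Build a discovery response, adding the flag only when it is needed."""
    response: Dict[str, Any] = {
        "headers": {
            "schema": "st-schema",
            "version": "1.0",
            "interactionType": "discoveryResponse",
            "requestId": request_id,
        },
        "devices": devices,
    }
    if should_request_grant(consecutive_refresh_failures):
        # Assumed placement: top level of the discovery response.
        response["requestGrantCallbackAccess"] = True
    return response


healthy = build_discovery_response("req-1", [], consecutive_refresh_failures=0)
broken = build_discovery_response("req-2", [], consecutive_refresh_failures=5)
```

With this shape the flag is absent from normal responses and appears only after repeated refresh failures, which matches the "only when needed" advice.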

In case the third party doesn’t support this solution, you can call SmartThings Support and ask them to forcefully invalidate the accessTokens of the linked account so that you can log in again.

Also, it’s hard for the SmartThings side to know that the third party is failing to refresh the reciprocal access token because it lost the refresh token; it could instead be a temporary issue with their system sending incorrect requests. So, if SmartThings automatically logged you out, it could cause more confusion or inconvenience.

Shina_System_Co_Ltd · July 10, 2024, 6:34am

ST responds with TOKEN-EXPIRED or INVALID-TOKEN when a callback request is made with an invalid access token.

Refreshing the access token as soon as we see this response is not difficult to implement. (We refresh immediately and try the callback again.)

However, if the refresh token is lost, it can become a significant problem, as mentioned above (maybe [token3]?).
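
For readers who haven’t built this, the refresh-and-retry pattern can be sketched as follows. This is not our production code: `send_with_refresh` and `TokenRejected` are illustrative names, and `send_callback` and `refresh_access_token` are placeholders for whatever HTTP calls a real integration makes.

```python
from typing import Any, Callable, Dict


class TokenRejected(Exception):
    """Hypothetical error: ST answered TOKEN-EXPIRED or INVALID-TOKEN."""


def send_with_refresh(
    send_callback: Callable[[str, Dict[str, Any]], None],
    refresh_access_token: Callable[[], str],
    access_token: str,
    payload: Dict[str, Any],
) -> str:
    """Send a state callback; on token rejection, refresh once and retry.

    Returns the access token that finally worked so the caller can store it.
    """
    try:
        send_callback(access_token, payload)
        return access_token
    except TokenRejected:
        # Refresh immediately and retry exactly once. If the refresh itself
        # fails, or the new token is also rejected, the error propagates so
        # the caller can record a failure instead of looping forever.
        new_token = refresh_access_token()
        send_callback(new_token, payload)
        return new_token
```

Note that this only covers an expired access token. It does nothing for the case where the refresh token itself is gone, which is the harder problem.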

This is an issue we have actually experienced.

It is unclear how SmartThings has implemented this. I am curious if they have implemented the refresh token as a read-and-burn token or if it can be reused multiple times.

It is still possible, albeit rare, for the refresh token to be broken (even a network problem lasting a few hundred milliseconds can cause this), so the first proposed solution is very reasonable. I believe this feature should definitely be included.

requestGrantCallbackAccess is a flag that didn’t exist when I was developing, but it seems to be newly added. It looks useful in situations where the refresh token is compromised.

iquix · July 10, 2024, 8:45am

In the real world, 3rd party clouds never detect token errors, and the users’ frustration never ends.

I can’t call SmartThings Support for help every time I get this kind of error and then wait a couple of days for ST to forcefully invalidate the accessToken. This happens very frequently.

That’s why I suggest a force re-login feature (which amounts to invalidating the access token) in Linked Services in the ST app.

From the viewpoint of 3rd party companies, fixing C2C problems does not seem easy.
Not enough information on new changes (as the Shinasys engineer mentioned), lack of manpower to fix the problem, etc… That’s why C2C devices never become stable.

I hope SmartThings lets its 3rd party partners know about good features like the requestGrantCallbackAccess flag (in some sort of educational session at SDC? or with good sample code, etc.),
so that ST C2C connections become healthier.
