simplexmq/rfcs/done/2022-06-13-db-sync.md

DB access and processing messages for iOS notification service extension

Problem

The only way to receive/process notifications is via a separate NSE process that requires concurrent DB and network access.

SQLite concurrency does not work, so we need to sync database access.

The problem is complex because we do not directly control db access from the app: it can be triggered by an arriving message, and it may fail to complete if the app is suspended in the background.

So we need to prevent db access from starting when we know the app is about to be suspended.

The last problem is how to receive and process messages in NSE: should it use the recently added GET command, or should it subscribe to the connections that receive messages and process messages normally?

To summarize, 3 problems need to be solved:

  1. sync db access between 2 (or more, if we add share extension) processes

  2. prevent access from starting when the process is due to suspend, only complete operations.

  3. receive and process agent messages in NSE.

Proposed solution

For problem 1, we can use POSIX semaphores from our core code, in the same bracket that manages the database connection - it would wait for the semaphore to be free and release it once the db operation is complete.
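
A minimal sketch of such a bracket, assuming the `System.Posix.Semaphore` module from the `unix` package. The semaphore name, the file mode and the function names are hypothetical, and the core code may structure this differently:

```haskell
import Control.Exception (bracket_)
import System.Posix.Semaphore

-- Hypothetical name shared by the app and the NSE process.
dbSemaphoreName :: String
dbSemaphoreName = "/chat-db-lock"

-- Open the named semaphore, creating it with the value 1 if it does not
-- exist yet, so that at most one process can hold it at a time.
openDBSemaphore :: IO Semaphore
openDBSemaphore = semOpen dbSemaphoreName flags 0o600 1
  where
    flags = OpenSemFlags {semCreate = True, semExclusive = False}

-- Run a database action while holding the semaphore.
-- bracket_ releases the semaphore even if the action throws.
withDBLock :: Semaphore -> IO a -> IO a
withDBLock sem = bracket_ (semWait sem) (semPost sem)
```

One caveat of this sketch: if a process is killed while holding the semaphore, the value is not restored, so some recovery strategy would still be needed.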

For problem 2, the app needs to communicate when it goes to the background, so that new database access is prevented from starting and operations in progress complete before the suspension. This would set some aboutToSuspend STM flag (or the opposite) that would prevent operations from progressing (using STM retry, which would block until the flag has the value allowing the operation to progress).
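
The blocking behaviour of such a flag could look roughly like this, using the `stm` package. The type and function names are illustrative assumptions, not existing agent code:

```haskell
import Control.Concurrent.STM

-- Hypothetical flag: True while the app is about to be suspended.
newtype SuspendFlag = SuspendFlag (TVar Bool)

newSuspendFlag :: IO SuspendFlag
newSuspendFlag = SuspendFlag <$> newTVarIO False

setAboutToSuspend :: SuspendFlag -> Bool -> IO ()
setAboutToSuspend (SuspendFlag v) = atomically . writeTVar v

-- Blocks via STM retry while the flag is set; the transaction is
-- re-run automatically when the TVar changes.
waitCanProceed :: SuspendFlag -> STM ()
waitCanProceed (SuspendFlag v) = do
  suspending <- readTVar v
  if suspending then retry else pure ()

-- Gate an operation on the flag before starting it. An operation that
-- has already started is not interrupted.
gated :: SuspendFlag -> IO a -> IO a
gated flag op = atomically (waitCanProceed flag) >> op
```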

Several possibilities can be considered:

  • use this flag in the bracket that provides DB connections. While simple, it may block some operations in the middle and may also lead to a situation where a network operation succeeds but database access is blocked, and the database is not updated.
  • use this flag to stop network operations that would require database updates - like sending messages, subscriptions and ACK - all these operations would require database access once they succeed.
  • use two flags, for both cases above, but set them at different times since going to background - block new network operations as soon as the app goes to the background and block database access once the app is about to be suspended.

The last option seems more robust. To do it, there will be an api allowing the app to communicate its phase:

  • app going to the background would trigger blocking new network operations and start a new background task - background phase.
  • the background task receiving a completion warning, or some time passing after it is started (probably 20 seconds), whichever happens earlier, would trigger the call blocking db access - suspended phase.
  • app becoming active would trigger unblocking both flags - active phase.

/_app phase <phase> where phase can be one of the above values.
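
As an illustration of the two-flag option, the phases could map to the flags as sketched below. The type names, the function names and the exact phase spellings (`active`, `background`, `suspended`) are assumptions made for this example:

```haskell
import Control.Concurrent.STM

data AppPhase = Active | Background | Suspended
  deriving (Eq, Show)

-- One flag gates new network operations, the other gates db access.
data PhaseFlags = PhaseFlags
  { networkAllowed :: TVar Bool,
    dbAllowed :: TVar Bool
  }

newPhaseFlags :: IO PhaseFlags
newPhaseFlags = PhaseFlags <$> newTVarIO True <*> newTVarIO True

-- Parse the <phase> argument of the command (assumed spellings).
parsePhase :: String -> Maybe AppPhase
parsePhase "active" = Just Active
parsePhase "background" = Just Background
parsePhase "suspended" = Just Suspended
parsePhase _ = Nothing

-- background blocks only new network operations,
-- suspended blocks db access as well, active unblocks both.
setAppPhase :: PhaseFlags -> AppPhase -> STM ()
setAppPhase flags phase = do
  writeTVar (networkAllowed flags) (phase == Active)
  writeTVar (dbAllowed flags) (phase /= Suspended)

-- Block (via retry inside check) until the corresponding flag allows progress.
waitNetworkAllowed, waitDBAllowed :: PhaseFlags -> STM ()
waitNetworkAllowed flags = readTVar (networkAllowed flags) >>= check
waitDBAllowed flags = readTVar (dbAllowed flags) >>= check
```

Setting both flags in one STM transaction means an operation never observes a half-applied phase change.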

NSE would also use the same phases:

  • sending active when it is started (the process starts as active, but it is possible that a new notification arrives in the same process after the previous one sent background/suspended)
  • sending background + suspended (or suspended should set both flags) once it is finished processing the notification, provided no new notification arrived and started processing - this should be tracked in NSE.
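
The tracking described above could be sketched as a counter of notifications in flight. It is written in Haskell for consistency with the core code, although in practice this logic would live in the NSE itself. `processNotification` and the `sendPhase` callback are hypothetical names:

```haskell
import Control.Concurrent.MVar
import Control.Exception (finally)
import Control.Monad (when)

-- The MVar holds the number of notifications currently being processed.
-- It also serializes phase changes, so that "suspended" cannot be sent
-- after a newer notification has already sent "active".
processNotification :: MVar Int -> (String -> IO ()) -> IO a -> IO a
processNotification inFlight sendPhase process = do
  modifyMVar_ inFlight $ \n -> do
    sendPhase "active"
    pure (n + 1)
  process `finally` modifyMVar_ inFlight (\n -> do
    -- Only the last notification still in flight suspends the core.
    when (n == 1) $ sendPhase "suspended"
    pure (n - 1))
```

Here `suspended` is assumed to set both flags, as suggested in the second item above.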

For problem 3, NSE can do one of the following:

  • use SUB and process messages normally - the downside is that the app will have to resubscribe and it has to be tracked.
  • use GET and process messages by pushing them through the processing function - the downside is that a rewiring of message processing is needed.
  • use GET but deliver messages through the same queue as when they arrive normally (in which case the getMessage agent function should not return the message, but will return a flag showing whether the message was received, or, possibly, will return a message but the message would also be sent to the queue?).
  • process messages in agent manually and in chat via the queue.

One of the downsides of GET is that it requires calling GET again after ACK. We could have two variants of ACK (or an additional ACK parameter) - one that never delivers a new message, and another one that does. In this case, if the NSE needs to process the next message (when the current one has no notification flag), it can call the ACK that delivers the next message. But it is probably a premature optimization, and having general support for batched commands would add more value.

An additional problem is concurrency in NSE - if a new notification from the same queue arrives before the current one finishes processing in the same process, one of the following can happen:

  • the 2nd notification naively calls GET and receives the same message.
  • the 2nd notification waits until the first has finished processing, in which case it can run out of time.

The problem is that the app won't know it's the same queue, as nId is encrypted, so the agent should handle this scenario when the new call to getMessage is made before the previous one finished processing, and differentiate between calls made for additional messages (possibly, getMessage should include previous message ID, if it is available) and the first call.

EDIT: GETs have to be sent from UI to chat and from chat to agent as function calls, but the agent will have to queue get calls to make sure they return different messages. GET call would return message flags (incl. notification flag), so that the UI can send the next GET if needed without waiting.
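
One way the agent could queue GET calls is a lock per queue, so that a second call only reaches the server after the first message has been acknowledged. The sketch below uses an in-memory stand-in for the server queue. `SimQueue`, `getMessageQueued` and `demo` are hypothetical, and the real getMessage would also return message flags:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Hypothetical in-memory stand-in for one server queue: GET returns the
-- first undelivered message without removing it, ACK removes it.
newtype SimQueue = SimQueue (MVar [Int])

simGet :: SimQueue -> IO (Maybe Int)
simGet (SimQueue q) = do
  msgs <- readMVar q
  pure $ case msgs of
    [] -> Nothing
    m : _ -> Just m

simAck :: SimQueue -> IO ()
simAck (SimQueue q) = modifyMVar_ q (pure . drop 1)

-- Calls for the same queue are serialized by the lock, so each call
-- receives a different message.
getMessageQueued :: MVar () -> SimQueue -> IO (Maybe Int)
getMessageQueued lock q = withMVar lock $ \_ -> do
  msg <- simGet q
  case msg of
    Nothing -> pure ()
    Just _ -> simAck q -- message processing would happen before this ACK
  pure msg

-- Two concurrent calls, as two notifications for the same queue would make.
demo :: IO (Maybe Int, Maybe Int)
demo = do
  q <- SimQueue <$> newMVar [1, 2]
  lock <- newMVar ()
  r1 <- newEmptyMVar
  r2 <- newEmptyMVar
  _ <- forkIO $ getMessageQueued lock q >>= putMVar r1
  _ <- forkIO $ getMessageQueued lock q >>= putMVar r2
  (,) <$> takeMVar r1 <*> takeMVar r2
```

The two results in `demo` are always different messages, in either order. This sketch does not address the second call running out of time while it waits.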

Considered alternative: include notification content in the message and have NSE only perform decryption, without any network IO. In this case notification content would be in SEND and in NMSG, e2e encrypted.

While promising, as it solves network coordination issues and makes GET unnecessary, it creates multiple other problems, so it was rejected:

  • message content is exposed to centralized ntf and apns servers, creating an additional attack vector.
  • it adds complexity in security critical parts of the stack - double ratchet encryption, as it requires either storing message keys and using different IVs for notifications, or initializing a completely separate ratchet for notification content.
  • it reduces the maximum size of the message, as part of it would be taken by the notification content.
  • it makes user experience worse, as:
    • it would not accelerate handshake for new contacts and for file delivery - this approach only works for content messages.
    • it would open the app without the new messages - the users would have to wait until the messages are received. It is also bad for "security optics" - the users might think that the message content was exposed to notifications.