General background information for all approaches
As an example, consider transferring a dishes reference list to a device. Assume the device sends its request to the URL “/service/dishes/update” and that the exchange uses HTTP with JSON payloads (www.json.org). The server has a table “dishes” with the fields: id (entry identifier), name (name of the dish), updated (the moment the dish was last updated; it is better to store it with a time zone in the form “YYYY-MM-DDThh:mm:ssTZD”, e.g. “1997-07-16T19:20:30+01:00”), and is_deleted (a flag marking a removed entry).
A note about the last field: its default value is 0. In an application where entities are synchronized between client and server, physically deleting data from the server is not recommended (to avoid errors), so removed dishes get is_deleted = 1 instead. Once a device receives an entity with is_deleted = 1, it deletes that entity locally.
In all approaches described below, the server returns an array of JSON objects to the device (the array can be empty).
Principles of data updating on a device
- If the received element exists on the device and isDeleted = 0, then it is updated
- If the received element doesn’t exist on the device and isDeleted = 0, then it is added
- If the received element exists on the device and isDeleted = 1, then it is deleted
- If the received element doesn’t exist on the device and isDeleted = 1, then nothing is done
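The four rules above collapse into one upsert and one delete. A minimal sketch in Python, assuming the device keeps dishes in a dict keyed by id and that every received object carries the fields id, name, updated and isDeleted (the function name `apply_updates` and the in-memory store are illustrative, not part of the original design):

```python
def apply_updates(local, received):
    """Apply a server response to the device's local store (a dict keyed by id)."""
    for item in received:
        if item["isDeleted"] == 1:
            # Exists on the device: delete it. Missing: nothing to do.
            local.pop(item["id"], None)
        else:
            # Exists on the device: update it. Missing: add it.
            local[item["id"]] = item
    return local


# Hypothetical sample data, made up for this illustration.
device_store = {
    1: {"id": 1, "name": "Soup", "updated": "2013-01-01 00:00:00", "isDeleted": 0},
    2: {"id": 2, "name": "Salad", "updated": "2013-01-01 00:00:00", "isDeleted": 0},
}
response = [
    {"id": 1, "name": "Hot soup", "updated": "2013-01-02 00:00:00", "isDeleted": 0},
    {"id": 2, "name": "Salad", "updated": "2013-01-02 00:00:00", "isDeleted": 1},
    {"id": 3, "name": "Pie", "updated": "2013-01-02 00:00:00", "isDeleted": 0},
    {"id": 4, "name": "Stew", "updated": "2013-01-02 00:00:00", "isDeleted": 1},
]
apply_updates(device_store, response)
```

After the call, the store holds dish 1 (renamed) and dish 3; dish 2 is gone and dish 4 was never added.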
Approach 1: All data is always synchronized
This is the simplest method. The device requests the list of dishes from the server, and the server sends the entire list every time. The list does not need to be sorted. An example of the request: null, or “{}”
Advantages:
- simple server-side logic – always send everything
- simple logic on the device – always overwrite everything
Disadvantages:
- if the list is requested often (every 10 minutes), it generates a lot of network traffic
- if the list is requested seldom (once a day), the data may be out of date
When to use:
- applications with little traffic
- transfer of data that doesn’t change often (list of cities, categories)
- transfer of application settings
- at the beginning of a project, for the first prototype of the mobile application
Approach 2: Only updated data is synchronized
The device requests the list of dishes updated since the last synchronization. The list arrives sorted by “updated” in ascending order (this is not obligatory, just convenient). The device stores the “updated” value of the last dish it received and sends it to the server in the parameter “lastUpdated” with the next request. The server returns the dishes that are newer than “lastUpdated” (updated > lastUpdated). In the first request, “lastUpdated” = null. An example of the request: { lastUpdated: “2013-01-01 00:00:00” }
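The server side of this approach can be sketched with Python's built-in sqlite3 module. The table layout follows the “dishes” table described earlier; the helper names `make_dishes_db` and `fetch_updated` and the sample rows are assumptions made for this illustration.

```python
import sqlite3

COLUMNS = "id, name, updated, is_deleted"


def make_dishes_db():
    """Build an in-memory 'dishes' table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dishes ("
        "id INTEGER PRIMARY KEY, name TEXT, updated TEXT, "
        "is_deleted INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO dishes (id, name, updated, is_deleted) VALUES (?, ?, ?, ?)",
        [
            (1, "Soup", "2012-12-31 10:00:00", 0),
            (2, "Salad", "2013-01-02 09:30:00", 0),
            (3, "Pie", "2013-01-03 18:45:00", 1),
        ],
    )
    return conn


def fetch_updated(conn, last_updated=None):
    """Return dishes newer than last_updated, oldest first; None returns all."""
    if last_updated is None:
        sql = f"SELECT {COLUMNS} FROM dishes ORDER BY updated"
        params = ()
    else:
        # Timestamps kept in one fixed text format compare correctly as strings.
        sql = f"SELECT {COLUMNS} FROM dishes WHERE updated > ? ORDER BY updated"
        params = (last_updated,)
    return conn.execute(sql, params).fetchall()


conn = make_dishes_db()
rows = fetch_updated(conn, "2013-01-01 00:00:00")
next_last_updated = rows[-1][2]  # the device keeps this for its next request
```

Here the request with lastUpdated = “2013-01-01 00:00:00” returns dishes 2 and 3, and a follow-up request with the stored value returns an empty list.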
Approach 3: Chunk by chunk synchronization
Mobile devices have little random access memory. If the list consists of 3,000 dishes, parsing one large JSON string from the server into device objects may exhaust memory. In that case the application will either crash or fail to save the 3,000 dishes. Even if the device manages to receive the whole string, the application will perform poorly during synchronization (interface lags, jerky scrolling, etc.). So the list has to be requested in small chunks. To do so, the device sends one more parameter (“amount”) that sets the chunk size. The returned list must be sorted by the field “updated” in ascending order. As in the previous approach, the device remembers the “updated” value of the last entity it received and sends it in the field “lastUpdated”. If the server returns exactly “amount” entities, the device continues the synchronization and sends another request with the new “lastUpdated”. If the server returns fewer entities, it has no more new data and the synchronization finishes.
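The device-side loop can be sketched as follows. To keep the example self-contained, the server is simulated by a plain function over an in-memory list; `server_chunk` and `sync_in_chunks` are hypothetical names, and all sample dishes have distinct “updated” values, which this approach relies on.

```python
def server_chunk(dishes, last_updated, amount):
    """Simulated server: up to `amount` dishes newer than last_updated, oldest first."""
    newer = [d for d in dishes if last_updated is None or d["updated"] > last_updated]
    newer.sort(key=lambda d: d["updated"])
    return newer[:amount]


def sync_in_chunks(dishes, amount):
    """Device loop: keep requesting until a chunk is shorter than `amount`."""
    received = []
    requests = 0
    last_updated = None  # first request: lastUpdated = null
    while True:
        chunk = server_chunk(dishes, last_updated, amount)
        requests += 1
        received.extend(chunk)
        if len(chunk) < amount:
            # Fewer entities than requested: the server has nothing newer.
            return received, requests
        last_updated = chunk[-1]["updated"]


# Seven made-up dishes, one per day, each with a distinct timestamp.
sample = [{"id": i, "updated": f"2013-01-{i:02d} 00:00:00"} for i in range(1, 8)]
dishes_on_device, request_count = sync_in_chunks(sample, amount=3)
```

With seven dishes and a chunk size of 3, the device makes three requests (3, 3 and 1 entities) and ends up with all seven dishes in ascending “updated” order.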
Approach 4: Correct chunk by chunk synchronization
In the previous approach, a table may contain 250 dishes with an identical “updated” value (e.g. “2013-01-10 12:34:56”) while the chunk size is 100; then only the first 100 entities are received. The remaining 150 are cut off by the strict condition (updated > lastUpdated). Why? After the first 100 entities are received, “lastUpdated” is set to “2013-01-10 12:34:56”, and the next request carries the condition (updated > “2013-01-10 12:34:56”). Relaxing the condition (updated >= “2013-01-10 12:34:56”) does not help, because the device would then request the same first 100 entities endlessly. Identical “updated” values are not that rare. For example, when data is imported from a text file, the field “updated” is set to NOW(). Importing a file with thousands of lines can take less than one second, so the whole reference list may end up with the same “updated” value. To fix this, the device needs an additional dish field that is unique at least within one moment (“updated”). The field “id” is unique across the whole table, so it is synchronized as well. The implementation looks as follows. The server returns a list sorted by “updated” and then by “id”, and the device requests data using “lastUpdated” and a new parameter “lastId”. The server gets a more complicated selection condition: ((updated > lastUpdated) OR (updated = lastUpdated AND id > lastId)).
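The 250-dish scenario can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: the table is reduced to the two columns that matter here, and the names `make_same_stamp_db`, `fetch_chunk` and `sync_all` are invented for the illustration.

```python
import sqlite3


def make_same_stamp_db(count=250, stamp="2013-01-10 12:34:56"):
    """In-memory table where every dish has the same 'updated' value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dishes (id INTEGER PRIMARY KEY, updated TEXT)")
    conn.executemany(
        "INSERT INTO dishes (id, updated) VALUES (?, ?)",
        [(i, stamp) for i in range(1, count + 1)],
    )
    return conn


def fetch_chunk(conn, amount, last_updated=None, last_id=None):
    """Server side: next `amount` rows after the (last_updated, last_id) position."""
    if last_updated is None:
        sql = "SELECT id, updated FROM dishes ORDER BY updated, id LIMIT ?"
        params = (amount,)
    else:
        sql = (
            "SELECT id, updated FROM dishes "
            "WHERE updated > ? OR (updated = ? AND id > ?) "
            "ORDER BY updated, id LIMIT ?"
        )
        params = (last_updated, last_updated, last_id, amount)
    return conn.execute(sql, params).fetchall()


def sync_all(conn, amount):
    """Device side: page through the table, remembering lastUpdated and lastId."""
    received, last_updated, last_id = [], None, None
    while True:
        chunk = fetch_chunk(conn, amount, last_updated, last_id)
        received.extend(chunk)
        if len(chunk) < amount:
            return received
        last_id, last_updated = chunk[-1]


conn = make_same_stamp_db()
all_rows = sync_all(conn, amount=100)
```

With the combined condition the device receives chunks of 100, 100 and 50 rows, so none of the 250 dishes is lost even though they all share one timestamp.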
Approach 5: Synchronization when the data on device is known
The previous approaches ignore the fact that the server does not actually know how successfully the data was saved on the device. The device may simply fail to save half of the data because of unexpected errors. That is why it would be good to receive confirmation from the device that all (or not all) dishes were saved. Besides, a user may configure the application to need only part of the data, for example to synchronize dishes from only 2 cities out of 10. The synchronization approaches described above cannot provide that.
The idea is the following. In a separate table (“stored_item_list”), the server stores information about the dishes that are on each device. It can be just a list of “id – updated” pairs. This table holds the “id – updated” pairs of all dishes on all devices. Information about the dishes on a device (a list of “id – updated” pairs) is sent to the server together with the synchronization request. On each request, the server checks which dishes should be on the device and which dishes are there now, and then sends the difference to the device.
How does the server determine which dishes should be on a device? In the simplest case, the server runs a query that returns the “id – updated” pairs of all dishes (e.g. SELECT id, updated FROM dishes). In the scheme this is done by the method “WhatShouldBeOnDeviceMethod()”. This is the disadvantage of the approach: the server has to calculate what should be on the device, sometimes by executing complex SQL queries. By comparing the two lists, the server decides what needs to be sent to the device and what needs to be removed from it. In the scheme this is “delta_item_list”. That is why the request has no “lastUpdated” and “lastId”: their job is done by the “id – updated” pairs.
How does the server know which dishes exist on a device? A new parameter, “items”, is added to the request. It contains the list of dish ids that were sent to the device in the last synchronization (“device_last_stored_item_list”). You could, of course, send the ids of all dishes on the device and keep the algorithm simpler. However, if the device holds 3,000 dishes and all of them are sent every time, the traffic will be huge, whereas in the majority of synchronizations “items” will be empty. The server must always refresh its “stored_item_list” with the data returned from the device in the parameter “items”.
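The comparison of the two lists can be sketched as a small function. This is only an illustration of the delta step, assuming both lists are held as dicts that map a dish id to its “updated” value; the name `compute_delta` and the sample data are made up, and the bookkeeping around “items” is left out.

```python
def compute_delta(should_be, stored):
    """Compare two {id: updated} maps; return (ids to send, ids to remove)."""
    # Send a dish if the device lacks it or holds a different version of it.
    to_send = sorted(i for i, upd in should_be.items() if stored.get(i) != upd)
    # Remove a dish if the device holds it but should not have it any more.
    to_remove = sorted(i for i in stored if i not in should_be)
    return to_send, to_remove


# What the server calculates should be on the device (hypothetical values).
should_be_on_device = {
    1: "2013-01-01 00:00:00",
    2: "2013-01-05 00:00:00",
    3: "2013-01-02 00:00:00",
}
# What the server has recorded as stored on the device.
stored_item_list = {
    1: "2013-01-01 00:00:00",
    2: "2013-01-03 00:00:00",
    4: "2013-01-01 00:00:00",
}
to_send, to_remove = compute_delta(should_be_on_device, stored_item_list)
```

In this sample, dish 2 is outdated on the device and dish 3 is missing, so both are sent; dish 4 should no longer be there, so it is marked for removal; dish 1 is already up to date and causes no traffic.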
Conclusion
Summary table of the approaches' characteristics:
№ | Approach | Traffic volume (5 – high) | Development complexity (5 – high) | Device memory usage (5 – high) | Data correctness on device (5 – high) | Possibility of selecting data for a specific device |
1 | All data is always synchronized | 5 | 1 | 5 | 5 | no |
2 | Only updated data is synchronized | 1 | 2 | 5 | 3 | no |
3 | Chunk by chunk synchronization | 1 | 3 | 1 | 3 | no |
4 | Correct chunk by chunk synchronization | 1 | 3 | 1 | 3 | no |
5 | Synchronization when the data on device is known | 2 | 5 | 2 | 5 | yes |