
Mobile Application Development: Synchronization With a Server

Vladimir Dolgopolov

Many modern applications for mobile platforms (iOS, Android, etc.) have a server side. An application with out-of-date data loses its usefulness, so it is important to keep the data on the device continuously updated from the server. The same holds for offline applications, which must also work without an internet connection. Fully online applications (e.g. Foursquare, Facebook) do not work, or are useless, without internet; they have their own specifics, which are beyond the scope of this article.

I would like to describe the approaches we use for data synchronization, based on the example of one of our offline applications. We used simple algorithms in the first versions and refined them as we gained experience. The article follows the same sequence, from simple, obvious practices to more complicated ones.

Note that the article covers only one-way data transfer, from server to device. The server is the source of data.

General background information for all approaches

As an example, let’s look at transferring a reference list of dishes to a device. Let’s assume that the device sends its request to the URL “/service/dishes/update” and that the exchange uses the HTTP protocol with data in JSON format (www.json.org).

The server has a table “dishes” with the following fields:
  • id (entry identifier)
  • name (name of a dish)
  • updated (the moment the dish was last updated; it is better to include the time zone, “YYYY-MM-DDThh:mm:ssTZD”, e.g. “1997-07-16T19:20:30+01:00”)
  • is_deleted (a flag marking a removed entry)

A note about the last field. Its default value is 0. In an application where entities are synchronized between client and server, physically deleting data from the server is not recommended (to avoid errors), so removed dishes get is_deleted = 1. Once the device receives an entity with isDeleted = 1, it deletes that entity locally.

In all approaches described below, the server returns an array of JSON objects to the device (the array can be empty):
[
  { id: , name: , updated: , isDeleted: },
  …
]

An example of a server reply:

[
  { id: 5625, name: “Bread”, updated: “2013-01-06 06:23:12”, isDeleted: 0 },
  { id: 23, name: “Cooked semolina”, updated: “2013-02-01 14:44:21”, isDeleted: 0 },
  { id: 533, name: “Fish-soup”, updated: “2013-08-02 07:05:19”, isDeleted: 0 }
]
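
The reply above is written in shorthand. Strict JSON requires double-quoted keys and straight quotes, so on the wire it would look more like the string in the sketch below. This is only an illustration in Python, not the code of the application described here; the variable names are assumptions.

```python
import json

# The example reply from the article, rewritten as strict JSON
# (double-quoted keys, straight quotes).
reply = """[
  {"id": 5625, "name": "Bread", "updated": "2013-01-06 06:23:12", "isDeleted": 0},
  {"id": 23, "name": "Cooked semolina", "updated": "2013-02-01 14:44:21", "isDeleted": 0},
  {"id": 533, "name": "Fish-soup", "updated": "2013-08-02 07:05:19", "isDeleted": 0}
]"""

dishes = json.loads(reply)  # a list of dicts, one per dish
for dish in dishes:
    print(dish["id"], dish["name"], bool(dish["isDeleted"]))
```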

Principles of data updating on a device

  1. If the received element exists on the device and isDeleted = 0, then it is updated
  2. If the received element doesn’t exist on the device and isDeleted = 0, then it is added
  3. If the received element exists on the device and isDeleted = 1, then it is deleted
  4. If the received element doesn’t exist on the device and isDeleted = 1, then nothing is done
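
The four rules above can be sketched in a few lines. This is a minimal illustration in Python, assuming the local store is a dictionary keyed by dish id; the function name apply_updates and the sample data are hypothetical, not taken from the application.

```python
def apply_updates(local, received):
    """Merge a server reply into the local store (a dict keyed by dish id)."""
    for item in received:
        if item["isDeleted"]:
            # Rules 3 and 4: delete if present, otherwise do nothing.
            local.pop(item["id"], None)
        else:
            # Rules 1 and 2: update an existing dish or add a new one.
            local[item["id"]] = {"name": item["name"], "updated": item["updated"]}
    return local


# Hypothetical device state before synchronization.
local = {
    23: {"name": "Semolina", "updated": "2013-01-01 00:00:00"},
    533: {"name": "Fish-soup", "updated": "2013-08-02 07:05:19"},
}
reply = [
    {"id": 23, "name": "Cooked semolina", "updated": "2013-02-01 14:44:21", "isDeleted": 0},
    {"id": 5625, "name": "Bread", "updated": "2013-01-06 06:23:12", "isDeleted": 0},
    {"id": 533, "name": "Fish-soup", "updated": "2013-08-03 10:00:00", "isDeleted": 1},
    {"id": 999, "name": "Unknown", "updated": "2013-08-03 10:00:00", "isDeleted": 1},
]
apply_updates(local, reply)
print(sorted(local))  # [23, 5625]
```

Because rules 1 and 2 collapse into a single assignment and rules 3 and 4 into a single "remove if present", the device does not need to check whether an element exists before applying it.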

Approach 1: All data is always synchronized

This is the simplest method. The device requests the list of dishes from the server, and the server sends the entire list every time. The list is not sorted.

An example of the request: null, or “{}”

Advantages:
  • simple server-side logic – always send everything
  • simple logic on a device – always overwrite everything
Disadvantages:
  • if the list is requested often (every 10 minutes), the internet traffic will be heavy
  • if the list is requested seldom (once a day), the data may be out-of-date
Area of usage:
  • applications with small traffic
  • transfer of data that doesn’t change often (list of cities, categories)
  • transfer of application settings
  • at the beginning of the project for the first prototype of the mobile application

Approach 2: Only updated data is synchronized

The device requests the list of dishes updated since the last synchronization. The list arrives sorted by “updated” in ascending order (this is not obligatory, just more convenient). The device stores the “updated” value of the last dish it received and sends it to the server in the parameter “lastUpdated” with the next request. The server sends the dishes that are newer than “lastUpdated” (updated > lastUpdated). In the first request to the server, “lastUpdated” = null.

An example of the request: { lastUpdated: “2013-01-01 00:00:00” }

In the diagram for this approach, “last_updated” is the value stored on the device. Usually the device has a separate table that stores a “last_updated” value for every entity type (dishes, cities, organizations, etc.).

This approach suits synchronization of simple linear lists for which the rules of what is sent are the same for all devices. For more selective synchronization, see Approach 5 (Synchronization when the data on device is known).

This approach usually meets the majority of needs. The device gets only new data, so synchronization can run even every minute and traffic stays small. However, there are problems connected with the memory and processor limitations of mobile devices.
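
A minimal sketch of this exchange, with the server and the device both simulated in Python. The function names, the in-memory DISHES list and the device_state dictionary are assumptions made for illustration. The string comparison works only because every timestamp here uses the same “YYYY-MM-DD hh:mm:ss” format and time zone; real code would compare parsed timestamps.

```python
DISHES = [
    {"id": 5625, "name": "Bread", "updated": "2013-01-06 06:23:12", "isDeleted": 0},
    {"id": 23, "name": "Cooked semolina", "updated": "2013-02-01 14:44:21", "isDeleted": 0},
    {"id": 533, "name": "Fish-soup", "updated": "2013-08-02 07:05:19", "isDeleted": 0},
]


def dishes_updated_since(last_updated):
    """Server side: return dishes with updated > last_updated, oldest first."""
    rows = [d for d in DISHES if last_updated is None or d["updated"] > last_updated]
    return sorted(rows, key=lambda d: d["updated"])


def sync(device_state):
    """Device side: request changes and remember the newest "updated" value."""
    reply = dishes_updated_since(device_state["lastUpdated"])
    for dish in reply:
        if dish["isDeleted"]:
            device_state["dishes"].pop(dish["id"], None)
        else:
            device_state["dishes"][dish["id"]] = dish
    if reply:
        # The reply is sorted, so the last element carries the newest timestamp.
        device_state["lastUpdated"] = reply[-1]["updated"]
    return len(reply)


state = {"lastUpdated": None, "dishes": {}}
first = sync(state)   # first request: lastUpdated is null, everything is sent
second = sync(state)  # nothing changed on the server, so the reply is empty
print(first, second, state["lastUpdated"])
```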

Approach 3: Chunk by chunk synchronization

Mobile devices have a small amount of RAM. If a list consists of 3,000 dishes, parsing one large JSON string from the server into device objects may exhaust memory. In that case the application will either crash or fail to save the 3,000 dishes. Even if the device manages to receive the whole string, the application will perform poorly during synchronization (interface lags, jerky scrolling, etc.). So the list has to be requested in small chunks.

To do so, the device sends one more parameter (“amount”) that sets the chunk size. The list sent must be sorted by the field “updated” in ascending order. As in the previous approach, the device remembers the “updated” value of the last entity received and sends it in the field “lastUpdated”. If the server returns exactly “amount” entities, the device continues the synchronization and sends another request with the new “lastUpdated”. If the server returns fewer entities, it has no more new data and the synchronization finishes.

In the diagram for this approach, “last_updated” and “amount” are values stored in the mobile application, and “last_item” is the last entity (dish) sent by the server. The next list requested is newer than this value.

An example of the request: { lastUpdated: “2013-01-01 00:00:00”, amount: 100 }

Advantages: The device gets only as much data as it can process at once. The chunk size is determined by practical tests. Simple entities can be synchronized 1,000 at a time. Entities with many fields and complex processing and saving logic, however, sometimes cannot be synchronized more than 5 at a time.

Disadvantages: If 250 dishes have the same “updated” value and amount = 100, the last 150 never reach the device. This situation is quite realistic and is addressed in the next approach.
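
The loop and its weakness can be reproduced with a small simulation. This is a sketch in Python under simplifying assumptions: the server is a plain function over an in-memory list, and the names fetch_chunk and sync_in_chunks are hypothetical.

```python
def fetch_chunk(rows, last_updated, amount):
    """Server side: up to `amount` rows with updated > last_updated, oldest first."""
    newer = [r for r in rows if last_updated is None or r["updated"] > last_updated]
    return sorted(newer, key=lambda r: r["updated"])[:amount]


def sync_in_chunks(rows, amount):
    """Device side: keep requesting chunks until a short chunk arrives."""
    received, last_updated = [], None
    while True:
        chunk = fetch_chunk(rows, last_updated, amount)
        received.extend(chunk)
        if chunk:
            last_updated = chunk[-1]["updated"]
        if len(chunk) < amount:
            return received


# 250 dishes with distinct timestamps: all of them arrive in three requests.
distinct = [{"id": i, "updated": "2013-01-10 12:%02d:%02d" % divmod(i, 60)}
            for i in range(250)]
# 250 dishes imported within the same second: only the first chunk arrives.
same = [{"id": i, "updated": "2013-01-10 12:34:56"} for i in range(250)]

print(len(sync_in_chunks(distinct, 100)))  # 250
print(len(sync_in_chunks(same, 100)))      # 100
```

In the second case the device sets “lastUpdated” to the shared timestamp after the first chunk, so the strict comparison filters out the remaining 150 dishes.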

Approach 4: Correct chunk by chunk synchronization

In the previous approach, a table may contain 250 dishes with the same “updated” value (e.g. “2013-01-10 12:34:56”) while the chunk size is 100; then only the first 100 entities are received. The last 150 are cut off by the strict condition (updated > lastUpdated). Why? When the first 100 entities have been received, “lastUpdated” is set to “2013-01-10 12:34:56”, and the next request carries the condition (updated > “2013-01-10 12:34:56”). Relaxing the condition (updated >= “2013-01-10 12:34:56”) does not help, because the device would then request the first 100 entities endlessly.

Identical “updated” values are not that rare. For example, when data is imported from a text file, the field “updated” is set to NOW(). Importing a file with thousands of lines can take less than a second, so the whole reference list may end up with the same “updated” value.

To fix this, use some field of the dish that is unique, at least within one moment (“updated”). The field “id” is unique for the whole table, so it should be included in the synchronization as well.

The approach is implemented as follows. The server returns a list sorted by “updated” and “id”; the device requests data using “lastUpdated” and a new parameter “lastId”. The server gets a more complicated selection condition: ((updated > lastUpdated) OR (updated = lastUpdated AND id > lastId)).

In the diagram for this approach, “last_updated”, “last_id” and “amount” are values stored in the mobile application, and “last_item” is the last entity (dish) sent by the server. The next list requested will be newer than this value.
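
A sketch of the corrected selection, again simulated in Python with an in-memory list standing in for the “dishes” table. The function names are hypothetical; the filter mirrors the condition ((updated > lastUpdated) OR (updated = lastUpdated AND id > lastId)) and the sort uses the pair (updated, id).

```python
def fetch_chunk(rows, last_updated, last_id, amount):
    """Server side: rows after the (last_updated, last_id) position, in order."""
    def is_newer(r):
        if last_updated is None:  # first request: send from the beginning
            return True
        return r["updated"] > last_updated or (
            r["updated"] == last_updated and r["id"] > last_id)

    ordered = sorted(filter(is_newer, rows), key=lambda r: (r["updated"], r["id"]))
    return ordered[:amount]


def sync_in_chunks(rows, amount):
    """Device side: remember both "updated" and "id" of the last entity received."""
    received, last_updated, last_id = [], None, None
    while True:
        chunk = fetch_chunk(rows, last_updated, last_id, amount)
        received.extend(chunk)
        if chunk:
            last_updated, last_id = chunk[-1]["updated"], chunk[-1]["id"]
        if len(chunk) < amount:
            return received


# 250 dishes that all share one timestamp, as after a bulk import.
same = [{"id": i, "updated": "2013-01-10 12:34:56"} for i in range(1, 251)]
result = sync_in_chunks(same, 100)
print(len(result))  # 250
```

Since the pair (updated, id) is unique for every row, each request resumes exactly after the last entity received, so no row is skipped or sent twice.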

Approach 5: Synchronization when the data on device is known

The previous approaches ignore the fact that the server doesn’t actually know whether the data was saved successfully on the device. The device may simply fail to save half of the data because of unexpected errors. That is why it would be good to receive a confirmation from the device that all (or not all) dishes have been saved.

Besides, a user can configure the application so that only part of the data is needed, for example dishes from only 2 cities out of 10. The synchronization methods described above cannot provide this.

The idea is the following. In a separate table (“stored_item_list”) the server stores information about the dishes that are on a device. It can be just a list of “id – updated” pairs. This table holds the “id – updated” pairs of all dishes on all devices. Information about the dishes that are on a device (a list of “id – updated” pairs) is sent to the server together with the synchronization request. On each request, the server works out which dishes should be on the device and which dishes are there now. The difference is then sent to the device.

How does the server determine which dishes need to be on a device? In the simplest case the server runs a query that returns the “id – updated” pairs of all dishes (e.g. SELECT id, updated FROM dishes). In the diagram this is done by the method “WhatShouldBeOnDeviceMethod()”. Here lies a disadvantage of the approach: the server has to calculate what needs to be on a device, sometimes by executing complex SQL queries. By comparing the two lists, the server decides what needs to be sent to the device and what needs to be removed from it. In the diagram this is “delta_item_list”. That is why the request has no “lastUpdated” and “lastId”: their role is played by the “id – updated” pairs.

How does the server know which dishes exist on a device? A new parameter “items” is added to the request sent to the server. The parameter contains a list of the dish ids that were sent to the device in the last synchronization (“device_last_stored_item_list”). Of course, you could send the list of all dish ids on the device and avoid complicating the algorithm. However, if the device has 3,000 dishes and they are sent every time, the traffic will be huge, whereas with the incremental list “items” is empty in the majority of synchronizations. The server always needs to refresh its “stored_item_list” with the data that comes back from the device in the parameter “items”.

It is also necessary to implement a mechanism for clearing the server data in stored_item_list. For example, after the application is reinstalled on a device, the server still believes that the device holds up-to-date data. So when the application is installed, the device needs to inform the server somehow so that it clears stored_item_list for this device. In our application we send an additional parameter “clearCache” = 1 in this case.
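
The comparison of the two lists can be sketched as follows. This is a simplified illustration in Python, assuming both lists are held as {id: updated} dictionaries; the function name compute_delta and the sample data are hypothetical, and the refresh of “stored_item_list” from “items” is left out.

```python
def compute_delta(should_have, stored):
    """Compare {id: updated} maps and split the difference.

    should_have: what the server decided belongs on this device
    stored:      what the server believes the device currently holds
    """
    # Send dishes that are missing on the device or stored with another timestamp.
    to_send = sorted(i for i, upd in should_have.items() if stored.get(i) != upd)
    # Remove dishes that are on the device but should no longer be there.
    to_remove = sorted(i for i in stored if i not in should_have)
    return to_send, to_remove


should_have = {
    23: "2013-02-01 14:44:21",
    533: "2013-08-02 07:05:19",
    5625: "2013-01-06 06:23:12",
}
stored = {
    23: "2013-01-01 00:00:00",   # outdated on the device
    533: "2013-08-02 07:05:19",  # already up to date
    77: "2012-12-31 09:00:00",   # should no longer be on the device
}
to_send, to_remove = compute_delta(should_have, stored)
print(to_send)    # [23, 5625]
print(to_remove)  # [77]
```

With this shape, handling “clearCache” = 1 would amount to starting from an empty stored map for that device, so that everything in should_have is sent again.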

Conclusion

Summary table of the characteristics of the approaches. Columns: Traffic = traffic volume (5 = high); Complexity = development complexity (5 = high); Memory = device memory usage (5 = high); Correctness = data correctness on device (5 = high); Per-device selection = possibility of selecting data for a particular device.

Approach                                             Traffic  Complexity  Memory  Correctness  Per-device selection
1 All data is always synchronized                    5        1           5       5            no
2 Only updated data is synchronized                  1        2           5       3            no
3 Chunk by chunk synchronization                     1        3           1       3            no
4 Correct chunk by chunk synchronization             1        3           1       3            no
5 Synchronization when the data on device is known   2        5           2       5            yes

“Data correctness on device” is the probability that the device has all the data that the server has sent. With approaches 1 and 5 there is 100% certainty that the device has all the data it needs. In other cases there is no such certainty. This doesn’t mean that the other approaches can’t be used. However, if some data is lost on a device, the server cannot correct it, or even learn about it.

Perhaps once unlimited data plans and free Wi-Fi are everywhere, the problem of limited mobile traffic won’t be that important. For now, though, tricks and smart approaches are needed to reduce data costs and improve application performance. They don’t always pay off: sometimes, depending on the situation, simpler is better.

I hope this article helps you choose an approach that will be useful. Surprisingly, there is not much written on the internet about synchronization between a server and a mobile device, although many applications use this scheme. Here is an interesting link: http://en.wikipedia.org/wiki/SyncML

This entry was posted on Tuesday, November 12th, 2013 at 9:16 am and is filed under Mobile.