Shanghai Open Data Apps, or SODA, is a contest launched in 2015. The acronym is a perfect analogy for what the challenge is about: data is like soda in a bottle. Usually it sits quietly inside. But once you open it, bubbles of innovation sizzle and burst, carrying a tremendous amount of energy. And this is exactly what SODA has managed to do.
The themes of the challenge for the last two years were "Smart Transport" and "City Safety". Working with 30 government agencies and companies, SODA unlocked 64 datasets totaling 4TB of data, and participants designed 852 data innovation apps covering a wide range of areas, including transport, finance, and public security. Approximately 4,200 data innovators were closely involved in the solution-finding process. SODA helped at least two Chinese startups and one British startup raise funding in the tens of millions of dollars. While bringing great ideas to life, SODA itself has developed a well-rounded, sophisticated, and replicable model. It started in Shanghai, but now, with its reach and competitiveness, SODA is set to thrive as an international data innovation brand.
It all started with the "not-so-successful" government-led data innovation contest held in Shanghai in 2014.
1. The Origin of SODA
In the autumn of 2014, the Shanghai Big Data Development Innovation Contest concluded with its organizer, the Shanghai Municipal Commission of Economy and Informatization, crowning "Jin Jin Recycle" (JJR) the champion. JJR's idea was to improve garbage recycling using the concept of the sharing economy. The contest was launched with crowdsourcing in mind: the government wanted to gather ideas from individuals and organizations, and those ideas, once put into practice, would in turn prompt government agencies to release real-time data about the life of the city. The JJR team duly received its prize of several tens of thousands of RMB. But, to the disappointment of many, including the project leader, Professor Jin Yaohui of Shanghai Jiaotong University, the team never gained access to government data on garbage recycling, so the project went nowhere.
The truth is, back then, open data was anything but mainstream in China. Shanghai took a leading role in 2011 when it pioneered broader data access, but what it released never met programmers' needs. The data that were accessible were of little use, while the valuable data were nowhere near being released. Another point worth noting is that people in China promote open data to drive innovation and entrepreneurship, with the focus on effective output and economic returns. This contrasts with the West, where data publishing is largely "mandatory" for legal and political reasons. The two cultures differ fundamentally. If China is to succeed in its open data effort, it will have to find the common interests of all data-holding parties and figure out how to reward them for sharing their data.
No one understands this pain point better than Zhang Baijun, who worked for the government for many years publishing information and data. In spring 2015, he left the government and became Vice President of the China Industrial Design Institute. Shortly afterwards he brought together a group of old friends to brainstorm how to launch a data innovation contest that made market sense and helped the government publish open data. Professor Jin Yaohui of Shanghai Jiaotong University, Professor Zheng Lei of the DMG Lab at Fudan University, Wang Zhiyong of EnerLong LLC, and the author of this article, from OPEN DATA CHINA, were invited to reflect on the progress of open data initiatives in Shanghai and on the 2014 Big Data Contest from a wide range of perspectives: business, technology, policy, urban science, and civil society. An ad hoc organizing committee was established to study open data challenges in other countries, learn from their methods, and design a contest that best suited China's national conditions. This was the very beginning of SODA.
2. The Foundation of SODA: Data Crowdsourcing
Two well-known open data programs caught the committee's attention. The Open Data Challenge Series (ODCS), run jointly by the U.K.-based Open Data Institute and Nesta, is a series of seven challenge prizes (Energy and Environment Challenge, Food Challenge, etc.) designed to generate innovative and sustainable solutions to social problems. For each challenge, the program thoroughly studies the available government data and gives participants detailed data lists and explanations to help them find viable solutions. There is also a 5-month incubation period to grow the finalists' solutions into viable startups. Another example is the BigApps competition sponsored by New York City. BigApps guides participants in two ways: broad themes, and specific problem briefs organized around "BigIssues". Any data source is acceptable, though contestants are encouraged to use municipal datasets released by the government to find innovative solutions to civic issues affecting New York City. What ODCS and BigApps have in common is that the data pool already exists. The contests are about creating value with the data, not creating the data itself. But in China, where data is not always available, what is the best way to organize a data contest?
Shanghai started building its own open data gateway, DataShanghai (www.datashanghai.gov.cn), in 2010. But until 2015, the website was little more than a repository of static data. There was no dynamic data on it that could be used around particular themes to create business value. Simply copying what had been done in the U.K. or the U.S. into China would produce another not-so-successful contest. Even if the challenge managed to identify innovative ideas, there would be no way to deliver them. To tackle this dilemma, SODA introduced the concept of "data crowdsourcing": different parties contribute to a virtual data pool, contestants use the data to come up with ideas of value, and those ideas will hopefully lead to more data sharing in the future.
The organizing committee had a hard nut to crack: how to persuade people to take part in data crowdsourcing. The answer was to build a system ensuring that the data provided by all parties met developers' requirements and circulated throughout the competition in a secure and controlled way. To this end, SODA designed a method of "self-assessment and joint review; data samples in the first round and full access in the second; encrypted data transmission and authorized access control".
In the preparatory stage, based on the theme of the challenge and developers' feedback, the organizing committee produces a data list specifying all data fields and requires that all data fall within the designated timeframe; that each dataset cover at least one month, with the time periods of the datasets overlapping; and that the datasets' geographic coverage also overlap, so that multi-source data can be cross-checked. SODA then gives the list to the relevant government agencies and companies, who first decide whether they hold the data and, if they do, whether they will release complete datasets or only certain data fields, according to their security regulations. The data-providing parties communicate their decisions and concerns to the organizing committee, which then invites them to a joint data security review meeting to assess vulnerabilities that overlapping data might create. The data list is then finalized, and the data-providing parties and SODA sign a data agreement. When SODA receives all the data, it conducts another technical assessment to make sure data masking has been fully applied.
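The temporal rules on the data list lend themselves to a mechanical check. The sketch below is a hypothetical helper, not SODA's actual tooling: it reads "at least a month" as at least 30 days (an assumption), assumes a designated timeframe of 2015-01-01 to 2015-04-30 for illustration, and checks only the time rules, not geographic overlap. The three entries are taken from Table 1.

```python
from dataclasses import dataclass
from datetime import date

MIN_DAYS = 30  # assumption: "at least a month" read as 30 days, inclusive


@dataclass
class DataListEntry:
    provider: str
    name: str
    start: date
    end: date  # inclusive

    def days(self) -> int:
        return (self.end - self.start).days + 1


def check_data_list(entries, window_start, window_end):
    """Hypothetical check: return (problems, shared_period or None)."""
    if not entries:
        return ["data list is empty"], None
    problems = []
    for e in entries:
        # Rule 1: every dataset must sit inside the designated timeframe.
        if e.start < window_start or e.end > window_end:
            problems.append(f"{e.name}: outside the designated timeframe")
        # Rule 2: every dataset must cover at least MIN_DAYS.
        if e.days() < MIN_DAYS:
            problems.append(f"{e.name}: only {e.days()} days of data")
    # Rule 3: all datasets must share a common period for cross-checking.
    shared_start = max(e.start for e in entries)
    shared_end = min(e.end for e in entries)
    if shared_start > shared_end:
        problems.append("no period shared by all datasets")
        return problems, None
    return problems, (shared_start, shared_end)


entries = [
    DataListEntry("Shanghai Public Transportation Card Company",
                  "Public Transportation Card Data", date(2015, 4, 1), date(2015, 4, 30)),
    DataListEntry("Shanghai Pudong Public Transportation Company",
                  "Real-time Pudong Bus Data", date(2015, 1, 1), date(2015, 4, 30)),
    DataListEntry("Shanghai Qiang Sheng Smart Navigation Company",
                  "Taxi Operation Data", date(2015, 4, 1), date(2015, 4, 30)),
]
problems, shared = check_data_list(entries, date(2015, 1, 1), date(2015, 4, 30))
print(problems)  # []
print(shared)    # (datetime.date(2015, 4, 1), datetime.date(2015, 4, 30))
```

With the Table 1 periods, all three datasets pass, and April 2015 is the window in which they can be cross-checked against one another.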
With the data in hand, the next step was to strike a balance between data security and openness. At the time, a well-known competition in China, Alibaba's Tianchi ("Sky Pool"), followed the principle "To Use but not to See": the data were not given to competitors; instead, Alibaba provided a "black box" cloud platform where they submitted code to run calculations. But SODA wanted to promote open data. For the organizers, it was important that contestants be given real data so they could explore ways of using data from real-life scenarios. Yet data security requires control. That is why, in the first round, contestants receive only data samples, enough to understand the data's structure and content and to come up with good ideas; only in the second round, when the finalists set about developing prototypes, are they given full access.
After reviewing several data licenses, such as the Creative Commons licenses and the data.gov.uk licenses, SODA drafted its own. It allows contestants to use the data freely and does not limit the purposes for which it is used. Because the full data are accessible only to second-round contestants, however, the SODA license does restrict redistribution. To protect the interests of data providers, organizers, developers, and end users, it further requires contestants working with the data to state that all the data in the contest are partial extracts and that conclusions drawn from them may not reflect the whole picture. In practice, a participant who advances to the second round is notified to prepare a license agreement and to submit personal information for identity and signature verification. Once verified, the participant receives access details and a password from the organizing committee. Then the real work starts.
3. First Application: Smart Transport
With "data crowdsourcing" established, the next step was to use the method to obtain real data and launch the contest. In 2015, with support from the Shanghai Municipal Commission of Economy and Informatization, SODA was officially launched with the theme "Smart Transport". There were three reasons for choosing this topic. First, information technology is widely applied in transport, and most data are collected automatically, so they are fairly reliable. Second, transport data rarely involve identifiable personal information, so SODA would have an easier time promoting "data crowdsourcing". Third, a quick look at previous contests showed transport to be the area that most readily yields innovations and viable startups. For the contest to succeed, it needed a smooth start.
Table 1. Part of the 2015 SODA Data
Provider | Data Name | Data Fields | Time Period |
Shanghai Public Transportation Card Company | Public Transportation Card Data | Card Number, Transaction Date, Transaction Time, Line/Metro Station Name, Industry (Bus, Subway, Taxi, Ferry, P+R Parking Lot), Transaction Amount, Transaction Nature (No Privilege, Privilege) | 20150401-20150430 |
Shanghai Pudong Public Transportation Company | Real-time Pudong Bus Data | Device Number, Line Code, Station Code, In-and-Out Station Status, Direction, GPS Report Time, BMDY | 20150101-20150430 |
Shanghai Qiang Sheng Smart Navigation Company | Taxi Operation Data | Vehicle ID, GPS Time, Longitude and Latitude, Speed, Number of Satellites, Service Condition, Skyway Condition, On-Position | 20150401-20150430 |
And what is smart transport without metros, buses, and taxis? Thanks to strong support from the Shanghai Municipal Transportation Commission, several public service companies, including Shanghai Public Transport Card CO., LTD., Shanghai Pudong New District Public Transport CO., LTD., and Shanghai Qiangsheng Intelligence Navigation Technology Satellite CO., LTD., took part in SODA's data crowdsourcing and contributed high-quality data: one month of transport card transaction records, one month of Qiangsheng taxi GPS data, and four months of data recording buses entering and leaving stations. For the first time in China, all of this data was accessible to the public. All eyes turned to the contest, and there was excitement in the urban planning and data science communities as well as in the media. Within a month, 3,000 people from China and abroad registered, forming 823 teams and submitting 505 smart transport ideas.
Shanghai BaoCheng, one of the fifteen finalists, won the "Best Business Model Prize". Unlike most teams, BaoCheng was not a fledgling startup: by the time of the challenge it had already raised an angel round. Its product, OKChexian, offers behavior-based auto insurance. BaoCheng developed a smartphone app to collect data on users' driving behavior, then used Qiangsheng taxi GPS data to calculate benchmark speeds in Shanghai. By comparing the two data sources, the company can tell whether a user is a careful driver and adjust rates accordingly. Recognition from the SODA judges was not the only acclaim the BaoCheng team received. Soon after the contest it raised tens of millions of dollars in a funding round led by IDG and JD.com and saw strong growth. OKChexian was among the first companies to go online on Shanghai's "Shanghai Credit" platform.
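The comparison behind OKChexian can be illustrated with a small sketch. Everything specific here is an assumption, not BaoCheng's actual method: taxi GPS readings are bucketed by a hypothetical 0.01-degree grid cell and hour of day, the median moving taxi speed in each bucket serves as the benchmark, a driver reading counts as "fast" if it exceeds the benchmark by more than 20%, and the pricing tiers are invented for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import median

CELL_DEG = 0.01   # hypothetical grid cell size in degrees (roughly 1 km)
TOLERANCE = 1.2   # hypothetical: "fast" means more than 20% above the benchmark


@dataclass
class GpsPoint:
    lon: float
    lat: float
    hour: int     # hour of day, 0-23
    speed: float  # km/h


def cell_key(p):
    # Bucket a reading by grid cell and hour of day.
    return (math.floor(p.lon / CELL_DEG), math.floor(p.lat / CELL_DEG), p.hour)


def benchmark_speeds(taxi_points):
    # Median moving taxi speed per (cell, hour) serves as the local benchmark.
    buckets = defaultdict(list)
    for p in taxi_points:
        if p.speed > 0:  # skip stationary readings
            buckets[cell_key(p)].append(p.speed)
    return {k: median(v) for k, v in buckets.items()}


def fast_share(user_points, benchmarks):
    # Share of the driver's readings (those with a benchmark) above TOLERANCE x benchmark.
    compared = fast = 0
    for p in user_points:
        ref = benchmarks.get(cell_key(p))
        if ref is None:
            continue  # no taxi data for this place and hour
        compared += 1
        if p.speed > TOLERANCE * ref:
            fast += 1
    return fast / compared if compared else 0.0


def rate_multiplier(share):
    # Hypothetical pricing tiers, not OKChexian's real rules.
    if share < 0.05:
        return 0.9   # discount for careful drivers
    if share < 0.20:
        return 1.0
    return 1.2       # surcharge for frequent fast driving


taxis = [GpsPoint(121.475, 31.235, 8, s) for s in (30.0, 40.0, 50.0, 0.0)]
bench = benchmark_speeds(taxis)  # median of 30, 40, 50 -> 40 km/h
driver = [GpsPoint(121.475, 31.235, 8, s) for s in (45.0, 60.0, 35.0)]
driver.append(GpsPoint(121.475, 31.235, 3, 90.0))  # no benchmark at 3 a.m., ignored
share = fast_share(driver, bench)  # 1 of 3 comparable readings exceeds 48 km/h
print(rate_multiplier(share))      # 1.2
```

Using the median rather than the mean keeps a few unusually fast or slow taxis from skewing the benchmark; a real product would also need map matching, traffic conditions, and many more signals than speed alone.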
Apart from projects of high commercial value, SODA also identified projects focused on helping the public sector improve transport service quality. Take A+P&T+U by the Shanghai Tongji Urban Planning & Design Institute: the Tongji team proposed the concept of "Smarter Connect" to tackle rush-hour overcrowding in the metro and improve the rider experience. They analyzed public transport card data and taxi data and identified the stations with the worst crowding. Their suggestion was to use surface transport to move people to less crowded stations or straight to their destinations. Two other public transport examples, GoBiking and Jinridian, focused on last-mile public bike service. Analyzing multi-source data, they made specific and viable recommendations on docking station locations and bike deployment strategies. No wonder an official from the Shanghai Municipal Transportation Commission commented after the contest: "This challenge has shown that old ways no longer work if we want to solve transport problems in Shanghai. Going forward, the answer is probably the release of more government and public data, combined with the wisdom of the people, and science and innovation."
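A first pass at finding the most crowded stations, as the Tongji team did with card data, might simply count metro card transactions per station during the morning peak. The sketch below uses field names loosely modeled on Table 1; the 7:00 to 9:00 peak window, the `CardRecord` type, and the use of transaction counts as a crowding proxy are all assumptions, not the team's actual analysis.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import time

# Hypothetical morning peak window, start inclusive, end exclusive.
RUSH_START, RUSH_END = time(7, 0), time(9, 0)


@dataclass
class CardRecord:
    card_number: str
    txn_time: time
    station: str    # Line/Metro Station Name
    industry: str   # e.g. "Subway", "Bus", "Taxi"


def busiest_stations(records, top_n=3):
    # Count metro transactions per station within the peak window
    # and return the top_n (station, count) pairs, highest first.
    counts = Counter(
        r.station
        for r in records
        if r.industry == "Subway" and RUSH_START <= r.txn_time < RUSH_END
    )
    return counts.most_common(top_n)


records = (
    [CardRecord("c1", time(7, 30), "Station A", "Subway")] * 3
    + [CardRecord("c2", time(8, 0), "Station B", "Subway")] * 2
    + [CardRecord("c3", time(8, 15), "Station C", "Subway")]
    + [CardRecord("c4", time(10, 0), "Station A", "Subway")]  # outside peak
    + [CardRecord("c5", time(8, 0), "Stop D", "Bus")] * 5      # not metro
)
print(busiest_stations(records, top_n=2))  # [('Station A', 3), ('Station B', 2)]
```

Raw counts ignore station capacity and platform layout, so a fuller analysis would normalize by capacity and combine card data with the taxi data mentioned above.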
*
No one foresaw SODA's huge popularity in 2015: not the organizing committee, nor the data-providing government agencies and companies. They did not expect the participants to truly understand the data, let alone turn it into products. When preparing for the contest, some providers insisted that even if they made the data available, there was no way anyone could make sense of it and use it. Needless to say, once they saw for themselves the amount of work done and the products created, their attitude changed completely. As one data provider put it: "We only gave them 10 datasets, and they came up with over 500 proposals. If the data had stayed in the organization, we would never have achieved that." The biggest winner, however, was the Shanghai Municipal Transportation Commission. The SODA ideas proposed new business models, policy guidance, and route planning, and offered the commission a considerable number of solutions to urban transport problems. One commission official was heard to say that the contest was "an eye opener" and that he had been "greatly inspired". He hoped that more data would become available in the future, and said his agency would stay in contact and work closely with the teams to turn their ideas into real products.
Following SODA's triumphant debut in 2015, many cities started to experiment with data contests as well. Their principles were similar: open data and innovation. But none came close to SODA's scale and achievements. So, what makes SODA different? The answer is SODA's core model: data crowdsourcing, application crowd-creating, and problem crowd-solving. Part 2 of this article presents an analysis of the SODA model.