Business Cloud News
Cedar Milazzo, vice president of engineering at Devicescape

Over the past few years there has been a notable flourishing of applications delivering insights off the back of crowdsourced information – Waze, Kamino, Meterfy and the like – enabled primarily by the proliferation of mobile devices and, equally, by innovations in data storage, processing and management. But with data volumes growing at a seemingly unbounded rate, the need for highly scalable, flexible back-end platforms to support these applications has become urgent.

Cedar Milazzo, vice president of engineering at Devicescape, a company that delivers Wi-Fi connectivity solutions to telecom operators, including insights into how people use their smartphones while off the cellular network, explains some of the lessons learned as the company embraced Hadoop and switched from MySQL to NoSQL for its core data warehouse while shifting everything into the cloud in the process.

Devicescape offers a piece of mobile device software that intelligently manages the connectivity between LTE and any kind of Wi-Fi network. The company has compiled a curated virtual network (CVN) of more than 20 million public Wi-Fi hotspots, from a total monitored base of more than 315 million, and the software automatically verifies availability, quality and security of the network in real-time, allowing telcos that use the service to ensure users are automatically connected to Wi-Fi hotspots if the connection is better than cellular and vice versa.

The company claims to manage two billion monthly connections, with the platform metabolising terabytes of anonymised data – including the core information flying between the handset and the network, as well as information on usage and movement patterns and a range of other data points useful to operators. As one of its revenue streams Devicescape sells behavioural insights derived from this data to operators.

Cloud first

The service used to sit in a combination of on-premise and managed hosting environments, but as Devicescape grew it became clear that the company was nudging up against performance ceilings; the setup was also becoming cost-prohibitive.

“The system was originally hosted here in our San Bruno datacentre but it was not as accessible as we would have liked it to be,” says Milazzo. “So we took the decision to move the entire platform out into the cloud. The accessibility and the speed is actually much better in remote locations now, and we don’t have anyone here who has to come out in the middle of the night to make sure things are working smoothly.”

In addition to the hosted application itself the company has a cluster of servers that handles the pings from mobile devices to verify that they are actually able to connect to the internet over a given access point. They also handle all incoming information, including data on how much information was transferred between the device and the network.

Over the past two and a half years the company has slowly shifted everything from its QA and staging systems, all the way up to its production workloads and the customer-facing service itself, into the cloud. All that’s left on premise are a couple of legacy file servers, primarily for cold data storage.

The company now relies primarily on a range of services hosted in AWS to stand up its external and internal services: Elastic Load Balancing to manage the incoming application traffic, S3 for storage, and Elastic MapReduce and EC2 for its Hadoop Hive jobs.

The company’s traffic is fairly cyclical and before the move to the cloud Devicescape had a number of servers in its datacentre which had to be prepared to handle the maximum amount of traffic at all times.

“When we switched over to AWS they had a lot of auto-scaling tools, which saved us a ton of money by optimising the number of servers that we needed at any given time. All of our servers are auto-scaling now. At peak time we’ll have four times as many servers as in the middle of the night when there is less demand. So it saves a lot.”
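To make the saving concrete, here is a rough back-of-the-envelope sketch in Python. Only the four-to-one ratio between peak and overnight capacity comes from the article; the hourly demand profile, the server counts and the function name `server_hours` are illustrative assumptions, not Devicescape's figures.

```python
# Hypothetical comparison of a fleet sized for peak against an auto-scaled one.
# Only the 4:1 peak-to-overnight ratio is from the article; the rest is assumed.

def server_hours(demand_by_hour):
    """Return (fixed_fleet_hours, autoscaled_hours) for one day of demand."""
    # A fixed fleet must be sized for the busiest hour and runs all day.
    fixed = max(demand_by_hour) * len(demand_by_hour)
    # An auto-scaled fleet runs only what each hour needs.
    scaled = sum(demand_by_hour)
    return fixed, scaled

# Assumed profile: 8 overnight hours at 5 servers, 8 at 10, 8 peak hours at 20.
demand = [5] * 8 + [10] * 8 + [20] * 8
fixed, scaled = server_hours(demand)
saving = 1 - scaled / fixed
print(f"fixed={fixed} scaled={scaled} saving={saving:.0%}")
# prints: fixed=480 scaled=280 saving=42%
```

Under this assumed profile the auto-scaled fleet uses roughly 42 per cent fewer server-hours; the real figure depends entirely on how long traffic stays near its peak.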

As for internal systems, the company also operates a cloud-centric policy. It relies primarily on Google Apps for productivity and email, and Milazzo says any new IT systems – both internal and customer facing – will likely be cloud-based.

Big data in the cloud and the move from MySQL to NoSQL

Devicescape also migrated its SQL transactional database over to DynamoDB hosted in AWS, in a bid to make it more scalable and flexible.

“When you go with NoSQL it’s much easier to add new data fields. It was really important because as we grew and enhanced our system we were constantly adding new information we were collecting. User behaviour, new information about the access points people were locating, and so forth. For a while we were constantly going back to our database tables and adding columns, or adding new columns with reference indexes and things like that. With NoSQL we can add new columns easily without basically having to redevelop the entire table.”

“The ability of the database to actually scale, it’s much simpler to maintain for larger datasets, and it’s more performant – these were the main drivers.”
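The schema flexibility Milazzo describes can be illustrated with a small Python sketch. A plain dictionary stands in for a schemaless table here; this is not the DynamoDB API, and the key and attribute names (`ssid`, `verified`, `bytes_transferred`) are hypothetical.

```python
# A plain dict stands in for a schemaless table; this is NOT the DynamoDB API.
table = {}

def put_item(table, key, item):
    """Store an item under its key; items may carry any set of attributes."""
    table[key] = dict(item)

def bytes_for(table, key):
    """Read an attribute that older items may not have, with a default."""
    return table[key].get("bytes_transferred", 0)

# An early record carries only the original fields (names are hypothetical).
put_item(table, "ap-001", {"ssid": "cafe-wifi", "verified": True})

# Later the collector starts sending an extra attribute. No table change or
# column migration is needed; the new item simply carries the new field.
put_item(table, "ap-002",
         {"ssid": "library", "verified": True, "bytes_transferred": 1024})

print(bytes_for(table, "ap-001"), bytes_for(table, "ap-002"))
```

The trade-off is that the schema moves into the application: every reader has to cope with attributes that are missing from older items, as `bytes_for` does with its default.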

But Milazzo says issues did crop up around its shift from MySQL to NoSQL – to be expected given the significant architectural and data mapping differences between relational and non-relational databases.

“There were definitely some growing pains. We had to manage the change from those two types of databases, and we experienced misqueued data in some circumstances, and had to tweak the data structure to ensure IO was kept under control in the new environment.”

A bigger challenge, he explains, emerged when the company completed the migration itself.

“When we switched over to DynamoDB inside AWS we had terabytes of data in our SQL databases, and had to copy that over. It took several days to do the copy and we had to raise the throughput fairly high on our instances. What that ended up doing was automatically sharding across multiple DynamoDB instances, so we ended up with hundreds of different instances.”

“When we dropped that back down we had all sorts of instances we had to pay for that were all fairly high throughput, so we had to engage in a massive effort to shut those down and shift all of that data around again… It was definitely a key lesson learned: you have to take data migration slowly and accept that it just isn’t going to happen overnight.”
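One way to read the "take it slowly" lesson is to pace a bulk copy at a steady write rate rather than raising throughput to finish quickly. The Python sketch below shows the idea under stated assumptions: `copy_paced`, the batch size and the rate cap are all hypothetical, the `write` callable stands in for whatever client writes to the destination store, and nothing here is Devicescape's actual migration code.

```python
import time

def copy_paced(records, write, max_per_second, sleep=time.sleep, batch_size=25):
    """Copy records in batches, pausing so the average rate stays under the cap.

    `records` is a list, `write` is any callable that stores one record, and
    `sleep` is injectable so the pacing can be checked without really waiting.
    """
    copied = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        for record in batch:
            write(record)
        copied += len(batch)
        # Wait for the time this batch is allowed to take at the capped rate.
        sleep(len(batch) / max_per_second)
    return copied

# Dry run: collect the pauses instead of sleeping.
destination, pauses = [], []
copied = copy_paced(list(range(100)), destination.append,
                    max_per_second=50, sleep=pauses.append)
print(copied, sum(pauses))  # 100 records, 2.0 seconds of enforced pauses
```

At a fixed cap the copy takes at least the record count divided by the rate, so the duration of a migration can be estimated up front and weighed against the cost of provisioning more throughput.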

Milazzo and his team corrected the sharding issue and the data is now coming in as individual records, but IOPS was still a problem following the migration.

“At the beginning of this year we saw the beginning of sharp increases in costs; and it started to accelerate. The good thing is it forced us to look at our architecture to ensure the way our data was coming in, and how it was stored, was optimised and processed in such a way that we were able to solve the issue.”

“There are definitely differences in how you want to handle the data coming in and how you’re processing it between a hosted solution and a cloud solution because of what they charge. We definitely had to learn that as well,” he says. “It wasn’t that big a deal from an effort standpoint, but if we had not taken control of it, it would have been a big deal from a cost standpoint.”

The Hadoop learning curve

But Devicescape also encountered a steep learning curve when it came to running Hadoop using spot instances in AWS. The company runs some fairly complex, multi-step queries from which it derives essential information for reporting, and each step along the way needs to complete before the next one can begin. If there are any errors or failures it can stop the whole process, and Milazzo says one of its big challenges was to make sure the team was able to monitor the process from the first step all the way to the last.

“This was one of the key problems with spot instances – we would have long running jobs that might run a day and a half and if something breaks we would lose our spot instances. When something screws up they just disappear under you.”

He says the company invested so much in mitigating controls and monitoring tools for the spot instances that the jobs ended up costing more to run, wiping out any savings from using spot instances rather than reserved ones.

“My recommendation now would be to not use any spot instances for Hadoop clustering unless your jobs are really short jobs, taking less than an hour or so. Anything more than that and you risk the danger of running into those issues.”
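The multi-step jobs described above, where each step must finish before the next begins, are often made more tolerant of lost workers by recording which steps have completed so that a rerun can resume rather than start over. The following Python sketch illustrates that general pattern only; `run_pipeline` and the step names are hypothetical, and the in-memory `done` set stands in for state that would need to be stored somewhere durable in practice.

```python
def run_pipeline(steps, done=None, log=print):
    """Run (name, func) steps in order, skipping names already in `done`.

    Returns (done, failed_step); failed_step is None when every step finished.
    """
    done = set() if done is None else set(done)
    for name, func in steps:
        if name in done:
            log(f"skip {name} (already complete)")
            continue
        try:
            func()
        except Exception as exc:
            log(f"FAILED {name}: {exc}")
            return done, name  # later steps depend on this one, so stop here
        done.add(name)
        log(f"ok {name}")
    return done, None

# Hypothetical three-step job whose middle step fails on its first attempt,
# simulating a worker that disappears partway through.
calls = []

def extract():
    calls.append("extract")

def aggregate():
    calls.append("aggregate")
    if calls.count("aggregate") == 1:
        raise RuntimeError("worker lost")

def report():
    calls.append("report")

steps = [("extract", extract), ("aggregate", aggregate), ("report", report)]
done, failed = run_pipeline(steps)             # stops at "aggregate"
done, failed = run_pipeline(steps, done=done)  # resumes; "extract" is skipped
```

The log output doubles as the end-to-end monitoring trail: each step reports `ok`, `skip` or `FAILED`, so it is clear where a run stopped. Checkpointing limits the work lost to the step that was interrupted, which is why it helps less when a single step runs for many hours.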

Milazzo says that despite the challenges Devicescape encountered in moving its systems to the cloud and switching from a relational to a non-relational database, the benefits far outweigh the costs.

“Now that our systems are hosted and we don’t have to maintain them it really does reduce our costs, time and resource needed to maintain them,” he says.

“We’re able to run data processing jobs for reporting much more quickly with Hadoop, and it’s likely to only get better. There’s a lot of innovation going on around Hadoop and data processing right now. It almost seems like on a weekly basis there’s some new tool that stretches it further.”

With the rise of the Internet of Things, services that crowdsource information to generate value are likely to make technologies like Hadoop and NoSQL even more relevant than they are today, which will continue to reshape the IT landscape. Five to six years from now, the number of IoT sensors will multiply and far outnumber mobile phones, according to most predictions, and Milazzo says the cloud will underpin all of these intersecting elements.

“This will create a ‘sensornet’ of sorts and I guarantee that data is going to end up in somebody’s cloud service and generate tons of insights,” he says. “Data will be ubiquitous, but you will need some way of turning that data into usable information.”