LinkedIn isn't moving to Azure because it has to, or because it is owned by Microsoft. It's moving because Microsoft can invest in technologies that even a company as large as LinkedIn cannot build for itself.
LinkedIn is so invested in running its own data centers that it launched Open19, its own counterpart to the Open Compute Project (OCP) built around the standard 19-inch rack. It has also contributed significant amounts of code to Microsoft's SONiC network operating system to support functions it needs for its own data center network. But now it is planning to move to Azure.
A few months after the initial announcement, TechRepublic sat down with LinkedIn CTO Raghu Hiremagalur to ask why the company is switching to the cloud and what progress it has made so far. And no, he says, it's not because Microsoft owns LinkedIn or is pressuring it. It's about the ability to scale with new hardware and services that LinkedIn could never build for itself.
Real hyperscale
Raghu Hiremagalur, CTO at LinkedIn.
Image: LinkedIn
For a start, while Open19 was largely about simplifying data center operations and reducing their cost, moving to Azure removes the need to build new data centers at all.
Ten years ago, LinkedIn's problems were all about keeping the website available as traffic grew, and for several years it focused on moving to microservices and simply having enough capacity to serve members. Then it started to think about scaling up the network and building an active-active data center architecture. Over the past three years, the focus has shifted to building data centers like those of hyperscale clouds such as Azure: adapting the network to the needs of the applications running on it, instead of asking application developers to work within the available infrastructure, bandwidth and latency.
But it does that in medium-sized data centers rather than the giant data centers of the hyperscale clouds, and the problem was more likely to be running out of space than running out of power. LinkedIn has around 250,000 servers in five data centers, and that number is growing by a third every year. It also has 20 points of presence and peers with 4,000 networks, but that doesn't compare with Azure's reach.
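A growth rate of a third per year compounds quickly. The short sketch below projects the server count forward, assuming (purely for illustration; the article gives only the current rate) that the rate stays constant. Since (4/3)^5 ≈ 4.2, a fleet of 250,000 servers would pass one million within five years under that assumption.

```python
def project_servers(start: int, annual_growth: float, years: int) -> list[int]:
    """Compound a starting server count by a fixed annual growth rate."""
    counts = [start]
    for _ in range(years):
        # Round each year's figure to a whole number of servers.
        counts.append(round(counts[-1] * (1 + annual_growth)))
    return counts


# 250,000 servers growing by a third per year (constant rate is an assumption).
projection = project_servers(250_000, 1 / 3, 5)
for year, count in enumerate(projection):
    print(f"year {year}: ~{count:,} servers")
```

A constant rate is the simplest possible model; real capacity planning would revisit the rate each year rather than extrapolate it.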
"We're in the western US, the eastern US, Singapore and Texas; they're in literally 57 regions," Hiremagalur explains. "Being able to ride on Microsoft's backbone is an immediate plus: it's probably one of the best network backbones out there from a private backbone point of view, and they have 160-plus edge locations with Azure Front Door. So our ability to serve our members becomes much better than it is now, because we can terminate their sessions close to where they are."
LinkedIn will do more than use Azure connectivity, says Hiremagalur: "Our plan is to move all our workloads (production, offline compute, nearline compute) to Azure. At some point, we don't want to be running data centers."
This is not because LinkedIn can't grow its own data centers: for the next five years, Hiremagalur sees no problems scaling up its network, data center capacity, power or other infrastructure.
LinkedIn isn't moving to the cloud because it has to. But the agility that Azure offers makes it worth going through a fairly disruptive migration of complex workloads.
"Whether it's about elasticity and capacity, or taking advantage of Azure's investments in advanced infrastructure like Azure Front Door or their network backbone, or the work they're doing in custom silicon, or the data center and network work they're doing on accelerated networking, FPGAs and storage innovation: those are all things we would like to have access to over time," says Hiremagalur. "And those are not things we would invest in independently. It makes no sense for us to invest in them ourselves."
LinkedIn will also adopt cloud AI tools such as AzureML. "Azure's capabilities in the AI space are amazing. We would certainly benefit significantly from the level of GPU compute they have," says Hiremagalur.
Multi-year migration
Being part of Microsoft means that LinkedIn gets an early look at what's coming to Azure. Hiremagalur wants to get started on a migration that will take several years. "Given the amount of time we think we need to move our workloads to Azure, we wanted to start the process now, so we're ready to use all that goodness when it's ready for us."
Meanwhile, LinkedIn will continue its own product development while preparing for the move, and thinking about what it can stop doing once it runs on Azure.
"In general, the interfaces that our infrastructure building blocks, such as storage and indexing, offer to the rest of the engineering organization must stay constant, or at least very similar, so our infrastructure teams will do the heavy lifting of adapting those building blocks to the public cloud," Hiremagalur says.
But he doesn't want to end up with a copy of the current LinkedIn infrastructure that simply runs in the cloud. "This is an opportunity for us to disaggregate compute and storage. We have the ability to use elasticity at extreme scale to match LinkedIn's daily workload patterns (with most users logging in during working hours). Those are things we want to take advantage of on our way to Azure."
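Why elasticity pays off for a workload that peaks during working hours can be shown with simple arithmetic. The sketch below compares provisioning for the daily peak around the clock with capacity that tracks demand hour by hour. The 24-hour demand curve and its units are hypothetical, not LinkedIn figures, and the `headroom` parameter is an illustrative assumption.

```python
# Hypothetical hourly demand in arbitrary capacity units (not LinkedIn data):
# 8 quiet overnight hours, 10 busy working hours, 6 moderate evening hours.
hourly_demand = [30] * 8 + [100] * 10 + [50] * 6


def fixed_capacity_hours(demand: list[int]) -> int:
    """Capacity-hours consumed if you provision for the daily peak all day."""
    return max(demand) * len(demand)


def elastic_capacity_hours(demand: list[int], headroom: float = 0.0) -> float:
    """Capacity-hours consumed if capacity tracks each hour's demand plus headroom."""
    return sum(d * (1 + headroom) for d in demand)


fixed = fixed_capacity_hours(hourly_demand)
elastic = elastic_capacity_hours(hourly_demand)
print(f"fixed: {fixed}, elastic: {elastic:.0f}, saving: {1 - elastic / fixed:.1%}")
# prints "fixed: 2400, elastic: 1540, saving: 35.8%"
```

The saving depends entirely on how pronounced the daily peak is, and adding headroom for scaling lag eats into it.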
LinkedIn uses very large graph databases. There is a lot of Kafka (which was developed at LinkedIn and was handling more than one trillion events per day there by 2015) and of the Samza stream-processing system built on top of Kafka, as well as offline compute and machine learning. It is all very network intensive: for every byte of user-activity data coming into a LinkedIn data center, approximately 1,000 bytes of east-west traffic are generated inside the data center (analyzing that information for the LinkedIn graph and for machine learning systems such as recommending people you might know).
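The roughly 1,000:1 ratio between incoming user-activity data and internal east-west traffic is what makes the workload so network intensive. As a back-of-the-envelope sketch, assuming the ratio applies uniformly (a simplification) and using a hypothetical ingress rate:

```python
# Approximate ratio quoted in the article: ~1,000 bytes of east-west traffic
# generated inside the data center for every byte of user-activity data entering it.
EAST_WEST_AMPLIFICATION = 1_000


def east_west_gbps(ingress_gbps: float, amplification: float = EAST_WEST_AMPLIFICATION) -> float:
    """Estimate internal east-west bandwidth implied by a given ingress rate."""
    return ingress_gbps * amplification


# Hypothetical ingress rate, not a LinkedIn figure.
ingress = 2.5  # Gbit/s of user-activity data
print(f"{east_west_gbps(ingress):,.0f} Gbit/s east-west")  # prints "2,500 Gbit/s east-west"
```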
"We will be able to take advantage of this kind of disaggregation at large scale, along with the ability to scale compute and storage independently. We are a very data-heavy system, so being able to manage those two things as separate units is also a big plus for us," says Hiremagalur.
"The lower the network latency, the more you can do with graph database traversals," he notes. "Being able to traverse our graph in very interesting ways naturally requires very good, very well-designed distributed systems, as well as top-notch networks. I look forward to being able to go serverless to scale this type of workload, and not having to worry about how these things spin up and wind down. Those things are great candidates for serverless computing."
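Graph traversals such as finding second-degree connections, a common starting point for people-you-may-know style recommendations, are typically breadth-first searches. The sketch below runs one over a tiny hypothetical in-memory graph. The `min_latency_ms` helper is an illustrative assumption: it models a distributed graph store in which each additional hop needs at least one sequential network round trip, which is why lower latency allows deeper traversals within the same time budget.

```python
from collections import deque


def within_k_hops(graph: dict[str, list[str]], start: str, k: int) -> dict[str, int]:
    """Breadth-first search returning {node: hop distance} for nodes within k hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand past the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def min_latency_ms(hops: int, rtt_ms: float) -> float:
    """Lower bound if every hop level needs one sequential network round trip (assumption)."""
    return hops * rtt_ms


# Hypothetical member graph (undirected edges listed in both directions).
graph = {
    "ana": ["bo", "cy"],
    "bo": ["ana", "dee"],
    "cy": ["ana", "eli"],
    "dee": ["bo"],
    "eli": ["cy", "fay"],
    "fay": ["eli"],
}
reach = within_k_hops(graph, "ana", 2)
second_degree = sorted(n for n, d in reach.items() if d == 2)
print(second_degree)  # prints ['dee', 'eli']
print(min_latency_ms(2, 0.5), min_latency_ms(2, 5.0))  # prints 1.0 10.0
```

With a 0.5 ms round trip, a two-hop traversal costs at least 1 ms in this model; at 5 ms it costs at least 10 ms, so the same time budget buys far fewer hops.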
That is an architectural change LinkedIn would have considered whether it moved to Azure or stayed in its own data centers. But the move means there are areas of infrastructure that LinkedIn can hand over to Azure entirely.
"Running a large-scale workload in a public cloud is different from managing things ourselves, where we have 100 percent control over literally everything. So we will have to learn how to run a site in a very stable way with those changes," Hiremagalur says.
Instead of only thinking about hardware and service failures, engineers will have to plan for upgrade cycles they have no control over, Hiremagalur explains. "We need to learn to respond to signals from Azure that it is about to service the infrastructure we run on, and figure out how to move or pause workloads. The way we manage security will be different. The layers of the stack that we control 100% will simply shrink: we have no control over the network, we have no control over the physical data centers. So the way we think about infosec must evolve, and the way we think about perimeter protection must evolve."
That's the usual story of cloud migration: you don't just move an application to another server, you move what needs moving onto a different kind of abstraction. Once you have done that work, the reward is that you can concentrate on higher-level problems.
"I see this as our ability to focus on areas where we deliver unique value, and to lean on our Azure counterparts to do the things they do at extreme scale and do very, very well," says Hiremagalur. "I visualize this as a rising sea level: the things that go underwater for us are things we simply rely on Azure for. The rest are things we continue to do and can still concentrate on."
On-prem is the new mainframe
The Open19 initiative is not going away, says Hiremagalur. "We have already gotten a lot of value from it: we have deployed it in our data centers, we have already contributed a lot of technology to OCP, and we will continue to work with them."
But apart from giant organizations such as Facebook that run their own cloud, Hiremagalur expects more and more companies to move many of their workloads to the public cloud over time, because their own developers will demand it.
"If you don't have access to innovations in the public cloud over the next five to ten years, your company may be seen the same way as companies that run on mainframes, and no company wants to be in that position."