Facebook Tackles Vast Amount of Data With Project Prism

27 August 2012

Since the beginning of the internet, the amount of information available has snowballed. This has led to data centres having to deal with more and more information.

The corporations that have had to handle massively growing amounts of data (and move it around globally) have had to come up with some brilliant ways to deal with the problem.

Amazon launched their system publicly as Amazon Web Services (AWS), with services such as S3 for storage, which allowed any developer or team with the inclination or need to roll out their work on a platform with thousands of servers in multiple locations around the world. This meant that when your website went from being unknown to being a global phenomenon, it could deal with the spike in traffic without changing infrastructure or hardware. Cloud computing has become very popular over the last few years, and with most services you only pay for what you use, so you only pay for billions of impressions when you actually get billions of impressions.

In the last 5 years Facebook has grown from modest beginnings into something colossal. With nearly a billion users (spambots or not), the amount of data that needs to be processed and the number of locations this data needs to be accessed from mean that Facebook faces these problems at a unique scale.

Facebook and search giant Yahoo were the front runners in creating Hadoop, a software platform for processing and analyzing the monumental amounts of data streaming across the modern web. Yahoo started the open source project as a way of constructing the index that underpinned its web search engine, but because it was open source, others soon used it for their own online operations and contributed to the platform code.

Now Facebook is staring down an avalanche of data, and this week Jay Parikh, Facebook's engineering bigwig, announced that the company has developed two new software platforms that will see Hadoop scale even further. Facebook also intends to open source them both, which is great.

This new software lets you run a huge number of tasks across a vast collection of Hadoop servers without running the risk of crashing the entire cluster, and it also allows you to run the software in multiple data centres across the globe.

“It lets us move data around, wherever we want,” Parikh says. “Prineville, Oregon. Forest City, North Carolina. Sweden.”

But as Facebook’s user base grew to 900 million users, its systems were plagued by a “single point of failure.” If a master server overseeing the cluster went down, the whole cluster went down, at least temporarily. Now Facebook has eliminated the single point of failure in the platform using a couple of very clever solutions called AvatarNode and Corona.
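To make the "single point of failure" idea concrete, here is a minimal Python sketch. It is a toy model and not Facebook's or Hadoop's actual code: the Master and Cluster classes and their methods are invented for illustration, and a real standby also has to keep its state in sync with the primary, which this sketch leaves out.

```python
class Master:
    """Toy stand-in for a server that coordinates a cluster (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.alive = True


class Cluster:
    """Hands work to the first master in the list that is still alive."""

    def __init__(self, masters):
        self.masters = masters

    def active_master(self):
        for master in self.masters:
            if master.alive:
                return master
        return None  # no master left: the whole cluster is unavailable

    def is_available(self):
        return self.active_master() is not None


# One master only: losing it takes the whole cluster down.
single = Cluster([Master("primary")])
single.masters[0].alive = False
print(single.is_available())  # False

# Primary plus a standby: the standby takes over when the primary fails.
paired = Cluster([Master("primary"), Master("standby")])
paired.masters[0].alive = False
print(paired.is_available(), paired.active_master().name)  # True standby
```

The only difference between the two clusters is the extra master, which is the whole point: availability comes from having something ready to take over.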

All of this is brilliant, and made even better by going open source. I love reading about the challenges that can affect the ‘big picture’, and seeing corporations use their power to fix them and share the solutions with the world is fantastic.

You can read the whole article via Engadget


About Nicholas

I love helping people and solving problems. I am currently working on:
A Cape Town Fibre ISP – Atomic Access
Borderless Blockchain Mobile Network Operator – World Mobile
From England and currently living in Cape Town, South Africa.