Big Data Introduction

What is Data?

Any piece of information can be considered data. We live in the age of data, where nearly everything around us is linked to a data source and much of our lives is captured digitally. The physical world has turned into raw information: internet traffic, video, call data records, customer transactions, healthcare records, news, literature, scientific publications, economic data, weather data, geospatial data, stock market data, and city and government records. This data comes in various forms and sizes, from small data to very big data. It can be classified by size as follows:

  • Any data that fits in RAM (main memory) is considered small data. Small data is typically less than tens of GB.
  • Any data that fits on a hard disk is considered medium data. Medium data ranges from tens to thousands of GB.
  • Any data that cannot fit on the hard disk of a single system is considered Big Data. Its size exceeds thousands of GB.
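
The classification above can be sketched as a small function. This is only an illustration: the text gives loose ranges ("tens" and "thousands" of GB), so the exact cutoffs of 10 GB and 1,000 GB, and the names `SMALL_LIMIT_GB`, `MEDIUM_LIMIT_GB`, and `classify_data_size`, are assumptions chosen for the example.

```python
# Illustrative cutoffs only (assumptions, not fixed standards).
SMALL_LIMIT_GB = 10       # assumed upper bound for data that fits in RAM
MEDIUM_LIMIT_GB = 1_000   # assumed upper bound for data that fits on one disk


def classify_data_size(size_gb: float) -> str:
    """Return 'small', 'medium', or 'big' for a data set of the given size in GB."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    if size_gb < SMALL_LIMIT_GB:
        return "small"
    if size_gb <= MEDIUM_LIMIT_GB:
        return "medium"
    return "big"


if __name__ == "__main__":
    # A spreadsheet export, a large database dump, and a multi-petabyte archive.
    for size in (2, 500, 3_000_000):
        print(f"{size} GB -> {classify_data_size(size)}")
```

In practice the boundaries depend on the hardware available, so the two constants would be adjusted to match the memory and disk capacity of the system in question.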

What is Big Data?

Big Data is simply data, but of enormous size. We normally work with data measured in megabytes (MB), such as Word documents and Excel spreadsheets, or at most gigabytes (GB), such as movies and music collections. Data whose size reaches into the petabytes (PB) is called Big Data.

Big Data refers to huge sets of structured, semi-structured, or unstructured data that organizations mine to identify new opportunities. About 80% of the data captured today is unstructured. It is collected from many sources: sensors that gather climate information, posts on social media such as tweets on Twitter, digital pictures and videos uploaded to websites such as Facebook, purchase transaction records, and other similar data. All of this is also Big Data.

The Data Explosion
  • About 2.5 quintillion bytes (2.5 × 10^18 bytes, roughly 2.5 billion GB) of data are created every day.
  • 90% of the data in the world was created in the last two years.

As a business leader, it is the consequences of this data explosion that you need to care about.

Two key consequences result:

1. Knowledge Gap: The difference between collecting data and understanding data.
2. Execution Gap: The difference between understanding data and acting on it.
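
To put the daily volume quoted in "The Data Explosion" into more familiar units, the conversion can be worked through in a few lines. This is a minimal sketch: the helper name `to_unit` and the unit tables are assumptions made for the example, and the per-second figure assumes data is created at a steady rate throughout the day.

```python
BYTES_PER_DAY = 2.5e18  # 2.5 quintillion bytes per day, as quoted in the text

# Decimal (SI) units use powers of 1000; binary units use powers of 1024.
DECIMAL_UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}
BINARY_UNITS = {"GiB": 2**30, "TiB": 2**40, "PiB": 2**50, "EiB": 2**60}


def to_unit(num_bytes: float, unit: str) -> float:
    """Convert a byte count to the named decimal or binary unit."""
    factor = DECIMAL_UNITS.get(unit) or BINARY_UNITS.get(unit)
    if factor is None:
        raise ValueError(f"unknown unit: {unit}")
    return num_bytes / factor


if __name__ == "__main__":
    for unit in ("GB", "GiB", "PB", "EB"):
        print(f"{to_unit(BYTES_PER_DAY, unit):,.2f} {unit} per day")

    # Assumption: creation is spread evenly over the 86,400 seconds in a day.
    bytes_per_second = BYTES_PER_DAY / 86_400
    print(f"{to_unit(bytes_per_second, 'TB'):,.1f} TB per second")
```

The binary conversion gives about 2.3 billion GiB, which is the likely origin of the "2.3" that often accompanies this statistic.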

Big Data Big Sources

1. Social Networking Sites

Sites such as Facebook, Google, and LinkedIn generate huge amounts of data every day, as they have billions of users worldwide.

2. E-Commerce Sites

Sites such as Amazon, Flipkart, and Snapdeal generate huge volumes of logs from which users' buying trends can be traced.

3. Weather Stations

Weather stations and satellites produce huge volumes of data, which are stored and processed to forecast the weather.

4. Telecom Companies

Telecom giants such as Airtel and Vodafone study usage trends and design their plans accordingly. To do this, they store the data of their millions of users.

5. Share Markets

Stock exchanges across the world generate huge amounts of data through their daily transactions.

6. Search Engine Data

Search engines retrieve large amounts of data from many different databases.

Examples of Big Data Generation

Walmart

  • 200 million weekly customers across 10,700 stores in 27 countries.
  • 1.5 million customer transactions every hour.
  • 3 PB of data are stored in Walmart’s Hadoop cluster.
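
To get a feel for what the hourly transaction figure means as a sustained load, it can be converted to other time scales. This is a rough sketch that assumes transactions arrive at a constant rate around the clock, which real retail traffic does not; the function names are hypothetical.

```python
TRANSACTIONS_PER_HOUR = 1_500_000  # hourly figure quoted in the list above


def per_second(per_hour: float) -> float:
    """Average events per second, assuming a constant rate."""
    return per_hour / 3_600


def per_day(per_hour: float) -> float:
    """Events per 24-hour day, assuming a constant rate."""
    return per_hour * 24


def per_week(per_hour: float) -> float:
    """Events per 7-day week, assuming a constant rate."""
    return per_hour * 24 * 7


if __name__ == "__main__":
    print(f"{per_second(TRANSACTIONS_PER_HOUR):,.0f} transactions per second")
    print(f"{per_day(TRANSACTIONS_PER_HOUR):,.0f} transactions per day")
    print(f"{per_week(TRANSACTIONS_PER_HOUR):,.0f} transactions per week")
```

Under that assumption the weekly total is of the same order as the 200 million weekly customers quoted above, so the two figures are broadly consistent with each other.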

Facebook

  • 4.5 billion Facebook likes every day.
  • 350 million photos uploaded on a daily basis.
  • 250 billion photos stored by Facebook.
  • 10 billion messages sent every day.
  • 1 trillion posts in Facebook’s graph search database.