We live in a data-driven world! Data is used everywhere, but the main challenge is how to store and process it. Data that exceeds the available storage capacity and processing power is defined as “Big Data”. Many sources generate data: sensors in any digitized setup, CCTV cameras, social networks, online shopping, airlines, etc. In any organization, card swipes alone can generate gigabytes of data in a single day. Someone who wants to store this data on a local machine can do so as long as the machine has enough storage; once the data grows beyond that capacity, it can be moved to external storage (an external hard disk). But what if the data is growing exponentially, which is common nowadays? We cannot keep buying external storage, so we may turn to a data center (server) to store our growing data; we call this a sandbox. When we need the data, it is retrieved from the server to the local machine and processed there. This approach is time consuming, because the data transfer from server to local machine is slow and adds to the overall processing time. Three ‘V’s (Volume, Velocity and Variety) are the main factors of big data computing.
Hadoop Solution to Big Data:
The Hadoop framework was introduced by Apache. In Hadoop, the logic (computation), which is generally only a few KBs or MBs in size, is sent to the data for processing, instead of the data being fetched to the computation. Hadoop works by dividing huge data into small splits that can be processed separately. Hadoop has two main components: HDFS and MapReduce. HDFS (Hadoop Distributed File System) stores the data in a distributed manner so that it can be computed on quickly, and the computation on HDFS is done by MapReduce. The name MapReduce itself describes its functionality: Map applies the logic to the data (distributed in HDFS), and once that computation is done, the reducer collects the “result set” of the map step to give the final result.
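To make the map and reduce steps more concrete, here is a minimal sketch in plain Python that imitates the flow on a single machine using a word count. It does not use Hadoop or any Hadoop API. The function names (split_input, map_phase, shuffle, reduce_phase, word_count) and the split size are hypothetical, chosen only for illustration, and the intermediate grouping step (shuffle) is an assumption added here so that the reducer receives all values for a key together.

```python
from collections import defaultdict


def split_input(lines, split_size):
    """Cut the input into small splits (illustrative stand-in for distributed storage)."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def map_phase(split):
    """Apply the logic to one split: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_splits):
    """Group the emitted values by key across all splits (assumed intermediate step)."""
    grouped = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collect the per-key result sets of the map step into the final counts."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, split_size=2):
    splits = split_input(lines, split_size)
    # In this sketch the splits are processed one after another;
    # on a real cluster each split could be handled on a different node.
    mapped = [map_phase(split) for split in splits]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    data = ["big data is big", "hadoop stores data", "hadoop processes data"]
    print(word_count(data))
```

The point of the sketch is that map_phase only ever sees one split, so the same small piece of logic can be sent to wherever each split is stored, and only the compact (word, count) pairs need to travel to the reducer.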