Hadoop

Hadoop is a free, Java-based framework for processing large data sets, developed as an open-source project under the Apache Software Foundation. Because data had become too large and complex for a single machine, its designers drew on Google's MapReduce technology and concluded that processing the data in a distributed way would serve the purpose. The main disadvantage of a centralized system is that it is a single point of failure: if it breaks down, no one can access their data, and it may take days to rectify the issue, leaving everyone without their data in the meantime. Storing the data on different servers, so that each user can still be served, solves this problem.
Data storage is handled by the NameNode and DataNodes, while data processing is handled by the JobTracker and TaskTrackers. In Hadoop, instead of moving the data to the processing unit, the processing software is moved to where the data resides, which saves a great deal of time.
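As a minimal sketch of how a client program reaches data stored in HDFS, the Java snippet below uses the standard org.apache.hadoop.fs.FileSystem API to open a file and print its contents. The path /user/demo/sample.txt is a hypothetical example, and cluster settings such as fs.defaultFS are assumed to come from the usual core-site.xml configuration on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        // Cluster settings (e.g. fs.defaultFS) are normally picked up
        // from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical HDFS path, used only for illustration.
        Path file = new Path("/user/demo/sample.txt");

        // The NameNode resolves the path to block locations; the bytes
        // themselves are streamed from the DataNodes holding the blocks.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(file)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```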

Technologies used in Hadoop

  • HDFS (for data storage)
  • MapReduce (for data processing; see the word-count sketch below)
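
To make the MapReduce model concrete, here is a sketch of the classic word-count job written against the org.apache.hadoop.mapreduce API. Class names such as TokenizerMapper and SumReducer are chosen only for illustration; input and output paths are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs on the nodes holding the input splits and
    // emits (word, 1) for every word it sees.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: receives all counts for one word and sums them.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable total = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            total.set(sum);
            context.write(key, total);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, a job like this is typically submitted with something along the lines of hadoop jar wordcount.jar WordCount /input /output, where the jar name and the paths are placeholders.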

Different types of data

  • Structured data (MySQL)
  • Semi-structured data (XML, JSON)
  • Unstructured data (text, video, and audio)