
<p>In today's world, data is generated and collected faster than ever before. In the past, most data was created by humans: typed into application forms, rung up at point-of-sale terminals, and so on. Now the majority of data is machine-generated, captured in application logs or produced by sensors. The verbosity and sampling rate of these sources have exploded as computing capacity has expanded, storage has become cheaper and the business value of this data has increased. To meet these challenges, a new breed of platforms has emerged, including Hadoop, a wide range of NoSQL stores and cloud-enabled infrastructure.</p>

<p>As experience tells us, simply putting the data somewhere is not the goal. Extracting, transforming and loading that data into an analytic system is what turns what we have collected into something useful. For this purpose we have our trusted workhorse, the ETL platform. But it, too, must evolve to serve this challenging new world.</p>

<p>Evolution 1: Pushing the Processing Down</p>

<p>With data sets becoming larger and more complex, it is increasingly important that processing stay close to the data. Doing so preserves a scalable, partitionable processing model and avoids the bottleneck of network latency. The pattern of picking data up, processing it on another server and then writing it back must be used with far greater care, because moving data over the network is too costly in a high-volume environment. The performance benefits of local data processing have been thoroughly proven by Hadoop and MapReduce; the sketch below shows the pattern in its simplest form. We will certainly see more ETL tools leveraging Hadoop to perform their processing, and several mainstream ETL tools are already executing on this strategy.</p>

<p><a href="http://www.information-management.com/news/the-evolution-of-etl-10025153-1.html">Keep reading...</a></p>
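<p>To make "pushing the processing down" concrete, here is a minimal sketch of a classic word-count-style Hadoop MapReduce job, recast as a small ETL step that extracts HTTP status codes from raw web logs and aggregates them where the data lives. The class names, the assumed log layout and the input/output paths are illustrative assumptions, not taken from the article or from any particular ETL product.</p>

<pre><code>
// Illustrative ETL-style MapReduce job: extract the HTTP status code from
// each raw log line and count occurrences. Map tasks run on the nodes that
// hold the input blocks, so the parsing happens next to the data.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StatusCodeCount {

    public static class ExtractMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text status = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumes a log format where the status code is the 9th
            // whitespace-delimited field (as in Apache combined logs).
            String[] fields = value.toString().split("\\s+");
            if (fields.length > 8) {
                status.set(fields[8]);
                context.write(status, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable total = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            total.set(sum);
            context.write(key, total);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "status code count");
        job.setJarByClass(StatusCodeCount.class);
        job.setMapperClass(ExtractMapper.class);
        job.setCombinerClass(SumReducer.class);   // pre-aggregate on each node
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /logs/raw
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /logs/agg
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
</code></pre>

<p>Because Hadoop schedules map tasks on the nodes holding the input blocks, and the combiner pre-aggregates results before anything crosses the network, almost all of the work happens locally. That is precisely the locality property the article describes, in contrast to the pick-up, process-elsewhere, put-back model.</p>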