Case Study
The Hadoop cold storage contains data that is infrequently accessed. By using Amazon EC2, the company is able “to absorb large data volumes with fluctuating demands” and then, as needed, transfers data sets to Amazon Elastic MapReduce. He offered some general observations about the Hadoop platform… “Hadoop Distributed File System is the sexy part of the Apache Hadoop project. It is open-source and allows the problem solving qualities of MapReduce to really shine. It is a brilliantly simple framework that works well.”
Architecture
Company B processes 5.5 terabytes, or
36 billion rows, of clickstream data every day from digital media, site
behavior, social media, and offline media. In media terms, that is about 400
billion media impressions per year. The daily process uses Amazon Elastic
MapReduce (EMR) to cleanse and aggregate the clickstream data into
transactional cookie-level (or session-level) data, which passes to Aster
Database for advanced analysis. In addition, the large data sets are retained
in Amazon S3 cold storage for future analysis. The transactional data is partitioned
by client: each client has its own custom data mart so that
client data is not commingled. The client data is optimized and structured as
custom OLAP cubes for use by Company B's internal client teams. Reporting to
clients about digital marketing performance is done from these cubes.
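
The cleanse, aggregate, and partition steps described above can be pictured with a small, in-memory Python sketch written in map/reduce style. This is only an illustration, not Company B's actual EMR job: the field names (client_id, cookie_id, ts), the 30-minute session timeout, and all function names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Assumed inactivity timeout that separates two sessions (hypothetical value).
SESSION_GAP_SECONDS = 30 * 60


def cleanse(rows):
    """Drop raw rows that lack a client, a cookie, or a timestamp."""
    for row in rows:
        if row.get("client_id") and row.get("cookie_id") and row.get("ts") is not None:
            yield row


def map_phase(rows):
    """Emit ((client_id, cookie_id), ts) pairs from the cleansed rows."""
    for row in cleanse(rows):
        yield (row["client_id"], row["cookie_id"]), row["ts"]


def reduce_phase(pairs):
    """Group events by key and split each cookie's events into sessions."""
    sessions = []
    # Sorting stands in for the shuffle step: it orders by key, then timestamp.
    for (client_id, cookie_id), group in groupby(sorted(pairs), key=itemgetter(0)):
        current = None
        for _, ts in group:
            if current is None or ts - current["end"] > SESSION_GAP_SECONDS:
                current = {"client_id": client_id, "cookie_id": cookie_id,
                           "start": ts, "end": ts, "events": 0}
                sessions.append(current)
            current["end"] = ts
            current["events"] += 1
    return sessions


def partition_by_client(sessions):
    """Keep each client's sessions in a separate bucket (one per data mart)."""
    marts = defaultdict(list)
    for session in sessions:
        marts[session["client_id"]].append(session)
    return dict(marts)


if __name__ == "__main__":
    raw = [
        {"client_id": "A", "cookie_id": "c1", "ts": 0},
        {"client_id": "A", "cookie_id": "c1", "ts": 600},
        {"client_id": "A", "cookie_id": "c1", "ts": 5000},
        {"client_id": "B", "cookie_id": "c2", "ts": 100},
        {"client_id": "A", "cookie_id": None, "ts": 50},  # dropped by cleanse
    ]
    marts = partition_by_client(reduce_phase(map_phase(raw)))
    for client, client_sessions in marts.items():
        print(client, client_sessions)
```

Because sessions are bucketed by client before anything is loaded downstream, no client's data is mixed with another's, which mirrors the separation between data marts described above.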
Company B built a data mart customized
to the unique business requirements of each client. Access to the client data
mart is primarily by Company B's internal teams, though there is also high-level
reporting directly to clients so that they can understand why certain
advertising optimizations were made. Clients need to know why Company B has moved their
funds from one advertising channel to another.
Thank You
Saurav Kumar Sinha