Thursday, 13 March 2014

Hadoop: An Interesting Conversation Before the Training

This discussion took place with the trainer from www.xcelframeworks.com ahead of a Hadoop session.

Q) Why should we go for Hadoop training?

Ans: With the growing adoption of Apache Hadoop as the platform for big data analysis across various industries, the need for IT professionals with expertise in operating Hadoop clusters is increasing rapidly. The power and flexibility of this platform also present compelling opportunities to develop new data-driven services on Hadoop and HBase, making "Hadoop developer" one of the most sought-after skills in the software industry. There has never been a better time to get Hadoop training.


Q) Will I be provided with the complete set of Hadoop videos even if I join the physical classroom training?
Ans: Yes, you will be given access to a blog that hosts the videos.

Q) Can you share the course content of the Hadoop training?
Ans: Yes, it will be shared with you in the startup kit, along with the courseware and a complete software DVD.

Q) How much practical knowledge of Hadoop usage, alongside its theoretical concepts, can be expected?
Ans: We believe in hands-on learning, so this three-day training will be run as a workshop with 75% hands-on work.

Q) It would be better if you could share the hardware and software requirements before candidates walk into the classroom. I hope Internet access will be available.

Ans: You need not worry; we will do the setup and provide you with a system. Internet access will be available.

Q) How much database, storage and networking knowledge is required, not only to understand Hadoop but also to make a career of it later?
Ans: There is no prerequisite for this training; we take it up from scratch.
Q) I have no experience in databases or related technologies so far. If I jump directly to Hadoop, will that be a hindrance when seeking to be recruited by a company?

Ans: No, it will not, as Hadoop is a different stream in itself. Since Hadoop is not meant for a particular kind of data, recruiting companies will not look for DB admin or other such capabilities. They will look for a profile capable of managing Hadoop clusters.

Q) Can you please share a case study?

Ans: Yes, please find them below.

Company A – Innovative e-Commerce Retailer

Company A is an innovative e-commerce retailer that sells a diverse set of products and services. What distinguishes it is an exciting online shopping experience tailored to its customers' needs, along with superb service. The key success factor for Company A is knowing more about its customers so it can cater to them better. The Director of Data Engineering for Company A was interviewed. He leads a team of twelve people who are part of a technology group supporting data warehousing, business intelligence, and analytics. He described their focus as “ensuring that everyone in the company has better access to the relevant data for marketing optimization and the A/B testing platform.” He said his group is “growing very well” and that he is “totally psyched about it.”
He continued by explaining the business problem and their approach… “Our goal is to be more responsive to the customer – to have a finger on the pulse of what’s going on. By integrating and analyzing data about the website behavior of customers, we can draw relationships that others may not have seen. It is more than data integration; it is pattern recognition.”
His team integrates clickstream information with email logs, ad viewing, and operational data “to figure out what is going on with our customer.” This data is used to support A/B testing and multivariate testing that track the customer’s entire experience – from searching for the product, ordering it, and receiving the package, to product returns – all to optimize the experience for the customer.
Architecture
He described the evolution of their architecture over the past two years… “The company was just a hundred people two years ago. We had no analytic tools. Our team was just two people and me. Given our tradition of doing things in an innovative way, we started using MapReduce with Apache Hadoop for analytics. However, we knew that this architecture would not be sufficient. We needed to build a data warehouse. In addition, Cognos and Spotfire were previously purchased and had a user base that required support. So, we investigated other analytic platforms, such as Aster Data (before its acquisition by Teradata), Oracle, Vertica, Greenplum, and MongoDB.”
About a year ago, the team acquired Teradata Aster to complement their Hadoop platform. Teradata Aster is used both as a reporting platform and, through its analytic processing, for discovering new insights beyond reporting.
When asked about the reasons for deciding on Teradata Aster, he explained, “I explain what I need to the Aster folks. We immediately had a good working relationship with some quality Aster folks. The project quickly gelled into place and has worked successfully over the past year.” Teradata Aster Database allows the team to analyze more of these potential behavior patterns in a stable and scalable way. In particular, the distributed parallel nature of the Teradata Aster discovery platform allows reasonable computation times with large data sets.
The Hadoop cold storage contains data that is infrequently accessed. By using Amazon EC2, the company is able “to absorb large data volumes with fluctuating demands” and then, as needed, to transfer data sets to Amazon Elastic MapReduce. He offered some general observations about the Hadoop platform…
“Hadoop Distributed File System is the sexy part of the Apache Hadoop project. It is open-source and allows the problem solving qualities of MapReduce to really shine. It is a brilliantly simple framework that works well.”

Case Study 2

Architecture
Company B processes 5.5 terabytes, or 36 billion rows, of clickstream data every day from digital media, site behavior, social media, and offline media. In media speak, that is about 400 billion media impressions per year.
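The daily volume figures can be turned into a quick back-of-envelope check. The Python sketch below assumes decimal terabytes (10^12 bytes) and a constant daily rate over a 365-day year; neither assumption is stated in the case study.

```python
# Back-of-envelope figures for Company B's daily clickstream volume.
# Assumption: "terabyte" means a decimal TB (10**12 bytes).
DAILY_BYTES = 5.5 * 10**12   # 5.5 TB per day
DAILY_ROWS = 36 * 10**9      # 36 billion rows per day
DAYS_PER_YEAR = 365          # assumption: constant rate, non-leap year


def average_row_size(total_bytes: float, rows: int) -> float:
    """Average number of bytes per clickstream row."""
    return total_bytes / rows


def yearly_rows(daily_rows: int, days: int = DAYS_PER_YEAR) -> int:
    """Total rows processed over a year at a constant daily rate."""
    return daily_rows * days


if __name__ == "__main__":
    print(f"~{average_row_size(DAILY_BYTES, DAILY_ROWS):.0f} bytes per row")
    print(f"{yearly_rows(DAILY_ROWS):,} rows per year")
```

Under those assumptions the average row is roughly 153 bytes, and a year at this rate comes to about 13.1 trillion rows. The case study does not say how rows map to the quoted media impressions, so the sketch does not try to reconcile the two figures.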
The daily process uses Amazon Elastic MapReduce (EMR) to cleanse and aggregate the clickstream data into transactional cookie-level (or session-level) data, which is then passed to Aster Database for advanced analysis. In addition, the large data sets are retained in Amazon S3 cold storage for future analysis.
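The case study does not describe how the EMR job is written. As a rough illustration of the cleanse-and-aggregate step, the sketch below simulates a MapReduce-style sessionization in plain Python: the map step drops malformed events and keys each one by cookie, a shuffle groups them, and the reduce step splits each cookie's hits into sessions. The event layout `(cookie_id, timestamp, url)` and the 30-minute inactivity gap are hypothetical choices for illustration, not details from Company B.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical raw event layout: (cookie_id, unix_timestamp_seconds, url).
Event = Tuple[Optional[str], int, str]
Hit = Tuple[int, str]
SESSION_GAP_SECONDS = 30 * 60  # assumption: 30 minutes of inactivity ends a session


def map_event(event: Event) -> Iterator[Tuple[str, Hit]]:
    """Map step: cleanse one record and key it by cookie."""
    cookie_id, ts, url = event
    if not cookie_id or ts < 0:
        return  # drop malformed rows (the "cleanse" part)
    yield cookie_id, (ts, url)


def shuffle(pairs: Iterable[Tuple[str, Hit]]) -> Dict[str, List[Hit]]:
    """Group mapped values by key, as the framework would between map and reduce."""
    grouped: Dict[str, List[Hit]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_sessions(cookie_id: str, hits: List[Hit]) -> List[dict]:
    """Reduce step: aggregate one cookie's hits into session-level records."""
    sessions: List[dict] = []
    current: Optional[dict] = None
    for ts, _url in sorted(hits):
        # Start a new session on the first hit or after a long idle gap.
        if current is None or ts - current["end"] > SESSION_GAP_SECONDS:
            current = {"cookie_id": cookie_id, "start": ts, "end": ts, "pages": 0}
            sessions.append(current)
        current["end"] = ts
        current["pages"] += 1
    return sessions


def sessionize(events: Iterable[Event]) -> List[dict]:
    """Run map, shuffle and reduce in sequence within a single process."""
    mapped = (pair for event in events for pair in map_event(event))
    result: List[dict] = []
    for cookie_id, hits in shuffle(mapped).items():
        result.extend(reduce_sessions(cookie_id, hits))
    return result
```

On a real cluster the map and reduce functions would run in parallel across many nodes and the framework would perform the shuffle; here everything runs in one process, which keeps the logic easy to follow.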
The transactional data is partitioned by client: each client has its own custom data mart, so client data is never co-mingled. The client data is optimized and structured as custom OLAP cubes for use by Company B's internal client teams, and reporting to clients about digital marketing performance is produced from these cubes.
Company B built a data mart customized to the unique business requirements of each client. Access to a client's data mart is primarily by Company B's internal teams, though there is also high-level reporting directly to clients so that they can understand why certain advertising optimizations were made. Clients need to know why Company B has moved their funds from one advertising channel to another.
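Partitioning by client and then rolling each client's data up into a cube can be sketched in a few lines. The record layout `(client_id, channel, day, impressions, clicks)` and the choice of `channel` and `day` as cube dimensions are hypothetical; the case study does not say which dimensions or measures Company B's cubes use, and a production OLAP system does far more than this dictionary-based stand-in.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical transactional record: (client_id, channel, day, impressions, clicks).
Record = Tuple[str, str, str, int, int]
Cube = Dict[Tuple[str, str], Dict[str, int]]


def partition_by_client(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Route each record to its own client's mart so client data is never co-mingled."""
    marts: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        marts[record[0]].append(record)
    return dict(marts)


def build_cube(mart: List[Record]) -> Cube:
    """Sum one client's measures along (channel, day): a tiny stand-in for an OLAP cube."""
    cube: Cube = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for _client, channel, day, impressions, clicks in mart:
        cell = cube[(channel, day)]
        cell["impressions"] += impressions
        cell["clicks"] += clicks
    return dict(cube)
```

Because each cube is built from a single client's mart, a report generated from it can only ever contain that client's figures, which mirrors the separation described above.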
