This discussion was with the trainer from www.xcelframeworks.com about a Hadoop training session.
Q) Why go for Hadoop training?
Ans : With the growing adoption of Apache Hadoop as the platform for big data analysis across various industries, the need for IT professionals with expertise in operating Hadoop clusters is increasing rapidly. The power and flexibility of this platform also present compelling opportunities to develop new data-driven services on Hadoop and HBase, making "Hadoop developer" one of the most sought-after skills in the software industry. There has never been a better time to get Hadoop training.
Q) Will I be provided with the complete set of Hadoop videos even if I join the physical classroom training?
Ans : Yes, you will be provided with blog access that includes the videos.
Q) Can you share the course content of the Hadoop training that will be taken up?
Ans : Yes, it will be shared with you in the startup kit, along with the courseware and a complete software DVD.
Q) How much practical knowledge of Hadoop usage, along with its theoretical concepts, can be expected?
Ans : We believe in hands-on learning, so this three-day training will be run as a workshop with 75% hands-on work.
Q) It would be better if you could share the hardware and software requirements before candidates walk into the classroom. I hope an internet facility will be available.
Ans : You need not worry; we will do the setup and provide you with a system. Internet will be available.
Q) How much database, storage, and networking knowledge is required, not only to understand Hadoop but to make it a career later?
Ans : There are no prerequisites for this training; we take it up from scratch.
Q) For someone with no experience in databases or related technologies who jumps directly to Hadoop, will that become a hindrance when getting recruited?
Ans : No, it would not, as Hadoop is a different stream in itself. Since Hadoop is not tied to a particular kind of data, recruiting companies will not look for DB admin or similar capabilities. They will look for a profile capable of managing Hadoop clusters.
Q) Can you please share a case study?
Ans : Yes, please find one below.
Case Study: 1
Company A – Innovative e-Commerce Retailer
Company A is an innovative e-commerce retailer that sells a diverse set of
products and services. What distinguishes them is providing their customers with an exciting online
shopping experience tailored to their needs, along with superb service. The success factor for
Company A is to know more about their customers so they can cater to them better.
The Director of Data Engineering for Company A was interviewed. He has a team of twelve
people, who are part of a technology group supporting data warehousing, business
intelligence, and analytics. He described their focus as, “ensuring that everyone in the company
has better access to the relevant data for marketing optimization and the A/B testing platform.”
His group is… “Growing very well and am totally psyched about it.”
He continued by explaining the business problem and their approach…
“Our goal is to be more responsive to the customer – to have a finger on the pulse of what’s
going on. By integrating and analyzing data about the website behavior of customers, we can
draw relationships that others may not have seen. It is more than data integration; it is pattern
recognition.”
His team integrates clickstream information with
email logs, ad viewing, and operational data
“to figure out what is going on with our customer.” This data was used
to support A/B testing
and multivariate testing to track the entire customer’s experience –
from searching for the product, ordering it, receiving the package, and
even product returns, all to optimize the
experience for the customer.
Architecture
He described the evolution of their architecture over the past two years…
“The company was just a hundred people two years ago. We had no analytic tools. Our team
was just two people and me. Given our tradition of doing things in an innovative way, we
started using MapReduce with Apache Hadoop for analytics. However, we knew that this
architecture would not be sufficient. We needed to build a data warehouse. In addition, Cognos
and Spotfire were previously purchased and had a user base that required support. So, we
investigated other analytic platforms, such as Aster Data (before its acquisition by Teradata),
Oracle, Vertica, Greenplum, and MongoDB.”
About a year ago, the team acquired Teradata Aster to complement their Hadoop platform. The
Teradata Aster platform is being used as a reporting platform, along with its analytic processing
for discovering new insights beyond reporting.
When asked about the reasons for deciding on Teradata Aster, he explained, “I explain what I
need to the Aster folks. We immediately had a good working relationship with some quality
Aster folks. The project quickly gelled into place and has worked successfully over the past year.”
Teradata Aster Database allows the team to analyze more of these potential behavior patterns
in a stable and scalable way. In particular, the distributed parallel nature of the Teradata Aster
discovery platform allows reasonable computation times with large data sets.
The Hadoop cold storage contains data that is infrequently accessed. By using Amazon EC2, the
company is able “to absorb large data volumes with fluctuating demands” and then, as needed,
transfers data sets to Amazon Elastic MapReduce. He offered some general observations about
the Hadoop platform…
“Hadoop Distributed File System is the sexy part of the Apache Hadoop project. It is open-source
and allows the problem solving qualities of MapReduce to really shine. It is a brilliantly simple
framework that works well.”
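The MapReduce model he praises can be sketched in a few lines. Below is a minimal, illustrative word-count example in the Hadoop Streaming spirit: a map phase emits (key, 1) pairs and a reduce phase sums counts per key. The function names (map_phase, reduce_phase) are illustrative only, not Hadoop APIs; on a real cluster these would be separate mapper and reducer scripts fed by HDFS.

```python
# Illustrative sketch of the MapReduce pattern (not a Hadoop API):
# the mapper emits (word, 1) pairs; the reducer sums counts per word.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reducer: group sorted pairs by key and sum their counts."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    docs = ["hadoop stores data", "hadoop processes data"]
    print(dict(reduce_phase(map_phase(docs))))
    # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

The sort between the two phases mirrors Hadoop's shuffle step, which guarantees each reducer sees all values for a given key together.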
Case Study: 2
Architecture
Company B processes 5.5 terabytes or 36 billion rows of clickstream data every day from digital
media, site behavior, social media, and offline media. In media speak, that’s about 400 billion
media impressions per year.
The daily process uses Amazon Elastic MapReduce (EMR) to cleanse and aggregate the
clickstream data into transactional cookie-level (or session-level) data, which passes to Aster
Database for advanced analysis. In addition, the large data sets are retained in Amazon S3 cold
storage for future analysis.
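The cleanse-and-aggregate step described above can be sketched as rolling raw click events up to one summary record per cookie. This is a hedged illustration only: the field names (cookie_id, url) and the summary metrics are assumptions for the example, not Company B's actual schema or EMR job.

```python
# Illustrative sketch: aggregate raw clickstream events into one
# cookie-level (session-level) record each, as the EMR step is
# described doing. Field names are assumed for the example.
from collections import defaultdict

def aggregate_sessions(events):
    """Roll raw click events up to a per-cookie summary record."""
    sessions = defaultdict(lambda: {"clicks": 0, "pages": set()})
    for ev in events:
        s = sessions[ev["cookie_id"]]
        s["clicks"] += 1
        s["pages"].add(ev["url"])
    return {cid: {"clicks": s["clicks"], "unique_pages": len(s["pages"])}
            for cid, s in sessions.items()}

if __name__ == "__main__":
    raw = [
        {"cookie_id": "c1", "url": "/home"},
        {"cookie_id": "c1", "url": "/cart"},
        {"cookie_id": "c2", "url": "/home"},
    ]
    print(aggregate_sessions(raw))
```

In the pipeline described, output like this would be loaded into Aster Database for advanced analysis, while the raw events stay in S3 cold storage.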
The transactional data is partitioned among their clients. Clients have their separate custom
data marts so that client data is not co-mingled. The client data is optimized and structured
as custom OLAP cubes for use by Company B's internal client teams. Reporting to the clients about
digital marketing performance is performed from these cubes.
Company B built a data mart customized to the unique business requirements for each client.
Access to the client data mart is primarily by Company B's internal teams, though there is
high-level reporting directly to clients so that they can understand why certain advertising
optimizations were chosen. Clients need to know why Company B has moved their funds from one
advertising channel to another.