Big Data Analytics: Emerging Techniques and Technology for Growth and Profitability


TRANSCRIPT

  • 8/10/2019 Big Data Analytics- Emerging Techniques and Technology for Growth and Profitability (1)

    1/22

Big Data Analytics: Emerging Techniques and Technology for Growth and Profitability

    Sponsored by Endeca

Speakers: Boris Evelson, Vice President & Principal Analyst, Forrester & Paul Sonderegger, Chief Strategist, Endeca

    Moderated by Ron Powell

Ron Powell: Welcome everyone to our web event, Big Data Analytics: Emerging Techniques and Technology for Growth and Profitability, sponsored by Endeca. Endeca is a leading provider of agile information management solutions that guide people to better decisions. I am Ron Powell, the Associate Publisher and Editorial Director of the BeyeNETWORK, a part of TechTarget, and I will be your moderator for this web seminar.

Our presentation today features Boris Evelson and Paul Sonderegger. Boris is Vice President and Principal Analyst at Forrester. He is the leading expert in business intelligence, and he helps enterprises define BI strategies, governance and architectures, and identify vendors and technologies that help them put information to use in business processes. Prior to joining Forrester, he held senior positions at Citibank, JPMorgan and Pricewaterhouse. Also speaking today is Paul Sonderegger. He has helped global organizations turn Big Data into better daily decisions and gain competitive advantage. Prior to joining Endeca, Paul was the Principal Analyst at Forrester Research focusing on search technology and user experiences.

What is the value of Big Data to your organization, and what can you do about it? Those are the big questions facing companies today. In this webinar, Boris and Paul will answer those and other relevant questions and reveal how innovators have successfully tapped the data at their disposal to improve customer relationships, drive growth, achieve market leadership and unlock new revenue streams. We will also have time at the end of the presentations today for questions.

Please feel free to submit your questions at any time during the event. If you would like a specific person to answer your question, don't forget to include the person's name when you submit it. And now here is Boris Evelson, who will begin today's presentation for you. Boris?

© 2011 Forrester Research, Inc. Reproduction Prohibited

Big Data Analytics: Emerging Techniques and Technology for Growth and Profitability

    Boris Evelson, Vice President, Principal Analyst

    October 27, 2011


Boris Evelson: Ron, thanks very much for the introduction, and Endeca, thank you very much for the opportunity to present together with you. So, good morning, good afternoon everyone. I know there are lots of people dialing in from all over the world, so thanks very much for taking your time to be here with us. You know, it's a very exciting time in business intelligence, but what's paradoxical and interesting is that it's never been more exciting. I have personally been in the business intelligence, information management and data warehousing business for close to 30 years, and I don't think a year goes by without a new application of these technologies, new challenges and new technologies. It is never not exciting times in business intelligence.

Today we are standing on the cusp of something completely new and once again very exciting, because in some of our recent surveys and some of our recent anecdotal conversations with customers, we have uncovered that most of the firms out there are only using single-digit percentages of their data, and I am not even talking about unstructured data. This is just the traditional structured data that is buried all over the place. Firms have done a pretty good job extracting that data from their financial systems, maybe their HR systems, maybe their supply chain systems, and they are just beginning to scratch the surface of their sales and marketing systems for reporting and analytics. But if we look at all of the structured data that is stored all over the enterprise, we are really just scratching the surface. And what we do know today is that even these solutions, solutions that only address a tiny percentage of the data that you have, are complex, they are expensive, sometimes they are not flexible enough and sometimes they take very long to implement.


    Firms use only 1% to 5% of available data . . .

What if that number doubled?


So the question that we are asking here is, what would happen if that number doubled? What would that do to the cost and the effort, but much more importantly, what would it do to the insights that you are getting from all of the data? The possibilities are absolutely amazing. If you go back 20 or 30 years, basically all we did was report on and analyze financial data, potentially for the better, for stability, and that may sound a bit boring, but it was pragmatic. It is absolutely night and day compared to what we are doing with data today, so just look at some of these examples.

We are literally saving lives: we can process so much data these days, so quickly, that we can make life-saving decisions based on it. The whole power of social technology is now available for companies to understand who is friends with whom and therefore target their offers to these individuals in a much more focused way. Even if we are not using social technology, just by examining the gazillions of transactions that are out there in point-of-sale systems and clouds, we can understand some social relationships even without going to databases like Facebook and LinkedIn and Twitter and others. Public utility companies are no longer just service providers, they are analytics providers, because they can monitor on a sub-second basis what it is that we are doing with our electricity, oil and gas and water usage, and they literally have to monitor trillions and trillions of data points every hour and every day. And then there are previously disconnected analyses, such as an insurance company looking at the profitability and cost of delivering healthcare while a pharmaceutical company looks at the health benefits of a particular drug. What if they could put their information together and understand the whole end-to-end lifecycle of a drug, not just from the health benefits point of view but from the financial benefits point of view?


Innovators turn more data into more value

• A hospital saves babies' lives using massive streams (100 million data points per day) of monitoring data
• A telco taps into the Facebook social groups to market friends-and-family plans
• A credit card company retains customers by understanding social relationships
• A public utility company performs sophisticated analysis on smart grid data (1.5 trillion data points)
• A pharma company is collaborating with a healthcare provider to identify patterns in a 360-degree view on drug cost/benefits


So the possibilities are just absolutely amazing and mind-boggling, and our research uncovered a lot more of these great examples. We can aggregate all of these new technologies and use cases for Big Data into five use cases. Number one is exploration and machine learning; there is obviously tons of data coming from devices such as medical devices and the smart grid utility devices that I just described. There is the operational prediction pattern, where Big Data feeds operational predictive models so that they can optimize these models in real time. We are also no longer afraid to use the words dirty data warehouse or dirty operational data store, because sometimes having all of the data in one logical place, even before you reconcile and cleanse it, still makes a lot of sense and still provides tons and tons of benefits. Some of the Big Data technologies allow us to do bulk loads of our data warehouses and operational data stores much faster. And last but not least, sometimes it's not the volume of data but the speed at which the data changes; the best examples are always financial markets, where by the time you do anything in a traditional database it's already too late, so you need to react to market changes with sub-second response time so that you can execute that profitable trade when and where it has to be executed.
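The sub-second, stream-oriented use case described above can be sketched as a sliding-window computation that updates per event rather than per batch load. This is a minimal illustrative Python sketch, not any specific vendor's engine; the tick timestamps, prices and window size are made up.

```python
from collections import deque

class SlidingWindowAverage:
    """Keep a moving average over the last `window_s` seconds of ticks,
    updated per incoming event instead of per nightly batch load."""
    def __init__(self, window_s=1.0):
        self.window_s = window_s
        self.ticks = deque()   # (timestamp, price) pairs, oldest first
        self.total = 0.0

    def add(self, ts, price):
        self.ticks.append((ts, price))
        self.total += price
        # Evict ticks that have fallen out of the time window.
        while self.ticks and ts - self.ticks[0][0] > self.window_s:
            _, old_price = self.ticks.popleft()
            self.total -= old_price
        return self.total / len(self.ticks)  # current in-window average

# Hypothetical market ticks: (seconds, price).
win = SlidingWindowAverage(window_s=1.0)
for ts, price in [(0.0, 100.0), (0.4, 101.0), (0.9, 102.0), (1.6, 103.0)]:
    avg = win.add(ts, price)
# At ts=1.6 only the ticks from the last second (0.9 and 1.6) remain,
# so avg == (102.0 + 103.0) / 2 == 102.5
```

The point of the sketch is the shape of the computation: the decision-ready number is recomputed on every event, which is what lets a trading system react before any batch cycle could finish.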


Most big data use cases hype its application to analysis of new raw data from social media, sensors, and web traffic, but we found that firms are being very practical, with early adopters using it for operating on enterprise data they already have.

What types of data/records are you planning to analyze using big data technologies?

• Transactional data from enterprise applications: 72%
• Sensor/machine/device data: 42%
• Social media (Facebook, Twitter, etc.): 35%
• Unstructured content from email, office documents, etc.: 35%
• Clickstream: 27%
• Locational/geospatial data: 27%
• Image (large video/photographic) data: 13%
• Scientific/genomic data: 12%
• Other: 7%
• Don't know: 5%

Base: 60 IT professionals (multiple responses accepted)
Source: June 2011 Global Big Data Online Survey

Despite the hype, most firms find big data technology useful to operate on data they already have


So, lots and lots of interesting use cases, but what was amazing for us in our discovery and research into Big Data is that, yes, absolutely, firms are using all of the new data sources. They are leveraging new data sources such as sensor and machine data, social network data from Facebook and Twitter, and unstructured content from e-mails and other office documentation. But what was amazing to us (actually, after we thought about it, it made a lot of sense) is that just your basic, good old-fashioned, motherhood-and-apple-pie transactional data from ERP applications is still the king of what's being used by Big Data technology. Which just goes to prove the point that traditional data warehousing, traditional business intelligence and traditional analytics have been pushed to their limits in certain senses, and when the scalability requirements reach extremes, not just in volume but in speed of processing and, let's say, dirtiness of the data, that's when people turn to these new Big Data technologies. A very, very interesting revelation.

So what I just mentioned is indeed something that we see on a daily basis. We do see that the mission criticality of business intelligence and analytics continues to go through the roof, but on the other hand, the complexity, or rigidity is probably a better word to use here, the rigidity and lack of flexibility, and the expense of scaling these technologies beyond a certain point, have been pulling this market apart in different directions. As you can see, there is a rift in the market, where traditional technologies are really reaching their limits. But we do have a way to close that gap, we have a way to address this opportunity, and we have a way to scale. The way to scale is first and foremost with lots and lots of best practices. This is not the subject of today's webinar, but I highly recommend that all of you reach out to us for all sorts of best practices, because the way you structure your organization and the way you organize and govern your business processes around this is infinitely more important than technology. But technology is very important too, and therefore these two converging arrows, best practices and next-generation technology, are what's absolutely critical to close this gap and to address the opportunity.


Now, why are traditional business intelligence and traditional data warehousing meeting their limits? Well, number one, traditional BI is very complex, and I am only showing you about a dozen components here, but in any real-life situation, for a large, heterogeneous, global, multi-product, multi-service-line enterprise, the process of getting from the left side of this picture, where you have to source raw data, to the right side of this picture, where you can actually start making decisions, involves about 20, 30, sometimes even 40 components. These components sometimes come from different vendors, and even if they come from the same vendor, it's technology that's been recently acquired, so it's not really seamlessly integrated. So people spend tons and tons of time just integrating these components, not actually spending the time looking at the information and making the decision, so we need to change that.


The second reason is that, no matter what we have been saying for the last 20 to 30 years in terms of aligning business and IT, it really hasn't worked that well, specifically for business intelligence and analytics, and there are multiple reasons for that. The main reason is just that the priorities, goals and objectives of a typical business person and a typical IT person are different. This is not about right or wrong; it's just the nature of the business that, on average, a typical business person cares about his or her business requirements, and they need to be flexible and agile and react to what's going on in the business world today. We on the IT side try to do our best to support that, but we also have other priorities. We are tasked with standardizing technology, we are tasked with all sorts of planning processes so that we don't run in a hundred different directions, and we need to try to minimize the operational risk of all these applications. So just by the nature of our roles in life, alignment is not there, and that has produced lots and lots of tensions and lots and lots of challenges.

Last but not least, traditional business intelligence and data warehousing technology has lots of limitations. One of the major limitations is that the technology typically relies on some kind of a fixed schema, whether you call it a schema or a data model. If you had a chance to glance at the previous slide, at the bottom there were two terms I used: one was pre-discovery and the other was post-discovery. If you think about it, every traditional data warehouse or data mart or cube has a data model. That data model means that you can't really do a true what-if analysis, because the only what-if analysis you can do has to be based on everything that has been pre-discovered, pre-built and pre-modeled into that data schema and that data model. So whatever you and your DBA and your business liaison and your business analysts talked about a year ago, when you built that data model, is really the only thing you can explore and analyze and predict from. If you have new types of decisions to be made, guess what, you have to go back and change that model, and that's not an easy task.
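The fixed-schema limitation described here can be contrasted with a schema-on-read style of exploration, sketched minimally in Python below. The records and attributes are hypothetical, and real Big Data engines do this at far larger scale, but the idea is the same: structure is imposed at query time, so questions nobody pre-modeled a year ago remain askable.

```python
# Schema-on-write keeps only the attributes someone modeled up front.
# Schema-on-read keeps the raw records; each new question imposes its
# own structure at query time. These records are made up for illustration.
raw_records = [
    {"product": "drill", "watts": 700, "brand": "X"},
    {"product": "rose bush", "hardiness_zone": 5},
    {"product": "fridge", "watts": 150, "capacity_l": 300},
]

def query(records, **criteria):
    """Filter on any attribute, including ones never pre-modeled."""
    def matches(rec):
        return all(rec.get(k) == v for k, v in criteria.items())
    return [r for r in records if matches(r)]

# "What if" questions nobody designed a schema for a year ago:
powered = [r for r in raw_records if "watts" in r]   # records with a wattage
zone5 = query(raw_records, hardiness_zone=5)         # a gardening question
```

With a fixed relational schema, the `hardiness_zone` question would require a schema change and a reload before it could even be asked; here it costs nothing but the scan.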

We at Forrester are exploring and researching four categories of technologies that are absolutely critical to take BI into this new era of agility, flexibility and scalability. We categorize them as all sorts of technologies to make BI more automated (remember those 30 or 40 components; how do you either reduce the number of components or make them more integrated, more automated?). How do you make BI, data warehousing, predictive analytics and unstructured


Capabilities necessary for limitless BI and DW

• Adaptive data models
• Exploration and analytics
• Advanced data visualization

Source: March 31, 2011, "Trends 2011 And Beyond: Business Intelligence" Forrester report


data analytics more real-time versus batch-cycle BI, and more unified as opposed to using different technologies? How do you make business intelligence and analytics more pervasive, because our research shows that in the majority of enterprises, less than 10% of people are using enterprise-grade BI applications today? And last but not least, my original point on the slide: what are the technologies that we need to make the data models more adaptive to current business realities, not the business realities that were there a month, a quarter or a year ago when we built that data model, but being able to respond to the business requirements that are happening around us today?

Just to drill into a point that I made earlier: traditional transaction-oriented relational databases are a poor fit for analytics and business intelligence. All of you who are DBAs or data architects on the phone, you know that you have to really jump through hoops to make them work for analytics. You go through a lengthy and complex exercise to de-normalize your data models, or as we call it, to flatten them out, because we want to minimize the number of joins we do in the database when we run queries. You know that you also spend a lot of time building indexes and aggregate tables and OLAP cubes for optimization, and we know how to do that; it just takes a long time, and when the requirements change, guess what, we have to go back to the basics and rebuild all of those, and that is never an easy or fast task. And last but not least, these relational databases, no matter what we do, are still a poor fit for unstructured data and content. It's almost like trying to fit a square peg into a round hole, and the same goes for diverse data structures. Remember, relational databases were invented 30 years ago, primarily for financial data, where the attributes of any financial transaction are very simple: it's either a debit or a credit, it belongs to a certain chart of accounts and it has a timestamp, and that's it. That's very easy to describe in a relational structure, but in the modern world we are dealing with manufacturers or retailers or wholesalers or distributors that handle millions of products, and each product has a completely different set of attributes.


OLTP RDBMS are a poor fit for BI

In order to tune OLTP RDBMS for BI, one has to:

• Denormalize data models to optimize reporting and analysis.
• Build indexes to optimize queries.
• Build aggregate tables to optimize summary queries.
• Build OLAP cubes to further optimize analytic queries.

Additionally, Forrester does not see a bright future for OLTP RDBMS to be able to handle:

• Unstructured content.
• Diverse data structures (unbalanced, ragged hierarchies, for example).
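As a toy illustration of the aggregate-table tuning step listed above, the following Python sketch uses the standard-library sqlite3 module with made-up sales rows. It shows the pattern only, not a production warehouse: summary queries hit a pre-computed table instead of re-scanning the detail rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Detail (OLTP-style) table: one row per transaction.
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 10.0), ("east", 20.0), ("west", 5.0)])

# Pre-computed aggregate table: summary queries no longer need to
# scan and group the detail rows at query time.
cur.execute("""CREATE TABLE sales_by_region AS
               SELECT region, SUM(amount) AS total, COUNT(*) AS n
               FROM sales GROUP BY region""")

totals = dict(cur.execute(
    "SELECT region, total FROM sales_by_region").fetchall())
# totals == {"east": 30.0, "west": 5.0}
```

The flip side, which the speaker notes, is also visible here: if the requirements change (say, totals by product instead of region), the aggregate table has to be rebuilt.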


Imagine you are Home Depot and you are selling, I don't know, kitchen appliances, and you are also selling, I don't know, garden supplies. The descriptions of these transactions and the descriptions of these products have completely different attributes. So you either have to deal with unbalanced and ragged hierarchies, or you build multiple data marts, one for each data type, and that, again, as I said, is just like trying to fit a square peg into a round hole. So luckily for us, the users of these applications, there is a new breed of technology out there. One category is business-intelligence-specific database management systems, and here we are trying to present them in three dimensions, where we are showing you that if your data is very diverse or disparate, if your requirements change very quickly, and if you have high scalability requirements, there are three or four types of database technologies and architectures, versus traditional relational database technologies, that are now at your disposal. And what's interesting, what we really need to underscore, is that it is not just about volume.


    Forrester tracks four types of BI-specific DBMS

Source: May 27, 2011, "It's The Dawning Of The Age Of BI DBMS" Forrester report


What are the main business requirements or inadequacies of earlier-generation BI/DW/ETL technologies, applications, and architecture that are causing you to consider or implement big data?

• Data volume: 75%
• Analysis-driven requirements (big data) versus requirements-driven analysis (traditional BI/DW): 58%
• Data diversity, variety: 52%
• Velocity of change and scope/requirements unpredictability: 38%
• Cost: big data solutions being less expensive than traditional ETL/DW/BI solutions: 30%
• Other: 10%
• Don't know: 3%

Base: 60 IT professionals (multiple responses accepted)
Source: June 2011 Global Big Data Online Survey

In traditional BI and DW applications, requirements come first and applications come later. In other words, requirements drive applications. Big data turns this model upside down, where free-form exploration using big data technology to prove a certain hypothesis or to find a pattern often results in specifications for a more traditional BI/DW application.

Cost is also a factor in many cases, and dealing with data using big data technologies is simply cheaper and faster than other methods.

Big volume is the top concern, but velocity, diversity, cost, and new analytic requirements are also important


In a recent survey, as you can see in front of you, extreme scales of data volume are the top priority for people who are using these Big Data or other alternative database technologies, but it's also data diversity and variety, it's the velocity of data change, and it's also cost. The very interesting one, the second one, is analysis-driven requirements versus requirements-driven analysis. If you think about it, in a traditional environment you have to gather the requirements first and then you build your data warehouse, and in a Big Data environment it's almost like a chicken-and-egg syndrome. In order for you to gather the requirements, you need to understand what's out there, but in order for you to understand what's out there, you need to have some type of a model in mind, and you can't have a model until you explore it. So you see, this is a circle that really can only be addressed by the new Big Data technology, and that's why we speak of analysis- or exploration-based requirements definition.

With that in mind, we have updated the previous slide that I showed you, specifically for Big Data, and it is characterized by five Vs. You only see four Vs on this slide, but I will introduce the fifth V on the next slide. It's obviously about volume, which you see on the horizontal axis, and it's about velocity. A very important point here: velocity is not just the speed at which the data changes, but also the speed at which the requirements change; that's the way to think about it. These are your x and y axes, and as you can see, as the variety and variability of the data also gets more complex, the space for traditional BI gets squeezed even more, and the opportunities for Big Data analytics become more and more important.


Now, the fifth V here is really value, and I think it's time-to-value, because as you can see, in a traditional approach you need to do all sorts of integration and cleansing and matching of apples to oranges, which in 20% of the cases and 20% of the applications is absolutely a must. When we are looking at a financial application, two plus two has to equal four; there is no question about it. But in other applications, such as brand management and brand analysis, two plus two doesn't have to equal exactly four; we just need to get a high-level understanding of what people are saying about us out there, and we don't want to spend tons of time identifying primary and foreign keys or doing data de-duplication, because accurate enough is good enough for brand management analysis. So you can get to value much quicker, and therefore that's the fifth V in Big Data. And as I mentioned, that arrow that goes down from Big Data to traditional BI is the point that I made earlier: sometimes the results of your Big Data exploration get manifested into requirements for a traditional business intelligence application.


    BI DBMS and big data address different use cases

Source: May 27, 2011, "It's The Dawning Of The Age Of BI DBMS" Forrester report


There is unfortunately some hype out there in terms of what these new technologies do, so here is one way to look at them. Before you plunge into evaluating different technologies, make sure that you are comparing apples to apples and not apples to oranges, because depending on your data volume, depending on whether you want to reuse your existing traditional BI infrastructure, whether your data is really relational or not relational, and how much unstructured content you have out there, all of these different types of technologies have their strengths and weaknesses.

So here is one simple way for you to separate fact from fiction. Probably a much more important point to make is that while vendors like Endeca have made great progress helping you with these new types of solutions, no one out there really has all the answers about how we organize around this. We understand quite a bit about the technology, but what kind of organizational structure do we create around it, and what kind of methodologies do we use? Are the methodologies that we have created and tested for the last, let's say, ten years for developing business intelligence, data warehousing and ETL processes, both the software development lifecycle methodology and the project management methodology, one and the same for Big Data?


Do you run your big data initiatives using the same or different PMO standards versus BI/DW/ETL?

• Same: 38%
• Different: 27%
• Don't know: 28%
• Other: 7%

Do you run your big data initiatives using the same or different SDLC standards versus BI/DW/ETL?

• Same: 25%
• Different: 37%
• Don't know: 35%
• Other: 3%

Base: 60 IT professionals

    Source: June 2011 Global Big Data Online Survey

No one has the answers yet. Although some companies attempt to fit big data into standard SDLC and PMO methodologies, Forrester believes that big data requires different approaches


Well, I don't have the answer. What I am trying to show you is that folks are all over the place: some use the same standards, some use different ones, and a significant number still don't know how to do this. Some other interesting questions always come up if you are using a non-database type of Big Data technology, let's say Hadoop-type file systems and processing. What happens when you want to reprocess your query? Hadoop has no persistence of results, so if you run your analysis and you think you have discovered something amazing, and you just want to verify it before you plunge a significant investment into this new project, you want to rerun that analysis. But guess what: when you rerun it even a minute later, the results have changed, because the data has changed. How do you persist Big Data, how do you store it, and how do you handle security and compliance and disaster recovery and all the things that we are used to in a traditional data warehouse and business intelligence environment? How do you do this in the Big Data environment? I guess we are going to find out in the next couple of years as firms try and fail or try and succeed, but today no one really has the answers yet.
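One possible way to tame the rerun problem described above, sketched here under simplifying assumptions (a single raw input file on local disk; real pipelines would snapshot at much larger scale), is to freeze a content-addressed copy of the raw data before analysis, so a later rerun sees exactly the same bytes:

```python
import hashlib
import os
import shutil
import tempfile

def snapshot(src_path, snapshot_dir):
    """Copy raw input into an immutable, content-addressed snapshot so
    a later rerun reproduces the first analysis's input exactly."""
    with open(src_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:12]
    dst = os.path.join(snapshot_dir, digest + ".raw")
    if not os.path.exists(dst):  # idempotent: same bytes, same name
        shutil.copy2(src_path, dst)
    return dst

# Demo with a throwaway temp directory and a made-up feed file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "feed.csv")
with open(src, "w") as f:
    f.write("a,1\nb,2\n")

frozen = snapshot(src, workdir)   # freeze before the first analysis

# The live feed keeps changing...
with open(src, "w") as f:
    f.write("a,1\nb,2\nc,3\n")

# ...but a verification rerun reads the frozen copy, not the live feed.
with open(frozen) as f:
    rerun_input = f.read()
```

The design choice is the content-addressed name: identical inputs collapse to one snapshot, and the digest doubles as an audit trail for the compliance concerns the speaker raises next.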


Do you intend to retain your raw big data after the exploration/analysis stage?

• Yes, for reprocessing, more analysis: 33%
• Don't know: 25%
• Yes, for compliance and reprocessing: 23%
• Yes, for compliance: 12%
• No: 5%
• Other: 2%

No one has the answers yet. Most firms intend to retain their big data for both reprocessing and compliance reasons.

Base: 60 IT professionals
Source: June 2011 Global Big Data Online Survey


Big data is a big deal . . . it's not business as usual

• Firms are reconsidering the single version of the truth idea
• Big data will redefine security, privacy, and compliance rules
• IT will not control big data technology environments
• One-size-fits-all technology standards will not be flexible enough


So Big Data is indeed a big deal, and I hope that I have shown you that along multiple dimensions it is indeed pushing the limitations of traditional technologies and methodologies and approaches. But most importantly, it is really not business as usual. To borrow from a famous story, you are not in Kansas anymore. You look around yourself, and the good old truths that we all thought to be unshakable are no longer the case. For example, a single version of the truth is no longer an absolute; it is now a relative, context-driven kind of concept. As I mentioned, to a CFO two plus two absolutely equals four, even if it takes a week, or a month-end closing process that costs a million dollars every month to run, to calculate that number four. There is no option; we have to come up with that number four. But to a marketing person who just woke up in the morning and sees a new competitive threat from a major competitor, and who needs to immediately put out a new campaign to address that threat, and who needs to immediately do a new customer segmentation to sharpen the focus of that campaign, two plus two equals 3.9 or 4.1 is a perfectly acceptable answer. Even if he gets it only 80% right but sends out that campaign that morning, that's what is going to count; if the campaign is based on 100% accurate data but is sent out even a day later, it's really meaningless. So at that point a single version of the truth becomes meaningless.

    We already talked about all of the challenges that Big Data presents when we define security and compliance. You can't really explore something without some kind of model out there.

    You need to understand some structures, but in order to provide structures you need to put security on them, because once there is a structure, people who are not authorized to access certain structures shouldn't be able to access them. But in order to create those structures and put those authorizations in place, you need to understand what they are, and in order to understand what they are, you need to explore. So once again you are in that kind of vicious circle, the chicken-or-the-egg question: how can you secure something if you don't understand what's out there, when in order to understand what's out there you need to explore it? So chief compliance officers, chief risk officers, all sorts of regulatory bodies and enterprises have a huge task in front of them.

    IT is absolutely no longer in charge. IT should still be in charge of all of the traditional tasks such as data preparation and data cleansing and disaster recovery and more stable environments like data warehouses, but in the Big Data world, where everything changes on a dime, where requirements change on a dime, traditional IT approaches will not work. So IT has to change its mentality; it has to let the reins out a little bit, and it has to understand and embrace that businesses do need to run Big Data, a new world of agile information exploration, more on their own. We on the IT side need to either embrace it and become true partners or we are going to be outdated very quickly. And last but not least, there is a realization that one-size-fits-all technology doesn't really work today. The extremes that we are talking about absolutely call for specific tools for specific use cases.

    When you are doing transaction processing on billions of rows of data you absolutely need a database technology that's optimized for transaction processing, and when you need to run a fixed report on billions of rows, you need another type of database that is optimized just for that. And when you need to do exploration on not just billions but trillions of data points, and it's not 100% accurate analysis, it is indeed exploration, you need to understand what's out there before you can even start to understand what it is you are going to analyze. For that you need a third type of technology optimized just for that purpose. I don't think that in the near future we will have one platform that will let us do all of that. So specific tools for specific tasks, I think, is going to be the name of the game in the near future.


    How do you get started with Big Data? Well, you obviously need to start with very specific use cases. You absolutely don't want to go there just because everyone is doing it. So take a look at the requirements; take a look at what you think you could do if all of the limitations of your traditional environments weren't there. I am sure you had lots of discussions with your business counterparts over the last couple of years where your answer was "Well, we can't do it because it's too complex" or "We can't do it because it's too expensive." Come back to those use cases and see if you can indeed do that now with the new technology. But number two, absolutely understand why it is that you are going there. If it's really just about volume, then maybe adding more RAM, more CPUs, more sockets and so on to your existing data warehouse is a much simpler solution to your problem.

    We really see it at the intersection of two or more of those Vs that we talked about: when it's not just volume but also velocity, not just volume but also variety of data, not just velocity of data but velocity of change of data. That's when you should ask yourself the question, "Is my traditional data warehouse the right platform for this?" Just like with anything else, obviously, a big bang approach never works; you need to start simple and small. You need to identify the low-hanging fruit, because just like any traditional BI initiative, this can very easily lose momentum, enthusiasm, and the support of your business stakeholders. So you need to deliver value very quickly, and the best way to do this is, as I said, start small but think big, and deliver something quickly, literally within weeks, hopefully even within days. Your stakeholders should see the value, and that should drive their decision to invest in more scalable, more integrated, enterprise-grade technologies. You obviously need to have some kind of governance policies. Unfortunately, at this point, as I said earlier, we can't advise you as to what those policies are, but you should put some constraints around this, because obviously if you take just anybody and let them loose on the entire universe of your data, then the operational risk complications are so huge that I can't even begin to talk about them.


    Recommendations

    1. Identify opportunities, and have a "what if we could . . ." conversation with your business

    2. Clearly understand why traditional BI/DW can't solve a problem

    3. Start simple, small, and scalable

    4. Develop a set of governance policies

    5. Develop a business case with tangible ROI


    So please do put some constraints around it, and please share what you find out with us, because we at Forrester definitely want to start building a knowledge base of what works there and what doesn't. And absolutely no business person will ever approve the use of any technology without a rock-solid business case, so you absolutely need to do that. Unfortunately, lots of your peers and competitors still aren't doing a good job of supporting their Big Data initiatives with a tangible business case; lots of people don't even know how to measure success.

    So you absolutely have to have that: what are the success factors, what is the actual value, and put some hard numbers around it. We do have a little bit of advice and a little bit of help. We have created this effort model where you can take all of the steps that you use in a traditional data warehouse, data mart, business intelligence environment, all the way from data sourcing to implementation, all of the steps that you usually execute, and put a value on each step. Then we compared the final results to implementing the same type of initiative using these new types of agile and Big Data technologies, and as you can see, we


    Do you have a business case for the big data initiative in place? (Base: 60 IT professionals)

        Yes, with a projected ROI: 28%
        No business case: 25%
        Yes, with intangible benefits only: 22%
        Yes, with a proven ROI: 10%
        Don't know: 8%
        Other: 7%

    How do you plan to measure the success of the big data initiative? (Base: 60 IT professionals; multiple responses accepted)

        With quantitative metrics tied to business performance: 47%
        With qualitative metrics tied to business performance: 32%
        No specific measurement methodology in place: 22%
        Don't know: 13%
        With quantitative metrics tied to IT performance: 12%
        With qualitative metrics tied to IT performance: 12%
        Other: 3%

    Almost 50% of firms surveyed are undertaking big data projects with no business case or intangible benefits only; business performance is the most common success goal.

    Source: June 2011 Global Big Data Online Survey


    Recommendations: Use Forrester's Business Intelligence DBMS Effort Estimation Model

    Enter project/application parameters. Adjust Forrester estimates for % savings for all of the substeps under data preparation, data modeling, and data usage.

    Initial effort with row RDBMS-based BI: $750,000

    Initial effort savings:

        Technology      | Total effort   | Data preparation | Data modeling  | Data usage
        Columnar RDBMS  | 28%  $207,000  | 34%  $102,000    | 25%  $75,000   | 20%  $30,000
        In-memory index | 31%  $232,500  |  8%  $24,000     | 54%  $162,000  | 31%  $46,500
        Inverted index  | 31%  $235,500  |  8%  $24,000     | 54%  $162,000  | 33%  $49,500
        Associative     | 30%  $222,000  |  8%  $24,000     | 54%  $162,000  | 24%  $36,000

    Ongoing yearly effort with row RDBMS-based BI: $1,658,100

    Ongoing effort savings:

        Technology      | Total effort     | Data preparation | Data modeling  | Data usage
        Columnar RDBMS  | 44%  $735,375    | 46%  $12,750     | 41%  $75,750   | 45%  $646,875
        In-memory index | 68%  $1,131,375  | 11%  $3,000      | 79%  $146,250  | 68%  $982,125
        Inverted index  | 68%  $1,129,125  | 11%  $3,000      | 79%  $146,250  | 68%  $979,875
        Associative     | 61%  $1,016,625  | 11%  $3,000      | 79%  $146,250  | 60%  $867,375


    are definitely projecting significant savings. I don't recommend taking these numbers as hard numbers, but come to us, get a copy of this model, plug in your own numbers, and hopefully you will see similar results and it will be a good input into your business case.
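The arithmetic behind the model is simple enough to sketch. The snippet below is an illustration, not Forrester's actual spreadsheet: the substep baselines are an assumption backed out from the slide's figures (data preparation $300,000, data modeling $300,000, data usage $150,000, which sum to the $750,000 initial effort), and the percentages are the initial-effort savings shown on the slide.

```python
# Substep baselines implied by the slide's initial-effort figures (assumption).
BASELINE = {"preparation": 300_000, "modeling": 300_000, "usage": 150_000}

# Percent savings per substep for each alternative technology (from the slide).
SAVINGS_PCT = {
    "Columnar RDBMS":  {"preparation": 0.34, "modeling": 0.25, "usage": 0.20},
    "In-memory index": {"preparation": 0.08, "modeling": 0.54, "usage": 0.31},
    "Inverted index":  {"preparation": 0.08, "modeling": 0.54, "usage": 0.33},
    "Associative":     {"preparation": 0.08, "modeling": 0.54, "usage": 0.24},
}

def dollar_savings(tech: str) -> dict:
    """Dollar savings per substep for one technology, plus the total."""
    per_step = {step: BASELINE[step] * pct
                for step, pct in SAVINGS_PCT[tech].items()}
    per_step["total"] = sum(per_step.values())
    return per_step

for tech in SAVINGS_PCT:
    print(f"{tech}: total savings ${dollar_savings(tech)['total']:,.0f}")
```

Running this reproduces the slide's "total effort" column (for example, $207,000 for a columnar RDBMS), which is exactly what plugging your own baselines into the model would recompute.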

    So with that in mind, hopefully we have whetted your appetite for Big Data and what it can do for you, and alerted you to some potential gotchas and what you should be looking out for. Here is information on where you can find me and my Forrester colleagues when you have more questions, but at this point let me turn this back to Endeca. Paul, take it away.

    Paul Sonderegger: Boris, thank you very much. That was a fantastic analysis of the Big Data landscape, and the thing I want to cover here is actually just a piece of that landscape: a couple of examples of Big Data in daily decisions and the kind of value that's created in doing that. At Endeca, the main thing we do is turn Big Data into better daily decisions, and we have actually been doing this for a long time, well before this practice had a name, well before Big Data was a buzzword. We actually worked on what is arguably the original Big Data problem: online exploration in e-commerce.

    So if you think back to the early 2000s and what was going on with e-commerce, you had this problem where there were great amounts of diverse data and content: product data out of the catalog, content on products from the content management system, reviews as they started coming up, and of course offline transactional data from enterprise systems that ought to be used to improve the user experience. So we were bringing together all of that diverse data and content and making it available to consumers so they could make better buying decisions. We were making this diversity of data, this big variety, available to people with no training in technology, people who had a big variety of questions. Every consumer thought about, say, Home Depot's product inventory slightly differently, and completely differently from the way Home Depot thought about it. So we were solving this Big Data problem in daily decisions in e-commerce a long time ago, and what we have found is that some of the innovations we created in the user experience, as well as in data integration, apply directly to solving Big Data in daily decisions at work. So we have two products: Endeca InFront, which is a customer experience management platform, and Endeca Latitude, which is for agile BI, and it's Endeca Latitude we are going to talk about here.


    Thank you

    Boris Evelson

    +1 617.613.6297

    [email protected]

    http://blogs.forrester.com/boris_evelson

    Twitter: @bevelson

    www.forrester.com


    Both of these products are based on the MDEX engine, and we will talk in a little bit about what makes that MDEX engine different, but first I want to get at why this matters, and to do that I want to talk about three examples of customers who have solved these Big Data problems at work. As we talk about these examples, the thing I would like you to think about is that this is really BI beyond the warehouse. As Boris described, the BI industry is very, very sophisticated; it has worked out enormous difficulties around merging large volumes of data and doing analysis on large volumes of data. But there is a key requirement that comes with traditional relational BI technologies, and that is that you build the model first and then fill it with conforming data. That approach is in conflict with the big variety of Big Data, not necessarily with the big volume of Big Data, but certainly with the big variety and the big velocity of change. So what we are going to talk about here is BI beyond the warehouse, where you have great variety of information, it's constantly changing, and it needs to serve people who have no training in technology but who have questions that matter to them.

    So if we take a look at some of these Big Data problems at work, the first one I want to mention is some work we did with Toyota. As many of you may know, in 2010 Toyota had a very, very big product recall, actually the biggest in their history, and the main problem Toyota faced was that here is a company whose brand is based on the outstanding work they have done in creating high-quality products for decades, and now that brand identity, the very soul of the company, was under attack by these claims of unintended acceleration with Toyotas. So the challenge they had was to somehow figure out how to isolate the root causes behind these claims in order to restore the brand. To do that they had to pull together a great variety of data: more than 12 different systems, each with their own schemas, each with their own data models, and in some cases with long-form text that has no data model at all. This included extracts out of the vehicle warehouse; it included structured data out of enterprise applications, including an Oracle system that was a quality touchpoint capturing information about the manufacturing processes on the manufacturing floor. It also included data from the National Highway Traffic Safety Administration (NHTSA), and these were claims, and in those claims there were some fielded data, so some structured data, but also long-form text in the descriptions of the problems themselves. And this all needed to be brought together somehow, and very, very quickly.

    In addition to the difficulty of bringing together that diverse data, there was huge variety in the questions that had to be asked: which parts are we talking about, which parts are mentioned in those claims, which suppliers provided those parts, which factories, which vehicles are they in? No one knew ahead of time that any of these questions would have to be answered, and no one of course knew ahead of time that these questions would matter so much. When the CIO of Toyota Motor Sales North America tells the story, the way he tells it is that he found in Endeca Latitude a technology he believed would make a difference. At a time like that, with great anxiety and great uncertainty, it is generally not a good time to introduce new technology into the organization, but he felt that he had found a technology that would make a difference, because it dramatically reduced the cost, time, and effort of integrating diverse data and content together, and it dramatically reduced the time, cost, and effort for the users, in this case design engineers, in making sense of that data so they could get fast answers to completely new questions. That Latitude app was instrumental in helping Toyota establish that in fact there was nothing demonstrably wrong with those pedals or with the electronics that control them.

    So the second company that has solved a Big Data problem at work is Land O'Lakes, and Land O'Lakes had a similarly large problem. They wanted to figure out how to analyze seed performance, corn especially, to feed a growing world. They know they need to help farmers increase the yield of their planted acres in order to feed a growing population, which is going to hit seven billion this October. In this case, the diversity of data they had to bring together included data from their transaction warehouses as well as information from their marketing programs, which indicated to which


    farmers they had marketed which products in the past. It also included information from outside the company: government acreage reports, which indicate how much land is actually under cultivation. And there were other sources as well that came from Land O'Lakes' own test plots, plots of land at various locations in the U.S. where they test these seed variants, so they can demonstrate to planters the yield you get with this kind of seed at this kind of latitude, because the users of this application were salespeople and distributors who are trying to persuade farmers to change the seed they use.

    These farmers cannot risk a poor growing season, so the data has to tell a very clear story, and it has to do it persuasively. But of course there is a huge variety of questions: which seeds, which soils, which farms, which locations. So again you get this issue of the double uncertainty Boris described. You can't know what kinds of questions will be asked of the data, so you don't know exactly what data you will need, but you are also not sure what data you should pull together in order to answer the questions that may come up. The way to resolve that double uncertainty is to explore this information, but again it has to be cost effective, and that exploration has to be easy to use.

    So the last example I will mention here is the British Transport Police, and the problem they are trying to solve is how to identify threats faster to keep more citizens safe. Here they are using the technology as part of securing London for the 2012 Olympic Games, and this is a slightly different example, because the scope of the data being brought together is not as large as in the other cases, but of course the magnitude of the problem is immense. Here we are talking about data out of the command warehouse; we are talking about stop-and-search reports from local policing forces as well as local service gazettes that have information about particular events in neighborhoods and things like that, so that officers can get faster answers to new questions about what's going on at the street level: which people, which places, which events, which threats. Again you get immense variety in the data coming from different sources, all with different schemas, but then also big variety in the questions, and this is what characterizes Big Data problems at work: it's BI beyond the warehouse, data that does not conform to a nice neat model, used to answer questions that cannot be fully anticipated.

    So how do you implement a solution like that? Well, here is just a little window into that. We take an approach that we call agile delivery, and it really means an iterative approach to provisioning these Latitude applications. What we mean by that is you don't have to have all the requirements up front. Remember the problem Boris described: in a Big Data world you have double uncertainty. You can't be exactly sure which questions you are going to ask until you know which data you have available to you, and you can't exactly know which data you want to make available, and how you want to link it together, until you have an understanding of what kinds of questions would be asked of it. That's the chicken-and-egg problem, and the way to resolve it is to load the data as is, start to explore it, and then add new data sources as the business realizes, "Oh, now that you show me these two sources together, I just remembered there is a database under George's desk; he has been here for 20 years, and he has been collecting notes on what's been going on in the field. Can you add that in?" And the answer needs to be "Yes," and it needs to happen really quickly. In addition, the business will say, "Ah, now that you show me this, I want to see the data in a different way; can you show it to me on a map?" And the answer needs to be "Yes," and that answer needs to be delivered very quickly; that new visualization needs to appear very quickly.

    So this is the approach we take: these iterative turns are very, very rapid in adding sources and changing the visualizations, and the business and the BI team sit together. They work together hand in hand, because they are in a situation where neither can specify all the requirements that matter ahead of time. Here is the effect on delivery time and FTEs, full-time employees, to deploy these projects. One of the things Toyota said about their project is that they estimated that with traditional relational BI technologies, the Latitude app we built for them would have taken 55 weeks, more than a year, and they said they didn't have that kind of time; with Endeca Latitude it actually took 12 weeks. And if you notice, one of the big differences


    here between the two bars is in the way the work changes, and the big change is that with the MDEX engine you don't spend time building a predetermined model; rather, you pour the data in and then expose it, so that the quality engineers and the BI team together can get a look at it. Land O'Lakes estimated that their project would have taken 15 full-time equivalents with relational technologies, but with Endeca Latitude it required only seven.

    So I had mentioned that we would talk about the technology itself, what makes this possible, and I want to touch on that just for a moment. The key thing is the MDEX engine: at the heart of the Latitude platform is the MDEX engine, and it is a post-relational innovation. It is a hybrid search-analytical database; it borrows some of the best ideas from the world of search and some of the best ideas from the world of analytical databases and unifies them in a new architecture. Inside that MDEX engine there is no predetermined model for the data; instead it has a dynamic schema. The engine derives its model of the data from the incoming records and documents themselves, and on top of that it provides integrated navigation, analytics, and search on that highly diverse and changing data. Of course we are using the latest ideas in analytical databases to deliver really fast, interactive response times, because we learned in serving consumers that that's what is required to support speed-of-thought analysis. So this is in-memory performance, but not memory bound, because we are often working with data sets that exceed the limits of memory, and this engine is all wrapped in a scalable, reliable, and secure platform.

    The last place I want to stop is actually where we began, which is what's going on in the industry at large. Big Data is a huge opportunity, whether you work with Endeca to try to capture it or whether you work with someone else, and the reason is that with the advent of Big Data, the real world increasingly reflects the connectivity of the digital one, and the digital world increasingly reflects the diversity of the real one. In a situation like that, relational technologies, which require that you build a model first and then fill it with conforming data, are at odds with this new world, and there is a whole generation of technologies coming out now that take a post-relational approach. This is the key to reducing both the time, cost, and effort of integrating diverse and changing data together and the time, cost, and effort to the user of making sense of it, to make better daily decisions. And so with that I will hand it back to Ron. Ron?

    Ron Powell: Great, excellent presentations, Boris and Paul. We will now move on to Q&A, and again, if you have a question, please submit it and put Boris's or Paul's name in front if you want to direct it to one of them. So we will start with the first question. Boris, it seems that Big Data is a natural fit for a cloud-based data marketplace, perhaps increasing the data value when third parties have interest. Do you think this is in the cards, and if so, can you comment on how you see this evolving?

    Boris Evelson: Well, that's interesting. Once again, one of the hypes out there that we uncovered in our research is that Big Data is mostly about open source technology and that Big Data is mostly about cloud technologies, but the majority of our respondents said, "No, we are running our Big Data initiatives in-house, and we are running them on some proprietary technology." So again, I am not advocating one versus the other; I just want to make sure that the listeners understand that it's not really about open source versus proprietary, and it's not about cloud versus on-premise. It is all about what your business use case is, what kind of questions you are trying to ask, and then, and only then, what kind of technology is the best fit.

    So indeed, if you are in a situation where most of the data you are going to be analyzing already resides in the cloud, such as social data, for example, there is probably little reason to bring it in-house for processing, unless you are really trying to build a 360-degree view of a customer, where all of the financial and HR and supply chain and your own sales and marketing data is inside your firewall, and all of this social data is just one small component. So again, cloud is an absolutely great technology for the right use case; I just don't want the


    listeners on the phone to think about cloud as a panacea. It's all about what it is you want to do and how you are going to do it, and at the end of that line of questioning, where you are going to do it, that's when that question needs to be asked.

    Ron Powell: Great, great. Paul, this question is for you: how do you tackle data that exceeds memory bounds? You mentioned that something may be bound, but you stopped short; could you please elaborate and clarify?

    Paul Sonderegger: Sure. So we have intellectual property that exploits the full memory hierarchy, all the way from right next to the processor, through DRAM, all the way down to the disk, and we are constantly optimizing what is kept in memory: recent queries, recent intermediate results, recent full sets of results. By doing that we are able to deliver interactive response times in search and navigation and analytics on data that is so large it can't be fully held in memory; some of it has to be held on disk as well. This is something we ran into in the e-commerce world, because search indices, for example, can simply be too large to hold in memory. So we actually ran into this problem years ago, and we have been developing and refining the intellectual property we use to fully exploit the whole memory hierarchy, in addition to exploiting the 64-bit addressable space that comes with these new commodity supercomputers.
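The general pattern Paul describes, keeping recent results in a bounded in-memory cache and falling back to the slower disk-resident index on a miss, can be sketched in a few lines. This is a hedged illustration of the technique, not Endeca's code; the `fetch_from_disk` callback and the query strings are hypothetical.

```python
from collections import OrderedDict

class QueryCache:
    """Bounded LRU cache over a slower disk-backed lookup."""

    def __init__(self, fetch_from_disk, capacity=1000):
        self.fetch_from_disk = fetch_from_disk  # slow path (disk index)
        self.capacity = capacity
        self._cache = OrderedDict()  # query -> results, oldest first

    def query(self, q):
        if q in self._cache:
            self._cache.move_to_end(q)  # hit: mark as recently used
            return self._cache[q]
        results = self.fetch_from_disk(q)  # miss: go to disk
        self._cache[q] = results
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return results

# Hypothetical usage: count how often the slow path is actually taken.
disk_hits = []
def slow_fetch(q):
    disk_hits.append(q)
    return f"results for {q}"

cache = QueryCache(slow_fetch, capacity=2)
cache.query("pedal claims")   # first ask goes to disk
cache.query("pedal claims")   # repeat is served from memory
```

The point of the sketch is the asymmetry: repeated and recent questions get in-memory response times while the full data set stays on disk, which is how interactive exploration can work on data that exceeds memory.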

    Ron Powell: Great. Another question for you, Paul: would you say that Endeca is an index-based implementation of a federated, distributed model?

    Paul Sonderegger: Well, there are a number of terms in there that would require definition, but let me put it this way. A good way to think of the MDEX engine is that it acts as a very high-speed cache on top of the source systems underneath. The MDEX engine does not become a transactional system of record at any time; rather, it updates at the same rate as the underlying transactional systems. So we are indexing the data from those systems, and we are indexing the structure from those underlying systems, but that representation in the engine does not become a transactional system of record.

    Ron Powell: Okay, now this question is: how does Big Data management deflect untoward intervention on data sets? In other words, when is relational data decoupled or shielded to prevent infiltration of skewed sets stemming from names and measures? I don't know which one of you would like to handle that. Paul?

    Paul Sonderegger: Well, that's Boris's question.

    Ron Powell: Boris, go right ahead.

    Boris Evelson: Yeah, that's a loaded question. The person who asked it should definitely reach out to me via email; I would need to understand more details behind it. I think the question was about how you merge traditional relational data sets with Big Data, with unstructured, un-modeled data sets. Again, I think there is no easy answer there. I don't think there is any direct way to do this other than, once you have gone through the exercise of exploration, and as an output of that exercise you have found some relational-like structures in your Big Data environment, you indeed turn them into relational structures, and you then either federate them with your traditional data warehouse or physically load them into the data warehouse. I know it's not an exact answer to that complex question, but that's the best I can do in such a short time.

    Paul Sonderegger: Yeah, let me just add one thing to that. One of our customers, a CIO, told us that his idea of how Latitude complements his existing BI infrastructure is very straightforward: if it's in the warehouse, they already have the tools, in this case OBIEE, to query that data; and if it's outside the warehouse, or if it's going to be used by people with no expertise in BI technology, it's Latitude.


    Ron Powell: Great, great. Well, we are just out of time, so I want to thank Paul and Boris for such great presentations, and I would also like to thank everyone for attending the web seminar today. It will be available on Bitpipe at TechTarget, and we will follow up with the rest of the questions that were not answered, but if you do have a question, please e-mail either Boris or Paul directly. Thank you.