Push the Limits of Flexible and Powerful Analytics

  • Executive Summary
  • Cloud Computing and Big Data
  • Two Worlds Converge

Executive Summary

Cloud computing has allowed enterprises to optimize IT operations and accelerate the creation of new services by significantly reducing the need to invest in on-premises hardware, software and technical skills. At the same time, big data technologies have enabled organizations to generate value from data assets like never before.

With the right unified, end-to-end platform for data integration, business intelligence and machine learning orchestration, organizations can quickly deliver big data processing both in the cloud and on premises.

This white paper covers:

  • How open source big data technologies and platform categories have gained rapid adoption.
  • Key technology components that are enabling extraction of value from massive, diverse data on cloud platforms.
  • Sample solution architecture, which illustrates how the different technologies can be leveraged to drive business outcomes.
  • NASDAQ Case Study, which describes how the company employed a cloud-based solution with Hitachi Vantara’s Pentaho platform to manage huge volumes of data and drive business insight.

Cloud Computing and Big Data

Two of the most disruptive technology trends over the last 10 years have been the growth of cloud computing and the emergence of big data systems. These developments have changed the way technology organizations operate and deliver value to their stakeholders.

At a basic level, cloud computing has allowed enterprises to optimize IT operations by significantly reducing the need to invest in on-premises hardware and software, not to mention the staff required to maintain these systems. The cloud affords businesses a new level of flexibility, as they can acquire applications, infrastructure and computing power in a way that is much more closely matched with the timing and duration of their project needs.

Further, by pooling infrastructure across many customers, cloud vendors are able to provide services that are highly elastic and scalable. This means it is much more financially and operationally manageable for enterprises to address unanticipated peaks and troughs in infrastructure needs. Overall, cloud adoption continues to show momentum, as the public IT cloud services market is expected to grow five times faster than the IT industry as a whole.

At the same time, big data technologies have enabled organizations to generate value from data assets like never before. Historically, data that was high in volume, diverse in structure, and rapidly changing posed difficult challenges for enterprises that were used to working with traditional relational database technology.

However, new technical paradigms, such as defining schema on read when accessing data, massively parallel processing, microservices and stream processing, have provided many new opportunities. These include the ability to reduce the overhead required to get raw data into a data store, to deal with data in motion, and to build robust and flexible architectures.
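
To make the schema-on-read idea concrete, here is a minimal sketch using Apache Spark's Python API: raw JSON is loaded exactly as it landed in storage, and structure is imposed only at query time. The bucket path and field names are hypothetical.

```python
# Minimal schema-on-read sketch with PySpark (hypothetical paths and fields).
# Raw JSON lands in storage as-is; structure is imposed only when reading.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

# No upfront table definition: Spark infers the schema while reading.
events = spark.read.json("s3a://example-bucket/raw/events/")

# The "schema" is applied at query time, not at load time.
events.select("user_id", "event_type", "timestamp") \
      .where(events.event_type == "purchase") \
      .show()
```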

These paradigms drastically increase the speed and efficiency of processing large amounts of data. By making unstructured and semistructured data far more accessible to businesses, they open up whole new generations of applications, business models and efficiencies.

These innovations have also begun to unleash actionable analysis on a variety of previously challenging data sources, including web logs, documents and text, and machine sensors. Even “dark” data (data locked in corporate silos with little analytic access) has been given new life through these technologies. As open source big data technologies have matured into commercially supported products, several platform categories have started to gain rapid adoption, especially for next-generation applications and analytics:

  • Apache Hadoop-based distributions: Frameworks for large-scale data storage and high-performance processing across a distributed file system, ideal for high-volume unstructured data.
  • Not-only-SQL (NoSQL) stores: NoSQL databases are agile and can include geographically distributed, scale-out architectures. The main types of NoSQL stores are document databases, graph stores, key-value stores, wide-column stores and multi-model stores. (A short document-store sketch follows this list.)
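
As a hedged illustration of the document-database category named above, this snippet stores and queries schemaless, JSON-like documents with MongoDB's Python driver (pymongo); the host, collection and fields are hypothetical.

```python
# Document-store sketch using MongoDB's Python driver (pymongo).
# Host, database, collection and document fields are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["analytics_demo"]

# Documents need no predeclared schema; each record is just JSON-like data.
db.sensors.insert_one({
    "device_id": "A-100",
    "readings": [21.4, 21.9, 22.3],
    "location": {"site": "plant-7", "line": 2},
})

# Query on a nested field without any table definition up front.
for doc in db.sensors.find({"location.site": "plant-7"}):
    print(doc["device_id"], doc["readings"])
```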

Two Worlds Converge

Big data systems help organizations solve hard problems, but they normally require a significant upfront and ongoing IT investment. Such an investment includes a potentially large number of server machines as well as employees with skills that may be hard to come by, such as Java and MapReduce expertise.

At the same time, the sheer amount of data in more ambitious multi-petabyte projects may lead teams to rethink whether keeping everything in-house is the best strategy. Finally, the time element is also important: Procuring, installing, configuring and testing the required technology doesn’t happen overnight.

On an infrastructure-as-a-service (IaaS) level, it makes sense that enterprises would turn to cloud providers who have expertise in managing and maintaining extremely scalable and flexible computing and storage infrastructure.

While on-premises data systems are by no means going away, research indicates that “cloud platforms are ideal deployment options for elastic and transient workloads built in modern application architectures.” This suggests that organizations can effectively push the limits of analytics at scale by tapping into big data systems hosted on cloud infrastructure.

Now, more advanced platform-as-a-service (PaaS) offerings, such as managed data processing engines, Hadoop-as-a-Service and NoSQL-as-a-Service, have enabled far better integration with other cloud-based application stacks.
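
As one illustration of what requesting a managed Hadoop cluster can look like, the sketch below uses the AWS SDK for Python (boto3) to provision a small Amazon EMR cluster. This is a minimal sketch under stated assumptions, not a reference deployment: the release label, instance types, IAM roles and bucket names are placeholders.

```python
# Sketch: provisioning a managed Hadoop cluster via Amazon EMR (boto3).
# Region, release label, instance sizes, roles and buckets are illustrative.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="demo-hadoop-cluster",
    ReleaseLabel="emr-6.15.0",          # bundles Hadoop, Spark, etc.
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # tear down when work is done
    },
    LogUri="s3://example-logs-bucket/emr/",
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster requested:", response["JobFlowId"])
```

Because the cluster tears itself down when its steps finish, the organization pays only for the duration of the job rather than for standing infrastructure.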

A survey of enterprise decision-makers reported that over a quarter of organizations have already started utilizing public cloud resources for big data analytics projects and another quarter plan to do so going forward. While many of these early cloud projects involve high volumes of structured data, there are several key technology components that are already enabling extraction of value from massive, diverse data on cloud infrastructure.

  • Cloud analytical databases: These cloud-based services, such as Amazon Redshift, are elastic data warehouses optimized for analytics with existing business intelligence (BI) tools. In addition to leveraging enhancements like massively parallel processing and columnar storage to boost performance, this type of analytical database also includes management and monitoring of the solution by the provider. Users can avoid many of the costs of setting up and managing a traditional data warehouse.
  • Hadoop and NoSQL services: Hadoop services can also be hosted or run as a platform in the cloud, which avoids the need for on-premises infrastructure and reduces reliance on in-house Hadoop-specific staffing to support big data use cases. Given on-premises startup costs and cluster hardware expansion over time, it’s easy to see where the cloud can provide value. Some Hadoop cloud offerings also include managed services, like job troubleshooting, software installation, testing and more.
  • Data integration and analytics: While adoption of “cloud BI” tools has increased, Hitachi Vantara’s Pentaho platform stands out by providing a cloud-deployable platform that supports end-to-end data integration and business analytics for big data stores, including the cloud analytical databases and hosted or platform Hadoop services discussed above. This data can be blended with a variety of other cloud-based data for further insight. An extract, transform and load (ETL) job can be created within the tool but executed through push-down processing using either Spark or MapReduce, without recoding the task. Pentaho also simplifies working with streaming data from Apache Kafka, connecting to Amazon S3 using Identity and Access Management, and connecting to Google Cloud Storage, Google BigQuery, Microsoft Azure storage and many other services. Even columnar and serialized file types such as ORC, Avro and Parquet are supported. (A plain-code sketch of this kind of ETL job follows this list.)
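
Pentaho expresses such ETL jobs graphically and pushes execution down to an engine like Spark; to show the kind of work such a job performs, here is a plain PySpark sketch that reads raw CSV from Amazon S3, aggregates it, and writes columnar Parquet. The bucket names and columns are hypothetical.

```python
# Sketch of the kind of cloud ETL job described above, written directly in
# PySpark. A platform like Pentaho would express this graphically and push
# the execution down to Spark; names and columns here are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: raw CSV landed in cloud object storage.
orders = spark.read.option("header", True).csv("s3a://example-bucket/raw/orders/")

# Transform: type the columns and aggregate per customer and day.
daily_totals = (
    orders
    .withColumn("amount", F.col("amount").cast("double"))
    .groupBy("customer_id", "order_date")
    .agg(F.sum("amount").alias("daily_total"))
)

# Load: columnar Parquet, ready for an analytical database or BI tool.
daily_totals.write.mode("overwrite").parquet("s3a://example-bucket/curated/daily_totals/")
```

The resulting Parquet files could then be loaded into an analytical database such as Amazon Redshift for BI queries, tying the three component categories above together.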

The next section discusses a sample solution architecture, illustrating how these different technologies can be leveraged to drive business results in practice.
