Apache Spark Adoption Fuels Big Data Analytics Growth
Rising demand for faster big data analytics apps is driving the adoption of Apache Spark, a core technology for modernizing data warehouses.
Most Important Attributes of Apache Spark
A full 91% cited performance as the top attribute, followed by ease of programming, at 77%; ease of deployment, at 71%; and advanced analytics, at 64%.
Other Important Apache Spark Features
Real-time streaming (52%) and DataFrames (47%) are additional important features. A full 64% of respondents are running the latest version of Apache Spark.
Use Cases for Apache Spark
Business intelligence was ranked highest, at 68%, followed by data warehousing (52%), recommendation systems (44%) and log processing (40%).
Most Common Apache Spark Deployment Models
Nearly half (48%) deploy Apache Spark in stand-alone mode, followed by YARN running on Hadoop, at 40%. Just over half of respondents (51%) are running Apache Spark on a public cloud.
Most Common Apache Spark Platforms
A full 75% are running Apache Spark on a Linux/Unix platform, while 47% are running on OS X. The fastest-growing platform is Windows (23%), which grew 17 percentage points from 2014.
Most Used Apache Spark Components
Nearly seven in 10 (69%) are using Spark SQL, followed by DataFrames (62%), MLlib + GraphX (58%) and streaming (58%). Three-quarters (75%) are using two or more Apache Spark components.
Programming Languages Used With Apache Spark
At 71%, Scala is the most widely used programming language, followed by Python (58%), SQL (36%) and Java (31%). Python use is up 49% year-over-year.
Apache Spark Components Used in Production
Nearly a quarter (24%) cite SQL, followed by DataFrames and advanced analytics (at 15% each), and streaming (14%). SQL use grew 380% from 2014.
Apache Spark is one of the most vibrant and widely adopted open-source projects, and its in-memory clusters are driving new opportunities for application development as well as increased consumption of IT infrastructure. It is now the most active Apache project, with more than 600 contributions made in the last 12 months by more than 200 organizations. A new survey of 1,417 IT professionals working with Apache Spark, conducted by Databricks, a provider of Apache Spark as a cloud service, finds that most of that demand comes from high-performance analytics applications that can work with massive amounts of big data. Apache Spark is now being used to aggregate multiple types of data in memory, rather than only to pull data from Hadoop. For solution providers, Apache Spark is significant because it is one of the core technologies used to modernize data warehouses, a huge segment of the IT industry that accounts for multiple billions in revenue. As is often the case with an emerging technology, the expertise needed to transform data warehouses using Apache Spark is hard to come by at present, which may signal a major opportunity for the channel.