What is MapR and why should you care?

MapR has made great strides in developing Hadoop for enterprises across the globe. The company's flagship Hadoop distribution enables both public and private organisations to collate and analyse information in real time – a capability that's becoming essential with the rise of smart devices.

In this article, we'll detail how MapR works and how it can impact your business. 

Note: If you don't know much about Hadoop, you'd be better off reading our "Hadoop: A guide for the average business professional" series first just to get some context.

Designed for the data centre

According to the company's website, the MapR Converged Data Platform is an enterprise data product that uses Hadoop and Spark, offering high availability. The software delivers immediate node-based recovery, a lower total cost of ownership and support for organisations that require constant access to information.

In case you're not familiar with Spark, it's a general-purpose processing engine that can run across Hadoop clusters through YARN, Hadoop's resource manager. The system is engineered to execute batch processing, interactive queries and machine learning throughout Hadoop-based environments.

MapR's key capability lies in running multiple Hadoop applications on a single cluster of machines. Many Hadoop distributions (distros) need separate clusters for different workloads. In contrast, MapR enables companies to run operational and analytic systems on one server array, thereby decreasing data management costs.

Through MapR, administrators can exercise central control over Hive, HBase and Drill.

Available, open and protected

MapR's accessibility complements its converged platform. The central control over technologies such as Hive, HBase and Drill comes through a single interface. In addition, users have the freedom to allocate workloads among clusters and pull data sets from any server array.

Using open source technologies provides MapR with a lot of community support, but the company doesn't take Apache project updates as they come. Instead, its developers subject the technology to rigorous quality assurance processes, making a point of validating, testing and reinforcing each update before integrating it into the data platform.

This focus on QA carries over to the platform's performance and data protection features. MapR uses mirroring, replication and point-in-time snapshots as recovery tools. The system enforces security through wire-level encryption, consistent auditing of data permissions, and authentication through Kerberos or the Lightweight Directory Access Protocol (LDAP).
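If "point-in-time snapshot" sounds abstract, the toy Python sketch below may help. It is not MapR's implementation or API: the ToyVolume class, its methods and the file path are all made up for illustration, and the data lives in memory. It only shows the basic idea of freezing a copy of the data at one moment and rolling back to it after a mistake.

```python
import copy


class ToyVolume:
    """Hypothetical in-memory stand-in for a storage volume (not a MapR API)."""

    def __init__(self):
        self.files = {}       # live data: path -> contents
        self.snapshots = {}   # snapshot name -> frozen copy of the live data

    def write(self, path, contents):
        self.files[path] = contents

    def snapshot(self, name):
        # Freeze the current state; later writes do not change this copy.
        self.snapshots[name] = copy.deepcopy(self.files)

    def restore(self, name):
        # Roll the live data back to the state captured by the snapshot.
        self.files = copy.deepcopy(self.snapshots[name])


volume = ToyVolume()
volume.write("/sales/q1.csv", "region,total\nEMEA,100")
volume.snapshot("before-cleanup")

volume.write("/sales/q1.csv", "")   # simulate an accidental overwrite
volume.restore("before-cleanup")    # recover the earlier state

print(volume.files["/sales/q1.csv"])
```

A production system would not copy every file each time a snapshot is taken, so treat this purely as a mental model of what a snapshot gives an administrator, not how one is built.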

You can probably see why we've taken an interest in this technology. The system offers not only the scalability of Hadoop, but also the defensive tools organisations need from enterprise solutions. We're pretty excited about what MapR will offer in the future and how we can help companies make the most of it.