Data Platforms: The ODP, HAWQ on Hortonworks, and an Update to Pivotal Big Data Suite
On the topic of data platforms, industry announcements flew out of Hadoop Summit 2015 in Europe. To date, the Open Data Platform (ODP) members include GE, Hortonworks, IBM, Infosys, International TELCO, Pivotal, SAS, Altiscale, Capgemini, CenturyLink, EMC, PLDT, Splunk, Teradata, Verizon, VMware, and WANdisco.
Pivotal made big, back-to-back headlines on several data fronts. First, we now officially support Pivotal HAWQ, our SQL-on-Hadoop engine, running on Hortonworks Data Platform. It also works in the Hortonworks Sandbox, and there are detailed instructions for deploying via Apache Ambari. At EMC World 2015, Pivotal made a splash with open-source-based improvements to Pivotal Big Data Suite. Pivotal then announced Pivotal HD 3.0, now aligned with the ODP core. This included updates to many Apache™ elements, including HDFS, YARN, Ambari, Pig, Hive, HBase, ZooKeeper, Oozie, Nagios, Ganglia, Ranger, Knox, Tez, and Spark.
Real-Time Big Data: Pivotal GemFire, Stock Prediction Architectures, and 5 Terabytes of Memory
Since Pivotal GemFire, well known as a distributed in-memory database, was pushed out into the open source world as Project Geode last month, several really cool things have happened. The website and project source code moved over to the Apache™ Incubator. Some excellent new content has also been published: a series of videos on the architecture, a concise history of the problems GemFire was built to solve, and a pointer to a great talk on in-memory data grids given by Pivotal GemFire’s chief architect.
GE also released a performance study white paper on Pivotal GemFire, demonstrating that it is truly industrial strength: a cluster of 46 machines with 5 terabytes of memory ingested data at a rate of 100,000 time series data points per second over 5 days.
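To put those figures in perspective, here is a rough back-of-envelope sketch in Python. It is not taken from the white paper: it assumes the quoted rate was sustained for the full 5 days, that memory was spread evenly across the 46 machines, and that 1 TB = 1000 GB. The function names are purely illustrative.

```python
# Back-of-envelope arithmetic from the figures quoted above.
# Assumptions (not from the white paper): the ingest rate is sustained
# for the whole period, memory is split evenly, and 1 TB = 1000 GB.

SECONDS_PER_DAY = 24 * 60 * 60


def total_points(rate_per_second: int, days: int) -> int:
    """Total data points ingested at a constant rate over `days` days."""
    return rate_per_second * SECONDS_PER_DAY * days


def memory_per_machine_gb(total_tb: float, machines: int) -> float:
    """Average memory per machine in GB, assuming an even split."""
    return total_tb * 1000 / machines


if __name__ == "__main__":
    points = total_points(rate_per_second=100_000, days=5)
    per_machine = memory_per_machine_gb(total_tb=5, machines=46)
    print(f"Total data points over 5 days: {points:,}")
    print(f"Average memory per machine: {per_machine:.1f} GB")
```

Under those assumptions, the run works out to roughly 43 billion data points in total and a little over 100 GB of memory per machine.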