As promised earlier this year, Pivotal has released the Greenplum Database code as open source.
Greenplum Database is based on PostgreSQL (it was forked from PostgreSQL 8.2) and uses a massively parallel processing (MPP) architecture to run SQL queries on very large data sets. The code base is licensed under the Apache 2.0 license and is available on GitHub. You can fork the project there, or submit patches and new features.
One of the engineering team's main goals is to merge the existing code base with a recent PostgreSQL version. Although many features from newer PostgreSQL versions have made it into Greenplum, the two code bases still differ substantially. Greenplum also offers unique features (a new query optimizer, SQL support for partitioning, append-optimized tables, columnar storage, storage compression and many more), which will be ported to PostgreSQL over time and submitted for community review.
Most development will move into the open (except for some internal, customer-related work) and will be coordinated on newly created mailing lists on the greenplum.org website.
Google Summer of Code 2014 has wrapped up: Maxence Ahlouche did an excellent job implementing a new algorithm for MADlib and refactoring the code base of another one.
I posted a more detailed explanation on the Pivotal blog.
I blogged about how Pivotal Greenplum Database uses all available CPU resources when executing queries.
More on the Pivotal blog: CPU Usage in Massively Distributed Analytic Data Warehouses
Together with Atri Sharma (a former GSoC student) and Pivotal engineer Hai Qian, I'm mentoring Maxence Ahlouche in his Google Summer of Code MADlib project.
I've posted a more detailed explanation on the Pivotal blog.
Ispirer SQLWays is a nice (although commercial) tool for converting DDL and data from one supported database to another. We use it regularly, and the list of supported databases is impressive: PostgreSQL, Greenplum, Oracle, SQL Server, IBM DB2, MySQL, Sybase, Informix, Teradata, Netezza and a few more.
There's just one thing I always forget: by default, SQLWays exports all data, which makes the export unnecessarily large and slow.