Big data

Speaking of SQL on Hadoop

Greenplum just announced Pivotal HD, their new Hadoop distribution that contains HAWQ, a high-performance relational database running on Hadoop. Here are some highlights.

  • Fully compliant and robust SQL92 and SQL99 support, along with the SQL 2003 OLAP extensions. 100% compatible with PostgreSQL 8.2.
  • Columnar or row-oriented storage, so each table can use the layout that suits its workload. The orientation is specified when the table is created and is otherwise transparent to the user. HAWQ figures out how to shard, distribute, and store the data.
  • Seamless partitioning allows separating tables on a partition key, enabling fast scans of subsets by pruning off portions that are not needed in a query. Common partition schemes are on dates, regions, or anything commonly filtered on.
  • A parallel query optimizer and planner takes SQL queries that look like any other, then uses table statistics to figure out the best way to return the data.
  • Table-by-table specification of distribution keys allows table schemas to be designed to take advantage of node-local JOINs and GROUP BYs.
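
To make partition pruning and node-local joins a bit more concrete, here is a toy Python sketch. It is not HAWQ code and says nothing about how HAWQ is implemented internally: the segment count, the table and column names, and every function here are assumptions made up for illustration. It only shows the general idea that rows sharing a distribution key land on the same node, and that a filter on the partition key lets whole partitions be skipped.

```python
from collections import defaultdict
from datetime import date

NUM_SEGMENTS = 4  # hypothetical number of nodes, chosen for the example


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Toy distribution function: map an integer key to a segment number."""
    return key % num_segments


def distribute(rows, key, num_segments=NUM_SEGMENTS):
    """Place every row on the segment picked by its distribution key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key], num_segments)].append(row)
    return segments


def local_join(orders_seg, customers_seg):
    """Join one segment's orders to the customers stored on that same segment."""
    by_id = {c["customer_id"]: c for c in customers_seg}
    return [
        {**o, "name": by_id[o["customer_id"]]["name"]}
        for o in orders_seg
        if o["customer_id"] in by_id
    ]


def partition_by_month(rows):
    """Split rows into monthly partitions keyed by (year, month)."""
    parts = defaultdict(list)
    for row in rows:
        d = row["order_date"]
        parts[(d.year, d.month)].append(row)
    return dict(parts)


def scan_with_pruning(partitions, start, end):
    """Read only the partitions whose month overlaps [start, end]."""
    lo, hi = (start.year, start.month), (end.year, end.month)
    touched = [k for k in sorted(partitions) if lo <= k <= hi]
    rows = [
        r for k in touched for r in partitions[k]
        if start <= r["order_date"] <= end
    ]
    return rows, touched


customers = [{"customer_id": i, "name": f"cust{i}"} for i in range(1, 9)]
orders = [
    {"order_id": 100 + i, "customer_id": (i % 8) + 1,
     "order_date": date(2013, (i % 3) + 1, 10)}
    for i in range(12)
]

# Both tables share the distribution key, so each segment can join on its own.
order_segs = distribute(orders, "customer_id")
customer_segs = distribute(customers, "customer_id")
joined = [row for o, c in zip(order_segs, customer_segs) for row in local_join(o, c)]

# A filter on February only has to open the February partition.
partitions = partition_by_month(orders)
feb_rows, touched = scan_with_pruning(partitions, date(2013, 2, 1), date(2013, 2, 28))
```

Every order finds its customer without any row leaving its segment, and the February query reads one of the three monthly partitions. If the two tables were distributed on different keys, the join would first have to move rows between nodes.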

HAWQ follows an MPP (massively parallel processing) architecture. Greenplum claims that HAWQ is hundreds of times faster than Hive and, for some queries (GROUP BYs and JOINs), orders of magnitude faster than competing SQL-on-Hadoop solutions.
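
As a rough illustration of why an MPP layout helps with GROUP BY, here is a small Python sketch of the usual scatter and gather pattern: each segment aggregates only its own rows, and a coordinator merges the partial results. This is a generic pattern under made-up data, not a description of HAWQ's actual planner or executor, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_sum(segment_rows):
    """Runs per segment: sum amounts by region for the local rows only."""
    totals = Counter()
    for region, amount in segment_rows:
        totals[region] += amount
    return totals


def mpp_group_by_sum(segments):
    """Scatter the partial aggregation to all segments, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(partial_sum, segments))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the per-region sums
    return dict(merged)


# Three hypothetical segments, each holding (region, amount) rows.
segments = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 1), ("north", 4)],
]
result = mpp_group_by_sum(segments)
```

The coordinator only ever sees one small partial result per segment rather than every row, which is the property that lets this kind of query scale with the number of nodes.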

