These are exciting times, with competition between Snowflake and Databricks (and perhaps Microsoft’s Fabric catching up), including a consolidation of table formats that makes it easier to move between systems. The beauty of Snowflake has been its ease of use and low-touch management, allowing you to focus on what really matters: the analytics. It’s no secret we love Snowflake for quickly building a Data Warehouse (a single source of truth) combined with minimal billing: we have DWs with data and storage sitting on standby, costing zero each month when not actually used, rather than hundreds of pounds each when not even touched!
But it came at a different cost: although you can choose Snowflake’s underlying cloud platform between AWS and Azure (Snowflake’s mantra is that storage and compute are separate), the actual storage format was proprietary, so it was tricky to access the data any other way. Snowflake then spent a couple of years grappling with the Apache Iceberg format, and on June 10th 2024 they released Iceberg tables as GA (General Availability, i.e. robustly tested, production class), so your underlying data can now live in an open, standard format.
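To give a feel for it, here is a minimal sketch in Python of creating a Snowflake-managed Iceberg table, rather than a definitive recipe: the connection details, the external volume name `iceberg_vol` and the table itself are placeholders you would swap for your own setup.

```python
# Minimal sketch: creating a Snowflake-managed Iceberg table from Python.
# Assumes snowflake-connector-python is installed and that an external volume
# (placeholder name "iceberg_vol") has already been configured to point at
# your own cloud storage.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",       # placeholder account identifier
    user="my_user",
    password="my_password",
    warehouse="analytics_wh",
    database="analytics_db",
    schema="public",
)

try:
    cur = conn.cursor()
    # Snowflake-managed Iceberg table: the data lands in open Iceberg/Parquet
    # files on the external volume instead of Snowflake's proprietary format.
    cur.execute("""
        CREATE ICEBERG TABLE IF NOT EXISTS orders_iceberg (
            order_id   NUMBER,
            customer   STRING,
            order_date DATE
        )
        CATALOG = 'SNOWFLAKE'
        EXTERNAL_VOLUME = 'iceberg_vol'
        BASE_LOCATION = 'orders_iceberg/'
    """)
    # Query it exactly like any other Snowflake table.
    cur.execute("SELECT COUNT(*) FROM orders_iceberg")
    print(cur.fetchone())
finally:
    conn.close()
```

Because the files sit on your own storage in an open format, other engines can read the same table without going through Snowflake’s compute at all.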
In fact, as the Apache project itself confirms, Iceberg allows direct SQL querying, treating big-data records as though they were SQL tables, and lets engines like Spark, Trino, Flink, Presto, Hive and Impala “safely work with the same tables, at the same time”. Access is coordinated through a choice of catalogs (see the Spark sketch after this list):
- REST: a server-side catalog that’s exposed through a REST API
- Hive Metastore: tracks namespaces and tables using a Hive metastore
- JDBC: tracks namespaces and tables in a simple JDBC database
- Nessie: a transactional catalog that tracks namespaces and tables in a database with git-like version control
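To make that concrete, here is a rough PySpark sketch showing an external engine pointing at an Iceberg REST catalog and querying it with plain SQL. It assumes `pyspark` plus the matching Iceberg Spark runtime package are on the classpath; the catalog name `lake`, the endpoint `https://catalog.example.com` and the table `lake.sales.orders` are placeholders.

```python
# Rough sketch: querying Iceberg tables from Spark via a REST catalog.
# Assumes pyspark and the iceberg-spark-runtime jar are available; the
# endpoint, catalog name and table below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-rest-demo")
    # Enable Iceberg's SQL extensions (MERGE, time travel, etc.)
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register a catalog named "lake" backed by a REST catalog service
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "rest")
    .config("spark.sql.catalog.lake.uri", "https://catalog.example.com")
    .getOrCreate()
)

# Big-data records queried as though they were ordinary SQL tables
spark.sql(
    "SELECT order_date, COUNT(*) FROM lake.sales.orders GROUP BY order_date"
).show()
```

Swapping the `type` and `uri` settings is all it takes to point the same code at a Hive Metastore, JDBC or Nessie catalog instead.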
Of course (haha) you only really need Iceberg for *huge* tables, but as data streaming tends to generate those over time, it can be a good idea to line it up from the start! We still recommend sticking with standard OLTP databases such as SQL Server and PostgreSQL for anything under a terabyte (which covers most critical systems, especially once you archive old data, potentially to Iceberg itself!), but this development from Snowflake really does set the landscape for what comes next.