Takeaways
- SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine. It can be used to format SQL or translate between 21 different dialects like DuckDB, Presto / Trino, Spark / Databricks, Snowflake, and BigQuery. It aims to read a wide variety of SQL inputs and output syntactically and semantically correct SQL in the targeted dialects.
- Ibis is the portable Python dataframe library:
  - Fast local dataframes (via DuckDB by default)
  - Lazy dataframe expressions
  - Interactive mode for iterative data exploration
  - Compose Python dataframe and SQL code
  - Use the same dataframe API for 20+ backends
  - Iterate locally and deploy remotely by changing a single line of code
- A Future Multi-Engine Stack
  - An execution engine tailored for different data scales
    - <= 1TB: DuckDB and Friends
    - 1 - 10TB: Spark, Dask, Ray, etc
    - More than 10TB: GPU-accelerated Processing
  - A portable language front end (Ibis, Malloy, PRQL, or a standard transpilable SQL)
  - Arrow-native API and wire transport
- Feedback reaches open source developers slowly, often only after 6 or 12 months.
- Pandas is not going anywhere. Polars does not support time series data yet. Also, ChatGPT knows more about Pandas than about Polars.
- Essential DS skills:
  - SQL
  - Designing schema
  - Organizing data
  - Data storage, partitioning
- He is thinking about innovations in columnar data formats that would improve on Parquet.
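
The "execution engine tailored for different data scales" idea from the multi-engine stack takeaway can be sketched as a small routing function. This is only an illustration under stated assumptions: the size thresholds and engine names come from the notes above, while `EngineTier` and `pick_engine_tier` are hypothetical names invented here and do not belong to Ibis, SQLGlot, or any other library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EngineTier:
    """Hypothetical container pairing a data-size range with example engines."""
    label: str
    engines: Tuple[str, ...]


# Thresholds and engine names are copied from the notes; they are rough
# guidance from the talk, not hard limits.
SMALL = EngineTier("<= 1TB", ("DuckDB and friends",))
MEDIUM = EngineTier("1 - 10TB", ("Spark", "Dask", "Ray"))
LARGE = EngineTier("> 10TB", ("GPU-accelerated processing",))


def pick_engine_tier(data_size_tb: float) -> EngineTier:
    """Return the tier whose size range contains data_size_tb (in terabytes)."""
    if data_size_tb < 0:
        raise ValueError("data size must be non-negative")
    if data_size_tb <= 1:
        return SMALL
    if data_size_tb <= 10:
        return MEDIUM
    return LARGE


if __name__ == "__main__":
    for size in (0.2, 5, 40):
        tier = pick_engine_tier(size)
        print(f"{size} TB -> {tier.label}: {', '.join(tier.engines)}")
```

In the stack described above, a portable front end such as Ibis would sit in front of a choice like this, so the same query code could run unchanged whichever tier is selected.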