Indrajit Roy is an Engineering Director at Databricks, where he leads the Lakeflow engine, which includes Spark Streaming, materialization, and incremental processing.

Before that, Indrajit led the Napa product at Google, a petabyte-scale data warehouse for Google Ads, DeepMind, and other critical services (see our VLDB'23 and VLDB'21 papers). Indrajit started his career at HP Labs, where he was a Principal Researcher (2010-2016) and led the development of DistributedR, a machine learning framework for the Vertica database. Indrajit received his Ph.D. in computer science from UT Austin, under the supervision of Prof. Emmett Witchel, and his B.Tech. degree from IIT Kanpur.

Email: indrajit.roy[at]databricks[dot]com

Recent projects

Napa : Petabyte-scale Google data warehouse, with a focus on materialized views.

NVThreads : Persistence for applications using non-volatile memory

DistributedR

I led the development of Distributed R, an open-source HPE project that brings the benefits of parallelism to data scientists.

We released ddR, a CRAN package for Distributed Data-structures in R (HPE Blog). 

Airavat : Security and privacy for MapReduce using differential privacy

Laminar : Integrates OS and JVM techniques to implement information flow control

Professional activities

PC Member: