Homepage

Indrajit Roy is an Engineering Director at Databricks, where he leads the Lakeflow engine, which includes Spark Structured Streaming, materialization, and incremental processing.

Before that, Indrajit led the Napa product at Google, a petabyte-scale data warehouse for Google Ads, DeepMind, and other critical services (see our VLDB'23 and VLDB'21 papers). Indrajit started his career at HP Labs, where he was a Principal Researcher (2010-2016) and led the development of Distributed R, a machine learning framework for the Vertica database. Indrajit received his Ph.D. in computer science from UT Austin, under the supervision of Prof. Emmett Witchel, and his B.Tech. degree from IIT Kanpur.

Email: indrajit.roy[at]databricks[dot]com

Recent projects

Spark Structured Streaming: One of the most widely used streaming systems

Press: Databricks blog post

Enzyme: Incremental view maintenance in Spark Declarative Pipelines

Napa: Petabyte-scale Google data warehouse, with a focus on materialized views.

Press: Google Research blog post

NVthreads: Persistence for applications using non-volatile memory

Distributed R

I led the development of Distributed R, an open-source HPE project that brings the benefits of parallelism to data scientists.

We released ddR, a CRAN package for Distributed Data Structures in R (HPE Blog).

Press: HP Blog, PCWorld, InformationWeek
Links: Distributed R product page, GitHub repository.
I co-organized the Workshop on Distributed Computing in R (2015). Blog article on the workshop

Airavat: Security and privacy for MapReduce using differential privacy

Laminar: Integrates OS and JVM techniques to implement information flow control

Professional activities

PC Member:

EuroSys 2025
SIGMOD 2023 (industrial track)
SoCC 2022
SoCC 2021, Middleware 2021 (industrial track)
Middleware 2020 (industrial track), SoCC 2018
Middleware 2017 (PC co-chair, industrial track), WWW 2017
HotStorage 2016, EuroSys 2016, OOPSLA 2015
IEEE BigData 2016/2014/2013, IEEE Cloud 2013/2012, ICNP 2012

Others:

Inaugural member of ACM Future of Computing Academy, 2017
Co-lead of R Consortium working group on distributed computing, 2017

Press:

Article in ACM Ubiquity (2019)
HP blog1, HP blog2, Distributed R announcement (2015)

Interns:

At HP Labs, I was fortunate to work with very productive interns who also made research enjoyable.

Terry Hsu (Purdue)
Pravin Shinde (ETH)
Kyungyong Lee (UFL), Sangman Kim (UT Austin)
Feng Liu (Princeton), Erik Bodzsar (UChicago)
Shivaram Venkataraman (UC Berkeley), Michael Lee (UCSD)

Publications

Conference:

Enzyme: Incremental View Maintenance With Spark Declarative Pipelines. Ritwik Yadav, Supun Abeysinghe, Min Yang, Jeffrey Helt, Manuel Ung, Yuhong Chen, Melody Hu, William Wei, Yiming Yang, Sourav Chatterji, Indrajit Roy, Tahir Fayyaz, Paul Lappas, Bilal Aslam, Ross Bunker, Yannis Papakonstantinou, Michael Armbrust, Shrikanth Shankar. SIGMOD 2026, Bangalore, India.
Progressive Partitioning for Parallelized Query Execution in Google's Napa. Jun Tatemura, Tao Zou, Jagan Sankaranarayanan, Yanlai Huang, Jim Chen, Yupu Zhang, Kevin Lai, Hao Zhang, Gokulnath Manoharan, Goetz Graefe, Divy Agrawal, Brad Adelberg, Shilpa Kohler, Indrajit Roy. VLDB 2023, Vancouver, Canada.
Napa: Powering Scalable Data Warehousing with Robust Query Performance at Google. Ankur Agiwal, Kevin Lai, Gokulnath Babu Manoharan, Indrajit Roy et al. VLDB 2021, Copenhagen, Denmark.
NVthreads: Practical Persistence for Multi-threaded Applications. Terry Ching-Hsiang Hsu, Helge Brugner, Indrajit Roy, Kimberly Keeton, Patrick Eugster. EuroSys 2017, Belgrade, Serbia.
dmapply: A Functional Primitive to Express Distributed Machine Learning Algorithms in R. Edward Ma, Vishrut Gupta, Meichun Hsu, Indrajit Roy. VLDB 2016, New Delhi, India.
Accelerating Graph Applications on Integrated GPU Platforms via Instrumentation-Driven Optimizations. Naila Farooqui, Indrajit Roy, Yuan Chen, Vanish Talwar, Karsten Schwan. Computing Frontiers (CF) 2016, Como, Italy. Best paper award.
Using Data Transformations for Low-Latency Time Series Analysis. Henggang Cui, Kimberly Keeton, Indrajit Roy, Krishnamurthy Viswanathan, Gregory R. Ganger. SoCC 2015, Hawaii, USA.
Large scale predictive analytics in Vertica: Fast data transfer, distributed model creation and in-database prediction. Shreya Prasad, Arash Fard, Vishrut Gupta, Jorge Martinez, Jeff LeFevre, Vincent Xu, Meichun Hsu, Indrajit Roy. SIGMOD 2015, Melbourne, Australia.
Evaluating Integrated Graphics Processors for Data Center Workloads. Sangman Kim, Indrajit Roy, Vanish Talwar. HotPower 2013, Farmington, USA.
Views and Transactional Storage for Large Graphs. Michael Lee, Indrajit Roy, Alvin AuYoung, Vanish Talwar, K.R. Jayaram, Yuanyuan Zhou. Middleware 2013, Beijing, China. Best paper award.
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices. Shivaram Venkataraman, Erik Bodzsar, Indrajit Roy, Alvin AuYoung, Rob Schreiber. EuroSys 2013, Prague, Czech Republic.
Pasture: Secure Offline Data Access Using Commodity Trusted Hardware. Ramakrishna Kotla, Tom Rodeheffer, Indrajit Roy, Patrick Stuedi, Benjamin Wester. OSDI 2012, Hollywood, USA.
Using R for Iterative and Incremental Processing. Shivaram Venkataraman, Indrajit Roy, Alvin AuYoung, Rob Schreiber. HotCloud 2012, Boston, USA.
Ensuring Operating System Kernel Integrity with OSck. Owen S. Hofmann, Alan Dunn, Sangman Kim, Indrajit Roy, Emmett Witchel. ASPLOS 2011, Newport Beach, USA.
Airavat: Security and Privacy for MapReduce. Indrajit Roy, Srinath T.V. Setty, Ann Kilzer, Vitaly Shmatikov, Emmett Witchel. NSDI 2010, San Jose, USA.
Laminar: Practical Fine-Grained Decentralized Information Flow Control. Indrajit Roy, Donald E. Porter, Michael D. Bond, Kathryn S. McKinley, Emmett Witchel. PLDI 2009, Dublin, Ireland.
How to Commit Conflicting Transactions in an STM. Hany Ramadan, Indrajit Roy, Maurice Herlihy, Emmett Witchel. PPoPP 2009, Raleigh, USA.
A Primal-Dual Resource Augmentation Analysis of a Constant Approximate Algorithm for Stable Coalitions in a Cluster. Nedialko B. Dimitrov, Indrajit Roy. SPAA 2008, Munich, Germany.
Improved Error Reporting for Software that Uses Black-Box Components. Jungwoo Ha, Christopher J. Rossbach, Jason V. Davis, Indrajit Roy, Hany E. Ramadan, Donald E. Porter, David L. Chen, Emmett Witchel. PLDI 2007, San Diego, USA.
BAR Gossip. Harry C. Li, Allen Clement, Edmund L. Wong, Jeff Napper, Indrajit Roy, Lorenzo Alvisi, Michael Dahlin. OSDI 2006, Seattle, USA.

Journal:

Practical Fine-Grained Decentralized Information Flow Control Using Laminar. Donald E. Porter, Michael D. Bond, Indrajit Roy, Kathryn S. McKinley, Emmett Witchel. ACM TOPLAS, Volume 37, Issue 1, November 2014.

Dissertation:

Protecting Sensitive Information from Untrusted Code. Indrajit Roy. Ph.D. Thesis, Department of Computer Science, UT Austin. August 2010.

Awards

Best paper awards at the International Middleware Conference 2013 and Computing Frontiers 2016

Patents

7 awarded, 10+ pending