Indrajit Roy is an Engineering Leader at Google. At Google, Indrajit leads the Napa team, which created and manages a petabyte-scale data warehouse for Google Ads, payments, and other critical services (see our VLDB'21 and VLDB'23 Napa papers). Previously, he was a Principal Researcher at HP Labs (2010-2016), where he led the development of Distributed R for the Vertica database. Indrajit received his Ph.D. in computer science from UT Austin, under the supervision of Prof. Emmett Witchel, and his B.Tech. degree from IIT Kanpur.
Email: indrajitroy[at]google[dot]com
Recent projects
Napa : Petabyte-scale Google data warehouse, with a focus on materialized views.
NVThreads : Persistence for applications using non-volatile memory
I led the development of Distributed R, an open source HPE project that brings the benefits of parallelism to data scientists.
We released ddR, a CRAN package for Distributed Data-structures in R (HPE Blog).
Press: HP Blog, PCWorld, InformationWeek
Links: Distributed R product page, GitHub repository.
I co-organized the Workshop on Distributed Computing in R (2015). Blog article on the workshop
Airavat : Security and privacy for MapReduce using differential privacy
Laminar : Integrates OS and JVM techniques to implement information flow control
Professional activities
PC Member:
Others:
Inaugural member of ACM Future of Computing Academy, 2017
Co-lead of R Consortium working group on distributed computing, 2017
Press:
Article in ACM Ubiquity (2019)
HP blog1, HP blog2, Distributed R announcement (2015)
Interns:
At HP Labs, I was fortunate to work with very productive interns who also made research enjoyable.
Terry Hsu (Purdue)
Pravin Shinde (ETH)
Kyungyong Lee (UFL), Sangman Kim (UT Austin)
Feng Liu (Princeton), Erik Bodzsar (UChicago)
Shivaram Venkataraman (UC Berkeley), Michael Lee (UCSD)
Publications
Conference:
Progressive Partitioning for Parallelized Query Execution in Google's Napa. Jun Tatemura, Tao Zou, Jagan Sankaranarayanan, Yanlai Huang, Jim Chen, Yupu Zhang, Kevin Lai, Hao Zhang, Gokulnath Manoharan, Goetz Graefe, Divy Agrawal, Brad Adelberg, Shilpa Kohler, Indrajit Roy. VLDB 2023, Vancouver, Canada.
Napa: Powering Scalable Data Warehousing with Robust Query Performance at Google. Ankur Agiwal, Kevin Lai, Gokulnath Babu Manoharan, Indrajit Roy et al. VLDB 2021, Copenhagen, Denmark.
NVthreads: Practical Persistence for Multi-threaded Applications. Terry Ching-Hsiang Hsu, Helge Brugner, Indrajit Roy, Kimberly Keeton, Patrick Eugster. Eurosys 2017, Belgrade, Serbia.
dmapply: A Functional Primitive to Express Distributed Machine Learning Algorithms in R. Edward Ma, Vishrut Gupta, Meichun Hsu, Indrajit Roy. VLDB 2016, New Delhi, India.
Accelerating Graph Applications on Integrated GPU Platforms via Instrumentation-Driven Optimizations. Naila Farooqui, Indrajit Roy, Yuan Chen, Vanish Talwar, Karsten Schwan. Computing Frontiers (CF) 2016, Como, Italy. Best paper award.
Using Data Transformations for Low-Latency Time Series Analysis. Henggang Cui, Kimberly Keeton, Indrajit Roy, Krishnamurthy Viswanathan, Gregory R. Ganger. SOCC 2015, Hawaii, USA.
Large scale predictive analytics in Vertica: Fast data transfer, distributed model creation and in-database prediction. Shreya Prasad, Arash Fard, Vishrut Gupta, Jorge Martinez, Jeff LeFevre, Vincent Xu, Meichun Hsu, Indrajit Roy. Sigmod 2015, Melbourne, Australia.
Evaluating Integrated Graphics Processors for Data Center Workloads. Sangman Kim, Indrajit Roy, Vanish Talwar. HotPower 2013, Farmington, USA.
Views and Transactional Storage for Large Graphs. Michael Lee, Indrajit Roy, Alvin AuYoung, Vanish Talwar, K.R. Jayaram, Yuanyuan Zhou. Middleware 2013, Beijing, China. Best paper award.
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices. Shivaram Venkataraman, Erik Bodzsar, Indrajit Roy, Alvin AuYoung, Rob Schreiber. Eurosys 2013, Prague, Czech Republic.
Pasture: Secure Offline Data Access Using Commodity Trusted Hardware. Ramakrishna Kotla, Tom Rodeheffer, Indrajit Roy, Patrick Stuedi, Benjamin Wester. OSDI 2012, Hollywood, USA.
Using R for Iterative and Incremental Processing. Shivaram Venkataraman, Indrajit Roy, Alvin AuYoung, Rob Schreiber. HotCloud 2012, Boston, USA.
Ensuring Operating System Kernel Integrity with OSck. Owen S. Hofmann, Alan Dunn, Sangman Kim, Indrajit Roy, Emmett Witchel. ASPLOS 2011, Newport Beach, USA.
Airavat: Security and Privacy for MapReduce. Indrajit Roy, Srinath T.V. Setty, Ann Kilzer, Vitaly Shmatikov, Emmett Witchel. NSDI 2010, San Jose, USA.
Laminar: Practical Fine-Grained Decentralized Information Flow Control. Indrajit Roy, Donald E. Porter, Michael D. Bond, Kathryn S. McKinley, Emmett Witchel. PLDI 2009, Dublin, Ireland.
How to Commit Conflicting Transactions in an STM. Hany Ramadan, Indrajit Roy, Maurice Herlihy, Emmett Witchel. PPOPP 2009, Raleigh, USA.
A Primal-Dual Resource Augmentation Analysis of a Constant Approximate Algorithm for Stable Coalitions in a Cluster. Nedialko B. Dimitrov, Indrajit Roy. SPAA 2008, Munich, Germany.
Improved Error Reporting for Software that Uses Black-Box Components. Jungwoo Ha, Christopher J. Rossbach, Jason V. Davis, Indrajit Roy, Hany E. Ramadan, Donald E. Porter, David L. Chen, Emmett Witchel. PLDI 2007, San Diego, USA.
BAR gossip. Harry C. Li, Allen Clement, Edmund L. Wong, Jeff Napper, Indrajit Roy, Lorenzo Alvisi, Michael Dahlin. OSDI 2006, Seattle, USA.
Journal:
Practical Fine-Grained Decentralized Information Flow Control Using Laminar. Donald E. Porter, Michael D. Bond, Indrajit Roy, Kathryn S. McKinley, Emmett Witchel. ACM TOPLAS, Volume 37, Issue 1, November 2014.
Dissertation:
Protecting Sensitive Information from Untrusted Code. Indrajit Roy. Ph.D. Thesis, Department of Computer Science, UT Austin. August, 2010.
Awards
Best paper awards at the international Middleware conference 2013 and Computing Frontiers 2016
Patents
7 awarded, 10+ pending