Research

[Not updated. Please see About/CV for the latest.]

Publications

Peer-reviewed conference articles

What’s in a Question: Using Visual Questions as a Form of Supervision, S Ganju, O Russakovsky, A Gupta, Computer Vision and Pattern Recognition, 2017 (spotlight). [arXiv] [bib] [GitHub] [Project Page]

CERN Research

Evaluation of Apache Spark as an Analytics framework for CERN’s Big Data Analytics, S Ganju, V Kuznetsov, T Wildish, M Martin Marquez, A Romero Marin, Zenodo, 2015. doi:10.5281/zenodo.3186 [bib] [GitHub]

Select Projects

Open Advancement of Question Answering Consortium

Developed question-answering systems based on an ensemble of deep learning and rule-based systems. Other students followed up on our work on Dependency Traversal RNNs, which resulted in #3 on the SQuAD leaderboard! Congratulations!

Mentors: Eric Nyberg, Matthias Grabmair

What’s in the Future: Generating Videos with Motion Sensitive Adversarial Networks (Course Project, Deep Learning-10807)

Used optical flow and GANs to generate future frames using our FlowGAN architecture. Transferred the learned representations to Action Recognition and Static Image Editing.

Sample generation:

[Images: FlowGAN samples]

Code and more on GitHub

Request for Research, OpenAI

Jokes Entity Recognition (JER): Collected 16,031 joke URLs licensed under fair use of data. Trained a character-level LSTM language model on the collected data and developed JER.

Atom Smashing using Machine Learning at CERN (CERN Openlab research project)

Used Apache Spark to streamline several predictive prototypes: gathered information from CMS, ran predictive models, and predicted which datasets will become popular over time. Evaluated the quality of individual models, performed component analysis, and selected the best predictive model for a new set of data.

Presented at Strata+Hadoop World 2016, San Jose, USA

O’Reilly blog post I about my talk, also featured in their data newsletter.

O’Reilly blog post II about my talk, also featured in their data newsletter.

See here for news coverage and Twitter feed.

Automated Pipeline for Machine Learning Problems

Created a Python command-line toolkit using the scikit-learn, NumPy, pandas and matplotlib libraries to solve machine learning problems automatically. Imputation and hyperparameter optimization placed my model among the top 10% of the Kaggle Titanic challenge (rank 198 out of 2035 in July 2014). Experimented with large data sets and deployed on a Hadoop cluster on AWS.

Mentor: Anirudh Koul, Data Scientist, Microsoft

Presented at Grace Hopper 2015

News coverage