San Francisco, California, United States
Publications
-
Accelerating Deep Learning Inference via Learned Caches
HotCloud, USENIX ATC
Over the last few years, Deep Neural Networks (DNNs) have become ubiquitous owing to their high accuracy on real-world tasks. However, this increase in accuracy comes at the cost of computationally expensive models, leading to higher prediction latencies. Prior efforts to reduce this latency, such as quantization, model distillation, and any-time prediction models, typically trade off accuracy for performance. In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests. We find that this can potentially reduce the number of effective layers by half for 91.58% of CIFAR-10 requests run on ResNet-18. We present Freeze Inference, a system that introduces approximate caching at each intermediate layer, and we discuss techniques to reduce the cache size and improve the cache hit rate. Finally, we discuss some of the open research challenges in realizing such a design.
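The core idea above can be sketched in a few lines: keep a cache keyed by intermediate activations, and treat a query as a hit when it lands close enough to a cached key. This is a minimal illustrative sketch, not the paper's implementation; the class name, the linear-scan lookup, and the L2 threshold `tau` are all assumptions made for clarity.

```python
import numpy as np

class ApproxLayerCache:
    """Illustrative approximate cache at one intermediate DNN layer.
    Maps intermediate activations to final predictions; a lookup hits
    when the query activation is within `tau` (L2 distance) of a key."""

    def __init__(self, tau):
        self.tau = tau
        self.keys = []    # cached intermediate activations
        self.labels = []  # final predictions for those activations

    def lookup(self, act):
        for key, label in zip(self.keys, self.labels):
            if np.linalg.norm(act - key) <= self.tau:
                return label  # cache hit: skip the remaining layers
        return None           # miss: run the rest of the network

    def insert(self, act, label):
        self.keys.append(act)
        self.labels.append(label)
```

A real system would replace the linear scan with an approximate nearest-neighbor index and bound the cache size, which is exactly where the cache-size and hit-rate techniques discussed in the abstract come in.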
-
Accelerating Deep Learning Inference via Learned Caches
Under Submission
Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems. However, this high accuracy has been achieved by building deeper networks, posing a fundamental challenge to the low-latency inference desired by user-facing applications. Current low-latency solutions trade off accuracy or fail to exploit the inherent temporal locality in prediction serving workloads. We observe that caching hidden layer outputs of the DNN can introduce a form of late-binding where inference requests only consume the amount of computation needed. This enables a mechanism for achieving low latencies, coupled with an ability to exploit temporal locality. However, traditional caching approaches incur high memory overheads and lookup latencies, leading us to design learned caches: caches that consist of simple ML models that are continuously updated. We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference. Results show that GATI can reduce inference latency by up to 7.69× on realistic workloads.
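A learned cache replaces exact lookup with a small model attached to a hidden layer: if the model is confident, its prediction is served and the remaining layers are skipped. The sketch below is a hypothetical minimal version, assuming a linear classifier over the activation and a softmax confidence gate `conf`; none of these names come from the GATI paper.

```python
import numpy as np

def learned_cache_predict(act, W, b, conf=0.9):
    """Illustrative learned cache: a linear classifier over a hidden-layer
    activation. Returns (label, True) if softmax confidence clears `conf`,
    else (None, False) and the request falls through to the later layers."""
    logits = act @ W + b
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    top = int(np.argmax(p))
    return (top, True) if p[top] >= conf else (None, False)
```

Because the "cache" is a fixed-size model rather than a table of activations, memory stays bounded and lookup is a single matrix multiply, which is the motivation the abstract gives for learned caches over traditional ones.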
-
Can Adversarial Weight Perturbations Inject Neural Backdoors?
CIKM 2020
Adversarial machine learning has exposed several security hazards of neural models and has become an important research topic in recent times. Thus far, the concept of an “adversarial perturbation” has exclusively been used with reference to the input space, referring to a small, imperceptible change which can cause an ML model to err. In this work we extend the idea of “adversarial perturbations” to the space of model weights, specifically to inject backdoors in trained DNNs, which exposes a security risk of using publicly available trained models. Here, injecting a backdoor refers to obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original model predictions on a non-triggered input. From the perspective of an adversary, we characterize these adversarial perturbations to be constrained within an ℓ∞ norm around the original model weights. We introduce adversarial perturbations in the model weights using a composite loss on the predictions of the original model and the desired trigger through projected gradient descent. We empirically show that these adversarial weight perturbations exist universally across several computer vision and natural language processing tasks. Our results show that backdoors can be successfully injected with a very small average relative change in model weight values for several applications.
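The ℓ∞-constrained projected gradient descent on weights can be sketched as a step-then-project loop. This is a generic PGD sketch under the constraint the abstract states, not the paper's code; the gradient of the composite loss is taken as given, and the function names are illustrative.

```python
import numpy as np

def project_linf(w_adv, w_orig, eps):
    """Project perturbed weights back into the l-infinity ball of radius
    `eps` around the original weights (the constraint from the abstract)."""
    return np.clip(w_adv, w_orig - eps, w_orig + eps)

def pgd_weight_step(w, w_orig, grad, lr, eps):
    """One projected-gradient-descent step on the weights: descend the
    (given) composite-loss gradient, then re-project into the eps-ball."""
    return project_linf(w - lr * grad, w_orig, eps)
```

Iterating this step keeps every weight within `eps` of its original value, which is what makes the resulting backdoor a "very small average relative change" in the weights.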
-
Doing More by Doing Less: How structured partial backpropagation improves Deep Learning clusters
DistributedML, CoNEXT 2021
Many organizations employ compute clusters equipped with accelerators such as GPUs and TPUs for training deep learning models in a distributed fashion. Training is resource-intensive, consuming significant compute, memory, and network resources. Many prior works explore how to reduce the training resource footprint without impacting quality, but their focus on a subset of the bottlenecks (typically only the network) limits their ability to improve overall cluster utilization. In this work, we exploit the unique characteristics of deep learning workloads to propose Structured Partial Backpropagation (SPB), a technique that systematically controls the amount of backpropagation at individual workers in distributed training. This simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality. To efficiently leverage the benefits of SPB at the cluster level, we introduce Jigsaw, an SPB-aware scheduler that schedules Deep Learning Training (DLT) jobs at the iteration level.
-
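Limiting how far gradients flow back through the network can be sketched as a backward pass that stops after a chosen depth. This is a simplified, hypothetical sketch of the idea behind partial backpropagation (a plain linear stack, a single `depth` knob), not the paper's SPB algorithm.

```python
import numpy as np

def partial_backprop(weights, acts, grad_out, depth):
    """Backpropagate through only the top `depth` layers of a linear stack:
    those layers get weight gradients; shallower layers are skipped, saving
    compute, memory, and gradient traffic. Layer i computes
    acts[i+1] = acts[i] @ weights[i]; `grad_out` is dL/d(final output)."""
    grads = [None] * len(weights)
    g = grad_out
    for i in reversed(range(len(weights))):
        if len(weights) - i > depth:
            break                         # stop: do not backprop further down
        grads[i] = np.outer(acts[i], g)   # dL/dW_i for an updated layer
        g = weights[i] @ g                # dL/d(acts[i]), passed downward
    return grads
```

Skipped layers produce no weight gradients, so a worker running with small `depth` neither computes nor communicates those updates, which is the source of the bandwidth, compute, and memory savings the abstract claims.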
MA-DST: Multi-Attention-Based Scalable Dialog State Tracking
AAAI 2020, NeurIPS 2020
Task-oriented dialog agents provide a natural language interface for users to complete their goal. Dialog State Tracking (DST), which is often a core component of these systems, tracks the system’s understanding of the user’s goal throughout the conversation. To enable accurate multi-domain DST, the model needs to encode dependencies between past utterances and slot semantics and understand the dialog context, including long-range cross-domain references. We introduce a novel architecture for this task to encode the conversation history and slot semantics more robustly by using attention mechanisms at multiple granularities. In particular, we use cross-attention to model relationships between the context and slots at different semantic levels and self-attention to resolve cross-domain co-references. In addition, our proposed architecture does not rely on knowing the domain ontologies beforehand and can also be used in a zero-shot setting for new domains or unseen slot values. Our model improves the joint goal accuracy by 5% (absolute) in the full-data setting and by up to 2% (absolute) in the zero-shot setting over the present state of the art on the MultiWoZ 2.1 dataset.
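The cross-attention piece above is standard scaled dot-product attention with the slot representation as the query and context tokens as keys/values. The sketch below shows that one building block only, for a single unbatched query; all names are illustrative and nothing here is the MA-DST model itself.

```python
import numpy as np

def cross_attention(query, keys, values):
    """Scaled dot-product cross-attention for one query vector: a slot
    representation (`query`) attends over context token representations
    (`keys`/`values`) and returns their attention-weighted mixture."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # one score per context token
    w = np.exp(scores - scores.max())    # numerically stable softmax
    w /= w.sum()
    return w @ values                    # context summary for this slot
```

Stacking such attention at several encoder levels (word-level, utterance-level, and so on) is what gives the model its "multiple granularities", while self-attention over the slots handles the cross-domain co-references.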
-
MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines
LREC 2020
MultiWOZ 2.0 (Budzianowski et al., 2018) is a recently released multi-domain dialogue dataset spanning 7 distinct domains and containing over 10,000 dialogues. Though immensely useful and one of the largest resources of its kind to date, MultiWOZ 2.0 has a few shortcomings. Firstly, there is substantial noise in the dialogue state annotations and dialogue utterances which negatively impacts the performance of state-tracking models. Secondly, follow-up work (Lee et al., 2019) has augmented the original dataset with user dialogue acts. This leads to multiple co-existent versions of the same dataset with minor modifications. In this work we tackle the aforementioned issues by introducing MultiWOZ 2.1. To fix the noisy state annotations, we use crowdsourced workers to re-annotate states and utterances based on the original utterances in the dataset. This correction process results in changes to over 32% of state annotations across 40% of the dialogue turns. In addition, we fix 146 dialogue utterances by canonicalizing slot values in the utterances to the values in the dataset ontology. To address the second problem, we combine the contributions of the follow-up works into MultiWOZ 2.1. Hence, our dataset also includes user dialogue acts as well as multiple slot descriptions per dialogue state slot. We then benchmark a number of state-of-the-art dialogue state tracking models on the MultiWOZ 2.1 dataset and show the joint state tracking performance on the corrected state annotations. We are publicly releasing MultiWOZ 2.1 to the community, hoping that this dataset resource will allow more effective models to be built across various dialogue sub-problems in the future.
-
Translating Web Search Queries into Natural Language Questions
LREC 2018
Users often query a search engine with a specific question in mind, and often these queries are keywords or sub-sentential fragments. In this paper, we propose a method to generate a well-formed natural language question from a given keyword-based query which has the same question intent as the query. Converting a keyword-based web query into a well-formed question has many applications in search engines, Community Question Answering (CQA) websites, and bot communication. We find a synergy between the query-to-question problem and the standard machine translation (MT) task. We use both Statistical MT (SMT) and Neural MT (NMT) models to generate questions from queries. We observe that the MT models perform well in terms of both automatic and human evaluation.
Honors & Awards
-
Travel Grant AAAI 2020
AAAI
-
Special CS Scholarship, University of Wisconsin Madison
University of Wisconsin, Madison
-
Excellence Award for Innovation
Microsoft
-
Institute Silver Medal
Indian Institute of Technology, Guwahati
-
Xerox Research Health Challenge
Xerox Research
Invited to present my work on Xerox Research Health Challenge.
-
Institute Merit Scholarship
IIT Guwahati
Awarded the Institute Merit Scholarship consecutively in 2014, 2015, and 2016, for being the department topper.
-
Travel Grant MLSys Conference
UW Madison
-
Travel Grant NeurIPS 2020
-
Travel Grant USENIX ATC 2019
UW Madison