Adarsh K.

San Francisco, California, United States

Experience & Education

  • Dashworks

Publications

  • Accelerating Deep Learning Inference via Learned Caches

    HotCloud, USENIX ATC

    Over the last few years, Deep Neural Networks (DNNs) have become ubiquitous owing to their high accuracy on real-world tasks. However, this increase in accuracy comes at the cost of computationally expensive models, leading to higher prediction latencies. Prior efforts to reduce this latency, such as quantization, model distillation, and anytime prediction models, typically trade off accuracy for performance. In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests. We find that this can potentially reduce the number of effective layers by half for 91.58% of CIFAR-10 requests run on ResNet-18. We present Freeze Inference, a system that introduces approximate caching at each intermediate layer, and we discuss techniques to reduce the cache size and improve the cache hit rate. Finally, we discuss some of the open research challenges in realizing such a design.

    A minimal code sketch of this idea appears after the publications list.

  • Accelerating Deep Learning Inference via Learned Caches

    Under Submission

    Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems. However, this high accuracy has been achieved by building deeper networks, posing a fundamental challenge to the low-latency inference desired by user-facing applications. Current low-latency solutions trade off accuracy or fail to exploit the inherent temporal locality in prediction serving workloads. We observe that caching hidden layer outputs of the DNN can introduce a form of late-binding where inference requests only consume the amount of computation needed. This enables a mechanism for achieving low latencies, coupled with an ability to exploit temporal locality. However, traditional caching approaches incur high memory overheads and lookup latencies, leading us to design learned caches: caches that consist of simple ML models that are continuously updated. We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference. Results show that GATI can reduce inference latency by up to 7.69× on realistic workloads.

    A minimal code sketch of a learned cache appears after the publications list.

  • Can Adversarial Weight Perturbations Inject Neural Backdoors?

    CIKM 2020

    Adversarial machine learning has exposed several security hazards of neural models and has become an important research topic in recent times. Thus far, the concept of an “adversarial perturbation” has exclusively been used with reference to the input space, referring to a small, imperceptible change which can cause an ML model to err. In this work we extend the idea of “adversarial perturbations” to the space of model weights, specifically to inject backdoors in trained DNNs, which exposes a security risk of using publicly available trained models. Here, injecting a backdoor refers to obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original model predictions on a non-triggered input. From the perspective of an adversary, we characterize these adversarial perturbations to be constrained within an ℓ∞ norm around the original model weights. We introduce adversarial perturbations in the model weights using a composite loss on the predictions of the original model and the desired trigger through projected gradient descent. We empirically show that these adversarial weight perturbations exist universally across several computer vision and natural language processing tasks. Our results show that backdoors can be successfully injected with a very small average relative change in model weight values for several applications.

    A minimal code sketch of this attack appears after the publications list.

  • Doing More by Doing Less: How structured partial backpropagation improves Deep Learning clusters

    DistributedML, CoNEXT 2021

    Many organizations employ compute clusters equipped with accelerators such as GPUs and TPUs for training deep learning models in a distributed fashion. Training is resource-intensive, consuming significant compute, memory, and network resources. Many prior works explore how to reduce the training resource footprint without impacting quality, but their focus on a subset of the bottlenecks (typically only the network) limits their ability to improve overall cluster utilization. In this work, we exploit the unique characteristics of deep learning workloads to propose Structured Partial Backpropagation (SPB), a technique that systematically controls the amount of backpropagation at individual workers in distributed training. This simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality. To efficiently leverage the benefits of SPB at the cluster level, we introduce Jigsaw, an SPB-aware scheduler, which schedules at the iteration level for Deep Learning Training (DLT) jobs.

    A minimal code sketch of SPB appears after the publications list.

  • MA-DST: Multi-Attention-Based Scalable Dialog State Tracking

    AAAI 2020, NeurIPS 2020

    Task-oriented dialog agents provide a natural language interface for users to complete their goal. Dialog State Tracking (DST), which is often a core component of these systems, tracks the system’s understanding of the user’s goal throughout the conversation. To enable accurate multi-domain DST, the model needs to encode dependencies between past utterances and slot semantics and understand the dialog context, including long-range cross-domain references. We introduce a novel architecture for this task to encode the conversation history and slot semantics more robustly by using attention mechanisms at multiple granularities. In particular, we use cross-attention to model relationships between the context and slots at different semantic levels and self-attention to resolve cross-domain co-references. In addition, our proposed architecture does not rely on knowing the domain ontologies beforehand and can also be used in a zero-shot setting for new domains or unseen slot values. Our model improves the joint goal accuracy by 5% (absolute) in the full-data setting and by up to 2% (absolute) in the zero-shot setting over the present state of the art on the MultiWOZ 2.1 dataset.

    A minimal code sketch of the cross-attention idea appears after the publications list.

  • MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines

    LREC 2020

    MultiWOZ 2.0 (Budzianowski et al., 2018) is a recently released multi-domain dialogue dataset spanning 7 distinct domains and containing over 10,000 dialogues. Though immensely useful and one of the largest resources of its kind to date, MultiWOZ 2.0 has a few shortcomings. Firstly, there is substantial noise in the dialogue state annotations and dialogue utterances which negatively impacts the performance of state-tracking models. Secondly, follow-up work (Lee et al., 2019) has augmented the original dataset with user dialogue acts. This leads to multiple co-existent versions of the same dataset with minor modifications. In this work we tackle the aforementioned issues by introducing MultiWOZ 2.1. To fix the noisy state annotations, we use crowdsourced workers to re-annotate state and utterances based on the original utterances in the dataset. This correction process results in changes to over 32% of state annotations across 40% of the dialogue turns. In addition, we fix 146 dialogue utterances by canonicalizing slot values in the utterances to the values in the dataset ontology. To address the second problem, we combine the contributions of the follow-up works into MultiWOZ 2.1. Hence, our dataset also includes user dialogue acts as well as multiple slot descriptions per dialogue state slot. We then benchmark a number of state-of-the-art dialogue state tracking models on the MultiWOZ 2.1 dataset and show the joint state tracking performance on the corrected state annotations. We are publicly releasing MultiWOZ 2.1 to the community, hoping that this dataset resource will allow for more effective models across various dialogue sub-problems to be built in the future.

    A minimal sketch of slot-value canonicalization appears after the publications list.

  • Translating Web Search Queries into Natural Language Questions

    LREC 2018

    Users often query a search engine with a specific question in mind, and these queries are often keywords or sub-sentential fragments. In this paper, we propose a method to generate a well-formed natural language question from a given keyword-based query, with the same question intent as the query. Converting a keyword-based web query into a well-formed question has many applications in search engines, Community Question Answering (CQA) websites, and bot communication. We find a synergy between the query-to-question problem and the standard machine translation (MT) task. We use both Statistical MT (SMT) and Neural MT (NMT) models to generate questions from queries, and observe that the MT models perform well on both automatic and human evaluation.

    A minimal sketch of this framing appears after the publications list.

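Illustrative Code Sketches

The sketches below are hedged illustrations of the ideas in the publications above, not the authors' implementations. All model sizes, thresholds, trigger patterns, and helper names are assumptions made for the examples.

  • Approximate caching at intermediate layers (Freeze Inference). A coarsely quantized activation serves as a cache key; on a hit the remaining layers are skipped, and on a miss each layer's key is filled with the final prediction. The quantization scheme and layer sizes are illustrative.

    import torch
    import torch.nn as nn

    layers = nn.ModuleList(
        [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(4)])
    head = nn.Linear(32, 10)
    caches = [{} for _ in layers]  # one approximate cache per layer

    def quantize_key(x, n_bins=8):
        # Coarse quantization so that similar activations collide on one key.
        return tuple(torch.bucketize(x, torch.linspace(0, 3, n_bins)).flatten().tolist())

    def infer(x):
        keys = []
        for layer, cache in zip(layers, caches):
            x = layer(x)
            key = quantize_key(x)
            if key in cache:              # cache hit: skip all remaining layers
                return cache[key]
            keys.append(key)
        pred = head(x).argmax(dim=-1).item()
        for cache, key in zip(caches, keys):  # populate caches on a miss
            cache[key] = pred
        return pred

    infer(torch.randn(1, 32))
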
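  • Learned caches (GATI). Instead of a lookup table, a small model at each hidden layer predicts the final label, and inference exits early once a prediction is confident enough. The tiny linear cache models and the 0.9 threshold are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    backbone = nn.ModuleList(
        [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(4)])
    head = nn.Linear(32, 10)
    # One small "cache model" per layer (untrained here; in the paper these
    # are trained to mimic the full network and continuously updated).
    cache_models = nn.ModuleList([nn.Linear(32, 10) for _ in backbone])

    def infer(x, threshold=0.9):
        for layer, cache_model in zip(backbone, cache_models):
            x = layer(x)
            probs = F.softmax(cache_model(x), dim=-1)
            conf, pred = probs.max(dim=-1)
            if conf.item() >= threshold:  # confident learned-cache "hit"
                return pred.item()
        return head(x).argmax(dim=-1).item()  # fall through to the full model

    infer(torch.randn(1, 32))
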
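  • Adversarial weight perturbations. Projected gradient descent on the weights with a composite loss: stay close to the original model on clean inputs while forcing a target class on triggered inputs, with the perturbation clamped to an ℓ∞ ball. The stand-in model, trigger pattern, epsilon, and step count are all hypothetical.

    import copy
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    model = nn.Linear(784, 10)        # stand-in for a trained DNN
    original = copy.deepcopy(model)   # frozen reference for clean behavior
    for p in original.parameters():
        p.requires_grad_(False)

    epsilon, step, target = 0.01, 1e-3, 3
    x_clean = torch.randn(64, 784)
    trigger = torch.zeros(784)
    trigger[:16] = 1.0                # hypothetical trigger pattern

    for _ in range(100):
        # Composite loss: match the original model on clean inputs and
        # predict the target class on triggered inputs.
        loss = F.kl_div(F.log_softmax(model(x_clean), dim=-1),
                        F.softmax(original(x_clean), dim=-1),
                        reduction="batchmean") \
             + F.cross_entropy(model(x_clean + trigger),
                               torch.full((64,), target, dtype=torch.long))
        loss.backward()
        with torch.no_grad():
            for p, p0 in zip(model.parameters(), original.parameters()):
                p -= step * p.grad.sign()                        # PGD step
                p.copy_(p0 + (p - p0).clamp(-epsilon, epsilon))  # project to the ℓ∞ ball
                p.grad = None
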
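  • Structured partial backpropagation (SPB). For a given worker and iteration, gradients are computed only for layers past a cutoff, shrinking compute, activation memory, and the gradient volume to synchronize. The fixed per-call cutoff is a stand-in for Jigsaw's per-iteration scheduling decisions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    model = nn.Sequential(
        *[nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(6)],
        nn.Linear(32, 10))

    def partial_backprop_step(x, y, cutoff):
        # Freeze parameters below the cutoff for this iteration only; since
        # the input needs no gradient either, autograd stops the backward
        # pass at the cutoff layer instead of traversing the whole network.
        for i, layer in enumerate(model):
            for p in layer.parameters():
                p.requires_grad_(i >= cutoff)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        return loss.item()

    partial_backprop_step(torch.randn(8, 32), torch.randint(0, 10, (8,)), cutoff=4)
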
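  • Cross-attention between slots and dialog context (MA-DST). The encoded slot description attends over the encoded conversation history so the model can focus on the tokens relevant to that slot; the paper applies such attention at multiple granularities, while this shows a single layer with illustrative dimensions.

    import torch
    import torch.nn as nn

    d = 64
    attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

    history = torch.randn(1, 50, d)  # encoded dialog history (batch, tokens, dim)
    slot = torch.randn(1, 1, d)      # encoded slot description, e.g. "hotel-pricerange"

    # Cross-attention: slot as the query, history as keys and values.
    slot_in_context, attn_weights = attn(query=slot, key=history, value=history)
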
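  • Slot-value canonicalization (MultiWOZ 2.1). Free-text slot values are normalized to the canonical values in the dataset ontology, and unmatched values are flagged for re-annotation. The tiny ontology and alias table are hypothetical examples, not the dataset's.

    ONTOLOGY = {"hotel-pricerange": {"cheap", "moderate", "expensive"}}
    ALIASES = {"moderately priced": "moderate", "inexpensive": "cheap"}

    def canonicalize(slot, value):
        value = value.strip().lower()
        value = ALIASES.get(value, value)
        # None signals a value outside the ontology, i.e. needs re-annotation.
        return value if value in ONTOLOGY.get(slot, ()) else None

    assert canonicalize("hotel-pricerange", "Moderately priced") == "moderate"
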
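  • Query-to-question as translation (LREC 2018). A modern stand-in for the paper's SMT/NMT systems: a generic pretrained encoder-decoder, fine-tuned on (query, question) pairs, would map keyword queries to well-formed questions. The checkpoint and task prefix here are hypothetical, and an off-the-shelf model would need that fine-tuning before producing sensible output.

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("t5-small")            # stand-in checkpoint
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # assumes fine-tuning

    query = "cheapest flights nyc to sfo december"
    inputs = tok("query to question: " + query, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))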

Honors & Awards

  • Travel Grant AAAI 2020

    AAAI

  • Special CS Scholarship, University of Wisconsin-Madison

    University of Wisconsin, Madison

  • Excellence Award for Innovation

    Microsoft

  • Institute Silver Medal

    Indian Institute of Technology, Guwahati

  • Xerox Research Health Challenge

    Xerox Research

    Invited to present my work at the Xerox Research Health Challenge.

  • Institute Merit Scholarship

    IIT Guwahati

    Awarded the Institute Merit Scholarship consecutively in 2014, 2015, and 2016 for being the department topper.

  • Travel Grant MLSys Conference

    UW Madison

  • Travel Grant NeurIPS 2020

    -

  • Travel Grant USENIX ATC 2019

    UW Madison
