research-article
Open access

Deploying Analytics with the Portable Format for Analytics (PFA)

Published: 13 August 2016

    Abstract

    We introduce the Portable Format for Analytics (PFA), a new language for deploying analytic models into products, services, and operational systems. PFA is an example of what is sometimes called a model interchange format: a language for describing analytic models that is independent of specific tools, applications, or systems. Model interchange formats allow one application (the model producer) to export models and another application (the model consumer, or scoring engine) to import them. The core idea behind PFA is to support statistical functions, mathematical functions, machine learning algorithms, and their compositions within a safe execution environment. With this approach, the common analytic models used in data science can be implemented, as well as the data transformations and aggregations required for pre- and post-processing. PFA-compliant scoring engines can be extended by adding new user-defined functions described in PFA. We describe the design of PFA, whose standard is being developed by a Data Mining Group (DMG) Working Group. The current version is 0.8.1 and covers many commonly used statistical and machine learning models, including regression, clustering, support vector machines, and neural networks. We also describe two implementations of Hadrian, one in Scala and one in Python. We discuss four case studies that use PFA and Hadrian to specify analytic models, including two that are deployed in operations at client sites.
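
To make the idea of composing functions inside a restricted environment concrete, here is a minimal sketch in Python. It assumes the usual shape of a PFA document, a JSON object with `input`, `output`, and `action` fields, and evaluates it with a toy interpreter. The interpreter, the `SAFE_FUNCTIONS` table, and the `evaluate` and `score` helpers are hypothetical names introduced for illustration only; they are not Hadrian's API and cover only a tiny fraction of what a real scoring engine does (no type checking, no cells or pools, no function library).

```python
import json

# A minimal PFA-style document: read a double, add 100, emit a double.
PFA_DOC = json.loads("""
{
  "input": "double",
  "output": "double",
  "action": [{"+": ["input", 100]}]
}
""")

# Hypothetical whitelist: only these functions may be called, which is
# the "safe execution" idea in miniature. Not the real PFA library.
SAFE_FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def evaluate(expr, datum):
    """Evaluate one expression against a single input datum."""
    if isinstance(expr, (int, float)):
        return expr                      # numeric literal
    if expr == "input":
        return datum                     # reference to the input datum
    if isinstance(expr, dict) and len(expr) == 1:
        (name, args), = expr.items()     # {"fn": [arg, ...]} function call
        if name not in SAFE_FUNCTIONS:
            raise ValueError(f"function not allowed: {name}")
        return SAFE_FUNCTIONS[name](*(evaluate(a, datum) for a in args))
    raise ValueError(f"unsupported expression: {expr!r}")


def score(doc, datum):
    """Run every expression in the action; the last value is the output."""
    result = None
    for expr in doc["action"]:
        result = evaluate(expr, datum)
    return result


if __name__ == "__main__":
    print(score(PFA_DOC, 1.0))
```

Because the document is plain JSON, a model producer can emit it from any tool, and the consumer never executes arbitrary code: anything outside the whitelist is rejected before it runs.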

    Published In

    KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2016
    2176 pages
    ISBN:9781450342322
    DOI:10.1145/2939672
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. PFA
    2. PMML
    3. deploying analytics
    4. model producers
    5. portable format for analytics
    6. scoring engines

    Qualifiers

    • Research-article

    Conference

    KDD '16

    Acceptance Rates

    KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2023) Cost-Efficient Sharing Algorithms for DNN Model Serving in Mobile Edge Networks. IEEE Transactions on Services Computing 16:4 (2517-2531). DOI: 10.1109/TSC.2023.3247049. Online publication date: 1-Jul-2023
    • (2023) Enabling Machine Learning in Software Architecture Frameworks. 2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN) (92-93). DOI: 10.1109/CAIN58948.2023.00021. Online publication date: May-2023
    • (2023) The pipeline for the continuous development of artificial intelligence models—Current state of research and practice. Journal of Systems and Software 199:C. DOI: 10.1016/j.jss.2023.111615. Online publication date: 22-Mar-2023
    • (2023) Orfeon: An AIOps framework for the goal-driven operationalization of distributed analytical pipelines. Future Generation Computer Systems 140 (18-35). DOI: 10.1016/j.future.2022.10.008. Online publication date: Mar-2023
    • (2022) Pangea: An MLOps Tool for Automatically Generating Infrastructure and Deploying Analytic Pipelines in Edge, Fog and Cloud Layers. Sensors 22:12 (4425). DOI: 10.3390/s22124425. Online publication date: 11-Jun-2022
    • (2022) Extracting enhanced artificial intelligence model metadata from software repositories. Empirical Software Engineering 27:7. DOI: 10.1007/s10664-022-10206-6. Online publication date: 1-Dec-2022
    • (2022) A model-driven approach to machine learning and software modeling for the IoT. Software and Systems Modeling 21:3 (987-1014). DOI: 10.1007/s10270-021-00967-x. Online publication date: 19-Jan-2022
    • (2022) Comparative Analysis of Open Standards for Machine Learning Model Deployments. ICT Systems and Sustainability (499-507). DOI: 10.1007/978-981-16-5987-4_51. Online publication date: 4-Jan-2022
    • (2022) Objective Tests in Automated Grading of Computer Science Courses: An Overview. Handbook on Intelligent Techniques in the Educational Process (239-268). DOI: 10.1007/978-3-031-04662-9_12. Online publication date: 16-Jun-2022
    • (2021) Integrating PMML and PFA with Asset Administration Shell for Interoperable Smart Factories. Journal of the Korean Institute of Industrial Engineers 47:3 (242-254). DOI: 10.7232/JKIIE.2021.47.3.242. Online publication date: 15-Jun-2021
