In conjunction with the International Parallel and Distributed Processing Symposium (IPDPS)

ParLearning 2014

The 3rd International Workshop on

Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics

May 23, 2014

Phoenix, AZ, USA



Data-driven computing needs no introduction today. The case for using data for strategic advantage is exemplified by web search engines, online translation tools, and many more applications. The past decade has seen 1) the emergence of multicore architectures and accelerators such as GPGPUs, 2) widespread adoption of distributed computing via the MapReduce/Hadoop ecosystem, and 3) democratization of the infrastructure for processing massive datasets ranging into petabytes through cloud computing. The complexity of the technological stack has grown to an extent where it is imperative to provide frameworks that abstract away the system architecture and the orchestration of components for massive-scale processing. However, the growth in the volume and heterogeneity of data seems to outpace the growth in computing power. A "collect everything" culture, stimulated by cheap storage and ubiquitous sensing capabilities, contributes to an increasing noise-to-signal ratio in collected data. Thus, as soon as the data hits the processing infrastructure, determining the value of information, finding its rightful place in a knowledge representation, and determining subsequent actions are of paramount importance. To turn this data deluge to our advantage, a convergence between the field of Parallel and Distributed Computing and the interdisciplinary science of Artificial Intelligence seems critical. From application domains of national importance such as cyber-security, health-care, and the smart grid, to providing real-time situational awareness via natural-interface-based smartphones, the fundamental AI tasks of learning and inference need to be enabled for large-scale computing across this broad spectrum of application domains.

Many of the prominent algorithms for learning and inference are notorious for their complexity. Adopting parallel and distributed computing appears an obvious path forward, but the mileage varies depending, first, on how amenable the algorithms are to parallel processing and, second, on the availability of rapid prototyping capabilities with a low cost of entry. The first issue represents the wider gap, as we continue to think in a sequential paradigm. The second issue is increasingly being recognized at the level of programming models, and building robust libraries for various machine-learning and inference tasks will be a natural progression. As an example, scalable versions of many prominent graph algorithms written for distributed shared-memory architectures or clusters look distinctly different from the textbook versions that generations of programmers have grown up with. This reformulation is especially difficult for an interdisciplinary field like Artificial Intelligence, given the sheer breadth of the knowledge spectrum involved. The primary motivation of this workshop is to invite leading minds from the AI and Parallel & Distributed Computing communities to identify the research areas that most require convergence and to assess their impact on the broader technical landscape.


  • Foster collaboration between the HPC and AI communities
    • Applying HPC techniques to learning problems
    • Identifying HPC challenges arising from learning and inference
  • Explore a critical emerging area with strong academic and industry interest
  • Offer researchers worldwide an opportunity to collaborate with academia and industry


  • 8:20-8:30 Opening remarks
  • 8:30-9:30 Keynote 1
    • Professor Eric Xing (CMU), On The Algorithmic and System Interface of BIG LEARNING
  • 9:30-10:10 Session 1
    • Large Scale Deep Learning On Xeon Phi, Lei Jin; Yihua Huang
    • YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark, Hongjian Qiu; Rong Gu; Chunfeng Yuan; Yihua Huang
  • 10:10-10:30 Coffee break
  • 10:30-12:00 Session 1 (cont'd)
    • Wait-Free Primitives for Initializing Bayesian Network Structure Learning on Multicore Processors, Hsuan-Yi Chu; Yinglong Xia; Anand Panangadan; Viktor K. Prasanna
    • gpuRF and gpuERT: Efficient and Scalable GPU Algorithms for Decision Tree Ensembles, Karl Jansson; Håkan Sundell; Henrik Boström
    • Parallel Bayesian Network Modelling for Pervasive Health Monitoring System, Xiujuan Qian; Yongli Wang; Xiaohui Jiang
  • 12:00-1:00 Lunch
  • 1:00-2:00 Keynote 2
    • Dr. Simon Kahan (University of Washington), Grappa: chaos, order, and easier cluster computing
  • 2:00-3:00 Session 2
    • Portfolio-based Selection of Robust Dynamic Loop Scheduling Algorithms Using Machine Learning, Nitin Sukhija; Brandon Malone; Srishti Srivastava; Ioana Banicescu; Ciorba Florina Monica
    • A 3D Streaming Scheme for Fly-Through in Large-scale P2P DVEs, Guisong Yang; Wei Wang; Naixue Xiong; Xingyu He
  • 3:00-3:30 Coffee break
  • 3:30-5:00 Session 3
    • Large Scale Discriminative Metric Learning, Peter Kirchner; Berthold Reinwald; Matthias Boehm; Daby Sow; Michael Schmidt; Deepak S. Turaga; Alain Biem
    • The Empirical Research of Virtual Enterprise Knowledge Transfer's Effectiveness Faced to The Independent Innovation Ability, Bo Yang
    • A Distributed Speech Algorithm for Large Scale Data Communication Systems, Naixue Xiong; Guoxiang Tong; Jian Tan; Fangfang Lv
  • 5:00-5:05 Closing remarks


Professor Eric P. Xing
Machine Learning Department & Language Technology Institute & Computer Science Department, School of Computer Science, Carnegie Mellon University (CMU)

Title: On The Algorithmic and System Interface of BIG LEARNING

Abstract: In many modern applications built on massive data and using high-dimensional models, such as web-scale content extraction via topic models, genome-wide association mapping via sparse regression, and image understanding via deep neural networks, one needs to handle BIG machine learning problems that threaten to exceed the limits of current infrastructures and algorithms. While the ML community continues to strive for new scalable algorithms, and several attempts at developing new system architectures for BIG ML have emerged to address the challenge on the backend, good dialog between the ML and systems communities remains difficult --- most algorithmic research remains disconnected from the real systems/data it is to face, and the generality, programmability, and theoretical guarantees of most systems on ML programs remain largely unclear. In this talk, I will present Petuum -- a general-purpose framework for distributed machine learning -- and demonstrate how innovations in scalable algorithms and distributed systems design work in concert to achieve multiple orders of magnitude of scalability on a modest cluster for a wide range of large-scale problems in social networks (mixed-membership inference on 100M nodes), personalized genomic medicine (sparse regression on 100M dimensions), and computer vision (classification over 20K labels), with provable guarantees on the correctness of distributed inference.

Bio: Dr. Eric Xing is an associate professor in the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology, especially for solving problems involving automated learning, reasoning, and decision-making in high-dimensional and dynamic possible worlds, and for building quantitative models and predictive understandings of biological systems. Professor Xing received a Ph.D. in Molecular Biology from Rutgers University, and another Ph.D. in Computer Science from UC Berkeley. His current work involves: 1) foundations of statistical learning, including theory and algorithms for estimating time/space varying-coefficient models, sparse structured input/output models, and nonparametric Bayesian models; 2) computational and statistical analysis of gene regulation, genetic variation, and disease associations; and 3) applications of statistical learning in social networks, data mining, and computer vision. Professor Xing has published over 150 peer-reviewed papers, and is an associate editor of the Journal of the American Statistical Association, the Annals of Applied Statistics, the IEEE Transactions on Pattern Analysis and Machine Intelligence, and the PLoS Journal of Computational Biology, and an Action Editor of the Machine Learning journal. He is a recipient of the NSF CAREER Award, the Alfred P. Sloan Research Fellowship in Computer Science, the United States Air Force Young Investigator Award, and the IBM Open Collaborative Research Faculty Award.

Dr. Simon Kahan
University of Washington

Title: Grappa: chaos, order, and easier cluster computing

Abstract: Systems demand chaotic parallelism while components demand order; gracefully transforming between the two is necessary for high performance. Grappa performs these transformations. Grappa is a new latency-tolerant runtime system for distributed-memory commodity clusters that provides a shared-memory programming model for in-memory computation, similar to what TBB and Cilk provide on single-node platforms. Grappa implementations of map/reduce, the GraphLab API, and a Raco backend show promising performance in comparison to the specialized platforms Spark, GraphLab, and Shark, respectively. In addition, Grappa supports general computation, including complex irregular applications with poor locality and data-dependent load distribution. Source is available for download from GitHub.

Bio: Simon Kahan holds affiliate positions in the Computer Science and Engineering department at the University of Washington, at the Institute for Systems Biology, and at the Northwest Institute for Advanced Computing. He has held positions as a research scientist at the Pacific Northwest National Laboratory, senior member of technical staff at Google, and principal engineer at Cray Inc. He received his PhD in Computer Science from the University of Washington in 1991 and BS and MS degrees in Electrical Engineering from UC Berkeley in 1983 and 1985.


Authors are invited to submit manuscripts of original unpublished research that demonstrate a strong interplay between parallel/distributed computing techniques and learning/inference applications, such as algorithm design and library/framework development on multicore/manycore architectures, GPUs, clusters, supercomputers, and cloud computing platforms, targeting applications including but not limited to:

  • Learning and inference using large-scale Bayesian networks
  • Large-scale inference algorithms using parallel topic models, clustering, SVMs, etc.
  • Parallel natural language processing (NLP)
  • Semantic inference for disambiguation of content on the web or social media
  • Discovering and searching for patterns in audio or video content
  • On-line analytics for streaming text and multimedia content
  • Comparison of various HPC infrastructures for learning
  • Large-scale learning applications in search engines and social networks
  • Distributed machine learning tools (e.g., Mahout and IBM parallel tool)
  • Real-time solutions for learning algorithms on parallel platforms



    Workshop Paper Due: December 30, 2013 (Extended to: January 10, 2014)

    Author Notification: February 27, 2014

    Camera-ready Paper Due:

Submitted manuscripts may not exceed 10 single-spaced double-column pages using 10-point font on 8.5x11-inch pages (IEEE conference style), including figures, tables, and references. Additional format requirements will be posted on the IPDPS web page shortly after author notification. Authors can purchase up to 2 additional pages for camera-ready papers after acceptance. Students with accepted papers have a chance to apply for a travel award.

Submit your paper through the EDAS portal for ParLearning.

Camera-ready papers should be submitted to the IEEE Conference Publishing portal; see instructions on the IPDPS webpage.


All papers accepted by the workshop will be included in the proceedings of the IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), indexed in EI and possibly in SCI.

Accepted papers, with proper extension, will be recommended for publication in the Journal of Parallel & Cloud Computing (PCC) and the Journal of Internet Technology (Special Issue on “Security and Privacy in Cloud Network Environments”), indexed by SCI-E.


General Co-chairs:

    Abhinav Vishnu, Pacific Northwest National Laboratory, USA

    Yinglong Xia, IBM T.J. Watson Research Center, USA

Publicity Co-chairs:

    George Chin, Pacific Northwest National Laboratory, USA

    Hoang Le, Sandia National Laboratories, USA

Program Committee:

    Co-Chair: Yihua Huang, Nanjing University, China

    Co-Chair: Naixue Xiong, Colorado Technical University, USA

    Vice co-chair: Makoto Takizawa, Hosei University, Japan

    Vice co-chair: Ching-Hsien (Robert) Hsu, Chung Hua University, Taiwan

    Vice co-chair: Jong Hyuk Park, Kyungnam University, Korea

    Vice co-chair: Sajid Hussain, Nashville, Tennessee, USA

    Haimonti Dutta, Columbia University, USA

    Jieyue He, Southeast University, China

    Sutanay Choudhury, Pacific Northwest National Laboratory, USA

    Yi Wang, Tencent Holdings Ltd., China

    Zhijun Fang, Jiangxi University of Finance and Economics, China

    Wenlin Han, University of Alabama, USA

    Wan Jian, Hangzhou Dianzi University, China

    Daniel W. Sun, NICTA, Australia

    Danny Bickson, GraphLab Inc., USA

    Virendra C. Bhavsar, University of New Brunswick, Canada

    Zhihui Du, Tsinghua University, China

    Ichitaro Yamazaki, University of Tennessee, Knoxville, USA

    Gwo Giun (Chris) Lee, National Cheng Kung University, Taiwan

    Lawrence Holder, Washington State University, USA

    Vinod Tipparaju, AMD, USA

    Nishkam Ravi, NEC Labs, USA

    Renato Porfirio Ishii, Federal University of Mato Grosso do Sul (UFMS), Brazil


Paper ID       Title
Parlearning01  Portfolio-based Selection of Robust Dynamic Loop Scheduling Algorithms Using Machine Learning
Parlearning02  Wait-Free Primitives for Initializing Bayesian Network Structure Learning on Multicore Processors
Parlearning03  gpuRF and gpuERT: Efficient and Scalable GPU Algorithms for Decision Tree Ensembles
Parlearning04  Large Scale Deep Learning On Xeon Phi
Parlearning05  A 3D Streaming Scheme for Fly-Through in Large-scale P2P DVEs
Parlearning06  YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark
Parlearning07  Parallel Bayesian Network Modelling for Pervasive Health Monitoring System
Parlearning08  The Empirical Research of Virtual Enterprise Knowledge Transfer's Effectiveness Faced to The Independent Innovation Ability
Parlearning09  Large Scale Discriminative Metric Learning
Parlearning10  A Distributed Speech Algorithm for Large Scale Data Communication Systems


Should you have any questions regarding the workshop or this webpage, please contact parlearning ~AT~ googlegroups DOT com.