ParLearning 2014
The 3rd International Workshop on
Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics
May 23, 2014
Phoenix, AZ, USA
In Conjunction with IPDPS 2014
Data-driven computing needs no introduction today. The case for using data for strategic advantage is exemplified by web search engines, online translation tools, and many other examples. The past decade has seen 1) the emergence of multicore architectures and accelerators such as GPGPUs, 2) widespread adoption of distributed computing via the MapReduce/Hadoop ecosystem, and 3) democratization of the infrastructure for processing massive datasets ranging into petabytes through cloud computing.
The complexity of the technological stack has grown to an extent where it is imperative to provide frameworks that abstract away the system architecture and the orchestration of components for massive-scale processing. However, the growth in the volume and heterogeneity of data seems to outpace the growth in computing power. A "collect everything" culture, stimulated by cheap storage and ubiquitous sensing capabilities, contributes to an increasing noise-to-signal ratio in all collected data. Thus, as soon as the data hits the processing infrastructure, determining the value of information, finding its rightful place in a knowledge representation, and determining subsequent actions are of paramount importance. To turn this data deluge to our advantage, a convergence between the field of Parallel and Distributed Computing and the interdisciplinary science of Artificial Intelligence seems critical. From application domains of national importance such as cyber-security, health-care, and the smart grid to providing real-time situational awareness via smartphones with natural interfaces, the fundamental AI tasks of learning and inference need to be enabled for large-scale computing across this broad spectrum of application domains.
Many of the prominent algorithms for learning and inference are notorious for their complexity. Adopting parallel and distributed computing appears an obvious path forward, but the mileage varies depending on, first, how amenable the algorithms are to parallel processing and, second, the availability of rapid prototyping capabilities with a low cost of entry. The first issue represents the wider gap, as we continue to think in a sequential paradigm. The second issue is increasingly recognized at the level of programming models, and building robust libraries for various machine learning and inference tasks will be a natural progression. As an example, scalable versions of many prominent graph algorithms written for distributed shared-memory architectures or clusters look distinctly different from the textbook versions that generations of programmers have grown up with. This reformulation is difficult to accomplish for an interdisciplinary field like Artificial Intelligence because of the sheer breadth of the knowledge spectrum involved. The primary motivation of this workshop is to invite leading minds from the AI and Parallel & Distributed Computing communities to identify the research areas that most require convergence and to assess their impact on the broader technical landscape.
HIGHLIGHTS
- Foster collaboration between the HPC and AI communities
- Explore a critical emerging area with strong academic and industry interest
- A great opportunity for researchers worldwide to collaborate with academia and industry
ADVANCE PROGRAM
- 8:20-8:30 Opening Remarks
- 8:30-9:30 Keynote 1
- Professor Eric Xing (CMU), On The Algorithmic and System Interface of BIG LEARNING
- 9:30-10:10 Session 1
- Large Scale Deep Learning On Xeon Phi, Lei Jin; Yihua Huang
- YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark, Hongjian Qiu; Rong Gu; Chunfeng Yuan; Yihua Huang
- 10:10-10:30 Coffee break
- 10:30-12:00 Session 1 (cont'd)
- Wait-Free Primitives for Initializing Bayesian Network Structure Learning on Multicore Processors, Hsuan-Yi Chu; Yinglong Xia; Anand Panangadan; Viktor K. Prasanna
- gpuRF and gpuERT: Efficient and Scalable GPU Algorithms for Decision Tree Ensembles, Karl Jansson; Håkan Sundell; Henrik Boström
- Parallel Bayesian Network Modelling for Pervasive Health Monitoring System, Xiujuan Qian; Yongli Wang; Xiaohui Jiang
- 12:00-1:00 Lunch Break
- 1:00-2:00 Keynote 2
- Dr. Simon Kahan (University of Washington), Grappa: chaos, order, and easier cluster computing
- 2:00-3:00 Session 2
- Portfolio-based Selection of Robust Dynamic Loop Scheduling Algorithms Using Machine Learning, Nitin Sukhija; Brandon Malone; Srishti Srivastava; Ioana Banicescu; Florina Monica Ciorba
- A 3D Streaming Scheme for Fly-Through in Large-scale P2P DVEs, Guisong Yang; Wei Wang; Naixue Xiong; Xingyu He
- 3:00-3:30 Coffee break
- 3:30-5:00 Session 3
- Large Scale Discriminative Metric Learning, Peter Kirchner; Berthold Reinwald; Matthias Boehm; Daby Sow; Michael Schmidt; Deepak S. Turaga; Alain Biem
- The Empirical Research of Virtual Enterprise Knowledge Transfer's Effectiveness Faced to The Independent Innovation Ability, Bo Yang
- A Distributed Speech Algorithm for Large Scale Data Communication Systems, Naixue Xiong; Guoxiang Tong; Jian Tan; Fangfang Lv
- 5:00-5:05 Closing Remarks
KEYNOTE SPEAKER
Professor Eric P. Xing
Machine Learning Department & Language Technologies Institute & Computer Science Department, School of Computer Science,
Carnegie Mellon University (CMU)
Title: On The Algorithmic and System Interface of BIG LEARNING
Abstract: In many modern applications built on massive data and using high-dimensional models, such as web-scale content extraction via topic models, genome-wide association mapping via sparse regression, and image understanding via deep neural networks, one needs to handle BIG machine learning problems that threaten to exceed the limits of current infrastructures and algorithms. While the ML community continues to strive for new scalable algorithms, and several attempts at developing new system architectures for BIG ML have emerged to address the challenge on the backend, good dialogue between the ML and systems communities remains difficult: most algorithmic research remains disconnected from the real systems and data it will face, and the generality, programmability, and theoretical guarantees of most systems on ML programs remain largely unclear. In this talk, I will present Petuum, a general-purpose framework for distributed machine learning, and demonstrate how innovations in scalable algorithms and distributed systems design work in concert to achieve multiple orders of magnitude of scalability on a modest cluster for a wide range of large-scale problems in social networks (mixed-membership inference on 100M nodes), personalized genome medicine (sparse regression on 100M dimensions), and computer vision (classification over 20K labels), with provable guarantees on the correctness of distributed inference.
Bio: Dr. Eric Xing is an associate professor in the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology, especially for solving problems involving automated learning, reasoning, and decision-making in high-dimensional and dynamic possible worlds, and for building quantitative models and predictive understandings of biological systems. Professor Xing received a Ph.D. in Molecular Biology from Rutgers University and another Ph.D. in Computer Science from UC Berkeley. His current work involves 1) foundations of statistical learning, including theory and algorithms for estimating time/space varying-coefficient models, sparse structured input/output models, and nonparametric Bayesian models; 2) computational and statistical analysis of gene regulation, genetic variation, and disease associations; and 3) applications of statistical learning in social networks, data mining, and computer vision. Professor Xing has published over 150 peer-reviewed papers and is an associate editor of the Journal of the American Statistical Association, the Annals of Applied Statistics, the IEEE Transactions on Pattern Analysis and Machine Intelligence, and PLoS Computational Biology, and an action editor of the Machine Learning journal. He is a recipient of the NSF CAREER Award, the Alfred P. Sloan Research Fellowship in Computer Science, the United States Air Force Young Investigator Award, and the IBM Open Collaborative Research Faculty Award.
Dr. Simon Kahan
University of Washington
Title: Grappa: chaos, order, and easier cluster computing
Abstract: Systems demand chaotic parallelism while components demand order; graceful transformation between the two is necessary for high performance, and Grappa performs these transformations. Grappa is a new latency-tolerant runtime system for distributed-memory commodity clusters that provides a shared-memory programming model for in-memory computation, similar to what TBB and Cilk provide on single-node platforms. Grappa implementations of map/reduce, the GraphLab API, and a Raco backend show promising performance in comparison to the specialized platforms Spark, GraphLab, and Shark, respectively. In addition, Grappa supports general computation, including complex irregular applications with poor locality and data-dependent load distribution. The source is available for download from GitHub.
Bio: Simon Kahan holds affiliate positions in the Computer Science and Engineering department at the University of Washington, at the Institute for Systems Biology, and at the Northwest Institute for Advanced Computing. He has held positions as a research scientist at the Pacific Northwest National Laboratory, senior member of technical staff at Google, and principal engineer at Cray Inc. He received his PhD in Computer Science from the University of Washington in 1991 and BS and MS degrees in Electrical Engineering from UC Berkeley in 1983 and 1985.
CALL FOR PAPERS
Authors are invited to submit manuscripts of original unpublished research that demonstrate a strong interplay between parallel/distributed computing techniques and learning/inference applications, such as algorithm design and library/framework development on multicore/manycore architectures, GPUs, clusters, supercomputers, and cloud computing platforms, targeting applications including but not limited to:
- Learning and inference using large-scale Bayesian networks
- Large-scale inference algorithms using parallel topic models, clustering, SVMs, etc.
- Parallel natural language processing (NLP)
- Semantic inference for disambiguation of content on the web or social media
- Discovering and searching for patterns in audio or video content
- Online analytics for streaming text and multimedia content
- Comparison of various HPC infrastructures for learning
- Large-scale learning applications in search engines and social networks
- Distributed machine learning tools (e.g., Mahout and IBM parallel tools)
- Real-time solutions for learning algorithms on parallel platforms
IMPORTANT DATES
Paper submission deadline: December 30, 2013 (extended to January 10, 2014)
PAPER GUIDELINES
Submitted manuscripts may not exceed 10 single-spaced, double-column pages using a 10-point font on 8.5x11-inch pages (IEEE conference style), including figures, tables, and references. Further formatting requirements will be posted on the IPDPS web page (www.ipdps.org) shortly after author notification. Authors can purchase up to 2 additional pages for camera-ready papers after acceptance; please find details at www.ipdps.org. Students with accepted papers have a chance to apply for a travel award; please find details at www.ipdps.org.
Submit your paper using the EDAS portal for ParLearning: http://edas.info/N15817
Camera-ready papers should be submitted via the IEEE Conference Publishing Services portal; see instructions on the IPDPS webpage.
PROCEEDINGS
All papers accepted by the workshop will be included in the proceedings of the IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), which is indexed in EI and possibly in SCI.
Accepted papers, with proper extension, will be recommended for publication in the Journal of Parallel & Cloud Computing (PCC) and the Journal of Internet Technology (special issue on "Security and Privacy in Cloud Network Environments"), indexed by SCI-E.
ORGANIZATION
General Co-chairs:
Abhinav Vishnu, Pacific Northwest National Laboratory, USA
Yinglong Xia, IBM T.J. Watson Research Center, USA
Publicity Co-chairs:
George Chin, Pacific Northwest National Laboratory, USA
Hoang Le, Sandia National Laboratories, USA
Program Committee:
Co-Chair: Yihua Huang, Nanjing University, China
Co-Chair: Naixue Xiong, Colorado Technical University, USA
Vice co-chair: Makoto Takizawa, Hosei University, Japan
Vice co-chair: Ching-Hsien (Robert) Hsu, Chung Hua University, Taiwan
Vice co-chair: Jong Hyuk Park, Kyungnam University, Korea
Vice co-chair: Sajid Hussain, Nashville, Tennessee, USA
Haimonti Dutta, Columbia University, USA
Jieyue He, Southeast University, China
Sutanay Choudhury, Pacific Northwest National Laboratory, USA
Yi Wang, Tencent Holdings Ltd., China
Zhijun Fang, Jiangxi University of Finance and Economics, China
Wenlin Han, University of Alabama, USA
Wan Jian, Hangzhou Dianzi University, China
Daniel W. Sun, NICTA, Australia
Danny Bickson, GraphLab Inc., USA
Virendra C. Bhavsar, University of New Brunswick, Canada
Zhihui Du, Tsinghua University, China
Ichitaro Yamazaki, University of Tennessee, Knoxville, USA
Gwo Giun (Chris) Lee, National Cheng Kung University, Taiwan
Lawrence Holder, Washington State University, USA
Vinod Tipparaju, AMD, USA
Nishkam Ravi, NEC Labs, USA
Renato Porfirio Ishii, Federal University of Mato Grosso do Sul (UFMS), Brazil
LIST OF ACCEPTED PAPERS
Title | Paper ID
Portfolio-based Selection of Robust Dynamic Loop Scheduling Algorithms Using Machine Learning | Parlearning01
Wait-Free Primitives for Initializing Bayesian Network Structure Learning on Multicore Processors | Parlearning02
gpuRF and gpuERT: Efficient and Scalable GPU Algorithms for Decision Tree Ensembles | Parlearning03
Large Scale Deep Learning On Xeon Phi | Parlearning04
A 3D Streaming Scheme for Fly-Through in Large-scale P2P DVEs | Parlearning05
YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark | Parlearning06
Parallel Bayesian Network Modelling for Pervasive Health Monitoring System | Parlearning07
The Empirical Research of Virtual Enterprise Knowledge Transfer's Effectiveness Faced to The Independent Innovation Ability | Parlearning08
Large Scale Discriminative Metric Learning | Parlearning09
A Distributed Speech Algorithm for Large Scale Data Communication Systems | Parlearning10
CONTACT
Should you have any questions regarding the workshop or this webpage, please contact parlearning ~AT~ googlegroups DOT com.