Academia.edu

Parallel Algorithms

5,530 papers
10,500 followers
About this topic
Parallel algorithms are computational procedures designed to execute multiple operations simultaneously across multiple processors or cores, optimizing performance and efficiency. They leverage concurrent processing to solve problems more quickly than traditional sequential algorithms, particularly in large-scale data processing and complex computations.

Key research themes

1. How can divide-and-conquer paradigms be generalized and optimized for parallel algorithm design?

This research theme explores the formal generalization of the divide-and-conquer (D&C) programming pattern to encompass a broader class of parallel algorithms. It investigates functional specifications, potential optimizations, and equivalences to other classical parallel programming skeletons. Understanding and formalizing D&C parallelization expands its applicability and guides efficient implementation strategies.

Key finding: This paper presents a new formal functional specification generalizing the divide-and-conquer pattern, proving that many classical parallel programming patterns can be derived through parameter instantiation. It provides a... Read more
Key finding: Focusing on a parallel bag-of-tasks propagation algorithm used in image interpolation, this study develops stochastic automata network-based analytical models to predict performance of different parallel implementation... Read more
Key finding: This work examines the parallelization of computationally intensive vision algorithms—specifically data clustering and image matching—on workstation clusters. It presents design and communication strategies using PVM and... Read more

2. What frameworks and algorithmic modifications improve parallel scalability and efficiency in combinatorial and optimization problems?

This theme focuses on algorithmic design and practical parallelization techniques tailored for large, complex combinatorial optimization problems, such as vertex cover, traveling salesman, and genetic algorithm based searches. It includes decomposition strategies, load balancing, parallel tree search methodologies, and hybrid models integrating stochastic and deterministic computations on multicore and cluster architectures.

Key finding: The paper demonstrates a parallel fixed-parameter tractable approach for large-scale vertex cover problems utilizing balanced search space decomposition and kernelization through polynomial-time preprocessing steps. Careful... Read more
Key finding: This study presents a parallel branch-and-bound algorithm optimized for cluster multicore systems to solve the traveling salesman problem exactly. By employing multithreading to concurrently explore subproblems and leveraging... Read more
Key finding: Introducing a fully asynchronous parallel genetic algorithm (PGA) designed for MIMD architectures, the work shows that distributed, intelligent search with localized selection outperforms traditional genetic algorithms with... Read more

3. How can parallel computing architectures and programming models accelerate high-dimensional data processing and machine learning workflows?

This theme investigates the integration of parallel hardware (such as multicore CPUs, GPUs, and distributed clusters) with tailored algorithms and programming models to speed up data-intensive tasks including machine learning, biometric recognition, and large-scale matrix computations. It covers GPU-based acceleration, parallel statistical computing methods, and parallel stochastic gradient descent approaches for recommendation systems.

Key finding: This survey and methodological discussion emphasizes various parallel statistical computing techniques including parallel regression algorithms, parallel bootstrap, and parallel Markov Chain Monte Carlo. It underscores... Read more
Key finding: By implementing a multimodal biometric system combining face and iris recognition on Nvidia CUDA-enabled GPUs, the authors achieve more than threefold speed improvements over CPU-based processing without loss of accuracy. The... Read more
Key finding: This paper presents a momentum-incorporated parallel stochastic gradient descent (MPSGD) algorithm for training latent factor recommendation models on large-scale sparse matrices. The algorithm combines parallel data... Read more
Key finding: Targeting multicore CPU platforms, this work proposes novel parallelization of integer multiplication algorithms through a delayed carry mechanism which decouples loop iterations to enable concurrent partial sums computation.... Read more
Key finding: Demonstrating the use of workstation clusters and message passing (PVM) for parallelizing clustering and image matching, this paper highlights how decomposing data-intensive vision tasks into concurrent subproblems enables... Read more

All papers in Parallel Algorithms

We focus on agent-based simulations where a large number of agents move in space, obeying some simple rules. Since such simulations are computationally intensive, it is challenging, in such a context, to let the number of... more
Parallel sparse matrix-matrix multiplication algorithms (PSpGEMM) spend most of their running time on inter-process communication. In the case of distributed matrix-matrix multiplications, much of this time is spent on interchanging the... more
This work concerns a general technique to enrich parallel versions of stochastic simulators for biological systems with tools for online statistical analysis of the results. In particular, within the FastFlow parallel programming... more
This paper discusses enabling methodologies for the design of a fully parallel, online, interactive tool intended to support bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow... more
We discuss the efficient implementation of a collective operation called reduce-scatter, which is defined in the MPI standard. The reduce-scatter is equivalent to the combination of a reduction on vectors of length n with a scatter of the... more
The Optical Transpose Interconnection System (OTIS) is a recently proposed model of computing that exploits the special features of both electronic and optical technologies. In this paper we present efficient algorithms for packet... more
Relational Coarsest Partition Problems (RCPPs) play a vital role in verifying concurrent systems. It is known that RCPPs are P-complete and hence it may not be possible to design polylog time parallel algorithms for these problems. In... more
Mesh connected computers have become attractive models of computing because of their varied special features. In this paper we consider two variations of the mesh model: 1) a mesh with fixed buses, and 2) a mesh with reconfigurable... more
This paper presents an effective and efficient parallel algorithm to improve the quality of surface meshes representing models generated by the application of the Boolean and assembly operations to predefined primitives, such as spheres... more
Preventing and controlling outbreaks of infectious diseases such as pandemic influenza is a top public health priority. We describe EpiSimdemics, a scalable parallel algorithm to simulate the spread of contagion in large, realistic social... more
The best-known algorithms for parallel matrix multiplication require intensive computation and large storage capacity. A grid of transputers provides a suitable medium for distributing the computational load and... more
First, as is my custom, I thank God, whatever He may be like, for existing. I thank my parents and my sister for knowing me well enough to entrust me with my work and not with the responsibility of taking care of... more
Applications in distribution logistics are diverse, for example in planning the transport and delivery of goods or in routing data in telecommunications networks. Given the breadth and reach of these problems, studies have... more
The reduction of noise emission has become a key commercial argument for helicopter manufacturers, such as Airbus Helicopters. For years, Airbus Helicopters has placed emphasis on the good acoustic behavior of its helicopters, as... more
In this paper, we present a method to construct column weight two Low-Density Parity-Check (LDPC) codes (namely, Cycle codes) from arbitrary graphs and we obtain a new class of girth twelve LDPC codes from complete graphs. Also, we use... more
Prediction of the translation initiation site is of vital importance in bioinformatics since through this process it is possible to understand the organic formation and metabolic behavior of living organisms. Sequential algorithms are not... more
The parallelization of mining algorithms under MapReduce (MR) has become a reality in recent years, but algorithms for training single decision trees, like ID3 [1] or C4.5 [2], remain unexplored. Decision trees continue to play an important... more
An “any time” algorithm has to deliver its answer any time it is needed; it has to provide an answer or a solution in a certain (often small) amount of time, which is not known in advance. Thus, the algorithm must always have “an... more
This letter presents an improved Toom's algorithm that allows hardware savings without slowing down the processing speed. We derive formulae for the number of multiplications and additions required to compute the linear convolution of... more
Knowledge and information spanning multiple information sources, multiple media, multiple versions and multiple communities challenge the capabilities of existing knowledge and information management infrastructures by far, primarily in... more
In [KUW1] we have proposed the setting of independence systems to study the relation between the computational complexity of search and decision problems. The universal problem that captures this relation, which we termed the $S$-search... more
This paper studies parallel search algorithms within the framework of independence systems. It is motivated by earlier work on parallel algorithms for concrete problems such as the determination of a maximal independent set of vertices or... more
The ability of many processors to simultaneously read from the same cell of shared memory can give additional power to a parallel random access machine. In this paper, we describe a natural Boolean function of n variables and show that... more
In this paper we compare the power of the two most commonly used concurrent-write models of parallel computation, the COMMON PRAM and the PRIORITY PRAM. These models differ in the way they resolve write conflicts. If several processors... more
Shared memory models of parallel computation (e.g. parallel RAMs) that allow simultaneous read/write access are very natural and already widely used for parallel algorithm design. The various models differ from each other in the mechanism... more
There are two main components in microwave tomography to detect abnormalities in breasts: Genetic Algorithm (GA) and Finite-Difference Time-Domain (FDTD). Both GA and FDTD are time-consuming, but they are data-parallel in nature. In this... more
In this paper, we present a variational framework for joint disparity and motion estimation in a sequence of stereo images. The problem involves the estimation of four dense fields: two motion fields and two disparity fields. In order to... more
This work presents a method based on Petri net models for the analysis and efficient parallelization of applications programmed in a sequential paradigm. First, a model of the sequential application is built.... more
It is well known that LDPC decoding is computationally demanding and one of the hardest signal operations to parallelize. Beyond data dependencies that restrict the decoding of a single word, it requires a large number of memory accesses.... more
In this work we will consider asynchronous iteration algorithms. As is well known, in multiprocessor computers the parallel application of iterative methods often shows poor scaling and less than optimal parallel efficiency. The ordinary... more
This article describes how we manage to increase performance and to extend features of a large parallel application through the use of simultaneous multithreading (SMT) and by designing a robust parallel transpose algorithm. The... more
In recent years, we have witnessed the proliferation of applications that generate thousands of terabytes of data per day, due to the explosive increase in storage capacity across various devices. As a consequence, a new concept called... more
Computer vision is today an area of great utility and interest to researchers, even though its techniques go back more than three decades of development. This is due to the technological expansion that has enabled a... more
Output queued switches are appealing because they have better latency and throughput than input queued switches. However, they are difficult to build: a direct implementation of an N × N output-queued switch requires the switching fabric... more
This paper presents results for the queue-read, queue-write asynchronous parallel random access machine (QRQW asynchronous PRAM) model, which is the asynchronous variant of the QRQW PRAM model. The QRQW PRAM family of models, which was... more
Clinical evaluation of electroencephalogram (EEG) is important for understanding and monitoring the electrical activity present in the brain. In conjunction with engineering advances, the movement towards portable, rapid and low-cost EEG... more
In this paper, we consider a randomized greedy algorithm for independent sets in r-uniform d-regular hypergraphs G on n vertices with girth g. By analyzing the expected size of the independent sets generated by this algorithm, we show... more
An adjusted trinomial model for pricing both European and American arithmetic average-based Asian options is proposed. The Kamrad and Ritchken trinomial tree governs the underlying asset evolution. The algorithm selects a subset of the... more
The rapid advancements in quantum computing have introduced new paradigms in solving classical computational problems, such as sorting and searching. These two fundamental operations are central to numerous computer science applications,... more
This paper presents the concept and design of an exhaustive-parallel search algorithm for Network-on-Chip. The proposed parallel algorithm searches for a minimal path between source and destination in a forward-wave-propagation manner. The... more
The PageRank algorithm for determining the importance of Web pages has become a central technique in Web search. This algorithm uses the Power method to compute successive iterates that converge to the principal eigenvector of the Markov... more
In this paper, we propose a new parallel algorithm, Parallel Regularized Multiple-Criteria Linear Programming (PRMCLP), to cope with computing and storage requirements that increase rapidly with the number of training samples. Firstly, we... more
In this paper we present DFScala, a library for constructing and executing dataflow graphs in the Scala language. Through the use of Scala this library allows the programmer to construct coarse grained dataflow graphs that take advantage... more
We present sequential and parallel algorithms for the Frontier A* (FA*) algorithm augmented with a form of Delayed Duplicate Detection (DDD). The sequential algorithm, FA*-DDD, overcomes the leak-back problem associated with the combination... more
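
One of the abstracts listed above describes the MPI reduce-scatter collective as a reduction on vectors of length n combined with a scatter of the result. The following is a minimal sequential sketch of that semantics only, not of any efficient implementation discussed in the paper. The function name `reduce_scatter`, the list-of-lists input (one vector per simulated process), and the assumption that n is divisible by the number of processes are all illustrative choices made here; no MPI library is called.

```python
from functools import reduce
from operator import add


def reduce_scatter(vectors, op=add):
    """Simulate reduce-scatter over p equal-length vectors, one per process.

    Returns a list of p blocks; block i is what process i would hold.
    Assumption of this sketch: the vector length n is divisible by p,
    so every process receives a block of the same size.
    """
    p = len(vectors)
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must have the same length")
    if n % p != 0:
        raise ValueError("this sketch assumes n is divisible by p")

    # Step 1: element-wise reduction across all processes' vectors.
    reduced = [reduce(op, column) for column in zip(*vectors)]

    # Step 2: scatter consecutive equal-sized blocks of the reduced vector.
    block = n // p
    return [reduced[i * block:(i + 1) * block] for i in range(p)]


if __name__ == "__main__":
    data = [
        [1, 2, 3, 4],      # vector held by process 0
        [10, 20, 30, 40],  # vector held by process 1
    ]
    print(reduce_scatter(data))  # [[11, 22], [33, 44]]
```

A real implementation would avoid materializing the full reduced vector on any single process, since each process only needs its own block; this sketch only pins down what the result should be.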