Login

yaseed · 08-17-2017, 02:49 AM

[attachment=3186]

Accelerating Image Processing Pipelines in a Hardware/Software Environment

Presented By:
Heather Quinn, Dr. Miriam Leeser, Northeastern University
Dr. Laurie Smith King
College of the Holy Cross
Outline

Background
Image processing and hardware
The cost of codesign systems
Image processing pipelines
An example
The Pipeline Assignment Problem
Solving Pipeline Assignment
Goal of Project
Accelerate image processing tasks through efficient use of FPGAs
Combine already designed components at runtime to implement series of transformations (pipelines)
Image Processing Tasks
Good candidates for FPGAs
Algorithms are often explicitly parallel
Input and output data large but individual pixels are small
Data access regular
Hardware Systems
Using hardware incurs execution costs not present in software systems
hardware initialization
communicating image
reprogramming
Efficient Use of FPGAs
Software algorithm s runtime for small images less than the hardware costs
Profiling the hardware and software runtimes for different image sizes determines the crossover point
Deciding at runtime to execute in software or hardware is simple for one algorithm processing one image
Image Processing Pipelines
Series of image processing algorithms applied to an image
Each algorithm has a software and hardware implementation
Finding the crossover point for a pipeline is complicated
Exponential number of implementations
Reprogramming costs
Need a strategy to find a fast pipeline implementation at runtime
An Example
Possible Implementations
Blue boxes are hw/sw boundaries
Red boxes are fixing image edges
Green Boxes are reprogramming
Median Filter and Edge Detection Profiles
Median Filter Edge Detection Profiles
Problem Statement
Inputs: a profiled library of image processing components, a pipeline, and an image
Output: an assignment of each component to a hardware or software implementation
The Library of Components
Each component has two implementations: hardware and software
Each implementation has known runtimes for a set of images
Interpolation used for rest of images
Each hardware implementation has a known area size
Each component interface is image in/image out
Assumptions
Reprogramming and communication costs incurred at sw/hw boundaries
Might need to fix image edge
in between components
Problems sizes of 20 or fewer stages
500 ms to make a decision
Solving Pipeline Assignment
Exhaustively
ILP
Greedy
Local Search
Experiments
Related Problems
Codesign partitioning problem
ILP: Niemann and Marwedel
Simulated Annealing: Cosyma
Iterative: Ptolemy
Parallel computing scheduling
Local Search: UNM
Exhaustive
Find: Optimal solutions
How: Search entire problem space
Algorithm Runtime: O(2N), where N is the number of pipeline stages
ILP
Find: Optimal solutions
How: AMPL model running on CPLEX
Need: ILP formulation of the problem statement
Algorithm Runtime: Unknown
Greedy
Find: Sub-optimal solutions
How: Make optimal decisions for each pipeline stage based on hardware area usage and speedup values
Algorithm Runtime: O(N), where N is the number of pipeline stages
Local Search
Find: Sub-optimal solutions
How: Improve upon initial solutions (found through greedy or randomly)
Algorithm Runtime: Runs for user supplied amount of time
Experiments
Synthetic components arranged into pipelines of length 1 to 20
Exhaustive algorithm run to completion
Used as a baseline for solution quality
Timed to find 500 ms boundary
ILP solver constrained to 500 ms
Ability to solve dependent on components
Local Search returns best solution found within time limit
Results
Optimal solutions in 500 ms
Exhaustive: up to 11 stages
ILP: all pipelines up to 13 stages, some pipelines up to 18 stages, and none larger
Sub-optimal solutions in 500 ms
Greedy and local search: all problem sizes
Strategy
Exhaustive and ILP up to 13 stages
Greedy or local search for more than 13 stages
Optimal Solutions in 500 ms
Conclusion
Defined pipeline assignment
Introduced 4 possible ways to solve the problem at runtime
Found that 3 algorithms to efficiently solve different problem sizes
Future Work
ADAPT: Algorithm that calls exhaustive, ILP and local search algorithms to solve pipeline assignment problem based on problem size
Decision Time: Study how the amount of time allotted affects ADAPT results
Virtex II Pro: Add support for using embedded Power PC cores
References

[1] R. Niemann and P. Marwedel, Hardware/Software Partitioning using Integer Programming, Proceedings of the European Design and Test Conference, Paris, France, 1996, pp. 473-480.
[2] J. Henkel, R. Ernst, U. Holtmann, and T. Benner, Adaptation of Partitioning and High-Level Synthesis in Hardware/Software Co-synthesis, Proceedings of International Conference on Computer-Aided Design, 1994.
[3] A. Kalavade and E. A. Lee, A Gobal Criticality/Local Phase Driven Algorithm for the Constrainted Hardware/Software Partitioning Problem, Proceedings of CODES/CASHE 1994, Third International Workshop on Hardware/Software Codesign, Grenoble, France, Sept. 22-24, 1994, pp 42-48.
[4] M.-Y. Wu,W. Shu and J. Gu, Efficient Local Search for DAG Scheduling, IEE Transactions on Parallel and Distributed Systems, vol 12, num 6