a interesting Parallel Programing Problem: asciify-string


X

Xah Lee

here's a interesting problem that we are discussing at comp.lang.lisp.

〈Parallel Programing Problem: asciify-string〉
http://xahlee.org/comp/parallel_programing_exercise_asciify-string.html

here's the plain text. Code example is emacs lisp, but the problem is
general.

for a bit python relevancy… is there any python compiler that's
parallel-algorithm aware?

-----------------------------------------------
Parallel Programing Problem: asciify-string

Here's a interesting parallel programing problem.

Problem Description

The task is to change this function so it's parallelable. (code
example in emacs lisp)

(defun asciify-string (inputStr)
"Make Unicode string into equivalent ASCII ones."
(setq inputStr (replace-regexp-in-string "á\\|à\\|â\\|ä" "a"
inputStr))
(setq inputStr (replace-regexp-in-string "é\\|è\\|ê\\|ë" "e"
inputStr))
(setq inputStr (replace-regexp-in-string "í\\|ì\\|î\\|ï" "i"
inputStr))
(setq inputStr (replace-regexp-in-string "ó\\|ò\\|ô\\|ö" "o"
inputStr))
(setq inputStr (replace-regexp-in-string "ú\\|ù\\|û\\|ü" "u"
inputStr))
inputStr
)

Here's a more general description of the problem.

You are given a Unicode text file that's a few peta bytes. For certain
characters in the file, they need to be changed to different char.
(For example of practical application, see: IDN homograph attack â—‡
Duplicate characters in Unicode.)

One easy solution is to simply use regex, as the above sample code, to
search thru the file sequentially, and perform the transfrom of a
particular set of chars, then repeat for each char chat needs to be
changed.

But your task is to use a algorithm parallelizable. That is, in a
parallel-algorithm aware language (e.g. Fortress), the compiler will
automatically span the computation to multiple processors.

Refer to Guy Steele's video talk if you haven't seen already. See: Guy
Steele on Parallel Programing.
Solution Suggestions

A better way to write it for parallel programing, is to map a char-
transform function to each char in the string. Here's a pseudo-code in
lisp by Helmut Eller:

(defun asciify-char (c)
(case c
((? ? ? ?) ?a)
((? ? ? ?) ?e)
((? ? ? ?) ?i)
((? ? ? ?) ?o)
((? ? ? ?) ?u)
(t c)))

(defun asciify-string (string) (map 'string #'asciify-string string))

One problem with this is that the function “asciify-char†itself is
sequential, and not 100% parallelizable. (we might assume here that
there are billions of chars in Unicode that needs to be transformed)

It would be a interesting small project, if someone actually use a
parallel-algorithm-aware language to work on this problem, and report
on the break-point of file-size of parallel-algorithm vs sequential-
algorithm.

Anyone would try it? Perhaps in Fortress, Erlang, Ease, Alice, X10, or
other? Is the Clojure parallel aware?

Xah
 
Ad

Advertisements


Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top