distributed processing
| Barney 2006-05-13, 7:11 pm |
| I am working with a set of table fragments of about 30 rows by 10 columns
each from a large table populated with base five digits. The total number of
rows is 5^10 or 9,765,625.
The time to process each fragment is about 1 second, or about 90 hours to
process the whole table. If I use other computers I can cut the time in
proportion to the number of computers, so that the task could be completed
in about a minute if I had 5,425 computers and in about 30 seconds if I had
10,850 computers, etc.
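
The arithmetic above can be checked with a short sketch. It assumes exactly 30 rows per fragment, exactly 1 second per fragment, and perfect division of work with no communication cost; the constant and function names are illustrative only.

```python
ROWS = 5 ** 10              # 9,765,625 rows in the full table
ROWS_PER_FRAGMENT = 30      # assumed exact; the post says "about 30"
SECONDS_PER_FRAGMENT = 1.0  # assumed exact; the post says "about 1 second"


def fragment_count(rows=ROWS, per_fragment=ROWS_PER_FRAGMENT):
    """Number of fragments, counting a final partial fragment as one."""
    return -(-rows // per_fragment)  # integer ceiling division


def ideal_seconds(computers):
    """Wall-clock time if the work divides evenly and communication is free."""
    return fragment_count() * SECONDS_PER_FRAGMENT / computers


if __name__ == "__main__":
    print("fragments:", fragment_count())
    for n in (1, 5425, 10850):
        print(n, "computers:", round(ideal_seconds(n), 1), "seconds")
```

Under these assumptions one computer needs a little over 90 hours, and 5,425 computers need roughly a minute, which matches the figures quoted in the post.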
Is there a methodology for establishing such an Internet-based distributed
processing project?
| |
| russell kym horsell 2006-05-14, 1:12 am |
| Barney <intelpentiumpeglegnewstwtelecomnet> wrote:
> I am working with a set of table fragments of about 30 rows by 10 columns
> each from a large table populated with base five digits. The total number of
> rows is 5^10 or 9,765,625.
> The time to process each group is about 1 second or about 90 hours to
> process the whole table. If I use other computers I can cut the time in
> proportion to time/number of computers such that the task can be completed
> in a minute if I had 5,425 computers and in about 30 seconds if I had 10,850
> computers, etc.
> Is there a methodology for establishing such an Internet-based distributed
> processing project?
It's a good idea to take into account times other than computer time.
Given that you can "instantly" establish a group of "internet computers"
to do your job, sit down and calculate where your break-even point is,
given that wide-area network communication is 10s to 100s of times slower
than local networks, and local networks are 10s to 100s of times slower than
computation. (Any pedants in the audience can add other levels in the
hierarchy.)
You'll probably find that a problem that *seems* to benefit from 10,000
remote computers will not actually benefit from using 9,990 of them.
Further, given it takes time to assemble a group of remote nodes willing
to do your work, the number of nodes you actually can benefit from for small
tasks is even smaller.
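
One way to make the break-even point concrete is a toy model, sketched here under loud assumptions: a single coordinator sends fragments out one at a time over its own link, taking `send_s` seconds each, while the nodes compute in parallel and perfectly overlap with the sending. The model, the function names, and the sample values are all hypothetical; real projects have more terms (result upload, node recruitment, failures).

```python
import math


def wall_clock_seconds(fragments, nodes, compute_s, send_s):
    """Pipelined toy model: the job takes as long as the slower of
    the coordinator's serial sending and the nodes' parallel computing."""
    coordinator_time = fragments * send_s
    worker_time = fragments * compute_s / nodes
    return max(coordinator_time, worker_time)


def useful_nodes(compute_s, send_s):
    """Node count beyond which the coordinator's link is the bottleneck."""
    return math.ceil(compute_s / send_s)


if __name__ == "__main__":
    fragments = 325521               # 5**10 rows at 30 rows per fragment
    compute_s, send_s = 1.0, 0.25    # hypothetical timings
    print("useful nodes:", useful_nodes(compute_s, send_s))
    for n in (1, 4, 4000):
        print(n, "nodes:", wall_clock_seconds(fragments, n, compute_s, send_s))
```

In this model, once the node count passes `compute_s / send_s`, adding more nodes changes nothing, which is the sense in which most of 10,000 remote computers would sit idle.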
As Gustafson's Law shows, you have to think big (or "how big", I guess)
to benefit from wide-area distributed computing. If you only have a small
problem (e.g. expected to run for a few days on an "average" cpu), it's better
to get yourself a quad mobo with dual-core chips and ramp up the clock as far
as it will go.
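
For readers who want the formulas, here is a small sketch of the usual textbook statements of Amdahl's law (fixed problem size) and Gustafson's law (problem size grows with the machine). The 5% serial fraction is an arbitrary illustrative value, and note that in Gustafson's form the serial fraction is measured on the parallel run.

```python
def amdahl_speedup(serial_fraction, n):
    """Speedup on n processors for a fixed-size problem."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def gustafson_speedup(serial_fraction, n):
    """Scaled speedup when the parallel part grows with n."""
    return n - serial_fraction * (n - 1)


if __name__ == "__main__":
    s = 0.05  # hypothetical serial fraction
    for n in (8, 100, 10000):
        print(n, round(amdahl_speedup(s, n), 1), round(gustafson_speedup(s, n), 1))
```

With a fixed-size job the speedup saturates near `1 / serial_fraction` no matter how many nodes join, while the scaled form keeps growing, which is why the advice is to think big before going wide-area.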
| |
| Barney 2006-05-14, 7:11 pm |
| > It's a good idea to take into account times other than computer time.
> Given that you can "instantly" establish a group of "internet computers"
> to do your job, sit down and calculate where your break even point is
> given network communication is 10s to 100s of times slower than local
> networks, and local networks are 10s to 100s of times slower than
> computation. (Any pedants in the audience can add other levels in the
> hierarchy.)
Although I used completion time as the basis for using more than one
computer, the task at hand is not so much about obtaining instant or rapid
completion as about completion in a reasonable amount of time.
The real handicap here is that every fragment has to be processed before the
job is complete.
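
Because every fragment must finish, a coordinator has to notice fragments that were handed out and never returned. The following is a minimal in-memory sketch of one common approach (leases that expire and get re-issued). The class name, method names, and lease length are hypothetical, and nothing here does any networking.

```python
import time


class FragmentTracker:
    """Hands out fragment ids and re-issues any whose lease expires."""

    def __init__(self, fragment_count, lease_seconds, clock=time.monotonic):
        self.total = fragment_count
        self.pending = list(range(fragment_count))  # not currently handed out
        self.leased = {}                            # fragment id -> lease deadline
        self.done = set()
        self.lease_seconds = lease_seconds
        self.clock = clock                          # injectable for testing

    def checkout(self):
        """Return a fragment id to work on, or None if none is available."""
        now = self.clock()
        # Reclaim fragments whose node went silent past its deadline.
        for frag, deadline in list(self.leased.items()):
            if deadline <= now:
                del self.leased[frag]
                self.pending.append(frag)
        if not self.pending:
            return None
        frag = self.pending.pop()
        self.leased[frag] = now + self.lease_seconds
        return frag

    def complete(self, frag):
        """Record a finished fragment; late or duplicate reports are harmless."""
        self.leased.pop(frag, None)
        if frag in self.pending:
            self.pending.remove(frag)
        self.done.add(frag)

    def finished(self):
        return len(self.done) == self.total
```

A node would call `checkout()`, process the fragment, and report back through `complete()`; the job is over only when `finished()` returns true.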
> You'll probably find a problem that *seems* to benefit from 10,000
> remote computers will not actually benefit from using 9,990 of them.
One of the phenomena of large-scale division is diminishing returns, i.e.,
while it initially takes only one additional computer to cut the workload
in half, it takes 5000 additional computers to cut the workload of 5000
computers in half.
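
The diminishing returns can be put in numbers with a short sketch, assuming ideal division of the work (time on n computers is the total divided by n) and the roughly 325,521 seconds of total work implied by the figures earlier in the thread. The function names are illustrative.

```python
def seconds_saved_by_one_more(total_seconds, n):
    """Time saved by going from n to n + 1 computers under ideal division."""
    return total_seconds / n - total_seconds / (n + 1)


def extra_computers_to_halve(n):
    """Halving total/n means reaching total/(2n), i.e. n more computers."""
    return 2 * n - n


if __name__ == "__main__":
    total = 325521.0  # seconds of single-computer work, from the thread's figures
    for n in (1, 10, 5000):
        print(n, extra_computers_to_halve(n),
              round(seconds_saved_by_one_more(total, n), 3))
```

The second computer saves about 45 hours, whereas computer number 5,001 saves only a small fraction of a second.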
> Further, given it takes time to assemble a group of remote nodes willing
> to do your work, the number of nodes you actually can benefit from for
> small tasks is even smaller.
One idea that seems to work well for other group-oriented tasks is the idea
of reciprocity, i.e., use of the group to accomplish the same task for other
data and the sharing of the results by all members of the group.
> As Gustafson's Law shows, you have to think big (or "how big", I guess)
> to benefit from wide-area distributed computing. If you only have a small
> problem (e.g. expected to run for a few days on an "average" cpu), it's
> better to get yourself a quad mobo with dual-core chips and ramp up the
> clock as far as it will go.
Even for some "small" tasks the problem cannot be solved by
multi-processor/multi-core systems because the memory constraints still limit
array index size to 2^31. Hopefully this limitation could be solved using a
large network if it could support a large array. Do you know of any methods
whereby a larger array can be created using virtual memory and then
expanding this virtual memory to allow or include remote nodes?
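
One application-level workaround, sketched here, is to split a huge logical array into fixed-size shards and translate each global index into a shard number and a local offset. This is not the transparent virtual-memory extension asked about, only index arithmetic; the class and function names are hypothetical, and a local dictionary stands in for the remote nodes.

```python
def locate(global_index, shard_size):
    """Map a global index to (shard number, offset within that shard)."""
    if global_index < 0:
        raise IndexError("negative indices are not supported in this sketch")
    return divmod(global_index, shard_size)


class ShardedArray:
    """Sparse logical array whose shards could each live on a different node."""

    def __init__(self, length, shard_size, default=0):
        self.length = length
        self.shard_size = shard_size
        self.default = default
        self.shards = {}  # shard number -> {offset: value}; stand-in for remote nodes

    def _check(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)

    def __setitem__(self, index, value):
        self._check(index)
        shard, offset = locate(index, self.shard_size)
        self.shards.setdefault(shard, {})[offset] = value

    def __getitem__(self, index):
        self._check(index)
        shard, offset = locate(index, self.shard_size)
        return self.shards.get(shard, {}).get(offset, self.default)


if __name__ == "__main__":
    arr = ShardedArray(length=2 ** 40, shard_size=2 ** 20)
    arr[2 ** 35 + 7] = 3
    print(locate(2 ** 35 + 7, 2 ** 20), arr[2 ** 35 + 7])
```

Each local offset stays well below 2^31 even though the logical length does not, so the per-node index limit no longer bounds the overall array; the cost is that every access to a remote shard would become a network round trip.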