|
fork and pointers
|
|
| Syren Baran 2007-12-23, 7:18 pm |
| Hi,
I want a pointer that initially points to the same memory region for the
child as for the parent process. What's a simple way to do this?
| |
| John Tsiombikas 2007-12-23, 7:18 pm |
| On 2007-12-23, Syren Baran <syren@gmx.de> wrote:
>
> Hi,
> I want a pointer that initially points to the same memory region for the
> child as for the parent process. What's a simple way to do this?
Are you talking about shared memory? Or are you talking about the same
virtual address? The second is trivial, so I guess you mean shared
memory.
Just have the parent mmap /dev/zero with MAP_SHARED, then fork.
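A minimal sketch of that suggestion, assuming a system (such as Linux) where /dev/zero can be mapped with MAP_SHARED. The names `map_shared` and `shared_roundtrip` are made up for illustration and do not come from the thread.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map `size` bytes of zero-filled memory that remains shared across fork().
 * Returns NULL on failure. (Hypothetical helper, not from the thread.) */
void *map_shared(size_t size)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : p;
}

/* Fork a child that stores `value` through the inherited pointer, then
 * return what the parent reads through the same pointer (-1 on error). */
int shared_roundtrip(int value)
{
    int *cell = map_shared(sizeof *cell);
    if (cell == NULL)
        return -1;

    int seen = -1;
    pid_t pid = fork();
    if (pid == 0) {             /* child: same virtual address, same memory */
        *cell = value;
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        seen = *cell;           /* parent observes the child's store */
    munmap(cell, sizeof *cell);
    return seen;
}
```

Because the mapping is created before fork(), both processes see it at the same virtual address, so the pointer itself needs no translation.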
--
John Tsiombikas (Nuclear / Mindlapse)
http://nuclear.sdf-eu.org/
| |
| Syren Baran 2007-12-23, 7:18 pm |
| John Tsiombikas wrote:
> On 2007-12-23, Syren Baran <syren@gmx.de> wrote:
>
> Are you talking about shared memory? Or are you talking about the same
> virtual address? The second is trivial, so I guess you mean shared
> memory.
Yes, I mean shared memory.
>
> Just have the parent mmap /dev/zero with MAP_SHARED, then fork.
Hmm. The problem is that the pointer points to an object of a class that
itself makes extensive use of pointers.
Is it sufficient to just copy the object into the shared memory? Somehow
I doubt that the pointers stored in the shared memory point to the same
physical address for the child processes.
| |
| Barry Margolin 2007-12-23, 7:18 pm |
| In article <476ed5cc$0$17533$9b4e6d93@newsspool4.arcor-online.net>,
Syren Baran <syren@gmx.de> wrote:
> John Tsiombikas wrote:
> Yes, I mean shared memory.
> Hmm. The problem is that the pointer points to an object of a class that
> itself makes extensive use of pointers.
> Is it sufficient to just copy the object into the shared memory? Somehow
> I doubt that the pointers stored in the shared memory point to the same
> physical address for the child processes.
This is not an easy thing to do.
You may be able to use a custom allocator, so that you can allocate your
class objects within the shared memory. However, ALL the objects that
the class references, and that they reference, and so on, have to be
allocated the same way. Otherwise, you'll eventually run into some
pointers that point to process-local memory.
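To make the custom-allocator idea concrete, here is a rough sketch of a bump allocator that hands out storage from one shared mapping. Everything in it (`struct arena`, `arena_create`, `arena_alloc`, the 16-byte alignment) is an assumption for illustration; it never frees, it is not safe for concurrent callers, and it uses MAP_ANONYMOUS, which is widely available but not required by POSIX.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

#define ARENA_ALIGN 16          /* assumed strict enough for the objects stored */

struct arena {
    size_t capacity;            /* usable bytes following the header */
    size_t used;                /* bytes handed out so far */
};

_Static_assert(sizeof(struct arena) <= ARENA_ALIGN, "header must fit its slot");

/* Create the arena in memory that stays shared with children forked later. */
struct arena *arena_create(size_t capacity)
{
    void *p = mmap(NULL, ARENA_ALIGN + capacity, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    struct arena *a = p;
    a->capacity = capacity;
    a->used = 0;
    return a;
}

/* Hand out `n` bytes from the arena, or NULL when it is exhausted. */
void *arena_alloc(struct arena *a, size_t n)
{
    size_t rounded = (n + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    if (rounded < n || rounded > a->capacity - a->used)
        return NULL;            /* overflow or out of space */
    void *p = (char *)a + ARENA_ALIGN + a->used;
    a->used += rounded;
    return p;
}
```

Pointers returned by `arena_alloc` stay meaningful in a child only if the arena was created before fork() and every object reachable from them also lives in the arena, which is exactly the restriction described above.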
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
| |
| Barry Margolin 2007-12-23, 7:18 pm |
| In article <476ed5cc$0$17533$9b4e6d93@newsspool4.arcor-online.net>,
Syren Baran <syren@gmx.de> wrote:
> John Tsiombikas wrote:
> Yes, I mean shared memory.
> Hmm. The problem is that the pointer points to an object of a class that
> itself makes extensive use of pointers.
> Is it sufficient to just copy the object into the shared memory? Somehow
> I doubt that the pointers stored in the shared memory point to the same
> physical address for the child processes.
As I mentioned in my last post, this is pretty difficult.
Could you use threads instead of forking? All threads share the same
virtual memory address space, so you don't need to do anything special
to access objects.
BTW, whether you use threads or processes, you'll need to implement
mutual exclusion of all the shared objects.
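As a small illustration of both points, the sketch below lets two POSIX threads follow ordinary heap pointers with no special setup, while a mutex serializes the updates. The structure and function names (`struct shared`, `worker`, `count_hits`) are invented for this example.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    long hits;
    struct node *next;          /* ordinary pointer: valid in every thread */
};

struct shared {
    pthread_mutex_t lock;       /* protects every node reachable from head */
    struct node *head;
    long rounds;
};

static void *worker(void *arg)
{
    struct shared *s = arg;
    for (long i = 0; i < s->rounds; i++) {
        pthread_mutex_lock(&s->lock);
        for (struct node *n = s->head; n != NULL; n = n->next)
            n->hits++;
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Two threads each walk a two-node heap list `rounds` times.
 * Returns the total number of hits recorded, or -1 on error. */
long count_hits(long rounds)
{
    struct node *first = calloc(1, sizeof *first);
    struct node *second = calloc(1, sizeof *second);
    if (first == NULL || second == NULL) {
        free(first);
        free(second);
        return -1;
    }
    first->next = second;

    struct shared s = { .head = first, .rounds = rounds };
    pthread_mutex_init(&s.lock, NULL);

    pthread_t t[2];
    int started = 0;
    for (; started < 2; started++)
        if (pthread_create(&t[started], NULL, worker, &s) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);

    long total = (started == 2) ? first->hits + second->hits : -1;
    pthread_mutex_destroy(&s.lock);
    free(first);
    free(second);
    return total;
}
```

With the lock in place the total is always 4 * rounds (two threads, two nodes); without it, the increments could race and updates could be lost.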
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
| |
| Syren Baran 2007-12-23, 7:18 pm |
| Barry Margolin wrote:
>
>
> As I mentioned in my last post, this is pretty difficult.
>
> Could you use threads instead of forking? All threads share the same
> virtual memory address space, so you don't need to do anything special
> to access objects.
Threads definitely seem like the better approach (I also had a look at
clone, but that would limit the program to Linux), so I guess I'll have
some reading to do.
> BTW, whether you use threads or processes, you'll need to implement
> mutual exclusion of all the shared objects.
I'm already using mutex locks for the timing-critical operations, but
thanks for the reminder.
| |
| John Tsiombikas 2007-12-24, 7:33 am |
| On 2007-12-23, Syren Baran <syren@gmx.de> wrote:
>
> Barry Margolin wrote:
> Threads definitely seem like the better approach (I also had a look at
> clone, but that would limit the program to Linux), so I guess I'll have
> some reading to do.
There is no point in using a platform-specific syscall for threads.
Every modern UNIX system supports the pthread (POSIX thread) API. Use
that.
--
John Tsiombikas (Nuclear / Mindlapse)
http://nuclear.sdf-eu.org/
| |
| Syren Baran 2007-12-25, 1:39 am |
| John Tsiombikas wrote:
>
> There is no point in using a platform-specific syscall for threads.
> Every modern UNIX system supports the pthread (POSIX thread) API. Use
> that.
I'll use pthreads anyway. I'm still considering clone as a
platform-specific enhancement for a third version. Since the man page for
clone states that this syscall is intended for implementing threads, my
best guess would be that pthreads adds only minimal creation overhead and
no switching overhead. If anyone can confirm my assumption, I won't even
consider this "stupid" optimisation (that leaves time for other things,
after all ;) ).
| |
| William Pursell 2007-12-25, 1:39 am |
| On Dec 23, 11:44 pm, Syren Baran <sy...@gmx.de> wrote:
> Barry Margolin wrote:
>
> Threads definitely seem like the better approach (I also had a look at
> clone, but that would limit the program to Linux), so I guess I'll have
> some reading to do.
Threads are often easier to code. IMO, they are seldom the
better approach, and the "easier" is illusory. Which is to say
that it is really easy to do your initial coding using threads, but
the complications that it introduces generally cause maintenance
issues that overwhelm the initial savings. If all you need is
a common piece of memory, take the time to put the data
in shared memory and execute a fork().
| |
| Syren Baran 2007-12-25, 1:24 pm |
| William Pursell wrote:
>
> threads are often easier to code. IMO, they are seldom the
> better approach, and the "easier" is illusory. Which is to say
> that it is really easy to do your initial coding using threads, but
> the complications that it introduces generally cause maintenance
> issues that overwhelm the initial savings. If all you need is
> a common piece of memory, take the time to put the data
> in shared memory and execute a fork().
I've been using threads extensively for a long time in Java. I know the
complications they can cause and how to deal with them. So I'll use them
instead of writing some memory management for shared memory. If I only
wanted to keep some simple data (e.g. arrays) in shared memory, that
approach might be useful, but not if I have a lot of intertwined classes
and pointers.
| |
| Rainer Weikusat 2007-12-25, 7:19 pm |
| William Pursell <bill.pursell@gmail.com> writes:
> On Dec 23, 11:44 pm, Syren Baran <sy...@gmx.de> wrote:
[...]
[...]
> threads are often easier to code.
Easier than what? Multi-threaded applications are certainly not 'easier
to code' than single-threaded applications, by virtue of the simple
fact that they have to deal with something (multiple threads of
control) single-threaded applications don't need to.
> IMO, they are seldom the better approach,
Better than what? And because of what?
> and the "easier" is illusory.
It's wrong.
> Which is to say that it is really easy to do your initial coding
> using threads,
And this, too. It is not generally 'easier' to write malfunctioning
code than it is to write functioning code.
> but the complications that it introduces generally cause
> maintenance issues that overwhelm the initial savings.
There is no 'complication' introduced by having multiple threads
within the same address space which doesn't already exist with
multiple cooperating single-threaded processes ...
> If all you need is a common piece of memory, take the time to put
> the data in shared memory and execute a fork().
... especially, if those multiple cooperating single-threaded
processes additionally share parts of their address space. Just a
difference in quantity, not one in quality. But this basically
only means that ignoring potential concurrency issues is less likely
to lead to easily observable software defects for the multi-process
case. IOW, the chance to get away with buggy code is somewhat
higher then.
| |
| David Schwartz 2007-12-26, 1:39 am |
| On Dec 23, 12:42 pm, Syren Baran <sy...@gmx.de> wrote:
> Hi,
> I want a pointer that initially points to the same memory region for the
> child as for the parent process. What's a simple way to do this?
On modern operating systems, after a 'fork', all pointers point to the
same memory region for both the parent and the child.
DS
| |
| Gordon Burditt 2007-12-26, 1:39 am |
| >> I want a pointer that initially points to the same memory region for the
>
>On modern operating systems, after a 'fork', all pointers point to the
>same memory region for both the parent and the child.
But that's *NOT* shared memory, and it won't stay that way: with
copy-on-write, a page gets a private copy as soon as either process
writes to it.
If you want shared (read/write) memory, use shared memory: mmap()
or System V shared memory (shmget()/shmat()).
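The difference is easy to observe. In this sketch (the function name `private_roundtrip` is made up), a child overwrites an ordinary heap variable after fork(), and the parent still sees the old value through the very same pointer.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the parent sees in ordinary (non-shared) memory after
 * a forked child has overwritten its own copy, or -1 on error. */
int private_roundtrip(int initial, int child_value)
{
    int *cell = malloc(sizeof *cell);
    if (cell == NULL)
        return -1;
    *cell = initial;

    int seen = -1;
    int status = 0;
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: same pointer value, but the store only changes the
         * child's private copy of the page. */
        *cell = child_value;
        _exit(*cell == child_value ? 0 : 1);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        seen = *cell;           /* parent's copy is untouched */
    free(cell);
    return seen;
}
```

Swapping the malloc() for a MAP_SHARED mapping created before the fork() is what turns this into real shared memory.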
| |
| David Schwartz 2007-12-26, 1:39 am |
| On Dec 25, 8:33 pm, gordonb.0w...@burditt.org (Gordon Burditt) wrote:
> But that's *NOT* shared memory and it won't stay that way (with
> copy-on-write) as soon as either process writes the memory.
The OP specifically asked for pointers that *initially* point to the
same memory region.
DS
| |
| William Pursell 2007-12-26, 7:33 am |
| On Dec 25, 10:50 pm, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> William Pursell <bill.purs...@gmail.com> writes:
>
> [...]
>
>
>
> [...]
>
>
> Easier than what? Multi-threaded applications are certainly not 'easier
> to code' than single-threaded applications, by virtue of the simple
> fact that they have to deal with something (multiple threads of
> control) single-threaded applications don't need to.
Threads are often perceived to be easier to implement
than multiple cooperating single threaded applications.
>
>
> Better than what? And because of what?
Writing a multi-threaded application is usually
not better than writing multiple cooperating
single-threaded applications. The reason
is that single-threaded apps cannot stomp
on each other's address space without effort
on the part of the developer, i.e., it is more
difficult to screw things up.
>
>
> It's wrong.
So it seems we are in agreement....
>
>
> And this, too. It is not generally 'easier' to write malfunctioning
> code than it is to write functioning code.
Uh, I think we are in agreement here. This is why
I stated that it is illusory to believe that threading is
simpler.
>
>
> There is no 'complication' introduced by having multiple threads
> within the same address space which doesn't already exist with
> multiple cooperating single-threaded processes ...
Yes, there is. A coding error in a multi threaded application
is more difficult to track down. The address space wall
between single-threaded processes makes it easier to
write correct code.
>
>
> ... especially, if those multiple cooperating single-threaded
> processes additionally share parts of their address space. Just a
> difference in quantity, not one in quality. But this basically
> only means that ignoring potential concurrency issues is less likely
> to lead to easily observable software defects for the multi-process
> case. IOW, the chance to get away with buggy code is somewhat
> higher then.
I am certainly not suggesting that concurrency issues can
be ignored. What I am saying is that if two distinct jobs are
being done, and the choice is to perform those two jobs either
in two threads or in two processes, it is generally safer,
easier, and more correct to write it as two processes, given
that the overlap of the data that they need to share is
generally a small percentage of the overall data being used.
If Bob only needs access to 5% of the address space to do his
job (Bob being the "sub"-thread), then it is silly to give him
access to the full address space, where any mistakes he makes
can screw up everything.
In my experience, many people decide on a multi-threaded
design instead of a multi-process design simply because
they don't know how to implement the multi-process design.
That is not a good reason for that decision.
| |
| Syren Baran 2007-12-26, 7:33 am |
| David Schwartz wrote:
> On Dec 25, 8:33 pm, gordonb.0w...@burditt.org (Gordon Burditt) wrote:
>
>
>
>
> The OP specifically asked for pointers that *initially* point to the
> same memory region.
Yes, that's true. But I meant that I would change some of the pointers
myself, not that the OS would change them by creating a copy.
>
> DS
| |
| Syren Baran 2007-12-26, 7:33 am |
| William Pursell wrote:
> On Dec 25, 10:50 pm, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
Actually, the same code could be considered buggy if used in conjunction
with threads while being correct if used with processes. Meanwhile, code
that is correct from a multi-threaded point of view would be considered
bloated, with lots of unnecessary instructions, when used for multiple
processes.
>
> I am certainly not suggesting that concurrency issues can
> be ignored. What I am saying is that if two distinct jobs are
> being done, and the choice is to perform those two jobs either
> in two threads or in two processes, it is generally safer and
> easier and more correct to write it as two processes, given
> that it is generally the case that the overlap of the data
> that they need to share is generally a small percentage
> of the overall data being used. If Bob only needs access
> to 5% of the address space to do his job (Bob being the
> "sub"-thread), then it is silly to give him access to the
> full address space, where any mistakes he makes can
> screw up everything.
Reducing the decision on whether to use threads or processes to the
percentage of overlapping data is just plain stupid.
Even if over 90% of the address space is shared, it's easy if it's just one
blob. Far more important questions are "How complex are the shared
data structures?" and "How often are the data structures modified?".
>
> In my experience, many people decide on a multi-threaded
> design instead of a multi-process design simply because
> they don't know how to implement the multi-process design.
> That is not a good reason for that decision.
Using multiple processes is not necessarily easier, and it is usually less
efficient. Even if you use an mmap'ed region, you will often need to know
when data structures have changed. That means either checking often or
sending signals. Other IPC mechanisms such as pipes or Unix sockets add a
large overhead to the communication.
But with that last sentence of yours we are definitely in agreement.
Using a solution just because it's the only solution one knows is not a
well-informed decision.
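For the question of learning when data in a mapped region has changed, one further option besides polling or signals is a mutex and condition variable marked PTHREAD_PROCESS_SHARED and placed inside the region itself. The sketch below assumes a platform that supports process-shared synchronization objects (Linux does) and MAP_ANONYMOUS; `struct mailbox` and the function names are illustrative only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t changed;     /* signalled whenever `value` is updated */
    int ready;
    int value;
};

static struct mailbox *mailbox_create(void)
{
    struct mailbox *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t ma;
    pthread_condattr_t ca;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);

    int err = pthread_mutex_init(&m->lock, &ma);
    if (err == 0)
        err = pthread_cond_init(&m->changed, &ca);
    pthread_mutexattr_destroy(&ma);
    pthread_condattr_destroy(&ca);
    if (err != 0) {
        munmap(m, sizeof *m);
        return NULL;
    }
    m->ready = 0;
    m->value = 0;
    return m;
}

/* The child publishes `value`; the parent sleeps until told it changed.
 * Returns the value the parent observed, or -1 on error. Note that the
 * parent would block forever if the child died before signalling. */
int wait_for_child_update(int value)
{
    struct mailbox *m = mailbox_create();
    if (m == NULL)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(m, sizeof *m);
        return -1;
    }
    if (pid == 0) {             /* child: update, then notify */
        pthread_mutex_lock(&m->lock);
        m->value = value;
        m->ready = 1;
        pthread_cond_signal(&m->changed);
        pthread_mutex_unlock(&m->lock);
        _exit(0);
    }

    pthread_mutex_lock(&m->lock);
    while (!m->ready)           /* loop guards against spurious wakeups */
        pthread_cond_wait(&m->changed, &m->lock);
    int seen = m->value;
    pthread_mutex_unlock(&m->lock);

    waitpid(pid, NULL, 0);
    munmap(m, sizeof *m);
    return seen;
}
```

This is only one possible design; it trades the polling loop for a dependency on process-shared pthread objects, which not every Unix of the time implemented.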
| |
| Rainer Weikusat 2007-12-26, 1:25 pm |
| William Pursell <bill.pursell@gmail.com> writes:
> On Dec 25, 10:50 pm, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
[...]
>
> Threads are often perceived to be easier to implement
> than multiple cooperating single threaded applications.
A multi-process application I happen to know quite well (because I
wrote all parts of it) happens to be what provides secured shell
access to 'our' appliances. At the base of this are two applications,
one of them running on an appliance and one running on the computer
managing it, which negotiate and update GSSAPI security contexts based
on the MIT Kerberos implementation and provide a control channel to
(optionally) reliably transmit encrypted, integrity protected and
authenticated messages from appliance to management computer or vice
versa. Then, there is an interface program, which, when running on the
management computer with sufficient privileges, frames data read from
its controlling terminal (or a file or a -c option) into control
messages, which are subsequently transmitted to a particular
appliance, based on its MAC address, by the other program running on
the management computer I mentioned so far. Its counterpart on the
appliance receives, verifies and acknowledges these messages and sends
them, based on a 6-bit number contained in some header field, to a
specific AF_UNIX datagram socket. All defined datagram sockets are
monitored by another program which creates a process executing a
specific file with the datagram socket connected to its standard input
upon activity on a socket. This program waits until the newly created
process has terminated before it again starts to act on a particular
socket. The program started for the 'shell access' application forks a
shell connected to the slave end of a pty, opens a pipe to another
program responsible for output, and relays data between the datagram
socket, the pty and the pipe going to the output program. That,
lastly, reads messages from its standard input, using a simple
protocol ('message length' followed by 'message data'), again frames
them into control messages and uses the services provided by the GSSAPI
client program to cause them to be transmitted to the managing
computer, where they are finally forwarded to the interface program
using a mechanism similar to the one already described to map the
same 6-bit number to an AF_UNIX datagram socket.
It would certainly not have been easier to implement this as two
multithreaded programs only (actually, the GSSAPI client and server
programs are multithreaded, but that's an implementation detail).
The shell feature has, of course, just been added to this
arrangement because the control message part was already there to
support a set of other applications, mostly for configuring
appliances.
>
> Writing a multi-threaded application is usually
> not better than writing multiple cooperating
> single threaded applications. The reason
> being that single threaded apps cannot stomp
> on each other's address space without effort
> on the part of the developer, i.e., it is more
> difficult to screw things up.
Using invalid pointers in one of the cooperating applications will
'screw things up' just as easily, and the whole composed of these
applications will then not function as desired either.
[...]
>
> Yes, there is. A coding error in a multi threaded application
> is more difficult to track down.
This would at least depend on the error.
> The address space wall between single-threaded processes makes it
> easier to write correct code.
Code being affected by 'the address space wall' is incorrect code
because it tries to do invalid memory accesses.
| |
| David Schwartz 2007-12-26, 7:21 pm |
| On Dec 26, 12:53 am, William Pursell <bill.purs...@gmail.com> wrote:
> Threads are often perceived to be easier to implement
> than multiple cooperating single threaded applications.
That's because for applications that require significant
cooperation, they are much easier.
> Writing a multi-threaded application is usually
> not better than writing multiple cooperating
> single threaded applications. The reason
> being that single threaded apps cannot stomp
> on each other's address space without effort
> on the part of the developer, i.e., it is more
> difficult to screw things up.
That's nonsense. If a major consideration of your design is how easy
or hard it is to screw things up, you are operating outside of your
area of competence. Obviously if you don't know how to write a multi-
threaded application, it's easy to screw one up.
Getting multiple processes to cooperate is, for many applications,
extremely hard.
DS
| |
| Barry Margolin 2007-12-27, 1:37 am |
| In article <47723b3b$0$16668$9b4e6d93@newsspool3.arcor-online.net>,
Syren Baran <syren@gmx.de> wrote:
> Reducing the decision on whether to use threads or processes to the
> percentage of overlapping data is just plain stupid.
> Even if over 90% of the address space is shared, it's easy if it's just one
> blob. Far more important questions are "How complex are the shared
> data structures?" and "How often are the data structures modified?".
If the shared structures are C++ classes, you don't have much control
over the complexity. There are also often hidden pointers, such as the
vtable pointer.
Pointers are a particular problem when sharing memory between processes,
because things are not in the same location in each process. If you
want to use a shared memory block, you usually have to replace all the
intra-object pointers with offsets, preventing you from using C's
built-in dereference mechanism directly.
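A rough sketch of the offset technique, under the assumption that all data lives in one contiguous block whose first bytes hold a small header. The names (`struct header`, `block_push`, `block_sum`, `block_clone`) are hypothetical, and plain heap memory stands in for a shared mapping; copying the block to another address mimics a second process that has attached it elsewhere.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Offsets are measured from the start of the block. Offset 0 is occupied
 * by the header, so it doubles as the "no node" marker. */
struct header {
    size_t head;                /* offset of the first list node, 0 if empty */
    size_t used;                /* bytes of the block in use */
};

struct node {
    int value;
    size_t next;                /* offset of the next node, 0 at the end */
};

/* Allocate and initialise an empty block (stand-in for a shared mapping). */
void *block_new(size_t capacity)
{
    if (capacity < sizeof(struct header))
        return NULL;
    struct header *h = calloc(1, capacity);
    if (h != NULL)
        h->used = sizeof *h;
    return h;
}

/* Prepend a value; returns 1 on success, 0 when the block is full. */
int block_push(void *base, size_t capacity, int value)
{
    struct header *h = base;
    if (capacity - h->used < sizeof(struct node))
        return 0;
    struct node *n = (struct node *)((char *)base + h->used);
    n->value = value;
    n->next = h->head;
    h->head = h->used;
    h->used += sizeof *n;
    return 1;
}

/* Walk the list by turning each offset into an address relative to `base`. */
long block_sum(const void *base)
{
    const struct header *h = base;
    long sum = 0;
    for (size_t off = h->head; off != 0; ) {
        const struct node *n = (const struct node *)((const char *)base + off);
        sum += n->value;
        off = n->next;
    }
    return sum;
}

/* Byte-for-byte copy at a different address; the offsets remain valid. */
void *block_clone(const void *base, size_t capacity)
{
    void *copy = malloc(capacity);
    if (copy != NULL)
        memcpy(copy, base, capacity);
    return copy;
}
```

The cost is visible in `block_sum`: every traversal step needs an explicit base-plus-offset computation instead of a plain `n->next` dereference.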
You also have to implement your own allocator; you can't just use
malloc() or new.
Shared memory is fine for arrays of simple data. The further you get
from that, the easier it will be to code a multi-threaded application in
a single address space.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
| |
| Rainer Weikusat 2007-12-27, 7:33 am |
| Barry Margolin <barmar@alum.mit.edu> writes:
> In article <47723b3b$0$16668$9b4e6d93@newsspool3.arcor-online.net>,
> Syren Baran <syren@gmx.de> wrote:
>
> If the shared structures are C++ classes, you don't have much control
> over the complexity. There are also often hidden pointers, such as the
> vtable pointer.
>
> Pointers are a particular problem when sharing memory between processes,
> because things are not in the same location in each process.
It should be possible to attach it to the same location in each
process.
| |
| James Antill 2007-12-27, 1:24 pm |
| On Wed, 26 Dec 2007 15:48:21 -0800, David Schwartz wrote:
> That's nonsense. If a major consideration of your design is how easy or
> hard it is to screw things up, you are operating outside of your area of
> competence. Obviously if you don't know how to write a multi-threaded
> application, it's easy to screw one up.
I'm not sure what you are saying here, I can't imagine that you are
arguing that it's not worth doing "X" if the only benefit is that it makes
it harder to screw up? I would say that affects everything and is one of
my biggest considerations, be it via commenting, using assert(), putting
arguments in a "good" order or writing unit tests.
I mean I could say the exact same thing about using or not using memory
protection ... but no one is using DOS anymore, and I find it hard to
believe that current C/C++ threading can survive for similar reasons.
I guess you might be arguing that you believe the amount of benefit you
see on not screwing up from not using threads is lower than the benefit
you see from not having to design the interactions before writing them.
But if so it was badly put.
--
James Antill -- james@and.org
C String APIs use too much memory? ustr: length, ref count, size and
read-only/fixed. Ave. 44% overhead over strdup(), for 0-20B strings
http://www.and.org/ustr/
| |
| Rainer Weikusat 2007-12-27, 1:24 pm |
| James Antill <james-netnews@and.org> writes:
> On Wed, 26 Dec 2007 15:48:21 -0800, David Schwartz wrote:
>
>
> I'm not sure what you are saying here, I can't imagine that you are
> arguing that it's not worth doing "X" if the only benefit is that it makes
> it harder to screw up?
If "not doing X" significantly increases the chances that you "don't
screw up", the obvious implication is that you don't really know how
to do X. But in itself, this does not communicate anything about X not
related to you.
But in the given context, this is basically just a red herring,
because the purpose (and technical effect) of running different
processes in different address spaces is not to make it easier for the
person working on the code of the program executed in one process to
'not screw up' this code, but to make it impossible for this process
to 'screw up' anything except itself, ie the kernel or other processes
executing other programs.
[...]
> I mean I could say the exact same thing about using or not using memory
> protection ...
And that would be wrong, as outlined above.
> but no one is using DOS anymore, and I find it hard to believe that
> current C/C++ threading can survive for similar reasons.
There is still a fairly vocal group of people around, e.g. among people
affiliated with teaching what Germany has instead of 'computer science',
who claim that studying 'computer science' outside the domain of
electrical currents, i.e. software, is "a waste of time" because it will
certainly go away on its own Real Soon Now[tm]. This sentiment has
existed at least since the mid-1960s, and it is generally not an uncommon
prejudice for sufficiently old people to have regarding developments
younger than they are.
Independently of this, a multi-threaded process is not the same as a
couple of traditional processes crammed into a single address space
for some weird reason; that would be just a badly fitting metaphor that
someone whose mental universe consists of single-threaded processes
could use to describe this to himself. It is a single program capable
of exploiting concurrency within itself. An example I already cited
would be a POP3-interception proxy running on the appliances I do most
programming for: After starting, the process initializes itself just
enough to enter into 'POP3 transaction state' and while the network
communication necessary for authorization takes place, a second thread
initializes what will only ever be used after transaction state has
been entered. Assuming this ever takes place, one thread is
responsible for downloading a mail message and storing it locally so
that the AV-scanner (an external server process started on demand) can
scan the message for viruses, and as soon as enough of the message has
been received for the spam detector to work, which is a server process
running on another computer 'on the internet', the receiving thread
kicks off a second thread which talks to this server to get the mail
classified. By the time the AV-scanner has checked the mail, the
former receiving thread either uses the already determined result of
the spam check or waits for the second thread to complete its task.
> I guess you might be arguing that you believe the amount of benefit you
> see on not screwing up from not using threads is lower than the benefit
> you see from not having to design the interactions before writing
> them.
By which weird strain of fancy have you arrived at the superstition
that working interactions among multiple threads of control inside a
single process are so much simpler than interactions among independent
processes that there would be no need to 'design' them?
And how does this mix with the other claim that they are actually so
hideously more complicated that ever getting them right is a hopeless
quest?
| |
| James Antill 2007-12-27, 7:25 pm |
| On Thu, 27 Dec 2007 19:55:08 +0100, Rainer Weikusat wrote:
> James Antill <james-netnews@and.org> writes:
>
> If "not doing X" significantly increases the chances that you "don't
> screw up", the obvious implication is that you don't really know how to
> do X.
This is not an obvious implication, to me, at all. The much more obvious
implication is that X is _hard_, or at least getting it correct is.
>
> There is still a fairly vocal group of people around, [that]
> software, is "a waste of time" because it will certainly go
> away on its own Real Soon Now[tm].
If they have a better alternative, then I'd guess they could be right.
> Independently of this, a multi-threaded process [...]
> is a single program capable of exploiting
> concurrency within itself. An example I already cited would be a
> POP3-interception proxy running on the appliances I do most programming
> for: After starting, the process initializes itself just enough to
> enter into 'POP3 transaction state' and while the network communication
> necessary for authorization takes place, a second thread initializes
> what will only ever be used after transaction state has been entered.
The above can obviously be implemented via poll(), possibly with a
helper application if you need something like PAM[1] (see vsftpd).
> Assuming this ever takes place, one thread is responsible for
> downloading a mail message and storing it locally
This is just network IO -> disk, no threads needed here.
> [...] and as soon as enough of the message has been received [...]
> the receiving thread kicks off a second
> thread which talks to this server to get the mail classified.
More IO, more poll().
> By the
> time the AV-scanner has checked the mail, the former receiving thread
> either uses the already determined result of the spam check or waits for
> the second thread to complete its task.
And this is just overhead for calling pthread_create() or whatever.
None of the above works faster than the obvious alternative, no matter
how many cores you have. So the only advantage of "threads" is that you
don't have to design the "state machine" to handle the events ... you
just create a bunch of threads, and pray you don't have any lurking
threading issues (and like the DOS example, history isn't on your side).
[1] I.e. something you can't alter which requires interactive blocking
IO, but then that almost certainly requires its own address
space/sandbox anyway ... but it seemed fair to mention it.
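For readers unfamiliar with the single-threaded alternative being argued for here, this is a bare-bones sketch of a poll() loop that services two descriptors from one thread. It is not the proxy described in the thread; `drain_with_poll` is a made-up name, and a real event loop would also use non-blocking descriptors and retry on EINTR.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Read both descriptors until each reaches end-of-file, servicing whichever
 * one is ready. total[i] receives the byte count read from descriptor i.
 * Returns 0 on success, -1 on error. */
int drain_with_poll(int fd_a, int fd_b, size_t total[2])
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };
    int open_count = 2;
    total[0] = total[1] = 0;

    while (open_count > 0) {
        if (poll(fds, 2, -1) < 0)
            return -1;
        for (int i = 0; i < 2; i++) {
            if (fds[i].fd < 0)
                continue;                   /* already finished */
            if (fds[i].revents & POLLNVAL)
                return -1;                  /* descriptor was not open */
            if (!(fds[i].revents & (POLLIN | POLLHUP | POLLERR)))
                continue;                   /* nothing to do for this one */

            char buf[4096];
            ssize_t n = read(fds[i].fd, buf, sizeof buf);
            if (n < 0)
                return -1;
            if (n == 0) {                   /* EOF: stop watching it */
                fds[i].fd = -1;             /* poll() skips negative fds */
                open_count--;
            } else {
                total[i] += (size_t)n;
            }
        }
    }
    return 0;
}
```

The per-descriptor bookkeeping (`fds[i].fd`, `total[i]`) is a tiny instance of the "state machine" mentioned above: each connection's progress has to be stored explicitly rather than living on a thread's stack.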
--
James Antill -- james@and.org
| |
| David Schwartz 2007-12-28, 1:37 am |
| On Dec 27, 8:35 am, James Antill <james-netn...@and.org> wrote:
> On Wed, 26 Dec 2007 15:48:21 -0800, David Schwartz wrote:
> I'm not sure what you are saying here, I can't imagine that you are
> arguing that it's not worth doing "X" if the only benefit is that it makes
> it harder to screw up? I would say that affects everything and is one of
> my biggest considerations, be it via commenting, using assert(), putting
> arguments in a "good" order or writing unit tests.
You misunderstand me. I'm saying that if a major consideration of your
design is how easy or hard it is to screw things up, you are operating
outside of your area of competence. I don't worry about threads
screwing things up because I am competent to use them. If I use them
and screw things up, it's because *I* screwed things up.
Threads don't really change anything just because you used them. The
only things that really change are things like how 'malloc' and
'errno' work internally. Everything else changes because you choose to
change it. And if you changed things such that they break, that's the
fault of your change, not the fault of the threading.
> I mean I could say the exact same thing about using or not using memory
> protection ... but no one is using DOS anymore, and I find it hard to
> believe that current C/C++ threading can survive for similar reasons.
I don't see what that has to do with anything. Sure, we won't be using
C/C++ threading when something better comes around, but nothing better
has come around yet.
Perhaps process pool architectures will mature to the point where they
can seriously compete with threading. Who knows?
We can speculate about the future, list the things wrong with today's
tools, and hope future tools don't have those problems. But that's not
a reason not to use the tools we have today where they solve real
problems.
> I guess you might be arguing that you believe the amount of benefit you
> see on not screwing up from not using threads is lower than the benefit
> you see from not having to design the interactions before writing them.
> But if so it was badly put.
I have no idea what you're saying there. I can't imagine that could
possibly be clearer than what I said, whatever it means.
If your main problem with threads is that things break when you use
them, that says a lot about your competence with threads. It may be a
good argument why you shouldn't use threads. Maybe you don't encounter
the kinds of problems that threads help with, so there's no
significant benefit to you becoming competent.
But I am a competent threads programmer. I use them every day. They
very rarely cause me problems because I design my libraries and
applications to be multi-threaded from the ground up. I don't have to
worry about a lot of problems other programmers struggle with like
getting performance out of high-end hardware, process stalls, or
bursty performance because threads solve all of these problems if you
use them correctly.
DS
| |
| David Schwartz 2007-12-28, 1:37 am |
| On Dec 27, 1:59 pm, James Antill <james-netn...@and.org> wrote:
> This is not an obvious implication, to me, at all. The much more obvious
> implication is that X is _hard_, or at least getting it correct is.
Threading is best for real-world hard problems. It takes the hardest
parts of those problems and makes them easier. The only reason
threading seems hard is because it is most suitable for use on
problems that are already hard.
> The above can obviously be implemented via. poll(), possibly with a
> helper application if you need something like PAM[1] (see vsftpd).
No, for two reasons:
1) If you hit a page fault, your entire process stalls. Under load,
the stall is often irrecoverable. Imagine that one connection triggers a
rare error condition, the code for handling that error has never been
faulted in from disk, and the disk is busy.
2) That would make every single line of code performance critical,
rather than the 20% of the application that would otherwise be
critical. An unexpected block on any line of code would be fatal.
You think threading is hard? Try solving those two problems without
it.
>
> This is just network IO -> disk, no threads needed here.
Really? Because disk reads are very tricky to make non-blocking. Even
writes can block if the buffers are full. You think it's a good idea
to stall all the clients when the disk is busy? (Assuming there are
clients that don't require any disk access to service.)
> None of the above works faster, than the obvious alternative, no matter
> how many cores you have. So the only advantage of "threads" are that you
> don't have to design the "state machine" to handle the events ... you
> just create a bunch of threads, and pray you don't have any lurking
> threading issues (and like the DOS example, history isn't on your side).
What about page faults? What about disk I/O?
I get the impression that you don't write software that does the kind
of things threads are best for.
DS
| |
| Rainer Weikusat 2007-12-28, 7:32 am |
| James Antill <james-netnews@and.org> writes:
> On Thu, 27 Dec 2007 19:55:08 +0100, Rainer Weikusat wrote:
>
> This is not an obvious implication, to me, at all.
> The much more obvious implication is that X is _hard_, or at least
> getting it correct is.
At best, when going beyond a single-person example, this would be
'getting X right is hard for a certain set of people'. That this
can be generalized to all people, and that there isn't some other
common attribute Y of the observed population which could be the
reason for the observation, would need to be proven and/or researched.
>
> If they have a better alternative, then I'd guess they could be
> right.
Their alternative is selectively ignoring what should not be.
>
> The above can obviously be implemented via. poll(),
And the above can obviously be implemented by printing the mail on
paper, sending it to New Jersey by air mail, having someone compare it
manually against the signature database, attach the result to a
carrier pigeon going (slowly) to New Hampshire, hammering it into a
stone transported by train to Austin, Texas, transcribe it again onto
paper there, send it by ship to Borneo and have someone type it into a
terminal there, which sends an e-mail message with the result back to
the appliance.
Not to mention that the functionality itself could be implemented by
just doing a phonecall instead of sending an e-mail and that each
phonecall could be replaced by a letter etc.
Humans are really good at overcoming technical limitations through
creativity. But this is not a point in favor of technical limitations.
> possibly with a helper application if you need something like PAM[1]
> (see vsftpd).
This doesn't make sense in the given context: Subsystem initialization
cannot be accomplished by 'a helper application', because that would
run in its own address space, and an interception proxy of this type has
no use for 'local authentication', because it just
pseudo-transparently intercepts traffic between 'real client' and
'real server' and modifies it somehow (like replacing mails containing
viruses with virus alerts).
>
> This is just network IO -> disk, no threads needed here.
Since something must move the data from one file descriptor to another
(and do some translations in between), at least one thread of control
is obviously needed.
>
>
> More IO, more poll().
Or more carrier pigeons going from New Jersey to other New England
states. In this particular case, it would be carrier pigeons carrying
army trunks, for the reason of certain other technical limitations I
didn't mention.
But in the end, every problem can be solved with a large switch
statement and liberal use of unconditional gotos.
[...]
> None of the above works faster, than the obvious alternative,
'Works faster' is of no concern for this particular application (at
least not to this degree). And such a claim would, of course, need to
be proven experimentally, nevertheless.
> no matter how many cores you have.
And this would be decidedly wrong: Two processors could actually send
data to two different 'other applications' in parallel, using two
different protocols, but the 'large switch statement liberally using
unconditional gotos' could only be executed on one at any given time.
> So the only advantage of "threads" are that you
> don't have to design the "state machine" to handle the events ...
Exactly. Instead of implementing a form of cooperative userspace
threading to work on existing tasks pseudo-concurrently, the already
existing pthreads implementation being part of the C-library is used
to accomplish the same end. This comes at the price of a higher
overhead in an area which doesn't matter for this application.
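
The idea of letting the existing pthreads implementation drive two
independent senders can be sketched as follows. This is only a minimal
illustration under assumptions: the pipes stand in for the two 'other
applications', and the names sender, send_worker and parallel_send_demo
are hypothetical, not taken from the application discussed here.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* One sender task: where to write, what to write, and whether it worked.
 * (Hypothetical type for this sketch.) */
struct sender {
    int fd;
    const char *msg;
    int ok;
};

/* Thread body: push the message to this sender's own descriptor. */
static void *send_worker(void *arg)
{
    struct sender *s = arg;
    size_t len = strlen(s->msg);

    s->ok = write(s->fd, s->msg, len) == (ssize_t)len;
    return NULL;
}

/* Run two senders concurrently, each on its own pipe, then check what
 * arrived. Returns 0 on success, -1 on any failure. */
int parallel_send_demo(void)
{
    int a[2], b[2];
    char bufa[8] = { 0 }, bufb[8] = { 0 };
    pthread_t ta, tb;

    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    struct sender sa = { a[1], "alpha", 0 };
    struct sender sb = { b[1], "beta", 0 };

    if (pthread_create(&ta, NULL, send_worker, &sa) != 0)
        return -1;
    if (pthread_create(&tb, NULL, send_worker, &sb) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    /* On a multi-core machine both writes may run at the same time. */
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    int ok = sa.ok && sb.ok
          && read(a[0], bufa, 5) == 5 && strcmp(bufa, "alpha") == 0
          && read(b[0], bufb, 4) == 4 && strcmp(bufb, "beta") == 0;

    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    return ok ? 0 : -1;
}
```

Each thread owns its own struct sender, so no locking is needed in this
sketch. On older toolchains the program may need to be built with
-pthread.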
> just create a bunch of threads, and pray you don't have any lurking
> threading issues
As I had already written in my original posting: It does not work this
way.
| |
| William Pursell 2007-12-30, 1:38 am |
| On Dec 27, 6:55 pm, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
>
> And how does this mix with the other claim that they are actually so
> hideously more complicated that ever getting them right is a hopeless
> quest?
I suspect this refers to my initial foray into this thread. If
so, you misinterpret me. I do not claim that it is hopeless
to ever correctly implement a multi-threaded app. I
simply claim that, very often, people choose to go
multi-threaded when they shouldn't. Further, I believe
that, in the majority of cases, it is more appropriate
to use multiple cooperating single threaded apps.
I would estimate that 85% of the
multi-threaded applications I have written
have been subsequently re-designed (by me) to
run as single threaded apps, where the overall
design has been simpler and that simplicity has
led to greater run-time efficiency. I'm not
arguing that this is a global truth, and am
perhaps using the word "claim" in a way that
does not directly correlate to the way in which
you are reading it. I am simply presenting my
experience in an attempt to provide the OP with
some data on which to make a design decision.
In the past, I would implement a multi-threaded
version, and subsequent experimentation led to
design simplifications which eventually led to
breaking the app up. Now, I generally design
things to be multi-processed from the start,
and I find my life to be more fulfilling
as a result. 
| |
| Rainer Weikusat 2007-12-30, 1:27 pm |
| William Pursell <bill.pursell@gmail.com> writes:
> On Dec 27, 6:55 pm, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
>
> I suspect this refers to my initial foray into this thread.
There is no reason to suspect this and the suspicion is wrong.
[...]
> I would estimate that 85% of the
> multi-threaded applications I have written
> have been subsequently re-designed (by me) to
> run as single threaded apps, where the overall
> design has been simpler and that simplicity has
> led to greater run-time efficiency. I'm not
> arguing that this is a global truth, and am
> perhaps using the word "claim" in a way that
> does not directly correlate to the way in which
> you are reading it. I am simply presenting my
> experience in an attempt to provide the OP with
> some data on which to make a design decision.
> In the past, I would implement a multi-threaded
> version, and subsequent experimentation led to
> design simplifications which eventually led to
> breaking the app up.
Without an actual example, this only communicates that you have a
tendency to get initial designs wrong :->.
| |
| Syren Baran 2008-01-01, 1:36 am |
| William Pursell schrieb:
> On Dec 27, 6:55 pm, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
>
> I suspect this refers to my initial foray into this thread. If
> so, you misinterpret me. I do not claim that it is hopeless
> to ever correctly implement a multi-threaded app.
Well, I have used the multithreaded approach too often anyway, in
languages that don't even know fork.
>I am simply presenting my
> experience in an attempt to provide the OP with
> some data on which to make a design decision.
That is highly valued.
I don't see a problem with the threaded approach, well, except for the
fact that I have to be pretty fussy with garbage collection, e.g.
destructors, deletes and frees. But then again, that's the
experience I hope to enjoy ;)
| |
| David Schwartz 2008-01-01, 1:28 pm |
| On Dec 29 2007, 11:02 pm, William Pursell <bill.purs...@gmail.com>
wrote:
> I suspect this refers to my initial foray into this thread. If
> so, you misinterpret me. I do not claim that it is hopeless
> to ever correctly implement a multi-threaded app. I
> simply claim that, very often, people choose to go
> multi-threaded when they shouldn't.
I agree, but this is generally because of limitations in their own
competency rather than issues with multi-threading itself.
> Further, I believe
> that, in the majority of cases, it is more appropriate
> to use multiple cooperating single threaded apps.
Actually, my experience has been that they should instead find someone
competent to write multi-threaded programs.
> I would estimate that 85% of the
> multi-threaded applications I have written
> have been subsequently re-designed (by me) to
> run as single threaded apps, where the overall
> design has been simpler and that simplicity has
> led to greater run-time efficiency. I'm not
> arguing that this is a global truth, and am
> perhaps using the word "claim" in a way that
> does not directly correlate to the way in which
> you are reading it. I am simply presenting my
> experience in an attempt to provide the OP with
> some data on which to make a design decision.
Then you must be doing some things very horribly wrong. Because I have
never even seen that happen, much less had it happen to me personally.
The only exceptions were:
1) When I had to deal with other people's code that for one reason or
another could not easily be changed, and that code was either never
designed to be multi-threaded or horribly mis-designed.
2) When I had to deal with platforms that had mediocre thread support
(such as FreeBSD before it had real threads).
> In the past, I would implement a multi-threaded
> version, and subsequent experimentation led to
> design simplifications which eventually led to
> breaking the app up. Now, I generally design
> things to be multi-processed from the start,
> and I find my life to be more fulfilling
> as a result. 
How do you synchronize the processes? I do believe that one day multi-
process design will offer huge advantages over multi-threaded design.
But multi-process is even less mature than multi-thread.
Suppose you are designing a multi-process web server to handle 16,000
connections. You can't run 16,000 processes. So a single process will
have to have more than one connection. What if the process that has
connection 12 needs to do something that's going to take a long time
for connection 13? Do you grab a free process, hand off connection 12
and its context, and then work on 13? Or do you manage all the
descriptors and context as shared state in shared memory? How do you
synchronize it?
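
For the "hand off a connection to another process" option, the usual
mechanism on Unix is passing the descriptor over a Unix-domain socket
with SCM_RIGHTS. The following is a minimal sketch, assuming a platform
that provides the CMSG_* macros. The names fd_ctl, send_fd, recv_fd and
handoff_demo are hypothetical, a pipe stands in for the connection, and
error paths are kept short (descriptors may leak on failure).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Control buffer big enough for one descriptor, aligned for cmsghdr. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send descriptor fd over the Unix-domain socket chan; 0 on success. */
int send_fd(int chan, int fd)
{
    char byte = 'F';                 /* one byte of ordinary data to carry it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns it, or -1 on error. */
int recv_fd(int chan)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    int fd;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len < CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}

/* Demo: the parent hands the write end of a pipe to a child process; the
 * child writes one byte through the received descriptor. Returns 0 if
 * the parent then reads that byte, -1 otherwise. */
int handoff_demo(void)
{
    int chan[2], pipefd[2], status = -1;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) < 0)
        return -1;
    if (pipe(pipefd) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(pipefd[0]);            /* drop the inherited copies so that  */
        close(pipefd[1]);            /* only the received descriptor works */
        close(chan[0]);
        int fd = recv_fd(chan[1]);
        _exit(fd >= 0 && write(fd, "x", 1) == 1 ? 0 : 1);
    }
    close(chan[1]);
    int ok = send_fd(chan[0], pipefd[1]) == 0;
    close(chan[0]);
    close(pipefd[1]);
    ok = ok && read(pipefd[0], &c, 1) == 1 && c == 'x';
    close(pipefd[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return ok && WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

This moves only the descriptor. Any per-connection context still has to
be serialized and sent along, or kept in shared memory, which is the
synchronization problem raised above.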
Multi-process is really just multi-thread with extra problems in
synchronization and handoff of state and descriptors. Nothing about
multi-thread requires you to allow more than one thread to access an
object.
DS
| |
| Rainer Weikusat 2008-01-01, 1:28 pm |
| David Schwartz <davids@webmaster.com> writes:
> On Dec 29 2007, 11:02 pm, William Pursell <bill.purs...@gmail.com>
[...]
>
> How do you synchronize the processes? I do believe that one day multi-
> process design will offer huge advantages over multi-threaded design.
> But multi-process is even less mature than multi-thread.
UNIX(*) network servers supported multi-processing long before
UNIX(*) even got threading support. Real-world examples are
Apache and Samba, which both use a 'process per connection' model
(apache2 only by default).
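
The core of a 'process per connection' model is a fork() after each
accepted connection. The following is a rough sketch of that step only,
not code from Apache or Samba. The names handle_connection,
spawn_handler and per_connection_demo are hypothetical, and a
socketpair stands in for the accept()ed socket so that the sketch runs
without any network setup.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Per-connection work, run in the child: upper-casing echo until EOF. */
static void handle_connection(int conn)
{
    char buf[256];
    ssize_t n;

    while ((n = read(conn, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            buf[i] = (char)toupper((unsigned char)buf[i]);
        if (write(conn, buf, (size_t)n) != n)
            break;
    }
}

/* Fork one process for a connection. The parent closes its copy and gets
 * the child's pid back (or -1), so an accept loop could continue at once. */
pid_t spawn_handler(int conn)
{
    pid_t pid = fork();

    if (pid == 0) {
        handle_connection(conn);
        close(conn);
        _exit(0);
    }
    close(conn);            /* parent side; also reached if fork() failed */
    return pid;
}

/* Demo: sv[1] plays the role of the accepted socket, sv[0] the client.
 * Returns 0 if the client got its upper-cased echo back, -1 otherwise. */
int per_connection_demo(void)
{
    int sv[2];
    char reply[4] = { 0 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid_t pid = spawn_handler(sv[1]);
    if (pid < 0) {
        close(sv[0]);
        return -1;
    }
    int ok = write(sv[0], "ping", 4) == 4
          && read(sv[0], reply, 4) == 4
          && memcmp(reply, "PING", 4) == 0;

    /* The child inherited sv[0] too, so shut down the write side to make
     * sure the handler sees EOF and exits. */
    shutdown(sv[0], SHUT_WR);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return ok ? 0 : -1;
}
```

A real server would call spawn_handler() on each descriptor returned by
accept() and reap finished children, for example from a SIGCHLD handler.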
| |
| David Schwartz 2008-01-01, 7:23 pm |
| On Jan 1, 8:58 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> UNIX(*) network servers have supported multi-processing a long time
> before UNIX(*) even got threading support, real world examples:
> apache or Samba, which both use a 'process per connection'-model
> (apache2 only by default).
I'm not talking about process-per-connection, I'm talking about multi-
process. Essentially the same thing as multi-thread, but using
processes instead of threads.
DS
| |
| David Schwartz 2008-01-02, 1:38 am |
| On Dec 27 2007, 2:17 am, Rainer Weikusat <rweiku...@mssgmbh.com>
wrote:
> Barry Margolin <bar...@alum.mit.edu> writes:
> It should be possible to attach it to the same location in each
> process.
True, but sometimes it isn't.
DS
| |
| James Antill 2008-01-02, 7:23 pm |
| On Tue, 01 Jan 2008 08:47:50 -0800, David Schwartz wrote:
>
> How do you synchronize the processes? I do believe that one day multi-
> process design will offer huge advantages over multi-threaded design.
> But multi-process is even less mature than multi-thread.
>
> Suppose you are designing a multi-process web server to handle 16,000
> connections. You can't run 16,000 processes. So a single process will
> have to have more than one connection. What if the process that has
> connection 12 needs to do something that's going to take a long time for
> connection 13. Do you grab a free process, hand off connection 12 and
> its context, and then work on 13? Or do you manage all the descriptors
> and context as shared state in shared memory? How do you synchronize it?
I guess you might design the server that way for multi-threading,
assuming that the random blocking is fine (as another thread will go
service your other connections), but you just don't do it that way with
multi-procs.
Both lighttpd and and-httpd already implement multi-process as a
different model where you have N non-blocking parts which can speak to
M blocking parts. So if connection 12 needs to do something that takes
a long time it gets handed off to another proc. for that while the main
process services other connections.
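
The "non-blocking part talks to a blocking part" split can be sketched
with one helper process and poll(). This is only an illustration under
assumptions, not how lighttpd or and-httpd actually implement it. The
names slow_job, helper_loop and offload_demo are hypothetical, and the
sleep stands in for disk access or any other call that may block.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Stand-in for work that may block for a while. */
static int slow_job(int input)
{
    struct timespec ts = { 0, 50 * 1000 * 1000 };   /* 50 ms */

    nanosleep(&ts, NULL);
    return input * 2;
}

/* Blocking part: read an int request, run the job, write an int reply. */
static void helper_loop(int chan)
{
    int req;

    while (read(chan, &req, sizeof req) == (ssize_t)sizeof req) {
        int rep = slow_job(req);
        if (write(chan, &rep, sizeof rep) != (ssize_t)sizeof rep)
            break;
    }
    _exit(0);
}

/* Non-blocking part: hand the request to the helper and keep polling
 * instead of blocking. *idle_ticks counts how often the loop was free to
 * service other connections. Returns the reply, or -1 on error. */
int offload_demo(int request, int *idle_ticks)
{
    int chan[2];
    int reply = -1;

    *idle_ticks = 0;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(chan[0]);
        close(chan[1]);
        return -1;
    }
    if (pid == 0) {
        close(chan[0]);
        helper_loop(chan[1]);       /* never returns */
    }
    close(chan[1]);

    if (write(chan[0], &request, sizeof request) == (ssize_t)sizeof request) {
        struct pollfd pfd = { .fd = chan[0], .events = POLLIN };

        for (;;) {
            int n = poll(&pfd, 1, 10);          /* wait at most 10 ms */
            if (n < 0)
                break;
            if (n == 0) {
                (*idle_ticks)++;                /* other work would go here */
                continue;
            }
            if (read(chan[0], &reply, sizeof reply) != (ssize_t)sizeof reply)
                reply = -1;
            break;
        }
    }
    close(chan[0]);                 /* EOF tells the helper to exit */
    waitpid(pid, NULL, 0);
    return reply;
}
```

A real server would add the helper's socket to the same poll() set as
its client connections rather than polling it alone.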
--
James Antill -- james@and.org
C String APIs use too much memory? ustr: length, ref count, size and
read-only/fixed. Ave. 44% overhead over strdup(), for 0-20B strings
http://www.and.org/ustr/
| |
| David Schwartz 2008-01-03, 1:44 am |
| On Jan 2, 2:13 pm, James Antill <james-netn...@and.org> wrote:
> I guess you might design the server that way for multi-threading,
> assuming that the random blocking is fine (as another thread will go
> service your other connections), but you just don't do it that way with
> multi-procs.
Right, that's because multi-process is not mature yet. It may get that
way one day.
> Both lighttpd and and-httpd already implement multi-process as a
> different model where you have N non-blocking parts which can speak to
> M blocking parts. So if connection 12 needs to do something that takes
> a long time it gets handed off to another proc. for that while the main
> process services other connections.
The problem is that handing off to another process is a very ugly deal
right now. And if tight synchronization is required, it's very tricky.
There is simply no way to deal with page faults. On the bright side,
they only stall some other connections rather than all of them. So at
least if you're under heavy load from multiple sources, you can
continue to get useful work done while the fault is handled.
DS
|