Unix Programming - Specifying the ID when creating a message queue







Tingo

2005-02-03, 7:52 am

Hi all,

Is it possible to create a message queue with a specific ID in C? I
want to do this because I'm trying to write a piece of software which
restores communicating processes (which communicate through message
queues) when there is a machine failure. When restarting the machine I
need to set up the message queues as they were originally.

I find that given the same key, when creating a queue, the queue ID is
not always the same value. Is there any way to specify this value? The
only solution I can think of at the moment is to store the original
queue ID as a variable and repeatedly create queues until the original
message queue ID is used.

Thanks in advance.

Jens.Toerring@physik.fu-berlin.de

2005-02-03, 5:53 pm

Tingo <ting.hau@gmail.com> wrote:
> Is it possible to create a message queue with a specific ID in C? I
> want to do this because I'm trying to write a piece of software which
> restores communicating processes (which communicate through message
> queues) when there is a machine failure. When restarting the machine I
> need to setup the message queues as they were originally.


> I find that given the same key, when creating a queue, the queue ID is
> not always the same value. Is there any way to specify this value? The
> only solution I can think of at the moment is to store the original
> queue ID as a variable and repeatedly create queues until the original
> message queue ID is used.


Why do you care about the ID? Using the same key (probably generated
via ftok()) is good enough to allow all programs using the message
queue to identify the one they need. The key is what you have control
over and what identifies a certain message queue; the ID is just some
number the OS gives you (probably just an index into an array stored
in the kernel's memory) according to an algorithm that could be quite
different on different systems. Thus insisting on getting the same ID
is like requiring that your program always runs with the same PID or
that a file always has a certain inode number. And I don't think you're
going to have much luck repeatedly generating new message queues in
order to get a certain ID - it might work most of the time but when
the ID you want is already in use by another process you lose. Perhaps
if you explain why you think you need the same ID it will become
clearer what you're trying to do and if there is another way to
achieve what you want.
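To illustrate the point that the key, not the ID, is what cooperating programs share, here is a minimal sketch in C. It assumes the programs agree on a pathname and a project character to pass to ftok(); the helper name open_queue_by_key() is hypothetical. Whatever ID msgget() returns is simply used as-is.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical helper: look up (or create) the queue identified by a
 * key derived from a pathname and project id. The returned ID is
 * whatever the kernel hands out; only the key is under our control. */
int open_queue_by_key(const char *path, int proj_id)
{
    key_t key = ftok(path, proj_id);   /* same path + proj_id -> same key */
    if (key == (key_t)-1) {
        perror("ftok");
        return -1;
    }
    /* Create the queue if it does not exist yet, otherwise attach to it. */
    int id = msgget(key, IPC_CREAT | 0600);
    if (id == -1)
        perror("msgget");
    return id;
}
```

Every program calling open_queue_by_key() with the same arguments gets back the ID of the same queue for as long as that queue exists, whatever numeric value the kernel happened to choose.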
Regards, Jens
--
\ Jens Thoms Toerring ___ Jens.Toerring@physik.fu-berlin.de
\__________________________ http://www.toerring.de
Tingo

2005-02-05, 5:49 pm

Thanks Jens.

I think I need the queue ID because when a process is restored whilst
it was still running, it will have no knowledge that the queue ID has
changed. So how does the process access the queue if it is not aware of
the queue ID change? When restoring foreign applications, there is no
way to force them to use ftok to get the new queue ID.

I understand that if the ID is already in use, then a new queue cannot
be created with that ID, so I think this just might have to be a
limitation of the software.

Jens.Toerring@physik.fu-berlin.de

2005-02-06, 5:53 pm

Tingo <ting.hau@gmail.com> wrote:
> I think I need the queue ID because when a process is restored whilst
> it was still running, it will have no knowledge that the queue ID has
> changed.


Sorry, but I don't understand what "a process is restored whilst it was
still running" is supposed to mean...

When one or all of your processes using the message queue die and get
restarted, the message queue is still there, and getting at it with the
common key should work just fine. The message queue only vanishes when
it either gets actively deleted (using msgctl() with IPC_RMID or the
ipcrm utility) or the machine is rebooted - if neither of these
happens it will still be there, even when all the processes that used
it are dead (message queues, like shared memory segments and semaphores,
are stored in the kernel; they don't belong to a specific process).
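The following sketch shows that persistence directly. It assumes a queue ID already obtained with msgget(); the struct and function names are hypothetical. It forks a child that writes one message and exits, then checks with msgctl(IPC_STAT) that the message is still sitting in the queue after its sender is gone.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/wait.h>
#include <unistd.h>

struct text_msg {
    long mtype;          /* must be > 0 */
    char mtext[64];
};

/* Hypothetical demo: a child sends one message and exits; the parent
 * then asks the kernel how many messages are queued. Returns that
 * count, or -1 on error. */
long demo_queue_outlives_sender(int qid, const char *text)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                          /* child: send, then exit */
        struct text_msg m = { .mtype = 1 };  /* mtext is zero-filled */
        strncpy(m.mtext, text, sizeof m.mtext - 1);
        _exit(msgsnd(qid, &m, sizeof m.mtext, 0) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;

    /* The sender is dead now, but the queue lives in the kernel. */
    struct msqid_ds ds;
    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;
    return (long)ds.msg_qnum;                /* messages still queued */
}
```

Only msgctl(qid, IPC_RMID, NULL), the ipcrm utility, or a reboot removes the queue itself.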

> So how does the process access the queue if it is not aware of
> the queue ID change?


Well, a process waiting on a queue that got deleted will return with
an error and errno set to EIDRM. All further accesses to the deleted
queue should result in an error with errno set to EINVAL. That way
the application can figure out that the queue got removed behind its
back. But what should change the queue ID in the first place?
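One way a program could notice that its queue ID has gone stale is sketched below. It assumes the message type and size arguments are otherwise valid (EINVAL is also returned for a non-positive mtype or a bad size) and uses a non-blocking msgsnd() purely as a probe; the function name is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct small_msg {
    long mtype;
    char mtext[16];
};

/* Hypothetical probe. Returns 0 if the send succeeded, 1 if the ID no
 * longer refers to a live queue (EIDRM or EINVAL), -1 on other errors.
 * Assumes type > 0, since EINVAL is also reported for a bad mtype. */
int send_or_detect_removed(int qid, long type)
{
    struct small_msg m = { .mtype = type };
    if (msgsnd(qid, &m, sizeof m.mtext, IPC_NOWAIT) == 0)
        return 0;
    if (errno == EIDRM || errno == EINVAL)
        return 1;        /* queue was removed behind our back */
    return -1;           /* e.g. EAGAIN when the queue is full */
}
```

A caller that gets 1 back could then look the queue up again by its key with msgget() instead of continuing to use the old ID.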

> When restoring foreign applications, there is no way to force them
> to use ftok to get the new queue ID.


How do these foreign applications get at the message queue at all?
They use either a well-known key to specify which queue they want or
IPC_PRIVATE if they always create new ones. No non-braindead application
will ever care about the ID of the queue.
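The two ways of obtaining a queue mentioned here behave quite differently. In this sketch the helper names are hypothetical and the 0600 permissions are an arbitrary choice.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* IPC_PRIVATE always yields a brand-new queue; other processes can only
 * reach it if they are handed the ID (e.g. inherited across fork()). */
int make_private_queue(void)
{
    return msgget(IPC_PRIVATE, 0600);
}

/* A well-known key names one queue that any cooperating process can
 * find again. With IPC_CREAT it is created on first use. */
int create_well_known(key_t key)
{
    return msgget(key, IPC_CREAT | 0600);
}

/* Flag 0: attach only; fails with ENOENT if nobody created it yet. */
int attach_well_known(key_t key)
{
    return msgget(key, 0);
}
```

Two calls to make_private_queue() give two distinct queues, while create_well_known() and attach_well_known() with the same key resolve to the same queue ID.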

If the foreign application creates a new message queue that your programs
are accessing, and you want to restart it after it dies while keeping
your programs running, then you must delete the old message queue once
the foreign application has been restarted and has created a new one.
Your programs must also handle failures to read from or write to the old
queue and access the new one instead. But I probably still do not
understand what exactly the problem is you are trying to
solve...
Regards, Jens
--
\ Jens Thoms Toerring ___ Jens.Toerring@physik.fu-berlin.de
\__________________________ http://www.toerring.de
Tingo

2005-02-09, 5:57 pm


Jens.Toerring@physik.fu-berlin.de wrote:
> Sorry, but I don't understand what "a process is restored whilst it was
> still running" is supposed to mean...


I'm sorry if I haven't explained things clearly. I'm trying to write a
program that takes checkpoints of other running (communicating)
processes. At the moment only message queues are being considered. The
idea is that, particularly for long-running applications, the user can
save intermediate states of his running programs, to be restored at some
later time. Incidentally this should cater for failure by allowing the
user to restore the last successful checkpoint. What I meant to say in
the last post is that "processes are periodically checkpointed as they
run", i.e. the state of the communicating processes is saved, including
the queues. The queues need to be saved as well, because a user should
be able to stop their computation (and subsequently clean up any queues)
and restore from the checkpoint. This could be useful in the case where
a system needs to be shut down for maintenance.


> When one or all of your processes using the message queue die and get
> restarted the message queue is still there, and getting at it with the
> common key should work just fine. The message queue only vanishes when
> it gets either actively deleted (using msgctl() with IPC_RMID or the
> ipcrm utility) or when the machine is rebooted - if none of this
> happens it will still be there, even when all the processes that used
> it are dead (message queues like shared memory or semaphores are stored
> in the kernel, they don't belong to a specific process).


The software being written tries to cater for machine failure, so an
instance where the machine requires rebooting would not be unusual. It
is very likely that the message queues would be destroyed.

> Well, a process waiting on a queue that got deleted will return with
> an error and errno set to EIDRM. All further accesses to the deleted
> queue should result in an error with errno set to EINVAL. That way
> the application can figure out that the queue got removed behind its
> back. But what should change the queue ID in the first place?


> How do these foreign applications get at the message queue at all?
> They use either a well-known key to specify which queue they want or
> IPC_PRIVATE if they always create new ones. No non-braindead application
> will ever care about the ID of the queue.


A checkpoint represents a program's state, which was recorded at some
point as it ran. If a program has created a queue with a key, it needs
the ID to send to or receive from that queue. When the program is
restored from a checkpoint, the newly restored process is going to use
the same ID as it did when the checkpoint was taken. Therefore the
recreated queue requires the same ID for the given key. The alternative
would be to change the value of the ID in the restored process to the
ID of the newly created replacement queue; however, this isn't a
direction I want to pursue.

> If the foreign application is creating a new message queue your programs
> are accessing and the foreign application dies and you want to restart
> it while you keep your programs running then you must delete the old
> message queue after the foreign application got restarted and created
> a new one and handle failure to read from or write to the old queue in
> your programs - have them access the new one instead. But I probably
> still do not understand what exactly the problem is you are trying to
> solve...


Jens.Toerring@physik.fu-berlin.de

2005-02-10, 5:58 pm

Tingo <ting.hau@gmail.com> wrote:

> I'm sorry if I haven't explained things clearly. I'm trying to write a
> program that takes checkpoints of other running (communicating)
> processes.


I was already fearing something like this. Checkpointing can be
extremely difficult...

> At the moment only message queues are being considered. The
> idea is that, particularly
> for long-running applications the user can save intermediate states of
> his running programs, to be restored at some later time. Incidentally
> this should cater for failure by allowing the user to restore the last
> successful checkpoint. What I meant to say in the last post is that
> "processes are periodically checkpointed as they run", i.e. the state
> of the communicating processes are saved, including the queues. The
> queues need to be saved as well, because a user should be able to stop
> their computation (and subsequently clean up any queues) and restore
> from the checkpoint. This could be useful in the case where a system
> needs to be shutdown for maintenance.


Here the problem is that the message queue doesn't belong to one of
the processes. It's in the kernel and is independent of any of the
processes once it has been created. And as far as I can see, saving
a message queue would involve reading all messages, which get removed
from the queue while doing that, so they won't be available to the
processes anymore. To save a message queue you probably have to put all
processes that use it to sleep (otherwise some of them might get in the
way while you try to save it, changing it), then read all messages from
it. Then recreate the queue by resending the messages in the same
sequence - but there's still the problem that you can't set all the
fields of the structure associated with the message queue to their
original values, and if one of the programs relies on these fields it
may not work correctly. Finally you have to wake up all the programs,
probably saving their current state at that moment (that seems to be
the only moment when it can be done).
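A minimal sketch of the drain-and-resend step described above follows. It assumes every process using the queue has already been stopped and that no message is longer than a hypothetical MAX_TEXT bytes; save_queue(), restore_queue() and struct saved_msg are illustrative names, not an existing API. Reading with msgtyp 0 returns messages in FIFO order regardless of type, so resending them in the saved order reproduces the original sequence.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX_TEXT 256   /* assumed upper bound on message size */

struct saved_msg {
    long mtype;
    size_t len;
    char mtext[MAX_TEXT];
};

/* Drain up to max messages from qid in FIFO order into out[].
 * A message larger than MAX_TEXT makes msgrcv fail with E2BIG and
 * aborts the save. Stops after max messages even if more remain.
 * Returns the number saved, or -1 on error. */
long save_queue(int qid, struct saved_msg *out, long max)
{
    long n = 0;
    while (n < max) {
        struct { long mtype; char mtext[MAX_TEXT]; } buf;
        ssize_t got = msgrcv(qid, &buf, sizeof buf.mtext, 0, IPC_NOWAIT);
        if (got == -1)
            return errno == ENOMSG ? n : -1;   /* ENOMSG: queue is empty */
        out[n].mtype = buf.mtype;
        out[n].len = (size_t)got;
        memcpy(out[n].mtext, buf.mtext, (size_t)got);
        n++;
    }
    return n;
}

/* Resend the saved messages into qid in their original order.
 * Only contents and order come back, not the kernel's bookkeeping. */
int restore_queue(int qid, const struct saved_msg *in, long n)
{
    for (long i = 0; i < n; i++) {
        struct { long mtype; char mtext[MAX_TEXT]; } buf;
        buf.mtype = in[i].mtype;
        memcpy(buf.mtext, in[i].mtext, in[i].len);
        if (msgsnd(qid, &buf, in[i].len, 0) == -1)
            return -1;
    }
    return 0;
}
```

As noted above, the restored queue is not identical: msgctl() with IPC_SET can put back the owner, permissions and msg_qbytes, but the timestamps and last sender/receiver PIDs in struct msqid_ds reflect the restore, not the original activity.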

Since the timing here is non-deterministic (you never know when a process
is going to read from the message queue, and there might be situations
where two processes want to read the same message at the same time, so
what happens afterwards in the competing processes depends on which of
them comes first), you can't guarantee that things work exactly the same
way after all programs are restarted from one of the checkpoints. Did you
consider what happens when you restart from the same checkpoint twice and
in the first case process A gets the message it's competing for with
process B but in the second case, due to slight timing differences,
process B gets it instead? Is that acceptable?

I guess there's a lot of headache coming your way to get that right
under all possible circumstances...

> the common key should work just fine. The message queue only vanishes
> The software being written tries to cater for machine failure, so an
> instance where the machine requires rebooting would not be unusual. It
> is very likely that the message queues would be destroyed.


If the machine gets rebooted the message queue doesn't exist anymore.
Definitely.

> A checkpoint represents a program's state, which was recorded at some
> point as it ran. If a program has created a queue with a key, to send
> or receive from that queue, the ID is needed. When the program is
> restored from a checkpoint the newly restored process is going to use
> the same ID as it did when the checkpoint was taken. Therefore the
> recreated queue requires the same ID for the given key. The alternative
> to this would be to change the value of the ID in the restored process
> to the ID of the newly replaced message queue, however this isn't a
> direction I want to pursue.


Why? It can just as easily use the ID it gets back when it calls
msgget() with the same key. And as far as I can see, that's the only
sane approach. You have to do a lot of work on restart anyway (you've
got to reopen files and set the file positions to the correct places,
you have to reallocate all the memory needed etc. etc.), so that would
be only a small additional task.
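A minimal sketch of that remapping, assuming the checkpoint records, for each queue, the key the program used and the ID it held at checkpoint time. The struct and function names are hypothetical, and how the restorer actually patches the old ID inside the restored process image is left out.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* One entry per queue recorded in a checkpoint (hypothetical layout). */
struct queue_map {
    key_t key;      /* key the program passed to msgget() */
    int old_id;     /* ID it held when the checkpoint was taken */
    int new_id;     /* ID the kernel hands out after the restore */
};

/* Re-obtain each queue by its key and record the current ID.
 * Returns 0 on success, -1 on the first failure. */
int remap_queues(struct queue_map *map, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        map[i].new_id = msgget(map[i].key, IPC_CREAT | 0600);
        if (map[i].new_id == -1)
            return -1;
    }
    return 0;
}

/* Translate an ID found in the restored process to the live one.
 * Returns -1 if the old ID was not recorded in the checkpoint. */
int translate_id(const struct queue_map *map, size_t n, int old_id)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].old_id == old_id)
            return map[i].new_id;
    return -1;
}
```

Because the kernel never has to reproduce a particular ID, this works even when the old ID is already taken by some unrelated queue after the reboot.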

Regards, Jens
--
\ Jens Thoms Toerring ___ Jens.Toerring@physik.fu-berlin.de
\__________________________ http://www.toerring.de
Tingo

2005-02-16, 5:57 pm

Thank you for taking the time to reply to my posts. I've taken on board
the issues you've raised and I think I have a good idea for an
implementation.

Thanks again.


Copyright 2003 - 2008 webservertalk.com