Reading code of a function?
    Reading code of a function?  
Michael B Allen




 
01-17-07 06:33 AM

Is there any way to reliably read the actual code of a function?

If the function is statically linked you can simply access the symbol. For
example, I can 'hexdump' the hash_str code like:

hexdump(stdout, hash_str, 128, 16);
output:
00000:  55 89 e5 83 ec 0c c7 45 fc 05 15 00 00 8b 45 08  |U......E......E.|
00010:  89 45 f8 83 7d 0c 00 74 09 8b 45 08 03 45 0c 89  |.E..}..t..E..E..|
...

Looking at objdump -d I can confirm the above is indeed correct.

However, if the symbol is *dynamically* linked, the hexdump output yields
repetitive garbage:

00000:  ff 25 84 c5 04 08 68 18 01 00 00 e9 b0 fd ff ff  |.%....h.........|
00010:  ff 25 88 c5 04 08 68 20 01 00 00 e9 a0 fd ff ff  |.%....h ........|
...

Any suggestions?

Thanks,
Mike









    Re: Reading code of a function?  
Paul Pluzhnikov




 
01-17-07 06:33 AM

Michael B Allen <mba2000@ioplex.com> writes:

> Is there any way to reliably read the actual code of a function?

Sure: if the processor can read it, so can you.

> Any suggestions?

Your question is not very clear (at least not to me).

In particular, it is difficult to understand what you are talking
about here:
> However, if the symbol is *dynamically* linked, the hexdump output yields
> repetitive garbage:
>
> 00000:  ff 25 84 c5 04 08 68 18 01 00 00 e9 b0 fd ff ff  |.%....h.........|

Surely you are not expecting to find any executable code at offset
0 in the object file? If the offset 0 was "just an example", what
offset did you *actually* use (and how did you arrive at it)?

Perhaps you should ask your question again, after reading this:
http://catb.org/~esr/faqs/smart-questions.html

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.








    Re: Reading code of a function?  
Alan Curry




 
01-17-07 06:33 AM

In article <pan.2007.01.17.02.06.37.432404@ioplex.com>,
Michael B Allen  <mba2000@ioplex.com> wrote:
>
>However, if the symbol is *dynamically* linked, the hexdump output yields
>repetitive garbage:
>
>00000:  ff 25 84 c5 04 08 68 18 01 00 00 e9 b0 fd ff ff  |.%....h.........|
>00010:  ff 25 88 c5 04 08 68 20 01 00 00 e9 a0 fd ff ff  |.%....h ........|
>

Looks like you got a dump of the PLT entry. It's not garbage, it's executable
code and it's part of the dynamic linking process. It's repetitive because
there's an entry for every dynamically linked function, and each entry is
very short (16 bytes in your example).

>Any suggestions?

Disassemble those first few bytes, and look at what they do. The first time
that location is executed, it calls the dynamic linker to find the function
of the proper name and jump into it. It also stores the result so that the
next time the PLT entry is executed, it jumps directly to the function
without going through the lookup again.
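
To make that concrete, here is a minimal sketch in C that decodes one 16-byte entry from the dump above without a disassembler. It assumes the usual i386 PLT layout (an indirect `jmp` through a 32-bit absolute address, a `push` of a 32-bit immediate, then a relative `jmp`); the names `plt_entry`, `le32`, `decode_plt_entry` and `print_plt_entry` are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded fields of one 16-byte i386 PLT entry (hypothetical helper type). */
struct plt_entry {
    uint32_t got_slot;   /* address the first jmp loads its target from */
    uint32_t reloc_arg;  /* value pushed for the dynamic linker */
    int32_t  jmp_rel;    /* displacement of the last jmp, from the entry's end */
};

/* Read a 32-bit little-endian value. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Decode "ff 25 <abs32>  68 <imm32>  e9 <rel32>".
 * Returns 0 on success, -1 if the opcode bytes do not match that pattern.
 */
static int decode_plt_entry(const unsigned char b[16], struct plt_entry *out)
{
    assert(b != NULL && out != NULL);
    if (b[0] != 0xff || b[1] != 0x25 || b[6] != 0x68 || b[11] != 0xe9)
        return -1;
    out->got_slot  = le32(b + 2);
    out->reloc_arg = le32(b + 7);
    out->jmp_rel   = (int32_t)le32(b + 12);  /* two's complement assumed */
    return 0;
}

/* Print the entry roughly the way a disassembler would. */
static void print_plt_entry(const struct plt_entry *e)
{
    printf("jmp  *0x%08lx\n", (unsigned long)e->got_slot);
    printf("push $0x%lx\n", (unsigned long)e->reloc_arg);
    printf("jmp  .%+ld\n", (long)e->jmp_rel);
}
```

Fed the first line of the dump, this yields a GOT slot of 0x0804c584, a pushed value of 0x118 and a displacement of -592. The second line gives 0x0804c588, 0x120 and -608. Since the second entry sits 16 bytes further on, both final jumps land on the same address, presumably the common stub at the start of the PLT that hands control to the dynamic linker.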

If you want to make a program that can inspect itself as easily as with gdb,
you're in for a lot of work.

--
Alan Curry
pacman@world.std.com








    Re: Reading code of a function?  
Michael B Allen




 
01-17-07 06:17 PM

On Wed, 17 Jan 2007 07:18:10 +0000, Alan Curry wrote:

> In article <pan.2007.01.17.02.06.37.432404@ioplex.com>,
> Michael B Allen  <mba2000@ioplex.com> wrote: 
>
> Looks like you got a dump of the PLT entry. It's not garbage, it's executable
> code and it's part of the dynamic linking process. It's repetitive because
> there's an entry for every dynamically linked function, and each entry is
> very short (16 bytes in your example).

Well, I need the actual .text of the function so that I can copy it
into a buffer, cast it to a function pointer and call it.
 
>
> Disassemble those first few bytes, and look at what they do. The first time
> that location is executed, it calls the dynamic linker to find the function
> of the proper name and jump into it. It also stores the result so that the
> next time the PLT entry is executed, it jumps directly to the function
> without going through the lookup again.
>
> If you want to make a program that can inspect itself as easily as with gdb,
> you're in for a lot of work.

Not quite what I need. Perhaps I should explain a little further.

I have a data structure in shared memory being accessed by multiple
processes. This structure represents an ADT that uses a hash function
supplied by the user when the ADT is initialized. However, because a
pointer in one process does not necessarily have the same value within
another I cannot simply store a pointer to the hash function within the
structure. Instead, I copy the hash function's .text into shared memory
and store its offset relative to the beginning of the shared mem.
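
The offset bookkeeping itself can be sketched as follows, shown here for plain data only. This is a minimal illustration under assumed names (`shm_header`, `shm_ptr`, `shm_set_name`, `shm_get_name` are all hypothetical); it says nothing about whether copied machine code remains valid at its new address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Header placed at the very start of the shared region (hypothetical layout). */
struct shm_header {
    size_t name_off;   /* byte offset of a string from the region start; 0 = none */
};

/* Turn an offset stored in the region into a pointer valid in this process. */
static void *shm_ptr(void *base, size_t off)
{
    return off ? (char *)base + off : NULL;
}

/* Copy `s` into the region at `off` and record the offset, never the pointer. */
static void shm_set_name(void *base, size_t off, const char *s)
{
    struct shm_header *h = base;

    assert(off >= sizeof(*h));          /* do not overwrite the header */
    strcpy((char *)base + off, s);
    h->name_off = off;
}

/* Resolve the stored offset against wherever this process mapped the region. */
static const char *shm_get_name(void *base)
{
    const struct shm_header *h = base;

    return shm_ptr(base, h->name_off);
}
```

Each process passes its own `base` (the address at which it attached the segment), so the same stored offset resolves correctly even when the mappings differ.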

Yeah, I could pass a pointer to the hash function as a parameter every
time a process calls one of the ADT functions but that would be
pretty ugly.

Anyway, I have found a method that seems to work - dlsym returns the
.text of the function. For now I suppose I'm satisfied with that but
clearly I'll have to tweak things when porting to different platforms.

Mike









    Re: Reading code of a function?  
Alan Curry




 
01-18-07 12:28 AM

In article <pan.2007.01.17.15.59.04.967179@ioplex.com>,
Michael B Allen  <mba2000@ioplex.com> wrote:
>On Wed, 17 Jan 2007 07:18:10 +0000, Alan Curry wrote:
>
>I have a data structure in shared memory being accessed by multiple
>processes. This structure represents an ADT that uses a hash function
>supplied by the user when the ADT is initialized. However, because a
>pointer in one process does not necessarily have the same value within
>another I cannot simply store a pointer to the hash function within the
>structure. Instead, I copy the hash function's .text into shared memory
>and store its offset relative to the beginning of the shared mem.

What if the hash function calls a helper function that isn't part of your
clever scheme? Aren't you back where you started, with different addresses in
different processes? The PLT is just a particular quirky case of this, a
small wrapper function that locates and calls another function.

>Anyway, I have found a method that seems to work - dlsym returns the
>.text of the function. For now I suppose I'm satisfied with that but
>clearly I'll have to tweak things when porting to different platforms.

dlsym sounds like the right answer for the immediate problem, but the whole
exercise still sounds ugly to me. How do you decide how many bytes to copy? I
hope you don't think that a compiled function necessarily ends at the first
ret instruction.

--
Alan Curry
pacman@world.std.com








    Re: Reading code of a function?  
Michael B Allen




 
01-18-07 12:28 AM

On Wed, 17 Jan 2007 21:19:10 +0000, Alan Curry wrote:

> In article <pan.2007.01.17.15.59.04.967179@ioplex.com>,
> Michael B Allen  <mba2000@ioplex.com> wrote: 
>
> What if the hash function calls a helper function that isn't part of your
> clever scheme? Aren't you back where you started, with different addresses in
> different processes? The PLT is just a particular quirky case of this, a
> small wrapper function that locates and calls another function.

Right. The hash function cannot and does not call other functions.
 
>
> dlsym sounds like the right answer for the immediate problem, but the whole
> exercise still sounds ugly to me. How do you decide how many bytes to copy? I
> hope you don't think that a compiled function necessarily ends at the first
> ret instruction.

Uh, right. Yeah, uh, I knew that. That's because uh, mmm ... err *why*
can't you get the size from objdump?

Mike









    Re: Reading code of a function?  
Paul Pluzhnikov




 
01-18-07 06:32 AM

Michael B Allen <mba2000@ioplex.com> writes:
 

The function is also not allowed to access any global data, because
in PIC code such access is indirected, and the "global" pointer
will not be properly set up when function code is copied elsewhere.
> Right. The hash function cannot and does not call other functions.

On platforms that maintain separate '%gp' register this function
will not be able to access any non-immediate data at all. I think
PowerPC, PA-RISC, MIPS, and ia64 will all present a problem.
 

Yes, we told Michael that about 14 months ago:
http://groups.google.com/group/comp...c9202fc
>
> err *why* can't you get the size from objdump?

Objdump doesn't understand some file formats at all, and often
gives you incorrect info on others.

You can get the size by examining disassembly, but you'll have
to repeat the exercise for each user-supplied function, for each
platform, each compiler, and each set of compilation flags.

And hope that the user didn't turn on compiler and linker optimizations
which could split a single function into several "chunks" and
scatter them all over the DSO (this actually happens a lot in
x64 DLLs compiled with VS 2005, but I haven't seen this on any
UNIX yet).

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.








    Re: Reading code of a function?  
Michael B Allen




 
01-18-07 06:32 AM

Ok, I think I may have solved this problem. This solution may even
satisfy Paul :-)

The ADT (a hashmap) initialization routine could use dladdr to get the
name of the "shared function" (e.g. "hash_str") and place *it* in shared
memory. When the function needs to be called it uses dlsym on the name
to get the function. However, calling dlsym each time the function needs
to be resolved would be prohibitively slow so the function pointer would
have to be cached in a global table containing the function name and its
address. Because the global is not in shared memory each process will
have its own table with the correct address for that process as supplied
by dlsym. Searching the table will introduce a slight performance impact
but the table would only have a few entries (the number of unique shared
functions used throughout the program which for my current application
would be two).

I think that would yield acceptable performance and it would allow the
shared functions to call other functions, use globals, etc. I wouldn't
need to know the size of the .text or store anything architecture
specific.
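
A minimal sketch of such a per-process cache, under some loud assumptions: the hash signature is invented, the table size is arbitrary, and the lookup step is passed in as a callback (`resolver_fn`) so the sketch can be exercised without a real shared library. In a real program that callback would presumably wrap dlsym(). All names here (`fn_cache`, `fn_cache_get`, `demo_hash`, `demo_resolve`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_MAX    8    /* only a handful of entries are expected */
#define NAME_MAX_LEN 64

/* Hypothetical hash signature; the real one is not shown in the thread. */
typedef unsigned long (*hash_fn)(const char *s);

/* Looks a function up by name; a real one would wrap dlsym() and cast. */
typedef hash_fn (*resolver_fn)(const char *name);

/* Lives in ordinary per-process memory, never in the shared segment. */
struct fn_cache {
    size_t n;
    struct {
        char    name[NAME_MAX_LEN];
        hash_fn fn;
    } e[CACHE_MAX];
    resolver_fn resolve;
};

/* Return the function for `name`, calling the resolver only on a miss. */
static hash_fn fn_cache_get(struct fn_cache *c, const char *name)
{
    size_t i;
    hash_fn fn;

    assert(c != NULL && c->resolve != NULL);
    for (i = 0; i < c->n; i++)
        if (strcmp(c->e[i].name, name) == 0)
            return c->e[i].fn;           /* hit: no resolver call */

    fn = c->resolve(name);               /* slow path */
    if (fn != NULL && c->n < CACHE_MAX && strlen(name) < NAME_MAX_LEN) {
        strcpy(c->e[c->n].name, name);
        c->e[c->n].fn = fn;
        c->n++;
    }
    return fn;
}

/* Stand-ins for a real hash function and a real dlsym()-based resolver. */
static int demo_resolve_calls;

static unsigned long demo_hash(const char *s)
{
    unsigned long h = 0;

    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h;
}

static hash_fn demo_resolve(const char *name)
{
    demo_resolve_calls++;
    return strcmp(name, "hash_str") == 0 ? demo_hash : NULL;
}
```

With `struct fn_cache c = { .resolve = demo_resolve };`, two calls to `fn_cache_get(&c, "hash_str")` invoke the resolver once. Failed lookups are deliberately not cached, so a name that becomes resolvable later is still found.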

Sound like a plan?

Mike









    Re: Reading code of a function?  
Logan Shaw




 
01-18-07 06:32 AM

Michael B Allen wrote:
> Ok, I think I may have solved this problem. This solution may even
> satisfy Paul :-)
>
> The ADT (a hashmap) initialization routine could use dladdr to get the
> name of the "shared function" (e.g. "hash_str") and place *it* in shared
> memory. When the function needs to be called it uses dlsym on the name
> to get the function. However, calling dlsym each time the function needs
> to be resolved would be prohibitively slow so the function pointer would
> have to be cached in a global table containing the function name and its
> address. Because the global is not in shared memory each process will
> have its own table with the correct address for that process as supplied
> by dlsym. Searching the table will introduce a slight performance impact
> but the table would only have a few entries (the number of unique shared
> functions used throughout the program which for my current application
> would be two).
>
> I think that would yield acceptable performance and it would allow the
> shared functions to call other functions, use globals, etc. I wouldn't
> need to know the size of the .text or store anything architecture
> specific.
>
> Sound like a plan?

That's the basic direction I think I'd go, but with two changes:

(1)  I would pass the name of the dynamic library to load with dlsym()
instead of the name of the function.  The name of the function
would be a fixed part of the library's interface.

(2)  I would store the list of library names in an array in the shared
memory area.  If you do that, the abstract data type can use the
array index as the piece of information that identifies which
hash function to use.  And, the various processes that share this
abstract data type can use the same index to find the pointer to the
function (the result of dlsym()) in their own array of pointers,
thus making lookup really fast.
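
A rough sketch of point (2), with invented names and sizes throughout (`shared_libs`, `local_fns`, `lib_index`, `local_lookup`, `MAX_LIBS` and the hash signature are all assumptions). The step that fills in each process's private array, presumably via dlopen() and dlsym(), is left to the caller, and a real version would also need locking around the shared table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LIBS     4
#define LIB_NAME_MAX 64
#define NO_INDEX     ((size_t)-1)

/* Hypothetical hash signature. */
typedef unsigned long (*hash_fn)(const char *s);

/* Shared between processes: plain data only, no pointers. */
struct shared_libs {
    size_t count;
    char   names[MAX_LIBS][LIB_NAME_MAX];
};

/* Private to each process: fn[i] is what this process resolved for names[i]. */
struct local_fns {
    hash_fn fn[MAX_LIBS];
};

/* Return the index of `name`, adding it if needed; NO_INDEX if it does not fit. */
static size_t lib_index(struct shared_libs *s, const char *name)
{
    size_t i;

    for (i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return i;
    if (s->count == MAX_LIBS || strlen(name) >= LIB_NAME_MAX)
        return NO_INDEX;
    strcpy(s->names[s->count], name);
    return s->count++;                   /* old count is the new entry's index */
}

/* Constant-time lookup: the index stored in the shared ADT selects the slot. */
static hash_fn local_lookup(const struct local_fns *l, size_t idx)
{
    assert(idx < MAX_LIBS);
    return l->fn[idx];
}

/* Stand-in hash function so the lookup can be exercised. */
static unsigned long demo_hash(const char *s)
{
    unsigned long h = 0;

    while (*s)
        h += (unsigned char)*s++;
    return h;
}
```

The shared abstract data type would then store only the `size_t` index. Each process fills `fn[idx]` once when it attaches, and every later call is a single array access.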

- Logan








    Re: Reading code of a function?  
Paul Pluzhnikov




 
01-18-07 06:32 AM

Logan Shaw <lshaw-usenet@austin.rr.com> writes:

> Michael B Allen wrote:
 

There are several gotchas with this solution ...

The first gotcha is that now every "client" process has to provide
exported function "hash_str".

The second gotcha is that these hash_str()s had better be identical or
at least compatible. If they are not, you'll have difficult-to-debug
ADT corruption.

Presumably you have some routine that all clients call to attach
to the ADT in shared memory. That is a good time to perform dlsym()
and store resulting pointer:

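#include <dlfcn.h>
#include <stdlib.h>

typedef int (*HASHFN)(const char *);
typedef struct {
    ..
    void *addr;        /* where this process attached the shared memory */
    HASHFN hashfunc;   /* resolved once per process, at attach time */
} ADT;

/* Per-process handle: allocated with malloc, not placed in shmem. */
ADT *attach(void)
{
    ADT *p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->addr = ...;     /* attaches shmem */
    p->hashfunc = (HASHFN)dlsym(...);
    return p;
}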

After that there is no need for any searching -- the function
pointer is "right there".
> That's the basic direction I think I'd go, but with two changes:
>
> (1)  I would pass the name of the dynamic library to load with dlsym()
>       instead of the name of the function.  The name of the function
>       would be a fixed part of the library's interface.

This addresses the hash_str() mismatch.

I would use *absolute* pathname to the library (which avoids
possibility that two clients load two different dynamic libraries
which are both named "hash.so", e.g. because they have different
LD_LIBRARY_PATH, and end up with incompatible hash_str() implementations).

> (2)  I would store the list of library names in an array in the shared
>       memory area.

I don't see the need for a list of shared libs, but perhaps I missed
something ...

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.







