proper way to determine string length
|
|
|
| Often i come across a situation where i need to terminate the string. I
usually use a construct like:
buffer[strlen(buffer)-1]='\0';
And this works fine. But, as i understand it, strlen itself also
searches for the '\0', so that seems a bit
redundant. Is this the proper way to do it? How does anyone else
terminate a string?
Thanks for your answer (and sorry for the somewhat stupid question)
Kind regards,
atv
| |
|
|
atv wrote:
> Often i come across a situation where i need to terminate the string. I
> usually use a construct like:
> buffer[strlen(buffer)-1]='\0';
>
> And this works fine. But, as i understand it, strlen itself also
> searches for the '\0', so that seems a bit
> redundant.Is this the proper way to do it ? How does anyone else
> terminate a string.
You need to know its length by some other means (the missing context of
your question). As you say above, strlen() is for terminated strings.
>
> Thanks for your answer (and sorry for the somewhat stupid question)
>
> Kind regards,
> atv
| |
| Pascal Bourguignon 2006-12-16, 1:20 pm |
| atv <alef@xs4all.nl> writes:
> Often i come across a situation where i need to terminate the
> string. I usually use a construct like:
> buffer[strlen(buffer)-1]='\0';
>
> And this works fine.
This can raise SIGSEGV signals or garble random data.
You've just been lucky so far.
> But, as i understand it, strlen itself also
> searches for the '\0', so that seems a bit
> redundant.
Assuming that buffer is already null-terminated and more than one
character long, what
buffer[strlen(buffer)-1]='\0';
does is reduce the length by 1. When the buffer is not
null-terminated, it has undefined effects. Often, on a Unix system,
it'll find the next null byte following the buffer in memory, and set
the previous byte to 0. If it reaches unmapped or mapped but unreadable
memory, it will signal a SIGBUS or SIGSEGV. Otherwise, it'll modify
some random byte and this can lead to random bugs.
If the buffer contains an empty string, it'll set the byte just before
the buffer to 0, and this can also signal a SIGBUS or SIGSEGV or
modify a "random" byte generating a random bug (for example, it could
modify a byte used by malloc/free and XXXX them up).
> Is this the proper way to do it ?
> How does anyone else terminate a string.
You must know the length of your string and write:
buffer[length]='\0';
The best way is to write an abstract string type, for example as a
structure:
typedef struct {
size_t allocated;
size_t length;
char* bytes;
} *string;
and the associated library of string manipulation functions, so you
always know the size and length of any string and you can raise errors
when an index is greater than the length of the string, or reallocate
the string when the length grows bigger than the allocated size (and
string_length can be implemented in O(1) instead of O(n) for strlen).
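A minimal sketch of such a length-tracked string, in case it helps to see it spelled out. The field names follow the structure above, but the function names (string_init, string_append, string_length, string_free) and the initial capacity of 16 are assumptions made for illustration, and here the typedef names the struct itself rather than a pointer to it. Overflow of the size arithmetic is not checked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracked string type. */
typedef struct {
    size_t allocated;   /* bytes reserved in bytes, including the '\0' */
    size_t length;      /* characters stored, excluding the '\0' */
    char  *bytes;
} string;

/* Create an empty string; returns 0 on success, -1 if malloc fails. */
int string_init(string *s)
{
    assert(s != NULL);
    s->allocated = 16;
    s->length = 0;
    s->bytes = malloc(s->allocated);
    if (s->bytes == NULL)
        return -1;
    s->bytes[0] = '\0';
    return 0;
}

/* O(1): the length is stored, so no scan for '\0' is needed. */
size_t string_length(const string *s)
{
    return s->length;
}

/* Append n bytes from src, growing the allocation when needed.
   Returns 0 on success, -1 if realloc fails (s is left unchanged). */
int string_append(string *s, const char *src, size_t n)
{
    size_t needed = s->length + n + 1;
    if (needed > s->allocated) {
        size_t grown = s->allocated * 2;
        if (grown < needed)
            grown = needed;
        char *p = realloc(s->bytes, grown);
        if (p == NULL)
            return -1;
        s->bytes = p;
        s->allocated = grown;
    }
    memcpy(s->bytes + s->length, src, n);
    s->length += n;
    s->bytes[s->length] = '\0';   /* length is known: bytes[length] = '\0' */
    return 0;
}

/* Release the storage and reset the bookkeeping. */
void string_free(string *s)
{
    free(s->bytes);
    s->bytes = NULL;
    s->length = 0;
    s->allocated = 0;
}
```

Because every write goes through string_append, the terminator is always placed at bytes[length] and never located with strlen().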
> Thanks for your answer (and sorry for the somewhat stupid question)
No problem.
--
__Pascal Bourguignon__ http://www.informatimago.com/
NEW GRAND UNIFIED THEORY DISCLAIMER: The manufacturer may
technically be entitled to claim that this product is
ten-dimensional. However, the consumer is reminded that this
confers no legal rights above and beyond those applicable to
three-dimensional objects, since the seven new dimensions are
"rolled up" into such a small "area" that they cannot be
detected.
| |
| loic-dev@gmx.net 2006-12-16, 7:26 pm |
| Hello,
> Often i come across a situation where i need to terminate the string. I
> usually use a construct like:
> buffer[strlen(buffer)-1]='\0';
>
> And this works fine.
Not really. If the string is not '\0' terminated, this code is invalid
since (as you noticed) /strlen()/ looks for the terminating '\0'.
And if the string is '\0' terminated, then why would you need to
terminate the string? Plus as noted by Pascal already, you actually
erase the last character of the string in that case (assuming that the
string is non empty...).
So, is that really what you want? Probably not.
What you could do for instance is to first fill the buffer with '\0'
(you should know the size of the buffer, shouldn't you?). And then copy
your string to that buffer (ensuring that you do not have a buffer
overflow, and leaving at least the final byte untouched). Doing so
ensures that the string is '\0' terminated, even if you copy several
strings at different times.
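A small sketch of that approach, assuming the caller knows the buffer size. The helper name copy_zero_filled and its return convention are made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: zero all dst_size bytes of dst, then copy at most
   dst_size - 1 bytes of src so the last byte stays '\0'.
   Returns 1 if src fit completely, 0 if it was truncated. */
int copy_zero_filled(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t len = strlen(src);
    size_t n = len < dst_size ? len : dst_size - 1;
    memset(dst, 0, dst_size);   /* every byte starts out as a terminator */
    memcpy(dst, src, n);        /* never touches dst[dst_size - 1] */
    return len < dst_size;
}
```

With a `char buf[8]`, copying "0123456789" stores "0123456" and reports truncation instead of overflowing.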
Cheers,
Loic.
| |
| Barry Margolin 2006-12-17, 1:37 am |
| In article <87r6uzhgm7.fsf@thalassa.informatimago.com>,
Pascal Bourguignon <pjb@informatimago.com> wrote:
> Assuming that buffer is already null-terminated and more than one
> character long, what
>
> buffer[strlen(buffer)-1]='\0';
>
> does is it reduce the length by 1.
I suspect this idiom is most often used to get rid of the newline at the
end of a buffer filled by fgets().
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
| |
|
|
Barry Margolin wrote:
> In article <87r6uzhgm7.fsf@thalassa.informatimago.com>,
> Pascal Bourguignon <pjb@informatimago.com> wrote:
>
> I suspect this idiom is most often used to get rid of the newline at the
> end of a buffer filled by fgets().
...and fgets() NUL-terminates its strings.
| |
| Michal Nazarewicz 2006-12-17, 7:24 am |
| Barry Margolin <barmar@alum.mit.edu> writes:
> In article <87r6uzhgm7.fsf@thalassa.informatimago.com>,
> Pascal Bourguignon <pjb@informatimago.com> wrote:
>
> I suspect this idiom is most often used to get rid of the newline at the
> end of a buffer filled by fgets().
That idiom is valid only to some extent, as the last character of a
string read by fgets() need not be a newline character.
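For the fgets() case, one way to drop the newline without assuming it is there is strcspn(), which returns the index of the first '\n' or, if there is none, the index of the terminating '\0'. The helper name chomp is just an illustrative choice.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: cut the line at the first '\n', if there is one.
   Returns the resulting length. */
size_t chomp(char *line)
{
    assert(line != NULL);
    size_t n = strcspn(line, "\n");  /* index of first '\n', or of the '\0' */
    line[n] = '\0';                  /* overwrites '\n' or rewrites '\0' */
    return n;
}
```

This is safe for an empty string and for a last line that lacks a newline, the two cases where buffer[strlen(buffer)-1]='\0' goes wrong.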
--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86*tlen.pl>---<jid:mina86*chrome.pl>--ooO--(_)--Ooo--
| |
| Michal Nazarewicz 2006-12-17, 7:24 am |
| atv <alef@xs4all.nl> writes:
> Often i come across a situation where i need to terminate the
> string. I usually use a construct like:
> buffer[strlen(buffer)-1]='\0';
If you declare buffer as an array of char (i.e. `char buffer[1024];`)
what you want is probably this: `buffer[sizeof buffer - 1] = 0;` If
buffer is merely a pointer and you used malloc to allocate memory for
it, you need to use `buffer[NUMBER_OF_BYTES_YOU_ALOCATED - 1] = 0;`
| |
|
|
Michal Nazarewicz wrote:
> atv <alef@xs4all.nl> writes:
>
>
> If you declare buffer as an array of char (ie. `char buffer[1024];`)
> what you want is probably this: `buffer[sizeof buffer - 1] = 0;`
Assuming the string in the buffer is actually that long. Without
context from the OP, Barry's theory seems more plausible (in which case
the terminator can be assumed).
> If
> buffer is merely a pointer and you used malloc to allocate memory for
> it you need to use `buffer[NUMBER_OF_BYTES_YOU_ALOCATED - 1] = 0;`
| |
| Rainer Temme 2006-12-17, 1:16 pm |
| atv wrote:
> Often i come across a situation where i need to terminate the string. I
> usually use a construct like:
> buffer[strlen(buffer)-1]='\0';
> And this works fine.
No, it definitely does NOT.
Just consider what happens if the string within the
buffer has a length of zero.
strlen(buffer)-1 then wraps around to SIZE_MAX (strlen returns an
unsigned size_t), which in practice behaves like an index of -1,
and what you do is to execute the following assignment:
buffer[-1]='\0';
This is asking for trouble!
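A guarded version of the idiom, as a sketch. The helper name drop_last_char is invented for this example, and it assumes the string is already terminated.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: drop the last character of a terminated string.
   Returns the new length; an empty string is left untouched. */
size_t drop_last_char(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len == 0)          /* without this check, len - 1 wraps around */
        return 0;
    s[len - 1] = '\0';
    return len - 1;
}
```

The length check is the whole point: size_t is unsigned, so subtracting 1 from 0 yields the largest size_t value rather than a negative number.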
Rainer
| |
|
| On 2006-12-17 15:05:37 +0100, Rainer Temme
<Rainer.Temme@NoSpam.Siemens.Com> said:
Thanks everyone for your help. This is really helpful. I was just
testing something regarding strlen and the string functions and i came
upon a weird (unexpected) result.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>       /* ctime() */
#include <sys/time.h>   /* gettimeofday(), struct timeval */
#include <unistd.h>
char timestamp[256];
int main(void)
{
    struct timeval tv;
    if(gettimeofday(&tv,NULL)==-1)
        strcpy(timestamp,"Time is not available");
    else
        strcpy(timestamp,ctime(&tv.tv_sec));
    printf("%s",timestamp);
    printf("%d",(int)strlen(timestamp));
    return(0);
}
If i use strncpy here i get the same result as with strcpy! I expected
strcpy to give me a segfault or something, because strcpy only *copies*
the '\0' character from the original passed string it doesn't terminate
a string by itself (at least according to my man pages). But it behaves
just the same as with strncpy! Unless i sort of implicitly pass a '\0'
character when i give the function a string, i don't understand what's
going on.
What am i doing wrong now :-)
Again, many thanks for helping me understand. I appreciate the time you
put into this :-)
ps is it ok/normal to cast size_t into int? Because i don't know what other
conversion modifier there is to use for size_t
| |
| Pascal Bourguignon 2006-12-17, 1:17 pm |
| atv <alef@xs4all.nl> writes:
> On 2006-12-17 15:05:37 +0100, Rainer Temme
> <Rainer.Temme@NoSpam.Siemens.Com> said:
>
>
> Thanks everyone for your help. This is really helpful. I was just
> testing something regarding strlen and the string functions and i came
> upon a weird (unexpected) result.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> char timestamp[256];
>
> int main(void)
> {
> struct timeval tv;
>
> if(gettimeofday(&tv,NULL)==-1)
> strcpy(timestamp,"Time is not available");
> else
> strcpy(timestamp,ctime(&tv.tv_sec));
>
> printf("%s",timestamp);
> printf("%d",(int)strlen(timestamp));
>
> return(0);
> }
>
> If i use strncpy here i get the same result as with strcpy! I expected
> strcpy to give me a segfault or something, because strcpy only
> *copies* the '\0' character from the original passed string it doesn't
> terminate a string by itself (at least according to my man pages).
Yes. So what difference does it make?
const char string[]="Hi!";
a[i]='\0';
a[i]=string[3];
These two assignments have the same effect! They both put a null byte in a[i].
> But
> it behaves just the same as with strncpy! Unless i sort of implicitly
> pass a '\0' character when i give the function a string, i don't
> understand what's going on.
Read again man strncpy. It doesn't do the same as strcpy at all!
strcpy always copies the null byte, so the destination always contains a
null terminated string.
But strncpy doesn't always copy the null byte. When the buffer is
smaller than the string, then it will be filled with non-null bytes and
no null terminating byte will be written!
> What am i doing wrong now :-)
You're not reading the man page well enough ;-)
> Again, many thanks for helping me understand. I appreciate the time
> you put into this :-)
> ps is it ok/normal to cast size_t into int? Because i don't what other
> conversion modifier there is to use for size_t
It's possible, but you may have problems when size_t is unsigned int
and contains a value bigger than INT_MAX. Then you get as an int a
negative value! It would be better to declare variables of type
size_t to manipulate data of type size_t.
--
__Pascal Bourguignon__ http://www.informatimago.com/
Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay
| |
| Måns Rullgård 2006-12-17, 1:17 pm |
| atv <alef@xs4all.nl> writes:
> Thanks everyone for your help. This is really helpful. I was just
> testing something regarding strlen and the string functions and i came
> upon a weird (unexpected) result.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> char timestamp[256];
>
> int main(void)
> {
> struct timeval tv;
>
> if(gettimeofday(&tv,NULL)==-1)
> strcpy(timestamp,"Time is not available");
> else
> strcpy(timestamp,ctime(&tv.tv_sec));
>
> printf("%s",timestamp);
> printf("%d",(int)strlen(timestamp));
>
> return(0);
> }
>
> If i use strncpy here i get the same result as with strcpy! I expected
> strcpy to give me a segfault or something, because strcpy only
> *copies* the '\0' character from the original passed string it doesn't
> terminate a string by itself (at least according to my man pages). But
> it behaves just the same as with strncpy! Unless i sort of implicitly
> pass a '\0' character when i give the function a string, i don't
> understand what's going on.
>
> What am i doing wrong now :-)
You still have the notion that the C language has a notion of
strings. Any mention of a string in a C context means a
null-terminated sequence of bytes.
The standard definition of strcpy() is this:
The strcpy() function shall copy the string pointed to by s2
(including the terminating null byte) into the array pointed to by
s1. If copying takes place between objects that overlap, the
behavior is undefined.
In other words, strcpy() copies bytes from one place to another until
it encounters, and copies, a null byte. Since ctime() returns a null
terminated string, all is well as long as that string fits into your
target buffer.
strncpy() is a bit more complicated. Quoting the standard:
The strncpy() function shall copy not more than n bytes (bytes that
follow a null byte are not copied) from the array pointed to by s2
to the array pointed to by s1. If copying takes place between
objects that overlap, the behavior is undefined.
If the array pointed to by s2 is a string that is shorter than n
bytes, null bytes shall be appended to the copy in the array pointed
to by s1, until n bytes in all are written.
This means that strncpy() always writes exactly n bytes. If the source
string has a length greater than or equal to n, no null terminator
will be written. If the source is shorter, the destination buffer is
padded with null bytes to the specified length.
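A sketch of how the strncpy() rules quoted above are usually turned into a safe copy: pass one byte less than the buffer size and write the terminator yourself. The helper name copy_terminated and its return value are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: strncpy into dst, then terminate explicitly.
   Returns 1 if src was copied whole, 0 if it was truncated. */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);  /* writes exactly dst_size - 1 bytes */
    dst[dst_size - 1] = '\0';         /* strncpy may not have written a '\0' */
    return strlen(src) < dst_size;
}
```

When src is short, strncpy() pads the rest of dst with null bytes; when src is too long, the explicit assignment supplies the terminator that strncpy() left out.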
In your example
> Again, many thanks for helping me understand. I appreciate the time
> you put into this :-)
> ps is it ok/normal to cast size_t into int? Because i don't what
> other conversion modifier there is to use for size_t
No, that's not OK. Use %zu (C99; %zd is meant for the corresponding signed type).
--
Måns Rullgård
mru@inprovide.com
| |
| Måns Rullgård 2006-12-17, 1:17 pm |
| Pascal Bourguignon <pjb@informatimago.com> writes:
> atv <alef@xs4all.nl> writes:
>
>
> It's possible, but you may have problems when size_t is unsigned int
> and contains a value bigger than MAX_INT. Then you get as an int a
> negative value! It would be better to declare variables of type
> size_t to manipulate data of type size_t.
Many systems have 32-bit int and 64-bit size_t, so there are more ways
it can go wrong.
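A short sketch of printing a size_t without any cast, assuming a C99 library where the z length modifier is available. The helper name describe_length is invented for this example; it formats into a caller-supplied buffer so the result is easy to inspect.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "<text> has length <n>" into out.
   Returns what snprintf returns (characters needed, excluding the '\0'). */
int describe_length(char *out, size_t out_size, const char *text)
{
    assert(out != NULL && text != NULL);
    size_t len = strlen(text);   /* keep size_t values in size_t variables */
    return snprintf(out, out_size, "%s has length %zu", text, len);
}
```

On a pre-C99 library without %zu, a common fallback is to cast to unsigned long and print with %lu, which at least avoids the sign problem of casting to int.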
--
Måns Rullgård
mru@inprovide.com
| |
|
| On 2006-12-17 17:27:14 +0100, Måns Rullgård <mru@inprovide.com> said:
> atv <alef@xs4all.nl> writes:
>
> You still have the notion that the C language has a notion of
> strings. Any mention of a string in a C context means a
> null-terminated sequence of bytes.
I don't have that notion. And does :
> Any mention of a string in a C context means a
> null-terminated sequence of bytes.
mean that "string" is not a string (since i did not specify a null char) ?
>
> The standard definition of strcpy() is this:
>
> The strcpy() function shall copy the string pointed to by s2
> (including the terminating null byte) into the array pointed to by
> s1. If copying takes place between objects that overlap, the
> behavior is undefined.
Well yes, but what if there _is_ no null byte (as in example, i do a
strcpy(buffer,"string");
I did not specify a null byte here.
>
> In other words, strcpy() copies bytes from one place to another until
> it encounters, and copies, a null byte. Since ctime() returns a null
> terminated string, all is well as long as that string fits into your
> target buffer.
>
> strncpy() is a bit more complicated. Quoting the standard:
>
>
> No, that's not OK. Use %zd.
Ok. What does the %z stand for? unsigned ?
Also, to propose another idea; if i absolutely knew that i could
guarantee that the string i copied with strncpy would never be equal to or
bigger than the size i specify in the 3rd argument (so that the
remaining space would always be padded with '\0'), could i use such a
construct?
| |
| Pascal Bourguignon 2006-12-17, 1:17 pm |
| atv <alef@xs4all.nl> writes:
> On 2006-12-17 17:27:14 +0100, Måns Rullgård <mru@inprovide.com> said:
>
>
> I don't have that notion. And does :
> mean that "string" is not a string (since i did not specify a null char) ?
You DID specify a null char.
"string" means char literal[]={115,116,114,105,110,103,0}; /* note the final 0 */
>
> Well yes, but what if there _is_ no null byte (as in example, i do a
> strcpy(buffer,"string");
Yes there is.
> I did not specify a null byte here.
Yes you did, by writing "...".
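This can be checked directly, as a small sketch: sizeof counts the hidden terminator of a literal, strlen() does not. The names from_literal, from_chars and has_terminator are made up for the example.

```c
#include <assert.h>
#include <string.h>

/* Initialised from a literal: 6 letters plus the implicit '\0' = 7 bytes. */
const char from_literal[] = "string";

/* Initialised byte by byte: 6 bytes, and no terminator unless you add one. */
const char from_chars[6] = {'s', 't', 'r', 'i', 'n', 'g'};

/* Hypothetical helper: 1 if the n-byte array holds a '\0' somewhere. */
int has_terminator(const char *bytes, size_t n)
{
    assert(bytes != NULL);
    return memchr(bytes, '\0', n) != NULL;
}
```

So strcpy(buffer,"string") copies seven bytes, while passing from_chars to strcpy() or strlen() would be an error because it is not a string in the C sense.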
> Also, to propose another idea; if i absolutely knew that i could
> guarantee that the string i copied with strncpy would never be equal
> or bigger then the size i specify in the 3d argument (so that the
> remaining space would always be padded with '\0', could i use such a
> construct?
Yes. But why not just use strcpy?
Actually, I'd advise you, if you have anything serious to do with
strings, to use a true string library over what C has to offer.
See for example (the source code of!):
http://www.zork.org/safestr/
http://bstring.sourceforge.net/
Cords in BoehmGC:
http://www.hpl.hp.com/personal/Hans_Boehm/gc/
http://www.hpl.hp.com/personal/Hans...ource/cordh.txt
or vstrings in Postfix sources.
http://darcs.informatimago.com/darc...aces/BcString.h
http://darcs.informatimago.com/darc...rces/BcString.c
--
__Pascal Bourguignon__ http://www.informatimago.com/
Nobody can fix the economy. Nobody can be trusted with their finger
on the button. Nobody's perfect. VOTE FOR NOBODY.
| |
| Barry Margolin 2006-12-17, 1:17 pm |
| In article <45857351$0$321$e4fe514c@news.xs4all.nl>,
atv <alef@xs4all.nl> wrote:
> Well yes, but what if there _is_ no null byte (as in example, i do a
> strcpy(buffer,"string");
>
> I did not specify a null byte here.
You seem to be forgetting that C automatically null-terminates literal
strings.
Get out your C textbook, you need to brush up on some basic features of
the language.
| |
|
| On 2006-12-17 18:32:14 +0100, Barry Margolin <barmar@alum.mit.edu> said:
> In article <45857351$0$321$e4fe514c@news.xs4all.nl>,
> atv <alef@xs4all.nl> wrote:
>
>
> You seem to be forgetting that C automatically null-terminates literal strings.
>
> Get out your C textbook, you need to brush up on some basic features of
> the language.
Hmm. Indeed :-). I feel stupid now :-P
I did not know that. So every time i pass a string like "string" to a
function it is auto null-terminated?
But if that's the case, i can't quite grasp the fact that there would
be so much problems with programmers forgetting to null terminate a
string. Or is that just me ?
Thanks Pascal/Barry.
| |
| Gordon Burditt 2006-12-17, 7:32 pm |
| >But if that's the case, i can't quite grasp the fact that there would
>be so much problems with programmers forgetting to null terminate a
>string. Or is that just me ?
If you call read() or fread() the buffer is not guaranteed to be
null-terminated (and it usually won't be, unless you've placed a
null just beyond the end of the buffer). If you copy something
with strncpy() to fit into a limited-size destination field and
prevent buffer overflow, the result is not guaranteed to be
null-terminated if the input could have been at or over the max
length. If you take the address of a character to make a one-character
string, it probably won't be null-terminated and will probably
consist of more than one character.
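For the read() case, the return value already gives the length, so the terminator can be written at buffer[length] without calling strlen(). A minimal sketch, with the helper name read_string being an assumption:

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: read at most buf_size - 1 bytes from fd and
   terminate the result. Returns the byte count, or -1 on error. */
ssize_t read_string(int fd, char *buf, size_t buf_size)
{
    assert(buf != NULL && buf_size > 0);
    ssize_t n = read(fd, buf, buf_size - 1);  /* leave room for the '\0' */
    if (n < 0)
        return -1;
    buf[n] = '\0';   /* read() reported the length, so no strlen() is needed */
    return n;
}
```

Reserving one byte in the size passed to read() is what keeps the final assignment inside the buffer even when the input is longer than expected.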
I don't see that there are "so much problems with programmers
forgetting to null terminate a string" unless you're going to try
to turn every buffer overflow (which IS a big problem) involving
strings into a null-termination problem. Many times there *is* a
string termination, it's just way beyond the end of the buffer (and
it may get clobbered later). Most buffer overflows are caused by
failing to include any logic to limit the size of the input (e.g.
using the evil gets()) and just assuming nobody will do anything
stupid or evil. That's the root cause. A quarter-assed fix will
limit the size of the input (e.g. copy it with strncpy()) but forget
the null termination.