|
Home > Archive > Unix Programming > August 2006 > get all case-sensitive variations of a string
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
get all case-sensitive variations of a string
|
|
| robert.paschedag@web.de 2006-08-18, 7:32 am |
| Hi,
I want to write a function that, when given a string (foobar), returns
an array of strings with all variations of the given one.
Example:
char **my_array = getVariationsOfString("foobar");
my_array[0] = "foobar"
my_array[1] = "foobaR"
my_array[2] = "foobAr"
my_array[3] = "foobAR"
my_array[4] = "fooBar"
my_array[5] = "fooBaR"
my_array[6] = "fooBAr"
my_array[7] = "fooBAR"
my_array[8] = "foObar"
my_array[9] = "foObaR"
....
....
Can anybody give me a hint how to do this?
Thanks.
Robert
| |
|
| robert.paschedag@web.de wrote:
> Hi,
>
> I want to write a function that, when given a string (foobar), returns
> an array of strings with all variations of the given one.
>
> Example:
>
> char **my_array = getVariationsOfString("foobar");
>
> my_array[0] = "foobar"
> my_array[1] = "foobaR"
> my_array[2] = "foobAr"
> my_array[3] = "foobAR"
> my_array[4] = "fooBar"
> my_array[5] = "fooBaR"
> my_array[6] = "fooBAr"
> my_array[7] = "fooBAR"
> my_array[8] = "foObar"
> my_array[9] = "foObaR"
> ...
> ...
>
> Can anybody give me a hint how to do this?
From your example, a string with N chars will have 2^N variations. You
can run into huge array sizes for simple string lengths.
This is just a guess ... if you are trying to compare a string which can
come in these "variations", why not convert it to lower case or upper
case and compare with a similar case string ?
regards
srp
| |
| robert.paschedag@web.de 2006-08-18, 7:32 am |
| saju schrieb:
> robert.paschedag@web.de wrote:
>
> From your example, a string with N chars will have 2^N variations. You
> can run into huge array sizes for simple string lengths.
>
> This is just a guess ... if you are trying to compare a string which can
> come in these "variations", why not convert it to lower case or upper
> case and compare with a similar case string ?
>
>
> regards
> srp
Thanks for your reply.
I know, this might get very huge.
I have an "indexer" that indexes our complete server data. I give this
indexer a file containing all file extensions I don't want to be
indexed (an exclude list). If this indexer would run on Windows, I
wouldn't have a problem (execpt that it runs on Windows :-P).
But on Linux...giving "*.EXE or *.ExE" won't exclude *.eXe for example.
What I don't want to do is to write the list by hand (*.exe, *.eXe,
*.EXE, ...) I want to parse the file, one line for each extension and
then rebuild this list with all variations to pass to the "indexer"
regars
robert
| |
| Rainer Temme 2006-08-18, 7:32 am |
| robert.paschedag@web.de wrote:
> But on Linux...giving "*.EXE or *.ExE" won't exclude *.eXe for example.
>
> What I don't want to do is to write the list by hand (*.exe, *.eXe,
> *.EXE, ...) I want to parse the file, one line for each extension and
> then rebuild this list with all variations to pass to the "indexer"
Robert,
but since the function your asking for shall produce ALL
variants of lowercase and uppercase letters, why not do the
comparison if the ending matches caseinsensitive?
if you use strcasecmp() to compare the
"EXE" "eXe" "EXe" and so on will compare as equal to "exe".
Rainer
| |
|
| robert.paschedag@web.de wrote:
> saju schrieb:
>
>
> Thanks for your reply.
>
> I know, this might get very huge.
>
> I have an "indexer" that indexes our complete server data. I give this
> indexer a file containing all file extensions I don't want to be
> indexed (an exclude list). If this indexer would run on Windows, I
> wouldn't have a problem (execpt that it runs on Windows :-P).
Ok, I am assuming you cannot modify the indexer to be do case
insensitive comparisons.
>
> But on Linux...giving "*.EXE or *.ExE" won't exclude *.eXe for example.
>
> What I don't want to do is to write the list by hand (*.exe, *.eXe,
> *.EXE, ...) I want to parse the file, one line for each extension and
> then rebuild this list with all variations to pass to the "indexer"
Off the top of my head :
1) Build a binary heap - implemented as an array. For eg : To find
variations of the word "in", your array will look like "iInNnN". For the
word "word", the heap will look like "wWoOoOrRrRrRrRdDdDdDdDdDdDdDdD"
Perform an preorder traversal. Collect the nodes (chars) into a string.
Everytime you hit a leaf node you will have a complete variation.
regards
srp
--
I will never get off this planet
- Luke Skywalker
| |
| robert.paschedag@web.de 2006-08-18, 7:32 am |
| Rainer Temme schrieb:
> robert.paschedag@web.de wrote:
>
> Robert,
>
> but since the function your asking for shall produce ALL
> variants of lowercase and uppercase letters, why not do the
> comparison if the ending matches caseinsensitive?
>
> if you use strcasecmp() to compare the
> "EXE" "eXe" "EXe" and so on will compare as equal to "exe".
>
> Rainer
Hi Rainer,
because the "indexer" application uses a "closed" - third-party C/C++
API, that I cannot modify :-(
I can just tell the "indexer" function to use *this* null-delimited
string with the extensions I want to exclude and then "Do it!". And
because the API does not yet do a case-insensitive match, I have to
provide a "complete" list.
OK. I could do this by hand and write the extension to the file, but I
don't *really* want to.
I already told the "third-party techs" to put this to their "whish
list" for the next release, but for now, I have to do it on my own.
And also, I just wanted to know how to do this by hand, because I am
not yet a "pro" in C/C++ programming. I am still just a "beginner" or a
bit more and want to learn :-)
regards,
robert
| |
|
| robert.paschedag@web.de writes:
> Hi,
>
> I want to write a function that, when given a string (foobar), returns
> an array of strings with all variations of the given one.
>
> Example:
>
> char **my_array = getVariationsOfString("foobar");
>
> my_array[0] = "foobar"
> my_array[1] = "foobaR"
> my_array[2] = "foobAr"
> my_array[3] = "foobAR"
> my_array[4] = "fooBar"
> my_array[5] = "fooBaR"
> my_array[6] = "fooBAr"
> my_array[7] = "fooBAR"
> my_array[8] = "foObar"
> my_array[9] = "foObaR"
> ...
> ...
>
> Can anybody give me a hint how to do this?
>
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#define BITS_PER_LONG 32
static char **getVariantsOfString (const char *s)
{
size_t len, elems, i, j;
char **ret, **r, *p;
len = strlen (s);
if (len > BITS_PER_LONG) return NULL;
elems = 1 << len;
ret = r = calloc (elems + 1, sizeof (char *));
if (!ret) return NULL;
p = calloc (elems * (len + 1) + 1, 1);
if (!p) {
free (ret);
return NULL;
}
for (i = 0; i < elems; ++i) {
*r++ = p;
for (j = 0; j < len; ++j)
*p++ = (i & (1 << j)) ? toupper (s[j]) : tolower (s[j]);
*p++ = 0;
}
return ret;
}
#ifdef TEST
#include <stdio.h>
int main (int argc, char **argv)
{
char **s;
for (s = getVariantsOfString (argv[1]); *s; ++s)
puts (*s);
return 0;
}
#endif
--
mailto:malc@pulsesoft.com
| |
| robert.paschedag@web.de 2006-08-18, 1:33 pm |
| malc schrieb:
> robert.paschedag@web.de writes:
>
>
> #include <stdio.h>
> #include <ctype.h>
> #include <stdlib.h>
> #include <string.h>
>
> #define BITS_PER_LONG 32
>
> static char **getVariantsOfString (const char *s)
> {
> size_t len, elems, i, j;
> char **ret, **r, *p;
>
> len = strlen (s);
> if (len > BITS_PER_LONG) return NULL;
>
> elems = 1 << len;
> ret = r = calloc (elems + 1, sizeof (char *));
> if (!ret) return NULL;
>
> p = calloc (elems * (len + 1) + 1, 1);
> if (!p) {
> free (ret);
> return NULL;
> }
>
> for (i = 0; i < elems; ++i) {
> *r++ = p;
> for (j = 0; j < len; ++j)
> *p++ = (i & (1 << j)) ? toupper (s[j]) : tolower (s[j]);
> *p++ = 0;
> }
> return ret;
> }
>
> #ifdef TEST
> #include <stdio.h>
> int main (int argc, char **argv)
> {
> char **s;
>
> for (s = getVariantsOfString (argv[1]); *s; ++s)
> puts (*s);
> return 0;
> }
> #endif
>
> --
> mailto:malc@pulsesoft.com
Wow!
Thank you very much.
I'll need some time to check how this exactly works..
Again...thanks.
regards,
robert
| |
|
| saju wrote:
_snip_
_snip_
[vbcol=seagreen]
>
> Off the top of my head :
> 1) Build a binary heap - implemented as an array. For eg : To find
> variations of the word "in", your array will look like "iInNnN". For the
> word "word", the heap will look like "wWoOoOrRrRrRrRdDdDdDdDdDdDdDdD"
>
> Perform an preorder traversal. Collect the nodes (chars) into a string.
> Everytime you hit a leaf node you will have a complete variation.
Ok, here is an implementation of the above described method. Note that
you don't actually have to build the binary heap because the value of
the n'th node can be deduced with a simple calculation.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
char *str, *w;
int l;
void variation(int i, char c)
{
char d;
str[i] = c;
if (i >= l) {
printf("%s\n", str+1);
return;
}
d = w[((2*i) >> 1) + 1];
variation(i+1, tolower(d));
variation(i+1, toupper(d));
}
int main(int argc, char **argv)
{
l = strlen(argv[1]);
w = malloc(l + 2);
strcpy(w+1, argv[1]);
str = malloc(l + 2);
str[l + 1] = '\0';
variation(0, w[0]);
}
regards
srp
| |
| Pascal Bourguignon 2006-08-18, 1:33 pm |
| robert.paschedag@web.de writes:
> Rainer Temme schrieb:
>
>
> Hi Rainer,
>
> because the "indexer" application uses a "closed" - third-party C/C++
> API, that I cannot modify :-(
Indeed. Have you heard of free software?
But in any case, if this application actually use a regexp, as * in
*.exe leads me think--perhaps it's just a limited wildcard--but if it
accepts [aA], you can give it: *.[eE][xX][eE]
I use this emacs function command:
(defun pjb-case-insensitive-regexp (start end)
"
DO: Replace the selection with a case insensitive regexp,
ie. all letter characters are replaced by [Xx] matching
both lower case and upper case.
"
(interactive "r")
(do ((text (buffer-substring-no-properties start end))
(replacement (make-string (* 4 (- end start)) (CHARACTER " ")))
(rlen 0) ;; no fill pointer in emacs lisp...
(i 0 (1+ i)))
((>= i (length text))
(progn
(delete-region start end)
(insert (subseq replacement 0 rlen))))
(if (/= (upcase (aref text i)) (downcase (aref text i)))
(progn
(setf (aref replacement rlen) (aref "[" 0))
(incf rlen)
(setf (aref replacement rlen) (upcase (aref text i)))
(incf rlen)
(setf (aref replacement rlen) (downcase (aref text i)))
(incf rlen)
(setf (aref replacement rlen) (aref "]" 0))
(incf rlen))
(progn
(setf (aref replacement rlen) (aref text i))
(incf rlen)))))
to convert regexp to case-insensitive regexps; I type:
*.exe
then select it, and M-x pjb-case-insensitive-regexp RET and get:
*.[Ee][Xx][Ee]
Another, easy solution, would be to homogeneize all the file names in
the file system. I've got a clean-path script that remove all space
or other special character in path names, and put them in lowcase.
--
__Pascal Bourguignon__ http://www.informatimago.com/
"Our users will know fear and cower before our software! Ship it!
Ship it and let them flee like the dogs they are!"
| |
| James Antill 2006-08-18, 1:33 pm |
| On Fri, 18 Aug 2006 06:42:44 -0700, robert.paschedag wrote:
> Wow!
>
> Thank you very much.
>
> I'll need some time to check how this exactly works..
The basic principle is that you have a bit for each character, you
then treat the set of bits as a number and count from 0 to max. Ie. a 3
character word like "exe", has:
000 = exe
001 = exE
010 = eXe
011 = eXE
100 = Exe
[...]
--
James Antill -- james@and.org
http://www.and.org/and-httpd
| |
| John W. Krahn 2006-08-18, 1:33 pm |
| robert.paschedag@web.de wrote:
>
> I want to write a function that, when given a string (foobar), returns
> an array of strings with all variations of the given one.
>
> Example:
>
> char **my_array = getVariationsOfString("foobar");
>
> my_array[0] = "foobar"
> my_array[1] = "foobaR"
> my_array[2] = "foobAr"
> my_array[3] = "foobAR"
> my_array[4] = "fooBar"
> my_array[5] = "fooBaR"
> my_array[6] = "fooBAr"
> my_array[7] = "fooBAR"
> my_array[8] = "foObar"
> my_array[9] = "foObaR"
> ...
> ...
>
> Can anybody give me a hint how to do this?
$ PERL -le'($word = shift) =~ s/(.)/{\L$1,\U$1}/g; print for glob $word' foo
foo
foO
fOo
fOO
Foo
FoO
FOo
FOO
John
--
use Perl;
program
fulfillment
| |
| jmcgill 2006-08-18, 7:22 pm |
| John W. Krahn wrote:
> $ PERL -le'($word = shift) =~ s/(.)/{\L$1,\U$1}/g; print for glob $word' foo
I didn't know the curly bracket block on the right hand side of a
substitution returned a list. Or is something else going on?
This syntax is weird enough to merit some explanation. I think you'd
never even know it was legal, just from the PERL docs.
| |
| Bill Pursell 2006-08-18, 7:22 pm |
|
malc wrote:
>
> #define BITS_PER_LONG 32
Just a stylistic note--you could re-write that as:
#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)
| |
| John W. Krahn 2006-08-18, 7:22 pm |
| jmcgill wrote:
> John W. Krahn wrote:
>
>
> I didn't know the curly bracket block on the right hand side of a
> substitution returned a list. Or is something else going on?
> This syntax is weird enough to merit some explanation. I think you'd
> never even know it was legal, just from the PERL docs.
The substitution converts 'foo' to '{f,F}{o,O}{o,O}' and glob(
'{f,F}{o,O}{o,O}' ) produces the list. TIMTOWTDI:
$ PERL -le'print for <@{[join q//, map "{\L$_,\U$_}", split //, shift]}>' foo
foo
foO
fOo
fOO
Foo
FoO
FOo
FOO
John
--
use Perl;
program
fulfillment
| |
| Logan Shaw 2006-08-19, 1:26 am |
| John W. Krahn wrote:
> jmcgill wrote:
>
> The substitution converts 'foo' to '{f,F}{o,O}{o,O}' and glob(
> '{f,F}{o,O}{o,O}' ) produces the list. TIMTOWTDI:
>
> $ PERL -le'print for <@{[join q//, map "{\L$_,\U$_}", split //, shift]}>' foo
> foo
> foO
> fOo
> fOO
> Foo
> FoO
> FOo
> FOO
Indeed, there is more than one way to do it:
#! /usr/bin/perl
use Quantum::Superpositions;
sub superpose_case ($)
{
my $str = shift;
return '' if length $str == 0;
my $char = substr ($str, 0, 1, '');
return any (lc $char, uc $char) . superpose_case ($str);
}
print map { "$_\n" } eigenstates superpose_case ($ARGV[0]);
This has the following output:
$ ./casecombos.pl foo
Foo
FoO
fOo
foo
foO
fOO
FOO
FOo
$
It also has the nice property that it automatically doesn't create
duplicates if there are characters which don't have case variations.
For example:
$ ./casecombos.pl ab01
Ab01
ab01
AB01
aB01
$
Unfortunately(?), Quantum::Superpositions is not part of the core
Perl distribution...
- Logan
| |
| William James 2006-08-19, 1:26 am |
| robert.paschedag@web.de wrote:
> Hi,
>
> I want to write a function that, when given a string (foobar), returns
> an array of strings with all variations of the given one.
>
> Example:
>
> char **my_array = getVariationsOfString("foobar");
>
> my_array[0] = "foobar"
> my_array[1] = "foobaR"
> my_array[2] = "foobAr"
> my_array[3] = "foobAR"
> my_array[4] = "fooBar"
> my_array[5] = "fooBaR"
> my_array[6] = "fooBAr"
> my_array[7] = "fooBAR"
> my_array[8] = "foObar"
> my_array[9] = "foObaR"
> ...
> ...
ruby -e 'puts
$*.shift.split("").map{|s|[s,s.upcase]}.inject([[]]){|old,a|
a.inject([]){|new,e| new+old.map{|c| c.dup<<e}}}.map{|a| a.join}' abc
abc
Abc
aBc
ABc
abC
AbC
aBC
ABC
|
|
|
|
|