07-17-04 01:50 AM
Peter Ammon wrote:
>
> I have a trie and a hash table, both in a binary format that can be used
> as-is without further conversions. The trie is about 1.1 MB, and the
> hash table is about 358k. They will be used for searching only; I will
> not need to modify them.
>
> Is it reasonable to use mmap() to access the files instead of read()ing
> them into a large buffer and accessing them there? How is using mmap()
> instead of read() likely to affect search speed, memory usage, and
> initialization time (when I first read in the files)?
>
> Thanks,
> -Peter
IMHO, `mmap()' will be faster initially, since the kernel
only loads the pages that you actually _touch_ during the
search process(es). With `read()', you have to read the
whole file into memory first.
Once everything is loaded, either by `read()' or by
`mmap()' followed by enough searches to touch every page,
the search speed should be the same.
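For comparison, here is a minimal sketch of the `read()' approach,
assuming a POSIX system: the whole file is copied into a `malloc()'d
buffer before the first search can run, so the full 1.1 MB trie is
paid for up front. The helper name `load_with_read' is hypothetical,
not something from the original post.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy an entire file into a malloc()'d buffer
 * using read(). Returns NULL on error; on success stores the file
 * size in *len. The caller must free() the buffer. */
static char *load_with_read(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    char *buf = malloc(size ? size : 1);   /* avoid malloc(0) */
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    /* read() may return fewer bytes than asked for, so loop. */
    size_t done = 0;
    while (done < size) {
        ssize_t n = read(fd, buf + done, size - done);
        if (n < 0 && errno == EINTR)
            continue;                       /* interrupted: retry */
        if (n <= 0) {                       /* error, or file shrank */
            free(buf);
            close(fd);
            return NULL;
        }
        done += (size_t)n;
    }

    close(fd);
    *len = size;
    return buf;
}
```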
One slight advantage (on some Unixes): a file that is
`mmap()'d read-only does not take up swap space, because
the file itself is used as the backing store.
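A minimal sketch of such a read-only mapping, assuming a POSIX
system. With `PROT_READ' the mapped pages are never dirtied, so on
systems that behave as described above the kernel can drop them
under memory pressure and re-read them from the file later instead
of writing them to swap. The names `map_readonly' and `unmap_file'
are hypothetical. The descriptor can be closed right after `mmap()';
the mapping stays valid until `munmap()'.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map an entire file read-only. Returns NULL on
 * error; on success stores the file size in *len. Pages are faulted
 * in by the kernel only when they are first touched. Release the
 * mapping with unmap_file(). */
static const unsigned char *map_readonly(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    /* mmap() rejects a zero length, so treat an empty file as an error. */
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                   /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return NULL;

    *len = size;
    return p;
}

/* The only cleanup needed: no free(), just unmap. */
static int unmap_file(const unsigned char *p, size_t len)
{
    return munmap((void *)p, len);
}
```

A search then just indexes into the returned pointer exactly as it
would into the `read()' buffer; only the pages it touches get loaded.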
A while ago, I experimented with the speed difference
between `mmap()' and `read()'. My results showed
that `mmap()' was about 2x faster than `read()'.
I don't remember all of the particulars of the test,
however.
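Since the particulars of that test are gone, here is only a
hypothetical harness for repeating the comparison yourself, not a
reconstruction of the original test. It times loading and
checksumming a file both ways with `clock_gettime(CLOCK_MONOTONIC,
...)'. Summing every byte forces `mmap()' to fault in every page,
which is the worst case for its lazy-loading advantage. Results
depend heavily on the OS and on whether the file is already in the
page cache (the first run warms it for the second), so run it
several times and in both orders. All function names are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock seconds from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Touch every byte, so every page of the file is actually used. */
static uint64_t checksum(const unsigned char *p, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* read() the whole file into a heap buffer, then checksum it. */
static int sum_via_read(const char *path, uint64_t *out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size, done = 0;
    unsigned char *buf = malloc(size ? size : 1);
    while (buf != NULL && done < size) {
        ssize_t n = read(fd, buf + done, size - done);
        if (n <= 0) {            /* sketch: EINTR is not retried */
            free(buf);
            buf = NULL;
        } else {
            done += (size_t)n;
        }
    }
    close(fd);
    if (buf == NULL)
        return -1;
    *out = checksum(buf, size);
    free(buf);
    return 0;
}

/* mmap() the whole file read-only, then checksum it in place. */
static int sum_via_mmap(const char *path, uint64_t *out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    *out = checksum(p, size);
    munmap(p, size);
    return 0;
}

/* Time both methods on one file; returns 0 if both succeed and agree. */
static int compare_methods(const char *path)
{
    uint64_t a = 0, b = 0;
    double t0 = now_sec();
    int ra = sum_via_read(path, &a);
    double t1 = now_sec();
    int rb = sum_via_mmap(path, &b);
    double t2 = now_sec();
    if (ra != 0 || rb != 0 || a != b)
        return -1;
    printf("read(): %.6f s   mmap(): %.6f s   (checksum %" PRIu64 ")\n",
           t1 - t0, t2 - t1, a);
    return 0;
}
```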
Personally, for read-only files, I prefer to `mmap()'
them. I think it's easier and cleaner (no worries
about `malloc()'/`free()', etc.). When I'm done with
the file, I just unmap it with `munmap()'.
HTH,
-
Stephen