| kayjay 2004-05-19, 8:37 pm |
|
Hi Ashish,
Why don't you have a look at the full-text search that
SQL Server 2000 has to offer? You can store the PDF
files and MS Office files as blobs in an image column
and add another column that holds the file extension
(for example .pdf or .doc), so that the full-text
indexer knows which filter to use for each row.
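As a rough sketch of that table layout only (this is not SQL
Server): the snippet below uses Python's built-in sqlite3 module
as a stand-in, so the BLOB column plays the role of the image
column. The table name, the column names and the two helper
functions are made up for illustration.

```python
import os
import sqlite3

def create_store(conn):
    # One row per file: the raw bytes plus the extension that
    # identifies the format. All names here are hypothetical.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " file_name TEXT NOT NULL,"
        " file_ext TEXT NOT NULL,"
        " content BLOB NOT NULL)"
    )

def add_document(conn, file_name, data):
    # Derive the extension from the file name and store it
    # next to the blob, normalised to lower case.
    ext = os.path.splitext(file_name)[1].lower()
    cur = conn.execute(
        "INSERT INTO documents (file_name, file_ext, content)"
        " VALUES (?, ?, ?)",
        (file_name, ext, data),
    )
    conn.commit()
    return cur.lastrowid

# Usage: store one file in an in-memory database and read it back.
conn = sqlite3.connect(":memory:")
create_store(conn)
doc_id = add_document(conn, "Manual.PDF", b"%PDF-1.4 sample bytes")
row = conn.execute(
    "SELECT file_ext, content FROM documents WHERE id = ?",
    (doc_id,),
).fetchone()
```

SQLite has nothing like the PDF filter, so only the shape of the
table carries over; the searching itself would still be done by
SQL Server's full-text index.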
SQL Server 2000 has an excellent search facility, and it
supports PDF files once the PDF filter is installed. You
can download the filter (free) from the Adobe website:
http://www.adobe.com/support/salesdocs/1043a.htm
Look at MSDN for more help on storing data as a blob. I
think this would satisfy all your requirements for the
problem you stated.
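For the "each page is its own entity" part of the original
message, one possible layout is a row per page that points back
to its document. The sketch below again uses Python's sqlite3 as
a stand-in for SQL Server and a plain LIKE search instead of
full-text; the table names, column names, sample rows and the
search_pages helper are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE pages (
    doc_id  INTEGER NOT NULL REFERENCES docs(id),
    page_no INTEGER NOT NULL,
    header  TEXT,
    body    TEXT NOT NULL,
    PRIMARY KEY (doc_id, page_no)
);
""")

# Made-up sample data: two documents, three pages in total.
conn.executemany("INSERT INTO docs VALUES (?, ?)", [
    (1, "Install Guide"),
    (2, "Release Notes"),
])
conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", [
    (1, 1, "Overview", "This guide covers setup."),
    (1, 2, "Backup", "Run a full backup before upgrading."),
    (2, 1, "Changes", "Backup scheduling was improved."),
])

def search_pages(conn, term):
    # Substring search over page bodies, joined back to the
    # owning document so each hit carries title and page number.
    pattern = "%" + term + "%"
    return conn.execute(
        "SELECT d.title, p.page_no, p.header"
        " FROM pages AS p JOIN docs AS d ON d.id = p.doc_id"
        " WHERE p.body LIKE ?"
        " ORDER BY d.title, p.page_no",
        (pattern,),
    ).fetchall()

hits = search_pages(conn, "backup")
```

Each hit comes back as (title, page number, header), so the page
text can be shown directly instead of sending the user off to
download the whole file. The same join is how you would
cross-reference other tables in the database.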
Good luck!
-kayjay
>-----Original Message-----
>Hello
>
>We have the following questions, with the reasons behind
each one. Please advise how we should handle these issues.
>
>1. We would like to search in PDF and MS Word documents,
so what options do we have for searching PDF documents?
>When we present the search results to the users, we do
not want to just say "we found what you are looking for
on page 45 of this document", so that they then have to
click to download the PDF and manually navigate to
page 45.
>
>We essentially want to pull out the textual components
of the source MS Word and/or PDF documents and make them
searchable on their own.
>
>2. It is expected that the documents will need to be
converted to another format for searching across
documents - perhaps XML or XSLT.
>
>The source document is an MS Word doc. Basically, each
page of the document should be treated as its own entity.
Sure, it has a connection to the document as a whole, but
it needs to be addressable / searchable as its own body.
MS Word [2003] supports saving to XML. From our basic,
initial research it appears that converting a document to
XML and then writing an import engine to process this XML
is the best way to pull out the text and images. Each
page will have a header, body text, and 1 or more images.
>
>3. We expect to purchase an MS SQL database and are
planning to store the documents in it, so that the
searching and persistent storage logic is pushed out to
the database. It would be ideal to use the power of the
SQL backend to do the searching (perform SELECT queries
using full-text and/or " .. LIKE .. " searching). This
would have the benefit of allowing cross-referencing to
other tables in the same database.
|