About site: Software/Document Management - How to Make a Documents Database
  About site: http://users.belgacom.net/bruno.champagne/db.html

Title: Software/Document Management - How to Make a Documents Database A technical recipe on writing a document repository and search engine software using tools based on Linux.
This is websites2007.org cache of m/ as retrieved on 2008.10.12 websites2007.org's cache is the snapshot that we took of the page as we crawled the web. The page may have changed since that time.
Make a documents database


Introduction

Many people organize their files in a directory structure. Suppose you have many documents and you want to find them again easily and quickly. You need a search engine able to find a document based on its keywords, title, author name, kind of document, ... The purpose of this section is to explain, step by step, a solution for finding any document stored somewhere in a directory structure very quickly. Enter your search criteria in a web form and you get a list of matching documents. Click on one of the documents in the list and it opens!

This solution is multi-platform, free and flexible. It's just an assembly of well-known software with small scripts around it to glue the whole thing together. The basic ingredients of this recipe are:

  - Tcl/Tk scripting language
  - MySQL database
  - Apache web server
  - PHP scripting language
  - Your favorite web browser

If many users need to access your database, you have to install the software only once, on a central computer. Each user only needs a web browser to access the database from anywhere on the network. To prevent abuse, users must give a password before they can access the documents.

The story in an example

Suppose you have a directory 'Documents' containing the following files:

    Documents
        phones.txt
        Project1
            hf38_specs.pdf
            kg76_specs.pdf
            report.doc
        Project2
            hf_meas.xls
            letter.doc

For each file you want to see in your database, add a description file with the extension '.nfo' as follows:

    Documents
        phones.nfo
        phones.txt
        Project1.nfo
        Project1
            hf38_specs.nfo
            hf38_specs.pdf
            kg76_specs.nfo
            kg76_specs.pdf
            report.nfo
            report.doc
        Project2.nfo
        Project2
            hf_meas.nfo
            hf_meas.xls
            letter.nfo
            letter.doc

The description files contain information about the corresponding document. Only the first four letters of the field name are significant. The fields may be:

    titl : the title of the document
    keyw : keywords
    auth : the name of the author(s)
    crea : the creation date
    proj : the name of the project
    type : the kind of document (for example: calculation, measurement, document, script ...)
    refe : a reference or a document number

All fields are optional, but you should at least fill in the field 'titl'. In the previous example, we could have:

    phones.nfo
        title : phone book
        keywords : phone
        author : Bruno Champagne
        type : document
        project : none

    Project2.nfo
        title : Project no 2
        project : project2

    hf_meas.nfo
        title : HF measurements
        keywords : HF jitter
        type : measurements
        author : Antonio Soubri

    letter.nfo
        title : research contract
        keywords : KGF contract
        type : document
        author : Mona Moors

In place of 'author', you may just write 'auth'. Don't forget to create the '.nfo' file for directories (even an empty file); otherwise sub-directories won't be scanned.

A Tcl script scans the directories, looking for '.nfo' files and the corresponding documents, and fills the database (here an SQL database). The table below shows what the database entries will look like (only a few columns are shown):

    title              keywords      type          project
    phone book         phone         document      none
    Project no 2       -             -             project2
    HF measurements    HF jitter     measurements  project2
    research contract  KGF contract  document      project2
    ...                ...           ...           ...

This table also contains the name of the author, a reference and the location of the file. Actually, it is slightly more complicated. We suppose we have a limited number of types and projects, so the real database contains 3 tables:

  - the documents table is the same as described above, except that the columns 'type' and 'project' are filled with an index in place of a string. The index points to an entry in the type or project table
  - the project table has two columns: the first is an index and the second is a string containing the name of the corresponding project
  - the type table has two columns: the first is an index and the second is a string containing the name of the corresponding type

But those are details you don't need to care about ...

Field values may be inherited from a parent directory:

  - type is inherited from a parent directory if not specified for the current file
  - project is inherited from a parent directory if not specified for the current file
  - the keywords of the parent directories are added to the keywords of the current file

We can connect to the search engine with a simple web browser. If we configure Apache so that our files are in a 'restricted area', the user is first prompted for a login and password. Then we get a search form such as the following:

    Search document
        project            (choice: any project, none, project1, project2)
        title containing   (text)
        keywords           (text)
        reference          (text)
        document type      (choice: any type, document, measurement)
        dirname contains   (text)
        filename contains  (text)
        authors contains   (text)

As you can see in the form above, the possible values for 'project' and 'type' are automatically filled from the scanned files. After you've typed your search criteria, you get a list of matching documents:

    Search results
    title / keywords                  author(s)        type / file            project
    phone book / phone                Bruno Champagne  document / phones.txt  none
    research contract / KGF contract  Mona Moors       document / letter.doc  project2

The title of the document is also a hyperlink to the document. If your browser has the appropriate plug-ins, one click on the title is enough to open the document.
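
To make the '.nfo' rules above concrete, here is a minimal sketch in Python. It is not the author's Tcl script, only an illustration of the two stated rules: field names are matched on their first four letters, and 'type', 'project' and keywords are inherited from the parent directory. The function names `parse_nfo` and `inherit` are hypothetical, and so are the assumptions that each field sits on its own 'name : value' line and that inherited keywords are appended after the file's own.

```python
def parse_nfo(text):
    """Parse 'name : value' lines; only the first four letters of the name count."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(":", 1)
        key = name.strip().lower()[:4]  # 'author' and 'auth' both become 'auth'
        if key:
            fields[key] = value.strip()
    return fields


def inherit(parent, child):
    """Apply the inheritance rules of a parent directory to a child's fields."""
    merged = dict(child)
    # 'type' and 'proj' are taken from the parent only when the child omits them.
    for key in ("type", "proj"):
        if key not in merged and key in parent:
            merged[key] = parent[key]
    # Parent keywords are added to the child's keywords (order is an assumption).
    keywords = [k for k in (child.get("keyw", ""), parent.get("keyw", "")) if k]
    if keywords:
        merged["keyw"] = " ".join(keywords)
    return merged
```

With the example files above, parsing 'Project2.nfo' and 'hf_meas.nfo' and calling `inherit` on the two results gives 'hf_meas.xls' the project 'project2' while keeping its own type 'measurements'.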

Apache/PHP setup

Download Apache at http://httpd.apache.org. Install it. Make a directory where you will put your documents. For example, make 'C:/html'. Download PHP at http://www.php.net/. Install it.

In the configuration file of Apache, httpd.conf, change the setting 'DocumentRoot' to point to the directory containing your html files. For example:

    DocumentRoot "C:/html"

You also need to specify the directories to be served and the corresponding 'alias' names. For example, if you want to be able to access the directories 'C:/documents_project1' and 'C:/personnal_docs', add the following lines to the file httpd.conf:

    Alias "/project1" "C:/documents_project1"
    Alias "/family" "C:/personnal_docs"

In the same file, check the section between '<Directory />' and '</Directory>'. It should look like this:

    <Directory />
        Options FollowSymlinks
        AllowOverride None
        AuthName "restricted area"
        AuthType Basic
        AuthUserFile "c:/pass.txt"
        Require valid-user
    </Directory>

In the example above, we specify that every user has to identify himself before he can access the documents. The file containing the user logins and passwords is (in this example) named 'c:/pass.txt'.

You also need to tell Apache where to find the PHP interpreter. For example, if the interpreter is 'C:/PHP/PHP.EXE', then check that the following lines are present in the file httpd.conf:

    ScriptAlias /php/ "c:/php/"
    AddType application/x-httpd-php .php
    AddType application/x-httpd-php-source .phps
    Action application/x-httpd-php "/php/php.exe"

Define a new user. Open a console and go into the bin directory of Apache (for Windows users, this directory should be 'C:/Program files/Apache Group/Apache/bin'). Type

    htpasswd -c c:/pass.txt username

where you should replace 'username' by the name of the user you wish to add. The '-c' option creates the password file, so use it only for the first user. To add a second user, type

    htpasswd c:/pass.txt username2

where you should replace 'username2' by the name of the user you wish to add.

Start the Apache server.

Tcl + SQL library

Download Tcl/Tk 8.3 from http://dev.scriptics.com/software/tcltk/download83.html. Install it. Now you need a library that allows Tcl to access MySQL. Download fbsql at http://www.fastbase.co.nz/fbsql/index.html. Windows users: install the dll file in the bin directory of Tcl. Unix users: follow the instructions ...

Scripts installation

Download the following zip file scripts.zip and install its contents in the root directory of the Apache server. In our example, install the files in the directory 'C:/html'. You will find the following files:

  - index.html : the first file Apache will open when you connect. It just contains a link to 'search.php'. But it's up to you to make a more attractive site ...
  - search.php : PHP script containing the search form
  - results.php : PHP script that shows the result of the search
  - initdb.tcl : prepares MySQL for the documents database
  - makeindex.tcl : scans the directories and fills the database. Every hour, it will update the database in background.
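
The scanning rule described in the example earlier (a document is indexed only when it has a '.nfo' file next to it, and a sub-directory is entered only when it has its own '.nfo' file) can be sketched as follows. This is a Python illustration under assumptions, not the contents of 'makeindex.tcl': the function name `find_described` is hypothetical, and matching a document to its description by file name without extension is inferred from the example tree.

```python
import os


def find_described(root):
    """Return (document_path, nfo_path) pairs found under root.

    An entry is kept only if '<name without extension>.nfo' exists beside it.
    Directories without a '.nfo' file are neither listed nor scanned.
    """
    pairs = []
    for name in sorted(os.listdir(root)):
        if name.endswith(".nfo"):
            continue  # description files are not documents themselves
        path = os.path.join(root, name)
        # For a directory the whole name is used; for a file, the name minus extension.
        stem = name if os.path.isdir(path) else os.path.splitext(name)[0]
        nfo = os.path.join(root, stem + ".nfo")
        if not os.path.isfile(nfo):
            continue
        pairs.append((path, nfo))
        if os.path.isdir(path):
            pairs.extend(find_described(path))  # descend only into described directories
    return pairs
```

A periodic indexer would call such a function for each scanned directory, then read each '.nfo' file and write one database row per pair.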

MySQL setup

If needed, download MySQL at http://www.mysql.org. Install it. Start the server:

  - Under Windows, if you don't see a traffic light in the corner of the screen, run 'C:/mysql/bin/winmysqladmin' and start the server by right-clicking the red light and selecting 'start server'
  - Under Linux, log in as 'root' and type:

        safe_mysqld &

Before you start using the database, you need to create the grant tables (which determine who can connect to the database). So type:

    mysql_install_db

Now we need to prepare MySQL for our documents database. Execute the script 'initdb.tcl'. It does the following:

  - It tries to connect to the MySQL server. If needed, it prompts for a new password for the user 'root'. 'root' is the administrator of the MySQL database (don't confuse it with the administrator of Unix machines!).
  - It creates a new user 'db_user'. The password is 'db_pass'. This user can only connect locally.
  - It creates a new database 'documents_db' and all the needed tables. One of them ('scan_dirs') contains a list of directories to be scanned. The 'initdb.tcl' script inserts one entry in this table: 'documents' (it's the default name of the directory where you will put your documents).

Restart the MySQL server.
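
The three-table layout described in the example (a documents table whose 'type' and 'project' columns hold indexes into two small lookup tables) can be sketched as below. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and every table name, column name and function name in it is an assumption, not the schema actually created by 'initdb.tcl'.

```python
import sqlite3

# Hypothetical schema: two lookup tables plus the documents table.
SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE types    (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    title TEXT, keywords TEXT, author TEXT, reference TEXT, location TEXT,
    type_id INTEGER REFERENCES types(id),
    project_id INTEGER REFERENCES projects(id)
);
"""


def lookup_id(db, table, name):
    """Return the index of name in a lookup table, inserting it if missing."""
    if table not in ("projects", "types"):
        raise ValueError("unknown lookup table")
    db.execute(f"INSERT OR IGNORE INTO {table}(name) VALUES (?)", (name,))
    return db.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()[0]


def add_document(db, title, keywords, type_name, project_name):
    """Store one document, replacing type and project strings by their indexes."""
    db.execute(
        "INSERT INTO documents(title, keywords, type_id, project_id) VALUES (?, ?, ?, ?)",
        (title, keywords, lookup_id(db, "types", type_name),
         lookup_id(db, "projects", project_name)),
    )


def search(db, title_contains="", project=None):
    """Return (title, type, project) rows matching the given criteria."""
    sql = ("SELECT d.title, t.name, p.name FROM documents d "
           "LEFT JOIN types t ON t.id = d.type_id "
           "LEFT JOIN projects p ON p.id = d.project_id "
           "WHERE d.title LIKE ?")
    params = ["%" + title_contains + "%"]
    if project is not None:
        sql += " AND p.name = ?"
        params.append(project)
    return db.execute(sql + " ORDER BY d.id", params).fetchall()
```

Because the lookup tables are filled as documents are added, a search form can build its 'project' and 'type' choices simply by listing those two tables.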

Changing the directories to be scanned

The script 'makeindex.tcl' mentioned above has to know which directories to scan. If you only want to scan the directory 'documents' (or more precisely, 'C:/html/documents'), you don't need to change anything (because it is the default setting).

Suppose you want to scan the directories named 'C:/documents_project1' and 'C:/personnal_docs' (see Apache setup). Start the MySQL client in the console (Windows users: start MS-DOS and go into the directory C:/mysql/bin; Unix users: no problem):

    mysql -udb_user -p

When prompted for the password, enter 'db_pass'. Then type:

    use documents_db;

To see the list of the scanned directories, type:

    select * from scan_dirs;

You will only see the directory 'documents'. To remove this first element from the list, type:

    delete from scan_dirs where id=1;

Now you can insert the new entries:

    insert into scan_dirs values(null, 'C:/documents_project1','project1');
    insert into scan_dirs values(null, 'C:/personnal_docs','family');

As you can see, for each directory you add, you also need to add its alias name (the same as specified in the setup of the Apache server). This is needed so that the search results are linked to the right web address.
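
Why the alias is needed can be seen in a small sketch: a file found on disk must be turned into the web address Apache serves it under. The Python function below is an illustration under assumptions, not code from 'results.php': the name `document_url`, the default host and the case-insensitive path comparison (chosen because the examples use Windows paths) are all hypothetical.

```python
from urllib.parse import quote


def document_url(file_path, scan_dirs, host="127.0.0.1"):
    """Map a file path to its URL using (directory, alias) pairs from scan_dirs.

    Returns None when the file is outside every scanned directory.
    """
    normalized = file_path.replace("\\", "/")
    for directory, alias in scan_dirs:
        prefix = directory.rstrip("/") + "/"
        if normalized.lower().startswith(prefix.lower()):
            relative = normalized[len(prefix):]
            # quote() escapes spaces and other unsafe characters but keeps '/'.
            return "http://" + host + "/" + alias + "/" + quote(relative)
    return None
```

With the two entries inserted above, a file in 'C:/documents_project1' is linked under '/project1/' and a file in 'C:/personnal_docs' under '/family/', matching the Alias lines in httpd.conf.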

Try it !

At this stage, the database should be fully operational.

  - First of all, create the directory where you want to place your documents. For example, create 'C:/documents_project1'. Place a few documents in this directory and make the corresponding '.nfo' files. You may also create sub-directories, but don't forget to create a '.nfo' file for each sub-directory (even an empty '.nfo' file is OK).
  - Start the script 'makeindex.tcl'. It will run in background and silently update the database every two hours.
  - Start your favorite web browser. As address, type 'http://127.0.0.1' if you are working without a network, or enter the address of your computer (or of the one where the server is running) if you are working on a network.
  - Now you should see the prompt for login and password. Enter the user name and password you defined as described above.
  - Click on the link 'search document'. You should see the form 'Search document'. Click on 'Submit' and you will see a list of all the indexed documents.

Remark: the use of this database is not limited to documents. You really can use it for anything. For example, if you want to make a database of your friends, you can make an html file for each of them where you enter any information you want. You can even add a picture. Or you can just scan their business card. Then make the corresponding '.nfo' file.

Auto-logon and auto-startup when using a Windows computer

This section is only applicable to Windows users. It should be easy to do the same job on a Unix computer, but I have not yet tried it.

First of all, if you have an NT computer, you can configure the Apache web server and the MySQL database as 'services' so that they are started automatically at startup.

Secondly, to be able to access the network drives, you need to log on. To be sure the same login is used every time, the simplest solution is to use auto-logon. Click on the 'Start' button, 'Run...', then type 'regedit.exe'. Select the path 'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon'. Define the following entries as strings:

  - AutoAdminLogon, value "1"
  - DefaultUserName
  - DefaultDomainName
  - DefaultPassword

Thirdly, to start the script 'makeindex.tcl' automatically after each login, go to 'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run' in the registry. Define a string entry, for example "shutdownscript", and give it as value the location of the makeindex script, for example "c:\html\makeindex.tcl".

If you also want to shut down automatically at a fixed time, refer to the chapter 'auto-shutdown'.
 
