The forindex utilities
Version 0.1

Guido Milanese

guido.milanese@unicatt.it

Abstract:

Making a good index is a very important part of writing a document, particularly a book or a manual. Entering index entries by hand can take a very long time; although a certain amount of data must always be entered manually, some tasks can be performed automatically, e.g. an index of geographical names or other trivial tasks. This is the job of the program doindex, which prepares a file to be processed by makeindex. Another useful feature is removing all the \index entries from a LaTeX file, obtaining a clean file with no indexing (program cleanindex). The programs are written in Snobol4; the only requirement is to install the interpreter. For Windows, standalone executables compiled with Spitbol are also provided, so the programs can be run without an external interpreter.

1 The programs

1.1 The program doindex

The program doindex reads a LaTeX file and, using a list file, inserts index entries into it according to that list. Previously entered index entries are left unchanged, so further indexing can be added to an already indexed file.
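As an illustration only, the central step (appending an index entry after each occurrence of a listed word while copying existing \index entries verbatim) can be sketched in Python. This is not the Snobol4 code of forindex: the function name add_index_entries, the case-sensitive whole-word matching and the assumption that existing entries contain at most one level of nested braces are hypothetical choices made for this sketch.

```python
import re

# Hypothetical pattern for an existing \index{...} entry; it assumes at most
# one level of nested braces, e.g. \index{x@\textit{x}}.
INDEX_RE = re.compile(r"\\index\{(?:[^{}]|\{[^{}]*\})*\}")


def add_index_entries(text, word, key):
    """Append \\index{key} after every whole-word occurrence of `word`.

    Text inside existing \\index{...} entries is copied verbatim, so
    previous indexing is left unchanged. Matching is case-sensitive.
    """
    # Whole word only, and not the name of a macro such as \word.
    word_re = re.compile(r"(?<![\\\w])" + re.escape(word) + r"(?!\w)")
    entry = "\\index{" + key + "}"

    def tag(segment):
        # A function replacement avoids escape processing of backslashes.
        return word_re.sub(lambda m: m.group(0) + entry, segment)

    out, pos = [], 0
    for m in INDEX_RE.finditer(text):
        out.append(tag(text[pos:m.start()]))  # plain text before the entry
        out.append(m.group(0))                # existing entry, untouched
        pos = m.end()
    out.append(tag(text[pos:]))
    return "".join(out)
```

For example, add_index_entries("My dogs bark.", "dogs", "animals!dogs") yields My dogs\index{animals!dogs} bark. Splitting the text around existing entries first is what keeps earlier indexing intact.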

The input LaTeX file may have extension tex or latex, written either all in lowercase or all in uppercase (e.g. tex or TEX), not in mixed case such as Tex.

The list file contains all the words to be indexed. Its extension must be exactly wls. Sub-entries are identified with the separator character '/'. See test.wls for an example:

  animals
  dogs/animals
  cats
  house/nouns/english/languages
  sleeping@sleep
  évita/Italian/foreign words
  ça/French/foreign words
  drücken/German/foreign words

No particular order is required in this file. Some users will prefer alphabetical order, others a different one, so the programs impose no requirements concerning order or sorting in this file. Entries such as sleeping@sleep use the standard makeindex syntax and are left unchanged.

The "logic" of the '/' syntax is the opposite of the internal logic of makeindex, which is, I think, very clever at the stage of typesetting an index, but not at the stage of designing one. "A dog is an animal" (dog/animal in my syntax) seems to me more natural than "Among animals there are dogs" (animals!dogs in the makeindex syntax). The LaTeX file produced by doindex, of course, follows the makeindex conventions.
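The reversal between the two syntaxes can be shown with a minimal Python sketch, assuming each line of the list file holds one entry. The function name wls_to_makeindex is hypothetical and the code is not taken from forindex; it only reverses the '/'-separated levels and joins them with the makeindex level separator '!'.

```python
def wls_to_makeindex(line):
    """Convert one list-file entry to a makeindex key.

    forindex writes the specific term first (dogs/animals); makeindex
    wants the most general heading first (animals!dogs).
    """
    parts = [part.strip() for part in line.strip().split("/")]
    # Reverse the levels and join them with makeindex's level separator.
    # An '@' inside a level (sleeping@sleep) is passed through unchanged.
    return "!".join(reversed(parts))
```

Applied to test.wls, for instance, house/nouns/english/languages becomes languages!english!nouns!house. The sketch does not quote characters that are special to makeindex (such as ! or | appearing inside a term).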

The original LaTeX file is left unchanged. A new file is written, identified by the suffix -ind: for example, from file.tex you will get file-ind.tex. You will then have to run makeindex as usual.
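The naming rule for the output file, together with the extension check described above, can be sketched in Python. The function output_name is hypothetical, and keeping the original extension (e.g. BOOK.LATEX giving BOOK-ind.LATEX) is an assumption, since only the .tex case is documented.

```python
import os

# Extensions accepted for the input file: all-lowercase or all-uppercase.
VALID_EXTENSIONS = {"tex", "TEX", "latex", "LATEX"}


def output_name(path, suffix):
    """Return the name of the file to write, e.g. file.tex -> file-ind.tex.

    The original file is never overwritten: the suffix ("-ind" for doindex,
    "-noind" for cleanindex) is added before the extension, which is kept
    as it is (an assumption of this sketch).
    """
    stem, ext = os.path.splitext(path)
    if ext[1:] not in VALID_EXTENSIONS:
        raise ValueError("unsupported extension: " + repr(ext))
    return stem + suffix + ext
```

A mixed-case name such as file.Tex is rejected with a ValueError, matching the rule that mixed case is not accepted.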

The purpose of the program is similar to what is provided by the program ixgen (http://www.iit.upco.es/~oscar/ixgen/) written by OSCAR LOPEZ (oscar@iit.upco.es), but forindex was designed to be a bit more flexible.

1.2 The program cleanindex

The program cleanindex removes \index sequences from a LaTeX file. It can be used, e.g., if a user is not happy with the indexing of a file and wants to start over again.

The input file may have extension tex or latex, in lowercase or in uppercase.

The original file is left unchanged. A new file is written, identified by the suffix -noind: for example, from file.tex you will get file-noind.tex. In this file, the lines concerning makeindex are kept but commented out, in order to avoid an empty index in the output. You can uncomment them as soon as you want to index the file again.
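What cleanindex does can be approximated by a short Python sketch, given for illustration only. Which lines count as "concerning makeindex" is an assumption here (\usepackage{makeidx}, \makeindex and \printindex); the name clean_index is hypothetical, and entries are removed line by line, so an \index spanning a line break is not handled.

```python
import re

# Existing \index{...} entry, allowing one level of nested braces (assumption).
INDEX_RE = re.compile(r"\\index\{(?:[^{}]|\{[^{}]*\})*\}")
# Lines assumed to "concern makeindex"; this list is hypothetical.
MAKEINDEX_RE = re.compile(r"\s*\\(makeindex|printindex|usepackage\{makeidx\})")


def clean_index(text):
    """Remove \\index{...} entries and comment out makeindex lines."""
    out = []
    for line in text.splitlines(keepends=True):
        if MAKEINDEX_RE.match(line):
            out.append("% " + line)  # keep the line, but commented out
        else:
            # Entries are removed line by line, so an \index{...} spanning
            # a line break is not handled by this sketch.
            out.append(INDEX_RE.sub("", line))
    return "".join(out)
```

Commenting out instead of deleting means the index can be restored later by removing the leading "% ".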

2 Installation

2.1 GNU/Linux and other *nix systems

1. Install snobol4 from http://www.snobol4.org. This is Philip Budne's CSNOBOL implementation. You need a C compiler to compile the interpreter; it is normally a quick and easy process.
2. Make sure snobol4 is in your PATH, or make a symbolic link to it.
3. Copy all the files from the source directory into a suitable directory (you do not need the bat files, which are provided for Windows, and can safely remove them).
4. Make the scripts (doindex and cleanindex, with no extension) executable, e.g. chmod +x doindex cleanindex
5. Run the scripts as follows:
   - to index a text: ./doindex file.tex
   - to exclude words with accents: ./doindex file.tex -noacc
   - to remove \index sequences: ./cleanindex file.tex

If the current directory is in your PATH, you do not need ./ before the script name.

2.2 Windows

The package offers exe files compiled with Spitbol (see http://www.snobol4.com). Create a directory and copy into it all the files in the bin/windows directory: there must be two *.exe files and the two test.* files.

Run the programs as follows:

- to index a text: doindex file.tex
- to exclude words with accents: doindex file.tex -noacc
- to remove \index sequences: cleanindex file.tex

Accented characters must be encoded in latin1 (see Bugs and TODO).
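How the -noacc option treats the list file is not described in detail. As a hedged sketch, assuming it simply drops list entries that contain accented characters, the filtering could look like this in Python; the names filter_entries and read_wls are hypothetical, and "accented" is approximated as "non-ASCII".

```python
def filter_entries(lines, no_acc=False):
    """Return the non-empty list-file entries, optionally without accents.

    With no_acc=True, any entry containing a non-ASCII character is
    treated as "a word with accents" and dropped (an assumption).
    """
    entries = [line.strip() for line in lines if line.strip()]
    if no_acc:
        entries = [e for e in entries if e.isascii()]
    return entries


def read_wls(path, no_acc=False):
    # The list file is read as latin1 (ISO-8859-1), the encoding the
    # programs expect.
    with open(path, encoding="latin-1") as f:
        return filter_entries(f, no_acc)
```

With test.wls, for instance, this sketch would keep animals and cats but drop évita/Italian/foreign words.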

2.3 Windows from source

Basically, follow the directions given for GNU/Linux, but make sure to use the bat files and to install the Windows version of the interpreter. The sources are in Unix format: before using them, convert them to DOS/Windows format with a conversion script. If you do not have such a script, open each file with a text editor and save it in DOS/Windows format; this can be done with the DOS edit program, with vim, or with any other editor able to deal with different file formats. Do not alter the files if you are not sure of what you are doing. Please (1) use a simple text editor, not a word processor such as Word, and (2) make sure to leave file acc.inc encoded in ISO-8859-1 or ISO-8859-15, not in a DOS code page or in Unicode.
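If no conversion script is at hand, a minimal one can be written in Python, assuming a Python interpreter is available on the Windows machine. It is a sketch, not part of forindex: it works on raw bytes, so the ISO-8859-1 characters in acc.inc are copied unchanged, and it rewrites each file in place.

```python
import sys


def unix_to_dos(data):
    """Convert LF line endings to CRLF in a bytes object."""
    # Normalise any CRLF first, so a file already in DOS format is unchanged.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file(path):
    # Bytes in, bytes out: ISO-8859-1 characters (e.g. in acc.inc) are
    # copied unchanged, because no decoding or re-encoding takes place.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(unix_to_dos(data))


if __name__ == "__main__":
    for name in sys.argv[1:]:
        convert_file(name)
```

Run it with the names of the source files as arguments; since the files are overwritten, keeping a copy of the originals is advisable.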

2.4 Cygwin

I suggest following the directions given for GNU/Linux, but the EXE files provided for native Windows can also be used if preferred.

2.5 Macintosh

Not yet tested (I do not have a Mac right now). It's in the TODO list.

3 Test files

Please test the programs on test.tex and test.wls. The output file will be called test-ind.tex if you use doindex, and test-noind.tex if you use cleanindex.

4 Bugs and TODO

The programs do not support Unicode files. At the moment most LaTeX users are still using latin1, but the situation is changing rapidly.

List of features that I would like to add:

- Index included files as well.
- Add typographical styles, such as italics for the most important locations of a word.
- Add support for several indexes (particularly with class memoir).
- Add an option to generate a rough index of all the words.
- Add support for indexing words listed with regular expressions, e.g. read* should index read, reads, reading, readings, all under the same heading read.
- Make it possible to use another separator in the list file, e.g. a simple blank or another character preferred by the user.
- Test the programs on a Mac.
- Add Unicode support.
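The planned wildcard feature is not implemented in version 0.1. As a hedged illustration only, one way to approach it is to translate the * of a list entry into a regular expression and to use the entry without its * as the heading; the names wildcard_to_regex and heading are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a list entry such as 'read*' into a whole-word regex.

    '*' stands for any run of word characters; everything else is literal.
    """
    literal = re.escape(pattern).replace(r"\*", r"\w*")
    return re.compile(r"(?<!\w)" + literal + r"(?!\w)")


def heading(pattern):
    """All matches are indexed under the pattern without its '*'."""
    return pattern.replace("*", "")
```

With this sketch, read* matches read, reads, reading and readings but not bread, and all of them would go under the heading read.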

5 Acknowledgements

The program ixgen gave me the idea of forindex. Many thanks to OSCAR LOPEZ for this very good program.

Some questions sent by CARLO PELLEGRINO (Modena University, Italy) gave me the idea of transforming a very rudimentary script into a general purpose utility. MAURIZIO LORETI (Padua University, Italy) sent me very useful remarks on the problems of the automatic generation of indexes, which I made use of in the introduction to this text.

My warmest thanks to PHIL BUDNE (phil@ultimate.com) for making his excellent CSNOBOL available. Many thanks to the community of Snobol users, particularly to the members of the list snobol4@mercury.dsu.edu, and, among them, to GORDON PETERSON (http://personal.terabites.com/), MICHAEL RADOW (mikeradow@yahoo.com), GREGORY L. WHITE (glwhite@netconnect.com.au) and to RAFAL M. SULEJMAN (rafal@engelsinfo.de) whose vim syntax files are a daily blessing.

Thanks to Jim Hefferon <ftpmaint@alan.smcvt.edu> who pointed out that the original name of the package, 4index, was not acceptable due to XML syntax rules.

6 Author, copyright, license, disclaimer

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

If you do not have a copy of the GNU General Public License write to the Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

If the author of this software was too lazy to include the full GPL text along
with the code, you can find it at: http://www.gnu.org/copyleft/gpl.html.

About this document ...

This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.70)

The translation was initiated by guido on 2005-01-29

Guido Milanese, 2005-01-29
