The Indexed File ENGine                    Version: 1.01
                                           Date:    09-16-91
+-----------------------------------+
| Alton Moore III                   |
| Rt. 21, Box 694-A                 |
| Edinburg, TX  78539-9805          |
| www.alton-moore.net               |
| alton@alton-moore.net             |
+-----------------------------------+


INTRODUCTION:
=============

Many programmers have a need for multi-keyed indexed files.  Sometimes we
have a large number of records which we need to look up quickly; sometimes
we just want to do a small internal sort conveniently.  Those programmers
writing business/accounting applications, especially, often wish for more
sophisticated file access schemes than are generally found on PCs.

Many software writers have come to depend on the file access abilities of
separate database products to implement their programming assigments; while
this may be appropriate for some, many programmers would prefer to not be
dependent upon ANY outside resources/programs when building/maintaining
their applications.  It is for these people that the Indexed File ENGine
was written.  The source is written entirely in ANSI C to be as portable
as is possible, and so should be compilable under almost any C compiler on
almost any system.  If you need a consistent way to do indexed file accesses
across many platforms, then the Indexed File ENGine is for you!


DISTRIBUTION:
=============

This archive should contain the following files:

readme.1st   This (disposable) file is a quick rundown on what the engine does

i_f_eng.doc  This is the entire documentation for the system (this file)
i_f_eng.h    This is the Borland C++ source file containing the file engine.
              This is the only file which needs to be included in your source.
i_f_test.c   Source to see if you can successfully compile with the file engine
i_f_utl.c    Source to the i_f_utl.exe program - samples of routine's usage
i_f_utl.c    A utility program written to support the file engine's format
i_f_utl2.c   Program which converts/reorganizes sequential/indexed files
i_f_utl3.c   Batch version of above program (functions not yet integrated!)


HOW TO USE THE ENGINE:
======================

The object file I_F_ENG.C contains the ANSI C code for an extremely flexible
multi-keyed file system.  The header file I_F_ENG.H contains the
function/error codes used/returned by the indexed file routine, and should be
#included in any C source which is written to utilize the file engine.  The
relevant limits of the file system are:

- 1 to 255 keys are definable; duplicate/no-duplicates can be specified for all.
- Record lengths up to 65536 bytes (records can contain any data you desire).
- Variable length records if desired.
- Keys can have lengths of up to 65535 chars. and occur anywhere in the record.
- File size of 21 gigabytes (although this might be impractical).
- Number of open files is operating-system dependent (although each file
    requires that some dynamic memory be allocated by the program to maintain
    tables).  The limit in the source code is 32,767 files; now that shouldn't
    be a problem, should it?
- File space for deleted records is used by succeeding writes when the file
    has the fixed record length attribute.
- Files can be started/read either forwards or backwards.  See I_F_ENG.H.

The utility program I_F_UTL.EXE supports the file format by providing various
functions (an integrity test and thrashing tests).  I_F_UTL2.EXE does
file conversions (indexed-->stream and stream-->indexed).  Converting an
indexed file to stream format and back again optimizes the file.

To link with Borland C:    (1) #include I_F_ENG.H in your C source code
                           (2) bcc -ml prog.c i_f_eng.obj

                                     ^---- or whatever memory model!

One routine (named indexed_file()) performs all indexed-file-related activity.
Another routine is provided for parsing the file attributes string; you will
probably never use this routine, although its use is illustrated in the
I_F_UTL program's source code.  See I_F_ENG.H for calling information.
The attributes string looks like so:  "20 1 0 10 0"  meaning fixed record
lengths of 20 with one key.  Each key has three numbers, the first being
the starting offset in the record (0 above), the next the length of the
key (10 above), and the last a flag to allow-disallow duplicates in this key
(not allowed above by the 0).  A file with two keys might have the following
attribute string:  "100 2 0 10 0 10 30 1"  giving one duplicate and one
non-duplicate key.

I_F_ENG.H also contains information on the I_F_DIRECTION variable, which
determines whether start, read, and read-next operations move the indexed
file's current record pointer forwards or backwards.  The I_F_RECORD_LENGTH
variable is set and/or used when dealing with variable-record-length files.

The routine is never destructive insofar as the record area goes; it is left
untouched after writes, rewrites, deletes, etc..  Allowable operation orders
are given by the following table (such as a rewrite can be followed by a start,
another rewrite, etc.):
                                start | x  x  x  x
   The last                      read | x  x  x  x  x  x
   operation ---->           readnext | x  x  x  x  x  x
   performed                    write | x  x     x
                              rewrite | x  x  x  x  x  x
                               delete | x  x  x  x
                                      +------------------
                                        s  r  r  w  r  d
             The                        t  e  e  r  e  e
             subsequent                 a  a  a  i  w  l
             operations ---->           r  d  d  t  r  e
             allowed                    t     n  e  i  t
                                              e     t  e
                                              x     e
                                              t

Rewrites update only the pointers for the keys which have changed;
the record data, of course, is always rewritten.

If you want to rewrite by key (rewrite a certain record without reading it
first), then I suggest that you implement a routine like the following:

 char *junk_record;
 indexed_file(&file_handle,I_F_READ,original_record,0);  /* Read original. */
 /* Now save a key (in this case key 0) which we use to read the original. */
 junk_record = (char *)malloc(record_length);  /* Create temporary record. */
 memcpy(junk_record+key_0_position,original_record+key_0_position,key_0_length);
 .
 ....read other records and/or update data in original_record....
 .
 indexed_file(&file_handle,I_F_READ,junk_record,0);   /* Sets record pointer. */
 indexed_file(&file_handle,I_F_REWRITE,original_record,0);  /* The rewrite! */
 free(junk_record);  /* Deallocate the temporary record. */

This set of statements could be put into a routine if you plan to do a lot of
rewriting by key.  If you think that a rewrite by key would be a good function
to include in this software then let me know, but since it's probably not too
commonly used, I figure that leaving it out will leave us a smaller .OBJ file,
and those people who need it can implement it as described above.


PERFORMANCE:
============

The engine operates by maintaining binary trees in the file, one for each
key.  Record lengths can be variable or fixed; using fixed record lengths has
the advantage of allowing the system to reclaim the space of deleted records.

The performance of the file system depends on disk caching being generously
available.  This is also one of those applications which will benefit from a
"clean" hard disk; be sure to optimize your drives regularly for best results.
RUN A WRITE CACHE IF AT ALL POSSIBLE!

The file system will support file locking only (one user at a time may
update the file).  Record locking will be added only if requested by
enough (registered) users.  I haven't had any luck with multiple users
accessing the same file under MS-DOS when SHARE is running because I
haven't mastered the sopen() function.  Since this function is MS-DOS
specific I'm not that concerned, but it would be nice if I could work
together with SHARE, so if anyone has any words of wisdom on this subject
then please send them my way!

A file is NOT started when you open it in a read mode.  Many file systems
start the file by the primary key for you after opening in a read mode
but I don't think that feature belongs in this system; I'm a purist!

The engine allocates small amounts of dynamic memory to hold information
on open files.  It also allocates/deallocates (when files are opened/closed)
enough memory to hold the largest key found in any file currently open; if
you need dynamic memory and have a file with an extremely large key open, you
might try closing that file.

I'm sure that I have omitted some relevant points, many of which I probably
mused about for hours but just forgot to mention.  Let me know what they are
and I'll answer your questions AND update this documentation.


THE UTILITY PROGRAMS:
=====================

I_F_UTL is a program (the source for which is included) which allows file
integrity testing and random write benchmarking.  Because each key which
is written has a random number as its value, the key structure stays more
or less balanced as records are written to the file.  This lessens the effect
of any disk caching you might have but it provides a fairly realistic example
of how long it might take a program to write a record of the requested type.
Since rewrites only update the keys which have changed, they may proceed much
more quickly than writes depending on the nature of your program.

I_F_UTL2 is a program for converting/reorganizing indexed files.  The source
is not included in this distribution.  The main ability of this program is to
load an empty indexed file from a stream file, creating optimized key indices
at the same time.  Since the program will also convert indexed files to stream
files, this program may be used to optimize indexed files.  The steps are
as follows:
           - Run option #1 to create a stream file from an indexed file.
              For each record in the indexed file, the following sequence
              of bytes is written out to the stream file:
                                       - 2 bytes (unsigned short) which
                                          give the record length in bytes
                                       - x bytes of record data
           - Use option #2 to create an empty indexed file
           - Use option #3 to load the empty indexed file from the stream file

If you wish to create a stream file for loading an indexed file from your own
data, you will probably end up with code that looks like the following:

unsigned short record_length;     /* How to write out a stream file! */
struct  phonelist_record_type  {
    char phone_number[10];     /* This is a sample structure. */
    char name[40];
    short number_of_calls;   /* This is 2 bytes long. */
    }  phonelist_record;
record_length = sizeof(struct phonelist_record_type);  /* Could've said "52". */
while (there_is_data)   /* Replace with appropriate code. */
  {
  (read data into phonelist_record)
  fwrite(&record_length,2,1,stream_file); /* Unsigned short is 2 bytes long. */
  fwrite(&phonelist_record,sizeof(struct phonelist_record_type),1,stream_file);
  }

Command line versions of I_F_UTL and I_F_UTL2 will be included with the next
distribution so that their functions (like file integrity checking and file
reorganization) may be run in batch files.  I have a habit of running integrity
checks every night and file optimizations at least once a week.  This lets me
know right away when a file is corrupted (instead of me finding out during the
end-of-month runs!) and keeps the access to the files as quick as is possible.


The indexed file ENGine is written in ANSI C and should be compatible with
any operating system under which it will compile successfully, since it uses
the most primitive of C functions to do file I/O.

Any application which uses the indexed file engine, whether or not you have
purchased the source code, must continue to display the Copyright notice
which appears the first time the engine is called.  You may cause this notice
to appear at the beginning of your program (before you intend to use the
engine in any way) by issuing an I_F_CLOSE_ALL call to the engine; this
call is always safe to issue: if no files are open then nothing happens.
This requirement may be waived by the copyright owner.


MODIFICATION HISTORY:
=====================

Version   Description
-------   -----------
 1.00     This is the original public (mass) offering.
 1.01     This version added the I_F_DIRECTION variable.

In the works is a modification to allow multi-part keys (where a key is
composed of a couple of bytes here in the record, a few bytes there, etc.;
the portions of the key need not be contiguous in the record).  This is
a simple modification but I would like to get a little more use/testing
done with the current version before making too many changes.

Command-line versions of the utility programs are also on the way.

Suggestions for modifications, bug fixes, inquiries, improvements to the
documentation, etc. should be directed to me at one of the above addresses.
For a certain response, US Mail works for sure, although it takes a little
longer.


KNOWN BUGS:
===========

There are none as of yet!  The variable-length record operations have not been
tested as thoroughly as the fixed-length operations have been, but the engine
is not extremely picky about this.  Nevertheless, if you are implementing an
extremely important function with this system, keep good backups (you should
always keep good backups anyway!).  Please report suspected bugs immediately!


Alton Moore


<End of the Indexed File ENGine documentation>