The Indexed File ENGine Version: 1.01 Date: 09-16-91 +-----------------------------------+ | Alton Moore III | | Rt. 21, Box 694-A | | Edinburg, TX 78539-9805 | | www.alton-moore.net | | alton@alton-moore.net | +-----------------------------------+ INTRODUCTION: ============= Many programmers have a need for multi-keyed indexed files. Sometimes we have a large number of records which we need to look up quickly; sometimes we just want to do a small internal sort conveniently. Those programmers writing business/accounting applications, especially, often wish for more sophisticated file access schemes than are generally found on PCs. Many software writers have come to depend on the file access abilities of separate database products to implement their programming assigments; while this may be appropriate for some, many programmers would prefer to not be dependent upon ANY outside resources/programs when building/maintaining their applications. It is for these people that the Indexed File ENGine was written. The source is written entirely in ANSI C to be as portable as is possible, and so should be compilable under almost any C compiler on almost any system. If you need a consistent way to do indexed file accesses across many platforms, then the Indexed File ENGine is for you! DISTRIBUTION: ============= This archive should contain the following files: readme.1st This (disposable) file is a quick rundown on what the engine does i_f_eng.doc This is the entire documentation for the system (this file) i_f_eng.h This is the Borland C++ source file containing the file engine. This is the only file which needs to be included in your source. i_f_test.c Source to see if you can successfully compile with the file engine i_f_utl.c Source to the i_f_utl.exe program - samples of routine's usage i_f_utl.c A utility program written to support the file engine's format i_f_utl2.c Program which converts/reorganizes sequential/indexed files i_f_utl3.c Batch version of above program (functions not yet integrated!) HOW TO USE THE ENGINE: ====================== The object file I_F_ENG.C contains the ANSI C code for an extremely flexible multi-keyed file system. The header file I_F_ENG.H contains the function/error codes used/returned by the indexed file routine, and should be #included in any C source which is written to utilize the file engine. The relevant limits of the file system are: - 1 to 255 keys are definable; duplicate/no-duplicates can be specified for all. - Record lengths up to 65536 bytes (records can contain any data you desire). - Variable length records if desired. - Keys can have lengths of up to 65535 chars. and occur anywhere in the record. - File size of 21 gigabytes (although this might be impractical). - Number of open files is operating-system dependent (although each file requires that some dynamic memory be allocated by the program to maintain tables). The limit in the source code is 32,767 files; now that shouldn't be a problem, should it? - File space for deleted records is used by succeeding writes when the file has the fixed record length attribute. - Files can be started/read either forwards or backwards. See I_F_ENG.H. The utility program I_F_UTL.EXE supports the file format by providing various functions (an integrity test and thrashing tests). I_F_UTL2.EXE does file conversions (indexed-->stream and stream-->indexed). Converting an indexed file to stream format and back again optimizes the file. To link with Borland C: (1) #include I_F_ENG.H in your C source code (2) bcc -ml prog.c i_f_eng.obj ^---- or whatever memory model! One routine (named indexed_file()) performs all indexed-file-related activity. Another routine is provided for parsing the file attributes string; you will probably never use this routine, although its use is illustrated in the I_F_UTL program's source code. See I_F_ENG.H for calling information. The attributes string looks like so: "20 1 0 10 0" meaning fixed record lengths of 20 with one key. Each key has three numbers, the first being the starting offset in the record (0 above), the next the length of the key (10 above), and the last a flag to allow-disallow duplicates in this key (not allowed above by the 0). A file with two keys might have the following attribute string: "100 2 0 10 0 10 30 1" giving one duplicate and one non-duplicate key. I_F_ENG.H also contains information on the I_F_DIRECTION variable, which determines whether start, read, and read-next operations move the indexed file's current record pointer forwards or backwards. The I_F_RECORD_LENGTH variable is set and/or used when dealing with variable-record-length files. The routine is never destructive insofar as the record area goes; it is left untouched after writes, rewrites, deletes, etc.. Allowable operation orders are given by the following table (such as a rewrite can be followed by a start, another rewrite, etc.): start | x x x x The last read | x x x x x x operation ----> readnext | x x x x x x performed write | x x x rewrite | x x x x x x delete | x x x x +------------------ s r r w r d The t e e r e e subsequent a a a i w l operations ----> r d d t r e allowed t n e i t e t e x e t Rewrites update only the pointers for the keys which have changed; the record data, of course, is always rewritten. If you want to rewrite by key (rewrite a certain record without reading it first), then I suggest that you implement a routine like the following: char *junk_record; indexed_file(&file_handle,I_F_READ,original_record,0); /* Read original. */ /* Now save a key (in this case key 0) which we use to read the original. */ junk_record = (char *)malloc(record_length); /* Create temporary record. */ memcpy(junk_record+key_0_position,original_record+key_0_position,key_0_length); . ....read other records and/or update data in original_record.... . indexed_file(&file_handle,I_F_READ,junk_record,0); /* Sets record pointer. */ indexed_file(&file_handle,I_F_REWRITE,original_record,0); /* The rewrite! */ free(junk_record); /* Deallocate the temporary record. */ This set of statements could be put into a routine if you plan to do a lot of rewriting by key. If you think that a rewrite by key would be a good function to include in this software then let me know, but since it's probably not too commonly used, I figure that leaving it out will leave us a smaller .OBJ file, and those people who need it can implement it as described above. PERFORMANCE: ============ The engine operates by maintaining binary trees in the file, one for each key. Record lengths can be variable or fixed; using fixed record lengths has the advantage of allowing the system to reclaim the space of deleted records. The performance of the file system depends on disk caching being generously available. This is also one of those applications which will benefit from a "clean" hard disk; be sure to optimize your drives regularly for best results. RUN A WRITE CACHE IF AT ALL POSSIBLE! The file system will support file locking only (one user at a time may update the file). Record locking will be added only if requested by enough (registered) users. I haven't had any luck with multiple users accessing the same file under MS-DOS when SHARE is running because I haven't mastered the sopen() function. Since this function is MS-DOS specific I'm not that concerned, but it would be nice if I could work together with SHARE, so if anyone has any words of wisdom on this subject then please send them my way! A file is NOT started when you open it in a read mode. Many file systems start the file by the primary key for you after opening in a read mode but I don't think that feature belongs in this system; I'm a purist! The engine allocates small amounts of dynamic memory to hold information on open files. It also allocates/deallocates (when files are opened/closed) enough memory to hold the largest key found in any file currently open; if you need dynamic memory and have a file with an extremely large key open, you might try closing that file. I'm sure that I have omitted some relevant points, many of which I probably mused about for hours but just forgot to mention. Let me know what they are and I'll answer your questions AND update this documentation. THE UTILITY PROGRAMS: ===================== I_F_UTL is a program (the source for which is included) which allows file integrity testing and random write benchmarking. Because each key which is written has a random number as its value, the key structure stays more or less balanced as records are written to the file. This lessens the effect of any disk caching you might have but it provides a fairly realistic example of how long it might take a program to write a record of the requested type. Since rewrites only update the keys which have changed, they may proceed much more quickly than writes depending on the nature of your program. I_F_UTL2 is a program for converting/reorganizing indexed files. The source is not included in this distribution. The main ability of this program is to load an empty indexed file from a stream file, creating optimized key indices at the same time. Since the program will also convert indexed files to stream files, this program may be used to optimize indexed files. The steps are as follows: - Run option #1 to create a stream file from an indexed file. For each record in the indexed file, the following sequence of bytes is written out to the stream file: - 2 bytes (unsigned short) which give the record length in bytes - x bytes of record data - Use option #2 to create an empty indexed file - Use option #3 to load the empty indexed file from the stream file If you wish to create a stream file for loading an indexed file from your own data, you will probably end up with code that looks like the following: unsigned short record_length; /* How to write out a stream file! */ struct phonelist_record_type { char phone_number[10]; /* This is a sample structure. */ char name[40]; short number_of_calls; /* This is 2 bytes long. */ } phonelist_record; record_length = sizeof(struct phonelist_record_type); /* Could've said "52". */ while (there_is_data) /* Replace with appropriate code. */ { (read data into phonelist_record) fwrite(&record_length,2,1,stream_file); /* Unsigned short is 2 bytes long. */ fwrite(&phonelist_record,sizeof(struct phonelist_record_type),1,stream_file); } Command line versions of I_F_UTL and I_F_UTL2 will be included with the next distribution so that their functions (like file integrity checking and file reorganization) may be run in batch files. I have a habit of running integrity checks every night and file optimizations at least once a week. This lets me know right away when a file is corrupted (instead of me finding out during the end-of-month runs!) and keeps the access to the files as quick as is possible. The indexed file ENGine is written in ANSI C and should be compatible with any operating system under which it will compile successfully, since it uses the most primitive of C functions to do file I/O. Any application which uses the indexed file engine, whether or not you have purchased the source code, must continue to display the Copyright notice which appears the first time the engine is called. You may cause this notice to appear at the beginning of your program (before you intend to use the engine in any way) by issuing an I_F_CLOSE_ALL call to the engine; this call is always safe to issue: if no files are open then nothing happens. This requirement may be waived by the copyright owner. MODIFICATION HISTORY: ===================== Version Description ------- ----------- 1.00 This is the original public (mass) offering. 1.01 This version added the I_F_DIRECTION variable. In the works is a modification to allow multi-part keys (where a key is composed of a couple of bytes here in the record, a few bytes there, etc.; the portions of the key need not be contiguous in the record). This is a simple modification but I would like to get a little more use/testing done with the current version before making too many changes. Command-line versions of the utility programs are also on the way. Suggestions for modifications, bug fixes, inquiries, improvements to the documentation, etc. should be directed to me at one of the above addresses. For a certain response, US Mail works for sure, although it takes a little longer. KNOWN BUGS: =========== There are none as of yet! The variable-length record operations have not been tested as thoroughly as the fixed-length operations have been, but the engine is not extremely picky about this. Nevertheless, if you are implementing an extremely important function with this system, keep good backups (you should always keep good backups anyway!). Please report suspected bugs immediately! Alton Moore