                         Turbo Pascal for DOS Tutorial
                             by Glenn Grotzinger
        Part 10 -- binary files; units, overlays, and include files.
             All parts copyright 1995-6 (c) by Glenn Grotzinger.

        There was no prior problem, so let's get started...

Typed binary files
==================
We know that files can be of type text.  We can also declare them as
"file of <datatype>", which lets us read and write binary data to disk.
With a typed binary file, you can only read and write the type the file
was declared with; the file in the example below holds integers and
nothing else.  The component type may be almost anything that we have
covered up to this point.  Reading and writing typed binary files works
much like it does with text files, except we can not make use of readln
and writeln (as those are for text files only).  Here's an example.

program integers2disk;
{ writing integers 1 thru 10 to a disk data file, then reading 'em back }
   var
     datafile: file of integer;
     i: integer;
   begin
     assign(datafile, 'INTEGERS.DAT');
     rewrite(datafile);
     for i := 1 to 10 do
       write(datafile, i);
     close(datafile);  { done with write }
     reset(datafile);  { now let's start reading }
     while not eof(datafile) do { same concept as with text files }
       begin
         read(datafile, i);
         writeln(i);
       end;
     close(datafile);
   end.

You will notice the numbers 1 through 10 come up.  Look for the file
named INTEGERS.DAT, and then load it up in a text editor.  You will
notice that the file is essentially garbage to the human eye.  That
is the binary form the computer stores integers in.  In part 11, I will
explain how many different types of variables are stored, and introduce
a few new types of things we can define.  We can use records, integers,
characters, strings, whatever...with a typed file as long as we stick
to the specific type we declare the file to be in the var section.
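
Since records were mentioned, here is a minimal sketch of a typed file
of records.  The record layout and the file name PARTS.DAT are made up
for illustration; nothing else in this part depends on them.

program records2disk;
{ sketch: a typed file whose component type is a record }
   type
     partrec = record            { hypothetical record, illustration only }
       name: string[10];
       qty: integer;
     end;
   var
     datafile: file of partrec;
     part: partrec;
   begin
     assign(datafile, 'PARTS.DAT');
     rewrite(datafile);
     part.name := 'MODEM';
     part.qty := 3;
     write(datafile, part);       { one write stores the whole record }
     part.name := 'MOUSE';
     part.qty := 12;
     write(datafile, part);
     close(datafile);
     reset(datafile);
     while not eof(datafile) do
       begin
         read(datafile, part);    { one read brings back a whole record }
         writeln(part.name, ' ', part.qty);
       end;
     close(datafile);
   end.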

Untyped Binary Files
====================
We can also open binary files as untyped, unstructured (essentially)
files.  There we simply use the declaration "file".  (This is not
dependent on version 7; earlier versions of Turbo Pascal have untyped
files too.)  In addition to this, we have to learn a few new commands
in order to use untyped files.

BLOCKREAD(filevar, varlocation, count, totalread);

filevar is the untyped file variable.
varlocation is the variable (buffer) the data is read into.
count is how many records to read.  Count times the record size must
  not be bigger than varlocation.
totalread is how many complete records were actually read. (optional)

BLOCKWRITE(filevar, varlocation, count, totalwritten);

filevar is the untyped file variable.
varlocation is the variable (buffer) the data is written from.
count is how many records to write.
totalwritten is how many complete records were actually written. (optional)

SizeOf(varlocation)

Function that gives total size of a variable in bytes.

Maximum transferable by one BlockRead or BlockWrite: just under 64KB
(65,535 bytes).

Reset and Rewrite take an optional record size parameter if we deal with
an untyped file.  If we leave it off, the record size is 128 bytes.  With
a record size of 1, as in the example below, count is simply a number of
bytes.

Probably the best way to make things clearer is to give an example.
This program does the same thing as the last one does, but with
an untyped file.  See the differences in processing...

program int2untypedfile;
  uses crt;   { needed for clrscr }

  var
    datafile: file;
    i: integer;
    numread: word;

  begin
    clrscr;
    assign(datafile, 'INTEGERS.DAT');
    rewrite(datafile, 1);
    for i := 1 to 10 do
      blockwrite(datafile, i, sizeof(i));
    close(datafile);
    reset(datafile, 1);
    blockread(datafile, i, sizeof(i), numread);
    while numread <> 0 do
      begin
        writeln(i);
        blockread(datafile, i, sizeof(i), numread);
      end;
    close(datafile);
  end.
      
This program performs essentially the same function as the first example
program, but we are using an untyped file.  Blockread and blockwrite are
used in a very limited manner here.  It's *VERY GOOD* for you to experiment
with their use!  As for the EOF comparison, blockread returns in numread
how many records it actually read.  When that comes back as 0, we are at
the end of the file, so we use it as an equivalent.
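
As one experiment along those lines, here is a sketch that reads several
integers per blockread call.  It assumes INTEGERS.DAT already exists from
the programs above; the buffer size of 4 is an arbitrary pick of mine.

program blockbuf;
{ sketch: read several records with each blockread call }
  const
    bufsize = 4;                  { arbitrary size, for illustration }
  var
    datafile: file;
    buf: array[1..bufsize] of integer;
    numread, j: word;
  begin
    assign(datafile, 'INTEGERS.DAT');
    reset(datafile, sizeof(integer));   { one record = one integer }
    repeat
      { ask for bufsize records; numread says how many we really got }
      blockread(datafile, buf, bufsize, numread);
      for j := 1 to numread do
        writeln(buf[j]);
    until numread < bufsize;      { a short read means end of file }
    close(datafile);
  end.

Note that the record size is sizeof(integer) here, so the count given to
blockread is a number of integers, not a number of bytes.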

The 2 missing DOS file functions
================================
We now have the tools to perform the 2 DOS file functions that you
probably noticed were missing from part 8: copying files and moving files.

Copying files is essentially repeated blockreads and blockwrites until
all of the input file is read and all of the output file is written.  We
can do it with either typed or untyped files.  An untyped file example may
be found on page 14 of the Turbo Pascal 7.0 Programmer's Reference.
For those who do not have this reference...Snippet of my own...untested...
(infile and outfile are untyped files opened with a record size of 1,
rarray is an array of bytes, and bytesr and bytesw are words.)

repeat
  blockread(infile, rarray, sizeof(rarray), bytesr);
  blockwrite(outfile, rarray, bytesr, bytesw);
until (bytesr = 0) or (bytesw <> bytesr);

Moving files is a copy of an input file to a new location, followed by
erasure of the input file.
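
To put the pieces together, here is a sketch of a complete move (copy,
then erase).  The file names INFILE.DAT and OUTFILE.DAT and the 2048 byte
buffer are assumptions of mine, and there is no check that the input
file exists.

program movefile;
{ sketch: copy INFILE.DAT to OUTFILE.DAT, then erase the original }
  var
    infile, outfile: file;
    buffer: array[1..2048] of byte;
    bytesr, bytesw: word;
  begin
    assign(infile, 'INFILE.DAT');
    reset(infile, 1);             { record size 1: counts are bytes }
    assign(outfile, 'OUTFILE.DAT');
    rewrite(outfile, 1);
    repeat
      blockread(infile, buffer, sizeof(buffer), bytesr);
      blockwrite(outfile, buffer, bytesr, bytesw);
    until (bytesr = 0) or (bytesw <> bytesr);
    close(infile);
    close(outfile);
    if bytesw <> bytesr then
      writeln('Write failed.  Is the disk full?')
    else
      erase(infile);              { this step turns the copy into a move }
  end.

Leave off the erase and you have a plain copy.  The file must be closed
before erase is called on it.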

Units
=====
A unit is what you probably see on your drive in the TP/units directory.
Compiled units are TPU files.  They are accessed via USES clauses at the
start of a program.  CRT, DOS, and WinDos are some of the provided units
we have already encountered.  Nothing is stopping us from writing our own,
though.  The actual coding of procedures/functions that we place into
units is no different.  The format of the unit, though, is something we
need to think about.  An example is the best thing for that.  This is a
simple unit that gives you a skeleton to place procedures and functions
into.

unit myunit;

  interface
     { all global const, type, and var declarations go here }

     { procedure and function headers are listed here }

     procedure writehello(str: string);

  implementation
     { actual procedural code goes here, as it would in a regular program }

     procedure writehello(str: string);  { must be same as above }
       begin
         writeln(str);
       end;

     { any code we may want to run as initialization for the unit goes
       in an optional begin block placed right before the final end }

  end.

The unit up above can be compiled to disk/memory, but it cannot be run by
itself.  Essentially, it is a library of procedures/functions that we may
use in other programs.  Save it as MYUNIT.PAS so it compiles to MYUNIT.TPU.
Let's get an example out on how to use one of our own units.

program tmyunit; uses myunit; { must match ID at beginning }
  var
    str: string;
  begin
    str := 'Hello!  Courtesy of myunit!';
    writehello(str);
  end.

Though this program/unit combination is ludicrous, it does illustrate
exactly how to incorporate your own unit with MANY functions into your
programming, if your project gets too big, or for portability's sake
on some of your frequently used procedures.
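
A unit may also carry an initialization part, which runs once before the
main program starts.  Here is a minimal sketch; the unit name and its
routines are made up for illustration.

unit counter;
{ sketch: a unit with private data and an initialization part }

  interface
     procedure bumpcount;
     function getcount: integer;

  implementation
     var
       total: integer;     { not in the interface, so hidden from users }

     procedure bumpcount;
       begin
         total := total + 1;
       end;

     function getcount: integer;
       begin
         getcount := total;
       end;

  begin
    total := 0;            { initialization part: runs at program startup }
  end.

A program that has "uses counter;" can call bumpcount and getcount, but
can not touch total directly.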

Overlays
========
This will describe how to use TP's overlay facility.  It must be used with
units.  My thoughts are that you should wait until you get a project large
enough to dictate the use of overlays.  We can use 'em on any size project,
but on smaller projects the overlay manager takes up more memory than
overlaying saves, so there is no advantage in doing this habitually.  We
will use the overlay facility with the unit/program set above for example
purposes.

ONLY CODE IN UNITS HAS AN OPPORTUNITY TO BE OVERLAID!  System, CRT, Graph,
and Overlay (if I remember right) are non-overlayable.

{$O+} is a compiler directive for UNITS only, which designates a unit that
is OK to overlay.  {$O-} is the default, which says it's not OK to overlay
a unit.  Overlaid code also has to use far calls, so put {$F+} in the
overlaid units and in the program that uses them.

To get to the overlay manager, we must use the Overlay unit.  List it in
the uses clause before any of the units to be overlaid.

After the uses clause, we need to use the {$O <unit name>} compiler
directive to specify which units we want to compile as an overlay.

WARNING: It is good to check your conversion to overlays in a program
with a copy of your source code.  If you alter it with overlays in mind
and it doesn't work (it's known to happen -- a procedure works ONLY when
it's not overlaid...), you won't have to go through the work to alter
it back if it doesn't work right...

NOTE: You must compile to disk, then run when you work with overlays.

Results come back in the OvrResult variable.  Here's a list...

                  0       Success
                 -1       Overlay manager error.
                 -2       Overlay file not found.
                 -3       Not enough memory for overlay buffer.
                 -4       Overlay I/O error.
                 -5       No EMS driver installed.
                 -6       Not enough EMS memory.

As for examples, let's look at the unit set up to overlay.  As we can
see, the only real difference is that the {$O+} compiler directive is
there now, together with {$F+} for far calls...

{$O+,F+}
unit myunit;

  interface
     { all global const, type, and var declarations go here }

     { procedure and function headers are listed here }

     procedure writehello(str: string);

  implementation
     { actual procedural code goes here, as it would in a regular program }

     procedure writehello(str: string);  { must be same as above }
       begin
         writeln(str);
       end;

  end.

Now let's look into the program itself.  Its error-reporting from the
overlay manager isn't great.  It stops the program if the overlay won't
load, but doesn't do a thing, really, with the EMS section.

{$F+}
program tmyunit; uses overlay, myunit;  { overlay must come first }

  {$O MYUNIT}            { include myunit in the overlay }
  var
    str: string;
  begin
    ovrinit('TMYUNIT.OVR'); { final overlay file name/init for program. }
    if OvrResult <> 0 then
      begin
        writeln('There is a problem');
        halt(1);
      end
    else
      write('Overlay installed ');
    ovrinitems;        {init overlay for EMS.  Usable after ovrinit}
    if OvrResult <> 0 then
      writeln('There was a problem putting the overlay in EMS')
    else
      writeln('in EMS memory.');
    str := 'Hello!  Courtesy of myunit!';
    writeln;
    writehello(str);
  end.

EXE Overlays
============
Here's how to set up EXE overlays.  First, the change that needs to be
made in the source for the program is to change the ovrinit line to read
TMYUNIT.EXE instead of TMYUNIT.OVR.  Recompile to disk after that change.

Then the DOS copy command features the /B (binary) switch.  To take the
program above and attach the overlay to the end of the EXE (be sure you
run any exe packers/encryptors before you do this!), use the following:

        COPY /B TMYUNIT.EXE+TMYUNIT.OVR

Do the copy again after every recompile, as compiling writes a fresh EXE
without the overlay on the end.  You should be able to handle doing this
and understanding what is going on.

Include Files
=============
Use the {$I <filename>} compiler directive at the position the include
file is to be placed.  An include file is code kept in another file,
which the compiler treats as "part of the program" at the position of
the {$I ..} compiler directive.
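
Here is a small sketch with two files.  The names GREET.INC and
TINCLUDE.PAS are made up for illustration.  First the include file:

{ GREET.INC -- just a procedure, not a complete program }
procedure greet;
  begin
    writeln('Hello from GREET.INC');
  end;

And then the program that pulls it in:

program tinclude;
  {$I GREET.INC}     { the compiler reads GREET.INC in at this spot }
  begin
    greet;
  end.

The result is the same as if the procedure had been typed into
TINCLUDE.PAS by hand.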

Copy function
=============
You can use the copy function to pull a portion out of a string.  It
takes the string, the starting position, and the number of characters
to take.  For example...

        str := copy('TurboPascal', 5, 3);
        writeln(str);      { writes oPa }
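
Copy gets more useful when paired with Val to turn a fixed-column field
into a number.  A minimal sketch follows; the sample line and the column
positions are picked purely for illustration.

program copyval;
{ sketch: cut a field out of a line, then convert it with val }
  var
    line, field: string;
    area, code: integer;
  begin
    line := 'GROT1*GP1816429329';
    field := copy(line, 10, 3);     { columns 10 through 12 }
    val(field, area, code);         { code comes back 0 if all went well }
    if code = 0 then
      writeln('Number read: ', area)
    else
      writeln('Bad character at position ', code, ' of ', field);
  end.

When Val fails, code holds the position of the first character it could
not digest, which is handy for error messages.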

Programming Practice for Part #10
=================================
We opened ourselves a business selling computer equipment in 1993.
Since we have occupied ourselves with working on computers, and not on
bookkeeping (we wanted to save the funds instead of hiring someone), and
would rather not use the cash registers, we have done everything on paper
over the last two years.  It's the beginning of 1996, and keeping any
accurate records of sales progression, as well as records of our customers,
has become almost impossible, since our records are represented by a
closet-full of paper.  So, we have finally decided to get things onto a
computer.

To do the typing, we have temporarily hired interns from a nearby business
college.  Unfortunately, with our limited funds, we could not draw in
people who had sufficient typing skill and accuracy, but we took what
we could get.  We now have things typed in as text files with 80 columns
a line.  Unfortunately, the interns' attention to detail has been as bad
as their typing skill, and nothing makes sense in their work.

Our purpose is to salvage the money we spent hiring these interns: locate
the badly entered records, while writing the good records to a binary
data file by the name of COMPHVN.DAT.  For the bad records, on EACH AND
EVERY error we encounter, we should write a message to a text file named
ERRORS.LOG, giving the first 20 characters of the problem line and a
description of what is wrong with the data set for that particular error,
so we may go back through and make the interns redo what they did wrong.

The data format for the output file COMPHVN.DAT is as follows.  In the
interest of efficiency, we shall write this program using COMPHVN.DAT
as an untyped file.  As the person posing this problem, I realize that
some of the data types in this record will not be recognizable at this
point, but with the variable description, you will know how to handle
them, and in part 11, you will see what they are exactly.  In creating
a binary file, we must always be concerned with using as little space
as possible.  Uses of the variables will be explained later.  To save
typing on your part, I am asking that you cut and paste this record
description out of this text and save it as a text file named
COMPHVN.INC, which may be used as an include file in our compilation.

  comphvndata = record
      datacode: string[7];  
      acct_classification: char;
      phone_area: integer;   {area+prefix+exchange = phone number}
      phone_prefix: integer;
      phone_exchange: integer;
      work_area: integer;
      work_prefix: integer;
      work_exchange: integer;
      other_area: integer;
      other_prefix: integer;
      other_exchange: integer;
      cnct1_lname: string[16];
      cnct1_fname: string[11];
      cnct1_minit: char;
      cnct1_pobox: integer;
      cnct1_sname: string[8];
      cnct1_stype: string[4];
      cnct1_apt: integer;
      cnct1_city: string[10];
      cnct1_state: string[2];
      cnct1_zip: longint;
      cnct1_birthm: byte;
      cnct1_birthd: byte;
      cnct1_birthy: integer;
      accept_check: boolean;
      accept_credt: boolean;
      balnce_credt: real;
      total_sold: real;
      cnct1_emp_code: string[4];
      total_sales: integer;
      emp_name: string[10];
      emp_stnum: integer;
      emp_sttype: string[4];
      emp_city: string[10];
      emp_state: string[2];
      emp_zip: longint;
      emp_area: integer;
      emp_prefix: integer;
      emp_exchange: integer;
      emp_yrs: byte;
      compu: boolean;
      compu_type: string[9];
      compu_mon: char;
      compu_cdr: boolean;
      compu_cdt: char;
      compu_mem: byte;
      minor: boolean;
   end;

The format for our INPUT file, which will be named INDATA.TXT, will be as
follows (80 characters).  Since we had 15 interns doing the typing at once
we also had them merge their work.  They were careless, and may have not
accomplished it properly.  There will be three lines for each customer
that we have encountered.

Line 1                                 Line 2
--------------------------------------------------------------------                                  
datacode              columns 1-7     datacode         columns 1-7
acct_classification   column 8        accept_check     column 8
sequence number       column 9        sequence number  column 9
phone_area            columns 10-12   cnct1_stype      columns 10-13
phone_prefix          columns 13-15   cnct1_apt        columns 14-17
phone_exchange        columns 16-19   cnct1_city       columns 18-27
work_area             columns 20-22   cnct1_state      columns 28-29
work_prefix           columns 23-25   cnct1_zip        columns 30-38
work_exchange         columns 26-29   cnct1_birthm     columns 39-40
other_area            columns 30-32   cnct1_birthd     columns 41-42
other_prefix          columns 33-35   cnct1_birthy     columns 43-46
other_exchange        columns 36-39   balnce_credt     columns 47-55
cnct1_lname           columns 40-55   total_sold       columns 56-63
cnct1_fname           columns 56-66   cnct1_emp_code   columns 64-67
cnct1_minit           column 67       total_sales      columns 68-70
cnct1_pobox           columns 68-72   emp_name         columns 71-80
cnct1_sname           columns 73-80

Line 3
--------------------------------------------------------------------
datacode              columns 1-7
accept_credt          column 8
sequence number       column 9
emp_stnum             columns 10-13
emp_sttype            columns 14-17
emp_city              columns 18-27
emp_state             columns 28-29
emp_zip               columns 30-38
emp_area              columns 39-41
emp_prefix            columns 42-44
emp_exchange          columns 45-48
emp_yrs               columns 49-50
compu                 column 51
compu_type            columns 52-60
compu_mon             column 61
compu_cdr             column 62
compu_cdt             column 63
compu_mem             columns 64-65
minor                 column 66
spaces                columns 67-80

Now, a description as to what is defined as a correct set that we should
write to COMPHVN.DAT.

1) Each 3 lines that are read are considered for errors.  Check the sequence
numbers.  The first line's sequence number should be 1, for example.  A
successful read of 3 lines should say 1, 2 and 3 in that order.  For example,
in our error reporting, if you have a read of 1,2,2, you should not write
the group to the binary file, and should report a duplicate line #2 and a
missing line #3.  There will never be a circumstance where these sequence
numbers are all the same...The cases covered in this paragraph are the only
cases that would ever forestall processing of the error-checks listed in
points 2-14.

2) Datacode on lines 1, 2 and 3 should MATCH exactly and be checked for the
following: it is built from the cnct1_names (with my name, for example, it
would be GROTZ*G: the first 5 letters of the last name, a *, and the
first initial), and should be verified against them...

3) phone_area, phone_prefix, phone_exchange, work_area, work_prefix,
work_exchange, other_area, other_prefix, other_exchange, cnct1_pobox,
emp_zip, emp_area, emp_prefix, emp_exchange, emp_yrs, cnct1_zip,
cnct1_birthm, cnct1_birthd, cnct1_birthy, balnce_credt, total_sold,
total_sales, compu_mem all should be checked to verify that they are
numeric.

4) phone_prefix, work_prefix, other_prefix, emp_prefix all should not start
with a 1 or a 0.

5) cnct1_birthy should be in this century 1900-1999.

6) acct_classification should be B,C,G,P, or O.

7) accept_check, accept_credt, compu, compu_cdr, and minor should be
Y or N.

8) emp_yrs (employed how many years?) should be checked with cnct1_birthy
for sanity (a person who was born in 1980 can't have worked 20 years).

9) If compu is N, then compu_type, compu_mon, compu_cdr, compu_cdt, and
compu_mem should be either blank or 0 depending upon the type of field.

10) cnct1_emp_code should be GOVT, RET, STUD, or BUS.  If this field is
RET, then emp_* should either be blank or 0 depending on the type of field.

11) compu_mon should be S, V, E, C, H, or I.

12) compu_cdt should be 1, 2, 4, 6, or 8.

13) emp_sttype and cnct1_stype should be BLVD, LANE, ST, AVE, CT, LOOP,
    DR, CIRC, or RR.

14) minor should be Y if person listed in cnct1_?name is < 21 years old 
and N otherwise.  Check to be sure that this field is correct in being
Y or N.

Format of ERRORS.LOG (also solution to the INDATA.TXT posted below)
--------------------

                        Error Report -- INDATA.TXT
                        --------------------------

  First 20 characters of line    Problem
  ---------------------------    --------------------------
  GROA2*GN334  ST  WAR           Datacode does not agree with name.
  GROT2*GP181612932918           Work-exchange is not numeric.
  GROT2*GP181612932918           phone-prefix started with a 0 or 1.
  GROT2*GT2ST  314 SED           accept-check is invalid.
  GROT3*GP181642932918           Duplicate line #1
  GROT3*GN234  ST  WAR           Missing line #3
  GROT4*GI181642932918           Datacode does not agree with name.
  GROT4*GY2ST  314 SED           Datacode does not agree with name.
  GROT4*GN334  ST  WAR           Datacode does not agree with name.
  GROT4*GY2ST  314 SED           cnct1-birthy is not in this century.
  GROT4*GI181642932918           acct-classification is invalid.
  GROT7*GN334  ST  WAR           emp-zip is not numeric.
  GROT7*GN334  ST  WAR           compu-cdr is invalid.
  GROT7*GN334  ST  WAR           The emp-yrs doesn't make sense.
  GROT7*GN334  ST  WAR           There were fields present when compu was N.
  GROT7*GN334  ST  WAR           compu-mon is invalid.
  GROT7*GN334  ST  WAR           compu-cdt is invalid.
  GROT8*GN334  ST  WAR           empcodes are present when RET is true.
  GROT8*GN334  ST  WAR           compu-mon is invalid.
  GROT0*GN334  STR WAR           compu-cdt is invalid.
  GROT0*GN334  STR WAR           emp-sttype is invalid.
 


Remember to be as general as possible on your error messages. Use the
example listed above as a guide. Your program can not predict everything.
Also, in the interest of finding out your programming skill, we ask that
you code this program using the Pascal overlay system with EMS load
capability, with all error codes and status statements active and visible
to the user, for at least one procedure or function.  Also note that many
of the separate integer fields are run together in the input file, so we
can not just plain read the input file.

Here is a copy of the current input file, INDATA.TXT
(keep in mind it's 80 characters per line, and the character
positions MATTER)
----------------------------------------------------
GROT1*GP1816429329181674700008163475753GROT1INGER      GLENN      K232  34th
GROT1*GY2ST  314 SEDALIA   MO64093    062519742.34     3245.23 STUD32 CMSU
GROT1*GN334  ST  WARRENSBURMO65337    81654341114 YHOMEBUILTVY18 N
GROT2*GP18161293291816747000A8163475753GROT2INGER      GLENN      K232  34th
GROT2*GT2ST  314 SEDALIA   MO64093    062519742.34     3245.23 STUD32 CMSU
GROA2*GN334  ST  WARRENSBURMO65337    81654341114 YHOMEBUILTVY18 N
GROT3*GP1816429329181674700008163475753GROT3INGER      GLENN      K232  34th
GROT3*GY1ST  314 SEDALIA   MO64093    062519742.34     3245.23 STUD32 CMSU
GROT3*GN234  ST  WARRENSBURMO65337    81654341114 YHOMEBUILTVY18 N
GROT4*GI1816429329181674700008163475753BROT4INGER      GLENN      K2E2  34th
GROT4*GY2ST  314 SEDALIA   MO64093    062518742.34     3245.23 STUD32 CMSU
GROT4*GN334  ST  WARRENSBURMO65337    81654341114 YHOMEBUILTVY18 N
GROT5*GP1816429329181674700008163475753GROT5INGER      GLENN      K232  34th
GROT5*GY2ST  314 SEDALIA   MO64093    062519742.34     3245.23 STUD32 CMSU
GROT5*GN334  ST  WARRENSBURMO65337    81654341114 YHOMEBUILTVY18 N
GROT6*GP1816429329181674700008163475753GROT6INGER      GLENN      K232  34th
GROT6*GY2ST  314 SEDALIA   MO64093    062519742.34     3245.23 STUD32 CMSU
GROT6*GN334  ST  WARRENSBURMO65337    81654341114 YHOMEBUILTVY18 N
GROT7*GP1816429329181674700008163475753GROT7INGER      GLENN      K232  34th
GROT7*GY2ST  314 SEDALIA   MO64093    062519742.34     3245.23 STUD32 CMSU
GROT7*GN334  ST  WARRENSBURMO65W37    816543411134NHOMEBUILT  00 N
GROT8*GP1816429329181674700008163475753GROT8INGER      GLENN      K232  34th
GROT8*GY2ST  314 SEDALIA   MO64093    062519742.34     3245.23 RET 32 CMSU
GROT8*GN334  ST  WARRENSBURMO65337    81654341114 YHOMEBUILTZY18 N
GROT9*GP1816429329181674700008163475753GROT9INGER      GLENN      K232  34th
GROT9*GY2ST  314 SEDALIA   MO64093    062519742.34     3245.23 STUD32 CMSU
GROT9*GN334  ST  WARRENSBURMO65337    81654341114 YHOMEBUILTVY18 N
GROT0*GP1816429329181674700008163475753GROT0INGER      GLENN      K232  34th
GROT0*GY2ST  314 SEDALIA   MO64093    062519742.34     3245.23 STUD32 CMSU
GROT0*GN334  STR WARRENSBURMO65337    81654341114 YHOMEBUILTVYA8 N


Notes
-----
1) You may use a for loop to read each set of 3 lines.  I will not throw
an error of omission of lines into the data file.  There will always
be multiples of 3 lines to work with.

2) The included data file in this text file includes errors from all 14
points listed above.  The data file I use for the contest will be different,
but will as well cover all 14 points listed above...

3) Be sure to get good use of your debugger, as you will NEED it...Also, be
sure to plan the program -- this is an easy one, yet it's complex because
of the amount of planning it requires...plan well, it's easy.  Don't plan
well, it's a bugger...:>

4) ONE hint: remember string addressing, and use of the copy function.

5) Another hint.  You can have what is referred to as "next sentence"
IF THEN ELSE statements.  It is very good in this program to be able to
use them.  (if condition then else) is essentially a "do nothing if the
condition is true" situation.  I suggest it because the Pascal operator
NOT is easy to get wrong: it binds tighter than the comparison operators,
so unless you put parentheses around the whole condition, it applies only
to the first operand. :<
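
Here is a small sketch showing both ways of writing the same test, so
you can pick the one you trust.  Keep in mind that NOT has a higher
precedence than IN and the comparison operators, which is why the
parentheses in the first form matter.  The variable and the test are
made up for illustration.

program nottest;
{ sketch: two equivalent ways to act only when a test fails }
  var
    ch: char;
  begin
    ch := 'Q';

    { form 1: NOT with the whole condition in parentheses }
    if not (ch in ['Y', 'N']) then
      writeln('invalid');

    { form 2: "next sentence" style, with nothing after the then }
    if ch in ['Y', 'N'] then
    else
      writeln('invalid');
  end.

Both forms write "invalid" for this value of ch.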

Also, keep in mind that this is the part 10 practice, too, so be sure to
at least attempt it!

Next Time
=========
Interfacing with a common format; how data types are stored in memory and
on disk.  You may wish to obtain use of a hex viewer for this next part.
Send comments to ggrotz@2sprint.net.


