Data Files
An Introduction To Data Files
If you work with a
computer you work with files. Sometimes those files contain information about
something you are writing - in a word processor, for example. Sometimes those
files contain other information like results of calculations. Sometimes those
files contain results of measurements you took in a laboratory or out in the
field.
In this lesson we are going to
examine data files. There are numerous good reasons why you need to understand
some basic ideas about data files.
- The size of a data file on a disk, and so the amount of data a disk can hold, is determined by the precision of the measurements you take and the number of data points you want to record.
- You may want to write programs that take and store data, and you need to understand a little bit about file structure when you do that.
In both cases you need to
know something about file structure and some details of the characteristics of
data files. That's what this lesson is about.
Let's look at the life history of a typical data file.
- You use some instrument to measure data - voltage, temperature, whatever.
- The instrument manufacturer supplies a utility program that lets you store the data in a file. Alternatively, you write a program that controls the instrument and stores the data in a file.
- Later you want to analyze the data, so you load your file into your favorite analysis program - Mathcad, Matlab, Excel, whatever.
Through all of this, you need to know something about data files.
Goals For This Lesson
It's pretty clear that data files
are useful. We've come a long way from the days when we wrote data into a lab
notebook for storage or kept a pile of computer printouts. So what do you need
today?
Given a need to store data in a file, you should:
- Be able to use data files in general application programs.
- Be able to write C or Visual Basic programs that write data from within the program into a file.
- Be able to explain the structure of a file you created in C or Visual Basic.
- Be able to determine file size as a function of measurement precision and number of data points.
Data Files
In this section we are
going to discuss what a flat file looks like. Later sections will show specific
program functions that will let you open and close flat data files and write
data to the files you create. First we will examine how files are built up.
Then we will look at details of creating files and writing data to files in some
popular programs.
In order to manipulate
data as described earlier you will need to understand some basic facts about how
files can be constructed. In particular, you'll need to know the following.
- Data files stored as text files (a .txt extension on DOS/Windows systems) are stored as a sequence of characters.
- Every character is stored as a single byte of data. A byte can store a number from 0 to 255, and every character on the keyboard has a numerical representation.
- There is a standard code for character representations: the American Standard Code for Information Interchange, or ASCII (pronounced "Ask-ee"). When you strike a key on the keyboard, the ASCII code for that character is what ends up stored in your text file.
Maybe the most important item
on the list above is that every character is stored in a byte in a file. If you
have that concept, then you can compute how much information can be stored on a
disk.
- Let's just take a single
megabyte (1MB). That's one million (1,000,000) bytes.
- Disks can store more than that. Floppies can store 1.44 MB and hard drives can store many gigabytes. (A gigabyte is one billion bytes.)
- If we can figure out how much
you can get in a megabyte you can figure out how much you can get on a
floppy or a hard disk.
As this is written, I'm
reading a book.
- By my count it had about 2500
characters on a random page I picked.
- That means it would take 2500
bytes to store the text on a single page.
- By that count, one megabyte
could store 400 pages (2500 bytes/page x 400 pages = 1,000,000 bytes)
- The particular book I was
reading has only 267 pages, so the entire book could fit on a single floppy
disk.
You can store a large amount of data even on a single floppy disk. There are also higher-density disks that hold 100 megabytes or 250 megabytes, so consider this problem.
EXAMPLE
How many 267-page books will fit on a 100 megabyte disk?
- Assume 2500 characters per page, or 2500 bytes per page.
- 2500 x 267 = 667,500 bytes/book, or 0.6675 megabytes/book.
- Therefore, the number of books is 100/0.6675 = 149.8. Only whole books fit, so call it 149 books (roughly 150).
- That's a lot easier than
carrying the hard copy version in a backpack.
To get started we will create a simple data file. We will start with a small file first, using the data below.
1.5 | 34.451
3 | 33.779
4.5 | 33.152
Open a simple text
editor. (If you are in Windows, that will be Notepad. In Unix it might be
emacs.) Then do the following, and DO NOT
type any extra characters!
- Type a "1", a "." (that's a period) and a "5".
- Type a tab. This inserts a tab character between the "5" and the next set of numbers.
- Type the next set of numbers, i.e. "34.451".
- Hit Enter. (On a DOS/Windows system you really are typing a carriage return and a line feed. Unix editors store only the line feed.)
- Continue with all lines.
- Save the file.
The file that you have
created is a simple data file but it has several interesting properties.
- The file can be loaded by
many different applications. The reason many applications can load one file
is that a simple file structure of this type is used by many different
applications.
- The file structure is so simple that you pretty much know every character in the file. They are the characters you typed.
- Word processors can load this
file. But if you save it as a word processor file you will save information
about formatting that is not contained in this file.
- Spreadsheets can load this
file. But if you apply functions to your spreadsheet you are adding
information beyond what is contained in this file.
Let's examine exactly
what those characters were that you typed into the file. Here's the sequence of
characters for the first line or so that you typed. Note that when you hit the
Enter key you are
actually typing two characters.
What is typed | What you get | ASCII code (decimal)
1 | 1 | 49
. | . | 46
5 | 5 | 53
TAB | TAB | 9
3 | 3 | 51
4 | 4 | 52
. | . | 46
4 | 4 | 52
5 | 5 | 53
1 | 1 | 49
ENTER | CR | 13
 | LF | 10
Every row in the table above corresponds to one ASCII character. That includes the tab, carriage return and line feed characters. If you want to explore this further, check out your favorite word processor. Many word processors have a feature that makes the non-printing characters (like tabs, etc.) visible. Turn it on and make sure that you can see how each of these characters shows up. (You might find the carriage return and line feed lumped into a single character shown with a paragraph mark.)
- Every letter, every number,
every punctuation mark has a specific numerical representation.
- ASCII characters are what you
manipulate when you put data into a text file.