File Structure and Size
(For simple data files)
Simple data files can be imported into many different kinds of analysis
programs. They are easy to generate and very versatile, so it is worth
understanding how they are structured.
Here is an example of a data file.
0.01 33.19
0.02 33.45
0.03 35.70
Note the following:
- Point 1:
The data in the file is composed of characters, not numbers.
As a human being, you naturally read the characters in the file as numbers,
but what is actually stored is a sequence of characters.
- In the data above, the first character is a "0".
- In the data above, the second character is a "." (a period).
- etc.
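As a small illustration of Point 1, the sketch below treats the first field of the example file as text and only afterwards converts it to a number. The variable names (`text`, `codes`, `value`) are made up for this example.

```python
# The first four characters of the example file, held as text.
text = "0.01"

# Each element is a character; ord() returns its ASCII code.
codes = [ord(ch) for ch in text]

# The characters only become a number after an explicit conversion.
value = float(text)

print(codes)   # ASCII codes of '0', '.', '0', '1'
print(value)
```

Until `float()` is called, the program holds only the four characters "0", ".", "0" and "1".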
Then:
- Point 2:
Each character is stored as a single ASCII code, which takes one byte of
digital storage.
In a data file, the data is usually interpreted as being in rows and columns.
Special characters, called delimiters, are used to separate the columns.
- Point 3:
Delimiters used to separate columns are usually one of the following.
- Tab characters
(number 9 in the ASCII table). Note that HTML normally collapses tab
characters into ordinary spaces, so copying file data into an HTML page
(as in the example above) replaces the tabs with spaces. Files using tabs
as delimiters are called tab-delimited files.
- Commas
(number 44 in the ASCII table). Files using commas as delimiters are
called comma-delimited files.
- Point 4:
The end of a row in a tab-delimited or a comma-delimited file is usually
marked by two characters. (This is the Windows convention; files written on
Unix-like systems typically end each row with the line feed alone.)
- A Carriage Return
(number 13 in the ASCII table - denoted by '\r' in many programming
languages and by 'CR' in many others) and
- A Line Feed
(number 10 in the ASCII table - denoted by '\n' in many programming
languages and by 'LF' in many others)
- Note that the CR + LF
combination is used in many other situations. For example, it is common
to find that combination at the end of a data string that an instrument
sends to a computer, where it marks the end of the data string.
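The points above can be tied together in a short sketch that splits raw file bytes into rows and columns. It assumes the example file uses tab delimiters and CR + LF row endings, and that every field is a decimal number; the function name `parse_rows` and the variable `data` are hypothetical and chosen only for this illustration.

```python
# The example file as raw bytes, assuming tab delimiters and CR+LF row endings.
data = b"0.01\t33.19\r\n0.02\t33.45\r\n0.03\t35.70\r\n"

def parse_rows(raw, delimiter="\t", line_ending="\r\n"):
    """Split raw ASCII bytes into rows of floats (hypothetical helper)."""
    text = raw.decode("ascii")          # bytes -> characters
    rows = []
    for line in text.split(line_ending):
        if not line:                    # skip the empty piece after the last row ending
            continue
        fields = line.split(delimiter)  # characters -> column strings
        rows.append([float(field) for field in fields])
    return rows

rows = parse_rows(data)
print(rows)
```

For a comma-delimited file the same sketch would be called with `delimiter=","`.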
When you encounter a file constructed as above, it is easy to calculate the
size of the file. In the file above:
- There are four characters in the "0.01" string - three digits and a period.
- There is a tab character between the two columns.
- There are five characters in the "33.19" string - four digits and a period.
- There are two characters at the end of the row - a carriage return and a
line feed.
- There are twelve characters in each row (4 + 1 + 5 + 2).
- There are three rows, thus 36 total characters.
- Each character is a byte, so the file size should be 36 bytes.
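The count above can be checked with a minimal sketch that predicts the size from the field lengths and then compares it with the size of a file actually written to disk. It assumes tab delimiters and CR + LF row endings, as in the count above; `row_size` is a hypothetical helper, and the file is written in binary mode so that no line-ending translation takes place.

```python
import os
import tempfile

# The three rows of the example file, kept as text fields.
rows = [("0.01", "33.19"), ("0.02", "33.45"), ("0.03", "35.70")]

def row_size(fields, delimiter="\t", line_ending="\r\n"):
    """Bytes in one row: field characters + delimiters + row ending."""
    field_chars = sum(len(field) for field in fields)
    delimiter_chars = len(delimiter) * (len(fields) - 1)
    return field_chars + delimiter_chars + len(line_ending)

predicted = sum(row_size(fields) for fields in rows)

# Write the same rows to a temporary file and measure its size on disk.
content = "".join("\t".join(fields) + "\r\n" for fields in rows).encode("ascii")
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(content)
    path = handle.name
actual = os.path.getsize(path)
os.remove(path)

print(predicted, actual)
```

If the rows ended with a line feed alone, each row would be one byte shorter, which `row_size(fields, line_ending="\n")` reflects.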