Data Files - File Format
File Formats
When you write data to a
file, you want that data to be in a format that can be read by other programs.
Programs you might use include:
- Mathcad
- Matlab
- Spreadsheets
All of these programs can read the same kind of simple file. Note that the
files we are talking about here are not typical spreadsheet files, which
contain formatting, formulas, and other things that allow the spreadsheet to
make calculations, present your data in graphs, etc. Those files are more
complex and much larger than what we are talking about here.
Here, we are talking about very simple files that contain nothing but the
data. For such simple files there are only a few considerations.
- Data is almost always
organized in rows and columns, so there will have to be ways of marking the
end of one data point and the beginning of the next (some sort of column
marker or delimiter).
- Data is almost always saved as character strings. If you have numerical
data that you want to save (and you may have worked hard to get it into
numerical format when it started life as a character string), you will need
to convert it into a character string before you can save it.
With those considerations,
then, a data file might consist of the following characters.
0 | . | 0 | \t | 2 | . | 1 | \r | \n | 0 | . | 1 | \t | 2 | . | 5 | \r | \n | etc.
This file would encode the data that would appear in a spreadsheet looking
like this.
0.0 | 2.1
0.1 | 2.5
Notice how the data appears in this tab-delimited file.
- The first three characters
give the 0.0 in the upper left spreadsheet cell.
- The tab (\t as an escape sequence) delimits the first column from the
second.
- The next set of characters gives the 2.1, the data in the second column.
- The carriage return and line feed (\r and \n as escape sequences) take you
to the second row in the table, where the same pattern is repeated.
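This pattern repeats for the rest of the data in the file. The only thing
that's not covered by this is how the end of the file is indicated. Some older
systems wrote an explicit EOF (End-Of-File) marker after the data; most modern
systems instead rely on the file length recorded by the file system.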
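As a minimal sketch of the layout described above, the following Python snippet builds the same character sequence from two rows of numbers. Python is used here only for illustration, and the function name `encode_rows` is hypothetical. Each number is converted to a character string, columns are joined with a tab, and each row ends with a carriage return and line feed.

```python
def encode_rows(rows):
    """Build the text of a tab-delimited file from rows of numbers."""
    lines = []
    for row in rows:
        # Convert each number to a character string with one decimal place.
        fields = ["%.1f" % value for value in row]
        # A tab separates columns; carriage return + line feed ends the row.
        lines.append("\t".join(fields) + "\r\n")
    return "".join(lines)


text = encode_rows([[0.0, 2.1], [0.1, 2.5]])
# repr() shows the escape sequences instead of the invisible characters.
print(repr(text))
```

If the text is then written to disk from Python, opening the file with `newline=""` keeps the `\r\n` pairs exactly as built, since text mode may otherwise translate each `\n` on some platforms.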
A Note On File Sizes
The
information above can allow you to calculate or estimate file sizes. You can
compute the number of characters in a row, and multiply by the number of rows.
Here's an example.
Assume
you have data in this format:
xx.yyy xxxx.yy xxxx.yy
This format has:
- 6 characters in the first column,
- then a tab character to separate columns,
- 7 characters in the second column,
- another tab character,
- 7 characters in the third column,
- and, finally, a carriage return and a line feed.
That's 24 characters in a row (6 + 1 + 7 + 1 + 7 + 2). If you have 100 rows,
you would need 2400 bytes (one byte per character) to store this data in a
file.
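The same arithmetic can be sketched in a few lines of Python. The function name `estimate_file_size` and its parameters are hypothetical, and the sketch assumes fixed-width columns, a one-character delimiter, a two-character row ending, and one byte per character, as in the example above.

```python
def estimate_file_size(column_widths, rows, delimiter_size=1, row_end_size=2):
    """Estimate the size in bytes of a fixed-width delimited text file."""
    # One delimiter sits between each pair of adjacent columns.
    delimiters = (len(column_widths) - 1) * delimiter_size
    chars_per_row = sum(column_widths) + delimiters + row_end_size
    return chars_per_row * rows


print(estimate_file_size([6, 7, 7], rows=1))    # characters in one row
print(estimate_file_size([6, 7, 7], rows=100))  # bytes for 100 rows
```

With the column widths 6, 7, and 7 this gives 24 characters per row and 2400 bytes for 100 rows. The estimate only holds if every value actually fits its column width.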
Formatting File Data
When you print data to a file,
you usually want to control the format of the data you write. To do that you
need to know how to use format strings. Format strings specify how data is
stored. Let's examine a few.
- %f
  - Data is a floating point number and will be stored with a decimal point
    but no exponent. By default, six digits are written to the right of the
    decimal point.
- %5.3f
  - Data is a floating point number and will be stored with a decimal point.
    There will be a total of at least five characters, with three characters
    to the right of the decimal point. (So there is room for only one
    character to the left of the decimal point. The width is a minimum, not a
    limit: 98.666 does not fit in five characters, so it is written in full
    as 98.666 and takes six characters, which throws off column alignment and
    file size estimates.)
- %8.4f
  - Data is a floating point number and will be stored with a decimal point.
    There will be a total of eight characters, with four characters to the
    right of the decimal point. (So there is room for three characters to the
    left. 98.666 will be stored as 98.6660 with one space padding the left
    side.)
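One way to check how a format string behaves is to apply it to a sample value and look at the result. The sketch below uses Python's `%` operator, which follows the same printf-style conventions. The source does not say which language it has in mind, so Python is an assumption here. The width is treated as a minimum, so a value that does not fit is written in full rather than truncated.

```python
# The same value written with each format string from the list above.
value = 98.666

results = {}
for fmt in ("%f", "%5.3f", "%8.4f"):
    results[fmt] = fmt % value
    # repr() makes any padding spaces visible; len() gives the field size.
    print(fmt, repr(results[fmt]), len(results[fmt]))

# A value with one digit left of the decimal point fits %5.3f exactly.
small = "%5.3f" % 1.5
print(repr(small), len(small))
```

Here `%f` writes six digits after the decimal point, `%5.3f` widens to six characters to hold 98.666, and `%8.4f` pads the seven-character result with one leading space.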