Storing Methods With Perl
By Alex Osipov
LAST EDITED: Tuesday, August 6, 2002 0:13 AM
Programmers often use flat files to store small amounts of data,
such as short-lived caching information. For one project I worked
on, I needed to store each visitor's IP address and the time the
entry was created. I used flat files for this task because it was
not very data intensive, and the information was cleared every
15 minutes.
When doing something like this, you can take two different
approaches. You can create a separate file for each visitor,
something that I like to call flat-files (this is what I did, as
I needed to store extra information), or you can use a single
file for all entries.
When creating many different files, you need to ensure that each
file has a unique filename; otherwise files will start to
overwrite one another after some time. You can use the
Digest::SHA1 module to generate a 160-bit signature from random
data (only in incredibly rare cases will two signatures be the
same), although there are a number of different ways to do this.
Once you have generated the unique name, you can create the flat
file.
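One way to produce $unique_filename is sketched here. It is only an illustration under a few assumptions: it uses the core Digest::SHA module (Digest::SHA1 exports a function with the same sha1_hex name), the helper make_unique_filename is a hypothetical name, and the seed (time, process id, and a random number) is just one possible choice of "random data".

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: hash the current time, the process id, and a
# random number into a 40-character hex string (160 bits).
sub make_unique_filename {
    my $seed = join('-', time, $$, rand());
    return sha1_hex($seed);
}

my $unique_filename = make_unique_filename();
print "$unique_filename\n";
```

Because the result contains only the characters 0-9 and a-f, it is also safe to use as a filename. With a name in hand, the file itself can be written: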
# Open file for write only or die.
open(FH, '>', $unique_filename) or die("Error: $!");
# Lock the file exclusively (2 is LOCK_EX) or die.
flock(FH, 2) or die("Error: $!");
# Save the remote ip address, a null, and then the time.
print FH $ENV{REMOTE_ADDR}, "\0", time;
# Close the file and release lock or die.
close(FH) or die("Error: $!");
This takes care of saving the data in flat-files. Retrieving
data from a structure like this is just as simple.
# Open the file for reading only or die.
open(FH, '<', $unique_filename) or die("Error: $!");
# Read the first line from open file.
$line = <FH>;
# Close the file or die.
close(FH) or die("Error: $!");
# Separate the data using split.
($remote_addr, $create_time) = split(/\0/, $line);
In this example, $ENV{REMOTE_ADDR} and the time since the epoch
are saved in the $unique_filename file. Watch for security risks
when using a variable in an open (for more information read the
perlsec man page or view it online at http://www.perl.com/pub/doc/manual/html/pod/perlsec.html).
Using the same fundamental ideas you can create much more
complex data structures within flat-files.
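To make the security point concrete, here is a minimal sketch of checking a filename before it reaches open. It assumes the names are 40-character lowercase hex strings such as a SHA-1 digest would produce; safe_filename is a hypothetical helper, and your own naming scheme may need a different pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: return the name only if it is exactly 40
# lowercase hex characters, otherwise return undef. Capturing the
# match in $1 is also how a value is untainted under perl -T.
sub safe_filename {
    my ($name) = @_;
    return undef unless defined $name;
    return $name =~ /\A([0-9a-f]{40})\z/ ? $1 : undef;
}

# Try one well-formed name and two that should be rejected.
for my $candidate ('a' x 40, '../../etc/passwd', '| some_command') {
    my $verdict = defined safe_filename($candidate) ? 'accepted' : 'rejected';
    print "$candidate: $verdict\n";
}
```

Rejecting anything that does not match the expected pattern keeps path separators and shell characters out of the filename entirely.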
As I mentioned earlier, the other way of using flat files is
to create one larger file for all entries. Retrieving data
from this kind of flat file database gets slower as the data
grows, so only use it if it offers some real benefit to your
programs. You've been warned! The basic ideas for using this
type of flat file database are virtually the same as for
flat-files.
Rather than opening the file for writing as we did in the
flat-files example, we have to open the file for appending,
because overwriting would destroy the existing entries. We must
also separate each entry with a delimiter (I will use the
newline character), and we no longer need $unique_filename in
open because the filename is static.
# Open file for append or die.
open(FH, '>>', './cache.db') or die("Error: $!");
# Lock the file exclusively (2 is LOCK_EX) or die.
flock(FH, 2) or die("Error: $!");
# Save the unique id, a null, remote ip address, a null, and
# then the time since epoch.
print FH $unique_id, "\0", $ENV{REMOTE_ADDR}, "\0", time, "\n";
# Close the file and release lock or die.
close(FH) or die("Error: $!");
To retrieve data from the file we still need a unique value,
just as we needed $unique_filename before, because the program
needs something to search for in order to pick out a certain
entry. You could use the remote ip address or the time, but I
personally prefer a unique id for each visitor (which I save as
a cookie and retrieve any time the user runs a script).
Once you know the unique id that you want to retrieve from the
flat file database, you can do the following.
# Open the file for read only or die.
open(FH, '<', './cache.db') or die("Error: $!");
# Loop through each entry in the flat file and look for the one we need.
while ($line = <FH>) {
    # Remove the newline character at the end of the line
    chomp($line);
    # Separate the data on line using split.
    ($unique_id, $remote_addr, $create_time) = split(/\0/, $line);
    # Check if the unique id that we saved earlier matches the one
    # that we are looking for this time, where $our_id is the id that
    # we are looking for. If the two ids match, we break out of the loop.
    if ($unique_id eq $our_id) {
        $found = 1;
        last;
    }
}
# Close the file or die.
close(FH) or die("Error: $!");
unless ($found) {
    die("Error: Could not find entry $our_id in the flat file database.");
}
In this example $unique_id, $remote_addr, and $create_time
are retrieved from the cache.db file if an entry matches the
$our_id variable; otherwise the program dies. You can adapt this
for your own programs with minimal effort. Let me mention this
again: this can be very inefficient when dealing with large
amounts of data, as the program must loop through every line
until the entry is found. Another limitation of this small
example is that the program retrieves only the first matching
entry in the cache.db file and stops. This is what most people
would want, but if you want to retrieve all entries, or the
most recent one, a little more work is required. (There are
different ways of sorting and matching data which can speed
this process up significantly.)
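As a rough sketch of that extra work, the loop can simply keep scanning instead of stopping at the first match, so that a later entry replaces an earlier one. This assumes entries are appended in chronological order, as in the append example; find_latest_entry is a hypothetical helper name, not something from a module.

```perl
use strict;
use warnings;

# Hypothetical helper: return (unique id, remote address, create time)
# for the LAST line whose id matches $our_id, or an empty list if none.
sub find_latest_entry {
    my ($db_file, $our_id) = @_;
    my @latest;
    open(my $fh, '<', $db_file) or die("Error: $!");
    while (my $line = <$fh>) {
        chomp($line);
        my ($unique_id, $remote_addr, $create_time) = split(/\0/, $line);
        # No "last" here: a later match overwrites an earlier one.
        @latest = ($unique_id, $remote_addr, $create_time)
            if defined $unique_id && $unique_id eq $our_id;
    }
    close($fh) or die("Error: $!");
    return @latest;
}

# Example call, assuming ./cache.db exists and $our_id is set:
# my ($id, $addr, $time) = find_latest_entry('./cache.db', $our_id);
```

To collect every matching entry rather than the most recent one, the same loop could push each match onto an array instead of overwriting @latest.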
I will mention some other ways of storing data in flat files,
as well as other data storage methods, in the following pages.