snapsvg

2014-12-09

Day 9: Scalar filehandles, or IO, IO, it's not to disk we go

Did you know you can open a variable as a file handle?

This is a great trick that avoids temporary files. You can write to the filehandle, and the stuff written thereto are available in the other variable. I'm going to call the other variable the "buffer"; this is a common term for a-place-where-data-get-stuffed.

Here's an example whereby I created an XLS spreadsheet entirely in memory and uploaded it using WWW::Mechanize. The template for the spreadsheet came from __DATA__, the special filehandle that reads stuff from the end of the script.

This allowed me to embed a simple CSV in my script, amend it slightly, and then upload it as an XLS, meaning I never had to have a binary XLS file committed to git, nor even written temporarily to disk.

In the example below, a vehicle, identified by its VRM (registration plate) is uploaded in an XLS spreadsheet with information about its sale. The $mech in the example is ready on the form where this file is uploaded.

The main problem this solves is that the VRM to put into the spreadsheet is generated by the script itself, meaning that we can't just have an XLS file waiting around to be uploaded. As noted, it is also preferable not to have to edit an XLS file for any reason, essentially because this can't be done on the command line - LibreOffice is required, or some Perl hijinks.

open my $spreadsheet_fh, ">", \my $spreadsheet_buf;       # [1]
my ($header, $line) = map { chomp; [split /,/] } <DATA>;  # [2]
my $xls = Spreadsheet::WriteExcel->new($spreadsheet_fh);  # [3]
my $sheet = $xls->add_worksheet();

# processing

$line->[0] = $vrm;

$sheet->write_col('A1', [ $header, $line ]);              # [4]
$xls->close;

$mech->submit_form(
  with_fields => {
      file => [ [ undef, 'whatever', 
          Content => $spreadsheet_buf ],                  # [5]
      1 ]
  },
  button => 'submit',
);

# [5]
__DATA__
VRM,Price,Fees,Collection,Valeting,Prep costs
,2333,10,0,10,0

The key to this example is in [1], which looks like a normal open call except for the last expression:

\my $spreadsheet_buf;

This is a valid shortcut to declaring the $spreadsheet_buf and then taking a reference to that:

my $spreadsheet_buf;
open my $spreadsheet_fh, ">", \$spreadsheet_buf;

The clever part is that now, $spreadsheet_fh is a normal filehandle that can be used just like any other; just as if we'd used a filename instead of a scalar reference. At [3] you can see a normal Spreadsheet::WriteExcel constructor, taking a filehandle as the argument, as documented.

At [2] you can see DATA in use, which reads from __DATA__ at [5]. This also acts like a normal filehandle; <DATA> reads linewise, and we have to chomp to remove the newlines.

We map over these lines, chomping them and using split /,/ to turn them into lists of strings; and this list is inside the arrayref constructor [...], meaning we get an arrayref for each line.

At [4] we have processed sufficiently to have installed the VRM in the gap at the front of the second line, i.e. the zeroth element of $line, so write_col is employed to write both arrayrefs as rows (yes I know) into the spreadsheet.

When we call $xls->close, this writes the spreadsheet to the filehandle. But no file is created; instead, the data go to $spreadsheet_buf. If we were to print $spreadsheet_buf to a file now, we would get an XLS we can open.

Instead, at [5], we use the trick documented in submit_form (ether++ for reading everyone's mind) to use the file data we already have as the value of the form field.

This trick is remarkably useful. You can reopen STDOUT to write to your buffer:

{
    local *STDOUT;

    open STDOUT, ">", \my $buffer;

    do_stuff_that_prints();

    do_stuff_with($buffer);
}

but that's better written

my ($buffer) = capture { do_stuff_that_prints() };

from Capture::Tiny.

See also

If you use IO::Handle then your $spreadsheet_fh will be an object like any other - but these days, you get that simply by using lexical filehandles anyway.

IO::Scalar seems like a decent OO-type module to deal with this but also look nice.

IO::String also works with strings-as-IO.

I've not tried either of these latter two, but YMMV etc.