From Excel to automation with Ruby
For the last few days I’ve been working in tech support for a large company: I’ve been hired to assist in backing up, mastering and restoring a couple of computers. As I pointed out the need to automate some trivial tasks, our manager gave us an Excel sheet containing data about the machines we need to work on.
The Excel sheet
I was given an Excel file containing four columns, and organized like this:
NAME | SURNAME | OLD HOSTNAME | NEW HOSTNAME |
---|---|---|---|
John | Carmack | DOOM001 | QUAKE001 |
John | Romero | DOOM002 | QUAKE002 |
Fritz | Box | FRITZ!Box3490 | FRITZ!Box7590 |
And I needed some tedious Batch code to match the old hostname with the new one and link them with the names, to efficiently name backup folders. I could have manually copied the data, except that the table was more than 80 rows long!
The solution
By looking at Microsoft Excel, however pretty it may be, I wasn’t going anywhere: I needed to get the data in an usable format. What about exporting it in CSV?
A CSV file (Comma Separated Values) is a format commonly used by datasheet programs to export raw data. The id.csv
file generated by Excel from the table above looks like this:
NAME;SURNAME;OLD HOSTNAME;NEW HOSTNAME
John;Carmack;DOOM001;QUAKE001
John;Romero;DOOM002;QUAKE002
Fritz;Box;FRITZ!Box3490;FRITZ!Box7590
Now this is data I can manipulate!
Now a couple of solutions spring by: I could have created a program that crunched the data and created the necessary structures on the backup medium, or statically generate the Batch code. I went with the latter option, for comfort and ease of use.
I wanted the result to look like this:
IF %COMPUTERNAME%==DOOM001 SET %BACKUPNAME%=CARMACK.JOHN_QUAKE001
The implementation
To implement this tool I chose Ruby, which is a powerful, no fuss language. Let’s start by loading the CSV library and see how it works.
I whipped up this sample code:
require 'csv'
# Open the CSV file
input = CSV.read("id.csv")
# Iterate on the CSV
input.each do |row|
puts "Name: #{row[0]}, Surname: #{row[1]}"
end
And ran it. This was the result:
$ ruby csv2bat.rb
Name: NAME;SURNAME;OLD HOSTNAME;NEW HOSTNAME, Surname:
Name: John;Carmack;DOOM001;QUAKE001, Surname:
Name: John;Romero;DOOM002;QUAKE002, Surname:
Name: Fritz;Box;FRITZ!Box3490;FRITZ!Box7590, Surname:
Ok, so we have got a couple of things wrong:
- The first row is garbage and should not be displayed
- Columns are not separated correctly
The separator
The separator used by Microsoft Excel in this case was not a comma but a semicolon. To fix this issue you can either do Ctrl+H to replace all occurrences, which is inelegant, or think about a solution. Let’s look at the RDocs:
read(path, *options)
Use to slurp a CSV file into an Array of Arrays. Pass the
path
to the file and anyoptions
::new understands. This method also understands an additional:encoding
parameter that you can use to specify the Encoding of the data in the file to be read. You must provide this unless your data is in Encoding::default_external(). CSV will use this to determine how to parse the data. You may provide a second Encoding to have the data transcoded as it is read. For example,encoding: "UTF-32BE:UTF-8"
would read UTF-32BE data from the file but transcode it to UTF-8 before CSV parses it.
You may set any reading and/or writing preferences in the
options
Hash. Available options are:
:col_sep
The String placed between each field. This String will be transcoded into the data’s Encoding before parsing.
Bang, looks like we can do it quite easily with no hit-and-miss substitutions. Our code now looks like this:
#!/usr/bin/ruby
require 'csv'
# Open the CSV file
input = CSV.read("id.csv", {:col_sep => ';'})
# Iterate on the CSV
input.each do |row|
puts "Name: #{row[0]}, Surname: #{row[1]}"
end
Notice the {:col_sep => ','}
: this is an hash, the quick way to do associative arrays in Ruby. :col_sep
is a symbol, a way to name things that uses less memory than a string, but can easily be looked up and created on the fly. Our output is much prettier too:
$ ./csv2bat.rb
Name: NAME, Surname: SURNAME
Name: John, Surname: Carmack
Name: John, Surname: Romero
Name: Fritz, Surname: Box
The first row
I could manually strip the header from the file (in fact, originally I did), but this wouldn’t be elegant at all: let’s look at a quick way to do this automagically.
We could tackle the issue in different ways:
- Check whether the row contains “NAME”, which shouldn’t occur elsewhere
- Skip the first line
I decided to elaborate a bit on the first option and create a solution that could be useful in many cases:
#!/usr/bin/ruby
require 'csv'
key_index = 0 # Index of the key
key_blacklist = ["NAME"] # Items that should be skipped
# Open the CSV file
input = CSV.read("id.csv", {:col_sep => ';'})
# Iterate on the CSV
input.each do |row|
key = row[key_index]
puts "Name: #{row[0]}, Surname: #{row[1]}" unless key_blacklist.include?(key)
end
Now the output looks like this:
$ ./csv2bat.rb
Name: John, Surname: Carmack
Name: John, Surname: Romero
Name: Fritz, Surname: Box
In case I wanted to exclude my friend Fritz Box who doesn’t exist and has never been an Id employee, I could just add "Fritz"
to my blacklist and the output would look like this:
$ ./csv2bat.rb
Name: John, Surname: Carmack
Name: John, Surname: Romero
The script could now be modified to accept those as command line parameters but that’s out of the scope of my discussion.
Getting a Batch file
The output is still not what I need: my goal was to generate Batch code straight out of my script. Let’s add the final touches and call it a day:
#!/usr/bin/ruby
require 'csv'
key_index = 0 # Index of the key
key_blacklist = ["NAME", "Fritz"] # Items that should be skipped
# Open the CSV file
input = CSV.read('id.csv', {:col_sep => ';'})
# Open the Batch file
File.open('csvlut.bat', 'w') do |out|
# Iterate on the CSV
input.each do |row|
key = row[key_index]
unless key_blacklist.include?(key)
out.puts "IF %COMPUTERNAME%==#{row[2]} SET BACKUPNAME=#{row[1].upcase}.#{row[0].upcase}_#{row[3]}"
end
end
end
In case code’s not clear enough, I told the script to open a file in write mode, put a couple of lines of batch code in and terminate, relying on the fact that in Ruby everything is an object, so I can iterate on File.open()
even if I’m opening only one file. The postfix unless
, which is syntactic sugar for if !
, was swapped with standard notation to reduce the length of line 17. The output looks like this:
$ ./csv2bat.rb
$ cat csvlut.bat
IF %COMPUTERNAME%==DOOM001 SET BACKUPNAME=CARMACK.JOHN_QUAKE001
IF %COMPUTERNAME%==DOOM002 SET BACKUPNAME=ROMERO.JOHN_QUAKE002
Wrapping up
Ruby feels like the optimal language to quickly perform text transformation and line operations on text files: it’s powerful enough to be up to up to the most tedious tasks, yet simple enough to whip up a quick-and-dirty solution that gets the job done in mere minutes.