March 30, 2022 03:02 am GMT

Ruby - Convert CSV File to Two Dimensional Array

TL;DR

Ruby comes out of the box (O.O.T.B.) with a library to read and parse CSV files.

require 'csv'
two_dimensional_array = CSV.read('/path/to/file.csv')
=> [["TWO"], ["DIMENSIONAL"], ["ARRAY"]]

This article will cover the basics of working with CSVs in Ruby. I will be operating on macOS, with its Unix-like file system and a Zsh terminal shell, but I'm sure Windows users can benefit as well!

What is a CSV File?

Popular applications like Excel and Numbers can read and write pure CSV, but technically their default extensions are .xlsx and .numbers.

CSV means 'comma separated values'. A pure .csv file is really just a string with values separated by commas and newlines. The commas separate the columns, and the newlines separate the rows.
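To make the "just a string" idea concrete, here is a minimal sketch. The sample strings and variable names are made up for illustration and are not part of contacts.csv. It splits a CSV string by hand, then shows one reason a real parser is still worth using: a quoted value may itself contain a comma.

```ruby
require 'csv'

# Hypothetical sample data, not taken from contacts.csv
raw = "name,city\nAda,London\n"

# Naive approach: newlines separate rows, commas separate columns
naive_rows = raw.split("\n").map { |line| line.split(',') }
p naive_rows

# The naive split breaks as soon as a quoted value contains a comma
tricky = %(name,city\n"Lovelace, Ada",London\n)
naive_tricky = tricky.split("\n").map { |line| line.split(',') }
parsed_tricky = CSV.parse(tricky)

p naive_tricky.last.size # three pieces, although the row has two values
p parsed_tricky.last     # the parser keeps the quoted value together
```

So the commas-and-newlines description holds for simple files, and the parser takes care of the quoting rules on top of it.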

Do you want to see what CSV data looks like?

Navigate to a directory in your terminal where you have a pure CSV file saved.

$ pwd
/Users/jvon1904/csv
$ ls
contacts.csv

Then use the cat command in the terminal with the file name as the argument, and you will see what a pure CSV really is!

$ cat contacts.csv
ID,First Name,Last Name,Age,Gender
1,Victoria,Waite,38,F
2,Jamar,Hayes,37,M
3,Leonard,Brendle,39,M
4,Abby,Atchison,57,F
5,Marc ,Stockton,64,M
6,Geraldine,Roybal,52,F
7,James,Coles,57,M
8,Hiram,Spellman,58,M
9,Bradford,Vela,41,M
10,William,Haskell,74,M
11,Christopher,Mason,70,M
12,Thomas,Atkinson,68,M
13,Peggy,Underwood,37,F
14,Charles,Wilson,66,M
15,Joanne,Sanchez,42,F
16,Leo,Sanders,58,M
17,Robert,Castillo,39,M
18,Joan ,Traxler,82,F
19,Dana,Pitts,78,F
20,Susan,Dupont,34,F%

Notice how entries #5 and #18 have spaces after the first name. That's because spaces were accidentally left in the file.
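Stray spaces like these are easy to clean up once the data is in Ruby. The following is only a sketch: it inlines two rows modeled on entries #5 and #18 as a string (rather than reading the real file) and strips every field after parsing. The variable names are my own.

```ruby
require 'csv'

# Two rows modeled on entries #5 and #18, trailing spaces included
raw = "5,Marc ,Stockton,64,M\n18,Joan ,Traxler,82,F\n"

# Parse first, then strip surrounding whitespace from every field.
# The safe navigation operator (&.) skips empty fields, which parse as nil.
clean_rows = CSV.parse(raw).map { |row| row.map { |field| field&.strip } }

p clean_rows
```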

So there it is. CSVs are just values, commas, and newlines.

The Ruby CSV Module

Ruby ships with two libraries, the Core and the Std-lib (Standard Library). The Core contains the classes that make up the Ruby language: Strings, Arrays, Classes, Integers, Files, etc. Everything in Ruby is an object that ultimately inherits from BasicObject.

$ irb
> Array.class
=> Class
> Array.class.superclass
=> Module
> Array.class.superclass.superclass
=> Object
> Array.class.superclass.superclass.superclass
=> BasicObject
> Array.class.superclass.superclass.superclass.superclass
=> nil

Since the Core is the core of Ruby, everything is included whenever you are coding in Ruby.

The Std-lib contains extensions to Ruby. They are modules that need to be required, just like gems, only they are already installed on your computer (unless you deleted them of course). They are worth checking out and contain some really cool and helpful modules.

You can inspect all the code by navigating to where they are stored.

Open up an IRB session and type the global variable $: (an alias for $LOAD_PATH). It returns an array of the paths Ruby searches when a library is required. Your paths might be different, especially if you don't use RVM.

$ irb
> $:
=> ["/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/site_ruby/3.0.0",
 "/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/site_ruby/3.0.0/arm64-darwin20",
 "/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/site_ruby",
 "/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/vendor_ruby/3.0.0",
 "/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/vendor_ruby/3.0.0/arm64-darwin20",
 "/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/vendor_ruby",
 "/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/3.0.0",
 "/Users/jvon1904/.rvm/rubies/ruby-3.0.3/lib/ruby/3.0.0/arm64-darwin20"]

That neat little variable helps me remember where they are located. The second to last path is where the Std-lib resides.

$ pwd
/Users/jvon1904/.rvm/rubies/ruby-3.0.0/lib/ruby/3.0.0
$ ls
English.rb        expect.rb         open-uri.rb       ripper
abbrev.rb         fiddle            open3.rb          ripper.rb
arm64-darwin20    fiddle.rb         openssl           rubygems
base64.rb         fileutils.rb      openssl.rb        rubygems.rb
benchmark         find.rb           optionparser.rb   securerandom.rb
benchmark.rb      forwardable       optparse          set
bigdecimal        forwardable.rb    optparse.rb       set.rb
bigdecimal.rb     getoptlong.rb     ostruct.rb        shellwords.rb
bundler           io                pathname.rb       singleton.rb
bundler.rb        ipaddr.rb         pp.rb             socket.rb
cgi               irb               prettyprint.rb    syslog
cgi.rb            irb.rb            prime.rb          tempfile.rb
coverage.rb       json              pstore.rb         time.rb
csv               json.rb           psych             timeout.rb
csv.rb            kconv.rb          psych.rb          tmpdir.rb
date.rb           logger            racc              tracer.rb
debug.rb          logger.rb         racc.rb           tsort.rb
delegate.rb       matrix            rdoc              un.rb
did_you_mean      matrix.rb         rdoc.rb           unicode_normalize
did_you_mean.rb   mkmf.rb           readline.rb       uri
digest            monitor.rb        reline            uri.rb
digest.rb         mutex_m.rb        reline.rb         weakref.rb
drb               net               resolv-replace.rb yaml
drb.rb            objspace.rb       resolv.rb         yaml.rb
erb.rb            observer.rb       rinda

Since they are plain .rb files, you can open them up to see their inner workings. You can even modify them, although don't do it unless you know what you're doing.
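If you would rather not hunt through the directories by hand, Ruby can report where it loaded a library from. This is a small sketch, not something from the original walkthrough: it relies on the built-in $LOADED_FEATURES global, and the variable name csv_entry is my own. The printed path will differ from machine to machine, and on Ruby versions where csv is distributed as a gem it will point into a gem directory instead of the Std-lib folder.

```ruby
require 'csv'

# $LOADED_FEATURES lists every file Ruby has loaded so far
csv_entry = $LOADED_FEATURES.find { |path| path.end_with?('/csv.rb') }

# The full path of the csv.rb file that `require 'csv'` picked up
puts csv_entry

# The directory that contains it
puts File.dirname(csv_entry) if csv_entry
```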

As was mentioned, each module in the Std-lib needs to be required. So if you want to use the CSV class, make sure you require 'csv'.

# Otherwise you'll get this:
> CSV
(irb):15:in `<main>': uninitialized constant CSV (NameError)
# Don't stress, just do this:
> require 'csv'
=> true
> CSV
=> CSV
> CSV.class
=> Class

It's always a great idea to hit up CSV.methods.sort to reference all its capabilities.

Using the CSV Module to Read and Parse CSVs

There are two main methods for reading and parsing CSVs: CSV.read and CSV.parse. Use CSV.read to read an actual file, and CSV.parse to parse a properly formatted string. Let's compare the two.

$ irb
> require 'csv'
=> true
> my_csv_string = "this,is,a,csv\ncan,you,believe,it?"
=> "this,is,a,csv\ncan,you,believe,it?"
> parsed_data = CSV.parse(my_csv_string)
=> [["this", "is", "a", "csv"], ["can", "you", "believe", "it?"]]

There it is! A two dimensional array from a CSV!
Just make sure you use double quotes when you want an escape sequence like \n to become a newline character. Inside single quotes, \n stays a literal backslash followed by an n.

CSV.parse takes two arguments: a string to parse and a set of options. Maybe for some odd reason we want to parse a string that's separated by semicolons... so an SSV? We can pass in the col_sep option like so.

> CSV.parse("this;is;an;ssv\ncan;you;believe;it?", col_sep: ';')
=> [["this", "is", "an", "ssv"], ["can", "you", "believe", "it?"]]

CSV.parse can also parse an actual file, but you have to open the file first, for instance CSV.parse(File.open('path/to/file.csv')). Thankfully, this is what CSV.read is for!
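The difference between the two approaches can be sketched side by side. To keep the example self-contained, it writes a throwaway file with Tempfile instead of relying on a real path. The file contents and variable names are made up, and CSV.foreach is a third option I am adding here for comparison (it is part of the same library but is not covered in this article).

```ruby
require 'csv'
require 'tempfile'

# Build a throwaway file so the example does not depend on contacts.csv
file = Tempfile.new(['sample', '.csv'])
file.write("ID,First Name\n1,Victoria\n2,Jamar\n")
file.close

# CSV.parse works on a string, so read the file contents first
from_parse = CSV.parse(File.read(file.path))

# CSV.read opens, parses, and closes the file in one call
from_read = CSV.read(file.path)

# CSV.foreach yields one row at a time instead of building the whole array
from_foreach = []
CSV.foreach(file.path) { |row| from_foreach << row }

p from_read
puts from_parse == from_read && from_read == from_foreach

# Remove the temporary file
file.unlink
```

All three produce the same rows here. The row-at-a-time style may be worth considering for large files, since it does not need the whole array in memory at once.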

Extracting Data from CSV Files

I created a simple CSV shown in this screenshot:

[Image: contacts.csv]

Now let's find the path so we can use Ruby to extract those values with CSV.read!

$ pwd
/Users/jvon1904/csv
$ ls
contacts.csv
$ irb
> require 'csv'
=> true
# Make sure you remember the first forward slash in your path
> contacts_csv = CSV.read('/Users/jvon1904/csv/contacts.csv')
=> [["ID", "First Name", "Last Name", "Age", "Gender"],...
> contacts_csv
=> [["ID", "First Name", "Last Name", "Age", "Gender"],
 ["1", "Victoria", "Waite", "38", "F"],
 ["2", "Jamar", "Hayes", "37", "M"],
 ["3", "Leonard", "Brendle", "39", "M"],
 ["4", "Abby", "Atchison", "57", "F"],
 ["5", "Marc ", "Stockton", "64", "M"],
 ["6", "Geraldine", "Roybal", "52", "F"],
 ["7", "James", "Coles", "57", "M"],
 ["8", "Hiram", "Spellman", "58", "M"],
 ["9", "Bradford", "Vela", "41", "M"],
 ["10", "William", "Haskell", "74", "M"],
 ["11", "Christopher", "Mason", "70", "M"],
 ["12", "Thomas", "Atkinson", "68", "M"],
 ["13", "Peggy", "Underwood", "37", "F"],
 ["14", "Charles", "Wilson", "66", "M"],
 ["15", "Joanne", "Sanchez", "42", "F"],
 ["16", "Leo", "Sanders", "58", "M"],
 ["17", "Robert", "Castillo", "39", "M"],
 ["18", "Joan ", "Traxler", "82", "F"],
 ["19", "Dana", "Pitts", "78", "F"],
 ["20", "Susan", "Dupont", "34", "F"]]

Great! With this data, you now have the power to create class instances with each row, or save them to a database, or whatever you want! In a future article I will write about just that. For now, here are some ideas of how you can play around with this.

# getting a record is easy now
> contacts_csv.last
=> ["20", "Susan", "Dupont", "34", "F"]
# retrieve all female contacts
> contacts_csv.select { |row| row[4] == 'F' }
=> [["1", "Victoria", "Waite", "38", "F"],
 ["4", "Abby", "Atchison", "57", "F"],
 ["6", "Geraldine", "Roybal", "52", "F"],
 ["13", "Peggy", "Underwood", "37", "F"],
 ["15", "Joanne", "Sanchez", "42", "F"],
 ["18", "Joan ", "Traxler", "82", "F"],
 ["19", "Dana", "Pitts", "78", "F"],
 ["20", "Susan", "Dupont", "34", "F"]]
# retrieve the first names of contacts under 40
> contacts_csv.select { |row| row[3].to_i < 40 }.map { |row| row[1] }
=> ["First Name", "Victoria", "Jamar", "Leonard", "Peggy", "Robert", "Susan"]

Oops! See how we got the "First Name" there? That's a header, so it shouldn't be part of the records. There's a way to get around this, but instead of getting an array back, we'll get a CSV::Table class. Let's check it out!

# we just need to pass in the headers option
> parsed_data = CSV.read('/Users/jvon1904/csv/contacts.csv', headers: true)
=> #<CSV::Table mode:col_or_row row_count:21>
> parsed_data.class
=> CSV::Table

Be aware that every time you pass in the headers: true option, CSV.read returns a CSV::Table.
We can access rows by index the same way as with arrays.

# only it will return a CSV::Row now
> parsed_data[0]
=> #<CSV::Row "ID":"1" "First Name":"Victoria" "Last Name":"Waite" "Age":"38" "Gender":"F">
# row index first, then field index
> parsed_data[16][4]
=> "M"
> parsed_data[6].to_h
=> {"ID"=>"7", "First Name"=>"James", "Last Name"=>"Coles", "Age"=>"57", "Gender"=>"M"}
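With headers in place, the earlier "under 40" query no longer picks up the header row, and fields can be addressed by name instead of by position. Here is a minimal sketch that uses a three-row excerpt of the contacts data inlined as a string (so it runs without the file). The variable names are my own.

```ruby
require 'csv'

# A three-row excerpt of the contacts data, inlined as a string
raw = <<~CSV_DATA
  ID,First Name,Last Name,Age,Gender
  1,Victoria,Waite,38,F
  4,Abby,Atchison,57,F
  20,Susan,Dupont,34,F
CSV_DATA

table = CSV.parse(raw, headers: true)

# Rows are addressed by header name, and the header row is not a record
under_40 = table.select { |row| row['Age'].to_i < 40 }
                .map { |row| row['First Name'] }

p under_40
```

Looking fields up by name also means the query keeps working if the column order in the file changes.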

We can access columns by using the #by_col method.

> parsed_data.by_col[2]
=> ["Waite", "Hayes", "Brendle", "Atchison", "Stockton", "Roybal", "Coles", "Spellman", "Vela", "Haskell", "Mason", "Atkinson", "Underwood", "Wilson", "Sanchez", "Sanders", "Castillo", "Traxler", "Pitts", "Dupont"]
# use the bang sign `!` to change the orientation of the table
> parsed_data.by_col!
=> #<CSV::Table mode:col row_count:21>
# now switch it back
> parsed_data.by_row!
=> #<CSV::Table mode:row row_count:21>
> parsed_data[14]["First Name"]
=> "Joanne"
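Earlier I mentioned creating class instances from each row. One possible way to do that, sketched here with a Struct, is shown below. The Contact class and its field names are hypothetical choices of mine, and the two sample rows are inlined so the example runs on its own.

```ruby
require 'csv'

# Hypothetical Contact class; the field names are my own choice
Contact = Struct.new(:id, :first_name, :last_name, :age, :gender)

raw = "ID,First Name,Last Name,Age,Gender\n" \
      "7,James,Coles,57,M\n" \
      "15,Joanne,Sanchez,42,F\n"
table = CSV.parse(raw, headers: true)

# Build one Contact per CSV::Row, converting the numeric fields on the way
contacts = table.map do |row|
  Contact.new(row['ID'].to_i, row['First Name'].strip, row['Last Name'],
              row['Age'].to_i, row['Gender'])
end

contacts.each { |c| puts "#{c.first_name} #{c.last_name} (#{c.age})" }
```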

Two more things. Let's convert the Age values, which are still strings at this point, into floats so they behave more like currency, and then write the table back to a CSV file.

> parsed_data.each do |row|
>   row["Age"] = row["Age"].to_f
> end
=> #<CSV::Table mode:row row_count:21>
> parsed_data.by_col[3]
=> [38.0, 37.0, 39.0, 57.0, 64.0, 52.0, 57.0, 58.0, 41.0, 74.0, 70.0, 68.0, 37.0, 66.0, 42.0, 58.0, 39.0, 82.0, 78.0, 34.0]
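The same conversion can also happen while the data is being parsed, using the library's built-in converters option. This is a sketch on a small inlined sample, not the full contacts file. Note that :float applies to every field that looks like a number, so the ID column becomes a float too.

```ruby
require 'csv'

raw = "ID,First Name,Age\n1,Victoria,38\n2,Jamar,37\n"

# :float converts every field that looks like a number into a Float
# during parsing; non-numeric fields stay strings
table = CSV.parse(raw, headers: true, converters: :float)

p table.by_col[2]
p table[0]['First Name']
p table[0]['ID']
```

If whole numbers should stay integers, the :numeric converter is an alternative worth trying.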

Now we'll write to a new file. For this we'll use the CSV.open method with two arguments: the path, and 'w' for 'write'.

> CSV.open('ruby_made_csv.csv', 'w') do |file|
# we start by pushing the headers into the file
>   file << parsed_data.headers
# next we'll push each line in one by one
>   parsed_data.each do |row|
>     file << row
>   end
> end
=> #<CSV::Table mode:col_or_row row_count:21>
# you can execute shell commands by using back-ticks!
> `ls`
=> "contacts.csv\nruby_made_csv.csv\n"
# there they are!

Hopefully this has given you a sample of all you can do with CSVs in Ruby!
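One more option for the writing step: a CSV::Table can render itself as a CSV string with #to_csv, headers included, so the whole file can be written in one call. The sketch below uses a temporary directory and a small inlined sample so it does not touch any real files. The variable names are my own.

```ruby
require 'csv'
require 'tmpdir'

raw = "ID,First Name,Age\n1,Victoria,38\n2,Jamar,37\n"
table = CSV.parse(raw, headers: true)

written = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'ruby_made_csv.csv')

  # CSV::Table#to_csv renders the header row followed by every data row
  File.write(path, table.to_csv)

  # Read the file back so we can inspect what landed on disk
  written = File.read(path)
end

puts written
```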


Original Link: https://dev.to/jvon1904/ruby-convert-csv-file-to-two-dimensional-array-1ih2
