URL parser: Ruby C extension binding a C library built with the re2c lexer

This is a step-by-step description of how to create a Ruby C extension that uses a static library.
Creating Ruby extensions is not a complicated task. There are plenty of tutorials and articles, and the Ruby C API is clean and well documented. For me, the interesting task was to create a Ruby project that uses an external library included in the project's git repository.

Let's imagine we want to create a Ruby library for parsing text, say, URLs. Of course, there is the standard Ruby library URI, but it is written in Ruby. If we want something faster, a C extension is a good idea.


For this tutorial, I have created a C project that parses URLs following the RFC 3986 specification. The specification collects the ABNF grammar for URIs in Appendix A ("Collected ABNF for URI"). There is no need to write such a parser by hand: luckily, there are open source tools that help generate it, such as flex, ragel, ANTLR and re2c.
I will be using re2c, a lightweight, easy to learn lexer generator that produces C code.
My project is a simple library that can be used by other C projects or as an extension for Ruby, Python or other languages.
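Before generating a parser from the ABNF, it helps to see which components it has to extract. RFC 3986 itself, in Appendix B, gives a regular expression that splits a URI reference into scheme, authority, path, query and fragment without validating them. The Ruby sketch below uses that regular expression only to illustrate the decomposition; it is not the re2c approach used in this project, and the helper name split_uri is hypothetical.

```ruby
# Regular expression from RFC 3986, Appendix B (anchored with \A for Ruby).
# It splits a URI reference into its main components but does not check
# them against the ABNF grammar, so invalid URLs are split as well.
URI_SPLIT = %r{\A(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?}

# Hypothetical helper: returns the components as a Hash.
# Components that are absent from the input are nil.
def split_uri(str)
  m = URI_SPLIT.match(str) # always matches, since every group may be empty
  {
    scheme:    m[2],
    authority: m[4],
    path:      m[5],
    query:     m[7],
    fragment:  m[9]
  }
end

parts = split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related")
puts parts[:scheme]    # http
puts parts[:authority] # www.ics.uci.edu
puts parts[:path]      # /pub/ietf/uri/
puts parts[:fragment]  # Related
```

Note that the authority still contains any userinfo and port (for example "user@host:8080"); splitting it further and validating every part is what a parser generated from the full ABNF adds.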

So, here is a summary of what I am going to use in my Ruby URL parser project:

  • C library for parsing URLs, generated by re2c.
  • git for code management.
  • rake tasks to facilitate binding Ruby and C during development.
  • rspec for test-driven development. In my opinion, covering code with tests is essential for any project.
  • guard to automate testing.
Also, I am using RVM to manage Ruby versions, with Ruby 2.3.1 for this project.

Bootstrap project


We start by generating the project skeleton with bundler:

bundle gem url_parser

This bootstraps the project in the "url_parser" directory. Change to that directory and start with the file "url_parser.gemspec": update the author information and description, and add the development dependencies:

...
  spec.add_development_dependency "guard"
  spec.add_development_dependency "guard-rspec"
  spec.add_development_dependency "rake-compiler"
...
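For orientation, here is a minimal, self-contained sketch of a gemspec with the development dependencies above. The name, version, author, summary and homepage values are placeholders, not the values from the author's project, and the file generated by bundler contains more fields.

```ruby
require "rubygems"

# Placeholder metadata: replace with your own values.
spec = Gem::Specification.new do |s|
  s.name     = "url_parser"
  s.version  = "0.1.0"
  s.authors  = ["Your Name"]
  s.summary  = "URL parser implemented as a Ruby C extension"
  s.homepage = "https://example.com/url_parser"

  # Development-only tools: "gem install url_parser" does not install them,
  # but "bundle install" inside the project (whose Gemfile uses the gemspec) does.
  s.add_development_dependency "guard"
  s.add_development_dependency "guard-rspec"
  s.add_development_dependency "rake-compiler"
end

puts spec.development_dependencies.map(&:name).sort.inspect
# ["guard", "guard-rspec", "rake-compiler"]
```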

Next, let's create the extension directories:
mkdir ext
mkdir ext/url_parser
mkdir ext/lib

Include the external library as a git submodule:
git submodule add https://github.com/staskobzar/url_parser_re2c.git ext/lib/url_parser_re2c

Pin the Ruby version for our project (RVM writes it to a ".ruby-version" file):
rvm --ruby-version use 2.3.1

Then install the dependencies (make sure you have updated summary, description and homepage in url_parser.gemspec first: bundler refuses to load a gemspec whose fields still contain the TODO placeholders):
bundle install

Commit your changes to git.
git add .
git commit -m "Initial commit."

Tests

Initialize rspec:
rspec --init

This creates everything you need to start testing. But we also want the tests to run automatically whenever we change files in the project. For that we will use guard with its rspec plugin. Let's initialize guard and create a Guardfile:
guard init

Run guard and change something in any file in the lib/ or spec/ directory to see guard run the tests.

C extension

Create file "ext/url_parser/extconf.rb" with only two lines:

require 'mkmf'
create_makefile 'url_parser'


Then create the file "ext/url_parser/url_parser.c" with the following content:

#include <ruby.h>

void Init_url_parser(void)
{
  VALUE mUrlParser = rb_define_module ("UrlParser");
  VALUE cUrlParser = rb_define_class_under (mUrlParser, "URL", rb_cObject);

}

We have defined the Ruby module "UrlParser" and the class "URL" inside it (UrlParser::URL).
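To make the two C calls concrete: they create the same constants as the following plain Ruby code would. This is only an equivalent for illustration; the extension defines them from C when the shared library is loaded.

```ruby
# rb_define_module("UrlParser") creates (or reopens) the top-level module UrlParser.
# rb_define_class_under(mUrlParser, "URL", rb_cObject) creates the class
# UrlParser::URL with Object as its superclass.
module UrlParser
  class URL < Object
  end
end

puts UrlParser::URL             # UrlParser::URL
puts UrlParser::URL.superclass  # Object
```

So "puts UrlParser::URL", as used in the test command that follows, should print "UrlParser::URL" once the compiled extension is loaded.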

Good, it is time to try our C extension. Run the following commands:
cd ext/url_parser
ruby extconf.rb
make

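This may print some compiler warnings, which is fine for now.
Now let's test the extension. Ruby does not search the current directory when loading libraries, so add it to the load path with -I:
ruby -I. -e 'require "url_parser"; puts UrlParser::URL'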

If it does not produce any errors, then everything is good so far.
Do not forget to clean the extension directory:
make clean

And return to the project root:
cd ../../

Now it is time to automate building the C extension during development.
Add the following lines to Rakefile:
require 'rake/extensiontask'
require 'rake/clean'

CLEAN.include 'lib/url_parser.so'
Rake::ExtensionTask.new('url_parser')

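Rake::ExtensionTask is part of the rake-compiler gem, which handles the whole build with a single line: it defines the "compile" task, which runs extconf.rb, builds the extension and copies the result into "lib". The "clean" task comes from "rake/clean", and CLEAN.include adds the compiled library to the files it removes.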
Run command:
rake compile

You should now have a new file "lib/url_parser.so". Next, remove "lib/url_parser.rb", which bundler created, so that require "url_parser" loads our shared library "url_parser.so" and not the Ruby file:
git rm lib/url_parser.rb
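The file name "url_parser.so" is platform dependent: Ruby uses the suffix "so" for compiled extensions on Linux but "bundle" on macOS. If you work on several platforms, a sketch like the following, based on RbConfig::CONFIG["DLEXT"], builds the path without hard-coding the suffix, for example for the CLEAN.include line in Rakefile.

```ruby
require "rbconfig"

# DLEXT is the file suffix Ruby uses for compiled C extensions on this platform
# ("so" on Linux, "bundle" on macOS).
dlext = RbConfig::CONFIG["DLEXT"]

# Path of the compiled extension that "rake compile" copies into lib/
ext_path = File.join("lib", "url_parser.#{dlext}")
puts ext_path

# In Rakefile this could replace the hard-coded name:
#   CLEAN.include "lib/url_parser.#{RbConfig::CONFIG['DLEXT']}"
```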

Include static library


It is time to include the re2c-based library that implements the parser. As you remember, we store it in the directory "ext/lib/url_parser_re2c". That project has its own Makefile, but the Makefile calls the re2c command, which will probably not be available on the target machine installing our gem. So we had better write our own compile procedure that only compiles the C source already present in the submodule.
First, update "url_parser.gemspec" so that the package includes the library source files, by adding the following lines. The gemspec generated by bundler builds spec.files from "git ls-files", which does not list the files inside a submodule, so they have to be added explicitly:
...
  spec.files        += %w'ext/lib/url_parser_re2c/src/url_parser.c'
  spec.files        += %w'ext/lib/url_parser_re2c/src/url_parser.h'
  spec.extensions    = ["ext/url_parser/extconf.rb"]
...
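Instead of listing every file, the gemspec could also pick up the C sources and headers with a glob. The sketch below shows the idea; the helper name submodule_sources is hypothetical, and the glob takes every .c and .h file in the submodule's src directory, so check that this is really what you want to ship. The demo runs against a temporary directory with fake files.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: C sources and headers of the re2c submodule,
# relative to the project root (the same form of path spec.files expects).
def submodule_sources(root)
  Dir.chdir(root) do
    Dir.glob("ext/lib/url_parser_re2c/src/*.{c,h}").sort
  end
end

# Demo with a throw-away directory tree instead of the real project.
Dir.mktmpdir do |root|
  src = File.join(root, "ext/lib/url_parser_re2c/src")
  FileUtils.mkdir_p(src)
  %w[url_parser.c url_parser.h url_parser.o].each do |name|
    FileUtils.touch(File.join(src, name))
  end
  p submodule_sources(root)
  # ["ext/lib/url_parser_re2c/src/url_parser.c", "ext/lib/url_parser_re2c/src/url_parser.h"]
end
```

Inside the gemspec itself, which is evaluated from the project root, the equivalent line would be: spec.files += Dir.glob("ext/lib/url_parser_re2c/src/*.{c,h}")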

Update file "ext/url_parser/extconf.rb" and add following lines before "create_makefile":

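abort "C compiler (cc) not found" unless find_executable('cc')
abort "archiver (ar) not found" unless find_executable('ar')

libdir = File.expand_path(File.join(File.dirname(__FILE__), "../lib/url_parser_re2c/src"))

# Build the static library liburlparser.a from the submodule sources
Dir.chdir(libdir) do
  system('cc -fPIC -c -o url_parser.o url_parser.c') or abort "failed to compile url_parser.c"
  system('ar rcs liburlparser.a url_parser.o') or abort "failed to create liburlparser.a"
end

# Link against liburlparser.a and let the compiler find url_parser.h
$libs += " -lurlparser"
$INCFLAGS << " -I#{libdir}"
$LIBPATH << libdir


The first two lines make sure a compiler and an archiver are available to build the library, and stop with a clear message if they are not. Then we change to the library directory, compile "url_parser.c" and create the static library "liburlparser.a"; if either command fails, the build stops. The last three lines configure the generated Makefile: they link the extension against the library and add the library directory to the include and library search paths, so that the header "url_parser.h" and "liburlparser.a" are found.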
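The extconf.rb above hard-codes "cc" and "ar". One possible refinement, sketched below, is to take the compiler and archiver that Ruby itself was built with from RbConfig::CONFIG (falling back to cc and ar) and to build the command lines in one place. The helper name static_lib_commands is hypothetical; extconf.rb would run each returned command with system and abort on failure.

```ruby
require "rbconfig"
require "shellwords"

# Hypothetical helper: the two shell commands that turn url_parser.c into
# the static library liburlparser.a inside src_dir.
def static_lib_commands(src_dir,
                        cc: RbConfig::CONFIG["CC"] || "cc",
                        ar: RbConfig::CONFIG["AR"] || "ar")
  src = File.join(src_dir, "url_parser.c").shellescape
  obj = File.join(src_dir, "url_parser.o").shellescape
  lib = File.join(src_dir, "liburlparser.a").shellescape
  [
    # -fPIC: the object code ends up inside a shared object (url_parser.so)
    "#{cc} -fPIC -c -o #{obj} #{src}",
    # r: insert/replace members, c: create the archive, s: write a symbol index
    "#{ar} rcs #{lib} #{obj}"
  ]
end

static_lib_commands("/tmp/src", cc: "cc", ar: "ar").each { |cmd| puts cmd }
# cc -fPIC -c -o /tmp/src/url_parser.o /tmp/src/url_parser.c
# ar rcs /tmp/src/liburlparser.a /tmp/src/url_parser.o
```

In extconf.rb this could be used as: static_lib_commands(libdir).each { |cmd| system(cmd) or abort "failed: #{cmd}" }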
Now include the header file in "ext/url_parser/url_parser.c" by adding the line:

#include "url_parser.h"

Now, when you run the rake task to compile the extension, the shared object "url_parser.so" is linked against "liburlparser.a", which is built from our submodule library:
rake compile

Guard extension files

The last step in setting up our development environment is to configure guard to watch the extension directory and, when C source files change, automatically recompile the extension and re-run all tests.

Before continuing, we need one more Rake task that does the whole rebuild. Add the following lines to Rakefile:

desc "Re-build extension library and run spec"
task :spec => [:prereq, :spec_runner]

begin
  require 'rspec/core/rake_task'
  RSpec::Core::RakeTask.new(:spec_runner)
rescue LoadError
end

task :prereq => [:clean, :liburlparser, :compile]

desc "Build static library liburlparser.a"
task :liburlparser do
  sh "make -C ext/lib/url_parser_re2c"
end
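The ordering matters here: by default Rake runs a task's prerequisites in the order they are listed, and each task at most once per invocation, so "rake spec" cleans, rebuilds liburlparser.a, compiles the extension and only then runs the specs. The following self-contained sketch checks that order with stand-in tasks that only record their names instead of doing the real work.

```ruby
require "rake"

RUN_ORDER = []

# Stand-ins for the real tasks: each one only records that it ran.
%i[clean liburlparser compile spec_runner].each do |name|
  task(name) { RUN_ORDER << name }
end

# Same dependency structure as in the Rakefile above.
task :prereq => [:clean, :liburlparser, :compile]
task :spec => [:prereq, :spec_runner]

Rake::Task[:spec].invoke
p RUN_ORDER # [:clean, :liburlparser, :compile, :spec_runner]
```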



The guard init script generates more configuration than we need; most of it is aimed at Ruby on Rails projects. We can replace it all with the following:

guard :rspec, cmd: "bundle exec rspec" do
  require "guard/rspec/dsl"
  dsl = Guard::RSpec::Dsl.new(self)

  # Feel free to open issues for suggestions and improvements

  # RSpec files
  rspec = dsl.rspec
  watch(rspec.spec_helper) { rspec.spec_dir }
  watch(rspec.spec_support) { rspec.spec_dir }
  watch(rspec.spec_files)

  # Ruby files
  ruby = dsl.ruby
  dsl.watch_spec_files_for(ruby.lib_files)

  watch('ext/url_parser/url_parser.c') do
    # system (unlike backticks) lets the build output reach the guard console
    system("bundle exec rake prereq")
    "spec"
  end
end

Note the line "watch('ext/url_parser/url_parser.c') do": it watches the C source file and, when it changes, runs the "prereq" rake task to rebuild the static library and the extension (the compiled "url_parser.so" ends up in "lib"); returning "spec" from the block then makes guard re-run all the tests.
Run the command "guard" and save the C extension file or one of the spec files. You should see the compile output and the rspec test output in the guard console.
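The watch rule above only reacts to url_parser.c. Guard's watch also accepts a regular expression, matched against paths relative to the project root, so if the extension grows more C files or headers, a pattern like the one sketched here could be used instead; the constant name C_SOURCES is hypothetical. The snippet only checks which paths the pattern matches.

```ruby
# Hypothetical pattern: any C source or header directly under ext/url_parser/.
C_SOURCES = %r{\Aext/url_parser/[^/]+\.[ch]\z}

changed = %w[
  ext/url_parser/url_parser.c
  ext/url_parser/url_parser.h
  ext/url_parser/extconf.rb
  lib/url_parser.so
]

p changed.grep(C_SOURCES)
# ["ext/url_parser/url_parser.c", "ext/url_parser/url_parser.h"]

# In the Guardfile the rule would then become:
#   watch(C_SOURCES) do
#     system("bundle exec rake prereq")
#     "spec"
#   end
```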

Now we have our project setup and ready for development.

Putting it all together

Let's start with the tests. Update the file "spec/url_parser_spec.rb" so that it tests our URL class:



After you save the file, guard should run all the tests, and they should fail: we are at the first step of the "red/green/refactor" TDD mantra. The spec describes the methods we expect the URL class to have.

Now let's add our C code. Update the file "ext/url_parser/url_parser.c" so that it looks like this:

Now, when you save the file, guard rebuilds the extension and runs the tests, and they should all pass.

Finally, we can create gem file with command:
rake build

This will create file "pkg/url_parser-0.1.0.gem". Now we can install our gem:
gem install pkg/url_parser-0.1.0.gem

To test if it works, change to any other directory (for example /tmp), make sure you are running the same Ruby version under which you have installed the gem and run this:
ruby -e "require 'url_parser'; puts UrlParser::URL.new('http://example.com').host"

Summary

We have created a Ruby project with a C extension. The extension uses an external library and binds it to the Ruby class UrlParser::URL. The development environment recompiles all the code and runs the tests automatically whenever files are saved.
I have not gone into the details of the C code, but it is fairly simple.

Source code of the described project is here.
You can also find the C URL parsing project with the re2c lexer here.



