Wednesday, September 24, 2014

Banker Algorithm - Safe State

This time I'll share a simulation of the Banker's algorithm with a safe-state check that I wrote during my internship at IPICYT-CNS LAB.

The Banker's algorithm is a resource allocation and deadlock avoidance algorithm developed by Edsger Dijkstra that tests for safety by simulating the allocation of predetermined maximum possible amounts of all resources, and then makes an "s-state" check to test for possible deadlock conditions for all other pending activities, before deciding whether allocation should be allowed to continue.
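
The safety check described above can be sketched in a few lines. This is only a minimal illustration in Ruby, not the C++ code from the repository: the method name `safe_sequence` and the sample matrices are assumptions made up for the example.

```ruby
# Returns a safe sequence of process indices, or nil if the state is unsafe.
# available:  free units per resource type
# allocation: units currently held, one row per process
# max:        maximum units each process may ever request
def safe_sequence(available, allocation, max)
  work = available.dup
  finished = Array.new(allocation.size, false)
  sequence = []

  loop do
    # Find an unfinished process whose remaining need fits in the free units.
    candidate = (0...allocation.size).find do |i|
      next false if finished[i]
      need = max[i].zip(allocation[i]).map { |m, a| m - a }
      need.zip(work).all? { |n, w| n <= w }
    end
    break if candidate.nil?

    # Pretend it runs to completion and releases everything it holds.
    work = work.zip(allocation[candidate]).map { |w, a| w + a }
    finished[candidate] = true
    sequence << candidate
  end

  finished.all? ? sequence : nil
end

# Illustrative values only: 5 processes, 3 resource types.
available  = [3, 3, 2]
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
max        = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]

p safe_sequence(available, allocation, max) # => [1, 3, 0, 2, 4]
```

If no process can finish with the free resources, the method returns nil, which means the state is unsafe and the request should not be granted.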

We will use C++ and a Makefile to build the project.

GitHub Repository

Work directory:
- banker/
|- lib/
|- - processes.cpp
|- - processes.h
|- src/
|- - banker.cpp
|- Makefile

Saturday, June 7, 2014

Cassandra: Partitioners and Tokens

I'm currently an intern at Flytecomm, working on data engineering as a big data trainee. We want to upgrade our system with newer technologies such as NoSQL, using Cassandra for time series.
The most important thing to start with is knowing how Cassandra stores data, and then choosing the right partitioner for the queries we need. We will use CQL3 to get token values.

We want to share our tests of the token function in range queries based on tokens.

A partitioner determines how data is distributed across the nodes in the cluster (including replicas). Basically, a partitioner is a hash function for computing the token (its hash) of a partition key. Each row of data is uniquely identified by a partition key and distributed across the cluster by the value of the token.
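
To make the idea concrete, here is a rough sketch in Ruby of a key being hashed to a token and the token being mapped to a node. Everything in it is an assumption for illustration: MD5 stands in for MurmurHash3 (which is not in Ruby's standard library), and the three-node ring and its token values are made up, so the tokens it prints will not match the ones Cassandra computes.

```ruby
require 'digest'

# Hypothetical stand-in for a partitioner: hashes the partition key
# to a signed 64-bit token. Not the hash Cassandra actually uses.
def token_for(*partition_key)
  digest = Digest::MD5.digest(partition_key.join(':'))
  digest[0, 8].unpack1('q>') # first 8 bytes as a signed big-endian integer
end

# Hypothetical three-node ring: each node owns the tokens up to and
# including its own token; larger tokens wrap around to the first node.
RING = {
  'node-a' => -(2**62),
  'node-b' => 0,
  'node-c' => 2**62
}.freeze

def node_for(token)
  owner = RING.sort_by { |_, t| t }.find { |_, t| token <= t }
  (owner || RING.min_by { |_, t| t }).first
end

[[2014, 1], [2014, 2], [2014, 3]].each do |key|
  t = token_for(*key)
  puts "#{key.inspect} -> token #{t} -> #{node_for(t)}"
end
```

The point of the sketch is only that neighbouring keys such as (2014, 1) and (2014, 2) get unrelated tokens, so they can land on different nodes.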

Tested:
Murmur3Partitioner
ByteOrderedPartitioner
More info about partitioners

Cassandra version 2.0.7
- Schema
create keyspace ksptest
with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

use ksptest;

create table data1(
    key1 int,
    key2 int,
    id int,
    column1 timestamp,
    column2 timestamp,
    PRIMARY KEY((key1, key2), id));


INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(1,2014,1,'2014-01-01 16:50:45 -0500','2014-01-01 18:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(2,2014,1,'2014-01-01 17:50:45 -0500','2014-01-01 19:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(3,2014,1,'2014-01-01 18:50:45 -0500','2014-01-01 20:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(4,2014,2,'2014-02-01 16:50:45 -0500','2014-02-01 18:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(5,2014,2,'2014-02-01 17:50:45 -0500','2014-02-01 19:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(6,2014,2,'2014-02-01 18:50:45 -0500','2014-02-01 20:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(7,2014,3,'2014-03-01 16:50:45 -0500','2014-03-01 17:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(8,2014,3,'2014-03-01 17:50:45 -0500','2014-03-01 18:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(9,2014,3,'2014-03-01 18:50:45 -0500','2014-03-01 19:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(10,2014,3,'2014-03-01 19:50:45 -0500','2014-03-01 20:50:45 -0500');
- Using Murmur3Partitioner

SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,1);

SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,2);

SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,3);

TOKENS

token(key1, key2) | key1 | key2
----------------------+------+------
-2662374876872028068 | 2014 | 1
(3 rows)

token(key1, key2) | key1 | key2
----------------------+------+------
-8469758598453416143 | 2014 | 2
(3 rows)

token(key1, key2) | key1 | key2
---------------------+------+------
8957232040621060434 | 2014 | 3
(4 rows)

Querying token ranges

- OK, the first logical query is to select everything from row key 1 to row key 3, expecting 10 rows:

SELECT token(key1,key2),key1,key2,id FROM ksptest.data1 where
token(key1,key2) >= token(2014,1) and token(key1,key2) <= token(2014,3);

token(key1, key2) | key1 | key2 | id
----------------------+------+------+----
-2662374876872028068 | 2014 | 1 | 1
-2662374876872028068 | 2014 | 1 | 2
-2662374876872028068 | 2014 | 1 | 3
8957232040621060434 | 2014 | 3 | 7
8957232040621060434 | 2014 | 3 | 8
8957232040621060434 | 2014 | 3 | 9
8957232040621060434 | 2014 | 3 | 10
(7 rows)

Why do we get 7 rows?

Because the query compares tokens, not key values: we are searching over the hash (token) of each partition key.

Sorted tokens
-8469758598453416143 2 (3 rows)
-2662374876872028068 1 (3 rows)<--
8957232040621060434 3 (4 rows)<--

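So the range from key 1 to key 3 skips key 2, whose token sorts below the token of key 1. If we query from 2 to 3 instead, we get all row keys. This happens because Murmur3Partitioner tokens are hash values, so their order has nothing to do with the order of the keys, and a range query returns only the partitions whose tokens fall inside the requested range.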
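
The same reasoning can be replayed outside Cassandra with the token values printed above. This is just a small Ruby sketch of the comparison; the helper names `keys_in_token_range` and `row_count` are made up for the example.

```ruby
# Tokens copied from the Murmur3Partitioner output above: key2 => token.
TOKENS = {
  1 => -2662374876872028068,
  2 => -8469758598453416143,
  3 => 8957232040621060434
}.freeze

# Number of rows stored under each key2 value in the test data.
ROWS_PER_KEY = { 1 => 3, 2 => 3, 3 => 4 }.freeze

# key2 values whose token lies in [token(from), token(to)], in token order.
def keys_in_token_range(from, to)
  low = TOKENS[from]
  high = TOKENS[to]
  TOKENS.select { |_, t| t >= low && t <= high }
        .sort_by { |_, t| t }
        .map(&:first)
end

def row_count(keys)
  keys.sum { |k| ROWS_PER_KEY[k] }
end

p keys_in_token_range(1, 3)            # => [1, 3]
p row_count(keys_in_token_range(1, 3)) # => 7
p keys_in_token_range(2, 3)            # => [2, 1, 3]
p row_count(keys_in_token_range(2, 3)) # => 10
```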


- Using ByteOrderedPartitioner

SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,1);

SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,2);

SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,3);

TOKENS

token(key1, key2) | key1 | key2
--------------------------------+------+------
0x0004000007de0000040000000100 | 2014 | 1
(3 rows)

token(key1, key2) | key1 | key2
--------------------------------+------+------
0x0004000007de0000040000000200 | 2014 | 2
(3 rows)

token(key1, key2) | key1 | key2
--------------------------------+------+------
0x0004000007de0000040000000300 | 2014 | 3
(4 rows)

Querying token ranges

The same query, selecting all rows from key 1 to key 3, should now return 10 rows:

SELECT token(key1,key2),key1,key2,id FROM ksptest.data1 where
token(key1,key2) >= token(2014,1) and token(key1,key2) <= token(2014,3);

token(key1, key2) | key1 | key2 | id
--------------------------------+------+------+----
0x0004000007de0000040000000100 | 2014 | 1 | 1
0x0004000007de0000040000000100 | 2014 | 1 | 2
0x0004000007de0000040000000100 | 2014 | 1 | 3
0x0004000007de0000040000000200 | 2014 | 2 | 4
0x0004000007de0000040000000200 | 2014 | 2 | 5
0x0004000007de0000040000000200 | 2014 | 2 | 6
0x0004000007de0000040000000300 | 2014 | 3 | 7
0x0004000007de0000040000000300 | 2014 | 3 | 8
0x0004000007de0000040000000300 | 2014 | 3 | 9
0x0004000007de0000040000000300 | 2014 | 3 | 10
(10 rows)

Conclusion

We can now understand how the token function works and how it returns data. However, choosing ByteOrderedPartitioner makes managing a cluster complex, and there are very few good reasons to use it.

Here you can find an analysis about partitioners:

Apache Cassandra: The Case Against The ByteOrderedPartitioner

Wednesday, January 22, 2014

Ruby on Rails automation script for new project

When I started developing Rails applications (a short time ago), setting up the project always came first: name, database, rvmrc, Ruby version, gems, and maybe some third-party libraries such as Angular or Bootstrap. I got bored quickly because I had to do it for every new project I created to practice Rails, so I wrote a simple script to do it fast and start coding happily.


To do:

#Install ruby version
$rvm install 2.0.0

Work directory sample bootstrap - postgreSQL
- railsprojects
|- railsnew.sh
|- Gemfile
|- dist/
Setup Gemfile pg & haml

source 'https://rubygems.org'

gem 'rails', '4.0.0'
gem 'pg'
gem 'sass-rails', '~> 4.0.0'
gem 'uglifier', '>= 1.3.0'
gem 'coffee-rails', '~> 4.0.0'
gem 'jquery-rails'
gem 'jquery-ui-rails'
gem 'turbolinks'
gem 'jbuilder', '~> 1.2'
gem 'haml'

group :test,:development do
  gem 'rspec-rails'
  gem 'pry'
end

group :test do
  gem 'capybara', "2.0.2"
  gem 'fabrication'
  gem "shoulda-matchers"
end

Download bootstrap
$wget https://github.com/twbs/bootstrap/releases/download/v3.0.3/bootstrap-3.0.3-dist.zip
$unzip bootstrap-3.0.3-dist.zip 
Script railsnew.sh
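#!/bin/bash
# Usage: ./railsnew.sh appname ruby-version
set -e
echo "Running"
rails new "$1" --database=postgresql

# Replace the generated Gemfile with our template
cp Gemfile "$PWD/$1"
cd "$1"
# RVM reads these files to select the gemset and the Ruby version
echo "$1" > .ruby-gemset
cat .ruby-gemset
echo "$2" > .ruby-version
cat .ruby-version

# Copy the Bootstrap assets into the Rails asset directories
cp ../dist/js/bootstrap.js "$PWD/app/assets/javascripts/"
cp ../dist/css/bootstrap.css "$PWD/app/assets/stylesheets/"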
Permissions
sudo chmod +x railsnew.sh
To run the script we need to pass 2 parameters: "appname" and "ruby-version"
#Our ruby list
$rvm list

rvm rubies

=* ruby-1.9.3-p429 [ i686 ]
   ruby-2.0.0-p195-perf [ i686 ]

# => - current
# =* - current && default
#  * - default

Execution & verification
$./railsnew.sh testapp ruby-1.9.3-p429
$cd testapp
$rvm current
$bundle
That's it. You can add or edit whatever you need for your projects.

Sunday, January 5, 2014

Benchmark Ruby Special Pythagorean Triplet

More info about Problem 9

More info about Benchmark

System Values
Ubuntu: Release 12.04 (precise) 32-bit
Kernel Linux 3.5.0-45-generic
Processor : Intel® Pentium(R) Dual CPU T2330 @ 1.60GHz × 2
Memory: 2.0 GiB

#!/usr/bin/ruby
require 'benchmark'
class PythagoreanTriplet

  def square val
    val * val
  end
  
  def add_square(a,b)
    square(a) + square(b)
  end

  def triplet(a,b,c)
    square(c) == add_square(a,b)
  end

  def triplet_addition(a,b,c)
    a+b+c 
  end

  def get_product_abc
    for a in 0..500
      for b in 0..500
        for c in 0..500
          if a < b && b < c
            find = triplet(a,b,c) ? triplet_addition(a,b,c) : nil
            printf '%d - %d - %d ', a,b,c if find == 1000
            break if find == 1000
          end
        end
        break if find == 1000
      end
      break if find == 1000
    end
  end

end

pt = PythagoreanTriplet.new
puts Benchmark.measure { pt.get_product_abc }

Results

No break statement

200 - 375 - 425  
42.950000   0.170000  43.120000 ( 43.553589) <- Time

Break statement

200 - 375 - 425  
25.200000   0.040000  25.240000 ( 25.396476) <- Time
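
Most of the time in both runs goes into the innermost loop over c. As a side note, since a + b + c is fixed, c can be computed instead of searched. The following is only a sketch of that idea, not the code that was benchmarked above, and the method name `triplet_with_sum` is made up for the example.

```ruby
# Returns [a, b, c] with a < b < c, a + b + c == total and
# a*a + b*b == c*c, or nil when no such triplet exists.
def triplet_with_sum(total)
  (1..total / 3).each do |a|
    ((a + 1)..(total / 2)).each do |b|
      c = total - a - b # c follows from the sum, no third loop needed
      next unless b < c
      return [a, b, c] if a * a + b * b == c * c
    end
  end
  nil
end

p triplet_with_sum(1000) # => [200, 375, 425]
```

Wrapping the call in `Benchmark.measure`, as in the script above, would allow a comparison on the same machine.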