

Controlled Vocabulary And Semantic Web


Monday, October 29, 2007

If you’re not sure what the difference is between a controlled vocabulary, a taxonomy and a thesaurus, this blog article is quite good at clarifying those concepts and more.

What’s not covered are tags and folksonomy.

Fun with Google’s Image Labeler


Wednesday, October 17, 2007

Interesting article on Boing Boing about Google Image Labeler and how to entice the masses into a dull job by means of a game challenge…

It’s a clever way to do retrospective tagging. Organizations with large archives of data are likely to do more of this in order to open access to their “long tail”.

In a non-digital world, an analogy could be the way the BBC organized national “treasure hunts” to retrieve long missing archive material, except they were one-offs.

Google has found a way to harness the power of the masses (and no, this has nothing to do with what happens to humans in the Matrix trilogy, although …)

Automation of release notes in agile projects


Wednesday, September 19, 2007

My team works in short iterations at the end of which we should be able to release a new version of the software with added value.

To facilitate the generation of release notes, another team I was observing uses markup in their version control commit messages: they add [R] to the commit message to denote the addition of new features to the repository.
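A minimal sketch of how such [R] markers could be turned into release note bullets. The `release_notes` helper and the sample messages are made up for illustration; a real script would read the messages from the version control log instead.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the commit messages tagged with [R]
# and turn each of them into a release-note bullet point.
sub release_notes {
    my @messages = @_;
    my @notes;
    for my $msg (@messages) {
        # Only commits explicitly marked [R] count as new features.
        next unless $msg =~ /\[R\]/;
        (my $note = $msg) =~ s/\s*\[R\]\s*/ /g;    # drop the marker
        $note =~ s/^\s+|\s+$//g;                   # trim both ends
        push @notes, "* $note";
    }
    return @notes;
}

# Sample data standing in for a real commit log.
my @log = (
    '[R] Add CSV export to the report page',
    'Fix typo in the footer',
    'Merge search branch into trunk [R]',
);
print "$_\n" for release_notes(@log);
```

Only the first and third messages end up in the output; the untagged commit is skipped.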

I thought it was a good idea and introduced it as a version control standard in my team.
After a couple of iterations, there weren’t many [R] in the commit messages.

We found ourselves sometimes shying away from adding an [R] to the commit message, as we’re not always sure whether a feature is done. We also commit often, I’d say compulsively, at various steps of our work: it’s easy to commit the last chunk without realizing it’s the last. It is also easy to mark a commit with [R] when some files are accidentally missing from it.

After further investigation, I discovered a difference in branching strategy between the two teams:
In my team we branch on an ad-hoc basis when we are about to start a risky task, but keep working on trunk for work with limited scope and impact. The other team systematically creates a branch for ANY new work, which means that merging a branch to trunk (after running the test suite on that branch) defines what a completed feature is, and the [R] mark tends to be found on the commit messages for merges.

Since we do not apply a branch-per-feature policy, we need to find other ways to identify the tasks done in an iteration. What we do practise, though, is acceptance-test-driven development.
It starts with a user story, then the associated acceptance tests. We write FIT tables for the acceptance tests and the fixture code that automates them. The FIT infrastructure is hosted in FitNesse, a wiki-on-steroids with an integrated FIT runner.

Each user story is an entry in the wiki, and FitNesse lets you create virtual wiki links on wiki entries, which makes it easy to create indexes (dynamically or statically updated).
We already use these facilities to group all completed and validated stories on a page called RunningTestedFeature, which is then executed as a suite for regression testing as part of our continuous integration process.

The interesting bit is that, in a similar way, for each iteration we can also create a page linking to all the stories planned for that iteration; at the end of the iteration the suite is executed. The stories whose acceptance tests pass become the bullet points in the release notes if we decide to release.

And there is an added bonus: by keeping these iteration index pages over time, it will help the calculation of velocity as it becomes easy to count how many stories are completed per iteration.

Finger Shopping


Tuesday, September 4, 2007

Can you imagine yourself doing your grocery shopping and when you arrive at the till, you pay only by putting your finger on a finger print reader?

According to this Spanish article, there is a trial in Germany that seems successful enough to be extended to hundreds of stores.

Apparently the system is found useful by elderly people, who don’t have to memorize a PIN or fiddle with coins and notes.

The drawback is that it is one more company holding your payment details.
On the other hand, being brick and mortar, they are not going to disappear the next day.

Source: faq-mac

Perl OO = Evil ?


Sunday, September 2, 2007

In response to Chris’ comment about “OO=Evil” and his Perl program that he said would have been easier to change had it been written in Java:

By saying “…much easier to alter if it were written in java.”, I think you meant to say “much easier if it was properly encapsulated, layered with clearly defined interfaces between components, and built on recognizable abstractions and design patterns”.

The object-oriented paradigm is all about that:

  • Modularity through encapsulation, abstraction, interfaces
  • Re-usability through polymorphism, inheritance, interfaces

The existence of documented and proven design patterns for OO further helps build and refactor OO programs.

These are characteristics of OO programming in general, not specific to a language.

You can write proper object-oriented programs in Perl (see Damian Conway’s PBP or, better, the recent effort around Moose).

You can apply the OO design patterns to such Perl programs (e.g. see Object::PerlDesignPatterns),

and they will have the same benefits as properly written Java programs (easy to refactor and to reuse code).

It’s just that:

  • Java was designed for object orientation, so if you write Java you are forced into this paradigm. Because of that, there are more tools for Java to assist OO programming (especially for refactoring).
  • OO in Perl 5 has too many unsatisfactory ways to do it and too verbose a syntax
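To give an idea of the verbosity, here is a minimal sketch of one of the many ways to hand-roll a class in plain Perl 5, with a blessed hash and hand-written accessors. The `Counter` class is hypothetical and only there for illustration.

```perl
use strict;
use warnings;

package Counter;    # hypothetical example class

# The constructor has to be written by hand: build a hash, bless it.
sub new {
    my ($class, %args) = @_;
    my $self = { count => defined $args{count} ? $args{count} : 0 };
    return bless $self, $class;
}

# So does every accessor...
sub count { return $_[0]{count} }

# ...and every method has to unpack its own invocant.
sub increment {
    my ($self) = @_;
    $self->{count}++;
    return $self;
}

package main;

my $counter = Counter->new(count => 2);
$counter->increment;
print $counter->count, "\n";
```

Nothing here checks the constructor arguments or protects the attribute from direct hash access; all of that would be more hand-written code.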

Regarding the talk on Perl’s Worst Practices and the seemingly controversial “OO=Evil”:

what Mark Fowler meant is that OO is not the only paradigm in computing and it’s not always the best way to solve a problem.

You’ve got OO programming, procedural programming, functional programming, aspect-oriented programming, …

The nature of Perl doesn’t force you into any of these (“There’s More Than One Way To Do It”), and each of these paradigms is better than the others for its own class of problems.

In other words, a skilled programmer should use the paradigm that best suits the domain of problems they are trying to solve.

Of course, all programmers involved in writing or changing such programs need to have the same understanding of the domain and the best suited paradigm otherwise …

Mark Fowler also hinted at the multiple unsatisfactory ways, along with the heavy syntax, of doing OO with Perl 5 as a reason for OO=Evil.

Perl 6 has a new object system which is very good, and it is being ported to Perl 5 as Moose.

Regarding my views on OO=Evil

I’m a big fan of object orientation as implemented in Objective-C or Smalltalk.

Java’s OO is less elegant than Smalltalk’s or Objective-C’s.

Perl 5’s OO implementation upsets me more, but at least I can choose not to do OO in Perl.
Also, the prospect of Perl 6 and the influence it has had on Perl 5 (Moose) are starting to make OO in Perl more interesting and desirable.

I think OO is not always the best paradigm, and I’m liking functional programming more and more.
Mark Jason Dominus’ Higher-Order Perl is the “FP design patterns” reference for Perl programmers.

I found it difficult to get my head around (I’m still in the first chapters of HOP), and I also don’t know the best practices for unit testing FP programs.

I can retrospectively see a bunch of my code that might have benefited from FP. Conversely, I have also inappropriately applied FP to some code. My current project, though, is better served by OO.

Going back to OO, Ruby’s OO is close to Smalltalk’s, and Perl 6 is also borrowing things from Smalltalk (Smalltalk traits will be known as roles in Perl 6), among other languages.

So, OO in Perl 5 is the devil, but often you have to sleep with it, and in the end you get used to it.
If I can identify problems in code I create or change that are better solved by FP (I really need to make progress with HOP), I’ll go for it.
In Perl 6, OO (and FP too) will be dramatically improved.

If you can’t wait, go for Ruby or Perl 5+Moose.

YAPC::Europe: Thursday


Friday, August 31, 2007

These are just my notes, edited only to fix obvious spelling errors and to add URLs for the interesting bits. They may contain errors. I’ll post proper, scoped articles later.

Unicode (by Juerd Waalboer)

characters are not bytes

in an 8-bit encoding, one char maps to one byte
that means you can have at most 256 different values
enough for roman and Russian
enough for roman alphabet and Greek
not enough for roman and Russian and Greek

multi-byte encodings

more bytes => more characters
fixed width, variable width
Unicode encodings are all multi-byte
UTF-8 is very popular on the Internet
UTF-16 is the internal encoding in MS Windows

* “Character Set”
character set is character <-> number
Unicode is a charset
encoding is number <-> bytes
UTF-8 is an encoding
MIME calls them both “charset”
Perl calls them both “encoding”

2 kinds of strings:
perl has one string type
the universe has several
“text string” and “binary string”
a.k.a. “character string” and “byte string”
the computer doesn’t know the difference
you should know

* Unicode perl

text strings are unicode strings, not UTF-8
ISO-8859-1 maps to 0..255, useful!
perl keeps strings as ISO-8859-1 as long as possible
if that doesn’t work, it upgrades to UTF-8 internally
if you mix the two kinds, UTF-8 wins.

Prime rule:
Do not mix byte strings with text strings
except if you explicitly convert between them

decoding: bytes -> characters (binary to text)
encoding: characters -> bytes (text to binary)

first slide:
All communication with the “outside world” is in bytes
something has to decode the binary input to text
something has to encode your text output to binary

read input
decode input
process data
encode output
write output

Neat trick:

Perl lets you use code points (character numbers)
that do not yet officially exist

In practice part 1:

use Encode;

my $text = decode("UTF-8", $binary_input);
my $output = encode("UTF-8", $text);

you have to encode, otherwise the output will be characters, not binary
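A small self-contained sketch of the read / decode / process / encode cycle above, using only the core Encode module. The sample bytes are made up: they are the word "café" as it would arrive in UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they would arrive from the outside world: "café" in UTF-8,
# where the accented letter takes two bytes (0xC3 0xA9).
my $binary_input = "caf\xC3\xA9";

my $text   = decode('UTF-8', $binary_input);    # bytes -> characters
my $upper  = uc $text;                          # process as text
my $output = encode('UTF-8', $upper);           # characters -> bytes

printf "%d bytes in, %d characters\n",
    length($binary_input), length($text);
```

This prints "5 bytes in, 4 characters": once decoded, the string is measured and upper-cased in characters, not bytes.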

part 2:
let perl do the hard work!

binmode STDIN, ":encoding(ISO-8859-1)";
binmode STDOUT, ":encoding(UTF-8)"; # don’t forget the hyphen

print while <>;

Unicode semantics

perl has unicode semantics
lc, uc, lcfirst, ucfirst
case insensitivity
character classes like \w
perl also has ASCII semantics :(
hard to tell which semantics will be used for some operation
utf8::upgrade($your_string) to ensure Unicode semantics

in perl 5.9.5: perlunitut
in perl 5.9.5: perlunifaq
http://juerd.nl/site.plp/perluniadvice

Don’t use encoding.pm
it is broken and cannot be fixed. Using it will hurt.

encoding::stdio

http://juerd.nl/perlunitut.html

http://www.cafepress.com/perl5/

Making of ibeatgarry.com (by Karlheinz Zoechling)

Garry Kasparov

The Oracle of Bacon at Virginia

do the same with chess instead of movies
Garry Kasparov instead of Kevin Bacon
The question: how many hops are needed to defeat Garry, at least transitively?

data source:

Chessbase Megabase 2005 (2007)
3 507 786 chess games
proprietary data format
but can export to PGN (Portable Game Notation, clear text format)

problem 1: max export is 2 GB files -> need to split the export

Chess::PGN::Parse
from PGN to PostgreSQL
ID is created

most logic in SQL

206 650 players

drawn games are discarded
short games of less than 5 moves are discarded too (defaulted games, drunk players, other silly stuff …)

discard games that aren’t tournament games
leaves 2 385 622 games

-> graph problem -> don’t know graph theory -> CPAN!
-> Shortest Path problem: Graph::*
-> interesting: Dijkstra algorithm
-> said to be inefficient for graphs with lots of equal-length edges
-> finding: seems to be true, long wait
-> rumor: breadth-first search should be the best
-> No breadth-first in CPAN
-> rolls his own
-> first uses hashes
-> inefficient for graphs with lots of equal-length edges
-> uses arrays (improves performance by 1 order of magnitude)

-> approaching 2 pi

-> his Garry Kasparov number is 4
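A rough sketch of the breadth-first search idea on a toy "who beat whom" graph. This is not the speaker's code: the `%beat` data and the `shortest_chain` helper are invented for illustration, and the real site works on millions of games.

```perl
use strict;
use warnings;

# Hypothetical toy data: $beat{X} lists the players X has defeated.
my %beat = (
    Alice => ['Bob'],
    Bob   => ['Carol', 'Dave'],
    Carol => ['Kasparov'],
    Dave  => [],
);

# Breadth-first search: returns the shortest chain of wins leading
# from $from to $to as a list of names, or nothing if none exists.
sub shortest_chain {
    my ($graph, $from, $to) = @_;
    my %previous = ($from => undef);    # also marks visited players
    my @queue    = ($from);
    while (@queue) {
        my $player = shift @queue;
        if ($player eq $to) {
            # Walk the chain backwards to rebuild the path.
            my @chain;
            for (my $p = $to; defined $p; $p = $previous{$p}) {
                unshift @chain, $p;
            }
            return @chain;
        }
        for my $loser (@{ $graph->{$player} || [] }) {
            next if exists $previous{$loser};
            $previous{$loser} = $player;
            push @queue, $loser;
        }
    }
    return;
}

my @chain = shortest_chain(\%beat, 'Alice', 'Kasparov');
print join(' beat ', @chain), "\n";
printf "Kasparov number: %d\n", scalar(@chain) - 1;
```

Because every edge has the same length, the first time the search reaches the target it has already found a shortest chain, so no edge weights or priority queue are needed.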

problem for making the web site
18m Storable takes seconds to freeze

-> put the graph into RAM: mod_perl

performance: 0.1s average per query (array version): not too bad

not good: as the number of instances increases, RAM usage explodes
-> didn’t find a way to share the graph across children

other problem: player names are not unique in Chessbase

e.g. the same player names appear before 1900 and after 1990, this can’t be.

solution: players who have a “gap” in their playing records of more than 40 years will be treated as 2 (or more) players. (Assumption)

-> rework tables, rebuild Storable freeze

-> build caching into the front end
computing chains takes time, so queries are stored in a table when they appear for the first time
(added benefit: data for statistics)

-> redirect uncached queries to backend

-> fill the cache with “Kasparov queries”, for a head start

can link everyone to everyone

Zoechling, Karlheinz
Anderssen, Adolf is the 1st world champion

Anatoly Karpov has a Kasparov number of 1 and a Bacon number of 2!

hits: a couple of thousand a day

Building Scalable Data Collection (by mock)

huge quantity of data from Akamai from various sources, sometimes all at once

cron based db insertion sucks
insert email

steal good ideas from

perlbal, memcached, mogilefs, db shards

glue together with POE the wrong way

from Akamai up into db and mogileFS

scalable fast architecture

queue -> reader -> storage

larger lumps of data are faster to process and transport

MogileFS data store

distributed load balanced storage
uses mysql - too many inserts is bad

JSON as compromise record encoding
aggregate data in large gzipped files
index position of records in sql db

(JSON access is fast)

2-3 months of data -> 60GB

db reads scale with clusters
but db writes don’t scale with clusters

solution -> DB shards

mock modified DBIx::Class to work with sharded databases (not yet on CPAN, but it’s planned)
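To illustrate the sharding idea only (this is not mock's DBIx::Class work, which I haven't seen), here is a sketch where each record key is mapped to one of several shards, so that writes are spread across databases. The shard names and the `shard_for` helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical shard names; a real setup would hold one database
# handle per shard instead of a plain string.
my @shards = qw(shard_a shard_b shard_c);

# Map a record key to a shard using a simple 32-bit checksum of its
# bytes. The same key always lands on the same shard.
sub shard_for {
    my ($key) = @_;
    my $checksum = unpack '%32C*', $key;
    return $shards[ $checksum % @shards ];
}

print "$_ -> ", shard_for($_), "\n" for qw(alice bob carol);
```

A byte checksum is about the crudest possible choice; it is used here because it needs no extra modules. The trade-off with any modulo scheme is that adding a shard moves most keys to a different shard.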

other implementations:

Apache/mod_perl (faster in some ways but doesn’t handle loads of transactions very well)
Event::Lib (not mature)

issue of asynchronous work flow -> need locking

mogilefs:
weirdness with small records
not that fast with writes

Akamai: services to push back data to content provider

pre-sharded version of pgsql

commercial alternative: Sybase IQ

all the nodes are load-balanced with perlbal

mail: mock@obscurity.org
web: http://sketchfactory.com

use JSON::XS
(doesn’t like unicode)

Perl sucks and what to do about it (by Mark Fowler)

* Installing a perl program is hard

-> PAR

perl -MCPAN -e 'install PAR::Packer'

pp -o hellow hellow.pl

exec time
perl 0.35s
par 0.60s

-> alternative -> build your own perl and ship it with the app
-> problem when moving to a different machine (paths are hard coded so are different)
-> blead to the rescue

when configuring perl add
-Duserelocatableinc

 * perl exception handling

 - die means die, not capture exception
 - eval
 - if (blessed($@) && $@->isa("NoCheeseException")) {

 }

try {
    throw NoCheeseException "redo";
}
catch NoCheeseException with {

};

above is perl code

(see Error.pm)

-> problem (same as with eval)

in try {
    return "this doesn’t return from foo";
}

replace return by rreturn

and add return allowed after the catch
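For comparison, a minimal sketch of the plain eval / die approach with an exception object, using only core modules. The `NoCheeseException` class is a hand-rolled stand-in named after the one in the notes, and `buy_cheese` is invented for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical exception class, named after the one in the notes.
package NoCheeseException;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub message { return $_[0]{message} }

package main;

sub buy_cheese {
    die NoCheeseException->new(message => 'the shop is out of cheese');
}

# The trailing 1 makes eval return true only if the block completed,
# which avoids relying on the value of $@ to detect failure.
my $ok = eval { buy_cheese(); 1 };
if (!$ok) {
    my $err = $@;
    if (blessed($err) && $err->isa('NoCheeseException')) {
        print "caught: ", $err->message, "\n";
    }
    else {
        die $err;    # not one of ours: rethrow
    }
}
```

Note that a `return` inside the eval block would only leave the eval, not the enclosing subroutine, which is the same problem as the one described for try above.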

 * I hate the way perl programs are just scripts

Template Toolkit tpage
solution 1: source filter

solution: build your own executable

 * I want to programmatically manipulate my code

 PPI
 - can’t tell the difference between certain perl constructs (like subroutine prototypes)
 but reliable

 MAD

when configuring perl
-Dmad=y

 B::Generate
 can be used to create opcodes

 optimize.pm

 * real programming languages can do compile time checking

 use typesafety;

 typesafety::check()
Perl worst practices
——————–

Good Perl
 * easy to read
 * beautiful
 * useful

Bad Perl
 * difficult to read
 * ugly
 * useful
 * fragile

I don’t like java
java was designed for stupid people…
…but you don’t need to be stupid to use java

examples of good java made by smart people: lucene, eclipse

I like perl
perl was designed for smart people …

but you don’t need to be smart to use perl!

Slacks Law

95% of all the people you meet are stupid

Lies, damned lies

The problems

3 big problems

variables, regexp, OO

 * evil variables
1. global
2. package
3. local
4. my ?

my variables are ok in a small scope

magic global variables

$txt =~ /(\w+):(\w+)/ ;
check_name($1) ;
add_aut($2) ;

 * regexp

 simple components
 complex machine

 simple mistakes
 (regexp injections)
 simple solutions
 use eq, substr, index, unpack

 complex mistakes
 regexp evolution
 !DIY
 use Mail::RFC822::Address;
 use Regexp::Common;
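A small sketch of the "simple mistakes, simple solutions" point about regexp injection. The input strings are made up; the idea is that user input interpolated into a pattern gets compiled as a regexp, while index() or \Q...\E treat it as plain text.

```perl
use strict;
use warnings;

my $haystack   = 'price: 10 (ten) dollars';
my $user_input = '(ten';    # hypothetical untrusted input

# Interpolating $user_input straight into a pattern would make Perl
# compile it as a regexp; here the unbalanced "(" is a fatal error.
# index() does a plain substring search with no pattern at all.
my $found_with_index = index($haystack, $user_input) >= 0;

# If a regexp is really needed, \Q...\E escapes the metacharacters.
my $found_with_regexp = $haystack =~ /\Q$user_input\E/ ? 1 : 0;

print "index: ",  ($found_with_index  ? 'found' : 'missing'), "\n";
print "regexp: ", ($found_with_regexp ? 'found' : 'missing'), "\n";
```

Both searches report "found"; the unescaped version, /$user_input/, would die at run time instead.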

 * Object Orientation

 OO is evil
 an object is just a variable that thinks it is smart
 slow
 ugly
 too many ways to do it
 multiple inheritance !

 do you really need OO?

 POO = Perl Object Orientation

YAPC::Europe: Wednesday


Thursday, August 30, 2007

These are just my notes, edited only to fix obvious spelling errors and to add URLs for the interesting bits. They may contain errors. I’ll post proper, scoped articles later.

AntiSocial Perl (Damian Conway)

rod logic

Rod::Logic (unfortunately not in CPAN :-(   ;-)

Quantum mechanics + Special relativity

Dirac equation
final diagram of Feynman

positron travels back in time

positronic program

Positronic::Variables (unfortunately not in CPAN :-(   ;-)

Deutsch’s CTC (closed time-like curves)

Test::Harness 3.0 (Curtis Poe)

TAP::Parser will become T::H 3.0
dev release next week

* TAP
Version 13 or 14 of TAP
TAP version 1, January 30 1988

July 8 1996, version 5, all non-TAP ignored
Bail out!
v13: understands TAP version syntax
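To make the TAP lines mentioned above a bit more concrete, here is a toy line classifier. It is only a sketch and not TAP::Parser's API: the `parse_tap_line` helper is made up and covers just the plan, test results with SKIP/TODO directives, and "Bail out!".

```perl
use strict;
use warnings;

# Hypothetical helper, not TAP::Parser's API: classify one TAP line.
sub parse_tap_line {
    my ($line) = @_;

    # The plan, e.g. "1..3".
    return { type => 'plan', tests => $1 } if $line =~ /^1\.\.(\d+)/;

    # "Bail out!" stops the whole run.
    return { type => 'bailout', reason => $1 }
        if $line =~ /^Bail out!\s*(.*)/;

    # A test result: "ok 1 - description" or "not ok 2 # TODO reason".
    if ($line =~ /^(not )?ok\b\s*(\d+)?\s*(.*)$/) {
        my ($not, $number, $rest) = ($1, $2, $3);
        my $directive = $rest =~ /#\s*(SKIP|TODO)\b/i ? uc $1 : '';
        return {
            type      => 'test',
            ok        => $not ? 0 : 1,
            number    => $number,
            directive => $directive,
        };
    }

    # Anything else is not TAP and gets ignored.
    return { type => 'unknown' };
}

my @stream = (
    '1..3',
    'ok 1 - loads',
    'not ok 2 - parses # TODO later',
    'ok 3 # SKIP no network',
    'some diagnostic chatter',
);
for my $line (@stream) {
    my $result = parse_tap_line($line);
    print "$result->{type}: $line\n";
}
```

The last branch mirrors the "all non-TAP ignored" rule: lines that are not recognised are classified as unknown rather than treated as errors.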

* TAP Parsers
runtests gets this right. prove does not

Test::Harness issues
very difficult to upgrade TAP
difficult to provide alternative views
confused by incorrect test counts
difficult to track down skip and todo
multi-language tests in a suite are difficult

why not refactor T::H?
-> 20 years of cruft
-> several people have tried and failed
-> dangerous to break the tool chain

design goals:
backward compatible
runs on perl 5.005
no non-core modules
runs everywhere T::H does
MVC
no bugs
support new TAP versions

support multi-language tests using driver programs

todo:
* improve coverage (btw, there’s a bug in Devel::Cover)
* optimize (optimized runtests is catching up with prove but returns so much more information)

future plans:
parallel test runs
GUI and HTML views
improved diagnostics via a YAML subset
repeatable shuffles
runtime env description

who’s using it
Yahoo! (tagging of the tests)
xmms2 (multi-language tests)
Smolder (run locally, display remotely)

problems with Test::TAP::HTMLMatrix (internals are YAML, not XML; no good for documents, which test reports are)

Automated Testing of Open Source software (Gabor Szabo)

Gabor Szabo

CPAN::Forum

test automation

QA day:
* TAP
* FIT
* Selenium
* Automation in OSS <- subject of the talk

Business value
* reduce feedback cycle
* continuous builds
* automated smoke (regression) tests

* report generation
* overview
* current status
* drill down to see where something broke

* accountability

companies VS open source
limited budget for QA - no paid QA people
market pressure releasing buggy software = release often, release soon

open source:
 test locally, report remotely
 security consideration of downloading software

 szabgab.com

 * perl 5 development:
 Perforce
 RT
 rsync to get source
 commit msg in mailing list
 TAP
 Smoke (C compiler, working perl, Test::Smoke)
 db.test-smoke.org (not updated any more)
 www.test-smoke.org

centralization or decentralization of smoke testing

perl 5: easy participation

 * Parrot testing

 multi-language testing (perl, PASM, PIR)
 smoke: use TAP and Test::TAP::HTMLMatrix (will be replaced by Smolder)

 * pugs
 subversion and SVK
 Needs GHC (Glasgow Haskell Compiler), Perl and Test::TAP::HTMLMatrix

 * CPAN
 CPANPLUS + Test::Reporter
 easier is: CPAN + CPAN::Reporter

 * SQLite
 CVS, tests written in C and TCL
 very good coverage (98%)
 no automated smoke testing
 CVS HEAD is currently broken

 * NUT - Network UPS tool
 use BuildBot for automated build
 no automated test!
 need the device to be tested
 the system might shut down during test

 * Ruby
 use subversion
 unit tests written in Ruby
 rubinius has separate test suite
 no automated smoke testing

 * PGSQL
 test suite: home grown perl scripts
 long and frightening list on how to setup … but is easy
 need registration

How to find vulnerabilities in perl code (mock)

10k modules on CPAN
500k from lang:perl on google code search

anatomy of a vulnerability
user manipulatable
causes harm
usually found in the boundaries between systems
(perl/sql, perl/web, perl/fs, perl string/unicode)

sql injection
xss
Flash cross-domain-policy

google.com/codesearch/

lang:perl open\s+[A-Z0-9]+,\s*\".*\$
gives > 19k results
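The search pattern above hunts for two-argument open calls with an interpolated variable. A short sketch of why that form is risky and what the three-argument form looks like; the `first_line` helper and the scratch file are only there to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the sketch is self-contained.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "hello\n";
close $tmp_fh;

# Risky: with two-argument open the content of $name chooses the mode,
# so a value ending in "|" would be run as a command.
#   open FH, "$name";

# Safer: three-argument open with an explicit mode and a lexical
# filehandle. $name can only ever be treated as a file name.
sub first_line {
    my ($name) = @_;
    open my $fh, '<', $name or die "cannot open $name: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}

print first_line($path);
```

With the three-argument form, a hostile value such as "echo hi |" is just a file name that does not exist, so the open fails instead of running anything.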

App::Ack, App::Grepl

lang:perl (SELECT|DELETE).*FROM.*=\s*'?[\$\@]

methodology:
find harm and also find something to manipulate
you can manipulate:
content (taint mode protects against this)
structure
race conditions (difficult to find and rarely manipulatable)
predictable state
data leakage

any variable in a template is potentially an XSS

stompy - a tool to detect bad PRNGs
http://lcamtuf.coredump.cx/stompy.tgz

SideJacking - is your session encrypted, or just your login
http://www.erratasec.com/sidejacking.zip

Fuzzing

PeachFuzz
http://peachfuzz.sourceforge.net

Follow the data flow from user manipulatable input to causing harm

don’t forget XS

http://sketchfactory.com

Introduction to Moose (Stevan Little)

use Moose imports:
 * keywords has, extends, with, before, after, around, super, override, inner, augment
 * use strict and use warnings
 * Carp::confess and Scalar::Util::blessed

 no Moose; 1;

Moose::Util::TypeConstraints

pseudo typing for perl5 -> it’s actually a validator

->meta returns the metaclass
metaclass defines the class
metaclass is itself an instance of a metaclass

it’s for
 * introspecting
 * modifying classes (add/remove methods, add/remove attributes)
 * programmatically creating classes

attribute delegation

type constraint unions

type coercions
 * create subtype
 * add coerce attribute
 * use coerce to precisely coerce (what and how) data

 Benefits of Moose
 * code is less tedious
   * no need to worry about basic mechanics of OO like
     * object initialization
     * object destruction
     * attribute storage, access and initialization
   * less tedium means many typo errors are all but eliminated
 * code is shorter
   * Moose declarative style allows you to say more with less
   * less code == less bugs
 * less low-level testing needed
   * no need to verify things which are covered by Moose test suite (3k tests)
 * code becomes more descriptive (code is documentation)

 Drawbacks:
 * has fairly heavy compile time cost
   * not good for non-persistent environments
   * looking to use .pmc to reduce this burden
 * some Moose features are slow at times
   * speed is directly proportional to the amount of features used
 * Extending non-hash based classes is tricky
   * e.g: IO::* (use Class::InsideOut or Object::InsideOut or use delegation)

Matt Trout is hacking the lexer to lift some subroutines from compile time to runtime (or the other way round, can’t remember what he said)

Role system is very inefficient at the moment

Kwalitee (Xavier Caron)

definition attempt:
 * approx of “Quality”
 * confidence
   * through passing tests, but that’s not enough
   * but correlation exists if there is functional test coverage
   * bug = diff between expectation and implementation
   * bug = diff between test, documentation and code
   * you tend towards the goal, but you won’t reach it

 * ages before
   * literature
     * CPAN
     * articles, conferences
     * Read, learn, evolve
 * before
   * generate skeleton
   * write tests (a tad of XP)
 * while
 * after
   * test
     * measure pod coverage
     * measure tests code coverage
     * measure func test coverage
     * generate synthetic reports
 * way after (release)

“Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live” Damian Conway

SICP’s preface:
“Thus, programs must be written for people to read, and only incidentally for machines to execute.”

 * Pre requisites:
   * version control
   * version control standards
   * coding standards
   * ticket tracker
   * text editor or IDE

 * do not reinvent the wheel - avoid repeating others’ errors
 * use CPAN
“I code in CPAN, the rest is syntax.” - Audrey Tang

programmer’s triptych

pod (hubris)
tests (laziness)
code (impatience)

At the beginning
file tree structure
Use a dedicated CPAN module
Module::Starter (or Module::Starter::PBP)

Testing for dummies
test = confront intention * implementation
using techniques (directed or constrained random test)
and a reference model (OK ~ no <> vs reference)

TDD
test suite ~ executable specification

“old tests don’t die, they just become non-regression tests!” chromatic & Michael G Schwern

tester:
“is this correct?”
“Am I finished?”

code coverage <> functional coverage

how do I measure functional coverage in perl?

HDVL
there is SystemVerilog

for perl: Test::LectroTest

TAP

skip: because of an external factor
todo: not yet implemented

CPANTS
defines kwalitee metrics (13)

assertions

“dead programs tell no lies” Hunt and Thomas, The Pragmatic Programmer

Test::LectroTest

most tests are directed

an alternative is “constrained random testing”
let the machine do the dirty job instead, (pseudo) randomly (like in hardware testing)
-> use the Test::LectroTest module
 -> stick a type to each function parameter
 -> add constraints to parameters (i.e. restrain to subsets)
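A hand-rolled sketch of constrained random testing in plain Perl, to show the idea without depending on Test::LectroTest (this is not that module's API). The `clamp_percent` function under test and the `random_int` generator are invented for the example.

```perl
use strict;
use warnings;

# Function under test (hypothetical): clamp a number into [0, 100].
sub clamp_percent {
    my ($n) = @_;
    return $n < 0 ? 0 : $n > 100 ? 100 : $n;
}

# Constrained random generator: integers restricted to [-1000, 1000].
sub random_int { return int(rand(2001)) - 1000 }

srand(42);    # fixed seed so that a failing run can be repeated
my $failures = 0;
for my $trial (1 .. 1000) {
    my $input  = random_int();
    my $output = clamp_percent($input);

    # Properties that must hold for every input, whatever its value:
    # the result is in range, and in-range inputs come back unchanged.
    my $in_range  = $output >= 0 && $output <= 100;
    my $unchanged = ($input < 0 || $input > 100) || $output == $input;
    $failures++ unless $in_range && $unchanged;
}
print "failures: $failures\n";
```

Instead of listing expected values for hand-picked inputs (directed tests), the loop checks general properties against machine-generated inputs drawn from a constrained range.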

refactor early, refactor often
(on feature branches)

there is technique and there is commitment

“At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.” Igor Sikorsky

Higher-Order Parsing in perl (Mark Jason Dominus)

Parsing = unstructured -> data structure

closed vs open system

open system
 + flexible, powerful, unlimited
 - requires more understanding

 Parse::RecDescent is a really excellent closed system
 open system: HOP::Parser

example:
web app where user input is a math function
we want a graph out of it
easy solution: use eval to turn user input into compiled perl code
 can go wrong:
 * input is “rm -rf”
 * in perl ^ means bitwise exclusive or, not exponentiation
 * …
alternative: implement an evaluator for expressions
 * input: string
 * output: compiled code or abstract syntax tree or specialized data structure or expression object or ..

structure of an expression -> grammars

expression -> "(" expression ")" | term ("+" expression | nothing)

term -> factor ("*" term | nothing)

factor -> atom ("^" NUMBER | nothing)

atom -> NUMBER (argh!, something’s missing here)

lexing

idea: preprocess the input
humans do this when they read
 * first, turn the sequence of chars into a sequence of words
 * then try to understand the structure of the sentence based on the meanings of words
 * this is called lexing

lexing is mostly a matter of pattern matching

perl actually has special regex features just for this purpose

tokens

sub type{}
sub value{}

recursive descent parsing

idea: each grammar rule becomes a function

parsers

easy one: nothing
others: parsers for a specific token
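The grammar above can be sketched as a tiny lexer plus one function per rule. This is my own plain-Perl illustration, not HOP::Parser (which builds parsers out of higher-order functions). I have assumed that the missing piece in the atom rule is the parenthesised expression, and the parser evaluates as it goes instead of building a tree.

```perl
use strict;
use warnings;

# Lexing: turn the input string into a list of [type, value] tokens,
# using \G and /gc to continue matching where the last match stopped.
sub lex {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    until ($input =~ /\G\s*\z/gc) {
        if    ($input =~ /\G\s*(\d+)/gc)     { push @tokens, [ NUMBER => $1 ] }
        elsif ($input =~ /\G\s*([+*^()])/gc) { push @tokens, [ OP => $1 ] }
        else  { die "cannot lex input at offset " . pos($input) . "\n" }
    }
    return \@tokens;
}

# True if the next token has the given value.
sub peek_is {
    my ($tokens, $value) = @_;
    return @$tokens && $tokens->[0][1] eq $value;
}

# expression -> term ("+" expression | nothing)
sub expression {
    my ($tokens) = @_;
    my $value = term($tokens);
    if (peek_is($tokens, '+')) {
        shift @$tokens;
        $value += expression($tokens);
    }
    return $value;
}

# term -> factor ("*" term | nothing)
sub term {
    my ($tokens) = @_;
    my $value = factor($tokens);
    if (peek_is($tokens, '*')) {
        shift @$tokens;
        $value *= term($tokens);
    }
    return $value;
}

# factor -> atom ("^" NUMBER | nothing)
sub factor {
    my ($tokens) = @_;
    my $value = atom($tokens);
    if (peek_is($tokens, '^')) {
        shift @$tokens;
        my $exponent = shift @$tokens;
        die "expected a number after ^\n"
            unless $exponent && $exponent->[0] eq 'NUMBER';
        $value **= $exponent->[1];
    }
    return $value;
}

# atom -> NUMBER | "(" expression ")"   (the second case is my assumption)
sub atom {
    my ($tokens) = @_;
    my $token = shift @$tokens or die "unexpected end of input\n";
    return $token->[1] if $token->[0] eq 'NUMBER';
    if ($token->[1] eq '(') {
        my $value = expression($tokens);
        die "expected )\n" unless peek_is($tokens, ')');
        shift @$tokens;
        return $value;
    }
    die "unexpected token '$token->[1]'\n";
}

sub evaluate {
    my ($input) = @_;
    my $tokens = lex($input);
    my $value  = expression($tokens);
    die "trailing input\n" if @$tokens;
    return $value;
}

print evaluate('2 + 3 * (4 + 1) ^ 2'), "\n";
```

This prints 77: here ^ is exponentiation because the evaluator says so, and input such as "rm -rf" is simply rejected by the lexer instead of being executed.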

YAPC::Europe: Tuesday


Wednesday, August 29, 2007

These are just my notes, edited only to fix obvious spelling errors and to add URLs for the interesting bits. They may contain errors. I’ll post proper, scoped articles later.

Update: Fixed broken links

Larry Wall’s Keynote

scripting languages

past, present and future

ruby: most direct competitor for perl

perl 6: mix between pure scripting and programming language

lua, applescript: niche players

failed: tcl (due to lack of extensibility), *sh (clumsy addition of layers of features)

early binding vs late binding
perl6: all methods are virtual by default

single dispatch, multiple dispatch
single: python, perl5
multiple: perl6, dylan

eager or lazy evaluation
haskell: very lazy evaluation
perl6: scalar is eager by default, list is lazy

eager typology, lazy typology:
types introduced in perl6 for the multiple dispatch
fixes e.g. prototyping

perl6 removed punctuation that was not really necessary
introduces a new one for scoping

mutable, immutable classes
java classes are immutable -> fast
ruby classes are mutable -> ruby slow
perl6 will have a mix

class based OO, prototype based OO
perl6 will be class based, but meta data will allow prototype based OO
(see Moose in perl5)

perl6: given … when … for regexp?

Selenium, an introduction to web testing (Barbie)

benefits over Mechanize: test javascript as it runs

ThoughtWorks
released as open source on openQA

uses javascript and iframes in the browser
core runs the tests and interrogates the DOM
RC server and core communicate via AJAX

Core, Remote Control, IDE (firefox plugin)

Core: issues with Opera

RC:
java, requires JRE version 1.5.0 or higher
experimental support for SSL
language hooks for java, .Net (C#), Perl, PHP, Python, Ruby

Mozilla same origin policy

IDE:
record/playback, edit and debug tests
includes Selenium Core

cpan> install Alien::SeleniumRC

(can’t upgrade due to how versions are dealt with)

cpan> install Test::WWW::Selenium

$ selenium-rc

use WWW::Selenium::Util qw(server_is_running)

Evolving architecture - make development easy and your site faster (Leo Lapworth)

evolution

running website
 * servers improved
 * architecture too
 * development tools improved
 * language

 * templates

 hard coded html in the beginning
 now templates
 Template

 * servers

 from development on 1 server to 3-tier servers

 svk with subversion
 trac

 * tests

 * make your site faster

 mod_perl (code caching)

 Apache::SizeLimit (safety net)
 -> set it high
 -> check it

 * front/back end split

 (see Omnigraffle schema)

 add caching

 search result sets

 individual items
 lookups for info from database

 lookup from external sources

 put caching methods in one package
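A rough sketch of what "caching methods in one package" could look like, with a plain in-memory hash standing in for the real cache. The `SiteCache` package and its `get_or_compute` function are hypothetical names, not the speaker's code.

```perl
use strict;
use warnings;

package SiteCache;    # hypothetical package name

my %store;            # in-memory stand-in for the real cache

# Return the cached value for $key, or run $compute, store its result
# for $ttl seconds and return it.
sub get_or_compute {
    my ($key, $ttl, $compute) = @_;
    my $entry = $store{$key};
    return $entry->{value} if $entry && $entry->{expires} > time;
    my $value = $compute->();
    $store{$key} = { value => $value, expires => time + $ttl };
    return $value;
}

package main;

# Count how often the "expensive" lookup really runs.
my $calls  = 0;
my $lookup = sub { $calls++; return 'expensive result' };

SiteCache::get_or_compute('item:42', 60, $lookup) for 1 .. 3;
print "backend calls: $calls\n";
```

Only the first of the three requests reaches the backend. Keeping this logic behind one function means the storage can later be swapped for a shared cache without touching the callers.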

       
separate cache for each backend servers?
       
-> share them (using memcached)

       
perlbal
-> load balancer/proxy

       
mod_gzip

       
cache headers (expiry)

       
/includes/js/<version>/common.js -> can be cached
forever

       
ensures user has version which matches html

       
use include file to update all pages

       
* handling images: MogileFS

       
* centralize
       
* test
       
* cache
       
* kiss (esp. perlbal)

XML::Compile::SOAP (Mark Overmeer)

XML sucks (verbose, looks simple but it's not)
 XML schema
 WSDL, SOAP

 avoid learning XML and Schema

 pure Perl, compliant, complete, validating, XML message reading and writing

 use XML::Compile::Schema;

 good:
 automatic name-spaces
 type structures hidden (inheritance etc)
 template generator

 limitations:
 only name-space based schemas
 mixed content only via hooks
 schemas themselves not validated
 you need a schema to use the module

SOAP (Payload - all XML -, Transport - in application)
Payload = Body + Header (Envelope)
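
To make "Payload = Body + Header (Envelope)" concrete, here is a minimal Python sketch that builds such an envelope with the standard library. It is not XML::Compile code; build_envelope and the TransactionId and GetPrice elements are made up for illustration, and the namespace is the SOAP 1.1 one.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_envelope(header_items, body_items):
    """Wrap header and body entries in a SOAP Envelope element."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")  # optional metadata
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")      # the actual message
    for tag, text in header_items:
        ET.SubElement(header, tag).text = text
    for tag, text in body_items:
        ET.SubElement(body, tag).text = text
    return envelope


# Hypothetical element names, only to show the structure.
env = build_envelope([("TransactionId", "abc-123")], [("GetPrice", "apples")])
print(ET.tostring(env, encoding="unicode"))
```

The transport (HTTP, mail, a message queue) is left to the application, which is the split the notes describe.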

two kinds of SOAP:
Document
 * well defined body
 * requires longer schemas

XML-RPC
 * interface quick and dirty
 * SOAP::Lite special
 * discouraged in SOAP 1.2

WSDL
message structure and transport details are grouped together.

XML::Compile::WSDL

SOAP client/server implementation still under construction

uses BigInt instead of sloppy int -> slight reduction in performance.

Gluing a bank together (UBS) (Paul Johnson)

move lots of money around to avoid paying interest or to gain interest

Cash management

CPAN

primary development is outsourced

needs to customize the product

needs to be integrated

database
web servers
communications
high availability
monitoring
logging
archiving
deployment

initially his role was automated testing

Perl as a development language is not allowed in UBS
but Perl to glue things together is OK, then development could be done

Oracle
100s of GB

* Web server
IHS: IBM rebranded version of Apache

* Communication

multiple sources
multiple formats

 - message transfer: how much goes from which bank to which other bank in which currency
   IBM MQSeries

 - mail
 - SMS
 - IRC
 - file transfer

 pack and unpack
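
The notes mention pack and unpack for handling the various message formats. As a rough Python analogue (struct plays the role of Perl's pack/unpack), here is a sketch for a fixed-width transfer record. The layout, field names and bank codes are entirely hypothetical and not taken from the talk.

```python
import struct

# Hypothetical layout, big-endian, no padding:
#   sender code (11 bytes), receiver code (11 bytes),
#   currency code (3 bytes), amount in minor units (unsigned 64-bit).
RECORD = struct.Struct(">11s11s3sQ")


def pack_transfer(sender, receiver, currency, amount_minor):
    """Serialize one transfer into a fixed-width binary record."""
    return RECORD.pack(
        sender.encode("ascii"),
        receiver.encode("ascii"),
        currency.encode("ascii"),
        amount_minor,
    )


def unpack_transfer(data):
    """Parse a record back into Python values, stripping the NUL padding."""
    sender, receiver, currency, amount_minor = RECORD.unpack(data)
    return (
        sender.rstrip(b"\x00").decode("ascii"),
        receiver.rstrip(b"\x00").decode("ascii"),
        currency.decode("ascii"),
        amount_minor,
    )


record = pack_transfer("BANKAAAA", "BANKBBBBXXX", "CHF", 1_250_000_00)
print(len(record), unpack_transfer(record))
```

Keeping amounts as integers in minor units avoids floating-point rounding, which matters when the records describe money.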

use Spreadsheet::ParseExcel;

use MQSeries; # written and maintained by Morgan Stanley people

system handles many millions in currency

if system breaks, huge amounts of money are lost

monitoring -> Nagios

logs

50 GB a day
requires application restart to rotate logs

so he wrote wrappers with named pipes, correct formatting including timestamps

use Log::Log4perl;
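
The named-pipe wrapper idea can be sketched like this in Python: the application keeps writing to one pipe, and a small process reads it, adds timestamps and switches output files when the date changes, so no restart is needed. This is an assumption-laden illustration, not the wrapper from the talk: stamp_lines, the app-<day>.log naming and the in-memory source are hypothetical, and in real use the source would be a FIFO opened for reading.

```python
import io
import os
import tempfile
from datetime import datetime


def stamp_lines(source, open_log, clock=datetime.now):
    """Copy lines from source into per-day logs, prefixing a timestamp."""
    current_day, out = None, None
    for line in source:
        now = clock()
        day = now.strftime("%Y-%m-%d")
        if day != current_day:
            # Date changed: rotate here, the application is never restarted.
            if out is not None:
                out.close()
            current_day, out = day, open_log(day)
        out.write(f"{now.isoformat(timespec='seconds')} {line.rstrip()}\n")
        out.flush()
    if out is not None:
        out.close()


# Demo with an in-memory source and a fixed clock instead of a real FIFO.
log_dir = tempfile.mkdtemp()
source = io.StringIO("payment queued\npayment sent\n")
stamp_lines(
    source,
    lambda day: open(os.path.join(log_dir, f"app-{day}.log"), "a"),
    clock=lambda: datetime(2007, 8, 29, 10, 30, 0),
)
with open(os.path.join(log_dir, "app-2007-08-29.log")) as f:
    demo_lines = f.read().splitlines()
print(demo_lines)
```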

Deployment:

Sun packages

package creation

mini CPAN burnt on CD

Extra development

internal part based on Catalyst (DBIx::Class, Template Toolkit)

Automated testing

Test::*

use Test::WWW::Selenium

Trexy (Nigel Hamilton)

trexy.com

remember search trails

my trails - all trails - blaze a trail

30 million incoming links

Sys::Statistics::Linux::MemStats

pingability.com

webmin

Template::Simple

The Goo

perceptrons
sensors

http://blog.thegoo.org

Tech Pub Crawl: first Tuesday of the month in London
flag-and-bell.com
FREE BEER

memcached (Leon Brocard)

network effect -> scaling?

temporary storage area where frequently accessed data can be stored for rapid access

trade memory/disk for speed

One Server:

MySQL
query cached - invalidated on write

Disk - Cache::FileCache
scales really well
memory bound

mod_perl
only one per child

shared memory
not as fast as you might think

cache is separate on each

lower hit ratio
higher miss ratio

memcached
 giant hash table distributed across machines

 never blocks
 libevent
 epoll/kqueue
 slab allocator
 least recently used (LRU) eviction
 thread per CPU (optionally)
 versions 1.2.x are much better

facebook:
3TB memcached

use Cache::Memcached

Pattern:
fetch from cache
if there, return
else calculate, place in cache, return
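
The pattern above, sketched in Python. DictCache is an in-process stand-in written for this example so that it runs without a server; it is not the Cache::Memcached API, and the names cached and slow_lookup are hypothetical.

```python
import time


class DictCache:
    """In-process stand-in for a memcached client (get/set with a TTL)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # entry outlived its time to live
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires_at)


def cached(cache, key, compute, ttl=60):
    """Fetch from cache; on a miss, calculate, place in cache, return."""
    value = cache.get(key)
    if value is not None:
        return value
    value = compute()
    cache.set(key, value, ttl)
    return value


calls = []


def slow_lookup():
    calls.append(1)  # count how often the expensive path runs
    return {"name": "example"}


cache = DictCache()
first = cached(cache, "user:1", slow_lookup)
second = cached(cache, "user:1", slow_lookup)
print(first, second, len(calls))
```

One limitation of this sketch: a computed value of None is indistinguishable from a miss, so it would be recalculated on every call.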

cache, not a database
-> can't dump
-> no persistence
-> no redundancy
-> no access by id
-> …

time to live

smart caching
timestamps, version number in key
cache forever
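
A minimal sketch of putting a version number in the key, so entries can be cached forever and never need explicit invalidation. A plain dict stands in for memcached, and versioned_key, render_item and the item fields are hypothetical names for illustration.

```python
cache = {}  # plain dict standing in for memcached


def versioned_key(kind, item_id, version):
    """Key that changes whenever the underlying item changes."""
    return f"{kind}:{item_id}:v{version}"


def render_item(item):
    """Pretend this is the expensive step."""
    return f"<h1>{item['title']}</h1>"


def cached_render(item):
    key = versioned_key("item_html", item["id"], item["version"])
    if key not in cache:
        # No expiry needed: an updated item gets a new key,
        # and the stale entry is simply never asked for again.
        cache[key] = render_item(item)
    return cache[key]


item = {"id": 7, "version": 1, "title": "Old title"}
first = cached_render(item)
item.update(version=2, title="New title")
second = cached_render(item)
print(first, second, sorted(cache))
```

With a real memcached the orphaned old entries would eventually be evicted as least recently used, so they cost memory only temporarily.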

low CPU

Failover?
doesn't do it for you
replace failed server with another with same IP
or use consistent hashing
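
Consistent hashing, mentioned above as a failover option, can be sketched as a hash ring. This is a generic Python illustration, not the algorithm of any particular memcached client; HashRing, the server names and the choice of 100 points per server are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Map keys to servers so that removing a server only moves its own keys."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # points per server, smooths the distribution
        self._points = []         # sorted positions on the ring
        self._owner = {}          # position -> server
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            if self._owner.get(point) == server:
                del self._owner[point]
                self._points.remove(point)

    def server_for(self, key):
        if not self._points:
            raise LookupError("no servers in the ring")
        # First point clockwise from the key's position, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.remove("cache2:11211")  # simulate a failed server
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

With naive `hash(key) % number_of_servers`, losing one server would remap most keys; here only the keys that lived on the failed server move.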

limits:
keys: max 250 chars
values: max 1MB
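
One common way to live with these limits is to hash keys that are too long and to check value sizes before storing. The sketch below is an assumption, not something shown in the talk: safe_key and fits are hypothetical helpers, and treating 1MB as exactly 1024 * 1024 bytes is a simplification, since the usable size may be somewhat smaller once per-item overhead is counted.

```python
import hashlib

MAX_KEY_LEN = 250               # key limit quoted in the talk
MAX_VALUE_BYTES = 1024 * 1024   # 1MB value limit quoted in the talk


def safe_key(key):
    """Return the key unchanged if it is legal, else a digest of it."""
    raw = key.encode("utf-8")
    # Keys must not contain whitespace or control characters either.
    legal = len(raw) <= MAX_KEY_LEN and not any(b <= 32 or b == 127 for b in raw)
    if legal:
        return key
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def fits(value_bytes):
    """True if a serialized value is small enough to be stored."""
    return len(value_bytes) <= MAX_VALUE_BYTES


print(safe_key("user:42"))
print(safe_key("search:" + "x" * 300))
print(fits(b"x" * 10), fits(b"x" * (MAX_VALUE_BYTES + 1)))
```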

Testing
* disable memcached

future:

consistent hashing
binary protocol
more statistics

http://www.danga.com/memcached/

has to push the keys to all memcached servers

memcached, perlbal, MogileFS, DJabberd, Gearman

TheSchwartz

Net::Proxy (Philippe Bruhat)

connectors

a connector handles the pairs of sockets (one for each client)

Use?
 * escape the corporate proxy
   CONNECT method
   (abuse of the CONNECT method, normally for SSL?)
 * avoid Intrusion Detection Systems
   * early stage of ssh negotiation is not encrypted and can be detected by an IDS doing a m/ssh/
   * use hooks to hide the ssh signature, using one Net::Proxy before the firewall and another Net::Proxy on the other side of the firewall to decrypt the ssh signature
 * add SSL support to an application that doesn't support it
 * run two servers on the same port
   we want to run sshd and https on the same port
   * in ssh negotiation, server speaks first
   * in http/ssl: client speaks first
   * Net::Proxy uses that to make it possible

 * todo:
   * write a connector fully compatible with GNU httptunnel
   * enhance the httptunnel protocol to support multiple connections
   * implement reverse connectors (as you cannot connect to machines behind firewalls at the moment)
   * implement DNS tunnel connectors
   * implement UDP connectors
   * implement a connector that can be plugged to the STDIN/STDOUT of an external process, like the ProxyCommand option of OpenSSH
   * finish the starttls connector
   * implement SOCKS connectors
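
The "two servers on the same port" trick from the list above relies on who speaks first. Here is a minimal Python sketch of that decision only; it is not Net::Proxy code, guess_backend is a hypothetical name, and the timeout value is an assumption.

```python
import select
import socket


def guess_backend(client_sock, timeout=1.0):
    """Pick a backend from who speaks first.

    An https client sends its handshake right away, while an ssh client
    waits for the server banner. So: data within the timeout means https,
    silence means ssh.
    """
    readable, _, _ = select.select([client_sock], [], [], timeout)
    return "https" if readable else "ssh"


# Demo with socket pairs instead of real network clients.
quiet_client, server_side_a = socket.socketpair()
talkative_client, server_side_b = socket.socketpair()
talkative_client.sendall(b"\x16\x03\x01")  # first bytes of a TLS handshake record

result_quiet = guess_backend(server_side_a, timeout=0.2)
result_talkative = guess_backend(server_side_b, timeout=0.2)
print(result_quiet, result_talkative)

for sock in (quiet_client, server_side_a, talkative_client, server_side_b):
    sock.close()
```

The cost of this approach is that every ssh connection pays the timeout before the proxy connects it to sshd and the banner arrives.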

YAPC Europe


Tuesday, August 28, 2007

Vienna, Austria:

The YAPC::Europe Perl conference has started and the theme is “Social Perl”.
Next year’s conference will be held in Copenhagen, Denmark.

Today’s schedule is interesting and I’ll post some notes later.

Transport-r


Sunday, August 12, 2007



Transport-r

Originally uploaded by Ahmed Zahid

It’s good to start a Sunday by finding a wonderful picture in your contact’s photo stream. Here, Ahmed Zahid is sharing yet another wonderful sea-themed shot. I like the three mooring lines that lead the eye to the boat. I like its curves; it looks like a Venetian gondola. Lighting and reflections are gorgeous as usual.