***** infoCopter.com/perl *****

Character Encoding and Safe Queries

Perl offers built-in functions such as hex, ord, pack, unpack and sprintf for these kinds of needs.

[ Binary-Arithmetics ]

Consider the following code:

print "hex FF = ",                  hex('FF'),              "\n",
        "unpack(B8,'A') = ",        unpack('B8', 'A'),      "\n",
        "unpack(H8,'HALLO') = ",    unpack('H8', 'HALLO'),  "\n",
        "unpack(C, 'A') = ",        unpack('C', 'A'),       "\n";

Output would be:

hex FF = 255
unpack(B8,'A') = 01000001
unpack(H8,'HALLO') = 48414c4c
unpack(C, 'A') = 65
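
pack accepts the same templates and reverses the conversion. A minimal sketch of the round trip; the variable names are illustrative only:

```perl
use strict;
use warnings;

# unpack turns a character into its bit string or its numeric code ...
my $bits = unpack('B8', 'A');      # 8 bits, high bit first
my $code = unpack('C', 'A');       # unsigned char value

# ... and pack with the same template turns it back into the character.
my $from_bits = pack('B8', $bits);
my $from_code = pack('C', $code);

print "bits = $bits, code = $code\n";
print "back = $from_bits, $from_code\n";
```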

Most wanted conversions

Decimal to Hex

255 -> FF
my $decimal = 3456;
my $hex = dec2hex($decimal);
print "decimal $decimal = '$hex'\n";    # decimal 3456 = 'd80'

# %x yields lower-case digits ('ff'); use %X for upper case ('FF').
sub dec2hex { return sprintf("%x", $_[0]) }

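
sprintf can also pad and upper-case the result in one step. A small sketch; dec2hex_padded is a hypothetical helper name, not part of the original page:

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case hex, zero-padded to $width digits.
# The '*' in the format takes the field width from the argument list.
sub dec2hex_padded {
    my ($number, $width) = @_;
    return sprintf('%0*X', $width, $number);
}

print dec2hex_padded(255, 2), "\n";     # FF
print dec2hex_padded(3456, 4), "\n";    # 0D80
print sprintf('%#x', 255), "\n";        # 0xff (the '#' flag adds the prefix)
```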

Char to Hex

ABC -> 41 42 43
my $hex = iso2hex('ABC');
print "iso 'ABC' = '$hex'\n";

# First version: no zero padding, so "\n" becomes 'a' rather than '0a'.
sub iso2hex {
	my $hex = '';
	for (my $i = 0; $i < length($_[0]); $i++) {
		my $ordno = ord substr($_[0], $i, 1);
		$hex .= sprintf("%x ", $ordno);
	}

	$hex =~ s/ $//;
	$hex;
}

# Improved version: always two hex digits per character.
sub iso2hex_new {
        my $hex = '';
        for (my $i = 0; $i < length($_[0]); $i++) {
                my $ordno = ord substr($_[0], $i, 1);
                $hex .= sprintf("%02x ", $ordno);
        }

        $hex =~ s/ $//;
        $hex;
}

See also: iso2hex
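
The same result can be had without an explicit loop, since unpack accepts a repeated group. A sketch assuming the input is a byte string (no characters above 255); str2hex is a hypothetical name:

```perl
use strict;
use warnings;

# '(H2)*' splits the string into one two-digit hex value per byte.
sub str2hex {
    my ($string) = @_;
    return join ' ', unpack('(H2)*', $string);
}

print str2hex('ABC'), "\n";     # 41 42 43
print str2hex("\t\n"), "\n";    # 09 0a
```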

Hex to ISO

41 42 43 -> "ABC"
# 'H2' consumes exactly two hex digits, i.e. one byte per step.
sub hex2iso {
	my $iso = '';
	(my $hex = $_[0]) =~ tr/ //d;
	for (my $i = 0; $i < length($hex); $i += 2) {
		$iso .= pack('H2', substr($hex, $i, 2));
	}
	$iso;
}

# Variant using 'H8': pack pads the two digits to four bytes,
# so only the first byte of each result is kept.
sub hex2iso_new {
	my $iso = '';
	(my $hex = $_[0]) =~ tr/ //d;
	for (my $i = 0; $i < length($hex); $i += 2) {
		my $char = pack('H8', substr($hex, $i, 2));
		$iso .= substr($char, 0, 1);
	}
	$iso;
}
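
When the hex string comes from outside the program it may be worth checking it before decoding. A sketch under that assumption; hex2iso_checked is a hypothetical name, and dying on bad input is only one possible policy:

```perl
use strict;
use warnings;

# Decode a hex string (spaces allowed) after validating it.
sub hex2iso_checked {
    my ($hex) = @_;
    (my $digits = $hex) =~ tr/ //d;

    # Require a non-empty, even number of hex digits.
    die "not a valid hex string: '$hex'\n"
        unless $digits =~ /\A(?:[0-9A-Fa-f]{2})+\z/;

    return pack('H*', $digits);    # 'H*' decodes all digits at once
}

print hex2iso_checked('41 42 43'), "\n";    # ABC
```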

Decimal to Binary

255 -> 11111111
my $decimal = 153;

# pack('C') holds a single byte, so this works for 0..255 only.
my $binary = unpack('B8', pack('C', $decimal));

print "Decimal $decimal = '$binary'\n";

# <-- Decimal 153 = '10011001'

or better, for any non-negative integer:
sub dec2bin { return sprintf("%b", $_[0]) }

or with leading zeros, padded to a multiple of 8 digits:
sub dec2bin {
	my $bin = sprintf("%b", $_[0]);
	my $padding = 0;
	   $padding = 8 - length($bin) % 8 if length($bin) % 8;

	return ('0' x $padding) . $bin;
}
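
The reverse direction needs no loop either, because oct understands a '0b' prefix. A short sketch; bin2dec is a hypothetical name:

```perl
use strict;
use warnings;

# Convert a string of 0s and 1s back to a decimal number.
sub bin2dec {
    my ($bin) = @_;
    die "not a binary string: '$bin'\n" unless $bin =~ /\A[01]+\z/;
    return oct("0b$bin");    # oct parses binary when prefixed with 0b
}

print bin2dec('10011001'), "\n";            # 153
print bin2dec('0000001111101000'), "\n";    # 1000
```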
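
Hexadecimal to Decimal

FF -> 255
# A string eval runs whatever text it is given as Perl code,
# so accept hex digits only before evaluating.
sub hex2dec {
        return undef unless $_[0] =~ /\A[0-9A-Fa-f]+\z/;
        eval "return sprintf(\"%d\", 0x$_[0])";
}

or shorter, without eval:
sub hex2dec { return hex $_[0] }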
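
hex is limited to the platform's native integer size. For longer hex strings the core module Math::BigInt can be used instead. A sketch; hex2dec_big is a hypothetical name and returns the decimal value as a string:

```perl
use strict;
use warnings;
use Math::BigInt;

# Convert a hex string of any length to a decimal string.
sub hex2dec_big {
    my ($hex) = @_;
    die "not a hex string: '$hex'\n" unless $hex =~ /\A[0-9A-Fa-f]+\z/;
    # Math::BigInt->new accepts a '0x' prefixed string.
    return Math::BigInt->new('0x' . lc $hex)->bstr;
}

print hex2dec_big('FF'), "\n";    # 255
print hex2dec_big('80000000000000000000000000000000'), "\n";
```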

Decimal to Addends

my $res = dec2addends( number => $ARGV[0] || 153 );

foreach ( @{$res->{'add_numbers'}} ) {
        print "------> $_\n";
}

# Splits a number into the powers of two that add up to it.
# Relies on dec2bin() from the section above.
sub dec2addends {
	my %args = @_;
	my %hash = (); # init

	$hash{'bin'} = dec2bin($args{'number'});

	# Value of the leftmost binary digit
	my $factor = 2 ** (length($hash{'bin'}) - 1);

	my @add_numbers = ();
	for (my $i = 0; $i < length($hash{'bin'}); $i++) {
		my $add_number = substr($hash{'bin'}, $i, 1) * $factor;
		push(@add_numbers, $add_number) if $add_number;
		$factor /= 2;
	}

	$hash{'add_numbers'} = \@add_numbers;
	\%hash;
}


e.g. Decimal 1000:
bin = '1111101000'
add_numbers = 'ARRAY(0x806552c)'
------> 512
------> 256
------> 128
------> 64
------> 32
------> 8
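
The same list can be produced with bit operators instead of a binary string. A sketch assuming a non-negative integer that fits a native integer; addends_bitwise is a hypothetical name:

```perl
use strict;
use warnings;

# Return the powers of two contained in $number, largest first.
sub addends_bitwise {
    my ($number) = @_;
    $number = int $number;
    my @addends;
    # Test each bit from the lowest upwards.
    for (my $bit = 1; $bit <= $number; $bit <<= 1) {
        unshift @addends, $bit if $number & $bit;
    }
    return @addends;
}

print join(' + ', addends_bitwise(1000)), "\n";    # 512 + 256 + 128 + 64 + 32 + 8
```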

"Royce Kemp"  wrote in message
news:b5923048.0201121758.6192b156@posting.google.com...
> I want to be able to take ASCII character strings like the ones shown
> below and store them into a file converted to binary...but not the
> ASCII binary representation....rather the literal binary.
>
> 80000000000000000000000000000000
> 66e94bd4ef8a2c3b884cfa59ca342b2e
>
> So, 80000000000000000000000000000000 would be stored as
>
> 1000 0000 0000 0000 0000 ....
>
> and 66e94bd4ef8a2c3b884cfa59ca342b2e would be stored as
>
> 1010 1010 1110 1001 0010 ....
>
> is there a way to use the pack/unpack functions to do this for me? if
> not, how can i control precisely what binary values are written to the
> file.
>
> thanks in advance.
> -r

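One way to do what the question asks is pack with the 'H*' template, written to a file handle in raw mode. A minimal sketch, not taken from the original thread; hex_to_bytes and write_bytes are hypothetical names and the temporary file is only for demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two hex digits become one byte: '80' -> 1000 0000, '66' -> 0110 0110.
sub hex_to_bytes {
    my ($hex) = @_;
    return pack('H*', $hex);
}

# Write the bytes unchanged; ':raw' disables any newline translation.
sub write_bytes {
    my ($path, $bytes) = @_;
    open(my $fh, '>:raw', $path) or die "cannot write $path: $!\n";
    print {$fh} $bytes;
    close($fh) or die "cannot close $path: $!\n";
}

my (undef, $path) = tempfile(UNLINK => 1);
write_bytes($path, hex_to_bytes('66e94bd4ef8a2c3b884cfa59ca342b2e'));
print "wrote ", -s $path, " bytes\n";    # 32 hex digits give 16 bytes
```

unpack('B*', ...) on the result shows the stored bit pattern if it needs to be checked.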

URI encoding

URI::Escape - Escape and unescape unsafe characters

This module provides functions to escape and unescape URI strings as defined by RFC 2396 (and updated by RFC 2732). URIs consist of a restricted set of characters, denoted as "uric" in RFC 2396. The restricted set of characters consists of digits, letters, and a few graphic symbols chosen from those common to most of the character encodings and input facilities available to Internet users.

use URI::Escape;

my $safe = uri_escape("10% is enough\n");

print "safe = '$safe'\n"; # -- safe = '10%25%20is%20enough%0A'

Inline Subroutine of uri_escape

If you cannot install the CPAN module, an inline subroutine does the same job:

#!/usr/bin/perl -w
use strict;

print &uri_escape($ARGV[0]), "\n";

sub uri_escape {
	my $text = $_[0];
	return undef unless defined $text;

	# Build a char to hex map
	my %escapes = ();
	for (0..255) {
		$escapes{chr($_)} = sprintf("%%%02X", $_);
	}

	# Default unsafe characters.  RFC 2732 ^(uric - reserved)
	$text =~ s/([^A-Za-z0-9\-_.!~*'()])/$escapes{$1}/g;

	$text;
}
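
The opposite direction can be written inline as well. A sketch; uri_unescape_inline is a hypothetical name chosen to avoid clashing with the module's uri_unescape:

```perl
use strict;
use warnings;

# Turn every %XX sequence back into the character with that code.
sub uri_unescape_inline {
    my ($text) = @_;
    return undef unless defined $text;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

print uri_unescape_inline('10%25%20is%20enough'), "\n";    # 10% is enough
```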



Transforming Character Sets

/usr/bin/iconv -f "UTF-8" -t "ISO-8859-1" temp_file
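
Inside a Perl program the core Encode module can do the same conversion. A sketch assuming the input is a UTF-8 byte string; utf8_to_latin1 is a hypothetical name, and dying on characters that ISO-8859-1 cannot represent is a design choice:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert UTF-8 bytes to ISO-8859-1 bytes, dying on anything unconvertible.
sub utf8_to_latin1 {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK());
    return encode('ISO-8859-1', $chars, Encode::FB_CROAK());
}

# "Z\xC3\xBCrich" is 'Zürich' in UTF-8; the ü becomes the single byte FC.
print unpack('H*', utf8_to_latin1("Z\xC3\xBCrich")), "\n";    # 5afc72696368
```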



Web Encoding to ISO-8859-1

ä = &#228; ö = &#246; ü = &#252;

my $string = 'Z&#252;rich f&#228;nde ich sch&#246;n';

# Replace each numeric entity with the byte of that code.
# Codes above 255 have no ISO-8859-1 byte and are left untouched.
$string =~ s/&#(\d+);/$1 < 256 ? pack('C', $1) : "&#$1;"/ge;

print "asc = '$string'\n";
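
Going the other way, every non-ASCII character can be written as a numeric entity. A sketch assuming an ISO-8859-1 byte string as input; latin1_to_entities is a hypothetical name:

```perl
use strict;
use warnings;

# Replace each character above 127 with its &#NNN; entity.
sub latin1_to_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print latin1_to_entities("Z\xFCrich"), "\n";    # Z&#252;rich
```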
© reto :)