Character Encoding and Safe Queries

Perl offers built-in functions for these kinds of needs.
See also UTF-8 Conversion


[ Binary-Arithmetics ]

Consider the following code:

print "hex FF = ",                  hex('FF'),              "\n",
        "unpack(B8,'A') = ",        unpack('B8', 'A'),      "\n",
        "unpack(H8,'HALLO') = " ,   unpack('H8', 'HALLO'),  "\n",
        "unpack(C, 'A') = ",        unpack('C', 'A'),       "\n";

Output would be:

hex FF = 255
unpack(B8,'A') = 01000001
unpack(H8,'HALLO') = 48414c4c
unpack(C, 'A') = 65

Most wanted conversions

Decimal to Hex

255 -> ff
my $decimal = 3456;
my $hex = dec2hex($decimal);
print "decimal $decimal = '$hex'\n";    # decimal 3456 = 'd80'

# "%x" prints lower-case digits; use "%X" to get FF instead of ff
sub dec2hex { return sprintf("%x", $_[0]) }

Char to Hex

ABC -> 41 42 43
my $hex = iso2hex('ABC');
print "iso 'ABC' = '$hex'\n";

# First version: no zero padding, so "\n" becomes "a" rather than "0a"
sub iso2hex {
	my $hex = '';
	for (my $i = 0; $i < length($_[0]); $i++) {
		my $ordno = ord substr($_[0], $i, 1);
		$hex .= sprintf("%x ", $ordno);
	}

	$hex =~ s/ $//;    # drop the trailing blank
	$hex;
}

# Improved version: every character yields exactly two hex digits
sub iso2hex_new {
	my $hex = '';
	for (my $i = 0; $i < length($_[0]); $i++) {
		my $ordno = ord substr($_[0], $i, 1);
		$hex .= sprintf("%02x ", $ordno);
	}

	$hex =~ s/ $//;    # drop the trailing blank
	$hex;
}

See also: iso2hex
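
The loop in iso2hex_new can also be left to unpack. The following is only a sketch: the name iso2hex_unpack is made up for this example, and the '(H2)*' group template assumes Perl 5.8 or newer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original page): same result as
# iso2hex_new, but unpack does the per-byte conversion.
sub iso2hex_unpack {
    my ($text) = @_;
    # '(H2)*' yields one two-digit, lower-case hex string per byte
    return join ' ', unpack('(H2)*', $text);
}

print "iso 'ABC' = '", iso2hex_unpack('ABC'), "'\n";   # iso 'ABC' = '41 42 43'
```

Because H2 always produces two digits, no separate zero-padding step is needed.
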
Hex to ISO

41 42 43 -> "ABC"
sub hex2iso {
	my $iso = '';
	(my $hex = $_[0]) =~ tr/ //d;    # strip the separating blanks
	for (my $i = 0; $i < length($hex); $i += 2) {
		# 'H2' packs exactly one byte; 'H8' would pad the result with NUL bytes
		$iso .= pack('H2', substr($hex, $i, 2));
	}
	$iso;
}

# Shorter: 'H*' converts the whole hex string in one call
sub hex2iso_new {
	(my $hex = $_[0]) =~ tr/ //d;
	return pack('H*', $hex);
}

Decimal to Binary

255 -> 11111111
my $decimal = 153;

# pack('C') stores the number in one byte, so this works for 0..255 only
my $binary = unpack('B8', pack('C', $decimal));

print "Decimal $decimal = '$binary'\n";

# <-- Decimal 153 = '10011001'

or better
sub dec2bin { return sprintf("%b", $_[0]) }

or with leading zeroes
sub dec2bin {
	my $bin = sprintf("%b", $_[0]);
	my $padding = 0;
	# pad to a multiple of 8 digits
	$padding = 8 - length($bin) % 8 if length($bin) % 8;

	return ('0' x $padding) . $bin;
}
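
The reverse direction (binary string to decimal) is not covered above. A minimal sketch, assuming the input holds only the digits 0 and 1 and fits into a native integer; the name bin2dec is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: convert a string of 0s and 1s to a number.
sub bin2dec {
    my ($bin) = @_;
    # Reject anything that is not a plain binary string
    return undef unless defined $bin && $bin =~ /\A[01]+\z/;
    # oct() understands the "0b" prefix and parses the rest as binary
    return oct("0b$bin");
}

print "Binary '10011001' = ", bin2dec('10011001'), "\n";   # Binary '10011001' = 153
```

Leading zeroes, such as those produced by the padded dec2bin, do not change the result.
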
Hexadecimal to Decimal

FF -> 255
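sub hex2dec {
        # String eval runs whatever it is handed, so accept hex digits only
        return undef unless $_[0] =~ /\A[0-9A-Fa-f]+\z/;
        return eval "0x$_[0]";
}

or shorter, and without eval:
sub hex2dec { return hex $_[0] }

Decimal to Addends
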
# Split a number into the powers of two that add up to it, e.g. 153 = 128 + 16 + 8 + 1
my $res = dec2addends( number => $ARGV[0] || 153 );

foreach ( @{$res->{'add_numbers'}} ) {
        print "------> $_\n";
}

sub dec2addends {
	my %args = @_;
	my %hash = ();

	$hash{'bin'} = dec2bin($args{'number'});    # dec2bin as defined above

	# value of the leftmost binary digit
	my $factor = 2 ** (length($hash{'bin'}) - 1);

	my @add_numbers = ();
	for (my $i = 0; $i < length($hash{'bin'}); $i++) {
		my $add_number = substr($hash{'bin'}, $i, 1) * $factor;
		push(@add_numbers, $add_number) if $add_number;
		$factor /= 2;
	}

	$hash{'add_numbers'} = \@add_numbers;
	\%hash;
}


e.g. Decimal 1000:
bin = '1111101000'
add_numbers = 'ARRAY(0x806552c)'
------> 512
------> 256
------> 128
------> 64
------> 32
------> 8

"Royce Kemp"  wrote in message
news:b5923048.0201121758.6192b156@posting.google.com...
> I want to be able to take ASCII character strings like the ones shown
> below and store them into a file converted to binary...but not the
> ASCII binary representation....rather the literal binary.
>
> 80000000000000000000000000000000
> 66e94bd4ef8a2c3b884cfa59ca342b2e
>
> So, 80000000000000000000000000000000 would be stored as
>
> 1000 0000 0000 0000 0000 ....
>
> and 66e94bd4ef8a2c3b884cfa59ca342b2e would be stored as
>
> 1010 1010 1110 1001 0010 ....
>
> is there a way to use the pack/unpack functions to do this for me? if
> not, how can i control precisely what binary values are written to the
> file.
>
> thanks in advance.
> -r
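
One way to approach the question above is pack with the 'H*' template, which turns every pair of hex digits into one raw byte. The sketch below is not taken from the original thread. It assumes a temporary file created with the core module File::Temp as a stand-in for the real target file, and reads the data back only to check the result.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @hex_strings = (
    '80000000000000000000000000000000',
    '66e94bd4ef8a2c3b884cfa59ca342b2e',
);

# Assumption: a temporary file stands in for the real output file
my ($out, $file) = tempfile(UNLINK => 1);
binmode($out);                                  # no newline or encoding translation
print {$out} pack('H*', $_) for @hex_strings;   # two hex digits -> one byte
close($out) or die "Cannot close $file: $!";

# Read the raw bytes back and show the first one bit by bit
open(my $in, '<', $file) or die "Cannot read $file: $!";
binmode($in);
my $data = do { local $/; <$in> };
close($in);

print length($data), " bytes, first byte = ", unpack('B8', $data), "\n";
# 32 bytes, first byte = 10000000
```

Each 32-digit hex string ends up as 16 bytes in the file. binmode matters on platforms that would otherwise translate line endings.
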


URI encoding

URI::Escape - Escape and unescape unsafe characters

This module provides functions to escape and unescape URI strings as defined by RFC 2396 (and updated by RFC 2732). URIs consist of a restricted set of characters, denoted as "uric" in RFC 2396. The restricted set of characters consists of digits, letters, and a few graphic symbols chosen from those common to most of the character encodings and input facilities available to Internet users.

use URI::Escape;

my $safe = uri_escape("10% is enough\n");

print "safe = '$safe'\n"; # -- safe = '10%25%20is%20enough%0A'

Inline Subroutine of uri_escape

If you cannot install the CPAN module, an inline version will do:

#!/usr/bin/perl -w
use strict;

print &uri_escape($ARGV[0]), "\n";

sub uri_escape {
	my $text = $_[0];
	return undef unless defined $text;

	# Build a char to hex map
	my %escapes = ();
	for (0..255) {
		$escapes{chr($_)} = sprintf("%%%02X", $_);
	}

	# Default unsafe characters.  RFC 2732 ^(uric - reserved)
	$text =~ s/([^A-Za-z0-9\-_.!~*'()])/$escapes{$1}/g;

	$text;
}
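
The opposite direction can be inlined in the same way. This is a minimal sketch, not the URI::Escape implementation; the name uri_unescape_inline is made up here to avoid clashing with the module's own uri_unescape.

```perl
use strict;
use warnings;

# Hypothetical inline counterpart to the uri_escape subroutine above.
sub uri_unescape_inline {
    my ($text) = @_;
    return undef unless defined $text;

    # Turn every %XX sequence back into the byte it stands for
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    return $text;
}

print uri_unescape_inline('10%25%20is%20enough'), "\n";   # 10% is enough
```

A percent sign that is not followed by two hex digits is left untouched.
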



Transforming Character Sets

/usr/bin/iconv -f "UTF-8" -t "ISO-8859-1" temp_file
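
The same conversion can be done inside Perl with the Encode module, which ships with Perl 5.8 and later. A minimal sketch, assuming the input is available as a string of UTF-8 bytes; the sample string is made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Zuerich" with a u-umlaut as UTF-8 bytes: the umlaut is the two bytes C3 BC
my $utf8_bytes = "Z\xC3\xBCrich";

my $chars  = decode('UTF-8', $utf8_bytes);      # bytes -> Perl characters
my $latin1 = encode('ISO-8859-1', $chars);      # characters -> Latin-1 bytes

printf "%d bytes in, %d bytes out, hex = %s\n",
    length($utf8_bytes), length($latin1), unpack('H*', $latin1);
# 7 bytes in, 6 bytes out, hex = 5afc72696368
```

By default, characters that have no ISO-8859-1 equivalent are replaced with a substitution character.
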



Web Encoding to ISO-8859-1

ä = &#228; ö = &#246; ü = &#252;

my $string = 'Z&#252;rich f&#228;nde ich sch&#246;n';

# Replace each numeric reference that fits into ISO-8859-1 (0..255) with its
# character; leave anything else untouched
$string =~ s/(&#(\d+);)/$2 < 256 ? chr($2) : $1/ge;

print "asc = '$string'\n";
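
Going the other way, from ISO-8859-1 back to numeric character references, takes a single substitution. A minimal sketch, assuming the string holds Latin-1 bytes; the variable names are made up for this example:

```perl
use strict;
use warnings;

# "\xFC" is the u-umlaut in ISO-8859-1
my $latin1 = "Z\xFCrich";

# Replace every byte above 127 with its numeric character reference
(my $web = $latin1) =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/ge;

print "web = '$web'\n";   # web = 'Z&#252;rich'
```

Plain ASCII characters pass through unchanged, so the result is safe to embed in any page regardless of its declared charset.
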

$Id: character-encoding.htm,v 1.6 2004/03/23 01:13:00 reto Exp $
© 1998-2004 reto :)