imaginary family values presents

yesh omrim

a blog that reclines to the left


The struggle against ASCIImperialism continues

15 January 2009

I learned two things about Linux this week. First, even in UTF-8 locales, fold counts bytes rather than characters, and will break a line in the middle of a multi-byte sequence as if any byte with a value above 127 were a space.

$ echo 'superextraordinarísimo' | fold -s -w 18
superextraordinar
�simo
$ echo 'superextraordinarísimo' | fold -s -w 18 | od -c
0000000   s   u   p   e   r   e   x   t   r   a   o   r   d   i   n   a
0000020   r 303  \n 255   s   i   m   o  \n
0000031
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
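
The od output above shows the cut landing exactly 18 bytes in, between the two bytes (303 255 in octal) that encode the í. That is what you would get from wrapping by bytes instead of by characters. Here is a minimal Python sketch of the difference. It is not fold's actual implementation, and wrap_bytes and wrap_chars are hypothetical helper names made up for the illustration.

```python
# Sketch only: compares cutting a UTF-8 string every N bytes with cutting it
# every N code points. wrap_bytes and wrap_chars are hypothetical helpers,
# not part of fold or of any library.

word = "superextraordinar\u00edsimo"  # \u00ed is the precomposed i-acute
width = 18


def wrap_bytes(text, width):
    """Cut the UTF-8 encoding every `width` bytes, ignoring character boundaries."""
    data = text.encode("utf-8")
    return [data[i:i + width] for i in range(0, len(data), width)]


def wrap_chars(text, width):
    """Cut every `width` code points, so no multi-byte sequence is split."""
    return [text[i:i + width] for i in range(0, len(text), width)]


byte_lines = wrap_bytes(word, width)
char_lines = wrap_chars(word, width)

print(byte_lines)  # the first chunk ends with a lone lead byte, 0xC3
print(char_lines)  # the accented letter stays in one piece
```

Wrapping by code points is still only an approximation of display width (combining marks and wide characters would need more care), but it is enough to keep this word intact.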

Second, even if you can solve a tricky document-recoding problem with nothing more than sed and awk, that doesn’t mean you should.