Text Format Converting

cmd:expand - convert tabs to spaces

$ expand -4        # same as: expand -t 4
aa<TAB>ssss        # typed input
aa  ssss           # output: 2 spaces take "aa" to the tab stop at column 4

help

expand - convert tabs to spaces

Synopsis

expand [-tab1[,tab2[,...]]] [-t tab1[,tab2[,...]]] [-i] [--tabs=tab1[,tab2[,...]]] [--initial] [--help] [--version] [file...]

Description

This documentation is no longer being maintained and may be inaccurate or incomplete. The Texinfo documentation is now the authoritative source.

This manual page documents the GNU version of expand. expand writes the contents of each given file, or the standard input if none are given or when a file named `-' is given, to the standard output, with tab characters converted to the appropriate number of spaces. By default, expand converts all tabs to spaces. It preserves backspace characters in the output; they decrement the column count for tab calculations. The default action is equivalent to -8 (set tabs every 8 columns).

Options

-tab1[,tab2[,...]], -t tab1[,tab2[,...]], --tabs=tab1[,tab2[,...]]

If only one tab stop is given, set the tabs tab1 spaces apart
instead of the default 8. Otherwise, set the tabs at columns tab1,
tab2, etc. (numbered from 0) and replace any tabs beyond the
tabstops given with single spaces. If the tabstops are specified
with the -t or --tabs option, they can be separated by blanks as
well as by commas.
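A quick way to check these rules is to look at the result with od -c. The sketch below wraps two invocations in small shell functions; the function names are made up for illustration, and it assumes GNU coreutils expand.

```shell
# Hypothetical helpers for illustration; both read standard input.

# One tab stop given: tabs every 4 columns (same as the obsolescent `expand -4`).
expand_every4() {
    expand -t 4
}

# Several stops given: tabs at columns 4 and 10 (numbered from 0); any tab
# after the last stop is replaced by a single space.
expand_at_4_10() {
    expand -t 4,10
}

# Example: show the padding byte by byte.
#   printf 'a\tb\tc\td\n' | expand_at_4_10 | od -c
```

With that input, expand_at_4_10 should print "a   b     c d": three spaces to reach column 4, five to reach column 10, and a single space for the third tab, which lies beyond the last stop.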

documented on: 1999.11.26

cmd:od

$ od -c imptest.txt
0000000   1   0   0  \t   M   a   x       S   y   d   o   w  \n   1   0
0000020   1  \t   C   o   u   n   t       D   r   a   c   u   l   a  \n
0000040

$ printf 'aa\tb\n\ncc\n1234567123456789' | od -c
0000000   a   a  \t   b  \n  \n   c   c  \n   1   2   3   4   5   6   7
0000020   1   2   3   4   5   6   7   8   9
0000031
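od prints its offsets in octal, so the final 0000031 above means 25 bytes (3 * 8 + 1), not 31. A small sketch, with hypothetical function names and only standard od and printf options, to get decimal offsets or to convert an octal offset:

```shell
# Dump stdin as characters, with decimal (-A d) instead of octal offsets.
od_decimal() {
    od -A d -c
}

# Convert an od offset such as 0000031 to decimal. printf reads a numeric
# argument with a leading 0 as octal; the extra 0 keeps it octal even if the
# caller dropped the leading zeros.
od_offset_to_decimal() {
    printf '%d\n' "0$1"
}

# Usage:
#   printf 'aa\tb\n\ncc\n1234567123456789' | od_decimal
#   od_offset_to_decimal 0000031
```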

misc commands

$ yes 1234567 | head -3
1234567
1234567
1234567
Broken Pipe

$ echo tht{33,44,555}afaf
tht33afaf tht44afaf tht555afaf

$ stty -echo; cat -v; stty echo
  -- press any key, then Enter: cat -v shows the bytes it sends (^[ is ESC); quit with Ctrl-D

$ long_foreground_task
^Z # suspend it
bg # resume it as a background job

cmd:recode

dos2unix and unix2dos (scripts)
with recode, create aliases:
                  alias dos2unix='recode ibmpc..lat1'
                  alias unix2dos='recode lat1..ibmpc'
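If recode is not installed, the line endings alone can be fixed with sed. A sketch with hypothetical function names; unlike the recode aliases above, these touch only the line endings, not the character set:

```shell
# A literal carriage return, built with printf so the sed scripts stay
# portable (\r inside a sed regex is a GNU extension).
cr=$(printf '\r')

# DOS -> Unix: drop a CR that sits right before the end of a line.
dos2unix_filter() {
    sed "s/${cr}\$//"
}

# Unix -> DOS: end every line with exactly one CR (existing CRs are not doubled).
unix2dos_filter() {
    sed "s/${cr}*\$/${cr}/"
}

# Usage:
#   dos2unix_filter < dosfile.txt > unixfile.txt
#   unix2dos_filter < unixfile.txt > dosfile.txt
```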

European chars to ascii

better (-f forces recode to proceed even though the conversion is not reversible):

recode -f latin1..text

or:

recode -f latin1..ascii

documented on: 2005.08.19

fixing .txt files sent from a MS$ user

Newsgroups: gmane.linux.debian.user
Date: Fri, 4 Apr 2008

On Fri, Apr 04, 2008, Mumia W.. wrote:

> >>My niece sends some of her schoolwork to my wife (e.g. essays) for her
> >>to read.  First she sent .doc files which I can't access properly (no, I
> >>do not run OO) although I could get the gist.  I then suggested that she
> >>send plain text.

> That seems to be in Microsoft code page 1250 (cp1250). Install 'recode'
> and do this:
>
> recode cp1250..ascii < email.txt

Thanks. I got an "ambiguous" error about linefeeds, so I tried the other code pages and found that cp1258 works fine.

thanks.

Douglas A. Tutty @porchlight.ca
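The same kind of conversion can be sketched with iconv, which ships with glibc. Converting to UTF-8 instead of ASCII keeps the accented letters, and tr removes the CR of the Windows CR-LF line endings. The function name is hypothetical. Picking the right code page matters: byte 0xE8 is "č" in cp1250 but "è" in cp1252, so a wrong guess silently changes letters rather than failing.

```shell
# Hypothetical helper: re-encode Windows code page 1250 text as UTF-8 and
# turn CR-LF line endings into plain LF. Reads the named files or stdin.
cp1250_to_utf8() {
    iconv -f CP1250 -t UTF-8 "$@" | tr -d '\r'
}

# Usage:
#   cp1250_to_utf8 email.txt > email.utf8.txt
```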

PC graphics characters

ASCII art

http://en.wikipedia.org/wiki/ASCII_art

PC "Block ASCII's" or "High ASCII's" use the extended characters of the 8-bit code page 437, which is a proprietary standard that was introduced by IBM in 1979 (ANSI Standard X3.16) for the IBM PC and the MS-DOS operating system. "Block" ASCII's were widely used on the PC during the 1990s until the Internet replaced BBSes as the main communication platform for "computer freaks" around the world. "Block" ASCII's dominated the PC text art scene.

Microsoft Windows does not support the ANSI Standard X3.16. You can look at "Block ASCII's" with a text editor using the "Terminal" font, but they will not look exactly as the artist intended.

ANSI art

http://en.wikipedia.org/wiki/ANSI_art

ANSI art is a computer artform that was widely used at one time on BBSes. It is similar to ASCII art, but constructed from a larger set of 256 letters, numbers, and symbols - all codes found in IBM codepage 437, often referred to as extended ASCII and used in MS-DOS environments. ANSI art also contains special ANSI escape codes that color text with the 16 foreground and 8 background colours offered by ANSI.SYS, an MS-DOS device driver loosely based upon the ANSI X3.64 standard for text terminals.

ANSI art is considerably more flexible than ASCII art, because the particular character set it uses contains symbols intended for drawing, such as a wide variety of box-drawing characters and block characters that dither the foreground and background color. It also adds accented characters and math symbols that often find creative use among ANSI artists.

ANSI art

> is there a howto (of sorts) anywhere?

Not AFAIK. But xterm basically understands ansi sequences, so it should not be difficult to write a filter that would produce the picture by means of

cat xxxx.ans |filter

You could use either

 -- the special xterm mode which displays box characters
    (the DEC Special Graphics set, selected with ESC ( 0).
or

-- a utf-8 capable xterm

The problem is that not only must the ansi sequences (for colours and cursor position) be interpreted, which xterm does by default, but the characters themselves must also be translated from PC-DOS ("codepage 437") to the characters your xterm understands (iso-8859-1 or utf-8).

As a quick test, I tried some of the ansi art examples in http://www.acid.org/ftp/aaa-8991.zip on my utf-8 capable xterm, simply using iconv to convert codepage 437 to utf-8, e.g.:

iconv -f 437 -t utf-8 tohs.ans

I suppose that if you have a legacy xterm with iso-8859-1 (unfortunately still the default in Debian) it would have to be

iconv -f 437 -t iso-8859-1 tohs.ans

This gives some idea of what it should look like. It becomes better when you select reverse video (control-middle click, then select reverse video). But getting the true glory of ansi art, including the proper colour scheme, would require a specially-written filter, I think.

The easiest way is to use the TYPE command in an MS-DOS environment (dosemu) with ANSI.SYS loaded.

Jan Willem Stumpel
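The iconv approach above can be wrapped as a filter. This sketch (hypothetical function name) relies on ANSI colour sequences being plain ASCII bytes, which iconv passes through unchanged, while the CP437 block and box-drawing characters become their Unicode equivalents. The terminal still has to interpret the escape sequences itself.

```shell
# Convert CP437 ANSI art to UTF-8 for a UTF-8 terminal. Escape sequences
# such as ESC[31m are ASCII, so they survive unchanged and the terminal
# still applies the colours.
ans_to_utf8() {
    iconv -f CP437 -t UTF-8 "$@"
}

# Usage:
#   ans_to_utf8 tohs.ans
```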

UTF-8 to ISO 8859-1 tool

Newsgroups:  gmane.linux.debian.user
Date:        Thu, 4 May 2006 14:18:21 +0200

> What editor can change UTF-8 to ISO 8859-1?

Use recode:

recode u8..l1

will do what you want.

jmt

UTF-8 to ISO 8859-1 tool

> What editor can change UTF-8 to ISO 8859-1?
> Running Debian Sarge. Oo doesn't seem to. MC doesn't know what's what.
> Who?

$ iconv -f UTF-8 -t ISO8859-1 foo.html >foo.latin1.html

You could do the conversion with emacs, but the above method is more suitable for batch processing. I think this would cut out a step for printing:

$ iconv -f UTF-8 -t ISO8859-1 foo.html |a2ps

> Rationale: I change html pages to text to print with a2ps. But then all
> accents show up as "garbage" utf-8 sequences. It seems to "know" ISO 8859-1.

What locale are you running? I think you could try the following:

$ LANG=en_GB.UTF-8 a2ps foo.html

or, to set it for the rest of the session:

$ export LANG=en_GB.UTF-8
$ a2ps foo.html
$ a2ps bar.html

Please post what works or doesn't work!

Adam Funk

UTF-8 to ISO 8859-1 tool

> Please post what works or doesn't work!

Thanks Adam!

iconv -f UTF-8 -t ISO_8859-15 chikung.txt > chikung.converted.txt

does the ticket.

I don't print html pages with a2ps because it uses too much ink + paper:

iconv -f UTF-8 -t ISO_8859-15 chikung.txt | a2ps --medium=Letterdj -o chikung.ps

does the job in one step. I'll make that an mc immediate command…

Hugo Vanwoerkom
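The choice of ISO_8859-15 over ISO 8859-1 can matter: the two differ in eight positions, most visibly the euro sign, which exists only in 8859-15. A sketch (hypothetical function name) that tries Latin-1 first and falls back to Latin-9 when iconv reports an unconvertible character; text with characters in neither set still makes iconv fail.

```shell
# Convert a UTF-8 file to ISO-8859-1 if every character fits, otherwise to
# ISO-8859-15 (which has the euro sign, Š/š, Ž/ž, Œ/œ and Ÿ in place of
# ¤ ¦ ¨ ´ ¸ ¼ ½ ¾). The first iconv run is only a check, so no partial
# Latin-1 output is written.
utf8_to_latin() {
    if iconv -f UTF-8 -t ISO-8859-1 "$1" >/dev/null 2>&1; then
        iconv -f UTF-8 -t ISO-8859-1 "$1"
    else
        iconv -f UTF-8 -t ISO-8859-15 "$1"
    fi
}

# Usage (print in one step, as above):
#   utf8_to_latin chikung.txt | a2ps --medium=Letterdj -o chikung.ps
```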

European chars to ascii

Newsgroups:  gmane.linux.debian.user
Date:        Sat, 20 Aug 2005 00:36:14 +0200

> Are there any tools that can convert European characters to plain
> 7-bit ASCII?

Or, if you'd like to specify the special characters' hex codes (in case you have problems entering them directly…), you could write instead

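#!/usr/bin/perl
# Replace Latin-1 characters (here German umlauts and sharp s) with ASCII
# digraphs. Reads the files named on the command line (or stdin).
use strict;
use warnings;

# Latin-1 code (lowercase hex) => ASCII replacement
my %mapping = (
    'e4' => 'ae',    # a with umlaut
    'f6' => 'oe',    # o with umlaut
    'fc' => 'ue',    # u with umlaut
    'df' => 'ss',    # sharp s
    # ...
);

# Character class contents, e.g. \xe4\xf6\xfc\xdf
my $set = join '', map { "\\x$_" } keys %mapping;

while (<>) {
    # Look up each matching byte by its two-digit hex code
    s/([$set])/$mapping{sprintf "%02x", ord $1}/ge;
    print;
}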

Almut Behrens

P.S. Normally, you'd use iconv for encoding conversions. However, "iconv -f 8859_1 -t ASCII isolatin1-file" doesn't work, because ASCII can only represent a subset of characters present in 8859_1 — which makes iconv complain…
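If losing the accented characters is acceptable, GNU iconv's -c option drops characters the target encoding cannot represent instead of stopping. A sketch (hypothetical function name); unlike the Perl script or recode it deletes rather than transliterates, so "Müller" becomes "Mller".

```shell
# Lossy Latin-1 -> ASCII: -c omits characters ASCII cannot represent.
# iconv may still exit with a non-zero status when anything was dropped,
# so do not rely on the exit code here.
latin1_to_ascii_lossy() {
    iconv -c -f ISO-8859-1 -t ASCII "$@"
}

# Usage:
#   latin1_to_ascii_lossy isolatin1-file > ascii-file
```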

European chars to ascii

> P.S. Normally, you'd use iconv for encoding conversions.  However,
> "iconv -f 8859_1 -t ASCII isolatin1-file" doesn't work, because ASCII
> can only represent a subset of characters present in 8859_1 -- which
> makes iconv complain...

recode is better, using the -f option:

ay:~> latin1 | recode -f latin1..ascii
20  !"#$%&'()*+,-./0123456789:;<=>?
40 @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
60 `abcdefghijklmnopqrstuvwxyz{|}~
A0  !clbSS"(c)a<<not-(R)^2^3'uP.,^1o>>1/41/23/4?
C0 `A"AAAEC`E`ID~N`O"OxO`U
E0 `a"aaaec`e`id~n`o"o:o`u

ay:~> latin1 | recode -f latin1..text
20  !"#$%&'()*+,-./0123456789:;<=>?
40 @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
60 `abcdefghijklmnopqrstuvwxyz{|}~
A0  ``''
C0 A`A^A"C,E`E'E^E"I^I"O`O^O"U`U^U"
E0 a`a^a"c,e`e'e^e"i^i"o`o^o"u`u^u"

Vincent Lefevre

Need help converting files to unix format

Newsgroups: comp.os.linux.misc

So you have a directory full of text files that need converting? Actually, you should be able to compile those .c files anyway; maybe something else is wrong?

But you can try this:

find . -type f | xargs recode dos..

recode is a powerful general-purpose file format conversion tool. I think it will harmlessly skip non-text files, but you had better check the man page (or test it) first. recode.rpm is small and easy to install, but it has too many features to think about. And the above example really does end in two periods; that's not a typo.

If you only want to change .c files, you can alter the command to: find . -type f -name '*.c' | xargs recode dos.. (quote '*.c' so that find, not the shell, expands the pattern).

Good luck!

Wayne Pollock
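Piping find into xargs splits file names at spaces. A sketch that avoids this by letting find run the command itself; it assumes GNU sed (for -i and for \r in the pattern) and only fixes line endings, whereas recode dos.. also recodes the characters. The function name and its defaults are hypothetical.

```shell
# Recursively strip the CR from CR-LF line endings in files matching a name
# pattern. find hands the file names straight to sed (no word splitting),
# and "-exec ... {} +" batches many files into each sed run.
dos2unix_tree() {
    # $1: directory to search (default: .)  $2: name pattern (default: *.c)
    find "${1:-.}" -type f -name "${2:-*.c}" -exec sed -i 's/\r$//' {} +
}

# Usage:
#   dos2unix_tree src '*.c'
```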

cmd:AutoConvert

Usage

iconv -f GB2312 -t BIG5 file.sample.cc.gb2312 > file.sample.cc.big5
cat file.sample.cc.big5 | iconv -f BIG5 -t GB2312
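The iconv commands above can be wrapped and checked with a round trip. A sketch with hypothetical function names; note that iconv only re-encodes characters and does not turn simplified characters into traditional ones, so a character that exists in GB2312 but not in Big5 makes iconv stop with an error.

```shell
# Re-encode GB2312 (simplified Chinese) text as Big5, and back.
# Both read the named files or stdin.
gb_to_big5() {
    iconv -f GB2312 -t BIG5 "$@"
}

big5_to_gb() {
    iconv -f BIG5 -t GB2312 "$@"
}

# Usage:
#   gb_to_big5 file.sample.cc.gb2312 > file.sample.cc.big5
#   big5_to_gb file.sample.cc.big5
```

For example, the two characters of the word for "Chinese" (GB2312 bytes D6 D0 CE C4) come out as Big5 bytes A4 A4 A4 E5.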


Description

AutoConvert is an intelligent Chinese Encoding converter. It uses built-in functions to judge the type of the input file's Chinese Encoding (such as GB/Big5/HZ), then converts the input file to any type of Chinese Encoding you want. You can use autoconvert to automatically convert incoming e-mail messages. It can also optionally handle the UNI/UTF7/UTF8 encoding.