head -10 file
http://debianlinux.net/text_management.html
Table of Contents
Text Management Portals
Unicode Text Tools
Web based Text Editors
Collaborative Text Editors
Screen Text Editors
Stream Text Editors
Screen XML Editors
Stream XML Editors
Screen HTML Editors
Stream HTML Editors
Binary Editors
Text Comparison
Text Conversion
TypeSetting & PostScript Tools
Text Synthesis & Recognition
documented on: 2006.06.10
to keep the first 10 lines of "file":
head -10 file
to skip the first 10 lines of "file":
tail -n +11 file   # `tail +11 file` on older systems
to skip the last 10 lines of "file":
head -n -10 file
to keep the last 10 lines of "file":
tail -10 file
to print lines 11-20 of a file:
sed -e 1,10d -e 20q file
to cut lines by criteria, use 'grep', 'grep -v' or 'sed'.
to cut a file into pieces, use split
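The line-range recipes above can be sanity-checked on a generated file (`tail -n +11` is the modern spelling of `tail +11`; `head -n -10` is a GNU extension):

```shell
# Sanity-check the line-range recipes on a 30-line sample file.
seq 30 > file

head -10 file              # first 10 lines: 1..10
tail -n +11 file           # skip the first 10 lines: 11..30
head -n -10 file           # skip the last 10 lines: 1..20 (GNU head)
tail -10 file              # last 10 lines: 21..30
sed -e 1,10d -e 20q file   # lines 11-20
```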
Use head to keep the first 50m
head -c 50m
to skip the first several bytes, use dd:

$ seq 5 | dd ibs=1 skip=6
4
5
4+0 records in
0+1 records out
to skip the last 50m
head -c -50m
to cut columns out of each line, use cut.
documented on: 2007.04.11
head
head -c 50m
-n, --lines=[-]N
        print the first N lines instead of the first 10; with the
        leading `-', print all but the last N lines of each file
-c, --bytes=[-]N
        print the first N bytes of each file; with the leading `-',
        print all but the last N bytes of each file
SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.
> > Is there any ready-made tool to print lines from a file *after* a
> > given line number?
Well, tail does that just fine; to skip the first 10 lines of "file":
tail +11 file
Another option is sed:
sed 1,10d file
The sed approach generalizes better; to print lines 11-20 of a file:
sed -e 1,10d -e 20q file
-Ken Pizzini
All lines after line 10:
sed -n '11,$ p' <infile
Ken
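Both forms agree; a quick check (the file names `a` and `b` are just for the example):

```shell
seq 20 > infile
tail -n +11 infile > a        # modern spelling of `tail +11`
sed -n '11,$ p' < infile > b
cmp a b && echo same          # prints "same" when the outputs match
```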
documented on: 07-19-99
ff -l . | cut -c1-12,29-
ls -l | cut -c30-42,56-
head /usr/X11R6/lib/X11/rgb.txt | cut -f 3   # X
ls -l | cut -d ' ' -f 1,9                    # not working!
-c list
        The list following -c specifies character positions (for
        instance, -c1-72 would pass the first 72 characters of each
        line).
Note: starting from 1.
-f, --fields field-list
        Print only the fields listed in field-list.  Fields are
        separated by a TAB by default.
-d, --delimiter delim
        For -f, fields are separated by the first character in delim
        instead of by TAB.
Note: -d 'x' is normally used together with -f.
cut -d':' -f 2
$ cut -d: -f1,5 /etc/passwd | head
root:Super-User
daemon:
adm:Admin
lp:Line Printer Admin
smtp:Mail Daemon User
uucp:uucp Admin
Note: -d ' ' is not good.
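A self-contained version of the passwd example (one sample line supplied inline, so it runs the same everywhere):

```shell
# Fields 1 (login) and 5 (GECOS), colon-delimited; cut rejoins
# the selected fields with the same delimiter.
printf 'root:x:0:0:Super-User:/root:/bin/sh\n' | cut -d: -f1,5
# → root:Super-User
```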
It can't imitate awk's field selection: cut treats every occurrence of the delimiter as a separator, so runs of spaces produce empty fields. Better to use -c to pick out the range if you can.
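To see the difference: awk collapses runs of blanks into one separator, while cut counts every delimiter character, so a doubled space yields an empty field:

```shell
line='root  x  0'                 # two spaces between fields
echo "$line" | awk '{print $2}'   # → x
echo "$line" | cut -d' ' -f2      # empty: field 2 lies between the two spaces
```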
Newsgroups: comp.unix.shell
> > I need to eliminate the second column of a certain file.
>
> Sounds like a job for cut.
Only if the columns are delimited by *exactly one* space. If they look like:
Margolin    Barry    Other stuff
Doherty     John     More other stuff
then cut by itself is useless. You could, however, use sed to collapse all the spaces into a single space and then pipe that to cut.
Barry Margolin
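A sketch of that pipeline: squeeze the blanks with sed first, then cut the now single-space-delimited columns (the field numbers are illustrative):

```shell
# Collapse runs of spaces to a single space, then drop field 2.
printf 'Margolin   Barry   Other stuff\nDoherty    John    More other stuff\n' \
  | sed 's/  */ /g' | cut -d' ' -f1,3-
```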
Newsgroups: comp.unix.shell
Date: Tue, 5 Dec 2006 12:23:46 -0500
> #-- extract gzip for binaries.out
> head -c +$GZIPBYTE binaries.out >gzip 2>$NUL
> [ $? -eq 0 -a -f gzip ] || { NO; EXIT $BINARIES_EXTRACTION_FAILED; }
>
> The problem is that the script needs to be portable on Linux and most
> Unix flavors (HP-UX, AIX, SCO, Solaris, UnixWare), and "head -c" is not
> supported everywhere (not on SCO, UnixWare, Solaris).
>
> Is it possible to reproduce the head -c behaviour with the dd command
> (or with other unix commands)?
dd bs=$GZIPBYTE count=1 if=binaries.out of=gzip
For very large files you might need to make bs no larger than system RAM, and set count=$GZIPBYTE/bs.
Bill Marcum
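That dd invocation as a runnable sketch (the byte count and file names mirror the post above; the sample input is made up):

```shell
# Portable stand-in for `head -c N`: one dd read of N bytes.
# For counts larger than RAM, use a smaller bs with count=N/bs instead.
N=1024
seq 1000 > binaries.out                    # sample input, ~3.9 KB
dd bs="$N" count=1 if=binaries.out of=gzip 2>/dev/null
wc -c < gzip                               # → 1024
```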
fold -s -w 132 bigfile | lp
Use fmt instead for more advanced controls!
The fold utility is a filter that will fold lines from its input files, breaking the lines to have a maximum of width column positions (or bytes, if the -b option is specified).
-b, --bytes         count bytes rather than columns
-s, --spaces        break at spaces
-w, --width=WIDTH   use WIDTH columns instead of 80
`-s' `--spaces' Break at word boundaries: the line is broken after the last blank before the maximum line length. If the line contains no such blanks, the line is broken at the maximum line length as usual.
fold and cut(1) can be used to create text files out of files with arbitrary line lengths. fold should be used when the contents of long lines need to be kept contiguous. cut should be used when the number of lines (or records) needs to remain constant.
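That contrast in two lines: fold grows the line count but keeps all the text, while cut keeps the line count but drops the tail:

```shell
line='the quick brown fox jumps over the lazy dog'
echo "$line" | fold -s -w 20   # 3 lines, nothing lost
echo "$line" | cut -c1-20      # 1 line, truncated after column 20
```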
echo "\
Updated ${PKG_INSTALL_ROOT}/etc/inet/services with new netbios and swat \
names and made backup of original ${PKG_INSTALL_ROOT}/etc/inet/services \
as ${PKG_INSTALL_ROOT}/etc/inet/services:presamba." | fold -s -w 60 | \
while read line; do
    echo postinstall: $line
done
documented on: 2002.12.10
fmt -t -c -w 80000
$ file /usr/bin/a2ps | fmt -w 40
/usr/bin/a2ps: ELF 32-bit LSB
executable, Intel 80386, version 1,
dynamically linked (uses shared libs),
stripped
*Tags*: word wrap, :wordwrap, :formatter
fmt - simple optimal text formatter; reformat each paragraph in the FILE(s)
-c, --crown-margin       preserve indentation of first two lines
-p, --prefix=STRING      combine only lines having STRING as prefix
-s, --split-only         split long lines, but do not refill
-t, --tagged-paragraph   indentation of first line different from second
-u, --uniform-spacing    one space between words, two after sentences
-w, --width=WIDTH        maximum line width (default of 75 columns)
converted by "fmt -t -c -w 80000"
From
Data standards make sure that the terms people use mean the same thing. The International Classification of Diseases (ICD) is such an example. Canada is in the process of upgrading ICD from the old version of ICD-9 to the new version of ICD-10 nationwide (ICD-10, 2005; Healthcare Financial Management Association, 2004). The US however, falls behind the whole world in adopting the new ICD-10 standard. They are still using ICD-9. Even their latest research papers focus on the old ICD-9 (Glance, Laurent, Dick, Andrew, Osler, Turner, & Mukamel, Dana, 2006; Bazarian, Jeffrey, Veazie, Peter, Mookerjee, Sohug, & Lerner,, 2006; Williams, Charles, Hauser, Kimberlea, Correia, Jane, & Frias, Jaime, 2005).
to the same paragraph reflowed onto a single long line (the hard line breaks removed by fmt).
From
In US, although Open Source health care software have been actively developed, for example OpenEMR (2006), they have not received the adequate attention yet. This is because of the private and proprietary nature of the US Health industry. However, not all institutes or organizations in Canada have fully understood the damage that private and proprietary bring to the pan-Canadian interoperable EHR system, even after Infoway has taken the Open Source initiative. For example, Ontario's ePhysician Project is a pay-per-month web portal software, contracted to GE Healthcare for 15 years (Hamilton, 2005). The solution is both proprietary and exclusive. The Ontario government managed to fund $128 million, but that only covers about "10 per cent of what it would cost" for it to be fully accessible for all physicians (Hamilton, 2005, p. 1).
to the same paragraph reflowed onto a single long line.
As shown, the -t switch (indentation of first line different from second) works great. Had it not been for the weird paragraph-break problem, fmt could be a very good paragraph reformatter.
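The line-joining behind `fmt -t -c -w 80000` can be sketched at a smaller width; a width far longer than any paragraph makes fmt refill each paragraph onto one line (some fmt builds cap the width, hence the modest value here):

```shell
# Two short paragraphs in; each comes out as a single long line.
printf 'one two\nthree four\n\nfive six\n' | fmt -w 500
```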
Par is a paragraph reformatter, similar to the standard Unix fmt filter, but better. It uses a dynamic programming algorithm, which produces much better-looking line breaks than the greedy algorithm used by fmt. It can also deal correctly with a variety of quotation and comment conventions.
http://www.cs.berkeley.edu/~amc/Par/
split - split a file into pieces
ls thefile
split -b 500k !$ !$.
split -b 500k !$ !$.split.
!! | xargsi -t split {} ~+1/{}. -d -a 3 -b 500k/10m   # space ok!
!! | split -l 1000 - tmp.split.
perl -e '$lmt=30; $pre="tmp.split."; $si="aa"; foreach (1..$lmt){ print "$pre$si\n"; $si++} '
ls tmp.split.?? | doeach.pl fileh ftt0 @~cat @_@~
rm tmp.split.??
$ jot 10 | split -l 3 -
$ ls xa? | doeach.pl echo @~cat @_@~
echo `cat xaa`
1 2 3
echo `cat xab`
4 5 6
echo `cat xac`
7 8 9
echo `cat xad`
10
$ split --help
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
PREFIX is `x'.  With no INPUT, or when INPUT is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic to standard error just
                          before each output file is opened
      --help              display this help and exit
      --version           output version information and exit

SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.
-d, --numeric-suffixes use numeric suffixes instead of alphabetic
-a, --suffix-length=N use suffixes of length N (default 2)
-b, --bytes=SIZE put SIZE bytes per output file
jot 30 | split -l 1 - tmp.split.

$ echo tmp.split.* | fold -sw 68
tmp.split.aa tmp.split.ab tmp.split.ac tmp.split.ad tmp.split.ae
tmp.split.af tmp.split.ag tmp.split.ah tmp.split.ai tmp.split.aj
tmp.split.ak tmp.split.al tmp.split.am tmp.split.an tmp.split.ao
tmp.split.ap tmp.split.aq tmp.split.ar tmp.split.as tmp.split.at
tmp.split.au tmp.split.av tmp.split.aw tmp.split.ax tmp.split.ay
tmp.split.az tmp.split.ba tmp.split.bb tmp.split.bc tmp.split.bd

$ perl -e '$lmt=30; $pre="tmp.split."; $si="aa"; foreach (1..$lmt){ print "$pre$si\n"; $si++} ' | xargs | fold -sw 68
tmp.split.aa tmp.split.ab tmp.split.ac tmp.split.ad tmp.split.ae
tmp.split.af tmp.split.ag tmp.split.ah tmp.split.ai tmp.split.aj
tmp.split.ak tmp.split.al tmp.split.am tmp.split.an tmp.split.ao
tmp.split.ap tmp.split.aq tmp.split.ar tmp.split.as tmp.split.at
tmp.split.au tmp.split.av tmp.split.aw tmp.split.ax tmp.split.ay
tmp.split.az tmp.split.ba tmp.split.bb tmp.split.bc tmp.split.bd
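The same idea with portable commands (seq instead of the BSD jot; the piece names are exactly what split generates):

```shell
seq 10 > input
split -l 3 input part.     # → part.aa part.ab part.ac part.ad
cat part.ad                # the last piece holds the leftover line: 10
```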
documented on: 1999.10.26
csplit /tmp/PHoss.log '/^>>*$/' '{*}'
csplit -f chp11_ chap11.lst '/^listing /' '{*}'
The csplit program splits a file according to context. It's part of the GNU textutils.
$ csplit --help
Usage: csplit [OPTION]... FILE PATTERN...
Output pieces of FILE separated by PATTERN(s) to files `xx01', `xx02',
..., and output byte counts of each piece to standard output.
  -b, --suffix-format=FORMAT use sprintf FORMAT instead of %d
  -f, --prefix=PREFIX        use PREFIX instead of `xx'
  -k, --keep-files           do not remove output files on errors
  -n, --digits=DIGITS        use specified number of digits instead of 2
  -s, --quiet, --silent      do not print counts of output file sizes
  -z, --elide-empty-files    remove empty output files
      --help                 display this help and exit
      --version              output version information and exit
Read standard input if FILE is -.  Each PATTERN may be:
  INTEGER            copy up to but not including specified line number
  /REGEXP/[OFFSET]   copy up to but not including a matching line
  %REGEXP%[OFFSET]   skip to, but not including a matching line
  {INTEGER}          repeat the previous pattern specified number of times
  {*}                repeat the previous pattern as many times as possible
A line OFFSET is a required `+' or `-' followed by a positive integer.
Using it to split the tycpp samples. (Teach Yourself C++, http://web30.eppg.com/program/zip/tycpp.zip)
csplit -f chp11_ chap11.lst '/^listing /' '{*}'
Perfect. For this particular case, we still need to remove the "listing …" line at the top of the .cc files, though.
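A reproducible miniature of that chap11.lst split (the input here is made up; the `chp_` prefix mirrors the command above):

```shell
# Miniature input in the same shape as chap11.lst.
printf 'listing 1\nfoo\nlisting 2\nbar\n' > chap.lst
csplit -s -f chp_ chap.lst '/^listing /' '{*}'
# chp_00 is empty (nothing precedes the first match);
# chp_01 holds "listing 1" + foo, chp_02 holds "listing 2" + bar.
head -1 chp_02             # → listing 2
```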