html2text -o outfile.txt <file>
lynx -source <url> | html2text -ascii -nobs -width 76 -style pretty | less
html2text - An advanced HTML to text converter
markdown - Text-to-HTML conversion tool http://daringfireball.net/projects/markdown/
parsewiki - Documentation system based on ASCII text. No English documentation found on the web; very few people use it.
txt2html - Text to HTML converter. This package has been orphaned since 2004-03-13 (v2.23-1).
txt2tags - a Python conversion tool generating HTML/SGML/LaTeX/man/MoinMoin/mgp/PageMaker files
unhtml - Remove the markup tags from an HTML file
w3m - WWW browsable pager with excellent tables/frames support
documented on: 2005.02.20
An advanced HTML to text converter
html2text [ -unparse | -check ] [ -debug-scanner ] [ -debug-parser ] \
    [ -rcfile <file> ] [ -style ( compact | pretty ) ] [ -width <w> ] \
    [ -o <file> ] [ -nobs ] [ -ascii ] [ <input-url> ] ...

Formats HTML document(s) read from <input-url> or STDIN and generates ASCII text.
-rcfile <file>   Read <file> instead of "$HOME/.html2textrc"
-style compact   Create a "compact" output format (default)
-style pretty    Insert some vertical space for nicer output
-width <w>       Optimize for screen widths other than 79
-o <file>        Redirect output into <file>
-nobs            Do not use backspaces for boldface and underlining
-ascii           Use plain ASCII for output instead of ISO-8859-1
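The options above can also be driven from a script. A minimal sketch in Python, assuming the html2text described here is the one found on the PATH (other programs of the same name may take different options). The function names html2text_command and html_to_text are illustrative only and not part of html2text.

```python
import shutil
import subprocess


def html2text_command(width=76, style="pretty"):
    """Build the argument list using only options from the list above."""
    # -ascii: plain ASCII output; -nobs: no backspace overstriking.
    return ["html2text", "-ascii", "-nobs",
            "-width", str(width), "-style", style]


def html_to_text(html_bytes, width=76, style="pretty"):
    """Feed HTML to html2text on STDIN and return its output as bytes."""
    if shutil.which("html2text") is None:
        raise RuntimeError("html2text not found on PATH")
    done = subprocess.run(html2text_command(width, style),
                          input=html_bytes,
                          stdout=subprocess.PIPE,
                          check=True)
    return done.stdout
```

For example, html_to_text(open("page.html", "rb").read()) would return the formatted text of a local file, with the HTML passed on STDIN just as in the pipeline form.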
documented on: 2005.02.21
Hi, I like your program very much!
> One feature request: Could you add some URL handling features? The
> html2textrc(5) feature is really nice, but it lacks the ability to handle
> urls. I.e., I'm hoping to use this tool to convert web pages to wiki format,
> which means I want to be able to define that urls be translated in plain
> text also. E.g., I hope I can use html2text to translate urls into wiki
> format like, !http://www.google.com[] This Link points to google?, or in
> DokuWiki format -- [[ http://www.google.com | This Link points to google
> ]]. Thanks.
Think of html2text as a filter. It aims to show you what you would see if you loaded the file in a graphical browser, without being a browser. The arguments of HTML elements are not interpreted, with the exception of "IMG ALT", which is used to give a good substitute for images that cannot be represented in plain text media.

While I understand that it might be desirable for Wiki source code pre-processing to have the "A HREF" argument contents displayed verbatim, this would completely break with the idea of a filter. html2text is expected not to bother about markup as long as it does not contain any structural information (so-called logical markup; think of headings, lists and so on). "A" does not.
> One suggestion: according to the manual, "html2text will not follow
> redirections (HTTP 301/307). Proxy servers are not supported." This restricts
> html2text to very limited use. Why not just remove all url fetching code,
> and rely on cat/lynx to feed html files to html2text? I.e., do it the unix
> way -- "do only one thing, but do it the best". In fact, in my alias, I
> always use "lynx -source" to pipe stuff to html2text -- I don't want my
> code to work sometimes and fail at other times (I think redirections happen
> quite often on the web).
As already stated in the documentation, the HTTP implementation in html2text is rather basic: all it does, more or less, is issue a "GET" request. It's more a gimmick than a core function. But that is not sufficient reason to remove it completely and disappoint all of the other users who might find it useful. Thus, if the HTTP engine in html2text does not fit your needs, just don't use it.
MartinBayer
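Since html2text deliberately leaves "A HREF" alone, the wiki conversion asked about above would need a separate filter. A rough sketch in Python using only the standard library html.parser. It is not part of html2text, the names WikiLinkFilter and links_to_dokuwiki are made up for illustration, and it simply drops all other markup instead of formatting it.

```python
from html.parser import HTMLParser


class WikiLinkFilter(HTMLParser):
    """Keep text, rewriting <a href="url">label</a> as [[url|label]]."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # output fragments, joined at the end
        self.href = None   # href of the anchor currently open, if any
        self.label = []    # text collected inside the current anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without href (e.g. <a name=...>) leave href as None.
            self.href = dict(attrs).get("href")
            self.label = []

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            label = "".join(self.label).strip()
            if label:
                self.out.append("[[%s|%s]]" % (self.href, label))
            else:
                self.out.append("[[%s]]" % self.href)
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.label.append(data)
        else:
            self.out.append(data)


def links_to_dokuwiki(html):
    """Return the text of an HTML string with links in DokuWiki syntax."""
    parser = WikiLinkFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Given the input <p>See <a href="http://www.google.com">This Link points to google</a>.</p>, links_to_dokuwiki returns: See [[http://www.google.com|This Link points to google]].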
I am looking for a utility which will turn all of my lower case HTML (written in, say, notepad) into upper case and perhaps indent it too. Does anyone know if such a program exists?
You can use Dave Raggett's "tidy" to force all tags into upper case or lower case. Actually, this tool can do much more to beautify your code; for me it's a must-have.
It's a command-line tool (great for mass manipulation), but it has also been integrated in a number of GUI programs. See http://www.w3.org/People/Raggett/tidy/ for more info.
Matthias
> I would just like to thank Matthias and all the others who have pointed me
> in the direction of 'Tidy'. From what I have read, it does exactly the
> sort of thing I am looking for. My problem now is figuring out how it
> works! Excuse me for being thick, but the blurb on www.w3.org about the
> program is virtually incomprehensible for a novice! All that stuff about
> stderr and stdout doesn't mean a thing to me.
Download tidy, and save it in one of the directories which are on your default path.
Open an MS-DOS window.
In it, connect to the directory where your HTML files are by typing
cd <path to directory>
Now, for each file you want to work on, type
tidy -i -u -m <filename>
Where -i means 'indent'; -u means 'upper-case tags', and -m means 'modify in place'.
Simon Brooke
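Typing the command once per file gets tedious for a whole directory. A minimal sketch of the same step as a Python loop, assuming tidy is on the PATH as in the first step. The function names and the "*.htm*" pattern are illustrative choices, not anything tidy prescribes.

```python
import glob
import os
import shutil
import subprocess


def tidy_command(path):
    """Same flags as above: indent, upper-case tags, modify in place."""
    return ["tidy", "-i", "-u", "-m", path]


def tidy_all(directory, pattern="*.htm*"):
    """Run tidy on every matching file; return {path: exit status}."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy not found on PATH")
    results = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # tidy exits with 1 when it reports warnings and 2 for errors,
        # so the status is recorded rather than treated as a crash.
        results[path] = subprocess.run(tidy_command(path)).returncode
    return results
```

Because -m rewrites each file in place, it is worth running this on a copy of the directory first.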
> works! Excuse me for being thick, but the blurb on www.w3.org about the
> program is virtually incomprehensible for a novice! All that stuff about
> stderr and stdout doesn't mean a thing to me.
If you really want to do more of this web stuff, you might as well get used to the technobabble right now. There is more to come :-).
However… HTML-Kit ( http://www.chami.com/html-kit/ ) has a nice graphical frontend for tidy, including the lowercase/uppercase thing. Programs for other platforms are mentioned on the "tidy" homepage.
Matthias
Newsgroups: comp.infosystems.www.authoring.html
Date: 2000/03/07
>I am looking for a utility which will turn all of my lower case HTML
>(written in, say, notepad) into upper case and perhaps indent it too.
>Does anyone know if such a program exists?
With the upcoming shift from HTML to XML/XHTML, I would advise against converting your elements to upper case. XHTML is case-sensitive, and only lower case is allowed.
However, if you find documents hard to read with only lowercase, I would advise getting an editor with source highlighting. One such freeware program I recommend for the Windows platform is 1stPage 2000, downloadable from <URL:http://www.evrsoft.com/>. It also has a tool called HTML Tidy, which will convert case and fix indentation for your HTML documents.
If you're just after tidy, it can be downloaded for several platforms from <URL: http://www.w3.org/People/Raggett/tidy/>. It can fix other problems with the HTML, such as nesting errors and Word-generated HTML.
Arve Bersvendsen
Newsgroups: alt.html.writers
Date: 2000/01/17
>I am searching for an HTML source code beautifier; "tidy", which I got on
>the W3C site, doesn't really work. If the program also has a function to
>check the code, that would be fine.
Perhaps if you could tell us which aspect of tidy doesn't work for you we might be able to offer some suggestions.
I find Tidy very useful.
Calum
p.s. We're discussing http://www.w3.org/People/Raggett/tidy/
Hi. Request canceled; I just found a tool that fits my needs perfectly. Thank you for your attention.
BTW : If you are interested … http://freshmeat.net/projects/htmltidy/
Martin
Note, htmltidy in freshmeat.net *is* Raggett's tidy.
Actually, the most up to date development branch is at http://tidy.sourceforge.net/
Tong
txtfmt is an ASCII text formatter utility which formats XML documents into ASCII text. It is most useful for formatting e-mail messages. It handles paragraphs, bullets, tables, and more.
Matthew Campbell - April 16th 1999, 03:20 EST
rm -f err; ls *htm | doeach.pl 'tidy -wrap 72 -raw @_' > ~+1/@_ 2>> err
tidy --show-warnings no --force-output yes -quiet -indent -raw -wrap ${hfrm:-5000} -asxml --write-back yes
tidy --write-back yes --show-warnings no --clean yes --force-output yes -wrap 5000 l-grub-1-1.html
tidy -quiet -upper -asxml -numeric -indent --show-warnings no --clean yes --force-output yes -wrap 5000 /home/tong/try/grub/l-grub/l-grub-1-1.html > $tf.grub.htm
HTML TIDY is a free utility to fix mistakes made while editing HTML and to automatically tidy up sloppy editing into nicely laid out markup. It also works great on the atrociously hard-to-read markup generated by specialized HTML editors and conversion tools, and can help you identify where you need to pay further attention to making your pages more accessible to people with disabilities.
The maintenance of Tidy has now been taken over by a group of enthusiastic volunteers at Source Forge, see http://tidy.sourceforge.net.
make
mv tidy ~/local/bin
tidy nowhere.htm
tidy 4dos650.htm
tidy -f err 4dos650.htm
documented on: Sat 11-14-98 22:38:32
texi2html is a Perl script that converts GNU's Texinfo files to HTML.
The program takes Texinfo files (and not info ones) and produces a set of HTML files. The quality of the output is close to the printed output and is much better than an info->HTML gateway.
http://wwwinfo.cern.ch/dis/texi2html/ isn't maintained anymore.
Texi2html's current Homepage
Last modified: Mon Nov 22 14:06:17 MET 1999
documented on: 1999.11.23 Tue 10:02:01
Natural Language: English; Operating System: Linux; …
New Linux tools: Adobe SVG viewer and Quick … Users of the Linux platform will be glad to hear of the release of Adobe's SVG viewer: also JXML's Quick Java/XML toolkit now has explicit Linux support. …
alphaWorks … UNIX scripts are included. What is XML Viewer for Java TM ? XML Viewer for Java is a Java application …
XML tools by category http://www.garshol.priv.no/download/xmltools/cat_ix.html
XML tools by platform http://www.garshol.priv.no/download/xmltools/plat_ix.html
By: IBM alphaWorks
Version: 15.Sep.99 release
Platforms: Java
Category: XML browsers
Info: http://www.alphaworks.ibm.com/tech/xmlviewer
XML Viewer is a simple Java application that can display both raw XML source and a tree view of any well-formed XML document. XML Viewer is also DTD-aware: it can display DTDs and show the declaration of any element or attribute.
Release 1.95.1
A reference manual is available in doc/reference.html in this distribution.
Discussion related to the direction of future expat development takes place on expat-discuss@lists.sourceforge.net. Archives of this list may be found at http://www.geocrawler.com/redir-sf.php3?list=expat-discuss.
./configure --prefix=/opt
make
pkg=expat-1.95.1
make -n install | tee /export/pub/installs/logs/$pkg.log.0
make install | tee /export/pub/installs/logs/$pkg.log.1
documented on: 2000.12.21 Thu 20:24:24