“Butterflies stir a breeze
and the ripples flow unceasingly:
far away the cyclones swirl.
It's a whole, connected world.”*
A short description for Read Me haters
Cyclone is a text conversion utility that uses Apple's Text Encoding
Converter.
Highlights include:
Version history
Details for the curious
License/distribution:
Cyclone is free for any use (except for abuse).
Any distribution is encouraged but Abracode retains copyright for the program and does not wish
to see any modified/incomplete versions distributed.
Cyclone Requirements:
Theory of operation
Text Encoding Converter (called TEC) is a Mac OS engine for handling
different languages using different character sets. It supports
many standards and is robust and pretty fast. Many applications
use it for their internal conversion needs, and that's great, but
I could not find a plain converter built on this engine.
So here comes Cyclone.
Highlights revisited
Because Cyclone uses TEC's conversion maps, it will grow with
TEC even if the program itself is no longer developed. When more
encodings appear in future incarnations of TEC, or any maps are
corrected or modified, Cyclone is supposed to pick them up without
any change. There is one exception: I hard-coded the names of encodings
because I was not satisfied with the names returned by TEC. If
Cyclone cannot find the name of a new encoding in its own resources,
it uses the name given by TEC. TEC does not change line endings
properly, so I added this option (any bugs in this area are mine);
see the “More details” section for specifications.
Cyclone can convert many files dragged onto it or chosen from the
standard file dialog.
Conversions
When you look at the conversion dialog you will see two sets
of pop-ups: left for input, right for output. Choose the standard/platform
first, then the specific encoding, and lastly the variant (if any variant
exists for the given encoding). You may choose whatever you want
for input and output encodings, but be aware that not
all conversions make sense: you cannot translate from Chinese
to Greek with TEC (not yet :-)). Sometimes you will get an error,
but sometimes not. You are responsible for choosing valid encodings
for input and output. You may use content sniffers, which can
help with the input encoding (see the “More details” section for a
description of sniffers), but do not rely on them.
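The difference between a conversion that fails loudly and one that fails silently can be seen with any converter. The following is a minimal Python sketch, not Cyclone or TEC code; the `convert` helper and the sample strings are made up for illustration, and the encoding names are Python codec names.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from `source`, then re-encode the text as `target`."""
    return data.decode(source).encode(target)

greek = "καλημέρα".encode("iso8859_7")  # Greek text stored as ISO 8859-7

# Case 1: a conversion that makes no sense and is reported as an error.
# Chinese characters have no representation in a Greek encoding.
try:
    convert("你好".encode("gb2312"), "gb2312", "iso8859_7")
    chinese_to_greek_failed = False
except UnicodeEncodeError:
    chinese_to_greek_failed = True

# Case 2: a wrong input encoding that produces no error at all.
# Every byte is valid ISO 8859-1, so the Greek text is silently misread.
misread = convert(greek, "iso8859_1", "utf-8").decode("utf-8")
correct = convert(greek, "iso8859_7", "utf-8").decode("utf-8")
```

In the second case the output is a well-formed file full of the wrong characters, which is why the choice of input encoding has to be checked by a person.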
Preferences
I implemented the following options to make my life easier (and
hopefully yours too):
Multiple file settings
More details for the very curious
Content sniffers
Content sniffing is a feature offered by TEC and used by Cyclone
when it is checked in the preferences.
When this option is active, Cyclone tries to suggest which input
encoding is used. Unfortunately, the current TEC version (1.5)
can guess content ONLY for Far East languages. So if you use
these languages frequently, this option is for you. Otherwise
you will be annoyed that Cyclone (or TEC, to be precise) suggests
Chinese or Japanese every time you want to convert plain ASCII text.
This option is turned off by default.
Content sniffing is not working correctly.
I do not use it and people seem not to care about it — this is why it is not fixed yet.
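One general reason why guessing the encoding of plain ASCII is hard: ASCII bytes are also valid in most Far East encodings, so the bytes alone do not rule any of them out. The Python sketch below illustrates only this general point. It says nothing about how TEC's sniffers work internally, and the codec names are Python's, not TEC sniffer names.

```python
sample = b"Plain ASCII text, nothing else."

# Far East encodings chosen for illustration (Python codec names).
candidates = ["ascii", "shift_jis", "euc_jp", "gb2312", "big5", "euc_kr"]

def encodings_that_accept(data, names):
    """Return every encoding in `names` that decodes `data` without error."""
    accepted = []
    for name in names:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        accepted.append(name)
    return accepted

# Every candidate accepts the sample, so none can be excluded by the bytes alone.
matches = encodings_that_accept(sample, candidates)
```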
Sniffers available in TEC 1.4.3 and 1.5 (in order of appearance):
Macintosh:
Line Breaks
As mentioned before, TEC does not change the line breaks to match
the output standard. For example, when you convert from Mac to
Windows, everything is converted correctly except for line endings, which
remain in the Mac standard. So an option to change the line breaks
has been added. Here are the rules for output standards when the
"Match output standard" option is chosen:
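As a general illustration of what changing line breaks involves, here is a minimal Python sketch. It is not Cyclone's code: the `change_line_breaks` helper and the `LINE_BREAKS` table are hypothetical, and the table lists only the conventional CR, LF, and CR+LF endings without assuming which one Cyclone selects for a given output standard.

```python
import re

# Conventional line endings for three platforms (illustrative table only).
LINE_BREAKS = {"mac": "\r", "unix": "\n", "windows": "\r\n"}

def change_line_breaks(text: str, target: str) -> str:
    """Rewrite every CR, LF, or CR+LF line break in `text` as the target's break."""
    # CR+LF is listed first so that the pair is replaced once, not twice.
    return re.sub(r"\r\n|\r|\n", lambda _match: LINE_BREAKS[target], text)

mac_text = "first line\rsecond line\r"
windows_text = change_line_breaks(mac_text, "windows")
unix_text = change_line_breaks(windows_text, "unix")
```

Matching the two-character CR+LF sequence before the single characters keeps a Windows file from gaining an extra blank line for every break.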
Unicode and HTML
HTML writers, please note that if you are building a page where
most (or all) characters are ASCII, the encoding of choice
for you is Unicode UTF-8. If all characters are ASCII, the length
of your page will be exactly the same as if no Unicode were used.
To inform a browser that Unicode UTF-8 is used, type:
<META HTTP-EQUIV="content-type" CONTENT="text/html;charset=UTF-8">
between <HEAD> and </HEAD> at the beginning of your file.
You should not use PS (the Unicode paragraph separator, 0xE280A9 in UTF-8)
as a line break because most browsers do not support it.
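The size claim and the PS byte sequence are easy to check outside Cyclone. This is a small Python sketch with a made-up sample page, shown only to make the numbers concrete.

```python
page = "<HTML><HEAD><TITLE>Plain page</TITLE></HEAD></HTML>"

# An all-ASCII page is byte-for-byte identical in ASCII and UTF-8.
ascii_bytes = page.encode("ascii")
utf8_bytes = page.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# A non-ASCII character costs extra bytes: the final letter here takes two.
accented_length = len("caf\u00e9".encode("utf-8"))

# The paragraph separator (PS, U+2029) becomes three bytes in UTF-8.
paragraph_separator = "\u2029".encode("utf-8")
```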
More Unicode notes
The registered type for standard Unicode (UTF-16) text is 'utxt'
(used for file and clipboard), while plain 8-bit text uses 'TEXT'.
You may not be able to see the content of the clipboard or paste
it if the application you use does not support Unicode. Unicode
UTF-8 and UTF-7 remain 'TEXT'.
Each standard Unicode (UTF-16) text produced by Cyclone has a
byte-order mark (0xFEFF) at the beginning to ensure 100% portability.
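How the byte-order mark helps a reader of the file can be sketched in Python. This is an illustration, not Cyclone's code: the `to_utf16_with_bom` helper is hypothetical, and big-endian byte order is an assumption made for the example.

```python
import codecs

def to_utf16_with_bom(text: str) -> bytes:
    """Encode `text` as big-endian UTF-16, prefixed with the byte-order mark."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")

data = to_utf16_with_bom("Hi")

# The generic "utf-16" decoder reads the mark, picks the byte order from it,
# and strips it from the decoded text.
round_trip = data.decode("utf-16")

# The same text in the opposite byte order is recognized just as well.
little_endian = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
round_trip_le = little_endian.decode("utf-16")
```

Because the mark travels with the text, the reader does not need to know in advance which byte order the writer used.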
Scripting
Beginning with version 1.1, “Cyclone” is scriptable via AppleScript. Please see
the sample scripts provided in the “Scripting” folder. A document entitled
“Encodings Dictionary” contains predefined encoding names which can be used in scripts.
Available AppleScript commands:
convert <file_list> from <encoding> to <encoding>
convert clipboard from <encoding> to <encoding>
convert text <some_text> from <encoding> to <encoding>
Beginning with version 1.3, you may pass an Internet name for an encoding.
This option is available with any “convert” command: “convert”,
“convert text”, “convert clipboard”:
convert some_file from "ISO-8859-1" to "UTF-8"
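For readers who want to see what such a conversion does to the bytes, here is an equivalent operation sketched in Python rather than AppleScript. It does not call Cyclone; the `convert_bytes` helper and the sample text are made up, and Python's codec registry happens to accept the same Internet names.

```python
import codecs

def convert_bytes(data: bytes, source: str, target: str) -> bytes:
    """Decode `data` using the `source` encoding and re-encode it as `target`."""
    return data.decode(source).encode(target)

latin1 = b"na\xefve caf\xe9"  # two accented letters, one byte each
utf8 = convert_bytes(latin1, "ISO-8859-1", "UTF-8")

# Internet names are aliases: both spellings resolve to the same codec.
same_codec = codecs.lookup("ISO-8859-1").name == codecs.lookup("latin-1").name
```

Each accented letter grows from one byte to two, so the UTF-8 result is two bytes longer than the input.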
Setting Options:
set option <an_option>
Cyclone 1.5 adds support for optionally setting the line breaks in the exported file.
This option is available with any “convert” command: “convert”,
“convert text”, “convert clipboard”:
The following sample demonstrates the syntax:
convert a_file from Mac_Roman to "UTF-8" with UnixLineBreaks
The available options are:
The future
Beginning with version 1.6, the source code is open and developers are welcome to submit code additions.
Small print
The author gives no warranty for this software and takes no responsibility
for any damages that it may cause. If you cannot accept it, please
delete your copy.
All trademarks are properties of their owners.
* the quotation is from Peter Hammill (“Gaia”).