Typing Bangla TeX/LaTeX Source (bangtex) Directly in Native Bangla (Unicode/UTF-8)


Introduction

bangtex: Bangla TeX and LaTeX

For typesetting Bangla documents in TeX/LaTeX, I find Palash B. Pal's excellent bangtex package to be the best option.

bangtex uses a particular ASCII transliteration of Bangla (a specific orthographically accurate romanization of Bangla), which also has a few mechanisms for better readability (such as the \*...* construct and the superfluous use of the letter o). The seicor script can further improve readability of the source.



The Problem

Can we avoid romanization in typing LaTeX source in Bangla?

As a native Bangla user who regularly exchanges Bangla emails and writes Bangla webpages in HTML, I find it very convenient to type text directly in Bangla into text editors that display the typed text in native Bangla symbols. Such editors store the text in the now-standard (and essential) font-independent unicode UTF-8 character encoding rather than in any romanized form.

So for me, it seemed painful to learn a specific romanized form (ASCII transliteration) of written Bangla for typing LaTeX source documents.

Could I somehow write TeX/LaTeX source files directly in Bangla and still use bangtex?



The Solution: Use This Script to Convert Bangla Unicode into bangtex's Transliterated ASCII Format

Typing LaTeX source documents directly in unicode Bangla

A simple solution was to prepare the LaTeX source document using unicode UTF-8 encoded Bangla text, and then use a special script called uni2bangtex.perl to convert it into transliterated ASCII in bangtex format. Since unicode UTF-8 encoding is a superset of ASCII, the ASCII needed to type the LaTeX commands can be freely mixed within the UTF-8 Bangla source text.
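The idea that ASCII LaTeX commands and Bangla text can share one file can be sketched in a few lines of Python. This is only an illustration, not the logic of uni2bangtex.perl: the function names (is_bangla, split_runs, convert) are hypothetical, and the actual bangtex transliteration table is deliberately left out, with a placeholder standing in for it. The one fact relied on is that the Unicode Bengali block covers U+0980 through U+09FF.

```python
from itertools import groupby

BENGALI_BLOCK = range(0x0980, 0x0A00)  # Unicode Bengali block, U+0980..U+09FF


def is_bangla(ch):
    """True if ch lies in the Unicode Bengali block."""
    return ord(ch) in BENGALI_BLOCK


def split_runs(text):
    """Split text into maximal runs, each tagged True (Bangla) or False (other)."""
    return [(flag, "".join(chars)) for flag, chars in groupby(text, key=is_bangla)]


def convert(text, transliterate):
    """Apply `transliterate` to Bangla runs only; copy everything else verbatim."""
    return "".join(
        transliterate(run) if flag else run for flag, run in split_runs(text)
    )


if __name__ == "__main__":
    # \section{বাংলা} % heading   (the Bangla word is written with escapes)
    line = "\\section{\u09ac\u09be\u0982\u09b2\u09be} % heading"
    # Placeholder transliterator (an assumption): it only marks the Bangla run
    # and does NOT produce real bangtex transliteration.
    print(convert(line, lambda run: "<%d Bangla chars>" % len(run)))
    # prints: \section{<5 Bangla chars>} % heading
```

The LaTeX command, braces, and comment pass through untouched, and only the Bangla run is handed to the converter. A real converter would likely also need to handle characters used in Bangla text that sit outside this block, such as the danda (U+0964).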

This means that I can use any unicode UTF-8 editor to prepare the LaTeX source directly in Bangla, with any appropriate Bangla keyboard input method (phonetic, inscript, etc) and any Bangla font for UTF-8 (usually truetype or opentype); see the screenshot below. If your native language is Bangla, you will probably find this to be a faster, more pleasant, more intuitive, and less error-prone way to type the source LaTeX document than using a specific romanized form (ASCII transliteration) of Bangla.



An Example

Preparing a sample LaTeX Bangla document

Let us go through an example showing how to prepare a sample Bangla LaTeX document named smpldoc:



System Requirements

What you need on your computer

For a setup like the one I have described above, you will need to have the following installed on your computer:



Download

The perl script file and this webpage

Download the perl script uni2bangtex.perl. The script needs perl-5.8.5 or newer.

You can also download a complete tarball of this webpage, with all the sample files and images in it.



About unicode and UTF-8

What is unicode and UTF-8?

Unicode is a standard for font-independent and orthographically accurate digital representation of written language using character codes. The role unicode plays for general languages is identical to the role played by the ASCII code for English. In particular, there is a perfect one-to-one correspondence between Bangla unicode and written Bangla which preserves all spellings.

In this way, unicode can be viewed as an extension of ASCII to encode the characters of all other languages. In fact, a specific unicode encoding scheme called UTF-8 is designed in such a way that it is a direct superset of ASCII. Thus a UTF-8 text document can contain ASCII characters, and an ASCII text document is simply a special type of UTF-8 text document.
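The superset property is easy to check for yourself. The short Python sketch below (the variable names are only illustrative) compares the bytes of an ASCII LaTeX command under both encodings and then looks at how one Bangla letter is stored in UTF-8.

```python
ascii_text = "\\begin{document}"   # 16 ASCII characters
bangla_letter = "\u0995"           # BENGALI LETTER KA

# ASCII text has exactly the same bytes whether encoded as ASCII or as UTF-8.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)                  # True

# A Bangla letter takes three bytes in UTF-8, none of them in the ASCII range.
ka_bytes = bangla_letter.encode("utf-8")
print(ka_bytes.hex(" "))           # e0 a6 95
print(all(b >= 0x80 for b in ka_bytes))  # True

# Mixed text is simply the concatenation: 16 ASCII bytes plus 3 bytes.
mixed = (ascii_text + bangla_letter).encode("utf-8")
print(len(mixed))                  # 19
```

Because every byte of a non-ASCII character has its high bit set, the ASCII bytes of LaTeX commands can never be confused with pieces of a Bangla character.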

To learn more, see the UTF-8 and Unicode FAQ for Unix/Linux. On recent Linux systems, you can look up the manpages for unicode(7), utf-8(7), and charsets(7).


Why use unicode for encoding Bangla text in this way?

We note the following points in support of unicode for encoding Bangla text.



Appendix

Setting up Bangla unicode text support in Linux / X windows

You need three things for using Bangla unicode text on Linux:
  1. A text editor with UTF-8 support. (This means a simple character-based text editor, not a word processor such as OpenOffice or MS Word.)

    This may already be present in your system, as most modern operating systems with graphical desktops have a default GUI editor with this feature. For example, many Linux distributions include GNOME's default text editor gedit or KDE's default text editors Kate and/or KWrite; MS Windows comes with Notepad; Mac OS X has TextEdit; and all of these now support multilingual UTF-8. If your system does not have one, you may want to install and use the simple free classic UTF-8 editor yudit. Other choices are possible, such as the GNU super-editor Emacs. See Wikipedia's Comparison of text editors.

  2. To display Bangla text, the editor will need a font capable of rendering unicode UTF-8 Bangla. This will usually be an opentype or truetype font.

    Note that this font is only for displaying Bangla in the text editor in which you prepare the LaTeX source document, and has nothing to do with the font of the final document output by bangtex (such as bpsf).

    Again, most modern operating systems now come with default fonts for displaying most of unicode UTF-8, and so it may not be necessary to install any special Bangla UTF-8 font, unless you do not like the system default fonts for displaying Bangla.

    See Bangla script display help at Wikipedia and Bangla Wiktionary for more details.

    If you need to install Bangla fonts:

  3. A keyboard input method for typing Bangla UTF-8 characters using a romanized keyboard (usually QWERTY).

    Once again, most modern operating systems provide keyboard layouts for various languages and a way to switch between various layouts. The default layout usually is a form of English, which maps keyboard scan codes into ASCII characters. Switching to a different layout will cause this map to change, and a Bangla layout will map keyboard scan codes into Bangla UTF-8 characters instead of ASCII characters.

    See Bangla script input help at Wikipedia and Bangla Wiktionary for more details.

    There are different types of layouts for Bangla available, such as phonetic, non-phonetic, etc.

    If you are used to typing on QWERTY keyboards using primarily a language with essentially Roman script (English, German, French, Spanish, Italian, etc) and you are new to Bangla typing, then you will probably find a phonetic layout to be the easiest to use. For an X-window based system, a Bangla phonetic layout called probhat (picture of layout) is generally available. I personally use a variant of it, which I call suprobhat. Another possibility for a modern phonetic layout is baisakhi (PDF document, picture of layout), developed by SNLTR.

    Also see this explanation of phonetic Bangla typing.


Low Level Keyboard Layout Switching in X windows

Warning! You should not use this method unless you really know what you are doing, or else it can make your computer unusable. If you use a modern distribution of Linux with a graphical desktop manager such as GNOME or KDE, you will most likely have a way (perhaps a menu in your desktop manager or a graphical applet) to switch your keyboard layout, and you should use it to select a Bangla layout of your choice (e.g., Probhat).

If you really want to use this low level method (bypassing your desktop manager) to switch to a new keyboard layout, use the setxkbmap command to instruct the X windows server directly to switch to one of the xkb keyboard layouts, which are defined in files in the directory /etc/X11/xkb/symbols/. Look there for a file named in (for India), bd (for Bangladesh), ben, or bang, which should have an entry for the Probhat layout, named ben_probhat or simply probhat. (You can also download the layout here. I personally use a variant of probhat which I call suprobhat.)

E.g., if the file /etc/X11/xkb/symbols/in has a layout entry called ben_probhat, you can activate it by a command such as

    setxkbmap -model pc101 -layout "us,in(ben_probhat)" -option "grp:shifts_toggle,grp_led:num"
or
    setxkbmap -model pc101 -layout "us,in(ben_probhat)" -option "grp:shift_toggle,grp_led:num"
depending on the version of your X. This will set things up in xkb so that pressing the two shift keys together will toggle between the standard US (English) and the ben_probhat (Bangla) keyboard layouts.
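Before running setxkbmap, it can help to check which layout entries your symbols files actually define. The Python sketch below is one way to do that, under a few assumptions: the function names (list_variants, find_variants) are hypothetical, the directory and candidate file names are the ones mentioned above, and each layout entry is assumed to be declared with a line of the form xkb_symbols "name" {, as is usual in xkb symbols files. It only reads the files and changes nothing.

```python
import re
from pathlib import Path

# Matches declarations such as:  xkb_symbols "ben_probhat" {
VARIANT_RE = re.compile(r'xkb_symbols\s+"([^"]+)"')


def list_variants(symbols_file):
    """Return the entry names declared in one xkb symbols file."""
    text = Path(symbols_file).read_text(encoding="utf-8", errors="replace")
    return VARIANT_RE.findall(text)


def find_variants(symbols_dir, needle="probhat",
                  candidates=("in", "bd", "ben", "bang")):
    """Map each existing candidate file to its entry names containing `needle`."""
    found = {}
    for name in candidates:
        path = Path(symbols_dir) / name
        if path.is_file():
            hits = [v for v in list_variants(path) if needle in v]
            if hits:
                found[name] = hits
    return found


if __name__ == "__main__":
    # Directory taken from the text above; your system may keep symbols elsewhere.
    for layout, variants in find_variants("/etc/X11/xkb/symbols").items():
        for variant in variants:
            print("%s(%s)" % (layout, variant))  # the form used after -layout
```

Each printed line has the layout(entry) form that appears in the -layout argument of the setxkbmap commands above. If nothing is printed, the directory or the entry name is probably different on your system.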



Web Resources

List of useful websites



Abhijit Dasgupta
Thu Oct 7 03:11:55 EDT 2010