mirror of https://github.com/rkd77/elinks.git synced 2025-09-21 19:46:23 -04:00

Bug 381: Store codepage-to-Unicode mappings as dense arrays.

Previously, each mapping between a codepage byte and a Unicode
character was stored as a struct table_entry, which listed both the
byte and the character.  This representation may be optimal for sparse
mappings, but codepages map almost every possible byte to a character,
so it is more efficient to just have an array that lists the Unicode
character corresponding to each byte from 0x80 to 0xFF.  The bytes are
not stored but rather implied by the array index.  The tcvn5712 and
viscii codepages have a total of four mappings that do not fit in the
arrays (each is an additional mapping for a byte that already has an
array entry), so we still use struct table_entry for those.
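The two layouts can be compared in a short C sketch. The names `sparse_entry`, `sparse_lookup` and `dense_from_sparse` are hypothetical, not the ELinks declarations. 0xFFFF is used as the "no mapping" value because that is what gen-cp writes into unfilled slots and into `highhalf_NULL`.

```c
#include <assert.h>
#include <stdint.h>

#define UNMAPPED 0xFFFF  /* filler gen-cp writes for bytes with no mapping */

/* Hypothetical sparse layout: each entry stores both the byte and the
 * character, and the list ends with a {0, 0} terminator. */
struct sparse_entry {
	unsigned char c;  /* codepage byte */
	uint16_t u;       /* Unicode code point */
};

/* Search the sparse list; the cost grows with the number of entries. */
static uint16_t sparse_lookup(const struct sparse_entry *table, unsigned char c)
{
	for (; table->c != 0; table++)
		if (table->c == c)
			return table->u;
	return UNMAPPED;
}

/* Build the dense form: slot i holds the character for byte 0x80 + i, so
 * the byte itself is never stored.  Entries below 0x80 are skipped. */
static void dense_from_sparse(const struct sparse_entry *table, uint16_t dense[0x80])
{
	for (int i = 0; i < 0x80; i++)
		dense[i] = UNMAPPED;
	for (; table->c != 0; table++)
		if (table->c >= 0x80 && dense[table->c - 0x80] == UNMAPPED)
			dense[table->c - 0x80] = table->u;  /* first mapping wins */
}
```

Each sparse entry pairs a one-byte key with a two-byte value, so it typically occupies 4 bytes once alignment padding is added. The dense form spends exactly 2 bytes per byte value, 256 bytes for the whole upper half.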

This change also makes cp2u() operate in O(1) time and may speed up
other functions as well.
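With the dense layout, a lookup in the spirit of cp2u() needs no search at all. The following is a minimal sketch. It assumes a hypothetical `codepage_sketch` descriptor and that bytes below 0x80 are plain ASCII. It ignores the leftover struct table_entry list, and it is not the actual cp2u() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical codepage descriptor carrying only the dense upper half. */
struct codepage_sketch {
	const char *name;
	const uint16_t *highhalf;  /* 128 entries for bytes 0x80..0xFF */
};

/* Constant-time byte-to-Unicode conversion: one comparison and one array
 * index, whatever the codepage.
 * Assumption: bytes below 0x80 are ASCII and map to themselves. */
static uint16_t cp2u_sketch(const struct codepage_sketch *cp, unsigned char c)
{
	assert(cp != NULL && cp->highhalf != NULL);
	if (c < 0x80)
		return c;
	return cp->highhalf[c - 0x80];  /* 0xFFFF: byte has no mapping */
}
```

Because the lookup never walks a list, it costs the same for a codepage with 128 high-half mappings as for one with a handful.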

The "sed | while read" concoction in Unicode/gen-cp looks rather
unhealthy.  It would probably be faster and more readable if rewritten
in Perl, but IMO that goes for the previous version as well, so I
suppose whoever wrote it had a reason not to use Perl here.

Before:

   text	   data	    bss	    dec	    hex	filename
  38948	  28528	   3311	  70787	  11483	src/intl/charsets.o
 500096	  85568	  82112	 667776	  a3080	src/elinks

After:

   text	   data	    bss	    dec	    hex	filename
  31558	  28528	   3311	  63397	   f7a5	src/intl/charsets.o
 492878	  85568	  82112	 660558	  a144e	src/elinks

So the text section of src/intl/charsets.o shrank by 7390 bytes, and
that of src/elinks by 7218 bytes.
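In `size` output the dec column is text + data + bss, and the hex column is the same total in base 16, so the tables above can be checked mechanically. The `size_row` type and `size_row_consistent` helper below are illustrative only, not part of ELinks or binutils.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of Berkeley-format `size` output (text, data and bss columns). */
struct size_row {
	unsigned long text, data, bss;
};

/* `size` prints dec = text + data + bss and hex = the same total in base 16.
 * Returns nonzero if the given dec and hex columns agree with that rule. */
static int size_row_consistent(struct size_row r, unsigned long dec, const char *hex)
{
	char buf[32];
	unsigned long total = r.text + r.data + r.bss;

	snprintf(buf, sizeof buf, "%lx", total);
	return total == dec && strcmp(buf, hex) == 0;
}
```

All four rows pass this check. Since data and bss are identical before and after, the whole change in dec comes from the text column.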

Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls
--disable-cookies --disable-formhist --disable-globhist
--disable-mailcap --disable-mimetypes --disable-smb --disable-mouse
--disable-sysmouse --disable-leds --disable-marks --disable-css
--enable-small --enable-utf-8 --without-gpm --without-bzlib
--without-idn --without-spidermonkey --without-lua --without-gnutls
--without-openssl CFLAGS="-Os -ggdb -Wall"
Kalle Olavi Niemitalo
2006-09-24 16:55:29 +03:00
committed by Kalle Olavi Niemitalo
parent 0e88f8ba28
commit 4a5af7fd26
3 changed files with 4494 additions and 4139 deletions


@@ -23,19 +23,52 @@ for i in $codepages; do
echo "/*** $i ***/"
echo
echo 'const struct table_entry table_'$i' [] = {'
# TODO: Comments inside of the structure are ugliness in a pure clean
# form, and my aesthetical feeling shivers upon glancing at it. However
# we should handle commentless records. A loop with read inside would
# be ideal, I suppose. --pasky
tail -n +3 $i.cp | sed 's/# *\(.*\) *$/\/* \1 *\/ /' | grep '^0x[89a-zA-Z]' \
| sed 's/[ ][ ]*/ /g' | sed 's/[ ]*$/ },/' | sed 's/ /, /' \
| sed 's/^[ ]*/ {/' | grep '.*,.*,'
echo ' {0, 0}'
echo '};'
echo
sed '1,2d
/^[ ]*\(#.*\)\{,1\}$/d
h
s/^[^#]*//
s!#[ ]*\(.*\)!/* \1 */!
x
s/#.*//
y/Xabcdef/xABCDEF/
/^0x[01234567]/d
/[^0x0123456789ABCDEF ]/d
G
s/\n//' "$i.cp" | {
for left in 8 9 A B C D E F; do
for right in 0 1 2 3 4 5 6 7 8 9 A B C D E F; do
eval "high0x$left$right="
done
done
table=
highuse=
while read byte unicode comment; do
if eval "[ \"\$high$byte\" ]"; then
table="$table {$byte, $unicode},${comment+ }$comment
"
else
eval "high$byte=\"\$unicode,\${comment+ }\$comment\""
highuse=1
fi
done
if [ "$highuse" ]; then
printf "const uint16_t highhalf_%s [] = {\n" "$i"
for left in 8 9 A B C D E F; do
for right in 0 1 2 3 4 5 6 7 8 9 A B C D E F; do
eval "printf \"\\t/* %s */ %s\\n\" \"0x$left$right\" \"\${high0x$left$right:-0xFFFF,}\""
done
done
printf "};\n\n"
else
printf "#define highhalf_%s highhalf_NULL\n\n" "$i"
fi
if [ "$table" ]; then
printf "const struct table_entry table_%s [] = {\n%s\t{0, 0}\n};\n" "$i" "$table"
else
printf "#define table_%s table_NULL\n" "$i"
fi
printf "\n"
}
echo 'unsigned char *const aliases_'$i' [] = {'
head -n 2 $i.cp | tail -n +2 | sed 's/ \+/ /g; s/ $//; s/\", /\",<2C>/g; s/$/,/' | tr "<22>" "\n" \
@@ -45,11 +78,21 @@ for i in $codepages; do
n=`expr $n + 1`
done
printf "\n/*** NULL ***/\n\n"
printf "const uint16_t highhalf_NULL [] = {\n"
for r in `seq 16`; do
printf "\t0xFFFF,0xFFFF,0xFFFF,0xFFFF, 0xFFFF,0xFFFF,0xFFFF,0xFFFF,\n"
done
printf "};\n\n"
printf "const struct table_entry table_NULL [] = {\n"
printf "\t{0, 0}\n"
printf "};\n"
echo
echo 'const struct codepage_desc codepages [] = {'
for i in $codepages; do
echo ' {"'`head -n 1 $i.cp`'", aliases_'$i', table_'$i'},'
echo ' {"'`head -n 1 $i.cp`'", aliases_'$i', highhalf_'$i', table_'$i'},'
done
echo ' {NULL, NULL, NULL}'
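For readers who find the shell hard to follow, the rule applied by the new `sed | while read` loop in gen-cp can be restated in C. The first mapping seen for each byte 0x80..0xFF fills that byte's slot in the dense `highhalf_` array. Any further mapping for an already-filled byte goes to the leftover `table_` list, which ends with `{0, 0}`. Slots that stay empty are printed as 0xFFFF. The types, the function name and the fixed leftover capacity below are hypothetical; this restates the script's logic and is not code from ELinks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNMAPPED 0xFFFF   /* what gen-cp prints for empty slots */
#define MAX_LEFTOVERS 16  /* arbitrary capacity for this sketch */

/* One (byte, Unicode) pair from a .cp file after gen-cp's sed filtering. */
struct cp_pair {
	unsigned char byte;
	uint16_t unicode;
};

/* Hypothetical in-memory version of what gen-cp emits per codepage. */
struct split_result {
	uint16_t highhalf[0x80];                  /* dense slots, 0x80..0xFF */
	int highuse;                              /* was any slot filled? */
	struct cp_pair table[MAX_LEFTOVERS + 1];  /* repeats, {0, 0}-terminated */
	size_t ntable;
};

/* The first mapping for a byte fills its dense slot; any later mapping for
 * the same byte is kept as a leftover table entry, as the while loop does. */
static void split_mappings(const struct cp_pair *pairs, size_t n,
			   struct split_result *out)
{
	unsigned char filled[0x80] = {0};

	out->highuse = 0;
	out->ntable = 0;
	for (size_t i = 0; i < 0x80; i++)
		out->highhalf[i] = UNMAPPED;

	for (size_t i = 0; i < n; i++) {
		if (pairs[i].byte < 0x80)
			continue;  /* the sed script deletes 0x00..0x7F lines */
		size_t idx = pairs[i].byte - 0x80;
		if (filled[idx]) {
			assert(out->ntable < MAX_LEFTOVERS);
			out->table[out->ntable++] = pairs[i];
		} else {
			out->highhalf[idx] = pairs[i].unicode;
			filled[idx] = 1;
			out->highuse = 1;
		}
	}
	out->table[out->ntable].byte = 0;  /* like the generated {0, 0} row */
	out->table[out->ntable].unicode = 0;
}
```

When `highuse` stays 0, the script emits `#define highhalf_<cp> highhalf_NULL` instead of an array. When there are no leftovers, it emits `#define table_<cp> table_NULL`.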