OSDev.org

The Place to Start for Operating System Developers
It is currently Thu Mar 28, 2024 3:05 pm

All times are UTC - 6 hours




 Post subject: Implementing non-English language in OS
Posted: Thu Mar 03, 2016 8:41 am
Member

Joined: Fri Apr 03, 2015 9:41 am
Posts: 492
My friend tells me that my OS is of more interest to Russians, so I need to implement Russian language support in it. I don't agree: who needs Russian in an alpha-stage OS? So the question is: whom should I listen to, me or my friend?

_________________
Developing U365.
Source:
only testing: http://gitlab.com/bps-projs/U365/tree/testing

OSDev newbies can copy any code from my repositories; just leave a notice that the code was written by the U365 development team, not by you.


 Post subject: Re: Implementing non-English language in OS
Posted: Thu Mar 03, 2016 9:12 am
Member

Joined: Sat Mar 31, 2012 3:07 am
Posts: 4591
Location: Chichester, UK
Listen to yourself.


 Post subject: Re: Implementing non-English language in OS
Posted: Thu Mar 03, 2016 9:24 am
Member

Joined: Fri Apr 03, 2015 9:41 am
Posts: 492
OK, thanks.



 Post subject: Re: Implementing non-English language in OS
Posted: Thu Mar 03, 2016 10:17 am
Member

Joined: Wed Oct 18, 2006 3:45 am
Posts: 9301
Location: On the balcony, where I can actually keep 1½m distance
It can be a pain to introduce internationalisation later in development if precautions aren't taken early, and the alpha stage might be a good moment to try it out. But in the end it's just as iansjack says: Your project, your rules.

Psst, you can also look at it differently: it's worth a few invisible points to beat the master on a niche subject and be really bilingual. Who here can say they have better user support than, for instance, Brendan's OS? ;) Of course, only if you're interested.

_________________
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]


 Post subject: Re: Implementing non-English language in OS
Posted: Thu Mar 03, 2016 12:33 pm
Member

Joined: Fri Apr 03, 2015 9:41 am
Posts: 492
I have UTF-8 and Russian fonts now, but the Russian one isn't used yet, and the UTF-8 one is very glitchy.
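If the glitches come from the decoding side rather than from the font itself, a common cause is a decoder that trusts its input. Below is a minimal sketch of a validating UTF-8 decoder in C; `utf8_decode` is a hypothetical name, not U365's code. It returns U+FFFD for truncated sequences, bad continuation bytes, overlong forms, surrogates and values above U+10FFFF, and resynchronizes one byte later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTF8_REPLACEMENT 0xFFFDu

/* Decode one code point from s[0..len). Stores the number of bytes consumed
 * in *used (at least 1 when len > 0) and returns the code point, or U+FFFD
 * if the sequence is malformed. Hypothetical helper for illustration. */
uint32_t utf8_decode(const unsigned char *s, size_t len, size_t *used)
{
    uint32_t cp, min;
    size_t n, i;

    if (len == 0) { *used = 0; return UTF8_REPLACEMENT; }

    unsigned char b = s[0];
    if (b < 0x80)                { *used = 1; return b; }   /* ASCII */
    else if ((b & 0xE0) == 0xC0) { n = 2; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { n = 3; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { n = 4; cp = b & 0x07; min = 0x10000; }
    else { *used = 1; return UTF8_REPLACEMENT; }  /* stray continuation or invalid lead byte */

    if (n > len) { *used = 1; return UTF8_REPLACEMENT; }   /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) { *used = 1; return UTF8_REPLACEMENT; }
        cp = (cp << 6) | (s[i] & 0x3F);   /* append 6 payload bits */
    }
    *used = n;
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return UTF8_REPLACEMENT;          /* overlong, out of range, or surrogate */
    return cp;
}
```

A caller would loop over the string, advancing by `used` after each call, and hand each code point to the glyph lookup.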



 Post subject: Re: Implementing non-English language in OS
Posted: Thu Mar 03, 2016 3:00 pm
Member

Joined: Thu Jul 03, 2014 5:18 am
Posts: 84
Location: The Netherlands
I think if you intend to support foreign character sets and localization, you should at least keep it in mind while designing your code. Maybe make a framework to support this stuff and then only write an English implementation for now? This way you can always implement Russian when you feel like it.
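A minimal sketch of that idea in C, assuming user-visible strings are referred to by numeric IDs rather than written inline as English literals. The `msg_id` enum, the tables and `tr()` are hypothetical names. Only the English table has to be complete; any entry missing from the active language falls back to English.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Every user-visible string gets an ID instead of being hard-coded. */
enum msg_id { MSG_WELCOME, MSG_SHUTDOWN, MSG_COUNT };

/* The English table is the reference implementation and must be complete. */
static const char *const msgs_en[MSG_COUNT] = {
    [MSG_WELCOME]  = "Welcome",
    [MSG_SHUTDOWN] = "It is now safe to turn off your computer",
};

/* A later translation may be partial: NULL entries fall back to English. */
static const char *const msgs_ru[MSG_COUNT] = {
    [MSG_WELCOME] = "Добро пожаловать",
};

static const char *const *current_lang = msgs_en;

void set_language(const char *const *table) { current_lang = table; }

/* Look up a message in the active language, falling back to English. */
const char *tr(enum msg_id id)
{
    if (id >= MSG_COUNT) return "";
    const char *s = current_lang[id];
    return s ? s : msgs_en[id];
}
```

Adding Russian later then means filling in `msgs_ru`, without touching the code that prints messages.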

_________________
My blog: http://www.rivencove.com/


 Post subject: Re: Implementing non-English language in OS
Posted: Fri Mar 04, 2016 4:07 am
Member

Joined: Sun Feb 01, 2009 6:11 am
Posts: 1070
Location: Germany
dseller wrote:
I think if you intend to support foreign character sets and localization, you should at least keep it in mind while designing your code. Maybe make a framework to support this stuff and then only write an English implementation for now? This way you can always implement Russian when you feel like it.

If you're not really careful, the problem with that is that you'll only notice the problems in your framework when it's too late. Implementing only one language means that you're prone to write your code as if other languages were just English with different words. But they aren't.
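Plurals are a classic example. English has two forms ("1 file", "2 files"), while Russian integers need three: один файл, два файла, пять файлов. A sketch of the selection logic in C, following the integer part of the CLDR plural rules for Russian; the enum and function names are hypothetical, and fractional numbers (which have their own rules) are ignored.

```c
#include <assert.h>
#include <string.h>

enum plural_form { PLURAL_ONE, PLURAL_FEW, PLURAL_MANY };

/* English: "1 file", everything else "files" (its "other" form, mapped to MANY). */
enum plural_form plural_en(unsigned long n)
{
    return n == 1 ? PLURAL_ONE : PLURAL_MANY;
}

/* Russian, integers only (after CLDR):
 *   one:  n % 10 == 1 and n % 100 != 11            -> 1, 21, 101
 *   few:  n % 10 in 2..4 and n % 100 not in 12..14 -> 2, 3, 24
 *   many: everything else                          -> 0, 5, 11, 12 */
enum plural_form plural_ru(unsigned long n)
{
    unsigned long d = n % 10, dd = n % 100;
    if (d == 1 && dd != 11) return PLURAL_ONE;
    if (d >= 2 && d <= 4 && (dd < 12 || dd > 14)) return PLURAL_FEW;
    return PLURAL_MANY;
}

/* A translation then supplies one string per form. */
static const char *const files_ru[] = { "файл", "файла", "файлов" };
```

An English-only framework with a single `if (n == 1)` check has no place to put the third Russian form.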

_________________
Developer of tyndur - community OS of Lowlevel (German)


 Post subject: Re: Implementing non-English language in OS
Posted: Fri Mar 04, 2016 7:20 am
Member

Joined: Fri Apr 03, 2015 9:41 am
Posts: 492
dseller wrote:
I think if you intend to support foreign character sets and localization, you should at least keep it in mind while designing your code. Maybe make a framework to support this stuff and then only write an English implementation for now? This way you can always implement Russian when you feel like it.

That was done a long time ago.



 Post subject: Re: Implementing non-English language in OS
Posted: Thu Mar 10, 2016 8:26 am
Member

Joined: Tue Oct 17, 2006 11:33 pm
Posts: 3882
Location: Eindhoven
I picked the name of my OS project specifically so that it would be impossible to write until UTF-8 support was properly added: it's Rødvin. That said, I've also implemented UTF-8 decoding and glyph drawing, so everything should support any non-English language written in UTF-8.
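For example, the "ø" in Rødvin is U+00F8, which UTF-8 stores as the two bytes C3 B8, so the six-letter name takes seven bytes. A minimal encoder sketch in C (`utf8_encode` is a hypothetical name, not Rødvin's API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp into out (room for at least 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and
 * values above U+10FFFF, which are not valid Unicode scalar values. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```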


 Post subject: Re: Implementing non-English language in OS
Posted: Thu Mar 10, 2016 1:35 pm
Member

Joined: Wed Jan 06, 2010 7:07 pm
Posts: 792
There is more to non-English languages than just rendering UTF-8 glyphs: layout and shaping can get pretty complex.
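One small example of why code points are not glyph cells: "é" can be stored either as the single code point U+00E9 or as "e" followed by U+0301 COMBINING ACUTE ACCENT, and both should occupy one cell on screen. The C sketch below counts cells in an already-decoded code point array, under the simplifying assumption that only the Combining Diacritical Marks block (U+0300 to U+036F) is combining. Real layout needs the full Unicode grapheme-cluster rules and a shaping engine on top of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplification: only U+0300..U+036F are treated as combining marks. */
static int is_combining(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Count the cells a naive renderer needs: a combining mark is drawn on top
 * of the preceding base character instead of taking a new cell. A mark with
 * no base character before it still gets a cell of its own. */
size_t count_cells(const uint32_t *cps, size_t n)
{
    size_t cells = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_combining(cps[i]) || cells == 0)
            cells++;
    return cells;
}
```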

_________________
[www.abubalay.com]


 Post subject: Re: Implementing non-English language in OS
Posted: Fri Mar 11, 2016 1:50 am
Member

Joined: Thu Nov 16, 2006 12:01 pm
Posts: 7612
Location: Germany
Right-to-left and bidirectional writing. Characters that need to be larger than Latin glyphs. Different digit separators, date string formats and currency formats. Glyphs that are digits but not in the 0-9 range. Strings taking much more space than in English. Combining characters. The list goes on.
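On the point about digits outside 0-9: Unicode has many sets of decimal digits, for example Arabic-Indic (U+0660 to U+0669) and Devanagari (U+0966 to U+096F), so a number parser that only checks '0' to '9' rejects them. A C sketch, assuming input already decoded to code points and covering only these three ranges as an illustration; a complete version would be driven by the Unicode decimal-digit (Nd) property data.

```c
#include <assert.h>
#include <stdint.h>

/* Return the decimal value of a digit code point, or -1 if it is not
 * in one of the (deliberately few) digit ranges handled here. */
int digit_value(uint32_t cp)
{
    static const uint32_t zeros[] = {
        0x0030, /* ASCII 0-9 */
        0x0660, /* Arabic-Indic digits */
        0x0966, /* Devanagari digits */
    };
    for (unsigned i = 0; i < sizeof zeros / sizeof zeros[0]; i++)
        if (cp >= zeros[i] && cp <= zeros[i] + 9)
            return (int)(cp - zeros[i]);
    return -1;
}
```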

_________________
Every good solution is obvious once you've found it.


 Post subject: Re: Implementing non-English language in OS
Posted: Sat Mar 12, 2016 9:48 am
Member

Joined: Sat Mar 01, 2014 2:59 pm
Posts: 1146
Rusky wrote:
There is more to non-English languages than just rendering UTF-8 glyphs: layout and shaping can get pretty complex.
And of course with multi-language support your OS also needs to have a proper localisation framework.
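One piece such a framework has to provide is locale-dependent number formatting. A C sketch, assuming a hypothetical `struct locale_info` that holds only a digit-grouping separator: 1234567 becomes "1,234,567" for English and "1 234 567" for Russian. Russian conventionally groups with a no-break space; a plain ASCII space is used here for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-locale data; a real table would also hold the decimal
 * separator, date order, currency format and so on. */
struct locale_info {
    const char *group_sep;  /* inserted between groups of three digits */
};

static const struct locale_info locale_en = { "," };
static const struct locale_info locale_ru = { " " }; /* simplified: plain space */

/* Format n into buf (size bytes) using the locale's grouping separator.
 * Returns buf, or NULL if the buffer is too small. */
char *format_grouped(unsigned long n, const struct locale_info *loc,
                     char *buf, size_t size)
{
    char digits[24];
    size_t nd = 0, out = 0, sep_len = strlen(loc->group_sep);

    do {                                   /* collect digits, least significant first */
        digits[nd++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);

    while (nd > 0) {
        if (out + 1 >= size) return NULL;
        buf[out++] = digits[--nd];
        if (nd > 0 && nd % 3 == 0) {       /* a group boundary follows */
            if (out + sep_len >= size) return NULL;
            memcpy(buf + out, loc->group_sep, sep_len);
            out += sep_len;
        }
    }
    buf[out] = '\0';
    return buf;
}
```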

_________________
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.

Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing


 Post subject: Re: Implementing non-English language in OS
Posted: Fri May 13, 2016 3:07 am
Member

Joined: Sun Feb 20, 2011 2:01 pm
Posts: 110
In my case, I used UTF-16 as the internal character format, as that eased many of the issues. I used a simple, freely available Unicode bitmap font (I converted it to C format with a utility). Then you want to load language packs. My approach was to place one in the initrd and have the kernel use that. The format doesn't have to be complex: to begin with, you could just have a list of strings, each separated by a newline (you will probably want to move on to something like XML later, though).
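A sketch of the simplest format mentioned, one string per line, where line N holds message N. It assumes the pack has already been read from the initrd into a writable buffer as UTF-8 text (converting to an internal UTF-16 form would be a separate step); `langpack_parse` and `MAX_STRINGS` are hypothetical. The buffer is split in place, so no allocator is needed, which suits an early kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STRINGS 256

/* Split a newline-separated language pack in place. Each '\n' (and a
 * preceding '\r' from CRLF files) is replaced by a NUL, and strings[i]
 * points at line i. buf must have room for len + 1 bytes, because the last
 * line is terminated at buf[len]. Returns the number of strings found. */
size_t langpack_parse(char *buf, size_t len, const char *strings[MAX_STRINGS])
{
    size_t count = 0, start = 0;

    for (size_t i = 0; i <= len && count < MAX_STRINGS; i++) {
        if (i < len && buf[i] != '\n')
            continue;
        if (i == len && start == len)
            break;                        /* nothing after the final newline */
        buf[i] = '\0';                    /* terminate this line */
        if (i > start && buf[i - 1] == '\r')
            buf[i - 1] = '\0';            /* tolerate CRLF line endings */
        strings[count++] = buf + start;
        start = i + 1;
    }
    return count;
}
```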

If you're mainly aiming at supporting Russian, right-to-left (RTL) text isn't so important. But since I was attempting Hebrew, that was the nightmare part. Enjoy!
Needless to say, all of this requires a graphics mode.

Of course, there is a difference between a language pack and a good language pack. Generally, you will want the strings to be complete sentences so you don't get a complete grammatical screw-up.
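A common way to keep whole sentences while still inserting values is to give translators numbered placeholders, so each language can put the arguments in its own order. A C sketch; the `%1` to `%9` syntax and `format_msg` are assumptions for illustration, not an existing API (POSIX `printf` offers a similar idea with `%1$s`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy tmpl into out, replacing %1..%9 with args[0..8] and "%%" with "%".
 * Returns the length written, or (size_t)-1 if out is too small.
 * A placeholder with no matching argument is copied through unchanged. */
size_t format_msg(char *out, size_t size, const char *tmpl,
                  const char *const *args, size_t nargs)
{
    size_t o = 0;
    if (size == 0) return (size_t)-1;

    for (const char *p = tmpl; *p; p++) {
        const char *piece = p;   /* by default, copy this one character */
        size_t plen = 1;

        if (p[0] == '%' && p[1] == '%') {
            p++;                 /* "%%": emit a single '%' */
        } else if (p[0] == '%' && p[1] >= '1' && p[1] <= '9'
                   && (size_t)(p[1] - '1') < nargs) {
            piece = args[p[1] - '1'];
            plen = strlen(piece);
            p++;                 /* skip the digit */
        }
        if (o + plen >= size) return (size_t)-1;
        memcpy(out + o, piece, plen);
        o += plen;
    }
    out[o] = '\0';
    return o;
}
```

The language pack then stores a whole sentence such as "Copy %1 to %2?", and a translation is free to write "%2" before "%1" if its grammar requires it.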

Hope this helps.

_________________
Whoever said you can't do OS development on Windows?
https://github.com/ChaiSoft/ChaiOS


 Post subject: Re: Implementing non-English language in OS
Posted: Sat May 14, 2016 9:47 pm
Member

Joined: Tue Mar 06, 2007 11:17 am
Posts: 1225
You can encode all languages efficiently in UTF-8, and SQLite3 supports it fully (which is one reason so many end-user programs use SQLite3 internally).

You could port SQLite3 to your OS to add indexing, query and even "registry" capabilities for installed programs, configuration values, etc.

I prefer to use UTF-8. It's widely supported on the Web, which is why I think it's worth learning to encode it with my own code, and UTF-16 as well.
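Encoding UTF-16 by hand mostly comes down to surrogate pairs for code points above U+FFFF. A minimal C sketch (`utf16_encode` is a hypothetical name); for example, U+1F600 becomes the pair D83D DE00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode cp as UTF-16 into out (room for 2 units). Returns the number of
 * 16-bit units written, or 0 if cp is a surrogate or above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) {                          /* Basic Multilingual Plane */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF) return 0;
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```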


I would recommend using a database engine like SQLite3 and building a database that contains all words in all languages. Then put all synonyms, antonyms, paronyms, combinations, etc. of a word, in all languages, in one row. This will help you greatly with searching and indexing: it will help you create automatic translations, even for the GUI of your programs, and search documentation and code more efficiently. It will make it possible to search in one language, or for one word, and find matches in all languages, in all related variants of the word, and for all of its synonyms, antonyms, etc.:

This is a simple table definition for that:
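Code:
-- PRAGMA encoding must come first: it only takes effect on a new, empty database
-- (UTF-8 is already SQLite's default for new databases).
PRAGMA encoding = 'UTF-8';
-- String literals take single quotes; double quotes denote identifiers in SQL.
CREATE TABLE multilanguage_words(wordlist TEXT DEFAULT '', dictionary_definition TEXT DEFAULT '', rowid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL);

.mode tabs
.import multilanguage_words.txt multilanguage_words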

This is sample text showing how to index all words in all languages in one row (one text line). Each word has a header between { and } that contains the language ID and the classification of the word (synonym, antonym, name, etc.). Numbers such as 50 are the default percentage of positive or negative emotion I felt for each word when I wrote the text; they could be updated with A.I. later. The last field, such as "skill", is an optional attempt to classify words from the most basic human concepts to the most complex and emotive/subjective ones. All parameters after the language ID are meant to be optional and parsed according to which ones are present:
Code:
{:synonym:es:50:skill}programador+{:synonym:en}programmer+{:synonym:fr}programmeur+{:synonym:it}programmatore+{:synonym:eo}programisto   {:synonym:es}Persona que diseña los procedimientos a seguir por un dispositivo automatizado.
{:synonym:es:80:quality}listo+{:synonym:en:70}smart+{:homonym:es}listo,{:synonym:es}sagaz,{:synonym:es}astuto,{:synonym:es}ducho,{:synonym:es}despabilado,{:synonym:es}avivado,{:synonym:es}avezado+{:typo:es}avesado+{:synonym:es}avispado,{:synonym:es}perspicaz,{:synonym:es:75}vivo   {:synonym:es}Persona con gran agudeza y agilidad mental y práctica.
{:synonym:es:60:status}listo+{:synonym:en}ready,{:synonym:es}preparado+{:synonym:en}prepared,{:synonym:es}dispuesto+{:synonym:en}willing,{:synonym:es}complaciente   {:synonym:es}Estado de espera y disposición para llevar a cabo una tarea.
{:synonym:es:50:concept}palabra,{:synonym:es}fonema,{:synonym:es}vocablo,{:synonym:es}término,{:synonym:es}verbo,{:synonym:es}dicción,{:synonym:es}expresión,{:synonym:es}lengua,{:synonym:es}lenguaje,{:synonym:es}habla,{:synonym:es}promesa,{:synonym:es}pacto,{:synonym:es}oferta,{:synonym:es}juramento,{:synonym:es}ofrecimiento,{:synonym:es}compromiso   {:synonym:es}Elemento de todo lenguaje que comunica ideas, intenciones y acciones.
{:synonym:es:-60:action}desaparecer+{:synonym:es:-35:action}desaparecerse,{:synonym:es}esfumar+{:synonym:es}esfumarse,{:synonym:es}retirar+{:synonym:es}retirarse   {:synonym:es}Alejar algo de nuestra percepción de modo que no se pueda encontrar.
{:name:es:0:male}Rodolfo+{:name:en:0:male}Rudolph
{:synonym:*:medication}Panadol+{:synonym:*:medication}Paracetamol
{:synonym:en:65}conversely+{:synonym:es:65}al contrario de+{:synonym:es:65}a diferencia de
{:synonym:es}amalgama+{:synonym:en}amalgam+{:synonym:en}amalgamation
{:synonym:es}natural+{:synonym:en}natural,{:synonym:es}sincero+{:synonym:en}sincere,{:synonym:es}espontáneo+{:synonym:en}spontaneous,{:synonym:es}genuino+{:synonym:en}genuine
{:synonym:es}calibración+{:synonym:en}calibration+{:synonym:es}calibrar+{:synonym:en}calibrate,{:synonym:es}equilibrio+{:synonym:es}equilibrar+{:synonym:en}equilibrium+{:synonym:en}equilibrate,{:synonym:es}balance+{:synonym:es}balancear+{:synonym:en}balance
{:surname:en}Sonnenreich+{:typo:en}Sonnereich
{:synonym:es}correspondiente+{:synonym:es}coincidente+{:synonym:es}concordante+{:synonym:pt}concorda+{:synonym:pt}concorde+{:synonym:es}que concuerde+{:synonym:es}acierto+{:synonym:es}coincidencias+{:synonym:es}concordancias+{:synonym:es}acierto+{:synonym:es}aciertos+{:synonym:pt}de acordo+{:synonym:pt}concerta+{:synonym:pt}concertar+{:synonym:pt}concerte
{:synonym:es}endurar+{:synonym:es:0:verb}endurecer+{:synonym:es:0:verb}endurezco+{:synonym:es:0:verb}endureces+{:synonym:es:0:verb}endurece+{:synonym:es:0:verb}endurecemos+{:synonym:es:0:verb}endurecéis+{:typo:es:0:verb}endureceis+{:synonym:es:0:verb}endurecen+{:synonym:en:0:verb}harden+{:synonym:en}hard+{:synonym:en:0:verb}make hard+{:synonym:en:0:verb}to make hard+{:synonym:en:0:verb}make it hard+{:synonym:en:0:verb}making it hard+{:synonym:es}endura+{:synonym:es}endurece+{:synonym:es}durar
{:name:*:0:font-face}Calibri
{:synonym:en}keep+{:synonym:en}keeping+{:synonym:en}kept+{:synonym:es:0:verb}mantener+{:synonym:es}mantén+{:typo:es}manten+{:synonym:es}mantengo+{:synonym:es}mantienes+{:synonym:es}mantiene+{:synonym:es:0:verb}mantienen+{:synonym:es}mantenemos+{:synonym:es}mantenéis+{:typo:es:verb}manteneis+{:synonym:es}mantén
{:word:es}tu+{:word-plural:es}tus+{:word:en}your
{:name:en:50:organism}eye+{:name-plural:en:50:organism}eyes+{:name:es:50:organism}ojo+{:name-plural:es:50:organism}ojos
{:name:es:100:math}Álgebra+{:name:en:100:math}Algebra
{:name:es:100:math}Aritmética+{:name:en:100:math}Arithmetic
{:name:es:100:math}Cálculo+{:name:en:100:math}Calculus
{:name:en:100:artificial-intelligence}Situation Calculus+{:name:es:100:artificial-intelligence}Cálculo Situacional
{:name:en}dynamical domain+{:name:es}dominio dinámico+{:name-plural:en}dynamical domains+{:name-plural:es}dominios dinámicos
{:name:en}vedic math+{:name:es}matemática védica
{:name:en}Pizza Hut
{:name:en}Toto's Pizza
{:word:es}tan+{:word:en}as+{:word:es}tanto
{:word:es}también+{:typo:es}tambien+{:chat:es}tmb+{:word:en}as well+{:word:en}as well as
{:synonym:en:0:verb}close+{:antonym:en:0:verb}open+{:synonym:en}closed+{:synonym:es}cerrado+{:antonym:es}abierto+{:synonym-plural:es}cerrados+{:antonym-plural:es}abiertos
{:word:en}and+{:word:es}y+{:word:pt}e+{:word:fr}et
{:word:en}of+{:word:es}de
{:word:en}you+{:word:es}tú
{:word:en}state+{:word:es}estado+{:word-plural:en}states+{:word-plural:es}estados
{:name:en}day+{:name:es}día+{:name-plural:en}days+{:name-plural:es}días
{:synonym:en}ensure+{:synonym:en}make sure+{:synonym:en}making sure+{:synonym:es}asegurándose+{:synonym:es}asegurar+{:synonym:es}asegurarse
{:synonym:en}high+{:synonym-male:es}alto+{:synonym-female:es}alta+{:synonym-plural-male:es}altos+{:synonym-plural-female:es}altas
{:synonym:en}level+{:synonym-plural-male:en}levels+{:synonym-male:es}nivel+{:synonym-plural:es}niveles+{:synonym:es}nivelación+{:synonym-plural:es}nivelaciones
{:synonym-male:en}channel+{:synonym-plural-male:en}channels+{:synonym-male:es}canal+{:synonym-plural-male:es}canales
{:word:en}the+{:word-male:es}el+{:word-female:es}la+{:word-plural-female:es}las+{:word:es}lo+{:word-plural-male:es}los
{:word:en}to+{:word:es}para+{:word:es}a
{:synonym:en:0:verb}deepen+{:synonym:es:0:verb}profundizar
{:synonym:en}least+{:synonym:es}menos+{:synonym:es}menor
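The header format described above can be parsed field by field. A C sketch, assuming the layout `{:kind:lang[:score][:class]}word` and treating a numeric extra field as the emotion score and any other extra field as the class; `struct word_tag` and `parse_tag` are hypothetical names. It returns a pointer just past the closing brace; deciding where the word itself ends (at the next `+`, `,` or tab) is left to the caller. A header missing its leading `{:` is rejected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one "{:kind:lang:score:class}" header. */
struct word_tag {
    char kind[32];    /* synonym, antonym, name, typo, ... */
    char lang[8];     /* es, en, fr, ... or "*" for any language */
    int  has_score;
    int  score;       /* e.g. 50 or -60 */
    char cls[32];     /* optional class such as "skill"; "" if absent */
};

/* Copy n bytes of src into dst (size dstsz > 0), truncating and terminating. */
static void copy_field(char *dst, size_t dstsz, const char *src, size_t n)
{
    if (n >= dstsz) n = dstsz - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Parse a header starting at s. Returns a pointer to the text after the
 * closing '}', or NULL if the header is malformed. */
const char *parse_tag(const char *s, struct word_tag *t)
{
    memset(t, 0, sizeof *t);
    if (s[0] != '{' || s[1] != ':') return NULL;
    const char *end = strchr(s, '}');
    if (!end) return NULL;

    const char *p = s + 2;
    for (int field = 0; p < end; field++) {
        const char *colon = memchr(p, ':', (size_t)(end - p));
        const char *stop = colon ? colon : end;
        size_t n = (size_t)(stop - p);
        char tmp[32];
        copy_field(tmp, sizeof tmp, p, n);

        if (field == 0) {
            copy_field(t->kind, sizeof t->kind, p, n);
        } else if (field == 1) {
            copy_field(t->lang, sizeof t->lang, p, n);
        } else {
            char *num_end;
            long v = strtol(tmp, &num_end, 10);
            if (!t->has_score && n > 0 && *num_end == '\0') {
                t->has_score = 1;                   /* numeric: emotion score */
                t->score = (int)v;
            } else {
                copy_field(t->cls, sizeof t->cls, p, n);  /* otherwise: class */
            }
        }
        p = stop + 1;
    }
    return end + 1;
}
```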

Remember that keeping a word and its synonyms, antonyms, etc. in all existing languages (including typos, abbreviations and phrases) in one row/record lets you search, find and process in terms of the core concept behind the words, rather than for one specific word.

It's a very good basic A.I. filter for understanding and processing natural language, but it needs a massive database containing ALL existing words of humankind, related to each other (one word in all languages = one database record).

Why hasn't even Google released such a vital language database?

_________________

YouTube:
http://youtube.com/@AltComp126/streams
http://youtube.com/@proyectos/streams

http://master.dl.sourceforge.net/projec ... 7z?viasf=1


 Post subject: Re: Implementing non-English language in OS
Posted: Sun May 15, 2016 2:28 am
Member

Joined: Mon Mar 05, 2012 11:23 am
Posts: 616
Location: Germany
~ wrote:
Then put all synonyms, antonyms, paronyms, combinations, etc., in one same row in all languages. It will help you greatly in searches and indexing (will help you create automatic translations even for the GUI of your programs, to search more efficiently the documentation and code, etc.). It will make possible to search in one language, or search a word, and find what you searched in all languages and in all related variants of the word, and for all of its synonyms, antonyms, etc:

This is a simple table definition for that:
Code:
CREATE TABLE multilanguage_words(wordlist TEXT DEFAULT "", dictionary_definition TEXT DEFAULT "", rowid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL);
pragma encoding="UTF-8";

.mode tabs
.import multilanguage_words.txt multilanguage_words

[...]

Why hasn't even Google released such a vital language database?

This database structure is terrible. Putting all translations of a word in one row is exactly how you should not do it. A database must be properly normalized so that you can work with it effectively, index it and search through it.
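To make the normalized alternative concrete: store one row per (concept, language, word) instead of one row per concept with everything packed into a text field. In SQL that might be a `words(concept_id, lang, text)` table with an index on `text`, queried with a self-join on `concept_id`. The C sketch below models the same rows as an in-memory array so the lookup logic is visible; all names are hypothetical, and the linear scans stand in for what a database index would do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per word form, like a normalized words(concept_id, lang, text) table. */
struct word_row {
    int concept_id;   /* foreign key into a concepts table */
    const char *lang;
    const char *text;
};

static const struct word_row words[] = {
    { 1, "es", "programador" }, { 1, "en", "programmer" },
    { 1, "fr", "programmeur" }, { 1, "it", "programmatore" },
    { 2, "en", "eye" },         { 2, "es", "ojo" },
};
#define NWORDS (sizeof words / sizeof words[0])

/* Translate `text` into `lang`: find the concept of the word, then the row
 * with the same concept in the target language (a self-join in SQL terms). */
const char *translate(const char *text, const char *lang)
{
    for (size_t i = 0; i < NWORDS; i++) {
        if (strcmp(words[i].text, text) != 0) continue;
        for (size_t j = 0; j < NWORDS; j++)
            if (words[j].concept_id == words[i].concept_id
                && strcmp(words[j].lang, lang) == 0)
                return words[j].text;
    }
    return NULL;
}
```

Adding a language or a synonym is then an insert of new rows, not a rewrite of an ever-growing text field.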

You need more than a massive database to do natural language processing. Why should Google release a giant file that is basically just a dictionary? Google has its own AI that translates properly from and to a lot of languages and keeps learning new things. Natural languages are very complex, and so are the algorithms to process them.

_________________
Ghost OS - GitHub

