Introducing akrantiain

akrantiain is a domain-specific language, developed by .sozysozbot., for describing the phonological or orthographic rules of a language and for performing string transformations based on those rules. The implementation can be obtained from its GitHub repository.

For the sake of simplicity, this explanation deals only with the case where the input is a list of words (of a natural or constructed language) and the output is a phonemic representation of those words, but akrantiain is by no means limited to that; for instance, it is even possible to take the decimal representation of a natural number as input and output its binary representation.

akrantiain first reads a file (called a ".snoj file") that describes the phonological rules. It then converts the input according to the .snoj file and outputs the result.

How to write .snoj files

Basics

Write the character(s) inside words before a ->, and write the corresponding phoneme(s) after the ->. Characters inside a word are to be surrounded by "s, and phonemes by /s. A sentence is terminated by a ;. The following describes a rule which says that an "s" inside a word is to be read as /z/.

"s" -> /z/;

It is also possible to allocate one phoneme to two or more letters. In addition, more than one string can be written before ->, and more than one phoneme after it; in that case, the number of strings must equal the number of phonemes. The following describes a rule which says that an "sh" followed by an "e" is read as /ʃ/ and /ə/ respectively.

"sh" "e" -> /ʃ/ /ə/;

Giving choices

A list of strings separated by |s represents a pattern that matches any one of the strings in the list. When used directly on the left-hand side of a rule, it must be surrounded by a pair of parentheses. For example, the following describes a rule which says that, when an "m" or a "b" is followed by an "a", the "m" or the "b" is read as /b/ and the "a" as /ʌ/.

("m" | "b") "a" -> /b/ /ʌ/;
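
For readers more familiar with regular expressions, the rule above behaves roughly like an alternation followed by a literal. The following Python sketch is only an analogy and not how akrantiain itself is implemented; the names CHOICE_RULE and apply_choice_rule are made up for illustration.

```python
import re

# Rough analogy for:  ("m" | "b") "a" -> /b/ /ʌ/;
# (?:m|b) plays the role of ("m" | "b"); the following "a" is part of the match.
CHOICE_RULE = re.compile(r"(?:m|b)a")


def apply_choice_rule(word: str) -> str:
    """Replace every "ma" or "ba" in word with the phonemes /b/ and /ʌ/."""
    return CHOICE_RULE.sub("bʌ", word)


print(apply_choice_rule("ma"))  # bʌ
print(apply_choice_rule("ba"))  # bʌ
```

Letters that the pattern does not match are simply left untouched by this sketch.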

Partial delegation

Writing a $ instead of a phoneme on the right-hand side of -> leaves the sound of the corresponding character undefined by that rule and delegates it to a different rule. For example, the following rule says that "v" is read as /w/ when followed by an "i", but this rule does not define how "i" should be read; it delegates that to another rule.

"v" "i" -> /w/ $;

Thus, as the following example shows, it is also necessary to prepare a different rule that handles "i" itself.

"v" "i" -> /w/ $;
"i" -> /i/;

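To make the idea of delegation concrete, here is a minimal Python sketch of a rule applier. It is a toy model written for this explanation, not akrantiain's actual algorithm: the function apply_rules, the use of None to stand in for $, the top-to-bottom rule order, and the error raised for unhandled characters are all assumptions of the sketch.

```python
from typing import List, Optional, Tuple

# One rule is a list of (string, phoneme) pairs; None stands in for "$".
Rule = List[Tuple[str, Optional[str]]]


def apply_rules(word: str, rules: List[Rule]) -> str:
    # out[i] is the phoneme decided for character i, or None while undecided.
    out: List[Optional[str]] = [None] * len(word)
    for rule in rules:
        pattern = "".join(text for text, _ in rule)
        pos = word.find(pattern)
        while pos >= 0:
            # Work out which character span each string of the rule covers.
            spans = []
            start = pos
            for text, phoneme in rule:
                spans.append((start, start + len(text), phoneme))
                start += len(text)
            # Only spans that receive a phoneme must still be undecided;
            # spans marked with "$" are left for another rule to handle.
            targets = [(a, b, p) for a, b, p in spans if p is not None]
            if all(out[k] is None for a, b, _ in targets for k in range(a, b)):
                for a, b, p in targets:
                    out[a] = p
                    for k in range(a + 1, b):
                        out[k] = ""  # remaining letters of a multi-letter string
            pos = word.find(pattern, pos + 1)
    missing = [ch for ch, p in zip(word, out) if p is None]
    if missing:
        raise ValueError(f"no rule handles: {missing}")
    return "".join(p for p in out if p is not None)


rules = [
    [("v", "w"), ("i", None)],  # "v" "i" -> /w/ $;
    [("i", "i")],               # "i" -> /i/;
]
print(apply_rules("vi", rules))  # wi
```

In this sketch, dropping the second rule makes apply_rules("vi", ...) raise an error, because nothing ever decides how the "i" is read.
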
Defining and using variables

A pattern can be assigned to a variable by writing a variable name, followed by a =, followed by a list of strings separated by |s. This time, the pattern must NOT be surrounded by a pair of parentheses. A variable name consists of letters (both uppercase and lowercase), digits and underscores; however, it must not start with a digit. The following assigns, to a variable named vowel, a pattern which means "either 'a', 'e' or 'i'".

vowel = "a" | "e" | "i";

Variables can be used on the left-hand side of -> in a sentence that defines a phonological rule. For example, the following first defines a variable named vowel, and then defines a rule which says that a "k" is read as /ɡ/ when it follows a vowel (that is, either "a", "e" or "i").

vowel = "a" | "e" | "i";
vowel "k" -> $ /ɡ/;
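
As a loose regular-expression analogy, a variable followed by $ acts like a lookbehind: the vowel is required as context but is not rewritten by this rule. The Python sketch below illustrates that reading; it is not akrantiain's implementation, and the names VOWEL, K_AFTER_VOWEL and voice_k are hypothetical.

```python
import re

VOWEL = ["a", "e", "i"]  # vowel = "a" | "e" | "i";

# Rough analogy for:  vowel "k" -> $ /ɡ/;
# The lookbehind requires a preceding vowel without consuming it, which
# mirrors the "$": the vowel itself is left for some other rule.
# (Python lookbehinds need fixed-width alternatives; single letters are fine.)
K_AFTER_VOWEL = re.compile("(?<=" + "|".join(map(re.escape, VOWEL)) + ")k")


def voice_k(word: str) -> str:
    """Replace "k" with the phoneme /ɡ/ wherever it directly follows a vowel."""
    return K_AFTER_VOWEL.sub("ɡ", word)


print(voice_k("aki"))  # aɡi
print(voice_k("ki"))   # ki
```

Unlike akrantiain, this sketch leaves the vowels as plain letters instead of requiring further rules for them.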

Word boundary

On the left-hand side of ->, it is possible to write ^, which represents a word boundary. It is impossible to allocate a phoneme to ^ itself. For example, the following defines a rule which says that, when a "z" is followed by a word boundary (that is, when a "z" is at the end of a word), the "z" is read as /s/. Note that the right-hand side has only one phoneme, since you cannot allocate a phoneme to ^.

"z" ^ -> /s/;

Note that, although it seems like a valid alternative, it is explicitly forbidden to represent a word boundary with a string literal containing a space character.

# "z " -> /s/; # illegal
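
The effect of ^ resembles a zero-width boundary assertion in a regular expression: it constrains where the match may occur but produces no output of its own. The following Python sketch shows this analogy only. It assumes that "not followed by a word character" is a good enough stand-in for a word boundary, whereas akrantiain decides word boundaries by its own rules; FINAL_Z and devoice_final_z are made-up names.

```python
import re

# Rough analogy for:  "z" ^ -> /s/;
# (?!\w) is zero-width, just as ^ receives no phoneme of its own.
FINAL_Z = re.compile(r"z(?!\w)")


def devoice_final_z(text: str) -> str:
    """Replace "z" with /s/ when no word character follows it."""
    return FINAL_Z.sub("s", text)


print(devoice_final_z("zaz zuz"))  # zas zus
```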

Exclusion conditions

The left-hand side of -> can be accompanied by an "exclusion condition", which can be located at the beginning or the end of the list of strings or variables. An exclusion condition is represented by a ! followed by a string or a variable. For example, the following represents a rule which says that "au" is read as /o/ except when followed by a "t".

"au" !"t" -> /o/;

Another example is the following rule, which says that "ch" is read as /ç/ when it follows neither "a", "o", nor "u". As shown here, ! can also be followed by a parenthesized list of strings separated by |s.

!("a" | "o" | "u") "ch" -> /ç/;

It is also possible to specify exclusion conditions on both sides. The following defines a rule which says that "th" is read as /θ/ provided that it neither follows a letter that matches vowel nor is followed by an "s" or a "z".

vowel = "a" | "e" | "i" | "o" | "u";
!vowel "th" !("s" | "z") -> /θ/;
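
Exclusion conditions correspond closely to negative lookbehind and negative lookahead in regular expressions. The Python sketch below expresses the last rule in that form; it is an analogy under that assumption rather than a description of akrantiain's internals, and TH_RULE and read_th are hypothetical names.

```python
import re

# Rough analogy for:
#   vowel = "a" | "e" | "i" | "o" | "u";
#   !vowel "th" !("s" | "z") -> /θ/;
# (?<![aeiou]) mirrors !vowel in front; (?![sz]) mirrors !("s" | "z") behind.
TH_RULE = re.compile(r"(?<![aeiou])th(?![sz])")


def read_th(word: str) -> str:
    """Replace "th" with /θ/ unless a vowel precedes it or "s"/"z" follows it."""
    return TH_RULE.sub("θ", word)


print(read_th("thin"))  # θin
print(read_th("ath"))   # ath  (blocked by the preceding vowel)
print(read_th("ths"))   # ths  (blocked by the following "s")
```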

Comments

Anything written from # till the end of the line is regarded as a comment, and will thus be ignored.

C = "s" | "t" | "k" | "f";  # This is a comment
# This whole line is a comment

Omitting semicolons

A semicolon can be omitted if it is located at the end of the line. However, when you write multiple sentences in a single line, a semicolon between two sentences cannot be omitted.

semivow = "y" | "w"; vow = "a" | "e"  # can be omitted
semivow vow -> // $  # can also be omitted
#  semivow = "y" | "w" vow = "a" | "e";  # will be a parse error
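
The interplay of comments, line ends and semicolons can be pictured with a small Python sketch that splits source text into sentences. It is a simplification made for this explanation and not akrantiain's parser: it assumes that every sentence fits on one line and that # and ; never occur inside a string or phoneme literal. The function name split_sentences is made up.

```python
from typing import List


def split_sentences(source: str) -> List[str]:
    """Split .snoj-like text into sentences, dropping comments and blanks."""
    sentences: List[str] = []
    for line in source.splitlines():
        code = line.split("#", 1)[0]      # drop everything from "#" onwards
        for part in code.split(";"):      # ";" separates sentences on a line
            part = part.strip()
            if part:                      # a line end also ends a sentence
                sentences.append(part)
    return sentences


SOURCE = '''semivow = "y" | "w"; vow = "a" | "e"  # can be omitted
semivow vow -> // $  # can also be omitted
# This whole line is a comment
'''
for sentence in split_sentences(SOURCE):
    print(sentence)
```

Note that this sketch cannot detect the parse error in the last example above; it would simply treat that line's content as a single sentence.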

Others

Apart from what has been explained above, there are many more features, such as "manually setting which characters should be regarded as punctuation" or "defining 'modules' and combining them". For more details, refer to the repository.