Research ArticleGenerative Power and Closure Properties ofWatson-Crick Grammars
Nurul Liyana Mohamad Zulkufli Sherzod Turaev Mohd Izzuddin Mohd Tamrinand Azeddine MessikhKulliyyah of Information and Communication Technology International Islamic University Malaysia 53100 Kuala Lumpur Malaysia
Correspondence should be addressed to Nurul Liyana Mohamad Zulkufli nurulliyanazulkufligmailcom
Received 27 November 2015 Revised 19 February 2016 Accepted 2 June 2016
Academic Editor Ryotaro Kamimura
Copyright copy 2016 Nurul Liyana Mohamad Zulkufli et al This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited
WedefineWKlinear grammars as an extension ofWKregular grammarswith linear grammar rules andWKcontext-free grammarsthus investigating their computational power and closure propertiesWe show thatWK linear grammars can generate some context-sensitive languages Moreover we demonstrate that the family of WK regular languages is the proper subset of the family of WKlinear languages but it is not comparable with the family of linear languages We also establish that the Watson-Crick regulargrammars are closed under almost all of the main closure operations
1 Introduction
DNAcomputing appears as a challenge to design new types ofcomputing devices which differ from classical counterpartsin fundamental way to solve wide spectrum of computa-tionally intractable problems DNA (deoxyribonucleic acid)is double-stranded chain of nucleotides which differ by theirchemical bases that are adenine (A) guanine (G) cytosine (C)and thymine (T) and they are paired as A-T C-G according totheWatson-Crick complementary as it is illustrated in Figure 1[1] The massive parallelism another fundamental feature ofDNAmolecules allows performing millions of cut and pasteoperations simultaneously on DNA strands until a completeset of new DNA strands performing are generatedThese twofeatures give high hope for the use of DNA molecules andDNA based biooperations to develop powerful computingparadigms and devices
Since a DNA strand can be interpreted as a double strandsequence of symbols the DNA replication and synthesizeprocesses can be modeled using methods and techniquesof formal language theory Watson-Crick (WK) automata[2] one of the recent computational models abstractingthe properties of DNA molecules are finite automata withtwo reading heads working on complete double-strandedsequences where characters on corresponding positions from
the two strands of the input are related by a complementarityrelation similar to the Watson-Crick complementarity ofDNA nucleotides The two strands of the input are separatelyread from left to right by heads controlled by a common stateSeveral variants have been introduced and studied in recentpapers [3ndash7]
WK regular grammars [8] a grammar counterpart ofWK automata generate double-stranded strings related bya complementarity relation as in a WK automaton but userules as in a regular grammar The approach of using formalgrammars in the study of biological and computationalproperties of DNA molecules by formal grammars is a newdirection in the field of DNA computing we can introducepowerful variants of WK grammars such as WK linear WKcontext-free and WK regulated grammars and use them inthe investigation of the properties of DNA structures andalso inDNA applications in food authentication gene diseasedetection and so forth In this paper we introduce WKlinear grammars and study the generative capacity in therelationship of Chomsky grammars
Further as a motivation we show synthesis processes forinstance in DNA replication (Figure 2) can be simulated byderivations inWK grammarsThe replication of DNA beginsat the origin(s)The double strand then separated by proteinsproducing bubble-like shape(s)The synthesis of new strands
Hindawi Publishing CorporationApplied Computational Intelligence and So ComputingVolume 2016 Article ID 9481971 12 pageshttpdxdoiorg10115520169481971
2 Applied Computational Intelligence and Soft Computing
N
N N-H
N
O
N-HH
H-N
O
N
H
N
N-H
N
N
N
H
N
O
O
H-N
N
Adenine (A)
Guanine (G)
Thymine (T)
Cytosine (C)
CH3
Figure 1 The structure of DNA strand
Lagging strand
DNA pol I DNA ligaseDNA pol III
DNA pol III
Primer
Primase
Helicase
Parental DNA
OverviewOrigin of replication
Lagging strand Leading strandLeading strand
Parental DNA
5998400
3998400
5998400
3998400
5998400
39984005
998400
3998400
5998400
3998400
Figure 2 Synthesis process in bacterial DNA replication
using the parental strands as templates starts from the originsand proceeds in the 51015840 to 31015840 direction of both strands [1]
This synthesis process in general can be seen as a stringgeneration The enzymes responsible for the synthesizingDNA polymerases cannot initiate the process by themselvesbut can only add nucleotides to an existing RNA chain Thischain is called primer which is produced by the enzymeprimase From the grammar perspective the primase canbe interpreted as the start symbol 119878 After the primer hasbeen connected to the parental strand one of the synthesizingenzymes called DNA polymerase III continues to addnucleotides one by one to the primer and are complementedwith the parental strandThe synthesis finishes with replacingRNA primer with DNA nucleotides using the enzyme DNApolymerase I and joining with DNA ligase Again from thegrammar perspective DNA polymerase I and polymerase IIIact as production rules in the grammar specially DNA ligaseresembles to a terminal production (see Figure 3)
The paper is organized as follows In Section 2 we givesome notions and definitions from the theories of formallanguages and DNA computing needed in the sequel InSection 3 we define WK grammars and languages generated
AwS
Figure 3 The simulation of a synthesis process with a derivation
by these grammars Section 4 is devoted to the study ofthe generative capacity of WK regular and linear grammarsIn Section 5 we investigate the closure properties of WKgrammars Furthermore we show the application of WKgrammars in the analyses of DNA structures and program-ming language structures in Section 6 As the conclusion wediscuss open problems and interesting future research topicsrelated to WK grammars in Section 7
2 Preliminaries
We assume readers are familiar with formal languages theoryand automata Readers are referred to [3 9ndash11]
Applied Computational Intelligence and Soft Computing 3
Throughout this paper we use the following notationsLet isin be the belonging relationship of an element to thecorresponding set and notin indicates its negation The symbolsube indicates the inclusion while sub notes the proper inclusionThe notations 0 |119860| and 2119860 denote the empty set thecardinality of a set 119860 and the power set of 119860 respectivelyWhen Σ is an alphabet (a finite set of symbols) the set ofall finite strings is denoted by Σlowast while Σ+ shows similarmeaningwithout including empty strings (we use 120582 for emptystring) The length of a string 119886 isin Σlowast is shown by |119886| Alanguage is a subset 119871 sube Σlowast
Next we recall some terms regarding the closure proper-ties of languages The union of two languages 119871
1cup 1198712 is the
set of strings including the elements contained in both sets of1198711and 119871
2 The concatenation of two languages is yielded by
lining two strings from both languages which is shown by
11987111198712= 119886119887 | 119886 isin 119871
1 119887 isin 119871
2 (1)
The Kleene-star closure is the closure under the Kleene lowastoperation the set of all possible strings in 119871 including theempty string The mirror image of a word 119886 = 119886
11198862sdot sdot sdot 119886119899is
119886119877= 119886119899sdot sdot sdot 11988621198861 For language 119871 its mirror image is
119871119877= 119886119877| 119886 isin 119871 (2)
A substitution is a mapping 119904 Σlowast rarr 119879lowast where 119904(120582) = 120582
and 119904(1198861 1198862) = 119904(119886
1)119904(1198862) for 119886
1 1198862isin Σlowast The substitution
for a language 119871 sube Σlowast that is 119904(119871) is the union of 119904(119910) where119910 isin 119871 A substitution 119904 is called finite if its length |119904(119886)| is finitefor each 119886 isin Σ Amorphism is a substitution where its lengthis 1
A Chomsky grammar is defined by 119866 = (119873 119879 119878 119875)where119873 is the set ofnonterminal symbols and119879 is the set of terminalsymbols and 119873 cap 119879 = 0 119878 isin 119873 is the start symbol while119875 sube (119873 cup 119879)
lowast119873(119873 cup 119879)
lowasttimes (119873 cup 119879)
lowast is the set of productionrules We write 120572 rarr 120573 indicating the rewriting process of thestrings based on the production rules (120572 120573) isin 119875 The term 119909directly derives 119910 is written as 119909 rArr 119910 when
119909 = 11988611205721198862
119910 = 11988611205731198862
(3)
for some production rules 120572 rarr 120573 isin 119875 A grammar generatesa language defined by
119871 (119866) = 119908 isin 119879lowast 119878 997904rArr
lowast119908 (4)
According to the forms of production rules grammars areclassified as follows A grammar 119866 = (119873 119879 119878 119875) is called
(i) context-sensitive if each production has the form11990611198601199062rarr 11990611199061199062 where 119860 isin 119873 119906
1 1199062 isin (119873 cup 119879)
lowastand 119906 isin (119873 cup 119879)+
(ii) context-free if each production has the form 119860 rarr 119906where 119860 isin 119873 and 119906 isin (119873 cup 119879)lowast
(iii) linear if each production has the form 119860 rarr 11990611198611199062or
119860 rarr 119906 where 119860 119861 isin 119873 and 1199061 1199062 119906 isin 119879
lowast
(iv) right-linear if each production has the form 119860 rarr 119906119861or 119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast
(v) left-linear if each production has the form119860 rarr 119861119906 or119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast
A right-linear and left-linear grammars are called regularThe families of languages generated by these grammars
are REG LIN CF and CS respectively The families ofrecursive enumerable languages are denoted by RE while thefamilies of finite languages are denoted by FINThus the nextrelation holds [11]
Theorem 1 (Chomsky hierarchy) Consider
119865119868119873 sub 119877119864119866 sub 119871119868119873 sub 119862119865 sub 119862119878 sub 119877119864 (5)
We recall the definition of a finite automaton A finiteautomaton (FA) is a quintuple119872 = (119876119881 119902
0 119865 120575) where 119876
is the set of states 1199020isin 119876 is the initial state and 119865 sube 119876 is set
of final states Meanwhile 119881 is an alphabet and 120575 119876 times 119881 rarr2119876 is called the transition function The set (language) of allstrings accepted by 119872 is denoted by 119871(119872) We denote thefamily of languages accepted by finite automata by FA ThenFA = REG (see [11])
Next we cite some basic definitions and results ofWatson-Crick automata
The key feature of WK automata is the symmetric relationon an alphabet 119881 that is 120588 sube 119881 times 119881 In this paper forsimplicity we use the form ⟨119906V⟩ to mention the elements(119906 V) in the set of all pairs of strings 119881times119881 (which we chooseto write as [119881119881]) and instead of119881lowasttimes119881lowast we write ⟨119881lowast119881lowast⟩
Watson-Crick domain is the set of well-formed double-stranded strings (molecules)
WK120588(119881) = [
119881
119881]
lowast
120588
= [119886
119887] 119886 119887 isin 119881 (119886 119887) isin 120588 (6)
WK+120588(119881) holds the similar meaning without including 120582 We
write
[1198861
1198871
] [1198862
1198872
] sdot sdot sdot [119886119899
119887119899
] isinWK120588
(7)
as [119906V] where the upper strand is 119906 = 11988611198862sdot sdot sdot 119886119899and the
lower strand is V = 11988711198872sdot sdot sdot 119887119899 Note that when the elements in
the upper strand are complemented and have the same lengthwith the lower strand
⟨119906
V⟩ = [
119906
V] (8)
A Watson-Crick finite automaton (WKFA) is 6-tuple
119872 = (119876119881 1199020 119865 120575 120588) (9)
4 Applied Computational Intelligence and Soft Computing
where 119876 119881 1199020 and 119865 are the same as a FA Meanwhile the
transition function 120575 is
120575 119876 times ⟨119881lowast
119881lowast⟩ 997888rarr 2
119876 (10)
where 120575(119902 ⟨119906V⟩) is not an empty set only for finitely manytriples (119902 119906 V) isin 119876times119881lowast times119881lowast Similar to FA we can write therelation in transition function 119902
2isin 120575(1199021 ⟨119906V⟩) as a rewriting
rule in grammars that is
1199021⟨119906
V⟩ 997888rarr ⟨
119906
V⟩1199022 (11)
We describe the reflexive and transitive closure ofrarr asrarrlowastThe language accepted by a WKFA119872 is
119871 (119872) = 119906 [119906
V] isinWK
120588(119881) 119902
0[119906
V]
997888rarrlowast[119906
V] 119902 where 119902 isin 119865
(12)
The family of languages accepted is indicated by WKFA It isshown in [10 12] that
REG subWKFA sub CS (13)
3 Definitions
In this section we slightly modified the definition of Watson-Crick regular grammars introduced in [8] in order to extendthe concept to linear grammars and context-free grammars
Definition 2 A Watson-Crick (WK) grammar 119866 = (119873 119879 119878119875 120588) is called
(i) regular if each production has the form
119860 997888rarr ⟨119906
V⟩119861
or 119860 997888rarr ⟨119906V⟩
(14)
where 119860 119861 isin 119873 and ⟨119906V⟩ isin ⟨119879lowast119879lowast⟩(ii) linear if each production has the form
119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩
or 119860 997888rarr ⟨119906V⟩
(15)
where 119860 119861 isin 119873 and ⟨1199061V1⟩ ⟨1199062V2⟩ ⟨119906V⟩ isin
⟨119879lowast119879lowast⟩
(iii) context-free if each production has the form
119860 997888rarr 120572 (16)
where 119860 isin 119873 and 120572 isin (119873 cup (⟨119879lowast119879lowast⟩))lowast
Definition 3 Let119866 = (119873 119879 120588 119878 119875) be aWK linear grammarWe say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives 119910 isin (119873 cup⟨119879lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 iff
119909 = ⟨1199061
V1
⟩119860⟨1199062
V2
⟩
119910 = ⟨1199061
V1
⟩⟨1199063
V3
⟩119861⟨1199064
V4
⟩⟨1199062
V2
⟩
or 119910 = ⟨1199061V1
⟩⟨119906
V⟩⟨1199062
V2
⟩
(17)
where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and
119860 997888rarr ⟨1199063
V3
⟩119861⟨1199064
V4
⟩
119860 997888rarr ⟨119906
V⟩ isin 119875
(18)
Definition 4 Let 119866 = (119873 119879 120588 119878 119875) be a WK context-freegrammar We say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives119910 isin (119873 cup ⟨119879
lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 if and only if
119909 = ⟨1199061
V1
⟩119860⟨1199062
V2
⟩
119910 = ⟨1199061
V1
⟩120572⟨1199062
V2
⟩
(19)
where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and 119860 rarr 120572 isin
119875
Remark 5 We use a common notion ldquoWatson-Crick gram-marsrdquo referring to any type of WK grammars
Definition 6 The language generated by a WK grammar is aquintuple 119866 which is defined as
119871 (119866) = 119906 [119906
V] isinWK
120588(119879) 119878 997904rArr
lowast[119906
V] (20)
4 Generative Power ofWatson-Crick Grammars
In this section we establish results regarding the computa-tional power of WK grammars
41 A Normal Form for Watson-Crick Linear GrammarsNext we define 1-normal form for WK linear grammarsand show that for every WK linear grammar 119866 there is anequivalent WK linear grammar 1198661015840 in the normal form thatis 119871(119866) = 119871(1198661015840)
Definition 7 A linearWK grammar119866 = (119873 119879 120588 119875 119878) is saidto be in the 1-normal form if each rule in 119875 of the form
119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩
or 119860 997888rarr ⟨1199061V1
⟩
(21)
where |119906119894| le 1 |V
119894| le 1 119894 = 1 2 and 119860 119861 isin 119873
Applied Computational Intelligence and Soft Computing 5
Lemma 8 For every WK linear grammar 119866 there exists anequivalent WK linear grammar 1198661015840 in the 1-normal form
Proof Let 119866 = (119873 119879 120588 119878 119875) be a WK linear grammar Let
119860 997888rarr ⟨
11988611sdot sdot sdot 11988611198981
11988711sdot sdot sdot 11988711198991
⟩119861⟨
11988621198982
sdot sdot sdot 11988621
11988721198992
sdot sdot sdot 11988721
⟩ (22)
be a production in 119875 where 1198981gt 1 119898
2gt 1 119899
1gt 1 or
1198992gt 1 Without loss of generality we assume that 119898
1ge 1198991
and1198982ge 1198992Then we define the following sequence of right-
linear and left-linear production rules
119860 997888rarr ⟨11988611
11988711
⟩119860119903
11
119860119903
11997888rarr ⟨
11988612
11988712
⟩119860119903
12
119860119903
11198991minus1997888rarr ⟨
11988611198991
11988711198991
⟩119860119903
11198991
119860119903
11198991
997888rarr ⟨
11988611198991+1
120582⟩119860119903
11198991+1
119860119903
11198981minus1997888rarr ⟨
11988611198981
120582⟩119860119903
11198981
119860119903
11198981
997888rarr 119860119903
21⟨11988621
11988721
⟩
119860119903
21997888rarr 119860
119903
22⟨11988622
11988722
⟩
119860119903
11198992minus1997888rarr 119860
119903
21198992
⟨
11988621198992
11988721198992
⟩
119860119903
21198992
997888rarr 119860119903
21198992+1⟨
11988621198992+1
120582⟩
119860119903
21198982minus1997888rarr ⟨
11988621198982
120582⟩
(23)
where 1198601199031119895 1 le 119895 le 119898
1 and 119860119903
2119895 1 le 119895 le 119898
2minus 1 are new
nonterminals
Let
119903 119860 997888rarr ⟨11988611198862sdot sdot sdot 119886119898
11988711198872sdot sdot sdot 119887119899
⟩ isin 119875 (24)
where 119898 gt 1 or 119899 gt 1 Without loss of generality we assumethat 119898 ge 119899 Then we define the following sequence of right-linear production rules
119860 997888rarr ⟨1198861
1198871
⟩119860119903
1
119860119903
1997888rarr ⟨
1198862
1198872
⟩119860119903
2
119860119903
119899minus1997888rarr ⟨
119886119899
119887119899
⟩119860119903
119899
119860119903
119899997888rarr ⟨
119886119899+1
120582⟩119860119903
119899+1
119860119903
119898minus1997888rarr ⟨
119886119898
120582⟩
(25)
where 119860119903119894 1 le 119894 le 119898 minus 1 are new nonterminals
We construct a WK linear grammar 1198661015840 = (119873 cup
1198731015840 119879 120588 119878 119875 cup 119875
1015840) where 1198751015840 consists of productions defined
above for each 119860 rarr ⟨1199061V1⟩119861⟨1199062V2⟩ isin 119875 with |119906
1| gt 1
|V1| gt 1 |119906
2| gt 1 or |V
2| gt 1 and 119860 rarr ⟨119906V⟩ isin 119875 with
|119906| gt 1 or |V| gt 1 Then it is not difficult to see that in everyderivation productions in the form of (22) and (24) in 119866 canbe replaced by the sequences of productions (23) and (25) in1198661015840 and vice versa Thus 119871(119866) = 119871(1198661015840)
42TheGenerative Power The following results immediatelyfollow from the definition of WK grammars
Lemma 9 The following inclusions hold
119882119870119877119864119866 sube 119882119870119871119868119873
119871119868119873 sube 119882119870119871119868119873
119862119865 sube 119882119870119862119865
(26)
Next we show that WK grammars can generate non-context-free languages
119886119899119888119899119887119899 119899 ge 1
119886119899119887119898119888119899119889119898 119899 119898 ge 1
119908119888119908 119908 isin 119886 119887lowast
(27)
6 Applied Computational Intelligence and Soft Computing
Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)
119875 119878) be a WK linear grammar where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119887
120582⟩
119878 997888rarr ⟨119886
120582⟩119860⟨
119887
120582⟩
119860 997888rarr ⟨119888
119886⟩119860
119860 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
120582⟩
(28)
In general we have the derivation
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119887119899minus1
120582⟩
997904rArr ⟨119886119899
120582⟩119860⟨
119887119899
120582⟩
997904rArrlowast⟨119886119899119888119899
119886119899⟩119860⟨
119887119899
120582⟩
997904rArr ⟨119886119899119888119899
119886119899119888⟩119861⟨
119887119899
119887⟩
997904rArrlowast⟨119886119899119888119899
119886119899119888119899⟩119861⟨
119887119899
119887119899⟩
997904rArr [119886119899119888119899119887119899
119886119899119888119899119887119899]
(29)
Thus 1198661generates the language 119871(119866
1) = 119886
119899119888119899119887119899 119899 ge 1 isin
CS minus CF
Example 11 Let
1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889
(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)
(30)
be a WK regular grammar and 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119886
120582⟩119860
119860 997888rarr ⟨119887
120582⟩119860 | ⟨
119887
120582⟩119861
119861 997888rarr ⟨119888
119886⟩119861 | ⟨
119888
119886⟩119862
119862 997888rarr ⟨119889
119887⟩119862 | ⟨
119889
119887⟩119863
119863 997888rarr ⟨120582
119888⟩119863 | ⟨
120582
119889⟩119863 | ⟨
120582
120582⟩
(31)
Then we have the following derivation for 119899119898 ge 1
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878 997904rArr ⟨
119886119899
120582⟩119860
997904rArrlowast⟨119886119899119887119898minus1
120582⟩119860 997904rArr ⟨
119886119899119887119898
120582⟩119861
997904rArrlowast⟨119886119899119887119898119888119899minus1
119886119899minus1
⟩119861 997904rArr ⟨119886119899119887119898119888119899
119886119899⟩119862
997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1
119886119899119887119898minus1
⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898
119886119899119887119898⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898⟩119863
997904rArr [119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898]
(32)
Hence 119871(1198662) = 119886
119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF
Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)
119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119887
120582⟩119878 | ⟨
119888
120582⟩119860
119860 997888rarr ⟨119886
119886⟩119860 | ⟨
119887
119887⟩119860 | ⟨
120582
119888⟩119861
119861 997888rarr ⟨120582
119886⟩119861 | ⟨
120582
119887⟩119861 | ⟨
120582
120582⟩
(33)
By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861
continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively
119878 997904rArrlowast⟨119908
120582⟩119878 997904rArr ⟨
119908119888
120582⟩119860
997904rArrlowast⟨119908119888119908
119908⟩119860 997904rArr ⟨
119908119888119908
119908119888⟩119861
997904rArrlowast⟨119908119888119908
119908119888119908⟩119861 997904rArr [
119908119888119908
119908119888119908]
(34)
Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887
lowast isin CS minus CF
The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12
Applied Computational Intelligence and Soft Computing 7
Theorem 13 The following inclusions hold
119871119868119873 ⊊ 119882119870119871119868119873
119882119870119877119864119866 minus 119862119865 = 0
119882119870119871119868119873 minus 119862119865 = 0
(35)
The following example shows that some WK linearlanguages cannot be generated by WK regular grammars
Lemma 14 The following language is not a WK regularlanguage
1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899
isin 119882119870119871119868119873 minus119882119870119877119864119866
(36)
Proof The language1198711can be generated by the followingWK
linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)
where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119886
119886⟩ | ⟨
119886
120582⟩119860⟨
119886
119886⟩
119860 997888rarr ⟨119887119887
119886⟩119860 | ⟨
119887119887119887
119886⟩119860 | ⟨
120582
119887⟩119861
119861 997888rarr ⟨120582
119887⟩119861 | ⟨
120582
120582⟩
(37)
It is not difficult to see that
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119886119899minus1
119886119899minus1⟩ 997904rArr ⟨
119886119899
120582⟩119860⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899⟩119860⟨
119886119899
119886119899⟩ 997904rArr ⟨
119886119899119887119898
119886119899119887⟩119861⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899119887119898⟩119861⟨
a119899
119886119899⟩ 997904rArr [
119886119899119887119898119886119899
119886119899119887119898119886119899]
(38)
where 2119899 le 119898 le 3119899Next we show that 119871
1notinWKREG
We suppose by contradiction that1198711can be generated by
a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and
V isin ⟨119886
120582⟩ ⟨
120582
119886⟩ ⟨
119886
119886⟩ ⟨
119887
120582⟩ ⟨
120582
119887⟩ ⟨
119887
119887⟩ ⟨
119886
119887⟩
⟨119887
119887⟩ (119873 cup 120582)
(39)
Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the
double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840
Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated
in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations
119878 997904rArrlowast⟨119886119903119887
119886119896⟩
or 119878 997904rArrlowast⟨119886119903119887
119886119903119887⟩
(40)
where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩ 119905 le 119904 (41)
Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩997904rArr
lowast⟨119886119903119887119904119886119894
119886119903119887119904119886119895⟩ (42)
and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations
Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos
119878 997904rArrlowast⟨119886119903119887119897119886119903
119886119903⟩ (43)
In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules
Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following
Corollary 15 The following holds
119871119868119873 minus119882119870119877119864119866 = 0 (44)
43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem
Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions
5 Closure Properties
In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how
8 Applied Computational Intelligence and Soft Computing
RE
WKCF
WKLIN
WKREG
REG
CS
CF
LIN
Figure 4 The hierarchy of WK and Chomsky language families
WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars
51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG
and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be Watson-
Crick regular grammars generating languages 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cap1198732= 0 and 119878
1and
1198782do not appear on the right-hand side of any production
rule
Lemma 17 (union) 119882119870119877119864119866 is closed under union
Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878
with 119878 notin 1198731cup 1198732 and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (45)
Then it is not difficult to see that 119871(119866) = 1198711cup 1198712
Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and
119878 = 1198781 We define
119875 = 1198752cup (1198751minus 119860 997888rarr ⟨
119906
V⟩ isin 119875
1)
cup 119860 997888rarr ⟨119906
V⟩1198782| 119860 997888rarr ⟨
119906
V⟩ isin 119875
1
(46)
Then it is obvious that 119871(119866) = 1198711sdot 1198712
Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation
Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)
with
1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)
where
119875119879= 119860 997888rarr ⟨
119906
V⟩ isin 119875 ⟨
119906
V⟩ isin ⟨
119879lowast
119879lowast⟩
119875119878= 119860 997888rarr ⟨
119906
V⟩1198781| 119860 997888rarr ⟨
119906
V⟩ isin 119875
119879
(48)
Then 119871(119866) = 119871lowast1
Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism
Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ
lowast be a finitesubstitution (homomorphism)Define119866 = (119873
1 1198791 1198781 119875 1205881)
where 119871(119866) = 119904(1198711) Without loss of generality we assume
that 1198661is in the 1-normal form Then 119875
1can contain
production rules of the forms
119860 997888rarr ⟨119886
119887⟩119861
119860 997888rarr ⟨119886
119887⟩
119860 997888rarr ⟨119886
120582⟩119861
119860 997888rarr ⟨119886
120582⟩
119860 997888rarr ⟨120582
119887⟩119861
119860 997888rarr ⟨120582
119887⟩
119860 997888rarr ⟨120582
120582⟩
(49)
We define 119875 as the set of production rules of the forms
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩119861
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩
119860 997888rarr ⟨119904 (119886)
120582⟩119861
119860 997888rarr ⟨119904 (119886)
120582⟩
119860 997888rarr ⟨120582
119904 (119887)⟩119861
Applied Computational Intelligence and Soft Computing 9
119860 997888rarr ⟨120582
119904 (119887)⟩
119860 997888rarr ⟨120582
120582⟩
(50)
Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)
Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage
Proof We show that 1198711198771isin WKREG Define 119866 = (119873
1cup
119878 1198791 119878 119875 120588
1) generating the language 119871119877
1 where 119878 is a new
nonterminal and 119875 consists of production rules defined asfollows
(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751
(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751
The results obtained from the lemmas above are summa-rized in the following theorem
Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image
This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies
52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN
Let 1198711 1198712isinWKLIN and
1198661= (1198731 119879 1198781 1198751 120588)
1198662= (1198732 119879 1198782 1198752 120588)
(51)
be Watson-Crick linear grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 23 (union) 119882119870119871119868119873 is closed under union
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878
with 119878 notin 1198731cup 1198732and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (52)
Then 119871(119866) = 1198711cup 1198712
Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKLIN Let
ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality
we assume that 1198661is in the 1-normal form For each rule
119903 119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩ isin 1198751 (53)
where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩119861⟨
ℎ (1199062)
ℎ (V2)⟩ (54)
in 119875In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism
It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation
53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin
WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be
Watson-Crick context-free grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 26 (union) 119882119870119862119865 is closed under union operation
Proof Define the WK context-free grammar 119866 =
(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873
1cap 1198732
and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (55)
Then it is obvious that 119871(119866) = 1198711cup 1198712
Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation
Proof Define 119866 = (1198731cup 1198732cup 119878 119879
1cup 1198792 119878 119875 120588) where 119878 notin
(1198731cup 1198732) and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
11198782 (56)
Then 119871(119866) isinWKCF
Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation
Proof Define 119866 = (1198731cup 119878 119879
1 119878 119875 120588) isinWKCF setting
119875 = 1198751cup 119878 997888rarr 119878
1119878 | 120582 (57)
Then it is not difficult to see that 119871(119866) isinWKCF
10 Applied Computational Intelligence and Soft Computing
Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKCF Let 119904
119879lowastrarr Σlowast be a homomorphism 119875
1contains production rules
of the form
119903 119860
997888rarr ⟨1199061
V1
⟩1198611⟨1199062
V2
⟩1198612sdot sdot sdot ⟨
119906119896
V119896
⟩119861119896⟨119906119896+1
V119896+1
⟩
(58)
where 119896 ge 0For each rule 119903 isin 119875 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩1198611⟨ℎ (1199062)
ℎ (V2)⟩1198612
sdot sdot sdot⟨ℎ (119906119896)
ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)
ℎ (V119896+1)⟩
(59)
where 119896 ge 0In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
With the lemmas provided above in this subsection thenext theorem follows
Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism
Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven
6 Applications of Watson-Crick Grammars
In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures
61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems
The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that
the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]
In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]
Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888
119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)
In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as
(119886 119905 119888 119892lowast
119888119905119892 119886 119905 119888 119892lowast
)lowast
(61)
We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language
Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions
119878 997888rarr ⟨119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119888
119892⟩119860
119860 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119905
119886⟩119861
119861 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩119862
119862
997888rarr ⟨119886
119905⟩119862 | ⟨
119905
119886⟩119862 | ⟨
119892
119888⟩119862 | ⟨
119888
119892⟩119860 | ⟨
120582
120582⟩
(62)
For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows
119878 997904rArrlowast⟨119909
119910⟩119878 997904rArr ⟨
119909119888
119910119892⟩119860 997904rArr ⟨
119909119888119905
119910119892119886⟩119861
997904rArr ⟨119909119888119905119892
119910119892119886119888⟩119862
997904rArrlowast⟨119909119888119905119892119909
119910119892119886119888119910⟩119860 997904rArr ⟨
119909119888119905119892119909119888
119910119892119886119888119910119892⟩119860
997904rArr ⟨119909119888119905119892119909119888119905
119910119892119886119888119910119892119886⟩119861
997904rArr ⟨119909119888119905119892119909119888119905119892
119910119892119886119888119910119892119886119888⟩119862997904rArr
lowast⟨119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910⟩119862
997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]
(63)
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
2 Applied Computational Intelligence and Soft Computing
N
N N-H
N
O
N-HH
H-N
O
N
H
N
N-H
N
N
N
H
N
O
O
H-N
N
Adenine (A)
Guanine (G)
Thymine (T)
Cytosine (C)
CH3
Figure 1 The structure of DNA strand
Lagging strand
DNA pol I DNA ligaseDNA pol III
DNA pol III
Primer
Primase
Helicase
Parental DNA
OverviewOrigin of replication
Lagging strand Leading strandLeading strand
Parental DNA
5998400
3998400
5998400
3998400
5998400
39984005
998400
3998400
5998400
3998400
Figure 2 Synthesis process in bacterial DNA replication
using the parental strands as templates starts from the originsand proceeds in the 51015840 to 31015840 direction of both strands [1]
This synthesis process in general can be seen as a stringgeneration The enzymes responsible for the synthesizingDNA polymerases cannot initiate the process by themselvesbut can only add nucleotides to an existing RNA chain Thischain is called primer which is produced by the enzymeprimase From the grammar perspective the primase canbe interpreted as the start symbol 119878 After the primer hasbeen connected to the parental strand one of the synthesizingenzymes called DNA polymerase III continues to addnucleotides one by one to the primer and are complementedwith the parental strandThe synthesis finishes with replacingRNA primer with DNA nucleotides using the enzyme DNApolymerase I and joining with DNA ligase Again from thegrammar perspective DNA polymerase I and polymerase IIIact as production rules in the grammar specially DNA ligaseresembles to a terminal production (see Figure 3)
The paper is organized as follows In Section 2 we givesome notions and definitions from the theories of formallanguages and DNA computing needed in the sequel InSection 3 we define WK grammars and languages generated
AwS
Figure 3 The simulation of a synthesis process with a derivation
by these grammars Section 4 is devoted to the study ofthe generative capacity of WK regular and linear grammarsIn Section 5 we investigate the closure properties of WKgrammars Furthermore we show the application of WKgrammars in the analyses of DNA structures and program-ming language structures in Section 6 As the conclusion wediscuss open problems and interesting future research topicsrelated to WK grammars in Section 7
2 Preliminaries
We assume readers are familiar with formal languages theoryand automata Readers are referred to [3 9ndash11]
Applied Computational Intelligence and Soft Computing 3
Throughout this paper we use the following notationsLet isin be the belonging relationship of an element to thecorresponding set and notin indicates its negation The symbolsube indicates the inclusion while sub notes the proper inclusionThe notations 0 |119860| and 2119860 denote the empty set thecardinality of a set 119860 and the power set of 119860 respectivelyWhen Σ is an alphabet (a finite set of symbols) the set ofall finite strings is denoted by Σlowast while Σ+ shows similarmeaningwithout including empty strings (we use 120582 for emptystring) The length of a string 119886 isin Σlowast is shown by |119886| Alanguage is a subset 119871 sube Σlowast
Next we recall some terms regarding the closure proper-ties of languages The union of two languages 119871
1cup 1198712 is the
set of strings including the elements contained in both sets of1198711and 119871
2 The concatenation of two languages is yielded by
lining two strings from both languages which is shown by
11987111198712= 119886119887 | 119886 isin 119871
1 119887 isin 119871
2 (1)
The Kleene-star closure is the closure under the Kleene lowastoperation the set of all possible strings in 119871 including theempty string The mirror image of a word 119886 = 119886
11198862sdot sdot sdot 119886119899is
119886119877= 119886119899sdot sdot sdot 11988621198861 For language 119871 its mirror image is
119871119877= 119886119877| 119886 isin 119871 (2)
A substitution is a mapping 119904 Σlowast rarr 119879lowast where 119904(120582) = 120582
and 119904(1198861 1198862) = 119904(119886
1)119904(1198862) for 119886
1 1198862isin Σlowast The substitution
for a language 119871 sube Σlowast that is 119904(119871) is the union of 119904(119910) where119910 isin 119871 A substitution 119904 is called finite if its length |119904(119886)| is finitefor each 119886 isin Σ Amorphism is a substitution where its lengthis 1
A Chomsky grammar is defined by 119866 = (119873 119879 119878 119875)where119873 is the set ofnonterminal symbols and119879 is the set of terminalsymbols and 119873 cap 119879 = 0 119878 isin 119873 is the start symbol while119875 sube (119873 cup 119879)
lowast119873(119873 cup 119879)
lowasttimes (119873 cup 119879)
lowast is the set of productionrules We write 120572 rarr 120573 indicating the rewriting process of thestrings based on the production rules (120572 120573) isin 119875 The term 119909directly derives 119910 is written as 119909 rArr 119910 when
119909 = 11988611205721198862
119910 = 11988611205731198862
(3)
for some production rules 120572 rarr 120573 isin 119875 A grammar generatesa language defined by
119871 (119866) = 119908 isin 119879lowast 119878 997904rArr
lowast119908 (4)
According to the forms of production rules grammars areclassified as follows A grammar 119866 = (119873 119879 119878 119875) is called
(i) context-sensitive if each production has the form11990611198601199062rarr 11990611199061199062 where 119860 isin 119873 119906
1 1199062 isin (119873 cup 119879)
lowastand 119906 isin (119873 cup 119879)+
(ii) context-free if each production has the form 119860 rarr 119906where 119860 isin 119873 and 119906 isin (119873 cup 119879)lowast
(iii) linear if each production has the form 119860 rarr 11990611198611199062or
119860 rarr 119906 where 119860 119861 isin 119873 and 1199061 1199062 119906 isin 119879
lowast
(iv) right-linear if each production has the form 119860 rarr 119906119861or 119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast
(v) left-linear if each production has the form119860 rarr 119861119906 or119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast
A right-linear and left-linear grammars are called regularThe families of languages generated by these grammars
are REG LIN CF and CS respectively The families ofrecursive enumerable languages are denoted by RE while thefamilies of finite languages are denoted by FINThus the nextrelation holds [11]
Theorem 1 (Chomsky hierarchy) Consider
119865119868119873 sub 119877119864119866 sub 119871119868119873 sub 119862119865 sub 119862119878 sub 119877119864 (5)
We recall the definition of a finite automaton A finiteautomaton (FA) is a quintuple119872 = (119876119881 119902
0 119865 120575) where 119876
is the set of states 1199020isin 119876 is the initial state and 119865 sube 119876 is set
of final states Meanwhile 119881 is an alphabet and 120575 119876 times 119881 rarr2119876 is called the transition function The set (language) of allstrings accepted by 119872 is denoted by 119871(119872) We denote thefamily of languages accepted by finite automata by FA ThenFA = REG (see [11])
Next we cite some basic definitions and results ofWatson-Crick automata
The key feature of WK automata is the symmetric relationon an alphabet 119881 that is 120588 sube 119881 times 119881 In this paper forsimplicity we use the form ⟨119906V⟩ to mention the elements(119906 V) in the set of all pairs of strings 119881times119881 (which we chooseto write as [119881119881]) and instead of119881lowasttimes119881lowast we write ⟨119881lowast119881lowast⟩
Watson-Crick domain is the set of well-formed double-stranded strings (molecules)
WK120588(119881) = [
119881
119881]
lowast
120588
= [119886
119887] 119886 119887 isin 119881 (119886 119887) isin 120588 (6)
WK+120588(119881) holds the similar meaning without including 120582 We
write
[1198861
1198871
] [1198862
1198872
] sdot sdot sdot [119886119899
119887119899
] isinWK120588
(7)
as [119906V] where the upper strand is 119906 = 11988611198862sdot sdot sdot 119886119899and the
lower strand is V = 11988711198872sdot sdot sdot 119887119899 Note that when the elements in
the upper strand are complemented and have the same lengthwith the lower strand
⟨119906
V⟩ = [
119906
V] (8)
A Watson-Crick finite automaton (WKFA) is 6-tuple
119872 = (119876119881 1199020 119865 120575 120588) (9)
4 Applied Computational Intelligence and Soft Computing
where 119876 119881 1199020 and 119865 are the same as a FA Meanwhile the
transition function 120575 is
120575 119876 times ⟨119881lowast
119881lowast⟩ 997888rarr 2
119876 (10)
where 120575(119902 ⟨119906V⟩) is not an empty set only for finitely manytriples (119902 119906 V) isin 119876times119881lowast times119881lowast Similar to FA we can write therelation in transition function 119902
2isin 120575(1199021 ⟨119906V⟩) as a rewriting
rule in grammars that is
1199021⟨119906
V⟩ 997888rarr ⟨
119906
V⟩1199022 (11)
We describe the reflexive and transitive closure ofrarr asrarrlowastThe language accepted by a WKFA119872 is
119871 (119872) = 119906 [119906
V] isinWK
120588(119881) 119902
0[119906
V]
997888rarrlowast[119906
V] 119902 where 119902 isin 119865
(12)
The family of languages accepted is indicated by WKFA It isshown in [10 12] that
REG subWKFA sub CS (13)
3 Definitions
In this section we slightly modified the definition of Watson-Crick regular grammars introduced in [8] in order to extendthe concept to linear grammars and context-free grammars
Definition 2 A Watson-Crick (WK) grammar 119866 = (119873 119879 119878119875 120588) is called
(i) regular if each production has the form
119860 997888rarr ⟨119906
V⟩119861
or 119860 997888rarr ⟨119906V⟩
(14)
where 119860 119861 isin 119873 and ⟨119906V⟩ isin ⟨119879lowast119879lowast⟩(ii) linear if each production has the form
119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩
or 119860 997888rarr ⟨119906V⟩
(15)
where 119860 119861 isin 119873 and ⟨1199061V1⟩ ⟨1199062V2⟩ ⟨119906V⟩ isin
⟨119879lowast119879lowast⟩
(iii) context-free if each production has the form
119860 997888rarr 120572 (16)
where 119860 isin 119873 and 120572 isin (119873 cup (⟨119879lowast119879lowast⟩))lowast
Definition 3 Let119866 = (119873 119879 120588 119878 119875) be aWK linear grammarWe say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives 119910 isin (119873 cup⟨119879lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 iff
119909 = ⟨1199061
V1
⟩119860⟨1199062
V2
⟩
119910 = ⟨1199061
V1
⟩⟨1199063
V3
⟩119861⟨1199064
V4
⟩⟨1199062
V2
⟩
or 119910 = ⟨1199061V1
⟩⟨119906
V⟩⟨1199062
V2
⟩
(17)
where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and
119860 997888rarr ⟨1199063
V3
⟩119861⟨1199064
V4
⟩
119860 997888rarr ⟨119906
V⟩ isin 119875
(18)
Definition 4 Let 119866 = (119873 119879 120588 119878 119875) be a WK context-freegrammar We say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives119910 isin (119873 cup ⟨119879
lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 if and only if
119909 = ⟨1199061
V1
⟩119860⟨1199062
V2
⟩
119910 = ⟨1199061
V1
⟩120572⟨1199062
V2
⟩
(19)
where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and 119860 rarr 120572 isin
119875
Remark 5 We use a common notion ldquoWatson-Crick gram-marsrdquo referring to any type of WK grammars
Definition 6 The language generated by a WK grammar is aquintuple 119866 which is defined as
119871 (119866) = 119906 [119906
V] isinWK
120588(119879) 119878 997904rArr
lowast[119906
V] (20)
4 Generative Power ofWatson-Crick Grammars
In this section we establish results regarding the computa-tional power of WK grammars
41 A Normal Form for Watson-Crick Linear GrammarsNext we define 1-normal form for WK linear grammarsand show that for every WK linear grammar 119866 there is anequivalent WK linear grammar 1198661015840 in the normal form thatis 119871(119866) = 119871(1198661015840)
Definition 7 A linearWK grammar119866 = (119873 119879 120588 119875 119878) is saidto be in the 1-normal form if each rule in 119875 of the form
119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩
or 119860 997888rarr ⟨1199061V1
⟩
(21)
where |119906119894| le 1 |V
119894| le 1 119894 = 1 2 and 119860 119861 isin 119873
Applied Computational Intelligence and Soft Computing 5
Lemma 8 For every WK linear grammar 119866 there exists anequivalent WK linear grammar 1198661015840 in the 1-normal form
Proof Let 119866 = (119873 119879 120588 119878 119875) be a WK linear grammar Let
119860 997888rarr ⟨
11988611sdot sdot sdot 11988611198981
11988711sdot sdot sdot 11988711198991
⟩119861⟨
11988621198982
sdot sdot sdot 11988621
11988721198992
sdot sdot sdot 11988721
⟩ (22)
be a production in 119875 where 1198981gt 1 119898
2gt 1 119899
1gt 1 or
1198992gt 1 Without loss of generality we assume that 119898
1ge 1198991
and1198982ge 1198992Then we define the following sequence of right-
linear and left-linear production rules
119860 997888rarr ⟨11988611
11988711
⟩119860119903
11
119860119903
11997888rarr ⟨
11988612
11988712
⟩119860119903
12
119860119903
11198991minus1997888rarr ⟨
11988611198991
11988711198991
⟩119860119903
11198991
119860119903
11198991
997888rarr ⟨
11988611198991+1
120582⟩119860119903
11198991+1
119860119903
11198981minus1997888rarr ⟨
11988611198981
120582⟩119860119903
11198981
119860119903
11198981
997888rarr 119860119903
21⟨11988621
11988721
⟩
119860119903
21997888rarr 119860
119903
22⟨11988622
11988722
⟩
119860119903
11198992minus1997888rarr 119860
119903
21198992
⟨
11988621198992
11988721198992
⟩
119860119903
21198992
997888rarr 119860119903
21198992+1⟨
11988621198992+1
120582⟩
119860119903
21198982minus1997888rarr ⟨
11988621198982
120582⟩
(23)
where 1198601199031119895 1 le 119895 le 119898
1 and 119860119903
2119895 1 le 119895 le 119898
2minus 1 are new
nonterminals
Let
119903 119860 997888rarr ⟨11988611198862sdot sdot sdot 119886119898
11988711198872sdot sdot sdot 119887119899
⟩ isin 119875 (24)
where 119898 gt 1 or 119899 gt 1 Without loss of generality we assumethat 119898 ge 119899 Then we define the following sequence of right-linear production rules
119860 997888rarr ⟨1198861
1198871
⟩119860119903
1
119860119903
1997888rarr ⟨
1198862
1198872
⟩119860119903
2
119860119903
119899minus1997888rarr ⟨
119886119899
119887119899
⟩119860119903
119899
119860119903
119899997888rarr ⟨
119886119899+1
120582⟩119860119903
119899+1
119860119903
119898minus1997888rarr ⟨
119886119898
120582⟩
(25)
where 119860119903119894 1 le 119894 le 119898 minus 1 are new nonterminals
We construct a WK linear grammar 1198661015840 = (119873 cup
1198731015840 119879 120588 119878 119875 cup 119875
1015840) where 1198751015840 consists of productions defined
above for each 119860 rarr ⟨1199061V1⟩119861⟨1199062V2⟩ isin 119875 with |119906
1| gt 1
|V1| gt 1 |119906
2| gt 1 or |V
2| gt 1 and 119860 rarr ⟨119906V⟩ isin 119875 with
|119906| gt 1 or |V| gt 1 Then it is not difficult to see that in everyderivation productions in the form of (22) and (24) in 119866 canbe replaced by the sequences of productions (23) and (25) in1198661015840 and vice versa Thus 119871(119866) = 119871(1198661015840)
42TheGenerative Power The following results immediatelyfollow from the definition of WK grammars
Lemma 9 The following inclusions hold
119882119870119877119864119866 sube 119882119870119871119868119873
119871119868119873 sube 119882119870119871119868119873
119862119865 sube 119882119870119862119865
(26)
Next we show that WK grammars can generate non-context-free languages
119886119899119888119899119887119899 119899 ge 1
119886119899119887119898119888119899119889119898 119899 119898 ge 1
119908119888119908 119908 isin 119886 119887lowast
(27)
6 Applied Computational Intelligence and Soft Computing
Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)
119875 119878) be a WK linear grammar where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119887
120582⟩
119878 997888rarr ⟨119886
120582⟩119860⟨
119887
120582⟩
119860 997888rarr ⟨119888
119886⟩119860
119860 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
120582⟩
(28)
In general we have the derivation
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119887119899minus1
120582⟩
997904rArr ⟨119886119899
120582⟩119860⟨
119887119899
120582⟩
997904rArrlowast⟨119886119899119888119899
119886119899⟩119860⟨
119887119899
120582⟩
997904rArr ⟨119886119899119888119899
119886119899119888⟩119861⟨
119887119899
119887⟩
997904rArrlowast⟨119886119899119888119899
119886119899119888119899⟩119861⟨
119887119899
119887119899⟩
997904rArr [119886119899119888119899119887119899
119886119899119888119899119887119899]
(29)
Thus 1198661generates the language 119871(119866
1) = 119886
119899119888119899119887119899 119899 ge 1 isin
CS minus CF
Example 11 Let
1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889
(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)
(30)
be a WK regular grammar and 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119886
120582⟩119860
119860 997888rarr ⟨119887
120582⟩119860 | ⟨
119887
120582⟩119861
119861 997888rarr ⟨119888
119886⟩119861 | ⟨
119888
119886⟩119862
119862 997888rarr ⟨119889
119887⟩119862 | ⟨
119889
119887⟩119863
119863 997888rarr ⟨120582
119888⟩119863 | ⟨
120582
119889⟩119863 | ⟨
120582
120582⟩
(31)
Then we have the following derivation for 119899119898 ge 1
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878 997904rArr ⟨
119886119899
120582⟩119860
997904rArrlowast⟨119886119899119887119898minus1
120582⟩119860 997904rArr ⟨
119886119899119887119898
120582⟩119861
997904rArrlowast⟨119886119899119887119898119888119899minus1
119886119899minus1
⟩119861 997904rArr ⟨119886119899119887119898119888119899
119886119899⟩119862
997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1
119886119899119887119898minus1
⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898
119886119899119887119898⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898⟩119863
997904rArr [119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898]
(32)
Hence 119871(1198662) = 119886
119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF
Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)
119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119887
120582⟩119878 | ⟨
119888
120582⟩119860
119860 997888rarr ⟨119886
119886⟩119860 | ⟨
119887
119887⟩119860 | ⟨
120582
119888⟩119861
119861 997888rarr ⟨120582
119886⟩119861 | ⟨
120582
119887⟩119861 | ⟨
120582
120582⟩
(33)
By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861
continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively
119878 997904rArrlowast⟨119908
120582⟩119878 997904rArr ⟨
119908119888
120582⟩119860
997904rArrlowast⟨119908119888119908
119908⟩119860 997904rArr ⟨
119908119888119908
119908119888⟩119861
997904rArrlowast⟨119908119888119908
119908119888119908⟩119861 997904rArr [
119908119888119908
119908119888119908]
(34)
Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887
lowast isin CS minus CF
The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12
Applied Computational Intelligence and Soft Computing 7
Theorem 13 The following inclusions hold
119871119868119873 ⊊ 119882119870119871119868119873
119882119870119877119864119866 minus 119862119865 = 0
119882119870119871119868119873 minus 119862119865 = 0
(35)
The following example shows that some WK linearlanguages cannot be generated by WK regular grammars
Lemma 14 The following language is not a WK regularlanguage
1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899
isin 119882119870119871119868119873 minus119882119870119877119864119866
(36)
Proof The language1198711can be generated by the followingWK
linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)
where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119886
119886⟩ | ⟨
119886
120582⟩119860⟨
119886
119886⟩
119860 997888rarr ⟨119887119887
119886⟩119860 | ⟨
119887119887119887
119886⟩119860 | ⟨
120582
119887⟩119861
119861 997888rarr ⟨120582
119887⟩119861 | ⟨
120582
120582⟩
(37)
It is not difficult to see that
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119886119899minus1
119886119899minus1⟩ 997904rArr ⟨
119886119899
120582⟩119860⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899⟩119860⟨
119886119899
119886119899⟩ 997904rArr ⟨
119886119899119887119898
119886119899119887⟩119861⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899119887119898⟩119861⟨
a119899
119886119899⟩ 997904rArr [
119886119899119887119898119886119899
119886119899119887119898119886119899]
(38)
where 2119899 le 119898 le 3119899Next we show that 119871
1notinWKREG
We suppose by contradiction that1198711can be generated by
a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and
V isin ⟨119886
120582⟩ ⟨
120582
119886⟩ ⟨
119886
119886⟩ ⟨
119887
120582⟩ ⟨
120582
119887⟩ ⟨
119887
119887⟩ ⟨
119886
119887⟩
⟨119887
119887⟩ (119873 cup 120582)
(39)
Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the
double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840
Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated
in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations
119878 997904rArrlowast⟨119886119903119887
119886119896⟩
or 119878 997904rArrlowast⟨119886119903119887
119886119903119887⟩
(40)
where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩ 119905 le 119904 (41)
Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩997904rArr
lowast⟨119886119903119887119904119886119894
119886119903119887119904119886119895⟩ (42)
and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations
Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos
119878 997904rArrlowast⟨119886119903119887119897119886119903
119886119903⟩ (43)
In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules
Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following
Corollary 15 The following holds
119871119868119873 minus119882119870119877119864119866 = 0 (44)
43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem
Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions
5 Closure Properties
In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how
8 Applied Computational Intelligence and Soft Computing
RE
WKCF
WKLIN
WKREG
REG
CS
CF
LIN
Figure 4 The hierarchy of WK and Chomsky language families
WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars
51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG
and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be Watson-
Crick regular grammars generating languages 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cap1198732= 0 and 119878
1and
1198782do not appear on the right-hand side of any production
rule
Lemma 17 (union) 119882119870119877119864119866 is closed under union
Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878
with 119878 notin 1198731cup 1198732 and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (45)
Then it is not difficult to see that 119871(119866) = 1198711cup 1198712
Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and
119878 = 1198781 We define
119875 = 1198752cup (1198751minus 119860 997888rarr ⟨
119906
V⟩ isin 119875
1)
cup 119860 997888rarr ⟨119906
V⟩1198782| 119860 997888rarr ⟨
119906
V⟩ isin 119875
1
(46)
Then it is obvious that 119871(119866) = 1198711sdot 1198712
Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation
Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)
with
1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)
where
119875119879= 119860 997888rarr ⟨
119906
V⟩ isin 119875 ⟨
119906
V⟩ isin ⟨
119879lowast
119879lowast⟩
119875119878= 119860 997888rarr ⟨
119906
V⟩1198781| 119860 997888rarr ⟨
119906
V⟩ isin 119875
119879
(48)
Then 119871(119866) = 119871lowast1
Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism
Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ
lowast be a finitesubstitution (homomorphism)Define119866 = (119873
1 1198791 1198781 119875 1205881)
where 119871(119866) = 119904(1198711) Without loss of generality we assume
that 1198661is in the 1-normal form Then 119875
1can contain
production rules of the forms
119860 997888rarr ⟨119886
119887⟩119861
119860 997888rarr ⟨119886
119887⟩
119860 997888rarr ⟨119886
120582⟩119861
119860 997888rarr ⟨119886
120582⟩
119860 997888rarr ⟨120582
119887⟩119861
119860 997888rarr ⟨120582
119887⟩
119860 997888rarr ⟨120582
120582⟩
(49)
We define 119875 as the set of production rules of the forms
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩119861
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩
119860 997888rarr ⟨119904 (119886)
120582⟩119861
119860 997888rarr ⟨119904 (119886)
120582⟩
119860 997888rarr ⟨120582
119904 (119887)⟩119861
Applied Computational Intelligence and Soft Computing 9
119860 997888rarr ⟨120582
119904 (119887)⟩
119860 997888rarr ⟨120582
120582⟩
(50)
Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)
Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage
Proof We show that 1198711198771isin WKREG Define 119866 = (119873
1cup
119878 1198791 119878 119875 120588
1) generating the language 119871119877
1 where 119878 is a new
nonterminal and 119875 consists of production rules defined asfollows
(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751
(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751
The results obtained from the lemmas above are summa-rized in the following theorem
Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image
This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies
52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN
Let 1198711 1198712isinWKLIN and
1198661= (1198731 119879 1198781 1198751 120588)
1198662= (1198732 119879 1198782 1198752 120588)
(51)
be Watson-Crick linear grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 23 (union) 119882119870119871119868119873 is closed under union
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878
with 119878 notin 1198731cup 1198732and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (52)
Then 119871(119866) = 1198711cup 1198712
Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKLIN Let
ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality
we assume that 1198661is in the 1-normal form For each rule
119903 119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩ isin 1198751 (53)
where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩119861⟨
ℎ (1199062)
ℎ (V2)⟩ (54)
in 119875In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism
It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation
53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin
WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be
Watson-Crick context-free grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 26 (union) 119882119870119862119865 is closed under union operation
Proof Define the WK context-free grammar 119866 =
(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873
1cap 1198732
and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (55)
Then it is obvious that 119871(119866) = 1198711cup 1198712
Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation
Proof Define 119866 = (1198731cup 1198732cup 119878 119879
1cup 1198792 119878 119875 120588) where 119878 notin
(1198731cup 1198732) and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
11198782 (56)
Then 119871(119866) isinWKCF
Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation
Proof Define 119866 = (1198731cup 119878 119879
1 119878 119875 120588) isinWKCF setting
119875 = 1198751cup 119878 997888rarr 119878
1119878 | 120582 (57)
Then it is not difficult to see that 119871(119866) isinWKCF
10 Applied Computational Intelligence and Soft Computing
Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKCF Let 119904
119879lowastrarr Σlowast be a homomorphism 119875
1contains production rules
of the form
119903 119860
997888rarr ⟨1199061
V1
⟩1198611⟨1199062
V2
⟩1198612sdot sdot sdot ⟨
119906119896
V119896
⟩119861119896⟨119906119896+1
V119896+1
⟩
(58)
where 119896 ge 0For each rule 119903 isin 119875 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩1198611⟨ℎ (1199062)
ℎ (V2)⟩1198612
sdot sdot sdot⟨ℎ (119906119896)
ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)
ℎ (V119896+1)⟩
(59)
where 119896 ge 0In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
With the lemmas provided above in this subsection thenext theorem follows
Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism
Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven
6 Applications of Watson-Crick Grammars
In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures
61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems
The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that
the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]
In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]
Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888
119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)
In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as
(119886 119905 119888 119892lowast
119888119905119892 119886 119905 119888 119892lowast
)lowast
(61)
We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language
Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions
119878 997888rarr ⟨119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119888
119892⟩119860
119860 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119905
119886⟩119861
119861 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩119862
119862
997888rarr ⟨119886
119905⟩119862 | ⟨
119905
119886⟩119862 | ⟨
119892
119888⟩119862 | ⟨
119888
119892⟩119860 | ⟨
120582
120582⟩
(62)
For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows
119878 997904rArrlowast⟨119909
119910⟩119878 997904rArr ⟨
119909119888
119910119892⟩119860 997904rArr ⟨
119909119888119905
119910119892119886⟩119861
997904rArr ⟨119909119888119905119892
119910119892119886119888⟩119862
997904rArrlowast⟨119909119888119905119892119909
119910119892119886119888119910⟩119860 997904rArr ⟨
119909119888119905119892119909119888
119910119892119886119888119910119892⟩119860
997904rArr ⟨119909119888119905119892119909119888119905
119910119892119886119888119910119892119886⟩119861
997904rArr ⟨119909119888119905119892119909119888119905119892
119910119892119886119888119910119892119886119888⟩119862997904rArr
lowast⟨119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910⟩119862
997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]
(63)
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing 3
Throughout this paper we use the following notationsLet isin be the belonging relationship of an element to thecorresponding set and notin indicates its negation The symbolsube indicates the inclusion while sub notes the proper inclusionThe notations 0 |119860| and 2119860 denote the empty set thecardinality of a set 119860 and the power set of 119860 respectivelyWhen Σ is an alphabet (a finite set of symbols) the set ofall finite strings is denoted by Σlowast while Σ+ shows similarmeaningwithout including empty strings (we use 120582 for emptystring) The length of a string 119886 isin Σlowast is shown by |119886| Alanguage is a subset 119871 sube Σlowast
Next we recall some terms regarding the closure proper-ties of languages The union of two languages 119871
1cup 1198712 is the
set of strings including the elements contained in both sets of1198711and 119871
2 The concatenation of two languages is yielded by
lining two strings from both languages which is shown by
11987111198712= 119886119887 | 119886 isin 119871
1 119887 isin 119871
2 (1)
The Kleene-star closure is the closure under the Kleene lowastoperation the set of all possible strings in 119871 including theempty string The mirror image of a word 119886 = 119886
11198862sdot sdot sdot 119886119899is
119886119877= 119886119899sdot sdot sdot 11988621198861 For language 119871 its mirror image is
119871119877= 119886119877| 119886 isin 119871 (2)
A substitution is a mapping 119904 Σlowast rarr 119879lowast where 119904(120582) = 120582
and 119904(1198861 1198862) = 119904(119886
1)119904(1198862) for 119886
1 1198862isin Σlowast The substitution
for a language 119871 sube Σlowast that is 119904(119871) is the union of 119904(119910) where119910 isin 119871 A substitution 119904 is called finite if its length |119904(119886)| is finitefor each 119886 isin Σ Amorphism is a substitution where its lengthis 1
A Chomsky grammar is defined by 119866 = (119873 119879 119878 119875)where119873 is the set ofnonterminal symbols and119879 is the set of terminalsymbols and 119873 cap 119879 = 0 119878 isin 119873 is the start symbol while119875 sube (119873 cup 119879)
lowast119873(119873 cup 119879)
lowasttimes (119873 cup 119879)
lowast is the set of productionrules We write 120572 rarr 120573 indicating the rewriting process of thestrings based on the production rules (120572 120573) isin 119875 The term 119909directly derives 119910 is written as 119909 rArr 119910 when
119909 = 11988611205721198862
119910 = 11988611205731198862
(3)
for some production rules 120572 rarr 120573 isin 119875 A grammar generatesa language defined by
119871 (119866) = 119908 isin 119879lowast 119878 997904rArr
lowast119908 (4)
According to the forms of production rules grammars areclassified as follows A grammar 119866 = (119873 119879 119878 119875) is called
(i) context-sensitive if each production has the form11990611198601199062rarr 11990611199061199062 where 119860 isin 119873 119906
1 1199062 isin (119873 cup 119879)
lowastand 119906 isin (119873 cup 119879)+
(ii) context-free if each production has the form 119860 rarr 119906where 119860 isin 119873 and 119906 isin (119873 cup 119879)lowast
(iii) linear if each production has the form 119860 rarr 11990611198611199062or
119860 rarr 119906 where 119860 119861 isin 119873 and 1199061 1199062 119906 isin 119879
lowast
(iv) right-linear if each production has the form 119860 rarr 119906119861or 119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast
(v) left-linear if each production has the form119860 rarr 119861119906 or119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast
A right-linear and left-linear grammars are called regularThe families of languages generated by these grammars
are REG LIN CF and CS respectively The families ofrecursive enumerable languages are denoted by RE while thefamilies of finite languages are denoted by FINThus the nextrelation holds [11]
Theorem 1 (Chomsky hierarchy) Consider
119865119868119873 sub 119877119864119866 sub 119871119868119873 sub 119862119865 sub 119862119878 sub 119877119864 (5)
We recall the definition of a finite automaton A finiteautomaton (FA) is a quintuple119872 = (119876119881 119902
0 119865 120575) where 119876
is the set of states 1199020isin 119876 is the initial state and 119865 sube 119876 is set
of final states Meanwhile 119881 is an alphabet and 120575 119876 times 119881 rarr2119876 is called the transition function The set (language) of allstrings accepted by 119872 is denoted by 119871(119872) We denote thefamily of languages accepted by finite automata by FA ThenFA = REG (see [11])
Next we cite some basic definitions and results ofWatson-Crick automata
The key feature of WK automata is the symmetric relationon an alphabet 119881 that is 120588 sube 119881 times 119881 In this paper forsimplicity we use the form ⟨119906V⟩ to mention the elements(119906 V) in the set of all pairs of strings 119881times119881 (which we chooseto write as [119881119881]) and instead of119881lowasttimes119881lowast we write ⟨119881lowast119881lowast⟩
Watson-Crick domain is the set of well-formed double-stranded strings (molecules)
WK120588(119881) = [
119881
119881]
lowast
120588
= [119886
119887] 119886 119887 isin 119881 (119886 119887) isin 120588 (6)
WK+120588(119881) holds the similar meaning without including 120582 We
write
[1198861
1198871
] [1198862
1198872
] sdot sdot sdot [119886119899
119887119899
] isinWK120588
(7)
as [119906V] where the upper strand is 119906 = 11988611198862sdot sdot sdot 119886119899and the
lower strand is V = 11988711198872sdot sdot sdot 119887119899 Note that when the elements in
the upper strand are complemented and have the same lengthwith the lower strand
⟨119906
V⟩ = [
119906
V] (8)
A Watson-Crick finite automaton (WKFA) is 6-tuple
119872 = (119876119881 1199020 119865 120575 120588) (9)
4 Applied Computational Intelligence and Soft Computing
where 119876 119881 1199020 and 119865 are the same as a FA Meanwhile the
transition function 120575 is
120575 119876 times ⟨119881lowast
119881lowast⟩ 997888rarr 2
119876 (10)
where 120575(119902 ⟨119906V⟩) is not an empty set only for finitely manytriples (119902 119906 V) isin 119876times119881lowast times119881lowast Similar to FA we can write therelation in transition function 119902
2isin 120575(1199021 ⟨119906V⟩) as a rewriting
rule in grammars that is
1199021⟨119906
V⟩ 997888rarr ⟨
119906
V⟩1199022 (11)
We describe the reflexive and transitive closure ofrarr asrarrlowastThe language accepted by a WKFA119872 is
119871 (119872) = 119906 [119906
V] isinWK
120588(119881) 119902
0[119906
V]
997888rarrlowast[119906
V] 119902 where 119902 isin 119865
(12)
The family of languages accepted is indicated by WKFA It isshown in [10 12] that
REG subWKFA sub CS (13)
3 Definitions
In this section we slightly modified the definition of Watson-Crick regular grammars introduced in [8] in order to extendthe concept to linear grammars and context-free grammars
Definition 2 A Watson-Crick (WK) grammar 119866 = (119873 119879 119878119875 120588) is called
(i) regular if each production has the form
119860 997888rarr ⟨119906
V⟩119861
or 119860 997888rarr ⟨119906V⟩
(14)
where 119860 119861 isin 119873 and ⟨119906V⟩ isin ⟨119879lowast119879lowast⟩(ii) linear if each production has the form
119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩
or 119860 997888rarr ⟨119906V⟩
(15)
where 119860 119861 isin 119873 and ⟨1199061V1⟩ ⟨1199062V2⟩ ⟨119906V⟩ isin
⟨119879lowast119879lowast⟩
(iii) context-free if each production has the form
119860 997888rarr 120572 (16)
where 119860 isin 119873 and 120572 isin (119873 cup (⟨119879lowast119879lowast⟩))lowast
Definition 3 Let119866 = (119873 119879 120588 119878 119875) be aWK linear grammarWe say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives 119910 isin (119873 cup⟨119879lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 iff
119909 = ⟨1199061
V1
⟩119860⟨1199062
V2
⟩
119910 = ⟨1199061
V1
⟩⟨1199063
V3
⟩119861⟨1199064
V4
⟩⟨1199062
V2
⟩
or 119910 = ⟨1199061V1
⟩⟨119906
V⟩⟨1199062
V2
⟩
(17)
where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and
119860 997888rarr ⟨1199063
V3
⟩119861⟨1199064
V4
⟩
119860 997888rarr ⟨119906
V⟩ isin 119875
(18)
Definition 4 Let 119866 = (119873 119879 120588 119878 119875) be a WK context-freegrammar We say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives119910 isin (119873 cup ⟨119879
lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 if and only if
119909 = ⟨1199061
V1
⟩119860⟨1199062
V2
⟩
119910 = ⟨1199061
V1
⟩120572⟨1199062
V2
⟩
(19)
where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and 119860 rarr 120572 isin
119875
Remark 5 We use a common notion ldquoWatson-Crick gram-marsrdquo referring to any type of WK grammars
Definition 6 The language generated by a WK grammar is aquintuple 119866 which is defined as
119871 (119866) = 119906 [119906
V] isinWK
120588(119879) 119878 997904rArr
lowast[119906
V] (20)
4 Generative Power ofWatson-Crick Grammars
In this section we establish results regarding the computa-tional power of WK grammars
41 A Normal Form for Watson-Crick Linear GrammarsNext we define 1-normal form for WK linear grammarsand show that for every WK linear grammar 119866 there is anequivalent WK linear grammar 1198661015840 in the normal form thatis 119871(119866) = 119871(1198661015840)
Definition 7 A linearWK grammar119866 = (119873 119879 120588 119875 119878) is saidto be in the 1-normal form if each rule in 119875 of the form
119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩
or 119860 997888rarr ⟨1199061V1
⟩
(21)
where |119906119894| le 1 |V
119894| le 1 119894 = 1 2 and 119860 119861 isin 119873
Applied Computational Intelligence and Soft Computing 5
Lemma 8 For every WK linear grammar 119866 there exists anequivalent WK linear grammar 1198661015840 in the 1-normal form
Proof Let 119866 = (119873 119879 120588 119878 119875) be a WK linear grammar Let
119860 997888rarr ⟨
11988611sdot sdot sdot 11988611198981
11988711sdot sdot sdot 11988711198991
⟩119861⟨
11988621198982
sdot sdot sdot 11988621
11988721198992
sdot sdot sdot 11988721
⟩ (22)
be a production in 119875 where 1198981gt 1 119898
2gt 1 119899
1gt 1 or
1198992gt 1 Without loss of generality we assume that 119898
1ge 1198991
and1198982ge 1198992Then we define the following sequence of right-
linear and left-linear production rules
119860 997888rarr ⟨11988611
11988711
⟩119860119903
11
119860119903
11997888rarr ⟨
11988612
11988712
⟩119860119903
12
119860119903
11198991minus1997888rarr ⟨
11988611198991
11988711198991
⟩119860119903
11198991
119860119903
11198991
997888rarr ⟨
11988611198991+1
120582⟩119860119903
11198991+1
119860119903
11198981minus1997888rarr ⟨
11988611198981
120582⟩119860119903
11198981
119860119903
11198981
997888rarr 119860119903
21⟨11988621
11988721
⟩
119860119903
21997888rarr 119860
119903
22⟨11988622
11988722
⟩
119860119903
11198992minus1997888rarr 119860
119903
21198992
⟨
11988621198992
11988721198992
⟩
119860119903
21198992
997888rarr 119860119903
21198992+1⟨
11988621198992+1
120582⟩
119860119903
21198982minus1997888rarr ⟨
11988621198982
120582⟩
(23)
where 1198601199031119895 1 le 119895 le 119898
1 and 119860119903
2119895 1 le 119895 le 119898
2minus 1 are new
nonterminals
Let
119903 119860 997888rarr ⟨11988611198862sdot sdot sdot 119886119898
11988711198872sdot sdot sdot 119887119899
⟩ isin 119875 (24)
where 119898 gt 1 or 119899 gt 1 Without loss of generality we assumethat 119898 ge 119899 Then we define the following sequence of right-linear production rules
119860 997888rarr ⟨1198861
1198871
⟩119860119903
1
119860119903
1997888rarr ⟨
1198862
1198872
⟩119860119903
2
119860119903
119899minus1997888rarr ⟨
119886119899
119887119899
⟩119860119903
119899
119860119903
119899997888rarr ⟨
119886119899+1
120582⟩119860119903
119899+1
119860119903
119898minus1997888rarr ⟨
119886119898
120582⟩
(25)
where 119860119903119894 1 le 119894 le 119898 minus 1 are new nonterminals
We construct a WK linear grammar 1198661015840 = (119873 cup
1198731015840 119879 120588 119878 119875 cup 119875
1015840) where 1198751015840 consists of productions defined
above for each 119860 rarr ⟨1199061V1⟩119861⟨1199062V2⟩ isin 119875 with |119906
1| gt 1
|V1| gt 1 |119906
2| gt 1 or |V
2| gt 1 and 119860 rarr ⟨119906V⟩ isin 119875 with
|119906| gt 1 or |V| gt 1 Then it is not difficult to see that in everyderivation productions in the form of (22) and (24) in 119866 canbe replaced by the sequences of productions (23) and (25) in1198661015840 and vice versa Thus 119871(119866) = 119871(1198661015840)
42TheGenerative Power The following results immediatelyfollow from the definition of WK grammars
Lemma 9 The following inclusions hold
119882119870119877119864119866 sube 119882119870119871119868119873
119871119868119873 sube 119882119870119871119868119873
119862119865 sube 119882119870119862119865
(26)
Next we show that WK grammars can generate non-context-free languages
119886119899119888119899119887119899 119899 ge 1
119886119899119887119898119888119899119889119898 119899 119898 ge 1
119908119888119908 119908 isin 119886 119887lowast
(27)
6 Applied Computational Intelligence and Soft Computing
Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)
119875 119878) be a WK linear grammar where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119887
120582⟩
119878 997888rarr ⟨119886
120582⟩119860⟨
119887
120582⟩
119860 997888rarr ⟨119888
119886⟩119860
119860 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
120582⟩
(28)
In general we have the derivation
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119887119899minus1
120582⟩
997904rArr ⟨119886119899
120582⟩119860⟨
119887119899
120582⟩
997904rArrlowast⟨119886119899119888119899
119886119899⟩119860⟨
119887119899
120582⟩
997904rArr ⟨119886119899119888119899
119886119899119888⟩119861⟨
119887119899
119887⟩
997904rArrlowast⟨119886119899119888119899
119886119899119888119899⟩119861⟨
119887119899
119887119899⟩
997904rArr [119886119899119888119899119887119899
119886119899119888119899119887119899]
(29)
Thus 1198661generates the language 119871(119866
1) = 119886
119899119888119899119887119899 119899 ge 1 isin
CS minus CF
Example 11 Let
1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889
(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)
(30)
be a WK regular grammar and 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119886
120582⟩119860
119860 997888rarr ⟨119887
120582⟩119860 | ⟨
119887
120582⟩119861
119861 997888rarr ⟨119888
119886⟩119861 | ⟨
119888
119886⟩119862
119862 997888rarr ⟨119889
119887⟩119862 | ⟨
119889
119887⟩119863
119863 997888rarr ⟨120582
119888⟩119863 | ⟨
120582
119889⟩119863 | ⟨
120582
120582⟩
(31)
Then we have the following derivation for 119899119898 ge 1
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878 997904rArr ⟨
119886119899
120582⟩119860
997904rArrlowast⟨119886119899119887119898minus1
120582⟩119860 997904rArr ⟨
119886119899119887119898
120582⟩119861
997904rArrlowast⟨119886119899119887119898119888119899minus1
119886119899minus1
⟩119861 997904rArr ⟨119886119899119887119898119888119899
119886119899⟩119862
997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1
119886119899119887119898minus1
⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898
119886119899119887119898⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898⟩119863
997904rArr [119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898]
(32)
Hence 119871(1198662) = 119886
119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF
Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)
119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119887
120582⟩119878 | ⟨
119888
120582⟩119860
119860 997888rarr ⟨119886
119886⟩119860 | ⟨
119887
119887⟩119860 | ⟨
120582
119888⟩119861
119861 997888rarr ⟨120582
119886⟩119861 | ⟨
120582
119887⟩119861 | ⟨
120582
120582⟩
(33)
By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861
continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively
119878 997904rArrlowast⟨119908
120582⟩119878 997904rArr ⟨
119908119888
120582⟩119860
997904rArrlowast⟨119908119888119908
119908⟩119860 997904rArr ⟨
119908119888119908
119908119888⟩119861
997904rArrlowast⟨119908119888119908
119908119888119908⟩119861 997904rArr [
119908119888119908
119908119888119908]
(34)
Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887
lowast isin CS minus CF
The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12
Applied Computational Intelligence and Soft Computing 7
Theorem 13 The following inclusions hold
119871119868119873 ⊊ 119882119870119871119868119873
119882119870119877119864119866 minus 119862119865 = 0
119882119870119871119868119873 minus 119862119865 = 0
(35)
The following example shows that some WK linearlanguages cannot be generated by WK regular grammars
Lemma 14 The following language is not a WK regularlanguage
1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899
isin 119882119870119871119868119873 minus119882119870119877119864119866
(36)
Proof The language1198711can be generated by the followingWK
linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)
where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119886
119886⟩ | ⟨
119886
120582⟩119860⟨
119886
119886⟩
119860 997888rarr ⟨119887119887
119886⟩119860 | ⟨
119887119887119887
119886⟩119860 | ⟨
120582
119887⟩119861
119861 997888rarr ⟨120582
119887⟩119861 | ⟨
120582
120582⟩
(37)
It is not difficult to see that
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119886119899minus1
119886119899minus1⟩ 997904rArr ⟨
119886119899
120582⟩119860⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899⟩119860⟨
119886119899
119886119899⟩ 997904rArr ⟨
119886119899119887119898
119886119899119887⟩119861⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899119887119898⟩119861⟨
a119899
119886119899⟩ 997904rArr [
119886119899119887119898119886119899
119886119899119887119898119886119899]
(38)
where 2119899 le 119898 le 3119899Next we show that 119871
1notinWKREG
We suppose by contradiction that1198711can be generated by
a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and
V isin ⟨119886
120582⟩ ⟨
120582
119886⟩ ⟨
119886
119886⟩ ⟨
119887
120582⟩ ⟨
120582
119887⟩ ⟨
119887
119887⟩ ⟨
119886
119887⟩
⟨119887
119887⟩ (119873 cup 120582)
(39)
Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the
double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840
Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated
in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations
119878 997904rArrlowast⟨119886119903119887
119886119896⟩
or 119878 997904rArrlowast⟨119886119903119887
119886119903119887⟩
(40)
where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩ 119905 le 119904 (41)
Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩997904rArr
lowast⟨119886119903119887119904119886119894
119886119903119887119904119886119895⟩ (42)
and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations
Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos
119878 997904rArrlowast⟨119886119903119887119897119886119903
119886119903⟩ (43)
In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules
Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following
Corollary 15 The following holds
119871119868119873 minus119882119870119877119864119866 = 0 (44)
43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem
Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions
5 Closure Properties
In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how
8 Applied Computational Intelligence and Soft Computing
RE
WKCF
WKLIN
WKREG
REG
CS
CF
LIN
Figure 4 The hierarchy of WK and Chomsky language families
WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars
51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG
and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be Watson-
Crick regular grammars generating languages 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cap1198732= 0 and 119878
1and
1198782do not appear on the right-hand side of any production
rule
Lemma 17 (union) 119882119870119877119864119866 is closed under union
Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878
with 119878 notin 1198731cup 1198732 and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (45)
Then it is not difficult to see that 119871(119866) = 1198711cup 1198712
Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and
119878 = 1198781 We define
119875 = 1198752cup (1198751minus 119860 997888rarr ⟨
119906
V⟩ isin 119875
1)
cup 119860 997888rarr ⟨119906
V⟩1198782| 119860 997888rarr ⟨
119906
V⟩ isin 119875
1
(46)
Then it is obvious that 119871(119866) = 1198711sdot 1198712
Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation
Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)
with
1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)
where
119875119879= 119860 997888rarr ⟨
119906
V⟩ isin 119875 ⟨
119906
V⟩ isin ⟨
119879lowast
119879lowast⟩
119875119878= 119860 997888rarr ⟨
119906
V⟩1198781| 119860 997888rarr ⟨
119906
V⟩ isin 119875
119879
(48)
Then 119871(119866) = 119871lowast1
Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism
Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ
lowast be a finitesubstitution (homomorphism)Define119866 = (119873
1 1198791 1198781 119875 1205881)
where 119871(119866) = 119904(1198711) Without loss of generality we assume
that 1198661is in the 1-normal form Then 119875
1can contain
production rules of the forms
119860 997888rarr ⟨119886
119887⟩119861
119860 997888rarr ⟨119886
119887⟩
119860 997888rarr ⟨119886
120582⟩119861
119860 997888rarr ⟨119886
120582⟩
119860 997888rarr ⟨120582
119887⟩119861
119860 997888rarr ⟨120582
119887⟩
119860 997888rarr ⟨120582
120582⟩
(49)
We define 119875 as the set of production rules of the forms
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩119861
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩
119860 997888rarr ⟨119904 (119886)
120582⟩119861
119860 997888rarr ⟨119904 (119886)
120582⟩
119860 997888rarr ⟨120582
119904 (119887)⟩119861
Applied Computational Intelligence and Soft Computing 9
119860 997888rarr ⟨120582
119904 (119887)⟩
119860 997888rarr ⟨120582
120582⟩
(50)
Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)
Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage
Proof We show that 1198711198771isin WKREG Define 119866 = (119873
1cup
119878 1198791 119878 119875 120588
1) generating the language 119871119877
1 where 119878 is a new
nonterminal and 119875 consists of production rules defined asfollows
(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751
(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751
The results obtained from the lemmas above are summa-rized in the following theorem
Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image
This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies
52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN
Let 1198711 1198712isinWKLIN and
1198661= (1198731 119879 1198781 1198751 120588)
1198662= (1198732 119879 1198782 1198752 120588)
(51)
be Watson-Crick linear grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 23 (union) 119882119870119871119868119873 is closed under union
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878
with 119878 notin 1198731cup 1198732and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (52)
Then 119871(119866) = 1198711cup 1198712
Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKLIN Let
ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality
we assume that 1198661is in the 1-normal form For each rule
119903 119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩ isin 1198751 (53)
where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩119861⟨
ℎ (1199062)
ℎ (V2)⟩ (54)
in 119875In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism
It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation
53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin
WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be
Watson-Crick context-free grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 26 (union) 119882119870119862119865 is closed under union operation
Proof Define the WK context-free grammar 119866 =
(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873
1cap 1198732
and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (55)
Then it is obvious that 119871(119866) = 1198711cup 1198712
Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation
Proof Define 119866 = (1198731cup 1198732cup 119878 119879
1cup 1198792 119878 119875 120588) where 119878 notin
(1198731cup 1198732) and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
11198782 (56)
Then 119871(119866) isinWKCF
Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation
Proof Define 119866 = (1198731cup 119878 119879
1 119878 119875 120588) isinWKCF setting
119875 = 1198751cup 119878 997888rarr 119878
1119878 | 120582 (57)
Then it is not difficult to see that 119871(119866) isinWKCF
10 Applied Computational Intelligence and Soft Computing
Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKCF Let 119904
119879lowastrarr Σlowast be a homomorphism 119875
1contains production rules
of the form
119903 119860
997888rarr ⟨1199061
V1
⟩1198611⟨1199062
V2
⟩1198612sdot sdot sdot ⟨
119906119896
V119896
⟩119861119896⟨119906119896+1
V119896+1
⟩
(58)
where 119896 ge 0For each rule 119903 isin 119875 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩1198611⟨ℎ (1199062)
ℎ (V2)⟩1198612
sdot sdot sdot⟨ℎ (119906119896)
ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)
ℎ (V119896+1)⟩
(59)
where 119896 ge 0In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
With the lemmas provided above in this subsection thenext theorem follows
Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism
Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven
6 Applications of Watson-Crick Grammars
In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures
61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems
The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that
the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]
In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]
Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888
119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)
In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as
(119886 119905 119888 119892lowast
119888119905119892 119886 119905 119888 119892lowast
)lowast
(61)
We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language
Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions
119878 997888rarr ⟨119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119888
119892⟩119860
119860 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119905
119886⟩119861
119861 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩119862
119862
997888rarr ⟨119886
119905⟩119862 | ⟨
119905
119886⟩119862 | ⟨
119892
119888⟩119862 | ⟨
119888
119892⟩119860 | ⟨
120582
120582⟩
(62)
For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows
119878 997904rArrlowast⟨119909
119910⟩119878 997904rArr ⟨
119909119888
119910119892⟩119860 997904rArr ⟨
119909119888119905
119910119892119886⟩119861
997904rArr ⟨119909119888119905119892
119910119892119886119888⟩119862
997904rArrlowast⟨119909119888119905119892119909
119910119892119886119888119910⟩119860 997904rArr ⟨
119909119888119905119892119909119888
119910119892119886119888119910119892⟩119860
997904rArr ⟨119909119888119905119892119909119888119905
119910119892119886119888119910119892119886⟩119861
997904rArr ⟨119909119888119905119892119909119888119905119892
119910119892119886119888119910119892119886119888⟩119862997904rArr
lowast⟨119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910⟩119862
997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]
(63)
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
4 Applied Computational Intelligence and Soft Computing
where 119876 119881 1199020 and 119865 are the same as a FA Meanwhile the
transition function 120575 is
120575 119876 times ⟨119881lowast
119881lowast⟩ 997888rarr 2
119876 (10)
where 120575(119902 ⟨119906V⟩) is not an empty set only for finitely manytriples (119902 119906 V) isin 119876times119881lowast times119881lowast Similar to FA we can write therelation in transition function 119902
2isin 120575(1199021 ⟨119906V⟩) as a rewriting
rule in grammars that is
1199021⟨119906
V⟩ 997888rarr ⟨
119906
V⟩1199022 (11)
We describe the reflexive and transitive closure ofrarr asrarrlowastThe language accepted by a WKFA119872 is
119871 (119872) = 119906 [119906
V] isinWK
120588(119881) 119902
0[119906
V]
997888rarrlowast[119906
V] 119902 where 119902 isin 119865
(12)
The family of languages accepted is indicated by WKFA It isshown in [10 12] that
REG subWKFA sub CS (13)
3 Definitions
In this section we slightly modified the definition of Watson-Crick regular grammars introduced in [8] in order to extendthe concept to linear grammars and context-free grammars
Definition 2 A Watson-Crick (WK) grammar 119866 = (119873 119879 119878119875 120588) is called
(i) regular if each production has the form
119860 997888rarr ⟨119906
V⟩119861
or 119860 997888rarr ⟨119906V⟩
(14)
where 119860 119861 isin 119873 and ⟨119906V⟩ isin ⟨119879lowast119879lowast⟩(ii) linear if each production has the form
119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩
or 119860 997888rarr ⟨119906V⟩
(15)
where 119860 119861 isin 119873 and ⟨1199061V1⟩ ⟨1199062V2⟩ ⟨119906V⟩ isin
⟨119879lowast119879lowast⟩
(iii) context-free if each production has the form
119860 997888rarr 120572 (16)
where 119860 isin 119873 and 120572 isin (119873 cup (⟨119879lowast119879lowast⟩))lowast
Definition 3 Let119866 = (119873 119879 120588 119878 119875) be aWK linear grammarWe say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives 119910 isin (119873 cup⟨119879lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 iff
119909 = ⟨1199061
V1
⟩119860⟨1199062
V2
⟩
119910 = ⟨1199061
V1
⟩⟨1199063
V3
⟩119861⟨1199064
V4
⟩⟨1199062
V2
⟩
or 119910 = ⟨1199061V1
⟩⟨119906
V⟩⟨1199062
V2
⟩
(17)
where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and
119860 997888rarr ⟨1199063
V3
⟩119861⟨1199064
V4
⟩
119860 997888rarr ⟨119906
V⟩ isin 119875
(18)
Definition 4 Let 119866 = (119873 119879 120588 119878 119875) be a WK context-freegrammar We say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives119910 isin (119873 cup ⟨119879
lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 if and only if
119909 = ⟨1199061
V1
⟩119860⟨1199062
V2
⟩
119910 = ⟨1199061
V1
⟩120572⟨1199062
V2
⟩
(19)
where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and 119860 rarr 120572 isin
119875
Remark 5 We use a common notion ldquoWatson-Crick gram-marsrdquo referring to any type of WK grammars
Definition 6 The language generated by a WK grammar is aquintuple 119866 which is defined as
119871 (119866) = 119906 [119906
V] isinWK
120588(119879) 119878 997904rArr
lowast[119906
V] (20)
4 Generative Power ofWatson-Crick Grammars
In this section we establish results regarding the computa-tional power of WK grammars
41 A Normal Form for Watson-Crick Linear GrammarsNext we define 1-normal form for WK linear grammarsand show that for every WK linear grammar 119866 there is anequivalent WK linear grammar 1198661015840 in the normal form thatis 119871(119866) = 119871(1198661015840)
Definition 7 A linearWK grammar119866 = (119873 119879 120588 119875 119878) is saidto be in the 1-normal form if each rule in 119875 of the form
119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩
or 119860 997888rarr ⟨1199061V1
⟩
(21)
where |119906119894| le 1 |V
119894| le 1 119894 = 1 2 and 119860 119861 isin 119873
Applied Computational Intelligence and Soft Computing 5
Lemma 8 For every WK linear grammar 119866 there exists anequivalent WK linear grammar 1198661015840 in the 1-normal form
Proof Let 119866 = (119873 119879 120588 119878 119875) be a WK linear grammar Let
119860 997888rarr ⟨
11988611sdot sdot sdot 11988611198981
11988711sdot sdot sdot 11988711198991
⟩119861⟨
11988621198982
sdot sdot sdot 11988621
11988721198992
sdot sdot sdot 11988721
⟩ (22)
be a production in 119875 where 1198981gt 1 119898
2gt 1 119899
1gt 1 or
1198992gt 1 Without loss of generality we assume that 119898
1ge 1198991
and1198982ge 1198992Then we define the following sequence of right-
linear and left-linear production rules
119860 997888rarr ⟨11988611
11988711
⟩119860119903
11
119860119903
11997888rarr ⟨
11988612
11988712
⟩119860119903
12
119860119903
11198991minus1997888rarr ⟨
11988611198991
11988711198991
⟩119860119903
11198991
119860119903
11198991
997888rarr ⟨
11988611198991+1
120582⟩119860119903
11198991+1
119860119903
11198981minus1997888rarr ⟨
11988611198981
120582⟩119860119903
11198981
119860119903
11198981
997888rarr 119860119903
21⟨11988621
11988721
⟩
119860119903
21997888rarr 119860
119903
22⟨11988622
11988722
⟩
119860119903
11198992minus1997888rarr 119860
119903
21198992
⟨
11988621198992
11988721198992
⟩
119860119903
21198992
997888rarr 119860119903
21198992+1⟨
11988621198992+1
120582⟩
119860119903
21198982minus1997888rarr ⟨
11988621198982
120582⟩
(23)
where 1198601199031119895 1 le 119895 le 119898
1 and 119860119903
2119895 1 le 119895 le 119898
2minus 1 are new
nonterminals
Let
119903 119860 997888rarr ⟨11988611198862sdot sdot sdot 119886119898
11988711198872sdot sdot sdot 119887119899
⟩ isin 119875 (24)
where 119898 gt 1 or 119899 gt 1 Without loss of generality we assumethat 119898 ge 119899 Then we define the following sequence of right-linear production rules
119860 997888rarr ⟨1198861
1198871
⟩119860119903
1
119860119903
1997888rarr ⟨
1198862
1198872
⟩119860119903
2
119860119903
119899minus1997888rarr ⟨
119886119899
119887119899
⟩119860119903
119899
119860119903
119899997888rarr ⟨
119886119899+1
120582⟩119860119903
119899+1
119860119903
119898minus1997888rarr ⟨
119886119898
120582⟩
(25)
where 119860119903119894 1 le 119894 le 119898 minus 1 are new nonterminals
We construct a WK linear grammar 1198661015840 = (119873 cup
1198731015840 119879 120588 119878 119875 cup 119875
1015840) where 1198751015840 consists of productions defined
above for each 119860 rarr ⟨1199061V1⟩119861⟨1199062V2⟩ isin 119875 with |119906
1| gt 1
|V1| gt 1 |119906
2| gt 1 or |V
2| gt 1 and 119860 rarr ⟨119906V⟩ isin 119875 with
|119906| gt 1 or |V| gt 1 Then it is not difficult to see that in everyderivation productions in the form of (22) and (24) in 119866 canbe replaced by the sequences of productions (23) and (25) in1198661015840 and vice versa Thus 119871(119866) = 119871(1198661015840)
42TheGenerative Power The following results immediatelyfollow from the definition of WK grammars
Lemma 9 The following inclusions hold
119882119870119877119864119866 sube 119882119870119871119868119873
119871119868119873 sube 119882119870119871119868119873
119862119865 sube 119882119870119862119865
(26)
Next we show that WK grammars can generate non-context-free languages
119886119899119888119899119887119899 119899 ge 1
119886119899119887119898119888119899119889119898 119899 119898 ge 1
119908119888119908 119908 isin 119886 119887lowast
(27)
6 Applied Computational Intelligence and Soft Computing
Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)
119875 119878) be a WK linear grammar where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119887
120582⟩
119878 997888rarr ⟨119886
120582⟩119860⟨
119887
120582⟩
119860 997888rarr ⟨119888
119886⟩119860
119860 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
120582⟩
(28)
In general we have the derivation
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119887119899minus1
120582⟩
997904rArr ⟨119886119899
120582⟩119860⟨
119887119899
120582⟩
997904rArrlowast⟨119886119899119888119899
119886119899⟩119860⟨
119887119899
120582⟩
997904rArr ⟨119886119899119888119899
119886119899119888⟩119861⟨
119887119899
119887⟩
997904rArrlowast⟨119886119899119888119899
119886119899119888119899⟩119861⟨
119887119899
119887119899⟩
997904rArr [119886119899119888119899119887119899
119886119899119888119899119887119899]
(29)
Thus 1198661generates the language 119871(119866
1) = 119886
119899119888119899119887119899 119899 ge 1 isin
CS minus CF
Example 11 Let
1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889
(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)
(30)
be a WK regular grammar and 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119886
120582⟩119860
119860 997888rarr ⟨119887
120582⟩119860 | ⟨
119887
120582⟩119861
119861 997888rarr ⟨119888
119886⟩119861 | ⟨
119888
119886⟩119862
119862 997888rarr ⟨119889
119887⟩119862 | ⟨
119889
119887⟩119863
119863 997888rarr ⟨120582
119888⟩119863 | ⟨
120582
119889⟩119863 | ⟨
120582
120582⟩
(31)
Then we have the following derivation for 119899119898 ge 1
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878 997904rArr ⟨
119886119899
120582⟩119860
997904rArrlowast⟨119886119899119887119898minus1
120582⟩119860 997904rArr ⟨
119886119899119887119898
120582⟩119861
997904rArrlowast⟨119886119899119887119898119888119899minus1
119886119899minus1
⟩119861 997904rArr ⟨119886119899119887119898119888119899
119886119899⟩119862
997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1
119886119899119887119898minus1
⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898
119886119899119887119898⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898⟩119863
997904rArr [119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898]
(32)
Hence 119871(1198662) = 119886
119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF
Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)
119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119887
120582⟩119878 | ⟨
119888
120582⟩119860
119860 997888rarr ⟨119886
119886⟩119860 | ⟨
119887
119887⟩119860 | ⟨
120582
119888⟩119861
119861 997888rarr ⟨120582
119886⟩119861 | ⟨
120582
119887⟩119861 | ⟨
120582
120582⟩
(33)
By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861
continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively
119878 997904rArrlowast⟨119908
120582⟩119878 997904rArr ⟨
119908119888
120582⟩119860
997904rArrlowast⟨119908119888119908
119908⟩119860 997904rArr ⟨
119908119888119908
119908119888⟩119861
997904rArrlowast⟨119908119888119908
119908119888119908⟩119861 997904rArr [
119908119888119908
119908119888119908]
(34)
Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887
lowast isin CS minus CF
The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12
Applied Computational Intelligence and Soft Computing 7
Theorem 13 The following inclusions hold
119871119868119873 ⊊ 119882119870119871119868119873
119882119870119877119864119866 minus 119862119865 = 0
119882119870119871119868119873 minus 119862119865 = 0
(35)
The following example shows that some WK linearlanguages cannot be generated by WK regular grammars
Lemma 14 The following language is not a WK regularlanguage
1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899
isin 119882119870119871119868119873 minus119882119870119877119864119866
(36)
Proof The language1198711can be generated by the followingWK
linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)
where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119886
119886⟩ | ⟨
119886
120582⟩119860⟨
119886
119886⟩
119860 997888rarr ⟨119887119887
119886⟩119860 | ⟨
119887119887119887
119886⟩119860 | ⟨
120582
119887⟩119861
119861 997888rarr ⟨120582
119887⟩119861 | ⟨
120582
120582⟩
(37)
It is not difficult to see that
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119886119899minus1
119886119899minus1⟩ 997904rArr ⟨
119886119899
120582⟩119860⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899⟩119860⟨
119886119899
119886119899⟩ 997904rArr ⟨
119886119899119887119898
119886119899119887⟩119861⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899119887119898⟩119861⟨
a119899
119886119899⟩ 997904rArr [
119886119899119887119898119886119899
119886119899119887119898119886119899]
(38)
where 2119899 le 119898 le 3119899Next we show that 119871
1notinWKREG
We suppose by contradiction that1198711can be generated by
a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and
V isin ⟨119886
120582⟩ ⟨
120582
119886⟩ ⟨
119886
119886⟩ ⟨
119887
120582⟩ ⟨
120582
119887⟩ ⟨
119887
119887⟩ ⟨
119886
119887⟩
⟨119887
119887⟩ (119873 cup 120582)
(39)
Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the
double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840
Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated
in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations
119878 997904rArrlowast⟨119886119903119887
119886119896⟩
or 119878 997904rArrlowast⟨119886119903119887
119886119903119887⟩
(40)
where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩ 119905 le 119904 (41)
Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩997904rArr
lowast⟨119886119903119887119904119886119894
119886119903119887119904119886119895⟩ (42)
and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations
Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos
119878 997904rArrlowast⟨119886119903119887119897119886119903
119886119903⟩ (43)
In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules
Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following
Corollary 15 The following holds
119871119868119873 minus119882119870119877119864119866 = 0 (44)
43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem
Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions
5 Closure Properties
In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how
8 Applied Computational Intelligence and Soft Computing
RE
WKCF
WKLIN
WKREG
REG
CS
CF
LIN
Figure 4 The hierarchy of WK and Chomsky language families
WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars
51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG
and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be Watson-
Crick regular grammars generating languages 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cap1198732= 0 and 119878
1and
1198782do not appear on the right-hand side of any production
rule
Lemma 17 (union) 119882119870119877119864119866 is closed under union
Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878
with 119878 notin 1198731cup 1198732 and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (45)
Then it is not difficult to see that 119871(119866) = 1198711cup 1198712
Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and
119878 = 1198781 We define
119875 = 1198752cup (1198751minus 119860 997888rarr ⟨
119906
V⟩ isin 119875
1)
cup 119860 997888rarr ⟨119906
V⟩1198782| 119860 997888rarr ⟨
119906
V⟩ isin 119875
1
(46)
Then it is obvious that 119871(119866) = 1198711sdot 1198712
Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation
Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)
with
1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)
where
119875119879= 119860 997888rarr ⟨
119906
V⟩ isin 119875 ⟨
119906
V⟩ isin ⟨
119879lowast
119879lowast⟩
119875119878= 119860 997888rarr ⟨
119906
V⟩1198781| 119860 997888rarr ⟨
119906
V⟩ isin 119875
119879
(48)
Then 119871(119866) = 119871lowast1
Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism
Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ
lowast be a finitesubstitution (homomorphism)Define119866 = (119873
1 1198791 1198781 119875 1205881)
where 119871(119866) = 119904(1198711) Without loss of generality we assume
that 1198661is in the 1-normal form Then 119875
1can contain
production rules of the forms
119860 997888rarr ⟨119886
119887⟩119861
119860 997888rarr ⟨119886
119887⟩
119860 997888rarr ⟨119886
120582⟩119861
119860 997888rarr ⟨119886
120582⟩
119860 997888rarr ⟨120582
119887⟩119861
119860 997888rarr ⟨120582
119887⟩
119860 997888rarr ⟨120582
120582⟩
(49)
We define 119875 as the set of production rules of the forms
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩119861
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩
119860 997888rarr ⟨119904 (119886)
120582⟩119861
119860 997888rarr ⟨119904 (119886)
120582⟩
119860 997888rarr ⟨120582
119904 (119887)⟩119861
Applied Computational Intelligence and Soft Computing 9
119860 997888rarr ⟨120582
119904 (119887)⟩
119860 997888rarr ⟨120582
120582⟩
(50)
Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)
Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage
Proof We show that 1198711198771isin WKREG Define 119866 = (119873
1cup
119878 1198791 119878 119875 120588
1) generating the language 119871119877
1 where 119878 is a new
nonterminal and 119875 consists of production rules defined asfollows
(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751
(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751
The results obtained from the lemmas above are summa-rized in the following theorem
Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image
This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies
52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN
Let 1198711 1198712isinWKLIN and
1198661= (1198731 119879 1198781 1198751 120588)
1198662= (1198732 119879 1198782 1198752 120588)
(51)
be Watson-Crick linear grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 23 (union) 119882119870119871119868119873 is closed under union
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878
with 119878 notin 1198731cup 1198732and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (52)
Then 119871(119866) = 1198711cup 1198712
Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKLIN Let
ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality
we assume that 1198661is in the 1-normal form For each rule
119903 119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩ isin 1198751 (53)
where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩119861⟨
ℎ (1199062)
ℎ (V2)⟩ (54)
in 119875In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism
It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation
53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin
WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be
Watson-Crick context-free grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 26 (union) 119882119870119862119865 is closed under union operation
Proof Define the WK context-free grammar 119866 =
(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873
1cap 1198732
and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (55)
Then it is obvious that 119871(119866) = 1198711cup 1198712
Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation
Proof Define 119866 = (1198731cup 1198732cup 119878 119879
1cup 1198792 119878 119875 120588) where 119878 notin
(1198731cup 1198732) and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
11198782 (56)
Then 119871(119866) isinWKCF
Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation
Proof Define 119866 = (1198731cup 119878 119879
1 119878 119875 120588) isinWKCF setting
119875 = 1198751cup 119878 997888rarr 119878
1119878 | 120582 (57)
Then it is not difficult to see that 119871(119866) isinWKCF
10 Applied Computational Intelligence and Soft Computing
Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKCF Let 119904
119879lowastrarr Σlowast be a homomorphism 119875
1contains production rules
of the form
119903 119860
997888rarr ⟨1199061
V1
⟩1198611⟨1199062
V2
⟩1198612sdot sdot sdot ⟨
119906119896
V119896
⟩119861119896⟨119906119896+1
V119896+1
⟩
(58)
where 119896 ge 0For each rule 119903 isin 119875 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩1198611⟨ℎ (1199062)
ℎ (V2)⟩1198612
sdot sdot sdot⟨ℎ (119906119896)
ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)
ℎ (V119896+1)⟩
(59)
where 119896 ge 0In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
With the lemmas provided above in this subsection thenext theorem follows
Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism
Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven
6 Applications of Watson-Crick Grammars
In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures
61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems
The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that
the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]
In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]
Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888
119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)
In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as
(119886 119905 119888 119892lowast
119888119905119892 119886 119905 119888 119892lowast
)lowast
(61)
We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language
Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions
119878 997888rarr ⟨119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119888
119892⟩119860
119860 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119905
119886⟩119861
119861 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩119862
119862
997888rarr ⟨119886
119905⟩119862 | ⟨
119905
119886⟩119862 | ⟨
119892
119888⟩119862 | ⟨
119888
119892⟩119860 | ⟨
120582
120582⟩
(62)
For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows
119878 997904rArrlowast⟨119909
119910⟩119878 997904rArr ⟨
119909119888
119910119892⟩119860 997904rArr ⟨
119909119888119905
119910119892119886⟩119861
997904rArr ⟨119909119888119905119892
119910119892119886119888⟩119862
997904rArrlowast⟨119909119888119905119892119909
119910119892119886119888119910⟩119860 997904rArr ⟨
119909119888119905119892119909119888
119910119892119886119888119910119892⟩119860
997904rArr ⟨119909119888119905119892119909119888119905
119910119892119886119888119910119892119886⟩119861
997904rArr ⟨119909119888119905119892119909119888119905119892
119910119892119886119888119910119892119886119888⟩119862997904rArr
lowast⟨119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910⟩119862
997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]
(63)
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing 5
Lemma 8 For every WK linear grammar 119866 there exists anequivalent WK linear grammar 1198661015840 in the 1-normal form
Proof Let 119866 = (119873 119879 120588 119878 119875) be a WK linear grammar Let
119860 997888rarr ⟨
11988611sdot sdot sdot 11988611198981
11988711sdot sdot sdot 11988711198991
⟩119861⟨
11988621198982
sdot sdot sdot 11988621
11988721198992
sdot sdot sdot 11988721
⟩ (22)
be a production in 119875 where 1198981gt 1 119898
2gt 1 119899
1gt 1 or
1198992gt 1 Without loss of generality we assume that 119898
1ge 1198991
and1198982ge 1198992Then we define the following sequence of right-
linear and left-linear production rules
119860 997888rarr ⟨11988611
11988711
⟩119860119903
11
119860119903
11997888rarr ⟨
11988612
11988712
⟩119860119903
12
119860119903
11198991minus1997888rarr ⟨
11988611198991
11988711198991
⟩119860119903
11198991
119860119903
11198991
997888rarr ⟨
11988611198991+1
120582⟩119860119903
11198991+1
119860119903
11198981minus1997888rarr ⟨
11988611198981
120582⟩119860119903
11198981
119860119903
11198981
997888rarr 119860119903
21⟨11988621
11988721
⟩
119860119903
21997888rarr 119860
119903
22⟨11988622
11988722
⟩
119860119903
11198992minus1997888rarr 119860
119903
21198992
⟨
11988621198992
11988721198992
⟩
119860119903
21198992
997888rarr 119860119903
21198992+1⟨
11988621198992+1
120582⟩
119860119903
21198982minus1997888rarr ⟨
11988621198982
120582⟩
(23)
where 1198601199031119895 1 le 119895 le 119898
1 and 119860119903
2119895 1 le 119895 le 119898
2minus 1 are new
nonterminals
Let
119903 119860 997888rarr ⟨11988611198862sdot sdot sdot 119886119898
11988711198872sdot sdot sdot 119887119899
⟩ isin 119875 (24)
where 119898 gt 1 or 119899 gt 1 Without loss of generality we assumethat 119898 ge 119899 Then we define the following sequence of right-linear production rules
119860 997888rarr ⟨1198861
1198871
⟩119860119903
1
119860119903
1997888rarr ⟨
1198862
1198872
⟩119860119903
2
119860119903
119899minus1997888rarr ⟨
119886119899
119887119899
⟩119860119903
119899
119860119903
119899997888rarr ⟨
119886119899+1
120582⟩119860119903
119899+1
119860119903
119898minus1997888rarr ⟨
119886119898
120582⟩
(25)
where 119860119903119894 1 le 119894 le 119898 minus 1 are new nonterminals
We construct a WK linear grammar 1198661015840 = (119873 cup
1198731015840 119879 120588 119878 119875 cup 119875
1015840) where 1198751015840 consists of productions defined
above for each 119860 rarr ⟨1199061V1⟩119861⟨1199062V2⟩ isin 119875 with |119906
1| gt 1
|V1| gt 1 |119906
2| gt 1 or |V
2| gt 1 and 119860 rarr ⟨119906V⟩ isin 119875 with
|119906| gt 1 or |V| gt 1 Then it is not difficult to see that in everyderivation productions in the form of (22) and (24) in 119866 canbe replaced by the sequences of productions (23) and (25) in1198661015840 and vice versa Thus 119871(119866) = 119871(1198661015840)
42TheGenerative Power The following results immediatelyfollow from the definition of WK grammars
Lemma 9 The following inclusions hold
119882119870119877119864119866 sube 119882119870119871119868119873
119871119868119873 sube 119882119870119871119868119873
119862119865 sube 119882119870119862119865
(26)
Next we show that WK grammars can generate non-context-free languages
119886119899119888119899119887119899 119899 ge 1
119886119899119887119898119888119899119889119898 119899 119898 ge 1
119908119888119908 119908 isin 119886 119887lowast
(27)
6 Applied Computational Intelligence and Soft Computing
Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)
119875 119878) be a WK linear grammar where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119887
120582⟩
119878 997888rarr ⟨119886
120582⟩119860⟨
119887
120582⟩
119860 997888rarr ⟨119888
119886⟩119860
119860 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
120582⟩
(28)
In general we have the derivation
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119887119899minus1
120582⟩
997904rArr ⟨119886119899
120582⟩119860⟨
119887119899
120582⟩
997904rArrlowast⟨119886119899119888119899
119886119899⟩119860⟨
119887119899
120582⟩
997904rArr ⟨119886119899119888119899
119886119899119888⟩119861⟨
119887119899
119887⟩
997904rArrlowast⟨119886119899119888119899
119886119899119888119899⟩119861⟨
119887119899
119887119899⟩
997904rArr [119886119899119888119899119887119899
119886119899119888119899119887119899]
(29)
Thus 1198661generates the language 119871(119866
1) = 119886
119899119888119899119887119899 119899 ge 1 isin
CS minus CF
Example 11 Let
1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889
(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)
(30)
be a WK regular grammar and 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119886
120582⟩119860
119860 997888rarr ⟨119887
120582⟩119860 | ⟨
119887
120582⟩119861
119861 997888rarr ⟨119888
119886⟩119861 | ⟨
119888
119886⟩119862
119862 997888rarr ⟨119889
119887⟩119862 | ⟨
119889
119887⟩119863
119863 997888rarr ⟨120582
119888⟩119863 | ⟨
120582
119889⟩119863 | ⟨
120582
120582⟩
(31)
Then we have the following derivation for 119899119898 ge 1
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878 997904rArr ⟨
119886119899
120582⟩119860
997904rArrlowast⟨119886119899119887119898minus1
120582⟩119860 997904rArr ⟨
119886119899119887119898
120582⟩119861
997904rArrlowast⟨119886119899119887119898119888119899minus1
119886119899minus1
⟩119861 997904rArr ⟨119886119899119887119898119888119899
119886119899⟩119862
997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1
119886119899119887119898minus1
⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898
119886119899119887119898⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898⟩119863
997904rArr [119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898]
(32)
Hence 119871(1198662) = 119886
119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF
Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)
119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119887
120582⟩119878 | ⟨
119888
120582⟩119860
119860 997888rarr ⟨119886
119886⟩119860 | ⟨
119887
119887⟩119860 | ⟨
120582
119888⟩119861
119861 997888rarr ⟨120582
119886⟩119861 | ⟨
120582
119887⟩119861 | ⟨
120582
120582⟩
(33)
By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861
continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively
119878 997904rArrlowast⟨119908
120582⟩119878 997904rArr ⟨
119908119888
120582⟩119860
997904rArrlowast⟨119908119888119908
119908⟩119860 997904rArr ⟨
119908119888119908
119908119888⟩119861
997904rArrlowast⟨119908119888119908
119908119888119908⟩119861 997904rArr [
119908119888119908
119908119888119908]
(34)
Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887
lowast isin CS minus CF
The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12
Applied Computational Intelligence and Soft Computing 7
Theorem 13 The following inclusions hold
119871119868119873 ⊊ 119882119870119871119868119873
119882119870119877119864119866 minus 119862119865 = 0
119882119870119871119868119873 minus 119862119865 = 0
(35)
The following example shows that some WK linearlanguages cannot be generated by WK regular grammars
Lemma 14 The following language is not a WK regularlanguage
1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899
isin 119882119870119871119868119873 minus119882119870119877119864119866
(36)
Proof The language1198711can be generated by the followingWK
linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)
where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119886
119886⟩ | ⟨
119886
120582⟩119860⟨
119886
119886⟩
119860 997888rarr ⟨119887119887
119886⟩119860 | ⟨
119887119887119887
119886⟩119860 | ⟨
120582
119887⟩119861
119861 997888rarr ⟨120582
119887⟩119861 | ⟨
120582
120582⟩
(37)
It is not difficult to see that
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119886119899minus1
119886119899minus1⟩ 997904rArr ⟨
119886119899
120582⟩119860⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899⟩119860⟨
119886119899
119886119899⟩ 997904rArr ⟨
119886119899119887119898
119886119899119887⟩119861⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899119887119898⟩119861⟨
a119899
119886119899⟩ 997904rArr [
119886119899119887119898119886119899
119886119899119887119898119886119899]
(38)
where 2119899 le 119898 le 3119899Next we show that 119871
1notinWKREG
We suppose by contradiction that1198711can be generated by
a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and
V isin ⟨119886
120582⟩ ⟨
120582
119886⟩ ⟨
119886
119886⟩ ⟨
119887
120582⟩ ⟨
120582
119887⟩ ⟨
119887
119887⟩ ⟨
119886
119887⟩
⟨119887
119887⟩ (119873 cup 120582)
(39)
Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the
double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840
Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated
in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations
119878 997904rArrlowast⟨119886119903119887
119886119896⟩
or 119878 997904rArrlowast⟨119886119903119887
119886119903119887⟩
(40)
where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩ 119905 le 119904 (41)
Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩997904rArr
lowast⟨119886119903119887119904119886119894
119886119903119887119904119886119895⟩ (42)
and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations
Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos
119878 997904rArrlowast⟨119886119903119887119897119886119903
119886119903⟩ (43)
In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules
Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following
Corollary 15 The following holds
119871119868119873 minus119882119870119877119864119866 = 0 (44)
43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem
Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions
5 Closure Properties
In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how
8 Applied Computational Intelligence and Soft Computing
RE
WKCF
WKLIN
WKREG
REG
CS
CF
LIN
Figure 4 The hierarchy of WK and Chomsky language families
WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars
51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG
and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be Watson-
Crick regular grammars generating languages 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cap1198732= 0 and 119878
1and
1198782do not appear on the right-hand side of any production
rule
Lemma 17 (union) 119882119870119877119864119866 is closed under union
Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878
with 119878 notin 1198731cup 1198732 and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (45)
Then it is not difficult to see that 119871(119866) = 1198711cup 1198712
Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and
119878 = 1198781 We define
119875 = 1198752cup (1198751minus 119860 997888rarr ⟨
119906
V⟩ isin 119875
1)
cup 119860 997888rarr ⟨119906
V⟩1198782| 119860 997888rarr ⟨
119906
V⟩ isin 119875
1
(46)
Then it is obvious that 119871(119866) = 1198711sdot 1198712
Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation
Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)
with
1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)
where
119875119879= 119860 997888rarr ⟨
119906
V⟩ isin 119875 ⟨
119906
V⟩ isin ⟨
119879lowast
119879lowast⟩
119875119878= 119860 997888rarr ⟨
119906
V⟩1198781| 119860 997888rarr ⟨
119906
V⟩ isin 119875
119879
(48)
Then 119871(119866) = 119871lowast1
Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism
Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ
lowast be a finitesubstitution (homomorphism)Define119866 = (119873
1 1198791 1198781 119875 1205881)
where 119871(119866) = 119904(1198711) Without loss of generality we assume
that 1198661is in the 1-normal form Then 119875
1can contain
production rules of the forms
119860 997888rarr ⟨119886
119887⟩119861
119860 997888rarr ⟨119886
119887⟩
119860 997888rarr ⟨119886
120582⟩119861
119860 997888rarr ⟨119886
120582⟩
119860 997888rarr ⟨120582
119887⟩119861
119860 997888rarr ⟨120582
119887⟩
119860 997888rarr ⟨120582
120582⟩
(49)
We define 119875 as the set of production rules of the forms
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩119861
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩
119860 997888rarr ⟨119904 (119886)
120582⟩119861
119860 997888rarr ⟨119904 (119886)
120582⟩
119860 997888rarr ⟨120582
119904 (119887)⟩119861
Applied Computational Intelligence and Soft Computing 9
119860 997888rarr ⟨120582
119904 (119887)⟩
119860 997888rarr ⟨120582
120582⟩
(50)
Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)
Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage
Proof We show that 1198711198771isin WKREG Define 119866 = (119873
1cup
119878 1198791 119878 119875 120588
1) generating the language 119871119877
1 where 119878 is a new
nonterminal and 119875 consists of production rules defined asfollows
(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751
(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751
The results obtained from the lemmas above are summa-rized in the following theorem
Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image
This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies
52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN
Let 1198711 1198712isinWKLIN and
1198661= (1198731 119879 1198781 1198751 120588)
1198662= (1198732 119879 1198782 1198752 120588)
(51)
be Watson-Crick linear grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 23 (union) 119882119870119871119868119873 is closed under union
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878
with 119878 notin 1198731cup 1198732and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (52)
Then 119871(119866) = 1198711cup 1198712
Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKLIN Let
ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality
we assume that 1198661is in the 1-normal form For each rule
119903 119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩ isin 1198751 (53)
where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩119861⟨
ℎ (1199062)
ℎ (V2)⟩ (54)
in 119875In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism
It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation
53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin
WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be
Watson-Crick context-free grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 26 (union) 119882119870119862119865 is closed under union operation
Proof Define the WK context-free grammar 119866 =
(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873
1cap 1198732
and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (55)
Then it is obvious that 119871(119866) = 1198711cup 1198712
Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation
Proof Define 119866 = (1198731cup 1198732cup 119878 119879
1cup 1198792 119878 119875 120588) where 119878 notin
(1198731cup 1198732) and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
11198782 (56)
Then 119871(119866) isinWKCF
Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation
Proof Define 119866 = (1198731cup 119878 119879
1 119878 119875 120588) isinWKCF setting
119875 = 1198751cup 119878 997888rarr 119878
1119878 | 120582 (57)
Then it is not difficult to see that 119871(119866) isinWKCF
10 Applied Computational Intelligence and Soft Computing
Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKCF Let 119904
119879lowastrarr Σlowast be a homomorphism 119875
1contains production rules
of the form
119903 119860
997888rarr ⟨1199061
V1
⟩1198611⟨1199062
V2
⟩1198612sdot sdot sdot ⟨
119906119896
V119896
⟩119861119896⟨119906119896+1
V119896+1
⟩
(58)
where 119896 ge 0For each rule 119903 isin 119875 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩1198611⟨ℎ (1199062)
ℎ (V2)⟩1198612
sdot sdot sdot⟨ℎ (119906119896)
ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)
ℎ (V119896+1)⟩
(59)
where 119896 ge 0In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
With the lemmas provided above in this subsection thenext theorem follows
Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism
Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven
6 Applications of Watson-Crick Grammars
In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures
61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems
The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that
the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]
In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]
Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888
119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)
In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as
(119886 119905 119888 119892lowast
119888119905119892 119886 119905 119888 119892lowast
)lowast
(61)
We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language
Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions
119878 997888rarr ⟨119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119888
119892⟩119860
119860 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119905
119886⟩119861
119861 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩119862
119862
997888rarr ⟨119886
119905⟩119862 | ⟨
119905
119886⟩119862 | ⟨
119892
119888⟩119862 | ⟨
119888
119892⟩119860 | ⟨
120582
120582⟩
(62)
For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows
119878 997904rArrlowast⟨119909
119910⟩119878 997904rArr ⟨
119909119888
119910119892⟩119860 997904rArr ⟨
119909119888119905
119910119892119886⟩119861
997904rArr ⟨119909119888119905119892
119910119892119886119888⟩119862
997904rArrlowast⟨119909119888119905119892119909
119910119892119886119888119910⟩119860 997904rArr ⟨
119909119888119905119892119909119888
119910119892119886119888119910119892⟩119860
997904rArr ⟨119909119888119905119892119909119888119905
119910119892119886119888119910119892119886⟩119861
997904rArr ⟨119909119888119905119892119909119888119905119892
119910119892119886119888119910119892119886119888⟩119862997904rArr
lowast⟨119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910⟩119862
997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]
(63)
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
6 Applied Computational Intelligence and Soft Computing
Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)
119875 119878) be a WK linear grammar where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119887
120582⟩
119878 997888rarr ⟨119886
120582⟩119860⟨
119887
120582⟩
119860 997888rarr ⟨119888
119886⟩119860
119860 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
119888⟩119861⟨
120582
119887⟩
119861 997888rarr ⟨120582
120582⟩
(28)
In general we have the derivation
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119887119899minus1
120582⟩
997904rArr ⟨119886119899
120582⟩119860⟨
119887119899
120582⟩
997904rArrlowast⟨119886119899119888119899
119886119899⟩119860⟨
119887119899
120582⟩
997904rArr ⟨119886119899119888119899
119886119899119888⟩119861⟨
119887119899
119887⟩
997904rArrlowast⟨119886119899119888119899
119886119899119888119899⟩119861⟨
119887119899
119887119899⟩
997904rArr [119886119899119888119899119887119899
119886119899119888119899119887119899]
(29)
Thus 1198661generates the language 119871(119866
1) = 119886
119899119888119899119887119899 119899 ge 1 isin
CS minus CF
Example 11 Let
1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889
(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)
(30)
be a WK regular grammar and 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119886
120582⟩119860
119860 997888rarr ⟨119887
120582⟩119860 | ⟨
119887
120582⟩119861
119861 997888rarr ⟨119888
119886⟩119861 | ⟨
119888
119886⟩119862
119862 997888rarr ⟨119889
119887⟩119862 | ⟨
119889
119887⟩119863
119863 997888rarr ⟨120582
119888⟩119863 | ⟨
120582
119889⟩119863 | ⟨
120582
120582⟩
(31)
Then we have the following derivation for 119899119898 ge 1
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878 997904rArr ⟨
119886119899
120582⟩119860
997904rArrlowast⟨119886119899119887119898minus1
120582⟩119860 997904rArr ⟨
119886119899119887119898
120582⟩119861
997904rArrlowast⟨119886119899119887119898119888119899minus1
119886119899minus1
⟩119861 997904rArr ⟨119886119899119887119898119888119899
119886119899⟩119862
997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1
119886119899119887119898minus1
⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898
119886119899119887119898⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899⟩119863
997904rArrlowast⟨119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898⟩119863
997904rArr [119886119899119887119898119888119899119889119898
119886119899119887119898119888119899119889119898]
(32)
Hence 119871(1198662) = 119886
119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF
Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)
119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules
119878 997888rarr ⟨119886
120582⟩119878 | ⟨
119887
120582⟩119878 | ⟨
119888
120582⟩119860
119860 997888rarr ⟨119886
119886⟩119860 | ⟨
119887
119887⟩119860 | ⟨
120582
119888⟩119861
119861 997888rarr ⟨120582
119886⟩119861 | ⟨
120582
119887⟩119861 | ⟨
120582
120582⟩
(33)
By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861
continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively
119878 997904rArrlowast⟨119908
120582⟩119878 997904rArr ⟨
119908119888
120582⟩119860
997904rArrlowast⟨119908119888119908
119908⟩119860 997904rArr ⟨
119908119888119908
119908119888⟩119861
997904rArrlowast⟨119908119888119908
119908119888119908⟩119861 997904rArr [
119908119888119908
119908119888119908]
(34)
Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887
lowast isin CS minus CF
The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12
Applied Computational Intelligence and Soft Computing 7
Theorem 13 The following inclusions hold
119871119868119873 ⊊ 119882119870119871119868119873
119882119870119877119864119866 minus 119862119865 = 0
119882119870119871119868119873 minus 119862119865 = 0
(35)
The following example shows that some WK linearlanguages cannot be generated by WK regular grammars
Lemma 14 The following language is not a WK regularlanguage
1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899
isin 119882119870119871119868119873 minus119882119870119877119864119866
(36)
Proof The language1198711can be generated by the followingWK
linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)
where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119886
119886⟩ | ⟨
119886
120582⟩119860⟨
119886
119886⟩
119860 997888rarr ⟨119887119887
119886⟩119860 | ⟨
119887119887119887
119886⟩119860 | ⟨
120582
119887⟩119861
119861 997888rarr ⟨120582
119887⟩119861 | ⟨
120582
120582⟩
(37)
It is not difficult to see that
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119886119899minus1
119886119899minus1⟩ 997904rArr ⟨
119886119899
120582⟩119860⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899⟩119860⟨
119886119899
119886119899⟩ 997904rArr ⟨
119886119899119887119898
119886119899119887⟩119861⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899119887119898⟩119861⟨
a119899
119886119899⟩ 997904rArr [
119886119899119887119898119886119899
119886119899119887119898119886119899]
(38)
where 2119899 le 119898 le 3119899Next we show that 119871
1notinWKREG
We suppose by contradiction that1198711can be generated by
a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and
V isin ⟨119886
120582⟩ ⟨
120582
119886⟩ ⟨
119886
119886⟩ ⟨
119887
120582⟩ ⟨
120582
119887⟩ ⟨
119887
119887⟩ ⟨
119886
119887⟩
⟨119887
119887⟩ (119873 cup 120582)
(39)
Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the
double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840
Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated
in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations
119878 997904rArrlowast⟨119886119903119887
119886119896⟩
or 119878 997904rArrlowast⟨119886119903119887
119886119903119887⟩
(40)
where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩ 119905 le 119904 (41)
Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩997904rArr
lowast⟨119886119903119887119904119886119894
119886119903119887119904119886119895⟩ (42)
and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations
Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos
119878 997904rArrlowast⟨119886119903119887119897119886119903
119886119903⟩ (43)
In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules
Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following
Corollary 15 The following holds
119871119868119873 minus119882119870119877119864119866 = 0 (44)
43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem
Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions
5 Closure Properties
In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how
8 Applied Computational Intelligence and Soft Computing
RE
WKCF
WKLIN
WKREG
REG
CS
CF
LIN
Figure 4 The hierarchy of WK and Chomsky language families
WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars
51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG
and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be Watson-
Crick regular grammars generating languages 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cap1198732= 0 and 119878
1and
1198782do not appear on the right-hand side of any production
rule
Lemma 17 (union) 119882119870119877119864119866 is closed under union
Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878
with 119878 notin 1198731cup 1198732 and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (45)
Then it is not difficult to see that 119871(119866) = 1198711cup 1198712
Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and
119878 = 1198781 We define
119875 = 1198752cup (1198751minus 119860 997888rarr ⟨
119906
V⟩ isin 119875
1)
cup 119860 997888rarr ⟨119906
V⟩1198782| 119860 997888rarr ⟨
119906
V⟩ isin 119875
1
(46)
Then it is obvious that 119871(119866) = 1198711sdot 1198712
Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation
Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)
with
1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)
where
119875119879= 119860 997888rarr ⟨
119906
V⟩ isin 119875 ⟨
119906
V⟩ isin ⟨
119879lowast
119879lowast⟩
119875119878= 119860 997888rarr ⟨
119906
V⟩1198781| 119860 997888rarr ⟨
119906
V⟩ isin 119875
119879
(48)
Then 119871(119866) = 119871lowast1
Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism
Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ
lowast be a finitesubstitution (homomorphism)Define119866 = (119873
1 1198791 1198781 119875 1205881)
where 119871(119866) = 119904(1198711) Without loss of generality we assume
that 1198661is in the 1-normal form Then 119875
1can contain
production rules of the forms
119860 997888rarr ⟨119886
119887⟩119861
119860 997888rarr ⟨119886
119887⟩
119860 997888rarr ⟨119886
120582⟩119861
119860 997888rarr ⟨119886
120582⟩
119860 997888rarr ⟨120582
119887⟩119861
119860 997888rarr ⟨120582
119887⟩
119860 997888rarr ⟨120582
120582⟩
(49)
We define 119875 as the set of production rules of the forms
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩119861
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩
119860 997888rarr ⟨119904 (119886)
120582⟩119861
119860 997888rarr ⟨119904 (119886)
120582⟩
119860 997888rarr ⟨120582
119904 (119887)⟩119861
Applied Computational Intelligence and Soft Computing 9
119860 997888rarr ⟨120582
119904 (119887)⟩
119860 997888rarr ⟨120582
120582⟩
(50)
Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)
Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage
Proof We show that 1198711198771isin WKREG Define 119866 = (119873
1cup
119878 1198791 119878 119875 120588
1) generating the language 119871119877
1 where 119878 is a new
nonterminal and 119875 consists of production rules defined asfollows
(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751
(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751
The results obtained from the lemmas above are summa-rized in the following theorem
Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image
This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies
52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN
Let 1198711 1198712isinWKLIN and
1198661= (1198731 119879 1198781 1198751 120588)
1198662= (1198732 119879 1198782 1198752 120588)
(51)
be Watson-Crick linear grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 23 (union) 119882119870119871119868119873 is closed under union
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878
with 119878 notin 1198731cup 1198732and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (52)
Then 119871(119866) = 1198711cup 1198712
Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKLIN Let
ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality
we assume that 1198661is in the 1-normal form For each rule
119903 119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩ isin 1198751 (53)
where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩119861⟨
ℎ (1199062)
ℎ (V2)⟩ (54)
in 119875In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism
It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation
53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin
WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be
Watson-Crick context-free grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 26 (union) 119882119870119862119865 is closed under union operation
Proof Define the WK context-free grammar 119866 =
(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873
1cap 1198732
and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (55)
Then it is obvious that 119871(119866) = 1198711cup 1198712
Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation
Proof Define 119866 = (1198731cup 1198732cup 119878 119879
1cup 1198792 119878 119875 120588) where 119878 notin
(1198731cup 1198732) and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
11198782 (56)
Then 119871(119866) isinWKCF
Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation
Proof Define 119866 = (1198731cup 119878 119879
1 119878 119875 120588) isinWKCF setting
119875 = 1198751cup 119878 997888rarr 119878
1119878 | 120582 (57)
Then it is not difficult to see that 119871(119866) isinWKCF
10 Applied Computational Intelligence and Soft Computing
Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKCF Let 119904
119879lowastrarr Σlowast be a homomorphism 119875
1contains production rules
of the form
119903 119860
997888rarr ⟨1199061
V1
⟩1198611⟨1199062
V2
⟩1198612sdot sdot sdot ⟨
119906119896
V119896
⟩119861119896⟨119906119896+1
V119896+1
⟩
(58)
where 119896 ge 0For each rule 119903 isin 119875 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩1198611⟨ℎ (1199062)
ℎ (V2)⟩1198612
sdot sdot sdot⟨ℎ (119906119896)
ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)
ℎ (V119896+1)⟩
(59)
where 119896 ge 0In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
With the lemmas provided above in this subsection thenext theorem follows
Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism
Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven
6 Applications of Watson-Crick Grammars
In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures
61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems
The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that
the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]
In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]
Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888
119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)
In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as
(119886 119905 119888 119892lowast
119888119905119892 119886 119905 119888 119892lowast
)lowast
(61)
We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language
Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions
119878 997888rarr ⟨119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119888
119892⟩119860
119860 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119905
119886⟩119861
119861 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩119862
119862
997888rarr ⟨119886
119905⟩119862 | ⟨
119905
119886⟩119862 | ⟨
119892
119888⟩119862 | ⟨
119888
119892⟩119860 | ⟨
120582
120582⟩
(62)
For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows
119878 997904rArrlowast⟨119909
119910⟩119878 997904rArr ⟨
119909119888
119910119892⟩119860 997904rArr ⟨
119909119888119905
119910119892119886⟩119861
997904rArr ⟨119909119888119905119892
119910119892119886119888⟩119862
997904rArrlowast⟨119909119888119905119892119909
119910119892119886119888119910⟩119860 997904rArr ⟨
119909119888119905119892119909119888
119910119892119886119888119910119892⟩119860
997904rArr ⟨119909119888119905119892119909119888119905
119910119892119886119888119910119892119886⟩119861
997904rArr ⟨119909119888119905119892119909119888119905119892
119910119892119886119888119910119892119886119888⟩119862997904rArr
lowast⟨119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910⟩119862
997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]
(63)
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing 7
Theorem 13 The following inclusions hold
119871119868119873 ⊊ 119882119870119871119868119873
119882119870119877119864119866 minus 119862119865 = 0
119882119870119871119868119873 minus 119862119865 = 0
(35)
The following example shows that some WK linearlanguages cannot be generated by WK regular grammars
Lemma 14 The following language is not a WK regularlanguage
1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899
isin 119882119870119871119868119873 minus119882119870119877119864119866
(36)
Proof The language1198711can be generated by the followingWK
linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)
where 119875 consists of the rules
119878 997888rarr ⟨119886
120582⟩119878⟨
119886
119886⟩ | ⟨
119886
120582⟩119860⟨
119886
119886⟩
119860 997888rarr ⟨119887119887
119886⟩119860 | ⟨
119887119887119887
119886⟩119860 | ⟨
120582
119887⟩119861
119861 997888rarr ⟨120582
119887⟩119861 | ⟨
120582
120582⟩
(37)
It is not difficult to see that
119878 997904rArrlowast⟨119886119899minus1
120582⟩119878⟨
119886119899minus1
119886119899minus1⟩ 997904rArr ⟨
119886119899
120582⟩119860⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899⟩119860⟨
119886119899
119886119899⟩ 997904rArr ⟨
119886119899119887119898
119886119899119887⟩119861⟨
119886119899
119886119899⟩
997904rArrlowast⟨119886119899119887119898
119886119899119887119898⟩119861⟨
a119899
119886119899⟩ 997904rArr [
119886119899119887119898119886119899
119886119899119887119898119886119899]
(38)
where 2119899 le 119898 le 3119899Next we show that 119871
1notinWKREG
We suppose by contradiction that1198711can be generated by
a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and
V isin ⟨119886
120582⟩ ⟨
120582
119886⟩ ⟨
119886
119886⟩ ⟨
119887
120582⟩ ⟨
120582
119887⟩ ⟨
119887
119887⟩ ⟨
119886
119887⟩
⟨119887
119887⟩ (119873 cup 120582)
(39)
Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the
double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840
Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated
in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations
119878 997904rArrlowast⟨119886119903119887
119886119896⟩
or 119878 997904rArrlowast⟨119886119903119887
119886119903119887⟩
(40)
where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩ 119905 le 119904 (41)
Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider
119878 997904rArrlowast⟨119886119903119887
119886119896⟩997904rArr
lowast⟨119886119903119887119904
119886119903119887119905⟩997904rArr
lowast⟨119886119903119887119904119886119894
119886119903119887119904119886119895⟩ (42)
and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations
Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos
119878 997904rArrlowast⟨119886119903119887119897119886119903
119886119903⟩ (43)
In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules
Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following
Corollary 15 The following holds
119871119868119873 minus119882119870119877119864119866 = 0 (44)
43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem
Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions
5 Closure Properties
In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how
8 Applied Computational Intelligence and Soft Computing
RE
WKCF
WKLIN
WKREG
REG
CS
CF
LIN
Figure 4 The hierarchy of WK and Chomsky language families
WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars
51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG
and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be Watson-
Crick regular grammars generating languages 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cap1198732= 0 and 119878
1and
1198782do not appear on the right-hand side of any production
rule
Lemma 17 (union) 119882119870119877119864119866 is closed under union
Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878
with 119878 notin 1198731cup 1198732 and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (45)
Then it is not difficult to see that 119871(119866) = 1198711cup 1198712
Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and
119878 = 1198781 We define
119875 = 1198752cup (1198751minus 119860 997888rarr ⟨
119906
V⟩ isin 119875
1)
cup 119860 997888rarr ⟨119906
V⟩1198782| 119860 997888rarr ⟨
119906
V⟩ isin 119875
1
(46)
Then it is obvious that 119871(119866) = 1198711sdot 1198712
Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation
Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)
with
1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)
where
119875119879= 119860 997888rarr ⟨
119906
V⟩ isin 119875 ⟨
119906
V⟩ isin ⟨
119879lowast
119879lowast⟩
119875119878= 119860 997888rarr ⟨
119906
V⟩1198781| 119860 997888rarr ⟨
119906
V⟩ isin 119875
119879
(48)
Then 119871(119866) = 119871lowast1
Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism
Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ
lowast be a finitesubstitution (homomorphism)Define119866 = (119873
1 1198791 1198781 119875 1205881)
where 119871(119866) = 119904(1198711) Without loss of generality we assume
that 1198661is in the 1-normal form Then 119875
1can contain
production rules of the forms
119860 997888rarr ⟨119886
119887⟩119861
119860 997888rarr ⟨119886
119887⟩
119860 997888rarr ⟨119886
120582⟩119861
119860 997888rarr ⟨119886
120582⟩
119860 997888rarr ⟨120582
119887⟩119861
119860 997888rarr ⟨120582
119887⟩
119860 997888rarr ⟨120582
120582⟩
(49)
We define 119875 as the set of production rules of the forms
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩119861
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩
119860 997888rarr ⟨119904 (119886)
120582⟩119861
119860 997888rarr ⟨119904 (119886)
120582⟩
119860 997888rarr ⟨120582
119904 (119887)⟩119861
Applied Computational Intelligence and Soft Computing 9
119860 997888rarr ⟨120582
119904 (119887)⟩
119860 997888rarr ⟨120582
120582⟩
(50)
Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)
Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage
Proof We show that 1198711198771isin WKREG Define 119866 = (119873
1cup
119878 1198791 119878 119875 120588
1) generating the language 119871119877
1 where 119878 is a new
nonterminal and 119875 consists of production rules defined asfollows
(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751
(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751
The results obtained from the lemmas above are summa-rized in the following theorem
Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image
This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies
52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN
Let 1198711 1198712isinWKLIN and
1198661= (1198731 119879 1198781 1198751 120588)
1198662= (1198732 119879 1198782 1198752 120588)
(51)
be Watson-Crick linear grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 23 (union) 119882119870119871119868119873 is closed under union
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878
with 119878 notin 1198731cup 1198732and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (52)
Then 119871(119866) = 1198711cup 1198712
Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKLIN Let
ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality
we assume that 1198661is in the 1-normal form For each rule
119903 119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩ isin 1198751 (53)
where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩119861⟨
ℎ (1199062)
ℎ (V2)⟩ (54)
in 119875In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism
It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation
53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin
WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be
Watson-Crick context-free grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 26 (union) 119882119870119862119865 is closed under union operation
Proof Define the WK context-free grammar 119866 =
(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873
1cap 1198732
and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (55)
Then it is obvious that 119871(119866) = 1198711cup 1198712
Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation
Proof Define 119866 = (1198731cup 1198732cup 119878 119879
1cup 1198792 119878 119875 120588) where 119878 notin
(1198731cup 1198732) and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
11198782 (56)
Then 119871(119866) isinWKCF
Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation
Proof Define 119866 = (1198731cup 119878 119879
1 119878 119875 120588) isinWKCF setting
119875 = 1198751cup 119878 997888rarr 119878
1119878 | 120582 (57)
Then it is not difficult to see that 119871(119866) isinWKCF
10 Applied Computational Intelligence and Soft Computing
Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKCF Let 119904
119879lowastrarr Σlowast be a homomorphism 119875
1contains production rules
of the form
119903 119860
997888rarr ⟨1199061
V1
⟩1198611⟨1199062
V2
⟩1198612sdot sdot sdot ⟨
119906119896
V119896
⟩119861119896⟨119906119896+1
V119896+1
⟩
(58)
where 119896 ge 0For each rule 119903 isin 119875 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩1198611⟨ℎ (1199062)
ℎ (V2)⟩1198612
sdot sdot sdot⟨ℎ (119906119896)
ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)
ℎ (V119896+1)⟩
(59)
where 119896 ge 0In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
With the lemmas provided above in this subsection thenext theorem follows
Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism
Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven
6 Applications of Watson-Crick Grammars
In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures
61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems
The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that
the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]
In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]
Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888
119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)
In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as
(119886 119905 119888 119892lowast
119888119905119892 119886 119905 119888 119892lowast
)lowast
(61)
We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language
Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions
119878 997888rarr ⟨119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119888
119892⟩119860
119860 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119905
119886⟩119861
119861 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩119862
119862
997888rarr ⟨119886
119905⟩119862 | ⟨
119905
119886⟩119862 | ⟨
119892
119888⟩119862 | ⟨
119888
119892⟩119860 | ⟨
120582
120582⟩
(62)
For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows
119878 997904rArrlowast⟨119909
119910⟩119878 997904rArr ⟨
119909119888
119910119892⟩119860 997904rArr ⟨
119909119888119905
119910119892119886⟩119861
997904rArr ⟨119909119888119905119892
119910119892119886119888⟩119862
997904rArrlowast⟨119909119888119905119892119909
119910119892119886119888119910⟩119860 997904rArr ⟨
119909119888119905119892119909119888
119910119892119886119888119910119892⟩119860
997904rArr ⟨119909119888119905119892119909119888119905
119910119892119886119888119910119892119886⟩119861
997904rArr ⟨119909119888119905119892119909119888119905119892
119910119892119886119888119910119892119886119888⟩119862997904rArr
lowast⟨119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910⟩119862
997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]
(63)
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
8 Applied Computational Intelligence and Soft Computing
RE
WKCF
WKLIN
WKREG
REG
CS
CF
LIN
Figure 4 The hierarchy of WK and Chomsky language families
WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars
51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG
and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be Watson-
Crick regular grammars generating languages 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cap1198732= 0 and 119878
1and
1198782do not appear on the right-hand side of any production
rule
Lemma 17 (union) 119882119870119877119864119866 is closed under union
Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878
with 119878 notin 1198731cup 1198732 and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (45)
Then it is not difficult to see that 119871(119866) = 1198711cup 1198712
Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and
119878 = 1198781 We define
119875 = 1198752cup (1198751minus 119860 997888rarr ⟨
119906
V⟩ isin 119875
1)
cup 119860 997888rarr ⟨119906
V⟩1198782| 119860 997888rarr ⟨
119906
V⟩ isin 119875
1
(46)
Then it is obvious that 119871(119866) = 1198711sdot 1198712
Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation
Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)
with
1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)
where
119875119879= 119860 997888rarr ⟨
119906
V⟩ isin 119875 ⟨
119906
V⟩ isin ⟨
119879lowast
119879lowast⟩
119875119878= 119860 997888rarr ⟨
119906
V⟩1198781| 119860 997888rarr ⟨
119906
V⟩ isin 119875
119879
(48)
Then 119871(119866) = 119871lowast1
Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism
Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ
lowast be a finitesubstitution (homomorphism)Define119866 = (119873
1 1198791 1198781 119875 1205881)
where 119871(119866) = 119904(1198711) Without loss of generality we assume
that 1198661is in the 1-normal form Then 119875
1can contain
production rules of the forms
119860 997888rarr ⟨119886
119887⟩119861
119860 997888rarr ⟨119886
119887⟩
119860 997888rarr ⟨119886
120582⟩119861
119860 997888rarr ⟨119886
120582⟩
119860 997888rarr ⟨120582
119887⟩119861
119860 997888rarr ⟨120582
119887⟩
119860 997888rarr ⟨120582
120582⟩
(49)
We define 119875 as the set of production rules of the forms
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩119861
119860 997888rarr ⟨119904 (119886)
119904 (119887)⟩
119860 997888rarr ⟨119904 (119886)
120582⟩119861
119860 997888rarr ⟨119904 (119886)
120582⟩
119860 997888rarr ⟨120582
119904 (119887)⟩119861
Applied Computational Intelligence and Soft Computing 9
119860 997888rarr ⟨120582
119904 (119887)⟩
119860 997888rarr ⟨120582
120582⟩
(50)
Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)
Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage
Proof We show that 1198711198771isin WKREG Define 119866 = (119873
1cup
119878 1198791 119878 119875 120588
1) generating the language 119871119877
1 where 119878 is a new
nonterminal and 119875 consists of production rules defined asfollows
(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751
(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751
The results obtained from the lemmas above are summa-rized in the following theorem
Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image
This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies
52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN
Let 1198711 1198712isinWKLIN and
1198661= (1198731 119879 1198781 1198751 120588)
1198662= (1198732 119879 1198782 1198752 120588)
(51)
be Watson-Crick linear grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 23 (union) 119882119870119871119868119873 is closed under union
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878
with 119878 notin 1198731cup 1198732and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (52)
Then 119871(119866) = 1198711cup 1198712
Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKLIN Let
ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality
we assume that 1198661is in the 1-normal form For each rule
119903 119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩ isin 1198751 (53)
where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩119861⟨
ℎ (1199062)
ℎ (V2)⟩ (54)
in 119875In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism
It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation
53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin
WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be
Watson-Crick context-free grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 26 (union) 119882119870119862119865 is closed under union operation
Proof Define the WK context-free grammar 119866 =
(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873
1cap 1198732
and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (55)
Then it is obvious that 119871(119866) = 1198711cup 1198712
Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation
Proof Define 119866 = (1198731cup 1198732cup 119878 119879
1cup 1198792 119878 119875 120588) where 119878 notin
(1198731cup 1198732) and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
11198782 (56)
Then 119871(119866) isinWKCF
Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation
Proof Define 119866 = (1198731cup 119878 119879
1 119878 119875 120588) isinWKCF setting
119875 = 1198751cup 119878 997888rarr 119878
1119878 | 120582 (57)
Then it is not difficult to see that 119871(119866) isinWKCF
10 Applied Computational Intelligence and Soft Computing
Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKCF Let 119904
119879lowastrarr Σlowast be a homomorphism 119875
1contains production rules
of the form
119903 119860
997888rarr ⟨1199061
V1
⟩1198611⟨1199062
V2
⟩1198612sdot sdot sdot ⟨
119906119896
V119896
⟩119861119896⟨119906119896+1
V119896+1
⟩
(58)
where 119896 ge 0For each rule 119903 isin 119875 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩1198611⟨ℎ (1199062)
ℎ (V2)⟩1198612
sdot sdot sdot⟨ℎ (119906119896)
ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)
ℎ (V119896+1)⟩
(59)
where 119896 ge 0In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
With the lemmas provided above in this subsection thenext theorem follows
Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism
Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven
6 Applications of Watson-Crick Grammars
In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures
61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems
The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that
the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]
In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]
Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888
119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)
In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as
(119886 119905 119888 119892lowast
119888119905119892 119886 119905 119888 119892lowast
)lowast
(61)
We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language
Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions
119878 997888rarr ⟨119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119888
119892⟩119860
119860 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119905
119886⟩119861
119861 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩119862
119862
997888rarr ⟨119886
119905⟩119862 | ⟨
119905
119886⟩119862 | ⟨
119892
119888⟩119862 | ⟨
119888
119892⟩119860 | ⟨
120582
120582⟩
(62)
For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows
119878 997904rArrlowast⟨119909
119910⟩119878 997904rArr ⟨
119909119888
119910119892⟩119860 997904rArr ⟨
119909119888119905
119910119892119886⟩119861
997904rArr ⟨119909119888119905119892
119910119892119886119888⟩119862
997904rArrlowast⟨119909119888119905119892119909
119910119892119886119888119910⟩119860 997904rArr ⟨
119909119888119905119892119909119888
119910119892119886119888119910119892⟩119860
997904rArr ⟨119909119888119905119892119909119888119905
119910119892119886119888119910119892119886⟩119861
997904rArr ⟨119909119888119905119892119909119888119905119892
119910119892119886119888119910119892119886119888⟩119862997904rArr
lowast⟨119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910⟩119862
997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]
(63)
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing 9
119860 997888rarr ⟨120582
119904 (119887)⟩
119860 997888rarr ⟨120582
120582⟩
(50)
Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)
Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage
Proof We show that 1198711198771isin WKREG Define 119866 = (119873
1cup
119878 1198791 119878 119875 120588
1) generating the language 119871119877
1 where 119878 is a new
nonterminal and 119875 consists of production rules defined asfollows
(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751
(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751
The results obtained from the lemmas above are summa-rized in the following theorem
Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image
This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies
52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN
Let 1198711 1198712isinWKLIN and
1198661= (1198731 119879 1198781 1198751 120588)
1198662= (1198732 119879 1198782 1198752 120588)
(51)
be Watson-Crick linear grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 23 (union) 119882119870119871119868119873 is closed under union
Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878
with 119878 notin 1198731cup 1198732and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (52)
Then 119871(119866) = 1198711cup 1198712
Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKLIN Let
ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality
we assume that 1198661is in the 1-normal form For each rule
119903 119860 997888rarr ⟨1199061
V1
⟩119861⟨1199062
V2
⟩ isin 1198751 (53)
where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩119861⟨
ℎ (1199062)
ℎ (V2)⟩ (54)
in 119875In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism
It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation
53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin
WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866
2= (1198732 119879 1198782 1198752 120588) be
Watson-Crick context-free grammars generating 1198711and 119871
2
respectively that is 1198711= 119871(119866
1) and 119871
2= 119871(119866
2) Without
loss of generality we can assume that1198731cup 1198732= 0
Lemma 26 (union) 119882119870119862119865 is closed under union operation
Proof Define the WK context-free grammar 119866 =
(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873
1cap 1198732
and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
1 cup 119878 997888rarr 119878
2 (55)
Then it is obvious that 119871(119866) = 1198711cup 1198712
Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation
Proof Define 119866 = (1198731cup 1198732cup 119878 119879
1cup 1198792 119878 119875 120588) where 119878 notin
(1198731cup 1198732) and
119875 = 1198751cup 1198752cup 119878 997888rarr 119878
11198782 (56)
Then 119871(119866) isinWKCF
Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation
Proof Define 119866 = (1198731cup 119878 119879
1 119878 119875 120588) isinWKCF setting
119875 = 1198751cup 119878 997888rarr 119878
1119878 | 120582 (57)
Then it is not difficult to see that 119871(119866) isinWKCF
10 Applied Computational Intelligence and Soft Computing
Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKCF Let 119904
119879lowastrarr Σlowast be a homomorphism 119875
1contains production rules
of the form
119903 119860
997888rarr ⟨1199061
V1
⟩1198611⟨1199062
V2
⟩1198612sdot sdot sdot ⟨
119906119896
V119896
⟩119861119896⟨119906119896+1
V119896+1
⟩
(58)
where 119896 ge 0For each rule 119903 isin 119875 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩1198611⟨ℎ (1199062)
ℎ (V2)⟩1198612
sdot sdot sdot⟨ℎ (119906119896)
ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)
ℎ (V119896+1)⟩
(59)
where 119896 ge 0In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
With the lemmas provided above in this subsection thenext theorem follows
Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism
Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven
6 Applications of Watson-Crick Grammars
In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures
61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems
The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that
the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]
In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]
Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888
119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)
In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as
(119886 119905 119888 119892lowast
119888119905119892 119886 119905 119888 119892lowast
)lowast
(61)
We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language
Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions
119878 997888rarr ⟨119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119888
119892⟩119860
119860 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119905
119886⟩119861
119861 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩119862
119862
997888rarr ⟨119886
119905⟩119862 | ⟨
119905
119886⟩119862 | ⟨
119892
119888⟩119862 | ⟨
119888
119892⟩119860 | ⟨
120582
120582⟩
(62)
For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows
119878 997904rArrlowast⟨119909
119910⟩119878 997904rArr ⟨
119909119888
119910119892⟩119860 997904rArr ⟨
119909119888119905
119910119892119886⟩119861
997904rArr ⟨119909119888119905119892
119910119892119886119888⟩119862
997904rArrlowast⟨119909119888119905119892119909
119910119892119886119888119910⟩119860 997904rArr ⟨
119909119888119905119892119909119888
119910119892119886119888119910119892⟩119860
997904rArr ⟨119909119888119905119892119909119888119905
119910119892119886119888119910119892119886⟩119861
997904rArr ⟨119909119888119905119892119909119888119905119892
119910119892119886119888119910119892119886119888⟩119862997904rArr
lowast⟨119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910⟩119862
997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]
(63)
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
10 Applied Computational Intelligence and Soft Computing
Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism
Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871
1) We
show that the homomorphism of 1198711is also in WKCF Let 119904
119879lowastrarr Σlowast be a homomorphism 119875
1contains production rules
of the form
119903 119860
997888rarr ⟨1199061
V1
⟩1198611⟨1199062
V2
⟩1198612sdot sdot sdot ⟨
119906119896
V119896
⟩119861119896⟨119906119896+1
V119896+1
⟩
(58)
where 119896 ge 0For each rule 119903 isin 119875 we construct
ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)
ℎ (V1)⟩1198611⟨ℎ (1199062)
ℎ (V2)⟩1198612
sdot sdot sdot⟨ℎ (119906119896)
ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)
ℎ (V119896+1)⟩
(59)
where 119896 ge 0In every successful derivation of 119866
1generating [119908119908] isin
[119879119879]lowast we replace production rule 119903 isin 119875
1with the
production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]
lowast Thus ℎ(119908) isin ℎ(1198711)
With the lemmas provided above in this subsection thenext theorem follows
Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism
Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven
6 Applications of Watson-Crick Grammars
In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures
61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems
The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that
the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]
In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]
Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888
119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)
In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as
(119886 119905 119888 119892lowast
119888119905119892 119886 119905 119888 119892lowast
)lowast
(61)
We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language
Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions
119878 997888rarr ⟨119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119888
119892⟩119860
119860 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119892
119888⟩ 119878 | ⟨
119905
119886⟩119861
119861 997888rarr ⟨119888
119892⟩119860 | ⟨
119886
119905⟩ 119878 | ⟨
119905
119886⟩ 119878 | ⟨
119892
119888⟩119862
119862
997888rarr ⟨119886
119905⟩119862 | ⟨
119905
119886⟩119862 | ⟨
119892
119888⟩119862 | ⟨
119888
119892⟩119860 | ⟨
120582
120582⟩
(62)
For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows
119878 997904rArrlowast⟨119909
119910⟩119878 997904rArr ⟨
119909119888
119910119892⟩119860 997904rArr ⟨
119909119888119905
119910119892119886⟩119861
997904rArr ⟨119909119888119905119892
119910119892119886119888⟩119862
997904rArrlowast⟨119909119888119905119892119909
119910119892119886119888119910⟩119860 997904rArr ⟨
119909119888119905119892119909119888
119910119892119886119888119910119892⟩119860
997904rArr ⟨119909119888119905119892119909119888119905
119910119892119886119888119910119892119886⟩119861
997904rArr ⟨119909119888119905119892119909119888119905119892
119910119892119886119888119910119892119886119888⟩119862997904rArr
lowast⟨119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910⟩119862
997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909
119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]
(63)
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing 11
where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively
Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars
62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested
Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions
Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures
Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques
To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font
Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875
5) be a WK
regular grammar 1198755consists of the rules
119878 997888rarr ⟨(120582⟩119878
119878 997888rarr ⟨(120582⟩119860
119860 997888rarr ⟨)(⟩119860
119860 997888rarr ⟨)(⟩119861
119861 997888rarr ⟨()⟩119861
119861 997888rarr ⟨120582
)⟩119861
119861 997888rarr ⟨120582
120582⟩
119861 997888rarr 119860
(64)
From this we obtain the derivation
119878 997904rArr ⟨(120582⟩119878997904rArr
lowast⟨(119899
120582⟩119860
997904rArr ⟨(119899)(⟩119860997904rArr
lowast⟨
(119899)119899
(119899⟩119861
997904rArr ⟨(119899)119899
(119899)⟩119861997904rArr
lowast⟨
(119899)119899
(119899)119899⟩119861
997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr
lowast⟨
(119899)119899(119899
(119899)119899⟩119860
997904rArr ⟨(119899)119899(119899)(119899)119899
⟩119860997904rArrlowast⟨
(119899)119899(119899)119899
(119899)119899⟩119861
997904rArrlowastsdot sdot sdot 997904rArr
lowast[((119899)119899)119896
((119899)119899)119896]
(65)
where 119896 ge 1 Hence the language obtained is
1198715= ((119899)119899)
119896
119899 119896 ge 1 isinWKREG minus CF (66)
7 Conclusion
In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that
(1) WK linear grammars can generate some context-sensitive languages
(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars
(3) the families of WK regular languages and linearlanguages are not comparable
(4) the family of WK linear languages is not comparablewith the family of context-free languages
(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages
The following problems related to the topic remain open
(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable
(2) What is the generative capacity of Watson-Crickcontext-free grammars
(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity
Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
12 Applied Computational Intelligence and Soft Computing
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program
References
[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011
[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999
[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006
[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012
[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010
[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013
[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011
[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012
[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006
[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998
[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997
[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006
[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993
[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007
[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998
[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004
[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014