+ All Categories
Home > Documents > Research Article Generative Power and Closure Properties...

Research Article Generative Power and Closure Properties...

Date post: 08-Jul-2020
Category:
Upload: others
View: 7 times
Download: 0 times
Share this document with a friend
13
Research Article Generative Power and Closure Properties of Watson-Crick Grammars Nurul Liyana Mohamad Zulkufli, Sherzod Turaev, Mohd Izzuddin Mohd Tamrin, and Azeddine Messikh Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, 53100 Kuala Lumpur, Malaysia Correspondence should be addressed to Nurul Liyana Mohamad Zulkufli; nurulliyanazulkufl[email protected] Received 27 November 2015; Revised 19 February 2016; Accepted 2 June 2016 Academic Editor: Ryotaro Kamimura Copyright © 2016 Nurul Liyana Mohamad Zulkufli et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. We define WK linear grammars, as an extension of WK regular grammars with linear grammar rules, and WK context-free grammars, thus investigating their computational power and closure properties. We show that WK linear grammars can generate some context- sensitive languages. Moreover, we demonstrate that the family of WK regular languages is the proper subset of the family of WK linear languages, but it is not comparable with the family of linear languages. We also establish that the Watson-Crick regular grammars are closed under almost all of the main closure operations. 1. Introduction DNA computing appears as a challenge to design new types of computing devices, which differ from classical counterparts in fundamental way, to solve wide spectrum of computa- tionally intractable problems. DNA (deoxyribonucleic acid) is double-stranded chain of nucleotides, which differ by their chemical bases that are adenine (A), guanine (G), cytosine (C), and thymine (T), and they are paired as A-T, C-G according to the Watson-Crick complementary as it is illustrated in Figure 1 [1]. e massive parallelism, another fundamental feature of DNA molecules, allows performing millions of cut and paste operations simultaneously on DNA strands until a complete set of new DNA strands performing are generated. ese two features give high hope for the use of DNA molecules and DNA based biooperations to develop powerful computing paradigms and devices. Since a DNA strand can be interpreted as a double strand sequence of symbols, the DNA replication and synthesize processes can be modeled using methods and techniques of formal language theory. Watson-Crick (WK) automata [2], one of the recent computational models abstracting the properties of DNA molecules, are finite automata with two reading heads, working on complete double-stranded sequences where characters on corresponding positions from the two strands of the input are related by a complementarity relation similar to the Watson-Crick complementarity of DNA nucleotides. e two strands of the input are separately read from leſt to right by heads controlled by a common state. Several variants have been introduced and studied in recent papers [3–7]. WK regular grammars [8], a grammar counterpart of WK automata, generate double-stranded strings related by a complementarity relation as in a WK automaton but use rules as in a regular grammar. e approach of using formal grammars in the study of biological and computational properties of DNA molecules by formal grammars is a new direction in the field of DNA computing: we can introduce powerful variants of WK grammars, such as WK linear, WK context-free, and WK regulated grammars, and use them in the investigation of the properties of DNA structures and also in DNA applications in food authentication, gene disease detection, and so forth. In this paper, we introduce WK linear grammars and study the generative capacity in the relationship of Chomsky grammars. Further, as a motivation, we show synthesis processes, for instance, in DNA replication (Figure 2) can be simulated by derivations in WK grammars. e replication of DNA begins at the origin(s). e double strand then separated by proteins, producing bubble-like shape(s). e synthesis of new strands Hindawi Publishing Corporation Applied Computational Intelligence and So Computing Volume 2016, Article ID 9481971, 12 pages http://dx.doi.org/10.1155/2016/9481971
Transcript
Page 1: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

Research ArticleGenerative Power and Closure Properties ofWatson-Crick Grammars

Nurul Liyana Mohamad Zulkufli Sherzod Turaev Mohd Izzuddin Mohd Tamrinand Azeddine MessikhKulliyyah of Information and Communication Technology International Islamic University Malaysia 53100 Kuala Lumpur Malaysia

Correspondence should be addressed to Nurul Liyana Mohamad Zulkufli nurulliyanazulkufligmailcom

Received 27 November 2015 Revised 19 February 2016 Accepted 2 June 2016

Academic Editor Ryotaro Kamimura

Copyright copy 2016 Nurul Liyana Mohamad Zulkufli et al This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

WedefineWKlinear grammars as an extension ofWKregular grammarswith linear grammar rules andWKcontext-free grammarsthus investigating their computational power and closure propertiesWe show thatWK linear grammars can generate some context-sensitive languages Moreover we demonstrate that the family of WK regular languages is the proper subset of the family of WKlinear languages but it is not comparable with the family of linear languages We also establish that the Watson-Crick regulargrammars are closed under almost all of the main closure operations

1 Introduction

DNAcomputing appears as a challenge to design new types ofcomputing devices which differ from classical counterpartsin fundamental way to solve wide spectrum of computa-tionally intractable problems DNA (deoxyribonucleic acid)is double-stranded chain of nucleotides which differ by theirchemical bases that are adenine (A) guanine (G) cytosine (C)and thymine (T) and they are paired as A-T C-G according totheWatson-Crick complementary as it is illustrated in Figure 1[1] The massive parallelism another fundamental feature ofDNAmolecules allows performing millions of cut and pasteoperations simultaneously on DNA strands until a completeset of new DNA strands performing are generatedThese twofeatures give high hope for the use of DNA molecules andDNA based biooperations to develop powerful computingparadigms and devices

Since a DNA strand can be interpreted as a double strandsequence of symbols the DNA replication and synthesizeprocesses can be modeled using methods and techniquesof formal language theory Watson-Crick (WK) automata[2] one of the recent computational models abstractingthe properties of DNA molecules are finite automata withtwo reading heads working on complete double-strandedsequences where characters on corresponding positions from

the two strands of the input are related by a complementarityrelation similar to the Watson-Crick complementarity ofDNA nucleotides The two strands of the input are separatelyread from left to right by heads controlled by a common stateSeveral variants have been introduced and studied in recentpapers [3ndash7]

WK regular grammars [8] a grammar counterpart ofWK automata generate double-stranded strings related bya complementarity relation as in a WK automaton but userules as in a regular grammar The approach of using formalgrammars in the study of biological and computationalproperties of DNA molecules by formal grammars is a newdirection in the field of DNA computing we can introducepowerful variants of WK grammars such as WK linear WKcontext-free and WK regulated grammars and use them inthe investigation of the properties of DNA structures andalso inDNA applications in food authentication gene diseasedetection and so forth In this paper we introduce WKlinear grammars and study the generative capacity in therelationship of Chomsky grammars

Further as a motivation we show synthesis processes forinstance in DNA replication (Figure 2) can be simulated byderivations inWK grammarsThe replication of DNA beginsat the origin(s)The double strand then separated by proteinsproducing bubble-like shape(s)The synthesis of new strands

Hindawi Publishing CorporationApplied Computational Intelligence and So ComputingVolume 2016 Article ID 9481971 12 pageshttpdxdoiorg10115520169481971

2 Applied Computational Intelligence and Soft Computing

N

N N-H

N

O

N-HH

H-N

O

N

H

N

N-H

N

N

N

H

N

O

O

H-N

N

Adenine (A)

Guanine (G)

Thymine (T)

Cytosine (C)

CH3

Figure 1 The structure of DNA strand

Lagging strand

DNA pol I DNA ligaseDNA pol III

DNA pol III

Primer

Primase

Helicase

Parental DNA

OverviewOrigin of replication

Lagging strand Leading strandLeading strand

Parental DNA

5998400

3998400

5998400

3998400

5998400

39984005

998400

3998400

5998400

3998400

Figure 2 Synthesis process in bacterial DNA replication

using the parental strands as templates starts from the originsand proceeds in the 51015840 to 31015840 direction of both strands [1]

This synthesis process in general can be seen as a stringgeneration The enzymes responsible for the synthesizingDNA polymerases cannot initiate the process by themselvesbut can only add nucleotides to an existing RNA chain Thischain is called primer which is produced by the enzymeprimase From the grammar perspective the primase canbe interpreted as the start symbol 119878 After the primer hasbeen connected to the parental strand one of the synthesizingenzymes called DNA polymerase III continues to addnucleotides one by one to the primer and are complementedwith the parental strandThe synthesis finishes with replacingRNA primer with DNA nucleotides using the enzyme DNApolymerase I and joining with DNA ligase Again from thegrammar perspective DNA polymerase I and polymerase IIIact as production rules in the grammar specially DNA ligaseresembles to a terminal production (see Figure 3)

The paper is organized as follows In Section 2 we givesome notions and definitions from the theories of formallanguages and DNA computing needed in the sequel InSection 3 we define WK grammars and languages generated

AwS

Figure 3 The simulation of a synthesis process with a derivation

by these grammars Section 4 is devoted to the study ofthe generative capacity of WK regular and linear grammarsIn Section 5 we investigate the closure properties of WKgrammars Furthermore we show the application of WKgrammars in the analyses of DNA structures and program-ming language structures in Section 6 As the conclusion wediscuss open problems and interesting future research topicsrelated to WK grammars in Section 7

2 Preliminaries

We assume readers are familiar with formal languages theoryand automata Readers are referred to [3 9ndash11]

Applied Computational Intelligence and Soft Computing 3

Throughout this paper we use the following notationsLet isin be the belonging relationship of an element to thecorresponding set and notin indicates its negation The symbolsube indicates the inclusion while sub notes the proper inclusionThe notations 0 |119860| and 2119860 denote the empty set thecardinality of a set 119860 and the power set of 119860 respectivelyWhen Σ is an alphabet (a finite set of symbols) the set ofall finite strings is denoted by Σlowast while Σ+ shows similarmeaningwithout including empty strings (we use 120582 for emptystring) The length of a string 119886 isin Σlowast is shown by |119886| Alanguage is a subset 119871 sube Σlowast

Next we recall some terms regarding the closure proper-ties of languages The union of two languages 119871

1cup 1198712 is the

set of strings including the elements contained in both sets of1198711and 119871

2 The concatenation of two languages is yielded by

lining two strings from both languages which is shown by

11987111198712= 119886119887 | 119886 isin 119871

1 119887 isin 119871

2 (1)

The Kleene-star closure is the closure under the Kleene lowastoperation the set of all possible strings in 119871 including theempty string The mirror image of a word 119886 = 119886

11198862sdot sdot sdot 119886119899is

119886119877= 119886119899sdot sdot sdot 11988621198861 For language 119871 its mirror image is

119871119877= 119886119877| 119886 isin 119871 (2)

A substitution is a mapping 119904 Σlowast rarr 119879lowast where 119904(120582) = 120582

and 119904(1198861 1198862) = 119904(119886

1)119904(1198862) for 119886

1 1198862isin Σlowast The substitution

for a language 119871 sube Σlowast that is 119904(119871) is the union of 119904(119910) where119910 isin 119871 A substitution 119904 is called finite if its length |119904(119886)| is finitefor each 119886 isin Σ Amorphism is a substitution where its lengthis 1

A Chomsky grammar is defined by 119866 = (119873 119879 119878 119875)where119873 is the set ofnonterminal symbols and119879 is the set of terminalsymbols and 119873 cap 119879 = 0 119878 isin 119873 is the start symbol while119875 sube (119873 cup 119879)

lowast119873(119873 cup 119879)

lowasttimes (119873 cup 119879)

lowast is the set of productionrules We write 120572 rarr 120573 indicating the rewriting process of thestrings based on the production rules (120572 120573) isin 119875 The term 119909directly derives 119910 is written as 119909 rArr 119910 when

119909 = 11988611205721198862

119910 = 11988611205731198862

(3)

for some production rules 120572 rarr 120573 isin 119875 A grammar generatesa language defined by

119871 (119866) = 119908 isin 119879lowast 119878 997904rArr

lowast119908 (4)

According to the forms of production rules grammars areclassified as follows A grammar 119866 = (119873 119879 119878 119875) is called

(i) context-sensitive if each production has the form11990611198601199062rarr 11990611199061199062 where 119860 isin 119873 119906

1 1199062 isin (119873 cup 119879)

lowastand 119906 isin (119873 cup 119879)+

(ii) context-free if each production has the form 119860 rarr 119906where 119860 isin 119873 and 119906 isin (119873 cup 119879)lowast

(iii) linear if each production has the form 119860 rarr 11990611198611199062or

119860 rarr 119906 where 119860 119861 isin 119873 and 1199061 1199062 119906 isin 119879

lowast

(iv) right-linear if each production has the form 119860 rarr 119906119861or 119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast

(v) left-linear if each production has the form119860 rarr 119861119906 or119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast

A right-linear and left-linear grammars are called regularThe families of languages generated by these grammars

are REG LIN CF and CS respectively The families ofrecursive enumerable languages are denoted by RE while thefamilies of finite languages are denoted by FINThus the nextrelation holds [11]

Theorem 1 (Chomsky hierarchy) Consider

119865119868119873 sub 119877119864119866 sub 119871119868119873 sub 119862119865 sub 119862119878 sub 119877119864 (5)

We recall the definition of a finite automaton A finiteautomaton (FA) is a quintuple119872 = (119876119881 119902

0 119865 120575) where 119876

is the set of states 1199020isin 119876 is the initial state and 119865 sube 119876 is set

of final states Meanwhile 119881 is an alphabet and 120575 119876 times 119881 rarr2119876 is called the transition function The set (language) of allstrings accepted by 119872 is denoted by 119871(119872) We denote thefamily of languages accepted by finite automata by FA ThenFA = REG (see [11])

Next we cite some basic definitions and results ofWatson-Crick automata

The key feature of WK automata is the symmetric relationon an alphabet 119881 that is 120588 sube 119881 times 119881 In this paper forsimplicity we use the form ⟨119906V⟩ to mention the elements(119906 V) in the set of all pairs of strings 119881times119881 (which we chooseto write as [119881119881]) and instead of119881lowasttimes119881lowast we write ⟨119881lowast119881lowast⟩

Watson-Crick domain is the set of well-formed double-stranded strings (molecules)

WK120588(119881) = [

119881

119881]

lowast

120588

= [119886

119887] 119886 119887 isin 119881 (119886 119887) isin 120588 (6)

WK+120588(119881) holds the similar meaning without including 120582 We

write

[1198861

1198871

] [1198862

1198872

] sdot sdot sdot [119886119899

119887119899

] isinWK120588

(7)

as [119906V] where the upper strand is 119906 = 11988611198862sdot sdot sdot 119886119899and the

lower strand is V = 11988711198872sdot sdot sdot 119887119899 Note that when the elements in

the upper strand are complemented and have the same lengthwith the lower strand

⟨119906

V⟩ = [

119906

V] (8)

A Watson-Crick finite automaton (WKFA) is 6-tuple

119872 = (119876119881 1199020 119865 120575 120588) (9)

4 Applied Computational Intelligence and Soft Computing

where 119876 119881 1199020 and 119865 are the same as a FA Meanwhile the

transition function 120575 is

120575 119876 times ⟨119881lowast

119881lowast⟩ 997888rarr 2

119876 (10)

where 120575(119902 ⟨119906V⟩) is not an empty set only for finitely manytriples (119902 119906 V) isin 119876times119881lowast times119881lowast Similar to FA we can write therelation in transition function 119902

2isin 120575(1199021 ⟨119906V⟩) as a rewriting

rule in grammars that is

1199021⟨119906

V⟩ 997888rarr ⟨

119906

V⟩1199022 (11)

We describe the reflexive and transitive closure ofrarr asrarrlowastThe language accepted by a WKFA119872 is

119871 (119872) = 119906 [119906

V] isinWK

120588(119881) 119902

0[119906

V]

997888rarrlowast[119906

V] 119902 where 119902 isin 119865

(12)

The family of languages accepted is indicated by WKFA It isshown in [10 12] that

REG subWKFA sub CS (13)

3 Definitions

In this section we slightly modified the definition of Watson-Crick regular grammars introduced in [8] in order to extendthe concept to linear grammars and context-free grammars

Definition 2 A Watson-Crick (WK) grammar 119866 = (119873 119879 119878119875 120588) is called

(i) regular if each production has the form

119860 997888rarr ⟨119906

V⟩119861

or 119860 997888rarr ⟨119906V⟩

(14)

where 119860 119861 isin 119873 and ⟨119906V⟩ isin ⟨119879lowast119879lowast⟩(ii) linear if each production has the form

119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

or 119860 997888rarr ⟨119906V⟩

(15)

where 119860 119861 isin 119873 and ⟨1199061V1⟩ ⟨1199062V2⟩ ⟨119906V⟩ isin

⟨119879lowast119879lowast⟩

(iii) context-free if each production has the form

119860 997888rarr 120572 (16)

where 119860 isin 119873 and 120572 isin (119873 cup (⟨119879lowast119879lowast⟩))lowast

Definition 3 Let119866 = (119873 119879 120588 119878 119875) be aWK linear grammarWe say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives 119910 isin (119873 cup⟨119879lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 iff

119909 = ⟨1199061

V1

⟩119860⟨1199062

V2

119910 = ⟨1199061

V1

⟩⟨1199063

V3

⟩119861⟨1199064

V4

⟩⟨1199062

V2

or 119910 = ⟨1199061V1

⟩⟨119906

V⟩⟨1199062

V2

(17)

where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and

119860 997888rarr ⟨1199063

V3

⟩119861⟨1199064

V4

119860 997888rarr ⟨119906

V⟩ isin 119875

(18)

Definition 4 Let 119866 = (119873 119879 120588 119878 119875) be a WK context-freegrammar We say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives119910 isin (119873 cup ⟨119879

lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 if and only if

119909 = ⟨1199061

V1

⟩119860⟨1199062

V2

119910 = ⟨1199061

V1

⟩120572⟨1199062

V2

(19)

where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and 119860 rarr 120572 isin

119875

Remark 5 We use a common notion ldquoWatson-Crick gram-marsrdquo referring to any type of WK grammars

Definition 6 The language generated by a WK grammar is aquintuple 119866 which is defined as

119871 (119866) = 119906 [119906

V] isinWK

120588(119879) 119878 997904rArr

lowast[119906

V] (20)

4 Generative Power ofWatson-Crick Grammars

In this section we establish results regarding the computa-tional power of WK grammars

41 A Normal Form for Watson-Crick Linear GrammarsNext we define 1-normal form for WK linear grammarsand show that for every WK linear grammar 119866 there is anequivalent WK linear grammar 1198661015840 in the normal form thatis 119871(119866) = 119871(1198661015840)

Definition 7 A linearWK grammar119866 = (119873 119879 120588 119875 119878) is saidto be in the 1-normal form if each rule in 119875 of the form

119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

or 119860 997888rarr ⟨1199061V1

(21)

where |119906119894| le 1 |V

119894| le 1 119894 = 1 2 and 119860 119861 isin 119873

Applied Computational Intelligence and Soft Computing 5

Lemma 8 For every WK linear grammar 119866 there exists anequivalent WK linear grammar 1198661015840 in the 1-normal form

Proof Let 119866 = (119873 119879 120588 119878 119875) be a WK linear grammar Let

119860 997888rarr ⟨

11988611sdot sdot sdot 11988611198981

11988711sdot sdot sdot 11988711198991

⟩119861⟨

11988621198982

sdot sdot sdot 11988621

11988721198992

sdot sdot sdot 11988721

⟩ (22)

be a production in 119875 where 1198981gt 1 119898

2gt 1 119899

1gt 1 or

1198992gt 1 Without loss of generality we assume that 119898

1ge 1198991

and1198982ge 1198992Then we define the following sequence of right-

linear and left-linear production rules

119860 997888rarr ⟨11988611

11988711

⟩119860119903

11

119860119903

11997888rarr ⟨

11988612

11988712

⟩119860119903

12

119860119903

11198991minus1997888rarr ⟨

11988611198991

11988711198991

⟩119860119903

11198991

119860119903

11198991

997888rarr ⟨

11988611198991+1

120582⟩119860119903

11198991+1

119860119903

11198981minus1997888rarr ⟨

11988611198981

120582⟩119860119903

11198981

119860119903

11198981

997888rarr 119860119903

21⟨11988621

11988721

119860119903

21997888rarr 119860

119903

22⟨11988622

11988722

119860119903

11198992minus1997888rarr 119860

119903

21198992

11988621198992

11988721198992

119860119903

21198992

997888rarr 119860119903

21198992+1⟨

11988621198992+1

120582⟩

119860119903

21198982minus1997888rarr ⟨

11988621198982

120582⟩

(23)

where 1198601199031119895 1 le 119895 le 119898

1 and 119860119903

2119895 1 le 119895 le 119898

2minus 1 are new

nonterminals

Let

119903 119860 997888rarr ⟨11988611198862sdot sdot sdot 119886119898

11988711198872sdot sdot sdot 119887119899

⟩ isin 119875 (24)

where 119898 gt 1 or 119899 gt 1 Without loss of generality we assumethat 119898 ge 119899 Then we define the following sequence of right-linear production rules

119860 997888rarr ⟨1198861

1198871

⟩119860119903

1

119860119903

1997888rarr ⟨

1198862

1198872

⟩119860119903

2

119860119903

119899minus1997888rarr ⟨

119886119899

119887119899

⟩119860119903

119899

119860119903

119899997888rarr ⟨

119886119899+1

120582⟩119860119903

119899+1

119860119903

119898minus1997888rarr ⟨

119886119898

120582⟩

(25)

where 119860119903119894 1 le 119894 le 119898 minus 1 are new nonterminals

We construct a WK linear grammar 1198661015840 = (119873 cup

1198731015840 119879 120588 119878 119875 cup 119875

1015840) where 1198751015840 consists of productions defined

above for each 119860 rarr ⟨1199061V1⟩119861⟨1199062V2⟩ isin 119875 with |119906

1| gt 1

|V1| gt 1 |119906

2| gt 1 or |V

2| gt 1 and 119860 rarr ⟨119906V⟩ isin 119875 with

|119906| gt 1 or |V| gt 1 Then it is not difficult to see that in everyderivation productions in the form of (22) and (24) in 119866 canbe replaced by the sequences of productions (23) and (25) in1198661015840 and vice versa Thus 119871(119866) = 119871(1198661015840)

42TheGenerative Power The following results immediatelyfollow from the definition of WK grammars

Lemma 9 The following inclusions hold

119882119870119877119864119866 sube 119882119870119871119868119873

119871119868119873 sube 119882119870119871119868119873

119862119865 sube 119882119870119862119865

(26)

Next we show that WK grammars can generate non-context-free languages

119886119899119888119899119887119899 119899 ge 1

119886119899119887119898119888119899119889119898 119899 119898 ge 1

119908119888119908 119908 isin 119886 119887lowast

(27)

6 Applied Computational Intelligence and Soft Computing

Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)

119875 119878) be a WK linear grammar where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119887

120582⟩

119878 997888rarr ⟨119886

120582⟩119860⟨

119887

120582⟩

119860 997888rarr ⟨119888

119886⟩119860

119860 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

120582⟩

(28)

In general we have the derivation

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119887119899minus1

120582⟩

997904rArr ⟨119886119899

120582⟩119860⟨

119887119899

120582⟩

997904rArrlowast⟨119886119899119888119899

119886119899⟩119860⟨

119887119899

120582⟩

997904rArr ⟨119886119899119888119899

119886119899119888⟩119861⟨

119887119899

119887⟩

997904rArrlowast⟨119886119899119888119899

119886119899119888119899⟩119861⟨

119887119899

119887119899⟩

997904rArr [119886119899119888119899119887119899

119886119899119888119899119887119899]

(29)

Thus 1198661generates the language 119871(119866

1) = 119886

119899119888119899119887119899 119899 ge 1 isin

CS minus CF

Example 11 Let

1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889

(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)

(30)

be a WK regular grammar and 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119886

120582⟩119860

119860 997888rarr ⟨119887

120582⟩119860 | ⟨

119887

120582⟩119861

119861 997888rarr ⟨119888

119886⟩119861 | ⟨

119888

119886⟩119862

119862 997888rarr ⟨119889

119887⟩119862 | ⟨

119889

119887⟩119863

119863 997888rarr ⟨120582

119888⟩119863 | ⟨

120582

119889⟩119863 | ⟨

120582

120582⟩

(31)

Then we have the following derivation for 119899119898 ge 1

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878 997904rArr ⟨

119886119899

120582⟩119860

997904rArrlowast⟨119886119899119887119898minus1

120582⟩119860 997904rArr ⟨

119886119899119887119898

120582⟩119861

997904rArrlowast⟨119886119899119887119898119888119899minus1

119886119899minus1

⟩119861 997904rArr ⟨119886119899119887119898119888119899

119886119899⟩119862

997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1

119886119899119887119898minus1

⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898

119886119899119887119898⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898⟩119863

997904rArr [119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898]

(32)

Hence 119871(1198662) = 119886

119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF

Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)

119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119887

120582⟩119878 | ⟨

119888

120582⟩119860

119860 997888rarr ⟨119886

119886⟩119860 | ⟨

119887

119887⟩119860 | ⟨

120582

119888⟩119861

119861 997888rarr ⟨120582

119886⟩119861 | ⟨

120582

119887⟩119861 | ⟨

120582

120582⟩

(33)

By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861

continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively

119878 997904rArrlowast⟨119908

120582⟩119878 997904rArr ⟨

119908119888

120582⟩119860

997904rArrlowast⟨119908119888119908

119908⟩119860 997904rArr ⟨

119908119888119908

119908119888⟩119861

997904rArrlowast⟨119908119888119908

119908119888119908⟩119861 997904rArr [

119908119888119908

119908119888119908]

(34)

Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887

lowast isin CS minus CF

The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12

Applied Computational Intelligence and Soft Computing 7

Theorem 13 The following inclusions hold

119871119868119873 ⊊ 119882119870119871119868119873

119882119870119877119864119866 minus 119862119865 = 0

119882119870119871119868119873 minus 119862119865 = 0

(35)

The following example shows that some WK linearlanguages cannot be generated by WK regular grammars

Lemma 14 The following language is not a WK regularlanguage

1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899

isin 119882119870119871119868119873 minus119882119870119877119864119866

(36)

Proof The language1198711can be generated by the followingWK

linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)

where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119886

119886⟩ | ⟨

119886

120582⟩119860⟨

119886

119886⟩

119860 997888rarr ⟨119887119887

119886⟩119860 | ⟨

119887119887119887

119886⟩119860 | ⟨

120582

119887⟩119861

119861 997888rarr ⟨120582

119887⟩119861 | ⟨

120582

120582⟩

(37)

It is not difficult to see that

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119886119899minus1

119886119899minus1⟩ 997904rArr ⟨

119886119899

120582⟩119860⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899⟩119860⟨

119886119899

119886119899⟩ 997904rArr ⟨

119886119899119887119898

119886119899119887⟩119861⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899119887119898⟩119861⟨

a119899

119886119899⟩ 997904rArr [

119886119899119887119898119886119899

119886119899119887119898119886119899]

(38)

where 2119899 le 119898 le 3119899Next we show that 119871

1notinWKREG

We suppose by contradiction that1198711can be generated by

a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and

V isin ⟨119886

120582⟩ ⟨

120582

119886⟩ ⟨

119886

119886⟩ ⟨

119887

120582⟩ ⟨

120582

119887⟩ ⟨

119887

119887⟩ ⟨

119886

119887⟩

⟨119887

119887⟩ (119873 cup 120582)

(39)

Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the

double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840

Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated

in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations

119878 997904rArrlowast⟨119886119903119887

119886119896⟩

or 119878 997904rArrlowast⟨119886119903119887

119886119903119887⟩

(40)

where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩ 119905 le 119904 (41)

Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩997904rArr

lowast⟨119886119903119887119904119886119894

119886119903119887119904119886119895⟩ (42)

and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations

Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos

119878 997904rArrlowast⟨119886119903119887119897119886119903

119886119903⟩ (43)

In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules

Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following

Corollary 15 The following holds

119871119868119873 minus119882119870119877119864119866 = 0 (44)

43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem

Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions

5 Closure Properties

In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how

8 Applied Computational Intelligence and Soft Computing

RE

WKCF

WKLIN

WKREG

REG

CS

CF

LIN

Figure 4 The hierarchy of WK and Chomsky language families

WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars

51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG

and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be Watson-

Crick regular grammars generating languages 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cap1198732= 0 and 119878

1and

1198782do not appear on the right-hand side of any production

rule

Lemma 17 (union) 119882119870119877119864119866 is closed under union

Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878

with 119878 notin 1198731cup 1198732 and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (45)

Then it is not difficult to see that 119871(119866) = 1198711cup 1198712

Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and

119878 = 1198781 We define

119875 = 1198752cup (1198751minus 119860 997888rarr ⟨

119906

V⟩ isin 119875

1)

cup 119860 997888rarr ⟨119906

V⟩1198782| 119860 997888rarr ⟨

119906

V⟩ isin 119875

1

(46)

Then it is obvious that 119871(119866) = 1198711sdot 1198712

Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation

Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)

with

1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)

where

119875119879= 119860 997888rarr ⟨

119906

V⟩ isin 119875 ⟨

119906

V⟩ isin ⟨

119879lowast

119879lowast⟩

119875119878= 119860 997888rarr ⟨

119906

V⟩1198781| 119860 997888rarr ⟨

119906

V⟩ isin 119875

119879

(48)

Then 119871(119866) = 119871lowast1

Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism

Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ

lowast be a finitesubstitution (homomorphism)Define119866 = (119873

1 1198791 1198781 119875 1205881)

where 119871(119866) = 119904(1198711) Without loss of generality we assume

that 1198661is in the 1-normal form Then 119875

1can contain

production rules of the forms

119860 997888rarr ⟨119886

119887⟩119861

119860 997888rarr ⟨119886

119887⟩

119860 997888rarr ⟨119886

120582⟩119861

119860 997888rarr ⟨119886

120582⟩

119860 997888rarr ⟨120582

119887⟩119861

119860 997888rarr ⟨120582

119887⟩

119860 997888rarr ⟨120582

120582⟩

(49)

We define 119875 as the set of production rules of the forms

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩119861

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩

119860 997888rarr ⟨119904 (119886)

120582⟩119861

119860 997888rarr ⟨119904 (119886)

120582⟩

119860 997888rarr ⟨120582

119904 (119887)⟩119861

Applied Computational Intelligence and Soft Computing 9

119860 997888rarr ⟨120582

119904 (119887)⟩

119860 997888rarr ⟨120582

120582⟩

(50)

Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)

Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage

Proof We show that 1198711198771isin WKREG Define 119866 = (119873

1cup

119878 1198791 119878 119875 120588

1) generating the language 119871119877

1 where 119878 is a new

nonterminal and 119875 consists of production rules defined asfollows

(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751

(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751

The results obtained from the lemmas above are summa-rized in the following theorem

Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image

This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies

52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN

Let 1198711 1198712isinWKLIN and

1198661= (1198731 119879 1198781 1198751 120588)

1198662= (1198732 119879 1198782 1198752 120588)

(51)

be Watson-Crick linear grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 23 (union) 119882119870119871119868119873 is closed under union

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878

with 119878 notin 1198731cup 1198732and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (52)

Then 119871(119866) = 1198711cup 1198712

Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKLIN Let

ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality

we assume that 1198661is in the 1-normal form For each rule

119903 119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

⟩ isin 1198751 (53)

where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩119861⟨

ℎ (1199062)

ℎ (V2)⟩ (54)

in 119875In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism

It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation

53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin

WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be

Watson-Crick context-free grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 26 (union) 119882119870119862119865 is closed under union operation

Proof Define the WK context-free grammar 119866 =

(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873

1cap 1198732

and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (55)

Then it is obvious that 119871(119866) = 1198711cup 1198712

Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation

Proof Define 119866 = (1198731cup 1198732cup 119878 119879

1cup 1198792 119878 119875 120588) where 119878 notin

(1198731cup 1198732) and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

11198782 (56)

Then 119871(119866) isinWKCF

Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation

Proof Define 119866 = (1198731cup 119878 119879

1 119878 119875 120588) isinWKCF setting

119875 = 1198751cup 119878 997888rarr 119878

1119878 | 120582 (57)

Then it is not difficult to see that 119871(119866) isinWKCF

10 Applied Computational Intelligence and Soft Computing

Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKCF Let 119904

119879lowastrarr Σlowast be a homomorphism 119875

1contains production rules

of the form

119903 119860

997888rarr ⟨1199061

V1

⟩1198611⟨1199062

V2

⟩1198612sdot sdot sdot ⟨

119906119896

V119896

⟩119861119896⟨119906119896+1

V119896+1

(58)

where 119896 ge 0For each rule 119903 isin 119875 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩1198611⟨ℎ (1199062)

ℎ (V2)⟩1198612

sdot sdot sdot⟨ℎ (119906119896)

ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)

ℎ (V119896+1)⟩

(59)

where 119896 ge 0In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

With the lemmas provided above in this subsection thenext theorem follows

Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism

Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven

6 Applications of Watson-Crick Grammars

In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures

61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems

The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that

the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]

In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]

Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888

119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)

In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as

(119886 119905 119888 119892lowast

119888119905119892 119886 119905 119888 119892lowast

)lowast

(61)

We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language

Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions

119878 997888rarr ⟨119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119888

119892⟩119860

119860 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119905

119886⟩119861

119861 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩119862

119862

997888rarr ⟨119886

119905⟩119862 | ⟨

119905

119886⟩119862 | ⟨

119892

119888⟩119862 | ⟨

119888

119892⟩119860 | ⟨

120582

120582⟩

(62)

For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows

119878 997904rArrlowast⟨119909

119910⟩119878 997904rArr ⟨

119909119888

119910119892⟩119860 997904rArr ⟨

119909119888119905

119910119892119886⟩119861

997904rArr ⟨119909119888119905119892

119910119892119886119888⟩119862

997904rArrlowast⟨119909119888119905119892119909

119910119892119886119888119910⟩119860 997904rArr ⟨

119909119888119905119892119909119888

119910119892119886119888119910119892⟩119860

997904rArr ⟨119909119888119905119892119909119888119905

119910119892119886119888119910119892119886⟩119861

997904rArr ⟨119909119888119905119892119909119888119905119892

119910119892119886119888119910119892119886119888⟩119862997904rArr

lowast⟨119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910⟩119862

997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]

(63)

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

2 Applied Computational Intelligence and Soft Computing

N

N N-H

N

O

N-HH

H-N

O

N

H

N

N-H

N

N

N

H

N

O

O

H-N

N

Adenine (A)

Guanine (G)

Thymine (T)

Cytosine (C)

CH3

Figure 1 The structure of DNA strand

Lagging strand

DNA pol I DNA ligaseDNA pol III

DNA pol III

Primer

Primase

Helicase

Parental DNA

OverviewOrigin of replication

Lagging strand Leading strandLeading strand

Parental DNA

5998400

3998400

5998400

3998400

5998400

39984005

998400

3998400

5998400

3998400

Figure 2 Synthesis process in bacterial DNA replication

using the parental strands as templates starts from the originsand proceeds in the 51015840 to 31015840 direction of both strands [1]

This synthesis process in general can be seen as a stringgeneration The enzymes responsible for the synthesizingDNA polymerases cannot initiate the process by themselvesbut can only add nucleotides to an existing RNA chain Thischain is called primer which is produced by the enzymeprimase From the grammar perspective the primase canbe interpreted as the start symbol 119878 After the primer hasbeen connected to the parental strand one of the synthesizingenzymes called DNA polymerase III continues to addnucleotides one by one to the primer and are complementedwith the parental strandThe synthesis finishes with replacingRNA primer with DNA nucleotides using the enzyme DNApolymerase I and joining with DNA ligase Again from thegrammar perspective DNA polymerase I and polymerase IIIact as production rules in the grammar specially DNA ligaseresembles to a terminal production (see Figure 3)

The paper is organized as follows In Section 2 we givesome notions and definitions from the theories of formallanguages and DNA computing needed in the sequel InSection 3 we define WK grammars and languages generated

AwS

Figure 3 The simulation of a synthesis process with a derivation

by these grammars Section 4 is devoted to the study ofthe generative capacity of WK regular and linear grammarsIn Section 5 we investigate the closure properties of WKgrammars Furthermore we show the application of WKgrammars in the analyses of DNA structures and program-ming language structures in Section 6 As the conclusion wediscuss open problems and interesting future research topicsrelated to WK grammars in Section 7

2 Preliminaries

We assume readers are familiar with formal languages theoryand automata Readers are referred to [3 9ndash11]

Applied Computational Intelligence and Soft Computing 3

Throughout this paper we use the following notationsLet isin be the belonging relationship of an element to thecorresponding set and notin indicates its negation The symbolsube indicates the inclusion while sub notes the proper inclusionThe notations 0 |119860| and 2119860 denote the empty set thecardinality of a set 119860 and the power set of 119860 respectivelyWhen Σ is an alphabet (a finite set of symbols) the set ofall finite strings is denoted by Σlowast while Σ+ shows similarmeaningwithout including empty strings (we use 120582 for emptystring) The length of a string 119886 isin Σlowast is shown by |119886| Alanguage is a subset 119871 sube Σlowast

Next we recall some terms regarding the closure proper-ties of languages The union of two languages 119871

1cup 1198712 is the

set of strings including the elements contained in both sets of1198711and 119871

2 The concatenation of two languages is yielded by

lining two strings from both languages which is shown by

11987111198712= 119886119887 | 119886 isin 119871

1 119887 isin 119871

2 (1)

The Kleene-star closure is the closure under the Kleene lowastoperation the set of all possible strings in 119871 including theempty string The mirror image of a word 119886 = 119886

11198862sdot sdot sdot 119886119899is

119886119877= 119886119899sdot sdot sdot 11988621198861 For language 119871 its mirror image is

119871119877= 119886119877| 119886 isin 119871 (2)

A substitution is a mapping 119904 Σlowast rarr 119879lowast where 119904(120582) = 120582

and 119904(1198861 1198862) = 119904(119886

1)119904(1198862) for 119886

1 1198862isin Σlowast The substitution

for a language 119871 sube Σlowast that is 119904(119871) is the union of 119904(119910) where119910 isin 119871 A substitution 119904 is called finite if its length |119904(119886)| is finitefor each 119886 isin Σ Amorphism is a substitution where its lengthis 1

A Chomsky grammar is defined by 119866 = (119873 119879 119878 119875)where119873 is the set ofnonterminal symbols and119879 is the set of terminalsymbols and 119873 cap 119879 = 0 119878 isin 119873 is the start symbol while119875 sube (119873 cup 119879)

lowast119873(119873 cup 119879)

lowasttimes (119873 cup 119879)

lowast is the set of productionrules We write 120572 rarr 120573 indicating the rewriting process of thestrings based on the production rules (120572 120573) isin 119875 The term 119909directly derives 119910 is written as 119909 rArr 119910 when

119909 = 11988611205721198862

119910 = 11988611205731198862

(3)

for some production rules 120572 rarr 120573 isin 119875 A grammar generatesa language defined by

119871 (119866) = 119908 isin 119879lowast 119878 997904rArr

lowast119908 (4)

According to the forms of production rules grammars areclassified as follows A grammar 119866 = (119873 119879 119878 119875) is called

(i) context-sensitive if each production has the form11990611198601199062rarr 11990611199061199062 where 119860 isin 119873 119906

1 1199062 isin (119873 cup 119879)

lowastand 119906 isin (119873 cup 119879)+

(ii) context-free if each production has the form 119860 rarr 119906where 119860 isin 119873 and 119906 isin (119873 cup 119879)lowast

(iii) linear if each production has the form 119860 rarr 11990611198611199062or

119860 rarr 119906 where 119860 119861 isin 119873 and 1199061 1199062 119906 isin 119879

lowast

(iv) right-linear if each production has the form 119860 rarr 119906119861or 119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast

(v) left-linear if each production has the form119860 rarr 119861119906 or119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast

A right-linear and left-linear grammars are called regularThe families of languages generated by these grammars

are REG LIN CF and CS respectively The families ofrecursive enumerable languages are denoted by RE while thefamilies of finite languages are denoted by FINThus the nextrelation holds [11]

Theorem 1 (Chomsky hierarchy) Consider

119865119868119873 sub 119877119864119866 sub 119871119868119873 sub 119862119865 sub 119862119878 sub 119877119864 (5)

We recall the definition of a finite automaton A finiteautomaton (FA) is a quintuple119872 = (119876119881 119902

0 119865 120575) where 119876

is the set of states 1199020isin 119876 is the initial state and 119865 sube 119876 is set

of final states Meanwhile 119881 is an alphabet and 120575 119876 times 119881 rarr2119876 is called the transition function The set (language) of allstrings accepted by 119872 is denoted by 119871(119872) We denote thefamily of languages accepted by finite automata by FA ThenFA = REG (see [11])

Next we cite some basic definitions and results ofWatson-Crick automata

The key feature of WK automata is the symmetric relationon an alphabet 119881 that is 120588 sube 119881 times 119881 In this paper forsimplicity we use the form ⟨119906V⟩ to mention the elements(119906 V) in the set of all pairs of strings 119881times119881 (which we chooseto write as [119881119881]) and instead of119881lowasttimes119881lowast we write ⟨119881lowast119881lowast⟩

Watson-Crick domain is the set of well-formed double-stranded strings (molecules)

WK120588(119881) = [

119881

119881]

lowast

120588

= [119886

119887] 119886 119887 isin 119881 (119886 119887) isin 120588 (6)

WK+120588(119881) holds the similar meaning without including 120582 We

write

[1198861

1198871

] [1198862

1198872

] sdot sdot sdot [119886119899

119887119899

] isinWK120588

(7)

as [119906V] where the upper strand is 119906 = 11988611198862sdot sdot sdot 119886119899and the

lower strand is V = 11988711198872sdot sdot sdot 119887119899 Note that when the elements in

the upper strand are complemented and have the same lengthwith the lower strand

⟨119906

V⟩ = [

119906

V] (8)

A Watson-Crick finite automaton (WKFA) is 6-tuple

119872 = (119876119881 1199020 119865 120575 120588) (9)

4 Applied Computational Intelligence and Soft Computing

where 119876 119881 1199020 and 119865 are the same as a FA Meanwhile the

transition function 120575 is

120575 119876 times ⟨119881lowast

119881lowast⟩ 997888rarr 2

119876 (10)

where 120575(119902 ⟨119906V⟩) is not an empty set only for finitely manytriples (119902 119906 V) isin 119876times119881lowast times119881lowast Similar to FA we can write therelation in transition function 119902

2isin 120575(1199021 ⟨119906V⟩) as a rewriting

rule in grammars that is

1199021⟨119906

V⟩ 997888rarr ⟨

119906

V⟩1199022 (11)

We describe the reflexive and transitive closure ofrarr asrarrlowastThe language accepted by a WKFA119872 is

119871 (119872) = 119906 [119906

V] isinWK

120588(119881) 119902

0[119906

V]

997888rarrlowast[119906

V] 119902 where 119902 isin 119865

(12)

The family of languages accepted is indicated by WKFA It isshown in [10 12] that

REG subWKFA sub CS (13)

3 Definitions

In this section we slightly modified the definition of Watson-Crick regular grammars introduced in [8] in order to extendthe concept to linear grammars and context-free grammars

Definition 2 A Watson-Crick (WK) grammar 119866 = (119873 119879 119878119875 120588) is called

(i) regular if each production has the form

119860 997888rarr ⟨119906

V⟩119861

or 119860 997888rarr ⟨119906V⟩

(14)

where 119860 119861 isin 119873 and ⟨119906V⟩ isin ⟨119879lowast119879lowast⟩(ii) linear if each production has the form

119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

or 119860 997888rarr ⟨119906V⟩

(15)

where 119860 119861 isin 119873 and ⟨1199061V1⟩ ⟨1199062V2⟩ ⟨119906V⟩ isin

⟨119879lowast119879lowast⟩

(iii) context-free if each production has the form

119860 997888rarr 120572 (16)

where 119860 isin 119873 and 120572 isin (119873 cup (⟨119879lowast119879lowast⟩))lowast

Definition 3 Let119866 = (119873 119879 120588 119878 119875) be aWK linear grammarWe say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives 119910 isin (119873 cup⟨119879lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 iff

119909 = ⟨1199061

V1

⟩119860⟨1199062

V2

119910 = ⟨1199061

V1

⟩⟨1199063

V3

⟩119861⟨1199064

V4

⟩⟨1199062

V2

or 119910 = ⟨1199061V1

⟩⟨119906

V⟩⟨1199062

V2

(17)

where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and

119860 997888rarr ⟨1199063

V3

⟩119861⟨1199064

V4

119860 997888rarr ⟨119906

V⟩ isin 119875

(18)

Definition 4 Let 119866 = (119873 119879 120588 119878 119875) be a WK context-freegrammar We say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives119910 isin (119873 cup ⟨119879

lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 if and only if

119909 = ⟨1199061

V1

⟩119860⟨1199062

V2

119910 = ⟨1199061

V1

⟩120572⟨1199062

V2

(19)

where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and 119860 rarr 120572 isin

119875

Remark 5 We use a common notion ldquoWatson-Crick gram-marsrdquo referring to any type of WK grammars

Definition 6 The language generated by a WK grammar is aquintuple 119866 which is defined as

119871 (119866) = 119906 [119906

V] isinWK

120588(119879) 119878 997904rArr

lowast[119906

V] (20)

4 Generative Power ofWatson-Crick Grammars

In this section we establish results regarding the computa-tional power of WK grammars

41 A Normal Form for Watson-Crick Linear GrammarsNext we define 1-normal form for WK linear grammarsand show that for every WK linear grammar 119866 there is anequivalent WK linear grammar 1198661015840 in the normal form thatis 119871(119866) = 119871(1198661015840)

Definition 7 A linearWK grammar119866 = (119873 119879 120588 119875 119878) is saidto be in the 1-normal form if each rule in 119875 of the form

119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

or 119860 997888rarr ⟨1199061V1

(21)

where |119906119894| le 1 |V

119894| le 1 119894 = 1 2 and 119860 119861 isin 119873

Applied Computational Intelligence and Soft Computing 5

Lemma 8 For every WK linear grammar 119866 there exists anequivalent WK linear grammar 1198661015840 in the 1-normal form

Proof Let 119866 = (119873 119879 120588 119878 119875) be a WK linear grammar Let

119860 997888rarr ⟨

11988611sdot sdot sdot 11988611198981

11988711sdot sdot sdot 11988711198991

⟩119861⟨

11988621198982

sdot sdot sdot 11988621

11988721198992

sdot sdot sdot 11988721

⟩ (22)

be a production in 119875 where 1198981gt 1 119898

2gt 1 119899

1gt 1 or

1198992gt 1 Without loss of generality we assume that 119898

1ge 1198991

and1198982ge 1198992Then we define the following sequence of right-

linear and left-linear production rules

119860 997888rarr ⟨11988611

11988711

⟩119860119903

11

119860119903

11997888rarr ⟨

11988612

11988712

⟩119860119903

12

119860119903

11198991minus1997888rarr ⟨

11988611198991

11988711198991

⟩119860119903

11198991

119860119903

11198991

997888rarr ⟨

11988611198991+1

120582⟩119860119903

11198991+1

119860119903

11198981minus1997888rarr ⟨

11988611198981

120582⟩119860119903

11198981

119860119903

11198981

997888rarr 119860119903

21⟨11988621

11988721

119860119903

21997888rarr 119860

119903

22⟨11988622

11988722

119860119903

11198992minus1997888rarr 119860

119903

21198992

11988621198992

11988721198992

119860119903

21198992

997888rarr 119860119903

21198992+1⟨

11988621198992+1

120582⟩

119860119903

21198982minus1997888rarr ⟨

11988621198982

120582⟩

(23)

where 1198601199031119895 1 le 119895 le 119898

1 and 119860119903

2119895 1 le 119895 le 119898

2minus 1 are new

nonterminals

Let

119903 119860 997888rarr ⟨11988611198862sdot sdot sdot 119886119898

11988711198872sdot sdot sdot 119887119899

⟩ isin 119875 (24)

where 119898 gt 1 or 119899 gt 1 Without loss of generality we assumethat 119898 ge 119899 Then we define the following sequence of right-linear production rules

119860 997888rarr ⟨1198861

1198871

⟩119860119903

1

119860119903

1997888rarr ⟨

1198862

1198872

⟩119860119903

2

119860119903

119899minus1997888rarr ⟨

119886119899

119887119899

⟩119860119903

119899

119860119903

119899997888rarr ⟨

119886119899+1

120582⟩119860119903

119899+1

119860119903

119898minus1997888rarr ⟨

119886119898

120582⟩

(25)

where 119860119903119894 1 le 119894 le 119898 minus 1 are new nonterminals

We construct a WK linear grammar 1198661015840 = (119873 cup

1198731015840 119879 120588 119878 119875 cup 119875

1015840) where 1198751015840 consists of productions defined

above for each 119860 rarr ⟨1199061V1⟩119861⟨1199062V2⟩ isin 119875 with |119906

1| gt 1

|V1| gt 1 |119906

2| gt 1 or |V

2| gt 1 and 119860 rarr ⟨119906V⟩ isin 119875 with

|119906| gt 1 or |V| gt 1 Then it is not difficult to see that in everyderivation productions in the form of (22) and (24) in 119866 canbe replaced by the sequences of productions (23) and (25) in1198661015840 and vice versa Thus 119871(119866) = 119871(1198661015840)

42TheGenerative Power The following results immediatelyfollow from the definition of WK grammars

Lemma 9 The following inclusions hold

119882119870119877119864119866 sube 119882119870119871119868119873

119871119868119873 sube 119882119870119871119868119873

119862119865 sube 119882119870119862119865

(26)

Next we show that WK grammars can generate non-context-free languages

119886119899119888119899119887119899 119899 ge 1

119886119899119887119898119888119899119889119898 119899 119898 ge 1

119908119888119908 119908 isin 119886 119887lowast

(27)

6 Applied Computational Intelligence and Soft Computing

Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)

119875 119878) be a WK linear grammar where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119887

120582⟩

119878 997888rarr ⟨119886

120582⟩119860⟨

119887

120582⟩

119860 997888rarr ⟨119888

119886⟩119860

119860 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

120582⟩

(28)

In general we have the derivation

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119887119899minus1

120582⟩

997904rArr ⟨119886119899

120582⟩119860⟨

119887119899

120582⟩

997904rArrlowast⟨119886119899119888119899

119886119899⟩119860⟨

119887119899

120582⟩

997904rArr ⟨119886119899119888119899

119886119899119888⟩119861⟨

119887119899

119887⟩

997904rArrlowast⟨119886119899119888119899

119886119899119888119899⟩119861⟨

119887119899

119887119899⟩

997904rArr [119886119899119888119899119887119899

119886119899119888119899119887119899]

(29)

Thus 1198661generates the language 119871(119866

1) = 119886

119899119888119899119887119899 119899 ge 1 isin

CS minus CF

Example 11 Let

1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889

(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)

(30)

be a WK regular grammar and 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119886

120582⟩119860

119860 997888rarr ⟨119887

120582⟩119860 | ⟨

119887

120582⟩119861

119861 997888rarr ⟨119888

119886⟩119861 | ⟨

119888

119886⟩119862

119862 997888rarr ⟨119889

119887⟩119862 | ⟨

119889

119887⟩119863

119863 997888rarr ⟨120582

119888⟩119863 | ⟨

120582

119889⟩119863 | ⟨

120582

120582⟩

(31)

Then we have the following derivation for 119899119898 ge 1

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878 997904rArr ⟨

119886119899

120582⟩119860

997904rArrlowast⟨119886119899119887119898minus1

120582⟩119860 997904rArr ⟨

119886119899119887119898

120582⟩119861

997904rArrlowast⟨119886119899119887119898119888119899minus1

119886119899minus1

⟩119861 997904rArr ⟨119886119899119887119898119888119899

119886119899⟩119862

997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1

119886119899119887119898minus1

⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898

119886119899119887119898⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898⟩119863

997904rArr [119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898]

(32)

Hence 119871(1198662) = 119886

119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF

Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)

119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119887

120582⟩119878 | ⟨

119888

120582⟩119860

119860 997888rarr ⟨119886

119886⟩119860 | ⟨

119887

119887⟩119860 | ⟨

120582

119888⟩119861

119861 997888rarr ⟨120582

119886⟩119861 | ⟨

120582

119887⟩119861 | ⟨

120582

120582⟩

(33)

By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861

continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively

119878 997904rArrlowast⟨119908

120582⟩119878 997904rArr ⟨

119908119888

120582⟩119860

997904rArrlowast⟨119908119888119908

119908⟩119860 997904rArr ⟨

119908119888119908

119908119888⟩119861

997904rArrlowast⟨119908119888119908

119908119888119908⟩119861 997904rArr [

119908119888119908

119908119888119908]

(34)

Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887

lowast isin CS minus CF

The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12

Applied Computational Intelligence and Soft Computing 7

Theorem 13 The following inclusions hold

119871119868119873 ⊊ 119882119870119871119868119873

119882119870119877119864119866 minus 119862119865 = 0

119882119870119871119868119873 minus 119862119865 = 0

(35)

The following example shows that some WK linearlanguages cannot be generated by WK regular grammars

Lemma 14 The following language is not a WK regularlanguage

1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899

isin 119882119870119871119868119873 minus119882119870119877119864119866

(36)

Proof The language1198711can be generated by the followingWK

linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)

where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119886

119886⟩ | ⟨

119886

120582⟩119860⟨

119886

119886⟩

119860 997888rarr ⟨119887119887

119886⟩119860 | ⟨

119887119887119887

119886⟩119860 | ⟨

120582

119887⟩119861

119861 997888rarr ⟨120582

119887⟩119861 | ⟨

120582

120582⟩

(37)

It is not difficult to see that

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119886119899minus1

119886119899minus1⟩ 997904rArr ⟨

119886119899

120582⟩119860⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899⟩119860⟨

119886119899

119886119899⟩ 997904rArr ⟨

119886119899119887119898

119886119899119887⟩119861⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899119887119898⟩119861⟨

a119899

119886119899⟩ 997904rArr [

119886119899119887119898119886119899

119886119899119887119898119886119899]

(38)

where 2119899 le 119898 le 3119899Next we show that 119871

1notinWKREG

We suppose by contradiction that1198711can be generated by

a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and

V isin ⟨119886

120582⟩ ⟨

120582

119886⟩ ⟨

119886

119886⟩ ⟨

119887

120582⟩ ⟨

120582

119887⟩ ⟨

119887

119887⟩ ⟨

119886

119887⟩

⟨119887

119887⟩ (119873 cup 120582)

(39)

Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the

double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840

Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated

in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations

119878 997904rArrlowast⟨119886119903119887

119886119896⟩

or 119878 997904rArrlowast⟨119886119903119887

119886119903119887⟩

(40)

where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩ 119905 le 119904 (41)

Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩997904rArr

lowast⟨119886119903119887119904119886119894

119886119903119887119904119886119895⟩ (42)

and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations

Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos

119878 997904rArrlowast⟨119886119903119887119897119886119903

119886119903⟩ (43)

In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules

Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following

Corollary 15 The following holds

119871119868119873 minus119882119870119877119864119866 = 0 (44)

43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem

Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions

5 Closure Properties

In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how

8 Applied Computational Intelligence and Soft Computing

RE

WKCF

WKLIN

WKREG

REG

CS

CF

LIN

Figure 4 The hierarchy of WK and Chomsky language families

WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars

51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG

and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be Watson-

Crick regular grammars generating languages 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cap1198732= 0 and 119878

1and

1198782do not appear on the right-hand side of any production

rule

Lemma 17 (union) 119882119870119877119864119866 is closed under union

Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878

with 119878 notin 1198731cup 1198732 and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (45)

Then it is not difficult to see that 119871(119866) = 1198711cup 1198712

Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and

119878 = 1198781 We define

119875 = 1198752cup (1198751minus 119860 997888rarr ⟨

119906

V⟩ isin 119875

1)

cup 119860 997888rarr ⟨119906

V⟩1198782| 119860 997888rarr ⟨

119906

V⟩ isin 119875

1

(46)

Then it is obvious that 119871(119866) = 1198711sdot 1198712

Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation

Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)

with

1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)

where

119875119879= 119860 997888rarr ⟨

119906

V⟩ isin 119875 ⟨

119906

V⟩ isin ⟨

119879lowast

119879lowast⟩

119875119878= 119860 997888rarr ⟨

119906

V⟩1198781| 119860 997888rarr ⟨

119906

V⟩ isin 119875

119879

(48)

Then 119871(119866) = 119871lowast1

Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism

Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ

lowast be a finitesubstitution (homomorphism)Define119866 = (119873

1 1198791 1198781 119875 1205881)

where 119871(119866) = 119904(1198711) Without loss of generality we assume

that 1198661is in the 1-normal form Then 119875

1can contain

production rules of the forms

119860 997888rarr ⟨119886

119887⟩119861

119860 997888rarr ⟨119886

119887⟩

119860 997888rarr ⟨119886

120582⟩119861

119860 997888rarr ⟨119886

120582⟩

119860 997888rarr ⟨120582

119887⟩119861

119860 997888rarr ⟨120582

119887⟩

119860 997888rarr ⟨120582

120582⟩

(49)

We define 119875 as the set of production rules of the forms

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩119861

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩

119860 997888rarr ⟨119904 (119886)

120582⟩119861

119860 997888rarr ⟨119904 (119886)

120582⟩

119860 997888rarr ⟨120582

119904 (119887)⟩119861

Applied Computational Intelligence and Soft Computing 9

119860 997888rarr ⟨120582

119904 (119887)⟩

119860 997888rarr ⟨120582

120582⟩

(50)

Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)

Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage

Proof We show that 1198711198771isin WKREG Define 119866 = (119873

1cup

119878 1198791 119878 119875 120588

1) generating the language 119871119877

1 where 119878 is a new

nonterminal and 119875 consists of production rules defined asfollows

(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751

(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751

The results obtained from the lemmas above are summa-rized in the following theorem

Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image

This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies

52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN

Let 1198711 1198712isinWKLIN and

1198661= (1198731 119879 1198781 1198751 120588)

1198662= (1198732 119879 1198782 1198752 120588)

(51)

be Watson-Crick linear grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 23 (union) 119882119870119871119868119873 is closed under union

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878

with 119878 notin 1198731cup 1198732and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (52)

Then 119871(119866) = 1198711cup 1198712

Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKLIN Let

ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality

we assume that 1198661is in the 1-normal form For each rule

119903 119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

⟩ isin 1198751 (53)

where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩119861⟨

ℎ (1199062)

ℎ (V2)⟩ (54)

in 119875In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism

It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation

53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin

WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be

Watson-Crick context-free grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 26 (union) 119882119870119862119865 is closed under union operation

Proof Define the WK context-free grammar 119866 =

(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873

1cap 1198732

and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (55)

Then it is obvious that 119871(119866) = 1198711cup 1198712

Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation

Proof Define 119866 = (1198731cup 1198732cup 119878 119879

1cup 1198792 119878 119875 120588) where 119878 notin

(1198731cup 1198732) and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

11198782 (56)

Then 119871(119866) isinWKCF

Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation

Proof Define 119866 = (1198731cup 119878 119879

1 119878 119875 120588) isinWKCF setting

119875 = 1198751cup 119878 997888rarr 119878

1119878 | 120582 (57)

Then it is not difficult to see that 119871(119866) isinWKCF

10 Applied Computational Intelligence and Soft Computing

Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKCF Let 119904

119879lowastrarr Σlowast be a homomorphism 119875

1contains production rules

of the form

119903 119860

997888rarr ⟨1199061

V1

⟩1198611⟨1199062

V2

⟩1198612sdot sdot sdot ⟨

119906119896

V119896

⟩119861119896⟨119906119896+1

V119896+1

(58)

where 119896 ge 0For each rule 119903 isin 119875 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩1198611⟨ℎ (1199062)

ℎ (V2)⟩1198612

sdot sdot sdot⟨ℎ (119906119896)

ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)

ℎ (V119896+1)⟩

(59)

where 119896 ge 0In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

With the lemmas provided above in this subsection thenext theorem follows

Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism

Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven

6 Applications of Watson-Crick Grammars

In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures

61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems

The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that

the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]

In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]

Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888

119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)

In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as

(119886 119905 119888 119892lowast

119888119905119892 119886 119905 119888 119892lowast

)lowast

(61)

We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language

Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions

119878 997888rarr ⟨119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119888

119892⟩119860

119860 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119905

119886⟩119861

119861 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩119862

119862

997888rarr ⟨119886

119905⟩119862 | ⟨

119905

119886⟩119862 | ⟨

119892

119888⟩119862 | ⟨

119888

119892⟩119860 | ⟨

120582

120582⟩

(62)

For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows

119878 997904rArrlowast⟨119909

119910⟩119878 997904rArr ⟨

119909119888

119910119892⟩119860 997904rArr ⟨

119909119888119905

119910119892119886⟩119861

997904rArr ⟨119909119888119905119892

119910119892119886119888⟩119862

997904rArrlowast⟨119909119888119905119892119909

119910119892119886119888119910⟩119860 997904rArr ⟨

119909119888119905119892119909119888

119910119892119886119888119910119892⟩119860

997904rArr ⟨119909119888119905119892119909119888119905

119910119892119886119888119910119892119886⟩119861

997904rArr ⟨119909119888119905119892119909119888119905119892

119910119892119886119888119910119892119886119888⟩119862997904rArr

lowast⟨119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910⟩119862

997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]

(63)

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

Applied Computational Intelligence and Soft Computing 3

Throughout this paper we use the following notationsLet isin be the belonging relationship of an element to thecorresponding set and notin indicates its negation The symbolsube indicates the inclusion while sub notes the proper inclusionThe notations 0 |119860| and 2119860 denote the empty set thecardinality of a set 119860 and the power set of 119860 respectivelyWhen Σ is an alphabet (a finite set of symbols) the set ofall finite strings is denoted by Σlowast while Σ+ shows similarmeaningwithout including empty strings (we use 120582 for emptystring) The length of a string 119886 isin Σlowast is shown by |119886| Alanguage is a subset 119871 sube Σlowast

Next we recall some terms regarding the closure proper-ties of languages The union of two languages 119871

1cup 1198712 is the

set of strings including the elements contained in both sets of1198711and 119871

2 The concatenation of two languages is yielded by

lining two strings from both languages which is shown by

11987111198712= 119886119887 | 119886 isin 119871

1 119887 isin 119871

2 (1)

The Kleene-star closure is the closure under the Kleene lowastoperation the set of all possible strings in 119871 including theempty string The mirror image of a word 119886 = 119886

11198862sdot sdot sdot 119886119899is

119886119877= 119886119899sdot sdot sdot 11988621198861 For language 119871 its mirror image is

119871119877= 119886119877| 119886 isin 119871 (2)

A substitution is a mapping 119904 Σlowast rarr 119879lowast where 119904(120582) = 120582

and 119904(1198861 1198862) = 119904(119886

1)119904(1198862) for 119886

1 1198862isin Σlowast The substitution

for a language 119871 sube Σlowast that is 119904(119871) is the union of 119904(119910) where119910 isin 119871 A substitution 119904 is called finite if its length |119904(119886)| is finitefor each 119886 isin Σ Amorphism is a substitution where its lengthis 1

A Chomsky grammar is defined by 119866 = (119873 119879 119878 119875)where119873 is the set ofnonterminal symbols and119879 is the set of terminalsymbols and 119873 cap 119879 = 0 119878 isin 119873 is the start symbol while119875 sube (119873 cup 119879)

lowast119873(119873 cup 119879)

lowasttimes (119873 cup 119879)

lowast is the set of productionrules We write 120572 rarr 120573 indicating the rewriting process of thestrings based on the production rules (120572 120573) isin 119875 The term 119909directly derives 119910 is written as 119909 rArr 119910 when

119909 = 11988611205721198862

119910 = 11988611205731198862

(3)

for some production rules 120572 rarr 120573 isin 119875 A grammar generatesa language defined by

119871 (119866) = 119908 isin 119879lowast 119878 997904rArr

lowast119908 (4)

According to the forms of production rules grammars areclassified as follows A grammar 119866 = (119873 119879 119878 119875) is called

(i) context-sensitive if each production has the form11990611198601199062rarr 11990611199061199062 where 119860 isin 119873 119906

1 1199062 isin (119873 cup 119879)

lowastand 119906 isin (119873 cup 119879)+

(ii) context-free if each production has the form 119860 rarr 119906where 119860 isin 119873 and 119906 isin (119873 cup 119879)lowast

(iii) linear if each production has the form 119860 rarr 11990611198611199062or

119860 rarr 119906 where 119860 119861 isin 119873 and 1199061 1199062 119906 isin 119879

lowast

(iv) right-linear if each production has the form 119860 rarr 119906119861or 119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast

(v) left-linear if each production has the form119860 rarr 119861119906 or119860 rarr 119906 where 119860 119861 isin 119873 and 119906 isin 119879lowast

A right-linear and left-linear grammars are called regularThe families of languages generated by these grammars

are REG LIN CF and CS respectively The families ofrecursive enumerable languages are denoted by RE while thefamilies of finite languages are denoted by FINThus the nextrelation holds [11]

Theorem 1 (Chomsky hierarchy) Consider

119865119868119873 sub 119877119864119866 sub 119871119868119873 sub 119862119865 sub 119862119878 sub 119877119864 (5)

We recall the definition of a finite automaton A finiteautomaton (FA) is a quintuple119872 = (119876119881 119902

0 119865 120575) where 119876

is the set of states 1199020isin 119876 is the initial state and 119865 sube 119876 is set

of final states Meanwhile 119881 is an alphabet and 120575 119876 times 119881 rarr2119876 is called the transition function The set (language) of allstrings accepted by 119872 is denoted by 119871(119872) We denote thefamily of languages accepted by finite automata by FA ThenFA = REG (see [11])

Next we cite some basic definitions and results ofWatson-Crick automata

The key feature of WK automata is the symmetric relationon an alphabet 119881 that is 120588 sube 119881 times 119881 In this paper forsimplicity we use the form ⟨119906V⟩ to mention the elements(119906 V) in the set of all pairs of strings 119881times119881 (which we chooseto write as [119881119881]) and instead of119881lowasttimes119881lowast we write ⟨119881lowast119881lowast⟩

Watson-Crick domain is the set of well-formed double-stranded strings (molecules)

WK120588(119881) = [

119881

119881]

lowast

120588

= [119886

119887] 119886 119887 isin 119881 (119886 119887) isin 120588 (6)

WK+120588(119881) holds the similar meaning without including 120582 We

write

[1198861

1198871

] [1198862

1198872

] sdot sdot sdot [119886119899

119887119899

] isinWK120588

(7)

as [119906V] where the upper strand is 119906 = 11988611198862sdot sdot sdot 119886119899and the

lower strand is V = 11988711198872sdot sdot sdot 119887119899 Note that when the elements in

the upper strand are complemented and have the same lengthwith the lower strand

⟨119906

V⟩ = [

119906

V] (8)

A Watson-Crick finite automaton (WKFA) is 6-tuple

119872 = (119876119881 1199020 119865 120575 120588) (9)

4 Applied Computational Intelligence and Soft Computing

where 119876 119881 1199020 and 119865 are the same as a FA Meanwhile the

transition function 120575 is

120575 119876 times ⟨119881lowast

119881lowast⟩ 997888rarr 2

119876 (10)

where 120575(119902 ⟨119906V⟩) is not an empty set only for finitely manytriples (119902 119906 V) isin 119876times119881lowast times119881lowast Similar to FA we can write therelation in transition function 119902

2isin 120575(1199021 ⟨119906V⟩) as a rewriting

rule in grammars that is

1199021⟨119906

V⟩ 997888rarr ⟨

119906

V⟩1199022 (11)

We describe the reflexive and transitive closure ofrarr asrarrlowastThe language accepted by a WKFA119872 is

119871 (119872) = 119906 [119906

V] isinWK

120588(119881) 119902

0[119906

V]

997888rarrlowast[119906

V] 119902 where 119902 isin 119865

(12)

The family of languages accepted is indicated by WKFA It isshown in [10 12] that

REG subWKFA sub CS (13)

3 Definitions

In this section we slightly modified the definition of Watson-Crick regular grammars introduced in [8] in order to extendthe concept to linear grammars and context-free grammars

Definition 2 A Watson-Crick (WK) grammar 119866 = (119873 119879 119878119875 120588) is called

(i) regular if each production has the form

119860 997888rarr ⟨119906

V⟩119861

or 119860 997888rarr ⟨119906V⟩

(14)

where 119860 119861 isin 119873 and ⟨119906V⟩ isin ⟨119879lowast119879lowast⟩(ii) linear if each production has the form

119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

or 119860 997888rarr ⟨119906V⟩

(15)

where 119860 119861 isin 119873 and ⟨1199061V1⟩ ⟨1199062V2⟩ ⟨119906V⟩ isin

⟨119879lowast119879lowast⟩

(iii) context-free if each production has the form

119860 997888rarr 120572 (16)

where 119860 isin 119873 and 120572 isin (119873 cup (⟨119879lowast119879lowast⟩))lowast

Definition 3 Let119866 = (119873 119879 120588 119878 119875) be aWK linear grammarWe say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives 119910 isin (119873 cup⟨119879lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 iff

119909 = ⟨1199061

V1

⟩119860⟨1199062

V2

119910 = ⟨1199061

V1

⟩⟨1199063

V3

⟩119861⟨1199064

V4

⟩⟨1199062

V2

or 119910 = ⟨1199061V1

⟩⟨119906

V⟩⟨1199062

V2

(17)

where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and

119860 997888rarr ⟨1199063

V3

⟩119861⟨1199064

V4

119860 997888rarr ⟨119906

V⟩ isin 119875

(18)

Definition 4 Let 119866 = (119873 119879 120588 119878 119875) be a WK context-freegrammar We say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives119910 isin (119873 cup ⟨119879

lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 if and only if

119909 = ⟨1199061

V1

⟩119860⟨1199062

V2

119910 = ⟨1199061

V1

⟩120572⟨1199062

V2

(19)

where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and 119860 rarr 120572 isin

119875

Remark 5 We use a common notion ldquoWatson-Crick gram-marsrdquo referring to any type of WK grammars

Definition 6 The language generated by a WK grammar is aquintuple 119866 which is defined as

119871 (119866) = 119906 [119906

V] isinWK

120588(119879) 119878 997904rArr

lowast[119906

V] (20)

4 Generative Power ofWatson-Crick Grammars

In this section we establish results regarding the computa-tional power of WK grammars

41 A Normal Form for Watson-Crick Linear GrammarsNext we define 1-normal form for WK linear grammarsand show that for every WK linear grammar 119866 there is anequivalent WK linear grammar 1198661015840 in the normal form thatis 119871(119866) = 119871(1198661015840)

Definition 7 A linearWK grammar119866 = (119873 119879 120588 119875 119878) is saidto be in the 1-normal form if each rule in 119875 of the form

119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

or 119860 997888rarr ⟨1199061V1

(21)

where |119906119894| le 1 |V

119894| le 1 119894 = 1 2 and 119860 119861 isin 119873

Applied Computational Intelligence and Soft Computing 5

Lemma 8 For every WK linear grammar 119866 there exists anequivalent WK linear grammar 1198661015840 in the 1-normal form

Proof Let 119866 = (119873 119879 120588 119878 119875) be a WK linear grammar Let

119860 997888rarr ⟨

11988611sdot sdot sdot 11988611198981

11988711sdot sdot sdot 11988711198991

⟩119861⟨

11988621198982

sdot sdot sdot 11988621

11988721198992

sdot sdot sdot 11988721

⟩ (22)

be a production in 119875 where 1198981gt 1 119898

2gt 1 119899

1gt 1 or

1198992gt 1 Without loss of generality we assume that 119898

1ge 1198991

and1198982ge 1198992Then we define the following sequence of right-

linear and left-linear production rules

119860 997888rarr ⟨11988611

11988711

⟩119860119903

11

119860119903

11997888rarr ⟨

11988612

11988712

⟩119860119903

12

119860119903

11198991minus1997888rarr ⟨

11988611198991

11988711198991

⟩119860119903

11198991

119860119903

11198991

997888rarr ⟨

11988611198991+1

120582⟩119860119903

11198991+1

119860119903

11198981minus1997888rarr ⟨

11988611198981

120582⟩119860119903

11198981

119860119903

11198981

997888rarr 119860119903

21⟨11988621

11988721

119860119903

21997888rarr 119860

119903

22⟨11988622

11988722

119860119903

11198992minus1997888rarr 119860

119903

21198992

11988621198992

11988721198992

119860119903

21198992

997888rarr 119860119903

21198992+1⟨

11988621198992+1

120582⟩

119860119903

21198982minus1997888rarr ⟨

11988621198982

120582⟩

(23)

where 1198601199031119895 1 le 119895 le 119898

1 and 119860119903

2119895 1 le 119895 le 119898

2minus 1 are new

nonterminals

Let

119903 119860 997888rarr ⟨11988611198862sdot sdot sdot 119886119898

11988711198872sdot sdot sdot 119887119899

⟩ isin 119875 (24)

where 119898 gt 1 or 119899 gt 1 Without loss of generality we assumethat 119898 ge 119899 Then we define the following sequence of right-linear production rules

119860 997888rarr ⟨1198861

1198871

⟩119860119903

1

119860119903

1997888rarr ⟨

1198862

1198872

⟩119860119903

2

119860119903

119899minus1997888rarr ⟨

119886119899

119887119899

⟩119860119903

119899

119860119903

119899997888rarr ⟨

119886119899+1

120582⟩119860119903

119899+1

119860119903

119898minus1997888rarr ⟨

119886119898

120582⟩

(25)

where 119860119903119894 1 le 119894 le 119898 minus 1 are new nonterminals

We construct a WK linear grammar 1198661015840 = (119873 cup

1198731015840 119879 120588 119878 119875 cup 119875

1015840) where 1198751015840 consists of productions defined

above for each 119860 rarr ⟨1199061V1⟩119861⟨1199062V2⟩ isin 119875 with |119906

1| gt 1

|V1| gt 1 |119906

2| gt 1 or |V

2| gt 1 and 119860 rarr ⟨119906V⟩ isin 119875 with

|119906| gt 1 or |V| gt 1 Then it is not difficult to see that in everyderivation productions in the form of (22) and (24) in 119866 canbe replaced by the sequences of productions (23) and (25) in1198661015840 and vice versa Thus 119871(119866) = 119871(1198661015840)

42TheGenerative Power The following results immediatelyfollow from the definition of WK grammars

Lemma 9 The following inclusions hold

119882119870119877119864119866 sube 119882119870119871119868119873

119871119868119873 sube 119882119870119871119868119873

119862119865 sube 119882119870119862119865

(26)

Next we show that WK grammars can generate non-context-free languages

119886119899119888119899119887119899 119899 ge 1

119886119899119887119898119888119899119889119898 119899 119898 ge 1

119908119888119908 119908 isin 119886 119887lowast

(27)

6 Applied Computational Intelligence and Soft Computing

Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)

119875 119878) be a WK linear grammar where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119887

120582⟩

119878 997888rarr ⟨119886

120582⟩119860⟨

119887

120582⟩

119860 997888rarr ⟨119888

119886⟩119860

119860 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

120582⟩

(28)

In general we have the derivation

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119887119899minus1

120582⟩

997904rArr ⟨119886119899

120582⟩119860⟨

119887119899

120582⟩

997904rArrlowast⟨119886119899119888119899

119886119899⟩119860⟨

119887119899

120582⟩

997904rArr ⟨119886119899119888119899

119886119899119888⟩119861⟨

119887119899

119887⟩

997904rArrlowast⟨119886119899119888119899

119886119899119888119899⟩119861⟨

119887119899

119887119899⟩

997904rArr [119886119899119888119899119887119899

119886119899119888119899119887119899]

(29)

Thus 1198661generates the language 119871(119866

1) = 119886

119899119888119899119887119899 119899 ge 1 isin

CS minus CF

Example 11 Let

1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889

(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)

(30)

be a WK regular grammar and 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119886

120582⟩119860

119860 997888rarr ⟨119887

120582⟩119860 | ⟨

119887

120582⟩119861

119861 997888rarr ⟨119888

119886⟩119861 | ⟨

119888

119886⟩119862

119862 997888rarr ⟨119889

119887⟩119862 | ⟨

119889

119887⟩119863

119863 997888rarr ⟨120582

119888⟩119863 | ⟨

120582

119889⟩119863 | ⟨

120582

120582⟩

(31)

Then we have the following derivation for 119899119898 ge 1

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878 997904rArr ⟨

119886119899

120582⟩119860

997904rArrlowast⟨119886119899119887119898minus1

120582⟩119860 997904rArr ⟨

119886119899119887119898

120582⟩119861

997904rArrlowast⟨119886119899119887119898119888119899minus1

119886119899minus1

⟩119861 997904rArr ⟨119886119899119887119898119888119899

119886119899⟩119862

997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1

119886119899119887119898minus1

⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898

119886119899119887119898⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898⟩119863

997904rArr [119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898]

(32)

Hence 119871(1198662) = 119886

119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF

Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)

119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119887

120582⟩119878 | ⟨

119888

120582⟩119860

119860 997888rarr ⟨119886

119886⟩119860 | ⟨

119887

119887⟩119860 | ⟨

120582

119888⟩119861

119861 997888rarr ⟨120582

119886⟩119861 | ⟨

120582

119887⟩119861 | ⟨

120582

120582⟩

(33)

By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861

continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively

119878 997904rArrlowast⟨119908

120582⟩119878 997904rArr ⟨

119908119888

120582⟩119860

997904rArrlowast⟨119908119888119908

119908⟩119860 997904rArr ⟨

119908119888119908

119908119888⟩119861

997904rArrlowast⟨119908119888119908

119908119888119908⟩119861 997904rArr [

119908119888119908

119908119888119908]

(34)

Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887

lowast isin CS minus CF

The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12

Applied Computational Intelligence and Soft Computing 7

Theorem 13 The following inclusions hold

119871119868119873 ⊊ 119882119870119871119868119873

119882119870119877119864119866 minus 119862119865 = 0

119882119870119871119868119873 minus 119862119865 = 0

(35)

The following example shows that some WK linearlanguages cannot be generated by WK regular grammars

Lemma 14 The following language is not a WK regularlanguage

1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899

isin 119882119870119871119868119873 minus119882119870119877119864119866

(36)

Proof The language1198711can be generated by the followingWK

linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)

where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119886

119886⟩ | ⟨

119886

120582⟩119860⟨

119886

119886⟩

119860 997888rarr ⟨119887119887

119886⟩119860 | ⟨

119887119887119887

119886⟩119860 | ⟨

120582

119887⟩119861

119861 997888rarr ⟨120582

119887⟩119861 | ⟨

120582

120582⟩

(37)

It is not difficult to see that

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119886119899minus1

119886119899minus1⟩ 997904rArr ⟨

119886119899

120582⟩119860⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899⟩119860⟨

119886119899

119886119899⟩ 997904rArr ⟨

119886119899119887119898

119886119899119887⟩119861⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899119887119898⟩119861⟨

a119899

119886119899⟩ 997904rArr [

119886119899119887119898119886119899

119886119899119887119898119886119899]

(38)

where 2119899 le 119898 le 3119899Next we show that 119871

1notinWKREG

We suppose by contradiction that1198711can be generated by

a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and

V isin ⟨119886

120582⟩ ⟨

120582

119886⟩ ⟨

119886

119886⟩ ⟨

119887

120582⟩ ⟨

120582

119887⟩ ⟨

119887

119887⟩ ⟨

119886

119887⟩

⟨119887

119887⟩ (119873 cup 120582)

(39)

Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the

double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840

Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated

in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations

119878 997904rArrlowast⟨119886119903119887

119886119896⟩

or 119878 997904rArrlowast⟨119886119903119887

119886119903119887⟩

(40)

where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩ 119905 le 119904 (41)

Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩997904rArr

lowast⟨119886119903119887119904119886119894

119886119903119887119904119886119895⟩ (42)

and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations

Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos

119878 997904rArrlowast⟨119886119903119887119897119886119903

119886119903⟩ (43)

In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules

Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following

Corollary 15 The following holds

119871119868119873 minus119882119870119877119864119866 = 0 (44)

43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem

Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions

5 Closure Properties

In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how

8 Applied Computational Intelligence and Soft Computing

RE

WKCF

WKLIN

WKREG

REG

CS

CF

LIN

Figure 4 The hierarchy of WK and Chomsky language families

WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars

51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG

and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be Watson-

Crick regular grammars generating languages 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cap1198732= 0 and 119878

1and

1198782do not appear on the right-hand side of any production

rule

Lemma 17 (union) 119882119870119877119864119866 is closed under union

Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878

with 119878 notin 1198731cup 1198732 and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (45)

Then it is not difficult to see that 119871(119866) = 1198711cup 1198712

Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and

119878 = 1198781 We define

119875 = 1198752cup (1198751minus 119860 997888rarr ⟨

119906

V⟩ isin 119875

1)

cup 119860 997888rarr ⟨119906

V⟩1198782| 119860 997888rarr ⟨

119906

V⟩ isin 119875

1

(46)

Then it is obvious that 119871(119866) = 1198711sdot 1198712

Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation

Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)

with

1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)

where

119875119879= 119860 997888rarr ⟨

119906

V⟩ isin 119875 ⟨

119906

V⟩ isin ⟨

119879lowast

119879lowast⟩

119875119878= 119860 997888rarr ⟨

119906

V⟩1198781| 119860 997888rarr ⟨

119906

V⟩ isin 119875

119879

(48)

Then 119871(119866) = 119871lowast1

Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism

Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ

lowast be a finitesubstitution (homomorphism)Define119866 = (119873

1 1198791 1198781 119875 1205881)

where 119871(119866) = 119904(1198711) Without loss of generality we assume

that 1198661is in the 1-normal form Then 119875

1can contain

production rules of the forms

119860 997888rarr ⟨119886

119887⟩119861

119860 997888rarr ⟨119886

119887⟩

119860 997888rarr ⟨119886

120582⟩119861

119860 997888rarr ⟨119886

120582⟩

119860 997888rarr ⟨120582

119887⟩119861

119860 997888rarr ⟨120582

119887⟩

119860 997888rarr ⟨120582

120582⟩

(49)

We define 119875 as the set of production rules of the forms

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩119861

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩

119860 997888rarr ⟨119904 (119886)

120582⟩119861

119860 997888rarr ⟨119904 (119886)

120582⟩

119860 997888rarr ⟨120582

119904 (119887)⟩119861

Applied Computational Intelligence and Soft Computing 9

119860 997888rarr ⟨120582

119904 (119887)⟩

119860 997888rarr ⟨120582

120582⟩

(50)

Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)

Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage

Proof We show that 1198711198771isin WKREG Define 119866 = (119873

1cup

119878 1198791 119878 119875 120588

1) generating the language 119871119877

1 where 119878 is a new

nonterminal and 119875 consists of production rules defined asfollows

(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751

(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751

The results obtained from the lemmas above are summa-rized in the following theorem

Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image

This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies

52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN

Let 1198711 1198712isinWKLIN and

1198661= (1198731 119879 1198781 1198751 120588)

1198662= (1198732 119879 1198782 1198752 120588)

(51)

be Watson-Crick linear grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 23 (union) 119882119870119871119868119873 is closed under union

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878

with 119878 notin 1198731cup 1198732and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (52)

Then 119871(119866) = 1198711cup 1198712

Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKLIN Let

ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality

we assume that 1198661is in the 1-normal form For each rule

119903 119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

⟩ isin 1198751 (53)

where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩119861⟨

ℎ (1199062)

ℎ (V2)⟩ (54)

in 119875In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism

It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation

53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin

WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be

Watson-Crick context-free grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 26 (union) 119882119870119862119865 is closed under union operation

Proof Define the WK context-free grammar 119866 =

(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873

1cap 1198732

and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (55)

Then it is obvious that 119871(119866) = 1198711cup 1198712

Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation

Proof Define 119866 = (1198731cup 1198732cup 119878 119879

1cup 1198792 119878 119875 120588) where 119878 notin

(1198731cup 1198732) and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

11198782 (56)

Then 119871(119866) isinWKCF

Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation

Proof Define 119866 = (1198731cup 119878 119879

1 119878 119875 120588) isinWKCF setting

119875 = 1198751cup 119878 997888rarr 119878

1119878 | 120582 (57)

Then it is not difficult to see that 119871(119866) isinWKCF

10 Applied Computational Intelligence and Soft Computing

Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKCF Let 119904

119879lowastrarr Σlowast be a homomorphism 119875

1contains production rules

of the form

119903 119860

997888rarr ⟨1199061

V1

⟩1198611⟨1199062

V2

⟩1198612sdot sdot sdot ⟨

119906119896

V119896

⟩119861119896⟨119906119896+1

V119896+1

(58)

where 119896 ge 0For each rule 119903 isin 119875 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩1198611⟨ℎ (1199062)

ℎ (V2)⟩1198612

sdot sdot sdot⟨ℎ (119906119896)

ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)

ℎ (V119896+1)⟩

(59)

where 119896 ge 0In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

With the lemmas provided above in this subsection thenext theorem follows

Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism

Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven

6 Applications of Watson-Crick Grammars

In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures

61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems

The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that

the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]

In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]

Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888

119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)

In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as

(119886 119905 119888 119892lowast

119888119905119892 119886 119905 119888 119892lowast

)lowast

(61)

We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language

Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions

119878 997888rarr ⟨119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119888

119892⟩119860

119860 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119905

119886⟩119861

119861 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩119862

119862

997888rarr ⟨119886

119905⟩119862 | ⟨

119905

119886⟩119862 | ⟨

119892

119888⟩119862 | ⟨

119888

119892⟩119860 | ⟨

120582

120582⟩

(62)

For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows

119878 997904rArrlowast⟨119909

119910⟩119878 997904rArr ⟨

119909119888

119910119892⟩119860 997904rArr ⟨

119909119888119905

119910119892119886⟩119861

997904rArr ⟨119909119888119905119892

119910119892119886119888⟩119862

997904rArrlowast⟨119909119888119905119892119909

119910119892119886119888119910⟩119860 997904rArr ⟨

119909119888119905119892119909119888

119910119892119886119888119910119892⟩119860

997904rArr ⟨119909119888119905119892119909119888119905

119910119892119886119888119910119892119886⟩119861

997904rArr ⟨119909119888119905119892119909119888119905119892

119910119892119886119888119910119892119886119888⟩119862997904rArr

lowast⟨119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910⟩119862

997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]

(63)

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 4: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

4 Applied Computational Intelligence and Soft Computing

where 119876 119881 1199020 and 119865 are the same as a FA Meanwhile the

transition function 120575 is

120575 119876 times ⟨119881lowast

119881lowast⟩ 997888rarr 2

119876 (10)

where 120575(119902 ⟨119906V⟩) is not an empty set only for finitely manytriples (119902 119906 V) isin 119876times119881lowast times119881lowast Similar to FA we can write therelation in transition function 119902

2isin 120575(1199021 ⟨119906V⟩) as a rewriting

rule in grammars that is

1199021⟨119906

V⟩ 997888rarr ⟨

119906

V⟩1199022 (11)

We describe the reflexive and transitive closure ofrarr asrarrlowastThe language accepted by a WKFA119872 is

119871 (119872) = 119906 [119906

V] isinWK

120588(119881) 119902

0[119906

V]

997888rarrlowast[119906

V] 119902 where 119902 isin 119865

(12)

The family of languages accepted is indicated by WKFA It isshown in [10 12] that

REG subWKFA sub CS (13)

3 Definitions

In this section we slightly modified the definition of Watson-Crick regular grammars introduced in [8] in order to extendthe concept to linear grammars and context-free grammars

Definition 2 A Watson-Crick (WK) grammar 119866 = (119873 119879 119878119875 120588) is called

(i) regular if each production has the form

119860 997888rarr ⟨119906

V⟩119861

or 119860 997888rarr ⟨119906V⟩

(14)

where 119860 119861 isin 119873 and ⟨119906V⟩ isin ⟨119879lowast119879lowast⟩(ii) linear if each production has the form

119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

or 119860 997888rarr ⟨119906V⟩

(15)

where 119860 119861 isin 119873 and ⟨1199061V1⟩ ⟨1199062V2⟩ ⟨119906V⟩ isin

⟨119879lowast119879lowast⟩

(iii) context-free if each production has the form

119860 997888rarr 120572 (16)

where 119860 isin 119873 and 120572 isin (119873 cup (⟨119879lowast119879lowast⟩))lowast

Definition 3 Let119866 = (119873 119879 120588 119878 119875) be aWK linear grammarWe say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives 119910 isin (119873 cup⟨119879lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 iff

119909 = ⟨1199061

V1

⟩119860⟨1199062

V2

119910 = ⟨1199061

V1

⟩⟨1199063

V3

⟩119861⟨1199064

V4

⟩⟨1199062

V2

or 119910 = ⟨1199061V1

⟩⟨119906

V⟩⟨1199062

V2

(17)

where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and

119860 997888rarr ⟨1199063

V3

⟩119861⟨1199064

V4

119860 997888rarr ⟨119906

V⟩ isin 119875

(18)

Definition 4 Let 119866 = (119873 119879 120588 119878 119875) be a WK context-freegrammar We say that 119909 isin (119873 cup ⟨119879lowast119879lowast⟩)lowast directly derives119910 isin (119873 cup ⟨119879

lowast119879lowast⟩)lowast denoted by 119909 rArr 119910 if and only if

119909 = ⟨1199061

V1

⟩119860⟨1199062

V2

119910 = ⟨1199061

V1

⟩120572⟨1199062

V2

(19)

where 119860 119861 isin 119873 119906119894 V119894isin ⟨119879lowast119879lowast⟩ 119894 = 1 2 3 4 and 119860 rarr 120572 isin

119875

Remark 5 We use a common notion ldquoWatson-Crick gram-marsrdquo referring to any type of WK grammars

Definition 6 The language generated by a WK grammar is aquintuple 119866 which is defined as

119871 (119866) = 119906 [119906

V] isinWK

120588(119879) 119878 997904rArr

lowast[119906

V] (20)

4 Generative Power ofWatson-Crick Grammars

In this section we establish results regarding the computa-tional power of WK grammars

41 A Normal Form for Watson-Crick Linear GrammarsNext we define 1-normal form for WK linear grammarsand show that for every WK linear grammar 119866 there is anequivalent WK linear grammar 1198661015840 in the normal form thatis 119871(119866) = 119871(1198661015840)

Definition 7 A linearWK grammar119866 = (119873 119879 120588 119875 119878) is saidto be in the 1-normal form if each rule in 119875 of the form

119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

or 119860 997888rarr ⟨1199061V1

(21)

where |119906119894| le 1 |V

119894| le 1 119894 = 1 2 and 119860 119861 isin 119873

Applied Computational Intelligence and Soft Computing 5

Lemma 8 For every WK linear grammar 119866 there exists anequivalent WK linear grammar 1198661015840 in the 1-normal form

Proof Let 119866 = (119873 119879 120588 119878 119875) be a WK linear grammar Let

119860 997888rarr ⟨

11988611sdot sdot sdot 11988611198981

11988711sdot sdot sdot 11988711198991

⟩119861⟨

11988621198982

sdot sdot sdot 11988621

11988721198992

sdot sdot sdot 11988721

⟩ (22)

be a production in 119875 where 1198981gt 1 119898

2gt 1 119899

1gt 1 or

1198992gt 1 Without loss of generality we assume that 119898

1ge 1198991

and1198982ge 1198992Then we define the following sequence of right-

linear and left-linear production rules

119860 997888rarr ⟨11988611

11988711

⟩119860119903

11

119860119903

11997888rarr ⟨

11988612

11988712

⟩119860119903

12

119860119903

11198991minus1997888rarr ⟨

11988611198991

11988711198991

⟩119860119903

11198991

119860119903

11198991

997888rarr ⟨

11988611198991+1

120582⟩119860119903

11198991+1

119860119903

11198981minus1997888rarr ⟨

11988611198981

120582⟩119860119903

11198981

119860119903

11198981

997888rarr 119860119903

21⟨11988621

11988721

119860119903

21997888rarr 119860

119903

22⟨11988622

11988722

119860119903

11198992minus1997888rarr 119860

119903

21198992

11988621198992

11988721198992

119860119903

21198992

997888rarr 119860119903

21198992+1⟨

11988621198992+1

120582⟩

119860119903

21198982minus1997888rarr ⟨

11988621198982

120582⟩

(23)

where 1198601199031119895 1 le 119895 le 119898

1 and 119860119903

2119895 1 le 119895 le 119898

2minus 1 are new

nonterminals

Let

119903 119860 997888rarr ⟨11988611198862sdot sdot sdot 119886119898

11988711198872sdot sdot sdot 119887119899

⟩ isin 119875 (24)

where 119898 gt 1 or 119899 gt 1 Without loss of generality we assumethat 119898 ge 119899 Then we define the following sequence of right-linear production rules

119860 997888rarr ⟨1198861

1198871

⟩119860119903

1

119860119903

1997888rarr ⟨

1198862

1198872

⟩119860119903

2

119860119903

119899minus1997888rarr ⟨

119886119899

119887119899

⟩119860119903

119899

119860119903

119899997888rarr ⟨

119886119899+1

120582⟩119860119903

119899+1

119860119903

119898minus1997888rarr ⟨

119886119898

120582⟩

(25)

where 119860119903119894 1 le 119894 le 119898 minus 1 are new nonterminals

We construct a WK linear grammar 1198661015840 = (119873 cup

1198731015840 119879 120588 119878 119875 cup 119875

1015840) where 1198751015840 consists of productions defined

above for each 119860 rarr ⟨1199061V1⟩119861⟨1199062V2⟩ isin 119875 with |119906

1| gt 1

|V1| gt 1 |119906

2| gt 1 or |V

2| gt 1 and 119860 rarr ⟨119906V⟩ isin 119875 with

|119906| gt 1 or |V| gt 1 Then it is not difficult to see that in everyderivation productions in the form of (22) and (24) in 119866 canbe replaced by the sequences of productions (23) and (25) in1198661015840 and vice versa Thus 119871(119866) = 119871(1198661015840)

42TheGenerative Power The following results immediatelyfollow from the definition of WK grammars

Lemma 9 The following inclusions hold

119882119870119877119864119866 sube 119882119870119871119868119873

119871119868119873 sube 119882119870119871119868119873

119862119865 sube 119882119870119862119865

(26)

Next we show that WK grammars can generate non-context-free languages

119886119899119888119899119887119899 119899 ge 1

119886119899119887119898119888119899119889119898 119899 119898 ge 1

119908119888119908 119908 isin 119886 119887lowast

(27)

6 Applied Computational Intelligence and Soft Computing

Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)

119875 119878) be a WK linear grammar where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119887

120582⟩

119878 997888rarr ⟨119886

120582⟩119860⟨

119887

120582⟩

119860 997888rarr ⟨119888

119886⟩119860

119860 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

120582⟩

(28)

In general we have the derivation

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119887119899minus1

120582⟩

997904rArr ⟨119886119899

120582⟩119860⟨

119887119899

120582⟩

997904rArrlowast⟨119886119899119888119899

119886119899⟩119860⟨

119887119899

120582⟩

997904rArr ⟨119886119899119888119899

119886119899119888⟩119861⟨

119887119899

119887⟩

997904rArrlowast⟨119886119899119888119899

119886119899119888119899⟩119861⟨

119887119899

119887119899⟩

997904rArr [119886119899119888119899119887119899

119886119899119888119899119887119899]

(29)

Thus 1198661generates the language 119871(119866

1) = 119886

119899119888119899119887119899 119899 ge 1 isin

CS minus CF

Example 11 Let

1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889

(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)

(30)

be a WK regular grammar and 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119886

120582⟩119860

119860 997888rarr ⟨119887

120582⟩119860 | ⟨

119887

120582⟩119861

119861 997888rarr ⟨119888

119886⟩119861 | ⟨

119888

119886⟩119862

119862 997888rarr ⟨119889

119887⟩119862 | ⟨

119889

119887⟩119863

119863 997888rarr ⟨120582

119888⟩119863 | ⟨

120582

119889⟩119863 | ⟨

120582

120582⟩

(31)

Then we have the following derivation for 119899119898 ge 1

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878 997904rArr ⟨

119886119899

120582⟩119860

997904rArrlowast⟨119886119899119887119898minus1

120582⟩119860 997904rArr ⟨

119886119899119887119898

120582⟩119861

997904rArrlowast⟨119886119899119887119898119888119899minus1

119886119899minus1

⟩119861 997904rArr ⟨119886119899119887119898119888119899

119886119899⟩119862

997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1

119886119899119887119898minus1

⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898

119886119899119887119898⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898⟩119863

997904rArr [119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898]

(32)

Hence 119871(1198662) = 119886

119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF

Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)

119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119887

120582⟩119878 | ⟨

119888

120582⟩119860

119860 997888rarr ⟨119886

119886⟩119860 | ⟨

119887

119887⟩119860 | ⟨

120582

119888⟩119861

119861 997888rarr ⟨120582

119886⟩119861 | ⟨

120582

119887⟩119861 | ⟨

120582

120582⟩

(33)

By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861

continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively

119878 997904rArrlowast⟨119908

120582⟩119878 997904rArr ⟨

119908119888

120582⟩119860

997904rArrlowast⟨119908119888119908

119908⟩119860 997904rArr ⟨

119908119888119908

119908119888⟩119861

997904rArrlowast⟨119908119888119908

119908119888119908⟩119861 997904rArr [

119908119888119908

119908119888119908]

(34)

Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887

lowast isin CS minus CF

The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12

Applied Computational Intelligence and Soft Computing 7

Theorem 13 The following inclusions hold

119871119868119873 ⊊ 119882119870119871119868119873

119882119870119877119864119866 minus 119862119865 = 0

119882119870119871119868119873 minus 119862119865 = 0

(35)

The following example shows that some WK linearlanguages cannot be generated by WK regular grammars

Lemma 14 The following language is not a WK regularlanguage

1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899

isin 119882119870119871119868119873 minus119882119870119877119864119866

(36)

Proof The language1198711can be generated by the followingWK

linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)

where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119886

119886⟩ | ⟨

119886

120582⟩119860⟨

119886

119886⟩

119860 997888rarr ⟨119887119887

119886⟩119860 | ⟨

119887119887119887

119886⟩119860 | ⟨

120582

119887⟩119861

119861 997888rarr ⟨120582

119887⟩119861 | ⟨

120582

120582⟩

(37)

It is not difficult to see that

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119886119899minus1

119886119899minus1⟩ 997904rArr ⟨

119886119899

120582⟩119860⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899⟩119860⟨

119886119899

119886119899⟩ 997904rArr ⟨

119886119899119887119898

119886119899119887⟩119861⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899119887119898⟩119861⟨

a119899

119886119899⟩ 997904rArr [

119886119899119887119898119886119899

119886119899119887119898119886119899]

(38)

where 2119899 le 119898 le 3119899Next we show that 119871

1notinWKREG

We suppose by contradiction that1198711can be generated by

a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and

V isin ⟨119886

120582⟩ ⟨

120582

119886⟩ ⟨

119886

119886⟩ ⟨

119887

120582⟩ ⟨

120582

119887⟩ ⟨

119887

119887⟩ ⟨

119886

119887⟩

⟨119887

119887⟩ (119873 cup 120582)

(39)

Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the

double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840

Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated

in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations

119878 997904rArrlowast⟨119886119903119887

119886119896⟩

or 119878 997904rArrlowast⟨119886119903119887

119886119903119887⟩

(40)

where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩ 119905 le 119904 (41)

Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩997904rArr

lowast⟨119886119903119887119904119886119894

119886119903119887119904119886119895⟩ (42)

and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations

Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos

119878 997904rArrlowast⟨119886119903119887119897119886119903

119886119903⟩ (43)

In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules

Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following

Corollary 15 The following holds

119871119868119873 minus119882119870119877119864119866 = 0 (44)

43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem

Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions

5 Closure Properties

In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how

8 Applied Computational Intelligence and Soft Computing

RE

WKCF

WKLIN

WKREG

REG

CS

CF

LIN

Figure 4 The hierarchy of WK and Chomsky language families

WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars

51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG

and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be Watson-

Crick regular grammars generating languages 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cap1198732= 0 and 119878

1and

1198782do not appear on the right-hand side of any production

rule

Lemma 17 (union) 119882119870119877119864119866 is closed under union

Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878

with 119878 notin 1198731cup 1198732 and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (45)

Then it is not difficult to see that 119871(119866) = 1198711cup 1198712

Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and

119878 = 1198781 We define

119875 = 1198752cup (1198751minus 119860 997888rarr ⟨

119906

V⟩ isin 119875

1)

cup 119860 997888rarr ⟨119906

V⟩1198782| 119860 997888rarr ⟨

119906

V⟩ isin 119875

1

(46)

Then it is obvious that 119871(119866) = 1198711sdot 1198712

Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation

Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)

with

1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)

where

119875119879= 119860 997888rarr ⟨

119906

V⟩ isin 119875 ⟨

119906

V⟩ isin ⟨

119879lowast

119879lowast⟩

119875119878= 119860 997888rarr ⟨

119906

V⟩1198781| 119860 997888rarr ⟨

119906

V⟩ isin 119875

119879

(48)

Then 119871(119866) = 119871lowast1

Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism

Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ

lowast be a finitesubstitution (homomorphism)Define119866 = (119873

1 1198791 1198781 119875 1205881)

where 119871(119866) = 119904(1198711) Without loss of generality we assume

that 1198661is in the 1-normal form Then 119875

1can contain

production rules of the forms

119860 997888rarr ⟨119886

119887⟩119861

119860 997888rarr ⟨119886

119887⟩

119860 997888rarr ⟨119886

120582⟩119861

119860 997888rarr ⟨119886

120582⟩

119860 997888rarr ⟨120582

119887⟩119861

119860 997888rarr ⟨120582

119887⟩

119860 997888rarr ⟨120582

120582⟩

(49)

We define 119875 as the set of production rules of the forms

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩119861

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩

119860 997888rarr ⟨119904 (119886)

120582⟩119861

119860 997888rarr ⟨119904 (119886)

120582⟩

119860 997888rarr ⟨120582

119904 (119887)⟩119861

Applied Computational Intelligence and Soft Computing 9

119860 997888rarr ⟨120582

119904 (119887)⟩

119860 997888rarr ⟨120582

120582⟩

(50)

Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)

Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage

Proof We show that 1198711198771isin WKREG Define 119866 = (119873

1cup

119878 1198791 119878 119875 120588

1) generating the language 119871119877

1 where 119878 is a new

nonterminal and 119875 consists of production rules defined asfollows

(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751

(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751

The results obtained from the lemmas above are summa-rized in the following theorem

Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image

This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies

52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN

Let 1198711 1198712isinWKLIN and

1198661= (1198731 119879 1198781 1198751 120588)

1198662= (1198732 119879 1198782 1198752 120588)

(51)

be Watson-Crick linear grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 23 (union) 119882119870119871119868119873 is closed under union

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878

with 119878 notin 1198731cup 1198732and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (52)

Then 119871(119866) = 1198711cup 1198712

Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKLIN Let

ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality

we assume that 1198661is in the 1-normal form For each rule

119903 119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

⟩ isin 1198751 (53)

where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩119861⟨

ℎ (1199062)

ℎ (V2)⟩ (54)

in 119875In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism

It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation

53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin

WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be

Watson-Crick context-free grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 26 (union) 119882119870119862119865 is closed under union operation

Proof Define the WK context-free grammar 119866 =

(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873

1cap 1198732

and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (55)

Then it is obvious that 119871(119866) = 1198711cup 1198712

Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation

Proof Define 119866 = (1198731cup 1198732cup 119878 119879

1cup 1198792 119878 119875 120588) where 119878 notin

(1198731cup 1198732) and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

11198782 (56)

Then 119871(119866) isinWKCF

Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation

Proof Define 119866 = (1198731cup 119878 119879

1 119878 119875 120588) isinWKCF setting

119875 = 1198751cup 119878 997888rarr 119878

1119878 | 120582 (57)

Then it is not difficult to see that 119871(119866) isinWKCF

10 Applied Computational Intelligence and Soft Computing

Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKCF Let 119904

119879lowastrarr Σlowast be a homomorphism 119875

1contains production rules

of the form

119903 119860

997888rarr ⟨1199061

V1

⟩1198611⟨1199062

V2

⟩1198612sdot sdot sdot ⟨

119906119896

V119896

⟩119861119896⟨119906119896+1

V119896+1

(58)

where 119896 ge 0For each rule 119903 isin 119875 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩1198611⟨ℎ (1199062)

ℎ (V2)⟩1198612

sdot sdot sdot⟨ℎ (119906119896)

ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)

ℎ (V119896+1)⟩

(59)

where 119896 ge 0In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

With the lemmas provided above in this subsection thenext theorem follows

Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism

Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven

6 Applications of Watson-Crick Grammars

In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures

61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems

The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that

the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]

In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]

Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888

119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)

In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as

(119886 119905 119888 119892lowast

119888119905119892 119886 119905 119888 119892lowast

)lowast

(61)

We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language

Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions

119878 997888rarr ⟨119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119888

119892⟩119860

119860 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119905

119886⟩119861

119861 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩119862

119862

997888rarr ⟨119886

119905⟩119862 | ⟨

119905

119886⟩119862 | ⟨

119892

119888⟩119862 | ⟨

119888

119892⟩119860 | ⟨

120582

120582⟩

(62)

For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows

119878 997904rArrlowast⟨119909

119910⟩119878 997904rArr ⟨

119909119888

119910119892⟩119860 997904rArr ⟨

119909119888119905

119910119892119886⟩119861

997904rArr ⟨119909119888119905119892

119910119892119886119888⟩119862

997904rArrlowast⟨119909119888119905119892119909

119910119892119886119888119910⟩119860 997904rArr ⟨

119909119888119905119892119909119888

119910119892119886119888119910119892⟩119860

997904rArr ⟨119909119888119905119892119909119888119905

119910119892119886119888119910119892119886⟩119861

997904rArr ⟨119909119888119905119892119909119888119905119892

119910119892119886119888119910119892119886119888⟩119862997904rArr

lowast⟨119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910⟩119862

997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]

(63)

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

Applied Computational Intelligence and Soft Computing 5

Lemma 8 For every WK linear grammar 119866 there exists anequivalent WK linear grammar 1198661015840 in the 1-normal form

Proof Let 119866 = (119873 119879 120588 119878 119875) be a WK linear grammar Let

119860 997888rarr ⟨

11988611sdot sdot sdot 11988611198981

11988711sdot sdot sdot 11988711198991

⟩119861⟨

11988621198982

sdot sdot sdot 11988621

11988721198992

sdot sdot sdot 11988721

⟩ (22)

be a production in 119875 where 1198981gt 1 119898

2gt 1 119899

1gt 1 or

1198992gt 1 Without loss of generality we assume that 119898

1ge 1198991

and1198982ge 1198992Then we define the following sequence of right-

linear and left-linear production rules

119860 997888rarr ⟨11988611

11988711

⟩119860119903

11

119860119903

11997888rarr ⟨

11988612

11988712

⟩119860119903

12

119860119903

11198991minus1997888rarr ⟨

11988611198991

11988711198991

⟩119860119903

11198991

119860119903

11198991

997888rarr ⟨

11988611198991+1

120582⟩119860119903

11198991+1

119860119903

11198981minus1997888rarr ⟨

11988611198981

120582⟩119860119903

11198981

119860119903

11198981

997888rarr 119860119903

21⟨11988621

11988721

119860119903

21997888rarr 119860

119903

22⟨11988622

11988722

119860119903

11198992minus1997888rarr 119860

119903

21198992

11988621198992

11988721198992

119860119903

21198992

997888rarr 119860119903

21198992+1⟨

11988621198992+1

120582⟩

119860119903

21198982minus1997888rarr ⟨

11988621198982

120582⟩

(23)

where 1198601199031119895 1 le 119895 le 119898

1 and 119860119903

2119895 1 le 119895 le 119898

2minus 1 are new

nonterminals

Let

119903 119860 997888rarr ⟨11988611198862sdot sdot sdot 119886119898

11988711198872sdot sdot sdot 119887119899

⟩ isin 119875 (24)

where 119898 gt 1 or 119899 gt 1 Without loss of generality we assumethat 119898 ge 119899 Then we define the following sequence of right-linear production rules

119860 997888rarr ⟨1198861

1198871

⟩119860119903

1

119860119903

1997888rarr ⟨

1198862

1198872

⟩119860119903

2

119860119903

119899minus1997888rarr ⟨

119886119899

119887119899

⟩119860119903

119899

119860119903

119899997888rarr ⟨

119886119899+1

120582⟩119860119903

119899+1

119860119903

119898minus1997888rarr ⟨

119886119898

120582⟩

(25)

where 119860119903119894 1 le 119894 le 119898 minus 1 are new nonterminals

We construct a WK linear grammar 1198661015840 = (119873 cup

1198731015840 119879 120588 119878 119875 cup 119875

1015840) where 1198751015840 consists of productions defined

above for each 119860 rarr ⟨1199061V1⟩119861⟨1199062V2⟩ isin 119875 with |119906

1| gt 1

|V1| gt 1 |119906

2| gt 1 or |V

2| gt 1 and 119860 rarr ⟨119906V⟩ isin 119875 with

|119906| gt 1 or |V| gt 1 Then it is not difficult to see that in everyderivation productions in the form of (22) and (24) in 119866 canbe replaced by the sequences of productions (23) and (25) in1198661015840 and vice versa Thus 119871(119866) = 119871(1198661015840)

42TheGenerative Power The following results immediatelyfollow from the definition of WK grammars

Lemma 9 The following inclusions hold

119882119870119877119864119866 sube 119882119870119871119868119873

119871119868119873 sube 119882119870119871119868119873

119862119865 sube 119882119870119862119865

(26)

Next we show that WK grammars can generate non-context-free languages

119886119899119888119899119887119899 119899 ge 1

119886119899119887119898119888119899119889119898 119899 119898 ge 1

119908119888119908 119908 isin 119886 119887lowast

(27)

6 Applied Computational Intelligence and Soft Computing

Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)

119875 119878) be a WK linear grammar where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119887

120582⟩

119878 997888rarr ⟨119886

120582⟩119860⟨

119887

120582⟩

119860 997888rarr ⟨119888

119886⟩119860

119860 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

120582⟩

(28)

In general we have the derivation

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119887119899minus1

120582⟩

997904rArr ⟨119886119899

120582⟩119860⟨

119887119899

120582⟩

997904rArrlowast⟨119886119899119888119899

119886119899⟩119860⟨

119887119899

120582⟩

997904rArr ⟨119886119899119888119899

119886119899119888⟩119861⟨

119887119899

119887⟩

997904rArrlowast⟨119886119899119888119899

119886119899119888119899⟩119861⟨

119887119899

119887119899⟩

997904rArr [119886119899119888119899119887119899

119886119899119888119899119887119899]

(29)

Thus 1198661generates the language 119871(119866

1) = 119886

119899119888119899119887119899 119899 ge 1 isin

CS minus CF

Example 11 Let

1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889

(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)

(30)

be a WK regular grammar and 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119886

120582⟩119860

119860 997888rarr ⟨119887

120582⟩119860 | ⟨

119887

120582⟩119861

119861 997888rarr ⟨119888

119886⟩119861 | ⟨

119888

119886⟩119862

119862 997888rarr ⟨119889

119887⟩119862 | ⟨

119889

119887⟩119863

119863 997888rarr ⟨120582

119888⟩119863 | ⟨

120582

119889⟩119863 | ⟨

120582

120582⟩

(31)

Then we have the following derivation for 119899119898 ge 1

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878 997904rArr ⟨

119886119899

120582⟩119860

997904rArrlowast⟨119886119899119887119898minus1

120582⟩119860 997904rArr ⟨

119886119899119887119898

120582⟩119861

997904rArrlowast⟨119886119899119887119898119888119899minus1

119886119899minus1

⟩119861 997904rArr ⟨119886119899119887119898119888119899

119886119899⟩119862

997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1

119886119899119887119898minus1

⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898

119886119899119887119898⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898⟩119863

997904rArr [119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898]

(32)

Hence 119871(1198662) = 119886

119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF

Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)

119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119887

120582⟩119878 | ⟨

119888

120582⟩119860

119860 997888rarr ⟨119886

119886⟩119860 | ⟨

119887

119887⟩119860 | ⟨

120582

119888⟩119861

119861 997888rarr ⟨120582

119886⟩119861 | ⟨

120582

119887⟩119861 | ⟨

120582

120582⟩

(33)

By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861

continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively

119878 997904rArrlowast⟨119908

120582⟩119878 997904rArr ⟨

119908119888

120582⟩119860

997904rArrlowast⟨119908119888119908

119908⟩119860 997904rArr ⟨

119908119888119908

119908119888⟩119861

997904rArrlowast⟨119908119888119908

119908119888119908⟩119861 997904rArr [

119908119888119908

119908119888119908]

(34)

Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887

lowast isin CS minus CF

The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12

Applied Computational Intelligence and Soft Computing 7

Theorem 13 The following inclusions hold

119871119868119873 ⊊ 119882119870119871119868119873

119882119870119877119864119866 minus 119862119865 = 0

119882119870119871119868119873 minus 119862119865 = 0

(35)

The following example shows that some WK linearlanguages cannot be generated by WK regular grammars

Lemma 14 The following language is not a WK regularlanguage

1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899

isin 119882119870119871119868119873 minus119882119870119877119864119866

(36)

Proof The language1198711can be generated by the followingWK

linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)

where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119886

119886⟩ | ⟨

119886

120582⟩119860⟨

119886

119886⟩

119860 997888rarr ⟨119887119887

119886⟩119860 | ⟨

119887119887119887

119886⟩119860 | ⟨

120582

119887⟩119861

119861 997888rarr ⟨120582

119887⟩119861 | ⟨

120582

120582⟩

(37)

It is not difficult to see that

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119886119899minus1

119886119899minus1⟩ 997904rArr ⟨

119886119899

120582⟩119860⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899⟩119860⟨

119886119899

119886119899⟩ 997904rArr ⟨

119886119899119887119898

119886119899119887⟩119861⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899119887119898⟩119861⟨

a119899

119886119899⟩ 997904rArr [

119886119899119887119898119886119899

119886119899119887119898119886119899]

(38)

where 2119899 le 119898 le 3119899Next we show that 119871

1notinWKREG

We suppose by contradiction that1198711can be generated by

a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and

V isin ⟨119886

120582⟩ ⟨

120582

119886⟩ ⟨

119886

119886⟩ ⟨

119887

120582⟩ ⟨

120582

119887⟩ ⟨

119887

119887⟩ ⟨

119886

119887⟩

⟨119887

119887⟩ (119873 cup 120582)

(39)

Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the

double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840

Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated

in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations

119878 997904rArrlowast⟨119886119903119887

119886119896⟩

or 119878 997904rArrlowast⟨119886119903119887

119886119903119887⟩

(40)

where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩ 119905 le 119904 (41)

Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩997904rArr

lowast⟨119886119903119887119904119886119894

119886119903119887119904119886119895⟩ (42)

and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations

Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos

119878 997904rArrlowast⟨119886119903119887119897119886119903

119886119903⟩ (43)

In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules

Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following

Corollary 15 The following holds

119871119868119873 minus119882119870119877119864119866 = 0 (44)

43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem

Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions

5 Closure Properties

In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how

8 Applied Computational Intelligence and Soft Computing

RE

WKCF

WKLIN

WKREG

REG

CS

CF

LIN

Figure 4 The hierarchy of WK and Chomsky language families

WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars

51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG

and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be Watson-

Crick regular grammars generating languages 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cap1198732= 0 and 119878

1and

1198782do not appear on the right-hand side of any production

rule

Lemma 17 (union) 119882119870119877119864119866 is closed under union

Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878

with 119878 notin 1198731cup 1198732 and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (45)

Then it is not difficult to see that 119871(119866) = 1198711cup 1198712

Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and

119878 = 1198781 We define

119875 = 1198752cup (1198751minus 119860 997888rarr ⟨

119906

V⟩ isin 119875

1)

cup 119860 997888rarr ⟨119906

V⟩1198782| 119860 997888rarr ⟨

119906

V⟩ isin 119875

1

(46)

Then it is obvious that 119871(119866) = 1198711sdot 1198712

Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation

Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)

with

1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)

where

119875119879= 119860 997888rarr ⟨

119906

V⟩ isin 119875 ⟨

119906

V⟩ isin ⟨

119879lowast

119879lowast⟩

119875119878= 119860 997888rarr ⟨

119906

V⟩1198781| 119860 997888rarr ⟨

119906

V⟩ isin 119875

119879

(48)

Then 119871(119866) = 119871lowast1

Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism

Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ

lowast be a finitesubstitution (homomorphism)Define119866 = (119873

1 1198791 1198781 119875 1205881)

where 119871(119866) = 119904(1198711) Without loss of generality we assume

that 1198661is in the 1-normal form Then 119875

1can contain

production rules of the forms

119860 997888rarr ⟨119886

119887⟩119861

119860 997888rarr ⟨119886

119887⟩

119860 997888rarr ⟨119886

120582⟩119861

119860 997888rarr ⟨119886

120582⟩

119860 997888rarr ⟨120582

119887⟩119861

119860 997888rarr ⟨120582

119887⟩

119860 997888rarr ⟨120582

120582⟩

(49)

We define 119875 as the set of production rules of the forms

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩119861

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩

119860 997888rarr ⟨119904 (119886)

120582⟩119861

119860 997888rarr ⟨119904 (119886)

120582⟩

119860 997888rarr ⟨120582

119904 (119887)⟩119861

Applied Computational Intelligence and Soft Computing 9

119860 997888rarr ⟨120582

119904 (119887)⟩

119860 997888rarr ⟨120582

120582⟩

(50)

Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)

Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage

Proof We show that 1198711198771isin WKREG Define 119866 = (119873

1cup

119878 1198791 119878 119875 120588

1) generating the language 119871119877

1 where 119878 is a new

nonterminal and 119875 consists of production rules defined asfollows

(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751

(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751

The results obtained from the lemmas above are summa-rized in the following theorem

Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image

This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies

52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN

Let 1198711 1198712isinWKLIN and

1198661= (1198731 119879 1198781 1198751 120588)

1198662= (1198732 119879 1198782 1198752 120588)

(51)

be Watson-Crick linear grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 23 (union) 119882119870119871119868119873 is closed under union

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878

with 119878 notin 1198731cup 1198732and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (52)

Then 119871(119866) = 1198711cup 1198712

Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKLIN Let

ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality

we assume that 1198661is in the 1-normal form For each rule

119903 119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

⟩ isin 1198751 (53)

where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩119861⟨

ℎ (1199062)

ℎ (V2)⟩ (54)

in 119875In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism

It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation

53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin

WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be

Watson-Crick context-free grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 26 (union) 119882119870119862119865 is closed under union operation

Proof Define the WK context-free grammar 119866 =

(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873

1cap 1198732

and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (55)

Then it is obvious that 119871(119866) = 1198711cup 1198712

Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation

Proof Define 119866 = (1198731cup 1198732cup 119878 119879

1cup 1198792 119878 119875 120588) where 119878 notin

(1198731cup 1198732) and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

11198782 (56)

Then 119871(119866) isinWKCF

Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation

Proof Define 119866 = (1198731cup 119878 119879

1 119878 119875 120588) isinWKCF setting

119875 = 1198751cup 119878 997888rarr 119878

1119878 | 120582 (57)

Then it is not difficult to see that 119871(119866) isinWKCF

10 Applied Computational Intelligence and Soft Computing

Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKCF Let 119904

119879lowastrarr Σlowast be a homomorphism 119875

1contains production rules

of the form

119903 119860

997888rarr ⟨1199061

V1

⟩1198611⟨1199062

V2

⟩1198612sdot sdot sdot ⟨

119906119896

V119896

⟩119861119896⟨119906119896+1

V119896+1

(58)

where 119896 ge 0For each rule 119903 isin 119875 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩1198611⟨ℎ (1199062)

ℎ (V2)⟩1198612

sdot sdot sdot⟨ℎ (119906119896)

ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)

ℎ (V119896+1)⟩

(59)

where 119896 ge 0In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

With the lemmas provided above in this subsection thenext theorem follows

Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism

Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven

6 Applications of Watson-Crick Grammars

In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures

61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems

The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that

the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]

In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]

Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888

119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)

In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as

(119886 119905 119888 119892lowast

119888119905119892 119886 119905 119888 119892lowast

)lowast

(61)

We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language

Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions

119878 997888rarr ⟨119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119888

119892⟩119860

119860 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119905

119886⟩119861

119861 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩119862

119862

997888rarr ⟨119886

119905⟩119862 | ⟨

119905

119886⟩119862 | ⟨

119892

119888⟩119862 | ⟨

119888

119892⟩119860 | ⟨

120582

120582⟩

(62)

For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows

119878 997904rArrlowast⟨119909

119910⟩119878 997904rArr ⟨

119909119888

119910119892⟩119860 997904rArr ⟨

119909119888119905

119910119892119886⟩119861

997904rArr ⟨119909119888119905119892

119910119892119886119888⟩119862

997904rArrlowast⟨119909119888119905119892119909

119910119892119886119888119910⟩119860 997904rArr ⟨

119909119888119905119892119909119888

119910119892119886119888119910119892⟩119860

997904rArr ⟨119909119888119905119892119909119888119905

119910119892119886119888119910119892119886⟩119861

997904rArr ⟨119909119888119905119892119909119888119905119892

119910119892119886119888119910119892119886119888⟩119862997904rArr

lowast⟨119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910⟩119862

997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]

(63)

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 6: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

6 Applied Computational Intelligence and Soft Computing

Example 10 Let 1198661= (119878 119860 119861 119886 119887 119888 (119886 119886) (119887 119887) (119888 119888)

119875 119878) be a WK linear grammar where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119887

120582⟩

119878 997888rarr ⟨119886

120582⟩119860⟨

119887

120582⟩

119860 997888rarr ⟨119888

119886⟩119860

119860 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

119888⟩119861⟨

120582

119887⟩

119861 997888rarr ⟨120582

120582⟩

(28)

In general we have the derivation

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119887119899minus1

120582⟩

997904rArr ⟨119886119899

120582⟩119860⟨

119887119899

120582⟩

997904rArrlowast⟨119886119899119888119899

119886119899⟩119860⟨

119887119899

120582⟩

997904rArr ⟨119886119899119888119899

119886119899119888⟩119861⟨

119887119899

119887⟩

997904rArrlowast⟨119886119899119888119899

119886119899119888119899⟩119861⟨

119887119899

119887119899⟩

997904rArr [119886119899119888119899119887119899

119886119899119888119899119887119899]

(29)

Thus 1198661generates the language 119871(119866

1) = 119886

119899119888119899119887119899 119899 ge 1 isin

CS minus CF

Example 11 Let

1198662= (119878 119860 119861 119862119863 119886 119887 119888 119889

(119886 119886) (119887 119887) (119888 119888) (119889 119889) 119875 119878)

(30)

be a WK regular grammar and 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119886

120582⟩119860

119860 997888rarr ⟨119887

120582⟩119860 | ⟨

119887

120582⟩119861

119861 997888rarr ⟨119888

119886⟩119861 | ⟨

119888

119886⟩119862

119862 997888rarr ⟨119889

119887⟩119862 | ⟨

119889

119887⟩119863

119863 997888rarr ⟨120582

119888⟩119863 | ⟨

120582

119889⟩119863 | ⟨

120582

120582⟩

(31)

Then we have the following derivation for 119899119898 ge 1

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878 997904rArr ⟨

119886119899

120582⟩119860

997904rArrlowast⟨119886119899119887119898minus1

120582⟩119860 997904rArr ⟨

119886119899119887119898

120582⟩119861

997904rArrlowast⟨119886119899119887119898119888119899minus1

119886119899minus1

⟩119861 997904rArr ⟨119886119899119887119898119888119899

119886119899⟩119862

997904rArrlowast⟨119886119899119887119898119888119899119889119898minus1

119886119899119887119898minus1

⟩119862 997904rArr ⟨119886119899119887119898119888119899119889119898

119886119899119887119898⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899⟩119863

997904rArrlowast⟨119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898⟩119863

997904rArr [119886119899119887119898119888119899119889119898

119886119899119887119898119888119899119889119898]

(32)

Hence 119871(1198662) = 119886

119899119887119898119888119899119889119898 119899 119898 ge 1 isin CS minus CF

Example 12 Let 1198663= (119878 119860 119861 119862 119886 119887 119888 (119886 119886) (119887 119887)

119875 119878) be a WK linear grammar with 119875 consisting of thefollowing rules

119878 997888rarr ⟨119886

120582⟩119878 | ⟨

119887

120582⟩119878 | ⟨

119888

120582⟩119860

119860 997888rarr ⟨119886

119886⟩119860 | ⟨

119887

119887⟩119860 | ⟨

120582

119888⟩119861

119861 997888rarr ⟨120582

119886⟩119861 | ⟨

120582

119887⟩119861 | ⟨

120582

120582⟩

(33)

By rules 119878 rarr ⟨119886120582⟩ and 119878 rarr ⟨119887120582⟩ we obtain a sententialform ⟨119908120582⟩119878 where 119908 isin 119886 119887lowast The derivation is continuedby only possible rule 119878 rarr ⟨119888120582⟩119860 and we have ⟨119908119888120582⟩119860Further we can only apply rules 119860 rarr ⟨119886119886⟩119860 and 119860 rarr⟨119887119887⟩119860 By the symmetric relation 120588 the derivation resultsin ⟨119908119888119908119908⟩119860 Then we can only apply 119860 rarr ⟨120582119888⟩119861

continuing with rules 119861 rarr ⟨120582119886⟩119861 and 119861 rarr ⟨120582119887⟩119861 andobtain ⟨119908119888119908119908119888119908⟩119861 Finally by rule 119861 rarr ⟨120582120582⟩ we get[119908119888119908119908119888119908] Illustratively

119878 997904rArrlowast⟨119908

120582⟩119878 997904rArr ⟨

119908119888

120582⟩119860

997904rArrlowast⟨119908119888119908

119908⟩119860 997904rArr ⟨

119908119888119908

119908119888⟩119861

997904rArrlowast⟨119908119888119908

119908119888119908⟩119861 997904rArr [

119908119888119908

119908119888119908]

(34)

Thus 119871(1198663) = 119908119888119908 119908 isin 119886 119887

lowast isin CS minus CF

The following theorem follows from Lemma 9 and Exam-ples 10 11 and 12

Applied Computational Intelligence and Soft Computing 7

Theorem 13 The following inclusions hold

119871119868119873 ⊊ 119882119870119871119868119873

119882119870119877119864119866 minus 119862119865 = 0

119882119870119871119868119873 minus 119862119865 = 0

(35)

The following example shows that some WK linearlanguages cannot be generated by WK regular grammars

Lemma 14 The following language is not a WK regularlanguage

1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899

isin 119882119870119871119868119873 minus119882119870119877119864119866

(36)

Proof The language1198711can be generated by the followingWK

linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)

where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119886

119886⟩ | ⟨

119886

120582⟩119860⟨

119886

119886⟩

119860 997888rarr ⟨119887119887

119886⟩119860 | ⟨

119887119887119887

119886⟩119860 | ⟨

120582

119887⟩119861

119861 997888rarr ⟨120582

119887⟩119861 | ⟨

120582

120582⟩

(37)

It is not difficult to see that

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119886119899minus1

119886119899minus1⟩ 997904rArr ⟨

119886119899

120582⟩119860⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899⟩119860⟨

119886119899

119886119899⟩ 997904rArr ⟨

119886119899119887119898

119886119899119887⟩119861⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899119887119898⟩119861⟨

a119899

119886119899⟩ 997904rArr [

119886119899119887119898119886119899

119886119899119887119898119886119899]

(38)

where 2119899 le 119898 le 3119899Next we show that 119871

1notinWKREG

We suppose by contradiction that1198711can be generated by

a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and

V isin ⟨119886

120582⟩ ⟨

120582

119886⟩ ⟨

119886

119886⟩ ⟨

119887

120582⟩ ⟨

120582

119887⟩ ⟨

119887

119887⟩ ⟨

119886

119887⟩

⟨119887

119887⟩ (119873 cup 120582)

(39)

Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the

double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840

Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated

in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations

119878 997904rArrlowast⟨119886119903119887

119886119896⟩

or 119878 997904rArrlowast⟨119886119903119887

119886119903119887⟩

(40)

where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩ 119905 le 119904 (41)

Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩997904rArr

lowast⟨119886119903119887119904119886119894

119886119903119887119904119886119895⟩ (42)

and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations

Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos

119878 997904rArrlowast⟨119886119903119887119897119886119903

119886119903⟩ (43)

In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules

Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following

Corollary 15 The following holds

119871119868119873 minus119882119870119877119864119866 = 0 (44)

43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem

Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions

5 Closure Properties

In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how

8 Applied Computational Intelligence and Soft Computing

RE

WKCF

WKLIN

WKREG

REG

CS

CF

LIN

Figure 4 The hierarchy of WK and Chomsky language families

WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars

51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG

and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be Watson-

Crick regular grammars generating languages 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cap1198732= 0 and 119878

1and

1198782do not appear on the right-hand side of any production

rule

Lemma 17 (union) 119882119870119877119864119866 is closed under union

Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878

with 119878 notin 1198731cup 1198732 and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (45)

Then it is not difficult to see that 119871(119866) = 1198711cup 1198712

Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and

119878 = 1198781 We define

119875 = 1198752cup (1198751minus 119860 997888rarr ⟨

119906

V⟩ isin 119875

1)

cup 119860 997888rarr ⟨119906

V⟩1198782| 119860 997888rarr ⟨

119906

V⟩ isin 119875

1

(46)

Then it is obvious that 119871(119866) = 1198711sdot 1198712

Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation

Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)

with

1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)

where

119875119879= 119860 997888rarr ⟨

119906

V⟩ isin 119875 ⟨

119906

V⟩ isin ⟨

119879lowast

119879lowast⟩

119875119878= 119860 997888rarr ⟨

119906

V⟩1198781| 119860 997888rarr ⟨

119906

V⟩ isin 119875

119879

(48)

Then 119871(119866) = 119871lowast1

Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism

Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ

lowast be a finitesubstitution (homomorphism)Define119866 = (119873

1 1198791 1198781 119875 1205881)

where 119871(119866) = 119904(1198711) Without loss of generality we assume

that 1198661is in the 1-normal form Then 119875

1can contain

production rules of the forms

119860 997888rarr ⟨119886

119887⟩119861

119860 997888rarr ⟨119886

119887⟩

119860 997888rarr ⟨119886

120582⟩119861

119860 997888rarr ⟨119886

120582⟩

119860 997888rarr ⟨120582

119887⟩119861

119860 997888rarr ⟨120582

119887⟩

119860 997888rarr ⟨120582

120582⟩

(49)

We define 119875 as the set of production rules of the forms

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩119861

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩

119860 997888rarr ⟨119904 (119886)

120582⟩119861

119860 997888rarr ⟨119904 (119886)

120582⟩

119860 997888rarr ⟨120582

119904 (119887)⟩119861

Applied Computational Intelligence and Soft Computing 9

119860 997888rarr ⟨120582

119904 (119887)⟩

119860 997888rarr ⟨120582

120582⟩

(50)

Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)

Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage

Proof We show that 1198711198771isin WKREG Define 119866 = (119873

1cup

119878 1198791 119878 119875 120588

1) generating the language 119871119877

1 where 119878 is a new

nonterminal and 119875 consists of production rules defined asfollows

(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751

(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751

The results obtained from the lemmas above are summa-rized in the following theorem

Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image

This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies

52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN

Let 1198711 1198712isinWKLIN and

1198661= (1198731 119879 1198781 1198751 120588)

1198662= (1198732 119879 1198782 1198752 120588)

(51)

be Watson-Crick linear grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 23 (union) 119882119870119871119868119873 is closed under union

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878

with 119878 notin 1198731cup 1198732and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (52)

Then 119871(119866) = 1198711cup 1198712

Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKLIN Let

ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality

we assume that 1198661is in the 1-normal form For each rule

119903 119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

⟩ isin 1198751 (53)

where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩119861⟨

ℎ (1199062)

ℎ (V2)⟩ (54)

in 119875In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism

It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation

53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin

WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be

Watson-Crick context-free grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 26 (union) 119882119870119862119865 is closed under union operation

Proof Define the WK context-free grammar 119866 =

(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873

1cap 1198732

and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (55)

Then it is obvious that 119871(119866) = 1198711cup 1198712

Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation

Proof Define 119866 = (1198731cup 1198732cup 119878 119879

1cup 1198792 119878 119875 120588) where 119878 notin

(1198731cup 1198732) and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

11198782 (56)

Then 119871(119866) isinWKCF

Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation

Proof Define 119866 = (1198731cup 119878 119879

1 119878 119875 120588) isinWKCF setting

119875 = 1198751cup 119878 997888rarr 119878

1119878 | 120582 (57)

Then it is not difficult to see that 119871(119866) isinWKCF

10 Applied Computational Intelligence and Soft Computing

Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKCF Let 119904

119879lowastrarr Σlowast be a homomorphism 119875

1contains production rules

of the form

119903 119860

997888rarr ⟨1199061

V1

⟩1198611⟨1199062

V2

⟩1198612sdot sdot sdot ⟨

119906119896

V119896

⟩119861119896⟨119906119896+1

V119896+1

(58)

where 119896 ge 0For each rule 119903 isin 119875 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩1198611⟨ℎ (1199062)

ℎ (V2)⟩1198612

sdot sdot sdot⟨ℎ (119906119896)

ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)

ℎ (V119896+1)⟩

(59)

where 119896 ge 0In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

With the lemmas provided above in this subsection thenext theorem follows

Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism

Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven

6 Applications of Watson-Crick Grammars

In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures

61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems

The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that

the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]

In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]

Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888

119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)

In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as

(119886 119905 119888 119892lowast

119888119905119892 119886 119905 119888 119892lowast

)lowast

(61)

We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language

Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions

119878 997888rarr ⟨119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119888

119892⟩119860

119860 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119905

119886⟩119861

119861 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩119862

119862

997888rarr ⟨119886

119905⟩119862 | ⟨

119905

119886⟩119862 | ⟨

119892

119888⟩119862 | ⟨

119888

119892⟩119860 | ⟨

120582

120582⟩

(62)

For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows

119878 997904rArrlowast⟨119909

119910⟩119878 997904rArr ⟨

119909119888

119910119892⟩119860 997904rArr ⟨

119909119888119905

119910119892119886⟩119861

997904rArr ⟨119909119888119905119892

119910119892119886119888⟩119862

997904rArrlowast⟨119909119888119905119892119909

119910119892119886119888119910⟩119860 997904rArr ⟨

119909119888119905119892119909119888

119910119892119886119888119910119892⟩119860

997904rArr ⟨119909119888119905119892119909119888119905

119910119892119886119888119910119892119886⟩119861

997904rArr ⟨119909119888119905119892119909119888119905119892

119910119892119886119888119910119892119886119888⟩119862997904rArr

lowast⟨119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910⟩119862

997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]

(63)

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

Applied Computational Intelligence and Soft Computing 7

Theorem 13 The following inclusions hold

119871119868119873 ⊊ 119882119870119871119868119873

119882119870119877119864119866 minus 119862119865 = 0

119882119870119871119868119873 minus 119862119865 = 0

(35)

The following example shows that some WK linearlanguages cannot be generated by WK regular grammars

Lemma 14 The following language is not a WK regularlanguage

1198711= 119886119899119887119898119886119899 2119899 le 119898 le 3119899

isin 119882119870119871119868119873 minus119882119870119877119864119866

(36)

Proof The language1198711can be generated by the followingWK

linear grammar 1198664= (119878 119860 119861 119886 119887 (119886 119886) (119887 119887) 119878 119875)

where 119875 consists of the rules

119878 997888rarr ⟨119886

120582⟩119878⟨

119886

119886⟩ | ⟨

119886

120582⟩119860⟨

119886

119886⟩

119860 997888rarr ⟨119887119887

119886⟩119860 | ⟨

119887119887119887

119886⟩119860 | ⟨

120582

119887⟩119861

119861 997888rarr ⟨120582

119887⟩119861 | ⟨

120582

120582⟩

(37)

It is not difficult to see that

119878 997904rArrlowast⟨119886119899minus1

120582⟩119878⟨

119886119899minus1

119886119899minus1⟩ 997904rArr ⟨

119886119899

120582⟩119860⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899⟩119860⟨

119886119899

119886119899⟩ 997904rArr ⟨

119886119899119887119898

119886119899119887⟩119861⟨

119886119899

119886119899⟩

997904rArrlowast⟨119886119899119887119898

119886119899119887119898⟩119861⟨

a119899

119886119899⟩ 997904rArr [

119886119899119887119898119886119899

119886119899119887119898119886119899]

(38)

where 2119899 le 119898 le 3119899Next we show that 119871

1notinWKREG

We suppose by contradiction that1198711can be generated by

a WK regular grammar 1198661015840 = (119873 119886 119887 120588 119878 119875) Without lossof generality we assume that 1198661015840 is in 1-normal form Thenfor each rule 119906 rarr V in 119875 we have 119906 isin 119873 and

V isin ⟨119886

120582⟩ ⟨

120582

119886⟩ ⟨

119886

119886⟩ ⟨

119887

120582⟩ ⟨

120582

119887⟩ ⟨

119887

119887⟩ ⟨

119886

119887⟩

⟨119887

119887⟩ (119873 cup 120582)

(39)

Let 119908 = 119886119903119887119904119886119903 be a string in 1198711such that 119903 gt |119875| Then the

double-stranded sequence [119886119903119887119904119886119903119886119903119887119904119886119903] is generated by thegrammar 1198661015840

Case 1 In any derivation for this string first 119887 can occur inthe upper (or lower) strand if 119886119903 has already been generated

in the upper (or lower) strand Thus we obtain two possiblesuccessful derivations

119878 997904rArrlowast⟨119886119903119887

119886119896⟩

or 119878 997904rArrlowast⟨119886119903119887

119886119903119887⟩

(40)

where 119896 le 119903 In the latter derivation in (40) we cannot controlthe number of occurrences of 119887 that is the derivation maynot be successful In the former derivation in (40) using thesecond strand we can generate 119887119904

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩ 119905 le 119904 (41)

Equation (41) is continued by generating 119886rsquos in the first strandand we can use the second strand to control their numberConsider

119878 997904rArrlowast⟨119886119903119887

119886119896⟩997904rArr

lowast⟨119886119903119887119904

119886119903119887119905⟩997904rArr

lowast⟨119886119903119887119904119886119894

119886119903119887119904119886119895⟩ (42)

and 119894 is related to 119904 Since 2119903 le 119904 le 3119903 generally 119894 is not thesame as 119903 for all derivations

Case 2We can control the number of 119886rsquos after 119887rsquos by using thesecond strand for 119886rsquos before 119887rsquos In this case the number of 119887rsquoscannot be related to the number of 119886rsquos

119878 997904rArrlowast⟨119886119903119887119897119886119903

119886119903⟩ (43)

In both cases we cannot control the number of 119887rsquos and thenumber of 119886rsquos after 119887rsquos at the same time using WK regularrules

Since strings 119886119899119887119898119886119899 are palindrome strings for even119898rsquosthe language 119908119908119877 119908 isin 119886 119887lowast is not in WKREG that iswe have the following

Corollary 15 The following holds

119871119868119873 minus119882119870119877119864119866 = 0 (44)

43 Hierarchy of the Families of Watson-Crick LanguagesCombining the results above we obtain the following theo-rem

Theorem 16 The relations in Figure 4 hold the dotted linesdenote incomparability of the language families and the arrowsdenote proper inclusions of the lower families into the upperfamilies while the dotted arrows denote inclusions

5 Closure Properties

In this section we establish results regarding the closureproperties of WK grammars The families of WK languagesare shown to be higher in the hierarchy than their respectiveChomsky language families thus it is interesting to see how

8 Applied Computational Intelligence and Soft Computing

RE

WKCF

WKLIN

WKREG

REG

CS

CF

LIN

Figure 4 The hierarchy of WK and Chomsky language families

WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars

51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG

and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be Watson-

Crick regular grammars generating languages 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cap1198732= 0 and 119878

1and

1198782do not appear on the right-hand side of any production

rule

Lemma 17 (union) 119882119870119877119864119866 is closed under union

Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878

with 119878 notin 1198731cup 1198732 and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (45)

Then it is not difficult to see that 119871(119866) = 1198711cup 1198712

Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and

119878 = 1198781 We define

119875 = 1198752cup (1198751minus 119860 997888rarr ⟨

119906

V⟩ isin 119875

1)

cup 119860 997888rarr ⟨119906

V⟩1198782| 119860 997888rarr ⟨

119906

V⟩ isin 119875

1

(46)

Then it is obvious that 119871(119866) = 1198711sdot 1198712

Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation

Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)

with

1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)

where

119875119879= 119860 997888rarr ⟨

119906

V⟩ isin 119875 ⟨

119906

V⟩ isin ⟨

119879lowast

119879lowast⟩

119875119878= 119860 997888rarr ⟨

119906

V⟩1198781| 119860 997888rarr ⟨

119906

V⟩ isin 119875

119879

(48)

Then 119871(119866) = 119871lowast1

Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism

Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ

lowast be a finitesubstitution (homomorphism)Define119866 = (119873

1 1198791 1198781 119875 1205881)

where 119871(119866) = 119904(1198711) Without loss of generality we assume

that 1198661is in the 1-normal form Then 119875

1can contain

production rules of the forms

119860 997888rarr ⟨119886

119887⟩119861

119860 997888rarr ⟨119886

119887⟩

119860 997888rarr ⟨119886

120582⟩119861

119860 997888rarr ⟨119886

120582⟩

119860 997888rarr ⟨120582

119887⟩119861

119860 997888rarr ⟨120582

119887⟩

119860 997888rarr ⟨120582

120582⟩

(49)

We define 119875 as the set of production rules of the forms

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩119861

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩

119860 997888rarr ⟨119904 (119886)

120582⟩119861

119860 997888rarr ⟨119904 (119886)

120582⟩

119860 997888rarr ⟨120582

119904 (119887)⟩119861

Applied Computational Intelligence and Soft Computing 9

119860 997888rarr ⟨120582

119904 (119887)⟩

119860 997888rarr ⟨120582

120582⟩

(50)

Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)

Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage

Proof We show that 1198711198771isin WKREG Define 119866 = (119873

1cup

119878 1198791 119878 119875 120588

1) generating the language 119871119877

1 where 119878 is a new

nonterminal and 119875 consists of production rules defined asfollows

(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751

(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751

The results obtained from the lemmas above are summa-rized in the following theorem

Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image

This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies

52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN

Let 1198711 1198712isinWKLIN and

1198661= (1198731 119879 1198781 1198751 120588)

1198662= (1198732 119879 1198782 1198752 120588)

(51)

be Watson-Crick linear grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 23 (union) 119882119870119871119868119873 is closed under union

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878

with 119878 notin 1198731cup 1198732and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (52)

Then 119871(119866) = 1198711cup 1198712

Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKLIN Let

ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality

we assume that 1198661is in the 1-normal form For each rule

119903 119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

⟩ isin 1198751 (53)

where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩119861⟨

ℎ (1199062)

ℎ (V2)⟩ (54)

in 119875In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism

It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation

53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin

WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be

Watson-Crick context-free grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 26 (union) 119882119870119862119865 is closed under union operation

Proof Define the WK context-free grammar 119866 =

(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873

1cap 1198732

and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (55)

Then it is obvious that 119871(119866) = 1198711cup 1198712

Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation

Proof Define 119866 = (1198731cup 1198732cup 119878 119879

1cup 1198792 119878 119875 120588) where 119878 notin

(1198731cup 1198732) and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

11198782 (56)

Then 119871(119866) isinWKCF

Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation

Proof Define 119866 = (1198731cup 119878 119879

1 119878 119875 120588) isinWKCF setting

119875 = 1198751cup 119878 997888rarr 119878

1119878 | 120582 (57)

Then it is not difficult to see that 119871(119866) isinWKCF

10 Applied Computational Intelligence and Soft Computing

Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKCF Let 119904

119879lowastrarr Σlowast be a homomorphism 119875

1contains production rules

of the form

119903 119860

997888rarr ⟨1199061

V1

⟩1198611⟨1199062

V2

⟩1198612sdot sdot sdot ⟨

119906119896

V119896

⟩119861119896⟨119906119896+1

V119896+1

(58)

where 119896 ge 0For each rule 119903 isin 119875 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩1198611⟨ℎ (1199062)

ℎ (V2)⟩1198612

sdot sdot sdot⟨ℎ (119906119896)

ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)

ℎ (V119896+1)⟩

(59)

where 119896 ge 0In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

With the lemmas provided above in this subsection thenext theorem follows

Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism

Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven

6 Applications of Watson-Crick Grammars

In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures

61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems

The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that

the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]

In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]

Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888

119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)

In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as

(119886 119905 119888 119892lowast

119888119905119892 119886 119905 119888 119892lowast

)lowast

(61)

We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language

Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions

119878 997888rarr ⟨119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119888

119892⟩119860

119860 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119905

119886⟩119861

119861 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩119862

119862

997888rarr ⟨119886

119905⟩119862 | ⟨

119905

119886⟩119862 | ⟨

119892

119888⟩119862 | ⟨

119888

119892⟩119860 | ⟨

120582

120582⟩

(62)

For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows

119878 997904rArrlowast⟨119909

119910⟩119878 997904rArr ⟨

119909119888

119910119892⟩119860 997904rArr ⟨

119909119888119905

119910119892119886⟩119861

997904rArr ⟨119909119888119905119892

119910119892119886119888⟩119862

997904rArrlowast⟨119909119888119905119892119909

119910119892119886119888119910⟩119860 997904rArr ⟨

119909119888119905119892119909119888

119910119892119886119888119910119892⟩119860

997904rArr ⟨119909119888119905119892119909119888119905

119910119892119886119888119910119892119886⟩119861

997904rArr ⟨119909119888119905119892119909119888119905119892

119910119892119886119888119910119892119886119888⟩119862997904rArr

lowast⟨119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910⟩119862

997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]

(63)

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

8 Applied Computational Intelligence and Soft Computing

RE

WKCF

WKLIN

WKREG

REG

CS

CF

LIN

Figure 4 The hierarchy of WK and Chomsky language families

WKgrammarswork in terms of closure properties as the onesfor the Chomsky languages Moreover researching closureproperties ofWKgrammars ensure the safety and correctnessof the results yielded when performing operations on the setsof DNA molecules generated by some WK grammars

51 Watson-Crick Regular Grammars Let 1198711 1198712isinWKREG

and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be Watson-

Crick regular grammars generating languages 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cap1198732= 0 and 119878

1and

1198782do not appear on the right-hand side of any production

rule

Lemma 17 (union) 119882119870119877119864119866 is closed under union

Proof Define119866 = (119873 119879 119878 119875 120588) by setting119873 = 1198731cup1198732cup119878

with 119878 notin 1198731cup 1198732 and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (45)

Then it is not difficult to see that 119871(119866) = 1198711cup 1198712

Lemma 18 (concatenation) 119882119870119877119864119866 is closed under concate-nation

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732and

119878 = 1198781 We define

119875 = 1198752cup (1198751minus 119860 997888rarr ⟨

119906

V⟩ isin 119875

1)

cup 119860 997888rarr ⟨119906

V⟩1198782| 119860 997888rarr ⟨

119906

V⟩ isin 119875

1

(46)

Then it is obvious that 119871(119866) = 1198711sdot 1198712

Lemma 19 (Kleene-star) 119882119870119877119864119866 is closed under Kleene-star operation

Proof Define the WK regular grammar 119866 = (1198731 1198791 119878 1198751 120588)

with

1198751= (1198751minus 119875119879) cup 119875119878cup 1198781997888rarr 120582 (47)

where

119875119879= 119860 997888rarr ⟨

119906

V⟩ isin 119875 ⟨

119906

V⟩ isin ⟨

119879lowast

119879lowast⟩

119875119878= 119860 997888rarr ⟨

119906

V⟩1198781| 119860 997888rarr ⟨

119906

V⟩ isin 119875

119879

(48)

Then 119871(119866) = 119871lowast1

Lemma 20 (finite substitution and homomorphism)119882119870119877119864119866 is closed under finite substitution and homomor-phism

Proof We show that the finite substitution (homomorphism)of 1198711is also in WKREG Let 119904 119879lowast rarr Σ

lowast be a finitesubstitution (homomorphism)Define119866 = (119873

1 1198791 1198781 119875 1205881)

where 119871(119866) = 119904(1198711) Without loss of generality we assume

that 1198661is in the 1-normal form Then 119875

1can contain

production rules of the forms

119860 997888rarr ⟨119886

119887⟩119861

119860 997888rarr ⟨119886

119887⟩

119860 997888rarr ⟨119886

120582⟩119861

119860 997888rarr ⟨119886

120582⟩

119860 997888rarr ⟨120582

119887⟩119861

119860 997888rarr ⟨120582

119887⟩

119860 997888rarr ⟨120582

120582⟩

(49)

We define 119875 as the set of production rules of the forms

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩119861

119860 997888rarr ⟨119904 (119886)

119904 (119887)⟩

119860 997888rarr ⟨119904 (119886)

120582⟩119861

119860 997888rarr ⟨119904 (119886)

120582⟩

119860 997888rarr ⟨120582

119904 (119887)⟩119861

Applied Computational Intelligence and Soft Computing 9

119860 997888rarr ⟨120582

119904 (119887)⟩

119860 997888rarr ⟨120582

120582⟩

(50)

Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)

Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage

Proof We show that 1198711198771isin WKREG Define 119866 = (119873

1cup

119878 1198791 119878 119875 120588

1) generating the language 119871119877

1 where 119878 is a new

nonterminal and 119875 consists of production rules defined asfollows

(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751

(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751

The results obtained from the lemmas above are summa-rized in the following theorem

Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image

This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies

52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN

Let 1198711 1198712isinWKLIN and

1198661= (1198731 119879 1198781 1198751 120588)

1198662= (1198732 119879 1198782 1198752 120588)

(51)

be Watson-Crick linear grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 23 (union) 119882119870119871119868119873 is closed under union

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878

with 119878 notin 1198731cup 1198732and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (52)

Then 119871(119866) = 1198711cup 1198712

Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKLIN Let

ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality

we assume that 1198661is in the 1-normal form For each rule

119903 119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

⟩ isin 1198751 (53)

where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩119861⟨

ℎ (1199062)

ℎ (V2)⟩ (54)

in 119875In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism

It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation

53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin

WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be

Watson-Crick context-free grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 26 (union) 119882119870119862119865 is closed under union operation

Proof Define the WK context-free grammar 119866 =

(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873

1cap 1198732

and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (55)

Then it is obvious that 119871(119866) = 1198711cup 1198712

Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation

Proof Define 119866 = (1198731cup 1198732cup 119878 119879

1cup 1198792 119878 119875 120588) where 119878 notin

(1198731cup 1198732) and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

11198782 (56)

Then 119871(119866) isinWKCF

Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation

Proof Define 119866 = (1198731cup 119878 119879

1 119878 119875 120588) isinWKCF setting

119875 = 1198751cup 119878 997888rarr 119878

1119878 | 120582 (57)

Then it is not difficult to see that 119871(119866) isinWKCF

10 Applied Computational Intelligence and Soft Computing

Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKCF Let 119904

119879lowastrarr Σlowast be a homomorphism 119875

1contains production rules

of the form

119903 119860

997888rarr ⟨1199061

V1

⟩1198611⟨1199062

V2

⟩1198612sdot sdot sdot ⟨

119906119896

V119896

⟩119861119896⟨119906119896+1

V119896+1

(58)

where 119896 ge 0For each rule 119903 isin 119875 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩1198611⟨ℎ (1199062)

ℎ (V2)⟩1198612

sdot sdot sdot⟨ℎ (119906119896)

ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)

ℎ (V119896+1)⟩

(59)

where 119896 ge 0In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

With the lemmas provided above in this subsection thenext theorem follows

Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism

Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven

6 Applications of Watson-Crick Grammars

In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures

61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems

The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that

the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]

In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]

Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888

119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)

In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as

(119886 119905 119888 119892lowast

119888119905119892 119886 119905 119888 119892lowast

)lowast

(61)

We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language

Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions

119878 997888rarr ⟨119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119888

119892⟩119860

119860 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119905

119886⟩119861

119861 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩119862

119862

997888rarr ⟨119886

119905⟩119862 | ⟨

119905

119886⟩119862 | ⟨

119892

119888⟩119862 | ⟨

119888

119892⟩119860 | ⟨

120582

120582⟩

(62)

For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows

119878 997904rArrlowast⟨119909

119910⟩119878 997904rArr ⟨

119909119888

119910119892⟩119860 997904rArr ⟨

119909119888119905

119910119892119886⟩119861

997904rArr ⟨119909119888119905119892

119910119892119886119888⟩119862

997904rArrlowast⟨119909119888119905119892119909

119910119892119886119888119910⟩119860 997904rArr ⟨

119909119888119905119892119909119888

119910119892119886119888119910119892⟩119860

997904rArr ⟨119909119888119905119892119909119888119905

119910119892119886119888119910119892119886⟩119861

997904rArr ⟨119909119888119905119892119909119888119905119892

119910119892119886119888119910119892119886119888⟩119862997904rArr

lowast⟨119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910⟩119862

997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]

(63)

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

Applied Computational Intelligence and Soft Computing 9

119860 997888rarr ⟨120582

119904 (119887)⟩

119860 997888rarr ⟨120582

120582⟩

(50)

Since the substitutionhomomorphism 119904 is finite 119875 is afinite set too that is 119866 is a WK regular grammar and 119871(119866) =119904(1198711)

Lemma 21 (mirror image) 119882119870119877119864119866 is closed under mirrorimage

Proof We show that 1198711198771isin WKREG Define 119866 = (119873

1cup

119878 1198791 119878 119875 120588

1) generating the language 119871119877

1 where 119878 is a new

nonterminal and 119875 consists of production rules defined asfollows

(i) 119860 rarr 119861⟨119906V⟩ where 119860 rarr ⟨119906V⟩119861 isin 1198751

(ii) 119860 rarr ⟨119906V⟩ where 119860 rarr ⟨119906V⟩ isin 1198751

The results obtained from the lemmas above are summa-rized in the following theorem

Theorem 22 The family of Watson-Crick regular languages isclosed under union concatenation Kleene-star finite substitu-tion homomorphism and mirror image

This shows thatWK regular grammars preserve almost allof the closure properties of regular grammars Other closureproperties ofWK regular grammars are left for future studies

52 Watson-Crick Linear Grammars Similar to the subsec-tion above a Watson-Crick linear grammar 119866 is constructedfor the purpose of investigating the closure properties ofWKLIN

Let 1198711 1198712isinWKLIN and

1198661= (1198731 119879 1198781 1198751 120588)

1198662= (1198732 119879 1198782 1198752 120588)

(51)

be Watson-Crick linear grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 23 (union) 119882119870119871119868119873 is closed under union

Proof Define 119866 = (119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878

with 119878 notin 1198731cup 1198732and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (52)

Then 119871(119866) = 1198711cup 1198712

Lemma 24 (homomorphism) 119882119870119871119868119873 is closed underhomomorphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKLIN Let

ℎ 119879lowastrarr Σlowast be a homomorphismWithout loss of generality

we assume that 1198661is in the 1-normal form For each rule

119903 119860 997888rarr ⟨1199061

V1

⟩119861⟨1199062

V2

⟩ isin 1198751 (53)

where 1199061 V1 1199062 V2isin 119879 cup 120582 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩119861⟨

ℎ (1199062)

ℎ (V2)⟩ (54)

in 119875In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

Theorem 25 The family of Watson-Crick linear languages isclosed under union and homomorphism

It is compelling to prove if the concatenation of two WKlinear grammars is still included in WKLIN or not Thisalso decides whether WK linear grammars are unique fromlinear grammars or not as linear grammars are not closedunder concatenation and thus not closed under Kleene-staroperation

53 Watson-Crick Context-Free Grammars Let 1198711 1198712 119871 isin

WKCF and 1198661= (1198731 119879 1198781 1198751 120588) 119866

2= (1198732 119879 1198782 1198752 120588) be

Watson-Crick context-free grammars generating 1198711and 119871

2

respectively that is 1198711= 119871(119866

1) and 119871

2= 119871(119866

2) Without

loss of generality we can assume that1198731cup 1198732= 0

Lemma 26 (union) 119882119870119862119865 is closed under union operation

Proof Define the WK context-free grammar 119866 =

(119873 119879 119878 119875 120588) where 119873 = 1198731cup 1198732cup 119878 with 119878 notin 119873

1cap 1198732

and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

1 cup 119878 997888rarr 119878

2 (55)

Then it is obvious that 119871(119866) = 1198711cup 1198712

Lemma 27 (concatenation) 119882119870119862119865 is closed under concate-nation

Proof Define 119866 = (1198731cup 1198732cup 119878 119879

1cup 1198792 119878 119875 120588) where 119878 notin

(1198731cup 1198732) and

119875 = 1198751cup 1198752cup 119878 997888rarr 119878

11198782 (56)

Then 119871(119866) isinWKCF

Lemma 28 (Kleene-star) 119882119870119862119865 is closed under Kleene-staroperation

Proof Define 119866 = (1198731cup 119878 119879

1 119878 119875 120588) isinWKCF setting

119875 = 1198751cup 119878 997888rarr 119878

1119878 | 120582 (57)

Then it is not difficult to see that 119871(119866) isinWKCF

10 Applied Computational Intelligence and Soft Computing

Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKCF Let 119904

119879lowastrarr Σlowast be a homomorphism 119875

1contains production rules

of the form

119903 119860

997888rarr ⟨1199061

V1

⟩1198611⟨1199062

V2

⟩1198612sdot sdot sdot ⟨

119906119896

V119896

⟩119861119896⟨119906119896+1

V119896+1

(58)

where 119896 ge 0For each rule 119903 isin 119875 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩1198611⟨ℎ (1199062)

ℎ (V2)⟩1198612

sdot sdot sdot⟨ℎ (119906119896)

ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)

ℎ (V119896+1)⟩

(59)

where 119896 ge 0In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

With the lemmas provided above in this subsection thenext theorem follows

Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism

Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven

6 Applications of Watson-Crick Grammars

In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures

61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems

The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that

the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]

In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]

Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888

119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)

In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as

(119886 119905 119888 119892lowast

119888119905119892 119886 119905 119888 119892lowast

)lowast

(61)

We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language

Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions

119878 997888rarr ⟨119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119888

119892⟩119860

119860 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119905

119886⟩119861

119861 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩119862

119862

997888rarr ⟨119886

119905⟩119862 | ⟨

119905

119886⟩119862 | ⟨

119892

119888⟩119862 | ⟨

119888

119892⟩119860 | ⟨

120582

120582⟩

(62)

For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows

119878 997904rArrlowast⟨119909

119910⟩119878 997904rArr ⟨

119909119888

119910119892⟩119860 997904rArr ⟨

119909119888119905

119910119892119886⟩119861

997904rArr ⟨119909119888119905119892

119910119892119886119888⟩119862

997904rArrlowast⟨119909119888119905119892119909

119910119892119886119888119910⟩119860 997904rArr ⟨

119909119888119905119892119909119888

119910119892119886119888119910119892⟩119860

997904rArr ⟨119909119888119905119892119909119888119905

119910119892119886119888119910119892119886⟩119861

997904rArr ⟨119909119888119905119892119909119888119905119892

119910119892119886119888119910119892119886119888⟩119862997904rArr

lowast⟨119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910⟩119862

997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]

(63)

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

10 Applied Computational Intelligence and Soft Computing

Lemma 29 (homomorphism) 119882119870119862119865 is closed under homo-morphism

Proof Define 119866 = (1198731 119879 1198781 119875 120588) where 119871(119866) = ℎ(119871

1) We

show that the homomorphism of 1198711is also in WKCF Let 119904

119879lowastrarr Σlowast be a homomorphism 119875

1contains production rules

of the form

119903 119860

997888rarr ⟨1199061

V1

⟩1198611⟨1199062

V2

⟩1198612sdot sdot sdot ⟨

119906119896

V119896

⟩119861119896⟨119906119896+1

V119896+1

(58)

where 119896 ge 0For each rule 119903 isin 119875 we construct

ℎ (119903) 119860 997888rarr ⟨ℎ (1199061)

ℎ (V1)⟩1198611⟨ℎ (1199062)

ℎ (V2)⟩1198612

sdot sdot sdot⟨ℎ (119906119896)

ℎ (V119896)⟩119861119896⟨ℎ (119906119896+1)

ℎ (V119896+1)⟩

(59)

where 119896 ge 0In every successful derivation of 119866

1generating [119908119908] isin

[119879119879]lowast we replace production rule 119903 isin 119875

1with the

production rule ℎ(119903) isin 119875 and obtain the string [ℎ(119908)ℎ(119908)] isin[ΣΣ]

lowast Thus ℎ(119908) isin ℎ(1198711)

With the lemmas provided above in this subsection thenext theorem follows

Theorem 30 The family of Watson-Crick context-free lan-guages is closed under union concatenation Kleene-star andhomomorphism

Closure of WKCF under complement and intersectiondepends on the generative capacity ofWKCFThenonclosureof context-free (CF) grammars for intersection was shownwith the famous example that the intersection of two CFlanguages results in a string that cannot be generated by a CFgrammars If one can provide some examples of strings thatcannot be generated byWK context-free grammars then thenonclosure of WKCF can be proven

6 Applications of Watson-Crick Grammars

In this section we consider two examples of the applicationsofWatson-Crick grammars in the analyses of DNA structuresand programming language structures

61 DNA Structure Analysis Since Watson-Crick grammarsare developed based on the structure and recombinant behav-ior of DNA molecules they can suitably be implemented inthe study of DNA related problems

The analysis of DNA strings provides useful informationfor instance the finding of a specific pattern in a DNAstring and the identification of the repeats of a pattern arevery important for detecting mutation One of the diseasescaused by mutation is Huntington disease resulting fromtrinucleotide repeat disorders [13ndash15] It is discovered that

the number of repeats of the trinucleotide 119862119879119866119862119860119866 in thepatientrsquos DNA with Huntington disease is not normal Therepeats are also useful for finding the origin of replication ofmicroorganisms [16]

In this section we show thatWatson-Crick grammars canbe used for analyzing the repeats in DNA strings We use theDNA of a breed of pig Sus scrofa breed mixed chromosome1 Sscrofa102 provided by the The National Center forBiotechnology Information (NCBI) database (NCBI Refer-ence Sequence NC 0104434) [17]

Consider a part of the upper strand of the DNA from thebreed stated above with the length of 100 nucleotides119886119886119905119888119892119886119888119892119888119888 119886119888119886119888119892cag119892119888 cag119905119905119888119888119892119886119892 ctg119888119886119905ctg119892 119892119886119888ctg119888119905119888119888

119886119905119886119905ctg119886119886119892 119892119888119886119886119905119888ctg119892 119886119905119888cag119892119892119886119905 119905119892119888119886119888ctg119888119888 119905119905119888119905119888119905119888119888119905119886(60)

In this example the pattern 119888119905119892 is being repeated for sixtimes in the direct strand (upper strand) and two times in thereverse strand (lower strand) which can be seen as 119888119886119892 in theupper strand Focusing on the repeats of 119888119905119892 pattern a DNAstring containing such a pattern can be expressed as

(119886 119905 119888 119892lowast

119888119905119892 119886 119905 119888 119892lowast

)lowast

(61)

We now construct a simple Watson-Crick regular gram-mar 119866 to generate the above language

Let 119866 = (119873 119879 119878 119875 120588) where 119879 = 119886 119905 119888 119892 120588 =(119886 119905) (119888 119892) and 119875 consists of the following productions

119878 997888rarr ⟨119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119888

119892⟩119860

119860 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119892

119888⟩ 119878 | ⟨

119905

119886⟩119861

119861 997888rarr ⟨119888

119892⟩119860 | ⟨

119886

119905⟩ 119878 | ⟨

119905

119886⟩ 119878 | ⟨

119892

119888⟩119862

119862

997888rarr ⟨119886

119905⟩119862 | ⟨

119905

119886⟩119862 | ⟨

119892

119888⟩119862 | ⟨

119888

119892⟩119860 | ⟨

120582

120582⟩

(62)

For instance a derivation resulting in six repeats of 119888119905119892pattern can be obtained as follows

119878 997904rArrlowast⟨119909

119910⟩119878 997904rArr ⟨

119909119888

119910119892⟩119860 997904rArr ⟨

119909119888119905

119910119892119886⟩119861

997904rArr ⟨119909119888119905119892

119910119892119886119888⟩119862

997904rArrlowast⟨119909119888119905119892119909

119910119892119886119888119910⟩119860 997904rArr ⟨

119909119888119905119892119909119888

119910119892119886119888119910119892⟩119860

997904rArr ⟨119909119888119905119892119909119888119905

119910119892119886119888119910119892119886⟩119861

997904rArr ⟨119909119888119905119892119909119888119905119892

119910119892119886119888119910119892119886119888⟩119862997904rArr

lowast⟨119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910⟩119862

997904rArrlowast[119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909119888119905119892119909

119910119892119886119888119910119892119886119888119910119892119886119888119910g119886119888119910119892119886119888119910119892119886119888119910]

(63)

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 11: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

Applied Computational Intelligence and Soft Computing 11

where 119909 isin 119886 119905 119892lowast and 119910 is their complementarity symbolsbased on 120588 respectively

Though in this example the functionality of WK regulargrammars is not used to the fullest the DNA structures canbe naturally and effectively analyzed with WK grammars

62 Programming Language Structure Analysis The abil-ity to differentiate between parentheses that are correctlybalanced and those that are unbalanced is an importantpart of recognizing many programming language structuresBalanced parentheses mean that each opening symbol has acorresponding closing symbol and the pairs of parenthesesare properly nested

Theparsing algorithms of compilers and interpreters haveto check the correctness of balanced parentheses in the blocksof codes including algebraic and arithmetic expressions

Although context-free grammars are used to developparsers for programming languages many programminglanguage structures are context-sensitiveThus it is of interestto develop parsers based on grammars which are able toanalyze non-context-free language structures

Further we show an example of balanced parentheses thatcan be generated by WK regular grammars but cannot begenerated by context-free grammars One can see that just byincorporating the concept of double-stranded string bondedwith Watson-Crick complementarity even WK grammarsthat are based of just regular rules can enhance the power ofthe parsing techniques

To avoid confusion we denote ldquo(rdquo as the open parenthesisterminal symbol and ldquo)rdquo as the close parenthesis terminalsymbol in bold font

Example 31 Let 1198665= (119878 119860 119861 ( ) (( )) 119878 119875

5) be a WK

regular grammar 1198755consists of the rules

119878 997888rarr ⟨(120582⟩119878

119878 997888rarr ⟨(120582⟩119860

119860 997888rarr ⟨)(⟩119860

119860 997888rarr ⟨)(⟩119861

119861 997888rarr ⟨()⟩119861

119861 997888rarr ⟨120582

)⟩119861

119861 997888rarr ⟨120582

120582⟩

119861 997888rarr 119860

(64)

From this we obtain the derivation

119878 997904rArr ⟨(120582⟩119878997904rArr

lowast⟨(119899

120582⟩119860

997904rArr ⟨(119899)(⟩119860997904rArr

lowast⟨

(119899)119899

(119899⟩119861

997904rArr ⟨(119899)119899

(119899)⟩119861997904rArr

lowast⟨

(119899)119899

(119899)119899⟩119861

997904rArr ⟨(119899)119899((119899)119899⟩119861997904rArr

lowast⟨

(119899)119899(119899

(119899)119899⟩119860

997904rArr ⟨(119899)119899(119899)(119899)119899

⟩119860997904rArrlowast⟨

(119899)119899(119899)119899

(119899)119899⟩119861

997904rArrlowastsdot sdot sdot 997904rArr

lowast[((119899)119899)119896

((119899)119899)119896]

(65)

where 119896 ge 1 Hence the language obtained is

1198715= ((119899)119899)

119896

119899 119896 ge 1 isinWKREG minus CF (66)

7 Conclusion

In this paper we defined Watson-Crick regular grammarsWatson-Crick linear grammars and Watson-Crick context-free grammars Further we investigated their computationalpower and closure properties We showed that

(1) WK linear grammars can generate some context-sensitive languages

(2) the families of linear languages and WK regularlanguages are strictly included in the family of WKlinear grammars

(3) the families of WK regular languages and linearlanguages are not comparable

(4) the family of WK linear languages is not comparablewith the family of context-free languages

(5) WK regular grammars preserves the closure proper-ties similar to the ones of regular languages

The following problems related to the topic remain open

(1) Are the family of context-free languages proper subsetof the family of WK linear languages or are they notcomparable

(2) What is the generative capacity of Watson-Crickcontext-free grammars

(3) What are the remaining closure properties ofWatson-Crick (regular linear and context-free) grammarsThese depend on their generative capacity

Moreover there are many interesting topics for furtherresearch for instance we can define WK context-free andregulated WK context-free grammars and use them in thestudy of DNA properties and in the DNA based applicationssuch as food authentication and disease gene detection

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 12: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

12 Applied Computational Intelligence and Soft Computing

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported through International IslamicUniversity Endowment B research Grant EDW B14-136-1021and Fundamental ResearchGrant Scheme FRGS13-066-0307Ministry of Education Malaysia The first author would liketo thank both organizations for the scholarship through theIIUM fellowship program

References

[1] J B Reece L A Urry M L Cain S A Wasserman PV Minorsky and R B Jackson Campbell Biology PearsonEducation 10th edition 2011

[2] R Freund G Paun G Rozernberg and A Salomaa Watson-Crick Finite Automata vol 48 of DIMACS Series in DiscreteMathematics and Theoretical Computer Science 1999

[3] E Czeizler ldquoA short survey on watson-crick automatardquo Bulletinof the EATCS vol 88 no 3 pp 104ndash119 2006

[4] L Kari S Seki and P Sosik ldquoDNA computingmdashfoundationsand implicationsrdquo in Handbook of Natural Computing GRozenberg T Back and J N Kok Eds pp 1073ndash1127 2012

[5] P Leupold and B Nagy ldquo51015840 rarr 31015840Watson-Crick automata withseveral runsrdquo Fundamenta Informaticae vol 104 no 1-2 pp 71ndash91 2010

[6] M I Mohd Tamrin S Turaev and T M Tengku SembokldquoWeighted watson-crick automatardquo in Proceedings of the 21stNational Symposium on Methematical Sciences vol 1605 of AIPConference Proceedings p 302 Penang Malaysia November2013

[7] K G Subramanian I Venkat and K Mahalingam ldquoContext-free systems with a complementarity relationrdquo in Proceedingsof the 6th International Conference on Bio-Inspired ComputingTheories and Applications (BIC-TA rsquo11) pp 194ndash198 PenangMalaysia September 2011

[8] K G Subramanian S Hemalatha and I Venkat ldquoOn Watson-Crick automatardquo in Proceedings of the 2nd International Confer-ence on Computer Science Science Engineering and InformationTechnology (CCSEIT rsquo12) pp 151ndash156 Coimbatore India 2012

[9] P Linz An Introduction to Formal Languages and AutomataJones and Bartlett 2006

[10] G Paun G Rozenberg and A SalomaaDNA Computing NewComputing Paradigms Springer Berlin Germany 1998

[11] G Rozenberg and A SalomaaHandbook of Formal Languagesvol 1ndash3 Springer 1997

[12] S Okawa and S Hirose ldquoThe relations among Watson-Crickautomata and their relations with context-free languagesrdquoIEICE Transactions on Information and Systems vol E89 no 10pp 2591ndash2599 2006

[13] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

[14] H T Orr and H Y Zoghbi ldquoTrinucleotide repeat disordersrdquoAnnual Review of Neuroscience vol 30 pp 575ndash621 2007

[15] J Petruska M J Hartenstine and M F Goodman ldquoAnalysis ofstrand slippage in DNA polymerase expansions of CAGCTGtriplet repeats associated with neurodegenerative diseaserdquo TheJournal of Biological Chemistry vol 273 no 9 pp 5204ndash52101998

[16] N C Jones and P Pevzner An Introduction to BioinformaticsAlgorithms MIT Press Boston Mass USA 2004

[17] M A M Groenen A L Archibald H Uenishi et al ldquoAnalysesof pig genomes provide insight into porcine demography andevolutionrdquo Nature vol 491 no 7424 pp 393ndash398 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 13: Research Article Generative Power and Closure Properties ...downloads.hindawi.com/journals/acisc/2016/9481971.pdf · Watson-Crick automata. ekeyfeatureofWKautomataisthe symmetricrelation

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Recommended