Fwd: pattern matching



thanks for all the suggestions.
Below follows a simple and (very) non-optimized algorithm for finding
the longest common substring (not subsequence!) for two lists, for
which we define a circular equivalence (i.e.,
[a,b,c]=[b,c,a]=[c,a,b]).

cycle(L) creates a list with all cyclical permutations of the original list
(%i2) cycle1(L):= endcons(first(L),rest(L))$

(%i3) ?cycle(L):=block([el:makelist([],i,1,length(L))],
? ? ? ? ? ? ? ?el[1]:L,
? ? ? ? ? ? ? ?for i: 2 thru length(L) do el[i]:cycle1(el[i-1]),
? ? ? ? ? ? ? ?el)$


functions to find the largest common pattern in two lists
lcstring2() finds the largest common string for 2 given lists, while
largestcs() look for all possible combinations in cycle(l1) and
cycle(l2) and returns one of the largest common strings. (it returns
the first largest one found - there might be more than one common
substring with the same size)

(%i39) largestcs(S,T):= block( [c1:cycle(S),c2:cycle(T),ret:[],s:[]],
? ? ? ? ? ? ? ? ? ? ? for m1 in c1 do
? ? ? ? ? ? ? ? ? ? ? ?for m2 in c2 do
? ? ? ? ? ? ? ? ? ? ? ? ?block(s:lcstring2(m1,m2),
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?if length(s)>length(ret) then ret:s),
? ? ? ? ? ? ? ? ? ? ?ret)$

(%i38) lcstring2(S,T):=
?block([tmp:ematrix(length(S),length(T),0,1,1),len:0,ret:[]],
? ? ? ?for i: 1 thru length(S) do
? ? ? ? for j: 1 thru length(T) do
? ? ? ? ?if not(S[i]=T[j]) then tmp[i,j]:0
? ? ? ? ? else block(
? ? ? ? ? ? ? ? ? ? ?if i=1 or j=1 then tmp[i,j]:1 else
tmp[i,j]:1+tmp[i-1,j-1],
? ? ? ? ? ? ? ? ? ? ?if tmp[i,j]>len then block(len:tmp[i,j],
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ret:[i,j])
? ? ? ? ? ? ? ? ? ? ?),
? ? ? ?makelist(S[k],k,ret[1]-len+1,ret[1]))$


example:
largestcs([a,a,b,c], [a,b,z,a]) returns [a,a,b].


Paulo Gustavo Grahl, CFA
------------------------------------------
pgrahl at gmail.com
pgrahl at fgvmail.br
+55(21) 8809-9254
www.linkedin.com/in/pgrahl
------------------------------------------




On Wed, Mar 4, 2009 at 1:57 PM, Richard Fateman <fateman at cs.berkeley.edu> wrote:
> Perhaps use a program that is different from rest(...,-1) ? ?which
> circulates the items from the front.
> You need to figure out a way of terminating it though.
>
> cycling one sequence as you suggest would be an alternative.
>
> If this is an important problem for long sequences, you can make the program
> exponentially faster
> by accessing or removing the FIRST elements rather than the LAST elements of
> lists.
>
> Also, makelist(x,i,1,1)
> can be written ?[x]
> also l2=[] ? is much faster to test than length(l2)=0.
> RJF
>
>
>
> LPaulo Grahl wrote:
>>
>> thanks a lot.
>> A maxima version of the algorithm would be
>>
>> longerone(L1,L2):= if length(L1) > length(L2) then L1 else L2$
>> lcs(L1,L2):= if (length(L1)=0) or (length(L2)=0) then []
>> ? else if last(L1)=last(L2) then
>> append(lcs(rest(L1,-1),rest(L2,-1)),makelist(last(L1),i,1,1))
>> ? else longerone(lcs(rest(L1,-1),L2),lcs(L1,rest(L2,-1)))$
>>
>> but it still does not work for my goal as I need to check within
>> "circular" lists, i.e,
>> ?[a,a,b,c,d] is equivalent to [a,b,c,d,a] and so on and the algorithm
>> proposed gives different results depending on which one of the
>> equivalent lists I am using.
>>
>> for example applying it to [a,a,b,c] and [a,b,z,a] I find [a,b] as the
>> longest common sequence, but I need [a,a,b]. I can apply the algorithm
>> several times with one of the sequences cycled and get the longest
>> result, but I was wondering whether a simpler way to do it exists.
>> Thanks.
>>
>> --Paulo
>>
>> Paulo Gustavo Grahl, CFA
>> ------------------------------------------
>> pgrahl at gmail.com
>> pgrahl at fgvmail.br
>> +55(21) 8809-9254
>> www.linkedin.com/in/pgrahl
>> ------------------------------------------
>>
>>
>>
>>
>> On Wed, Mar 4, 2009 at 2:58 AM, Xiang Liu <hsiang.liu at gmail.com> wrote:
>>
>>>
>>> I have no idea on Maxima's approach. But your problem is a typical
>>> longest
>>> common sequence in Algorithm.
>>>
>>> We write A=A[1:n], B=B[1:m]. Now,
>>>
>>> if A[n]=B[n] then
>>> ?result=common(A[1,n-1],B[1,n-1]).append(A[n]);
>>> else
>>> ?result=longerOne( common(A[1,n-1], B[1,n]), common(A[1,n], B[1,n-1]) );
>>>
>>> The pseudo code above is just a skeleton.
>>>
>>> On Wed, Mar 4, 2009 at 12:10 AM, Paulo Grahl <pgrahl at gmail.com> wrote:
>>>
>>>>
>>>> Dear list members:
>>>>
>>>> I'm facing the following problem:
>>>> given a two lists, for example, [a,b,c,x,a] and [z,a,a,b]
>>>> I'm interested in finding the largest sequence of symbols that match
>>>> both lists, considering lists as circular, i.e.,
>>>> [z,a,a,b]=[a,a,b,z]=[a,b,z,a]=[b,z,a,a].
>>>> In the above example, the sequence "a,b" appears in both lists, but I
>>>> would be interested in finding the sequence "a,a,b" which is the
>>>> longest one to appear in both lists (considering the lists as
>>>> circular).
>>>>
>>>> Any idea on how to find such common patterns ?
>>>> Thanks.
>>>> Paulo
>>>>
>>>>
>>>> Paulo Gustavo Grahl, CFA
>>>> ------------------------------------------
>>>> pgrahl at gmail.com
>>>> pgrahl at fgvmail.br
>>>> +55(21) 8809-9254
>>>> www.linkedin.com/in/pgrahl
>>>> ------------------------------------------
>>>> _______________________________________________
>>>> Maxima mailing list
>>>> Maxima at math.utexas.edu
>>>> http://www.math.utexas.edu/mailman/listinfo/maxima
>>>>
>>>
>>>
>>
>> _______________________________________________
>> Maxima mailing list
>> Maxima at math.utexas.edu
>> http://www.math.utexas.edu/mailman/listinfo/maxima
>>
>
>
>