Solid Fluid System Solutions  

Extending the regular expression

Having begun to form the expression, we now have a complete scheme for selecting the first field in the line below (the leading number, in blue) and inserting a comma after it:

  • Search expression - ^(\d+(?=\x20))\x20
  • Replace expression - \1,
1 Richard Pinkett MG TA 00 76.52 74.73 73.56 72.92 1000 1000 74.69 75.23 72.92 74.69 147.61 5 131
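As a quick check, the search and replace pair above can be tried outside the editor. The sketch below assumes Python's re module, which is not the tool used in this tutorial but accepts the same \x20, lookahead and \1 syntax. The variable names are illustrative only.

```python
import re

# The sample line from the text, before any commas are inserted.
line = ("1 Richard Pinkett MG TA 00 76.52 74.73 73.56 72.92 1000 1000 "
        "74.69 75.23 72.92 74.69 147.61 5 131")

# Search expression for the first field: digits at the start of the
# line, followed by a space that is consumed outside the group.
first_field = re.compile(r"^(\d+(?=\x20))\x20")

# Replace expression: the captured digits followed by a comma.
result = first_field.sub(r"\1,", line)
print(result)
# 1,Richard Pinkett MG TA 00 76.52 74.73 73.56 72.92 1000 1000 74.69 75.23 72.92 74.69 147.61 5 131
```

Only the first space is replaced, because the expression is anchored to the start of the line with ^.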

What we need now is a scheme to get at the part in green (the name and car text), and add a comma after it. Some time back we realised that we could use the CLS field (the two-digit value, in red) as a terminator to stop selection after the text field. All we need to do is to decide how to express the field in green, and the field in red.

Looking back at the whole dataset reveals that, on most of the 150 lines, the green field contains only digits, word characters and spaces. With an eagle eye, one also notices that a couple of these fields contain hyphens and brackets. We can then create the first part of our expression, [\d\w\s\-()]+. This is pretty straightforward. It's just a range that can match digits, word characters and spaces, as well as brackets and hyphens. (In most regular expression flavours \w already covers the digits, so the \d is redundant but harmless.) The range has the property of matching a single character from those described inside the square brackets. Obviously we have more than just the one character so, like before, the plus indicates one or more.
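To see what the range does on its own, here is a minimal sketch, again assuming Python's re module rather than the editor used in the tutorial. The second test string is made up for illustration and does not come from the dataset.

```python
import re

# The range from the text: one character that is a digit, word
# character, whitespace, hyphen or bracket, repeated one or more times.
body = re.compile(r"[\d\w\s\-()]+")

# Start of the sample line with the leading number removed.
text = "Richard Pinkett MG TA 00 76.52"
matched = body.match(text).group(0)
print(repr(matched))  # 'Richard Pinkett MG TA 00 76'

# Hypothetical field (not from the dataset) with a hyphen and brackets.
made_up = "Anne-Marie (Snr) Mini"
print(body.fullmatch(made_up) is not None)  # True
```

Left to itself, the plus runs on through the following numbers and only stops at the first full stop, which is not in the range.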

What we need now is a way to terminate the greedy plus. We already decided that this would be the CLS field (in red). To terminate we use \x20\d{2}\x20. We already know about \x20, and that \d matches any digit. The only new thing is the curly brackets. Like the plus, these implement a repeat on the previous token, in this case the digit, which must appear exactly twice. \x20\d{2}\x20 then matches a space, followed by two digits, followed by a space.
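The terminator can be tried in isolation against the sample line. This is a sketch assuming Python's re module; the names are illustrative.

```python
import re

line = ("1 Richard Pinkett MG TA 00 76.52 74.73 73.56 72.92 1000 1000 "
        "74.69 75.23 72.92 74.69 147.61 5 131")

# A space, exactly two digits, then another space.
terminator = re.compile(r"\x20\d{2}\x20")

hit = terminator.search(line)
print(repr(hit.group(0)))        # ' 00 '
print(repr(line[:hit.start()]))  # '1 Richard Pinkett MG TA'
```

The first place in the line where the terminator fits is the CLS field, so everything before it is the number and text we want to separate.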

At this point you may be wondering why we've included the additional trailing space. The reason is that the plus we're trying to terminate is greedy. If we don't include the extra space, the greedy plus, matching digits, word characters and spaces, will run straight through the CLS field, because the terminator can also match at the start of the first numeric field, "76" (in black). With the trailing space, the match fails at that "76", because the character that follows it is the decimal point rather than a space. The trailing space ensures that, for this data, the terminator always terminates at the CLS field.
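The effect of the trailing space can be seen by running both versions side by side. This is a sketch assuming Python's re module. The extra brackets around the range are added here only so that the consumed text can be printed; they are not part of the tutorial's expression.

```python
import re

# Start of the sample line with the leading number removed.
text = "Richard Pinkett MG TA 00 76.52 74.73"

# Terminator without the trailing space: the greedy plus backs off
# only as far as " 76", swallowing the CLS field.
without_space = re.match(r"([\d\w\s\-()]+)(\x20\d{2})", text)

# Terminator with the trailing space: " 76." cannot match, so the
# plus has to back off further, to the CLS field.
with_space = re.match(r"([\d\w\s\-()]+)(\x20\d{2}\x20)", text)

print(repr(without_space.group(1)))  # 'Richard Pinkett MG TA 00'
print(repr(with_space.group(1)))     # 'Richard Pinkett MG TA'
```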

So then, to match the field in green we use [\d\w\s\-()]+(\x20\d{2}\x20). The problem with this is that the match also consumes the red CLS field. We can deal with that using the technique described in detail previously: we describe the terminator as a metatoken group (a lookahead), so that even though it performs the termination, it is not part of the matched text and won't affect the changes we want to make in replacement. It then doesn't matter that the terminator is longer than the part we wish to change. Only the single space that follows the green field is consumed, outside the capturing group. Using the technique, we get: ([\d\w\s\-()]+(?=\x20\d{2}\x20))\x20.
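As a rough illustration, the lookahead form can be applied on its own to the sample line. This sketch assumes Python's re module and limits the substitution to a single replacement, since the expression is not yet anchored to anything.

```python
import re

line = ("1 Richard Pinkett MG TA 00 76.52 74.73 73.56 72.92 1000 1000 "
        "74.69 75.23 72.92 74.69 147.61 5 131")

# Body in a capturing group, terminator as a lookahead, and the one
# space after the body consumed outside the group.
green = re.compile(r"([\d\w\s\-()]+(?=\x20\d{2}\x20))\x20")

# count=1: replace only the first match found.
result = green.sub(r"\1,", line, count=1)
print(result)
# 1 Richard Pinkett MG TA,00 76.52 74.73 73.56 72.92 1000 1000 74.69 75.23 72.92 74.69 147.61 5 131
```

The CLS field survives untouched, but the leading number has been swept into the same group as the text, because nothing yet tells the expression where to start.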

In this new expression the body part (in green) is completely enclosed in the first group, and the trailing space is outside the group, just like before. If we were going to use this search expression on its own, we would use the same replacement expression as before. As it stands, though, we haven't put any thought into where the expression is going to start matching. Because of this, the best thing to do is to concatenate it with the first expression, at the top of the page, so that this expression follows on directly from that one.

Once concatenated, we have the following search and replacement expressions:

  • Search expression - ^(\d+(?=\x20))\x20([\d\w\s\-()]+(?=\x20\d{2}\x20))\x20
  • Replace expression - \1,\2,

We can now run these expressions using a "Replace All" operation. The overall effect will be to add commas between the blue and green fields, and between the green and red fields. This yields:

1,Richard Pinkett MG TA,00 76.52 74.73 73.56 72.92 1000 1000 74.69 75.23 72.92 74.69 147.61 5 131
2,Ian Anderson BL Mini GT,00 1000 67.36 66.09 64.88 68.15 67.69 67.46 67.33 64.88 67.33 132.21 4 129
3,Rob Choules Suzuki Swift Gti,00 1000 61.69 61.85 61.44 64.07 61.36 62.87 61.85 61.44 61.85 123.29 2 120
4,James Tapner Peugeot 106 Rallye,00 1000 60.66 59.43 59.26 63.6 62.14 63.34 61.81 59.26 61.81 121.07 1 116
5,Andy Thomas Rover Metro Gti,00 64.93 63.68 63.32 63.32 66.36 65.6 65.53 65.4 63.32 65.4 128.72 3 126
10,Stephen Biggs VW Golf Gti,01 65.23 66.26 62.38 60.45 64.18 64.33 65.13 71.68 60.45 65.13 125.58 6 125
11,Dave Penycate Volkswagen Golf Gti,01 61.68 61.09 61.98 61.26 65.75 64.86 64.62 62.89 61.26 62.89 124.15 5 122
12,Andrew Till MG ZR160,01 55.51 55.09 56.86 54.83 60.21 58.6 57.78 57.47 54.83 57.47 112.3 2 85
14,Jeremy Parker Honda S2000,01 52.82 51.19 51.27 51.84 57.17 57.37 56.77 55.97 51.27 55.97 107.24 1 55
15,Vicki Lawrence Nissan Sunny Gti,01 58.31 56.45 57.17 57.84 63.29 61.14 60.47 60.37 57.17 60.37 117.54 3 108
16,Peter Lawrence Nissan Sunny Gti,01 59.78 59.78 59.07 59.59 61.42 62.27 61.4 61.62 59.07 61.4 120.47 4 113
21,Tim Cole Mini Cooper,02 55.43 54.45 54.23 53.76 57.05 56.25 55.83 55.91 53.76 55.83 109.59 1 68
22,Nigel Patten Renault 8 Gordini,02 66.03 63.35 62.2 61.38 65.2 63.35 63.66 63.23 61.38 63.23 124.61 2 123
29,Lee Whittaker Subaru Impreza,05 55.47 52.99 52.08 52.85 56.05 55.18 55.19 54.23 52.08 54.23 106.31 3 53
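The same "Replace All" can be approximated in a script. The sketch below assumes Python's re module and applies the concatenated expressions one line at a time. Only the first raw line appears on this page; the raw forms of rows 2 and 22 are reconstructed here by turning the two commas back into spaces, which is an assumption.

```python
import re

# Concatenated search and replace expressions from the text.
search = re.compile(
    r"^(\d+(?=\x20))\x20([\d\w\s\-()]+(?=\x20\d{2}\x20))\x20"
)
replace = r"\1,\2,"

# Row 1 is the sample line; rows 2 and 22 are reconstructed (assumed).
raw_lines = [
    "1 Richard Pinkett MG TA 00 76.52 74.73 73.56 72.92 1000 1000 "
    "74.69 75.23 72.92 74.69 147.61 5 131",
    "2 Ian Anderson BL Mini GT 00 1000 67.36 66.09 64.88 68.15 67.69 "
    "67.46 67.33 64.88 67.33 132.21 4 129",
    "22 Nigel Patten Renault 8 Gordini 02 66.03 63.35 62.2 61.38 65.2 "
    "63.35 63.66 63.23 61.38 63.23 124.61 2 123",
]

# Apply the pair to each line in turn, as the editor would.
converted = [search.sub(replace, raw) for raw in raw_lines]
for row in converted:
    print(row)
```

Working line by line keeps the ^ anchor simple and stops the \s in the range from ever reaching across a line break.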
Copyright © Solid Fluid 2007-2022
Last modified: SolFlu  Thu, 25 Jun 2009 19:31:29 GMT