Re: Segmentation of numbers

Samuel Murray
 

On 22/04/2020 23:52, M -- via groups.io wrote:

I have a problem with the segmentation of numbers. I have something like this:
24<segment 01>
Hour National Crisis Line<segment 02>
I tested this on a plain text file, and I confirm that this happens.

My guess is that the default segmentation rules assume that a number at the start of a line, followed by a capital letter, is meant to be a line number or a heading number. I tried to google for the meaning of the four general rules in OmegaT, but I was unable to find a sufficiently comprehensive guide... and the link to the Java documentation in the OmegaT user manual is dead.

What you can do is add a new rule above all other rules.

- In OmegaT, go to Options > Segmentation.
- In the top part of the dialog, click Add. This will add a new set of rules, usually called "New Language and Country" and "LN-CO". Rename these to e.g. "Fixes" and ".*" (a pattern that matches every language).
- Select the Fixes rule set and then click Move Up until the rule set is at the very top of the list.
- Then, while Fixes is selected, in the bottom part of the dialog, click "Add". This will add a new segmentation rule. Edit that rule as follows:

Break/Exception: unticked (i.e. this is an exception rule, not a break rule)
Pattern Before: [0-9]
Pattern After: \s

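This rule means that (apart from certain exceptions) no segment will ever break between a digit and the whitespace that follows it. Note that \s matches any whitespace character, not just a space, so the rule will probably also apply to numbers followed by tabs, but I have not tested that.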
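
To see what the two patterns match outside OmegaT, here is a minimal Java sketch. It is only a simplified model, not OmegaT's actual segmentation code: the class name SegmentationRuleSketch and the helper ruleApplies are made up for illustration. It assumes a rule applies at a position when the character just before it matches Pattern Before and the text starting at it matches Pattern After.

```java
import java.util.regex.Pattern;

public class SegmentationRuleSketch {
    // The two patterns from the rule above, as Java regular expressions.
    static final Pattern BEFORE = Pattern.compile("[0-9]");
    static final Pattern AFTER = Pattern.compile("\\s");

    // Hypothetical helper (not part of OmegaT): returns true if the
    // character just before `pos` matches BEFORE and the text starting
    // at `pos` begins with a match for AFTER.
    static boolean ruleApplies(String text, int pos) {
        if (pos <= 0 || pos >= text.length()) {
            return false;
        }
        boolean before = BEFORE.matcher(text.substring(pos - 1, pos)).matches();
        boolean after = AFTER.matcher(text.substring(pos)).lookingAt();
        return before && after;
    }

    public static void main(String[] args) {
        // Digit followed by a space: the exception would apply.
        System.out.println(ruleApplies("24 Hour National Crisis Line", 2));
        // Digit followed by a tab: \s matches tabs too.
        System.out.println(ruleApplies("24\tHour", 2));
        // Letter followed by a space: the exception would not apply.
        System.out.println(ruleApplies("Line 2", 4));
    }
}
```

In this model the rule applies to "24 Hour" and to "24<tab>Hour" alike, which is why I expect tabs to be covered. Whether OmegaT treats tabs in your file format the same way is something you would need to test.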

Samuel
