Custom Segmentation
Each time you upload XML, HTML, MD, or any other source file without a key-value structure, the predefined segmentation rules (SRX 2.0) are used to segment the content automatically. However, the default rules may sometimes split a source file differently from what you expect.
In this case, you can define your own segmentation rules for each source file individually using the SRX 2.0 standard.
Change Segmentation
You can change segmentation in Sources > Files.
- Open the project where you'd like to adjust the segmentation rules and go to Sources > Files.
- Click (or right-click) the file you want to change and select Settings.
- In the dialog that appears, switch to the Parser configuration tab.
- Select Enable content segmentation and Use custom segmentation rules.
- Paste your SRX segmentation rules and click Save.
After you save the new segmentation rules, the source file is automatically reimported and segmented according to them.
Segmentation Examples
A typical SRX file looks similar to the following:
<?xml version="1.0" encoding="UTF-8"?>
<srx version="2.0" xmlns="http://www.lisa.org/srx20" xsi:schemaLocation="http://www.lisa.org/srx20 srx20.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <header segmentsubflows="yes" cascade="yes">
    <formathandle type="start" include="no"/>
    <formathandle type="end" include="yes"/>
    <formathandle type="isolated" include="yes"/>
  </header>
  <body>
    <languagerules>
      <languagerule languagerulename="Default">
        <!-- Common rules for most languages -->
        <rule break="no">
          <beforebreak>^\s*[0-9]+\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <rule break="yes">
          <afterbreak>\n</afterbreak>
        </rule>
        <rule break="yes">
          <beforebreak>[\.\?!]+</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <!-- List exceptions first -->
      <languagemap languagepattern="[Ee][Nn].*" languagerulename="English"/>
      <languagemap languagepattern="[Ff][Rr].*" languagerulename="French"/>
      <!-- Japanese breaking rules -->
      <languagemap languagepattern="[Jj][Aa].*" languagerulename="Japanese"/>
      <!-- Common breaking rules -->
      <languagemap languagepattern=".*" languagerulename="Default"/>
    </maprules>
  </body>
</srx>
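In SRX, the rules inside a languagerule are evaluated in order, and the first rule that matches at a given position decides whether the text breaks there. This means exceptions (break="no") have to come before the general break rules they override. As a minimal sketch of a custom rule set, the file below keeps text together after a few abbreviations and otherwise breaks after sentence-ending punctuation. The abbreviation list is purely illustrative and not part of the default rules. Unlike the sample above, it maps every language to a single Default rule set. Treat it as a starting point and check the resulting segments after the file is reimported.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<srx version="2.0" xmlns="http://www.lisa.org/srx20" xsi:schemaLocation="http://www.lisa.org/srx20 srx20.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <header segmentsubflows="yes" cascade="yes">
    <formathandle type="start" include="no"/>
    <formathandle type="end" include="yes"/>
    <formathandle type="isolated" include="yes"/>
  </header>
  <body>
    <languagerules>
      <languagerule languagerulename="Default">
        <!-- Exception first: do not break after these abbreviations (illustrative list) -->
        <rule break="no">
          <beforebreak>\b(?:e\.g|i\.e|Mr|Mrs|Dr)\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <!-- General rule: break after ., ? or ! when whitespace follows -->
        <rule break="yes">
          <beforebreak>[\.\?!]+</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <!-- Apply the Default rules to every language -->
      <languagemap languagepattern=".*" languagerulename="Default"/>
    </maprules>
  </body>
</srx>
```

With these rules, a sentence such as "Ask Dr. Smith for details. He can help." would be expected to split into two segments at "details." rather than after "Dr.", because the no-break rule matches first at that position.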