How can I split a string in Perl, keeping the delimiters, and having the split be between the delimiters?

My question is a little wordy so I'll try to explain with an example.

I have a file that's somewhat similar to XML that I need to parse, though not exactly. Elements in the file generally show up similar to XML format, like

<person><greeting>hello</greeting><goodbye>bye</goodbye></person>

I wanted to split up the file into individual sets of tags, so that one element would be

<greeting>hello</greeting>

and another would be

<goodbye>bye</goodbye>

Naturally, for an empty element, <person> and </person> will end up being their own elements. I'm completely OK with that because of how I want to parse the file as a whole.

The issue I'm running into is how best to split the whole file into an array, because there are no newlines at all in the file; it's written out exactly as you see it. I tried doing it like this:

my @array = split(/(><)/, $file);

but the issue is that it doesn't keep the angle brackets as part of their associated tags; it returns the >< as a separate element instead. Is there a way for me to split the file between the > and < characters?

Answers


I am not sure if this is the best solution, but to answer your question directly, you can split between the angles using lookbehind and lookahead assertions.

my @array = split(/(?<=>)(?=<)/, $file);

The difference is that the assertions do not consume the >< characters; they only match the position in between.
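As a minimal sketch of the lookaround split, here it is applied to the sample line from the question. The hard-coded $file value is only an assumption for illustration; in practice it would hold the contents of the real file.

```perl
use strict;
use warnings;

# Sample input copied from the question; a real script would read the file instead.
my $file = '<person><greeting>hello</greeting><goodbye>bye</goodbye></person>';

# Split at every position that is preceded by '>' and followed by '<'.
# Both assertions are zero-width, so no characters are removed from the pieces.
my @array = split /(?<=>)(?=<)/, $file;

print "$_\n" for @array;
# Prints:
# <person>
# <greeting>hello</greeting>
# <goodbye>bye</goodbye>
# </person>
```

Because nothing is consumed, joining @array with an empty string gives back the original string.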

Another idea would be to use a backreference to match the closing tag. Note that it pairs the opening tag with the first closing tag of the same name, which is wrong when identical tags are nested. Something like this:

<([^>]*)>(.*?)</\1>


This regex has two capturing groups. The first is used by the backreference \1 to match the closing tag, and the second holds the content of the tag.

Of course, it will first match the "person" tag, but you will find the other tags in $2. You would have to apply the regex recursively to $2 until the result is an empty list.
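A rough sketch of that recursive approach might look like the following. The helper name extract_elements and the hard-coded sample string are assumptions made for illustration, and the limitation with nested identical tags still applies.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a flat list of [tag, content] pairs,
# descending into the content of each matched element.
sub extract_elements {
    my ($text) = @_;
    my @found;
    # /g walks through sibling elements; /s lets '.' match newlines as well.
    while ($text =~ m{<([^>]*)>(.*?)</\1>}gs) {
        my ($tag, $content) = ($1, $2);   # copy the captures before recursing
        push @found, [$tag, $content];
        push @found, extract_elements($content);
    }
    return @found;
}

# Sample input copied from the question.
my $file = '<person><greeting>hello</greeting><goodbye>bye</goodbye></person>';
my @elements = extract_elements($file);

print "$_->[0]: $_->[1]\n" for @elements;
# Prints:
# person: <greeting>hello</greeting><goodbye>bye</goodbye>
# greeting: hello
# goodbye: bye
```

The recursion stops on its own once the content contains no further tag pairs, because the while loop then never runs and an empty list is returned.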

