Regex problem

Posted: Wed Jan 16, 2008 10:15 am
by sanity
Regex wizards, look over here!

I need a regex that will react to the following characters:
?
!
.
;
:
\n

However, a run such as ... or ??? should be treated as a single match, and if more dots or question marks show up later in the string, it should react to those as well.
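Putting a `+` quantifier on the delimiter class is what makes a run like `...` or `???` count as one match while later runs are still caught separately. A minimal PHP sketch (the sample string is made up for illustration):

```php
<?php
// One character class for the punctuation delimiters; '+' makes a whole
// run of them (e.g. '...' or '???') a single match.
$delimiters = '/[.?!:;\n]+/';

$sample = 'Computers... really??? Yes. Ok!';

preg_match_all($delimiters, $sample, $matches);
print_r($matches[0]);
```

Note that a mixed run such as `?!` is also treated as one delimiter with this approach.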

I also need it to react to smileys, such as :) :P :D ;) ;P ;p :p and so on. This conflicts with the earlier requirement to react to : and ;, so a smiley gets split into : and ), or ; and p, for example.

Basically, I need to split a string using those characters as delimiters. For example, this text:

"Hello and welcome to my site! This site will be about everything about computers...
Does this seem like a good idea? Or does it not, heh :) Bye"

That text should be split into this:

1 => Hello and welcome to my site!
2 => This site will be about everything about computers...
3 => Does this seem like a good idea?
4 => Or does it not, heh :)
5 => Bye

Because my current regex splits on every ":", it breaks the smiley apart:
4 => Or does it not, heh :
5 => ) Bye

And because it splits on each dot, I guess it will produce this:
2 => This site will be about everything about computers.
3 => .
4 => .
5 => Does this seem like a good idea?

But I'm only sure about the smiley split.
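The split behaviour described above can be checked with a sketch that assumes a single character class with `+`, like the one in the `preg_split` call posted later in the thread (without its redundant outer `+`). It shows the smiley being cut into `:` and `) Bye`, while `...` stays together as one delimiter because of the `+`:

```php
<?php
// Assumed pattern: every delimiter character in one class, '+' so that a run
// such as '...' is captured as a single delimiter.
$pattern = '/([.\n?!:;]+)/';

$text = "Hello and welcome to my site! This site will be about everything about computers...\n"
      . "Does this seem like a good idea? Or does it not, heh :) Bye";

// PREG_SPLIT_DELIM_CAPTURE keeps the captured delimiters in the result array.
$tokens = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($tokens);
```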

Posted: Wed Jan 16, 2008 1:50 pm
by zly
When you say "react", do you just mean inserting a linebreak?
And which language is this for? Not all of them use the same modifiers and syntax, but I assume that, whichever language it is, you want a Perl-compatible regex (the most widely used kind).

The following is done in PHP with preg_replace and should do what you want:

$string = 'Hello and welcome to my site! This site will be about everything about computers... Does this seem like a good idea? Or does it not, heh :) Bye';

// Append a linebreak after every delimiter (\0 in the replacement is the whole match).
// Longer alternatives come first, so '...' and the smileys win over the single-character class.
$string = preg_replace("/(\.{3}|\?{3}|:\)|:P|:D|;\)|;P|;p|:p|[\?\.\!;:])/", "\\0\n", $string);

// Get rid of any whitespace after the new linebreaks
foreach (explode("\n", $string) as $str)
  echo trim($str) . "\n";

Note that the order of the alternatives is the key: the smileys and '...' are tried before the single-character class, so your smileys don't get broken up and '...' doesn't get 3 linebreaks.
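The ordering point can be checked directly. At each position PCRE tries the alternatives from left to right and keeps the first one that matches, not the longest one. The two patterns below differ only in the order of their alternatives (the sample string is made up):

```php
<?php
// At each position PCRE tries the alternatives left to right and keeps the
// first one that matches; it does not search for the longest match.
$shortFirst = '/[.:]|\.{3}|:\)/';   // single characters listed first
$longFirst  = '/\.{3}|:\)|[.:]/';   // multi-character tokens listed first

$input = 'Wait... ok :)';

preg_match_all($shortFirst, $input, $a);
preg_match_all($longFirst, $input, $b);

print_r($a[0]);  // the dots come back one by one and the smiley loses its ')'
print_r($b[0]);  // '...' and ':)' come back whole
```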

Posted: Wed Jan 16, 2008 3:03 pm
by sanity
Well, I actually use preg_split, and I don't want to add a linebreak; I want to add text, and there's a lot more code to it. See here:

$contentTok = preg_split( '/([.\n?!:;]+)+/' , $contentLine , -1 , PREG_SPLIT_DELIM_CAPTURE );

foreach ( $contentTok as $thisNum => $thisLine )
{
    # Loops through each sentence.
    # Here I check whether it was only a new line or a new sentence (just some simple variable play).
    # If it's a new sentence I add the text; if it's only a new line I just add a <br> (pretty much the same as nl2br()).
}

So I'd need your regex to fit into that call.
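One way to fit zly's alternation into a `preg_split` call is to wrap it in a capturing group, pass `PREG_SPLIT_DELIM_CAPTURE`, and glue each captured delimiter back onto the text before it. This is a sketch under a few assumptions: `\n` is added to the final character class so line breaks also split, `;P` is included because it was in the original smiley list, and trimming plus skipping empty pieces stands in for the real per-sentence code, which isn't shown in the thread. The variable names are illustrative.

```php
<?php
// zly's alternation, longest alternatives first, with "\n" added to the
// final character class so line breaks also count as delimiters.
$pattern = '/(\.{3}|\?{3}|:\)|:P|:D|;\)|;P|;p|:p|[?.!;:\n])/';

$text = "Hello and welcome to my site! This site will be about everything about computers...\n"
      . "Does this seem like a good idea? Or does it not, heh :) Bye";

// With PREG_SPLIT_DELIM_CAPTURE the result alternates strictly:
// even indexes hold text, odd indexes hold the captured delimiter.
$tokens = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

$sentences = [];
for ($i = 0; $i < count($tokens); $i += 2) {
    $delim    = $tokens[$i + 1] ?? '';      // the last text piece has no delimiter
    $sentence = trim($tokens[$i] . $delim); // re-attach the delimiter
    if ($sentence !== '') {                 // skip the empty piece between '...' and "\n"
        $sentences[] = $sentence;
    }
}
print_r($sentences);
```

The empty piece between `...` and the following newline is where a "new line only" branch (the `<br>` case) could hook in.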

Posted: Wed Jan 16, 2008 9:08 pm
by sanity
Thanks a lot, zly.
I got it now:
#([:;=][|()/{}\[\]<>\\\odpsx]+|[.\n?!:;]+[\s]|[\n])#i

Thanks for the help.
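To see what the final pattern captures, here is a sketch that runs it through `preg_split` with `PREG_SPLIT_DELIM_CAPTURE` on the sample text from the first post. It assumes the `\\\o` in the character class is meant as a literal backslash followed by `o`. In a single-quoted PHP string that takes `'\\\\'`, and newer PHP versions (which use PCRE2) may reject a bare `\o` in a pattern.

```php
<?php
// The final pattern as a single-quoted PHP string. '\\\\' reaches PCRE as
// '\\', i.e. a literal backslash inside the smiley character class.
$pattern = '#([:;=][|()/{}\[\]<>\\\\odpsx]+|[.\n?!:;]+\s|\n)#i';

$text = "Hello and welcome to my site! This site will be about everything about computers...\n"
      . "Does this seem like a good idea? Or does it not, heh :) Bye";

// Captured delimiters stay in the result, alternating with the text pieces.
$tokens = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
var_export($tokens);
```

Because the middle branch requires whitespace after the punctuation, the captured delimiters include that whitespace (`"! "`, `"...\n"`), and punctuation at the very end of the string with nothing after it, such as a final `Bye.`, is not split off.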