I have the following 3 lines of text and need a single regular expression that captures only the data. The data can contain any characters. The word "test" in the first line can be any of (test1|test2|test3|test4).

1. test1 data- data  
1. data- data
data- data

Here's what I tried, but it is not working.

$remove_test_name = "(?:test1|test2|test3|test4)";
preg_match("/(?:\d+\.)? $remove_test_name (.*)\-(.*)/",$string,$matches);
echo $matches[1] . "-" . $matches[2];

The output I'm getting for line 1 (mentioned above) is:

:test1 data

for line 2

:data

for line 3

:

What I would like is this:

data: data

asked 09/12/2011 04:22

areyouready344 ♦♦


38 Answers:
I do not understand the example.  Can you please give us some real-world input and show us what you want to get out from it?  Thanks. ~Ray

Ray_Paseur

How about this?
<?php

	$source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

	$pattern = '#^\d*\.?\s*(?:test\d+)?\s*#m';

	$result = preg_replace($pattern, "", $source);

	echo $result;
?>

kaufmed

For example, I have this file that have the following lines...

note:
          test1 - could also be test2 or test3 or test4
          data - the data could be any characters except test1 or test2 or test3 or test4

1. test1 data- data
1. data-data
data- data

Using php regular expression to filter the lines above, I want the output to be

data- data

areyouready344

Perhaps I should change my pattern a tad then:

$pattern = '#^\d+\.\s*(?:test[1-4])?\s*#m';

kaufmed

Do you have spaces in the data-data fields?  I am asking because this looks like a made-up generalization and for better or worse regular expressions tend to be easier to write correctly when you have a few accurate examples of the inputs and the corresponding accurate examples of the desired outputs.

Example:
1. data-data is expected to yield data- data but do you really want to insert a blank after the hyphen?  It's these kinds of seemingly unimportant things that can cause a lot of debugging time.

Ray_Paseur

This is the closest answer I can get

preg_match("/(?:\d+\.)?\s?$remove_test_name?\s?(.*)-(.*)/",$string,$matches);


The only problem with this answer is that test1, test2, test3 or test4 still shows up as part of the capture ($matches[1]).

areyouready344

Building on kaufmed's code, this works for me:

Output is (source and result shown):
1. test1 data-data
1. data-data
data-data
------
data-data
data-data
data-data
<?php

  $source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

  $remove_test_name = "test[1-4]";
  $pattern = "#^(\d+\.?)?\s*(?:$remove_test_name)?\s*#m";

  $result = preg_replace($pattern, "", $source);

  echo $source."
------
";
  echo $result."
";
?>

TerryAtOpus

Can you provide an answer based on my question? I almost got it working

preg_match("/(?:\d+\.)?\s?$remove_test_name?\s?(.*)-(.*)/",$string,$matches);

The only problem with this answer is that test1, test2, test3 or test4 still shows up as part of the capture ($matches[1]).

areyouready344

Like this?
<?php

$string = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

$remove_test_name = "test[1-4]";
preg_match_all("#^(?:\d+\.?)?\s*(?:$remove_test_name)?\s*(.*)$#m",$string,$matches);
unset($matches[0]);
print_r($matches);

TerryAtOpus

For example, I have the following line of text

1. test1 dkdkd- dkdkd

This code gives me the following output:

code
------
preg_match("/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/",$string,$matches);

Output
---------
test1 dkdkd- dkdkd

Question - why is test1 showing up?

areyouready344

What are you outputting? (e.g. you shouldn't output $matches[0])

TerryAtOpus

I'm outputting like this:

echo $matches[1] . " - " . $matches[2];

areyouready344

Pretty much exactly that code works for me:

$string = "1. test1 dkdkd- dkdkd";
preg_match("/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/",$string,$matches);
echo $matches[1] . " - " . $matches[2];

Output:
dkdkd -  dkdkd

TerryAtOpus

If you have more than one space, you need a * instead of a ? after each \s:

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);

TerryAtOpus
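
A minimal sketch of the difference between \s? and \s*, run against the sample line from this thread with two spaces after the line number. The helper name capture_fields is made up for illustration and is not part of anyone's posted code.

```php
<?php
// Hypothetical helper: returns the two captured fields, or null when the pattern does not match.
function capture_fields(string $pattern, string $line): ?array
{
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

$line = "1.  test1 dkdkd- dkdkd"; // two spaces after "1."

// \s? consumes at most one space, so (?:test1)? is tried at the second space,
// matches nothing, and "test1" ends up inside the first capture group.
$single = capture_fields('/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/', $line);

// \s* consumes the whole run of whitespace, so (?:test1)? can match the label
// and keep it out of the capture.
$any = capture_fields('/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/', $line);

var_dump($single, $any);
```

With a single space after the line number both patterns behave the same, which may be why the same code appeared to work for one poster and not for the other.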

Let's go back to the problem definition.  Quote:

1. test1 data- data
1. data-data
data- data

Using php regular expression to filter the lines above, I want the output to be

data- data

Unquote.

This would seem to say you want to throw away the first two lines.  Can you please give us one or two real-world examples of the input strings with the corresponding output strings?

Ray_Paseur

Here is the output of the var_dump:


[0]=>
  string(177) "1.  test1 dkdkd: dkdkd"
[1]=>
  string(37) "test1 dkdkd"
[2]=>
  string(135) "dkdkd"

areyouready344

You are only going to receive good results here if you clearly describe the input data, the expected output, and any errors you may be encountering. I wrote my pattern based on the requirement you previously described, yet I still see subsequent posts from you with patterns that are completely different than what I posted. If my (or others') pattern does not work for you, then provide sample input, resulting output, and any errors you may be receiving.

kaufmed

In:

  string(177) "1.  test1 dkdkd: dkdkd"

There are 2 spaces after the 1. so my latest pattern should address that.

TerryAtOpus

Thanks Terry for understanding this problem. I was hoping your last resolution (\s*) would work.

areyouready344

Maybe it would be easier to get this right if you eat the elephant in bites instead of trying to write a single complicated regular expression.
http://www.laprbass.com/RAY_temp_notready.php

But that said, there is no substitute for test-driven programming.  And for that you need to write your test cases first.  Practice has shown this to be the fastest way to write dependable code.  You can write code without creating test data - heck, any idiot can learn to write bad code without testing.  But the pros would probably want unit tests for something like this little algorithm.
<?php // RAY_temp_notready.php
error_reporting(E_ALL);
echo "<pre>";

$strs = array
( '1. test1 dkdkd- dkdkd'
, '1. data-data'
, 'data- data'
)
;

$rgx1
= '#'              // REGEX DELIMITER
. '^\d\.'        // STARTS WITH A DIGIT AND A DOT
. '#'              // REGEX DELIMITER
;

$rgx2
= '#'              // REGEX DELIMITER
. '^test\d?'      // STARTS WITH 'test' AND MAYBE A DIGIT
. '#'              // REGEX DELIMITER
;

foreach ($strs as $str)
{
    $new = $str;
    $new = trim(preg_replace($rgx1, '', $new)); // EMPTY STRING REPLACEMENT; NULL IS DEPRECATED IN PHP 8.1+
    $new = trim(preg_replace($rgx2, '', $new));
    echo PHP_EOL . "$str TRANSFORMED INTO $new";
}

Ray_Paseur
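
The "write your test cases first" advice above can be sketched as a small table of inputs and expected outputs. This is only an illustration: the function name strip_prefix and its pattern are assumptions, and the cases are taken from the sample lines posted in this thread.

```php
<?php
// Hypothetical function under test: strips an optional "N." prefix and an
// optional test1..test4 label from the start of a line.
function strip_prefix(string $line): string
{
    return preg_replace('/^(?:\d+\.)?\s*(?:test[1-4])?\s*/', '', $line);
}

// Test cases written as input => expected output, before touching the regex.
$cases = [
    '1. test1 dkdkd- dkdkd'  => 'dkdkd- dkdkd',
    '1.  test1 dkdkd- dkdkd' => 'dkdkd- dkdkd', // two spaces after the number
    '1. data-data'           => 'data-data',
    'data- data'             => 'data- data',
];

$failures = 0;
foreach ($cases as $input => $expected) {
    $actual = strip_prefix($input);
    if ($actual !== $expected) {
        $failures++;
        echo "FAIL: '$input' gave '$actual', expected '$expected'", PHP_EOL;
    }
}
echo $failures === 0 ? "all cases pass" : "$failures case(s) failed", PHP_EOL;
```

Adding the two-space line as its own case is the kind of detail that would have surfaced the whitespace problem discussed earlier in the thread.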

Thanks Terry for understanding this problem

Was that a jab? Please allow me to counter with a 1-2...

<?php

	$source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

	$pattern = '#^\d+\.\s*(?:test[1-4])?\s*#m';

	$result = preg_replace($pattern, "", $source);

	$result = explode(PHP_EOL, $result);

	print_r($result);
?>


Pity it won't be used though. Can't expect everyone to appreciate one's logic, I guess  :

kaufmed

Good luck Terry and Ray...  I'm hanging up the gloves  = )

kaufmed

@kaufmed: Spot on.

@areyouready344: Computer programming is an activity that requires clarity of thought and precision in execution.  You have to get the ideas into data and code in a way that will behave predictably.  In this matter, PHP is not your friend at all because it is highly permissive of sloppy programming and it hides important things from the programmer, such as accidental reliance on undefined variables.  If PHP is your only programming language you might want to consider studying something a little more structured.  And don't be impatient with yourself as you learn.  Rome was not built in a day.  This article explains what you are up against.
http://norvig.com/21-days.html

Anyway, you've gotten some working answers and hopefully some good ideas about how this kind of thing is usually done when time is money and accuracy matters.  Best of luck with your project, ~Ray

Ray_Paseur

Me, too.  Over and out, ~Ray

Ray_Paseur

kaufmed, I agree that a replace is probably a more elegant solution, and avoids mucking around with an array as a result. The author didn't seem comfortable to change much from his original code though - either will work in the long run.

I often feel it's overkill with 2-3 experienced experts working on the same problem, but thanks to my timezone I'm not awake to answer most of the EE questions, so I have to be pretty competitive to pick up some points (even if it feels like I'm butting in on another expert's progress at times!)... on the positive side though, the competition sharpens both my technical skills and my ability to interpret and explain. So thanks, to you and Ray!

TerryAtOpus

The name of this website says Expert so I was hoping an expert can look at the problem from my point of view.

When I use the following code below it works:


preg_match("/(?:\d+\.)?\s*test1?\s*(.*)-(.*)/",$string,$matches);
echo $matches[1] . " " . $matches[2];

against this line:

1. test1 dkdkd- dkdkd

output:

dkdkd- dkdkd

Why does the code below not work? The only difference is that I don't use the () around test1 in the example above and I do use it in the example below.

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);

output is:

test1 dkdkd- dkdkd

Why is test1 being displayed in the second example?



areyouready344

In your first piece of code, with the ? after test1, the ? makes the 1 optional so that test or test1 would be matched.

The second piece of code makes the entire test1 optional, but doesn't capture it. I just retested that second piece of code with your given example, and it works fine for me - test1 is not displayed. Can you please retest? If it still fails, then is it possible there is a special (invisible) character included in the string?

TerryAtOpus
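
A short sketch of the quantifier scoping described above: a ? applies to the single token before it unless that token is a group. The variable names are chosen for illustration only, and the patterns are anchored so each result depends on the whole string.

```php
<?php
// The ? quantifier binds to the single preceding token unless a group is used.
$optionalDigit = '/^test1?$/';      // "test" is required, only the "1" is optional
$optionalWord  = '/^(?:test1)?$/';  // the whole word "test1" is optional

$results = [
    'digit_test'  => preg_match($optionalDigit, 'test'),   // matches: the "1" may be absent
    'digit_test1' => preg_match($optionalDigit, 'test1'),  // matches
    'digit_empty' => preg_match($optionalDigit, ''),       // no match: "test" is required
    'word_test'   => preg_match($optionalWord, 'test'),    // no match: all of "test1" or nothing
    'word_test1'  => preg_match($optionalWord, 'test1'),   // matches
    'word_empty'  => preg_match($optionalWord, ''),        // matches: the group may be absent
];

print_r($results);
```

In the unanchored pattern from the question, test1? makes "test" mandatory, so lines without a test label would not match at all, while (?:test1)? lets them through.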

@TerryAtOpus
The author didn't seem comfortable to change much from his original code though - either will work in the long run.

You and I both know this site is as much about deciphering what problems are as it is posing solutions to said problems. The only things that frustrate me here are when people don't clearly express themselves and when someone ignores a potential solution without so much as questioning the logic or why it may be better or worse than one's own approach--or even saying, "hey, I need to do it this way because [fill-in-the-blank]." I have zero problem with explaining any of my posts; I often neglect an explanation because I find often times people just want a "get-er-done" approach rather than a "teach a man to fish" approach. I know Ray's seen this; I'd have trouble believing you haven't seen it. If someone doesn't understand my approach, all I request is they ask me to explain. Since all of us here are volunteers (i.e. we don't get paid), I think it's a small price to pay to say, "Hey, I didn't quite understand why you went that way. Would you mind clarifying for me?" Also, I LOVE being called out when my logic is incorrect. It gives me a chance to learn from my mistakes...  and I'm wrong quite often. Hell, you've corrected me on a number of occasions, and I love you that much for it (totally platonic, I assure you). I'm here to learn just as much as I am to teach.

Terry, I'm not harping on you... I'm just using you as my soapbox for the moment. Hope ya don't mind  ; )


The name of this website says Expert so I was hoping an expert can look at the problem from my point of view.

Read my profile. I don't claim to be an expert. I only hope that others who have worked with me can say, "yeah, he knows his stuff." If they are kind enough to bestow the moniker of "expert" upon me, then I thank them for it. Those who tout their own expertise have a need for self-satisfaction. I am completely satisfied with who I am.

kaufmed

That last bit wasn't for you Terry. Well, none of it was really.

kaufmed

I resolved the problem. The problem was that I had two spaces between the line number and test1. I had been using the following code:

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);

The problem was with this part of the code. It seemed to be non-greedy, so it stopped at the first whitespace.

preg_match("/(?:\d+\.)?\s*

I changed the code above to the code below and it now works...

preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);


areyouready344

That doesn't make sense. I have never seen a regex engine work as you describe so I suspect you may have made a mistake in your logic somewhere, or maybe you have a version of php that has a bug?

This:
preg_match("/(?:\d+\.)?\s*
is not non-greedy, with respect to matching spaces.

This is non-greedy (by adding a ? after the \s*):
preg_match("/(?:\d+\.)?\s*?

This:
preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);
requires 2 space characters. Does it match your other cases? ie:
[1]=>
  string(37) "test1 dkdkd"
[2]=>
  string(135) "dkdkd"


TerryAtOpus
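
A minimal sketch of greedy versus non-greedy whitespace matching, using the two-space sample line from earlier in the thread. It assumes the non-greedy form \s*? was what had been in use, which the thread does not confirm at this point.

```php
<?php
$line = "1.  test1 dkdkd- dkdkd"; // two spaces after "1."

// Greedy \s* takes both spaces, so (?:test1)? can match right after them.
preg_match('/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/', $line, $greedy);

// Lazy \s*? first tries to match nothing; (?:test1)? then also matches nothing,
// the second \s* absorbs the spaces, and "test1" lands in capture group 1.
preg_match('/(?:\d+\.)?\s*?(?:test1)?\s*(.*)-(.*)/', $line, $lazy);

echo $greedy[1], "|", $greedy[2], PHP_EOL;
echo $lazy[1], "|", $lazy[2], PHP_EOL;
```

The lazy version still produces a successful match, so the engine never has to backtrack and give \s*? more characters; that is why the label silently ends up in the capture instead of causing a failure.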

All I can tell you is that everything is working after changing \s*? to \s\s. I tested with Perl and PHP; both were not working and both are working now. Thanks Terry for mentioning that an extra character might be hiding somewhere.

areyouready344

Did you have the non-greedy version:
\s*?
because that would explain why it didn't work. It should have been:
\s*

Anyway, glad you got it working!

TerryAtOpus

I still had a problem with this:

preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);

But changed it to this and everything is working:

preg_match("/(?:\d+\.)?(?:\s*)?(?:test1)?(?:\s*)?(.*)-(.*)/",$string,$matches);

Thanks for all your help Terry....

areyouready344

I've requested that this question be closed as follows:

Accepted answer: 0 points for areyouready344's comment http:/Q_27304152.html#36526890

for the following reason:

best solution

areyouready344

It's been a few weeks since I looked at this, and trying to go back over the trail of logic in this question makes my head spin. However, it's pretty clear to me that the author's final solution was based on code I provided, and my code was based on kaufmed's code. Even if the author's comment is accepted as the solution, both kaufmed and I deserve some points for helping the author along the way, thus I object to the closing of the question this way (to the author: did you know you can accept multiple comments when closing the question?). Thanks...

TerryAtOpus

All,
 
Following an 'Objection' by TerryAtOpus (at http://www.qa.downappz.com/Q_27399445.html) to the intended closure of this question, it has been reviewed by at least one Moderator and is being closed as recommended by the Expert.
 
At this point I am going to re-start the auto-close procedure.
 
Thank you,
 
SouthMod
Community Support Moderator

SouthMod

Asked: 09/12/2011 04:22

Seen: 301 times

Last updated: 11/02/2011 09:16