Parsing Logs with Perl – Line Feeds in the String

Windows… oh Windows please tell me why you have ruined my day today? Today I was working on logging password failure attempts for SIP registrations and came across a weird little bug where upon printing the log the log was being overwritten with text after a variable. This drove me nuts, possibly driving all of #perl nuts on freenode.

Example sanitized dataset

[9] 20151210140858: Resolve 6764: aaaa udp 1.2.3.4 5060
[9] 20151210140858: Resolve 6764: a udp 1.2.3.4 5060
[9] 20151210140858: Resolve 6764: udp 1.2.3.4 5060
[5] 20151210140858: SIP Tx udp:1.2.3.4:5060:
SIP/2.0 401 Authentication Required
v: SIP/2.0/UDP 1.2.3.4:5060;branch=z9hG4bK641f.daf89114.0
v: SIP/2.0/UDP 1.2.3.4:5060;received=1.2.3.4;rport=34822;branch=abcd1234
f: <sip:[email protected]>;tag=abcd1234
t: <sip:[email protected]>;tag=abcd1234
i: abcd1234
CSeq: 40838 REGISTER
User-Agent: PBX/1.2.3
Warning: 399 1.2.3.4 Password does not match
l: 0

First attempt at parsing the log thinking it would just work

#!/usr/bin/perl
use strict;
use warnings;
use Sys::Hostname;
 
# Log to parse
my $log = '/var/log/pbx.log';
 
# Fix hostname to look like /var/log/messages
my $hostname = hostname;
$hostname =~ s/\.\S+$//;
 
# Open files for logging
open my $fh, '<', $log or die "Error: $!\n";
 
# Set input separator record
local $/ = "";
while ( my $line = <$fh> ) {
    if ( $line =~ /
        (\d{14}):\sSIP.*          # Get Timestamp
        received=(\S+);rport.*    # Get IP
        t:\s<sip:(\S+)@\S+>;tag.* # Get User
        Warning:(.*?)\n           # Get Error
    /xsm ) {
    print $line;
    my $timestamp = $1;
        my $ip = $2;
        my $user = $3;
        (my $error = $4) =~ s/^\s+\d+\s\S+\s+//;
    print "$timestamp $hostname pbxf2b: $error from $ip for user $user\n";
    }
}
close $fh;
The output from above script
[[email protected] ~]# perl parser.pl
from 1.2.3.4 for user user Password does not match
from 1.2.3.4 for user user Password does not match
from 1.2.3.4 for user user Password does not match
from 1.2.3.4 for user user Password does not match
from 1.2.3.4 for user user Password does not match
from 1.2.3.4 for user user Password does not match

So all of a sudden my $timestamp and $hostname variables were gone and my error was at the end of my string; and let me tell you this was infuriating. The issue here apparently since I’ve almost always worked in a all Linux environment was that the logs had some mysterious WINDOWS characters in the strings that were wonking up all of my code. This is easily fixed but first lets take a bit of a journey down hexdump lane to confirm the issue.

Here’s an ASCII table for your enjoyment:
http://www.bluesock.org/~willg/dev/ascii.html

So looking up the carriage return hex symbol was pretty simple I am looking for 00x0d.

[[email protected] ~]# perl parser.pl | head -9 xxd
0001c50: 3435 3135 0d0a 693a 2035 6531 3339 6530  4515..i: 5e139e0
0001ca0: 6961 2d50 4258 2f35 2e33 2e32 0d0a 5761  PBX/5.3.2..Wa...
0001cd0: 6573 206e 6f74 206d 6174 6368 0d0a 6c3a  es not match..l:
0001ce0: 2030 0d0a 0d0a 0a32 3031 3531 3231 3031   0.....201512101
0001f60: 3631 6235 3138 6130 3337 6630 0d0a 4353  61b518a037f0..CS
0001f80: 4552 0d0a 5573 6572 2d41 6765 6e74 3a20  ER..User-Agent:
0002120: 7175 6972 6564 0d0a 763a 2053 4950 2f32  quired..v: SIP/2
0002260: 0d0a 4353 6571 3a20 3430 3833 3820 5245  ..CSeq: 40838 RE
0002270: 4749 5354 4552 0d0a 5573 6572 2d41 6765  GISTER..User-Age

If you look closely you will see tons of line feeds littering my input and poisoning my regex. So now that we have confirmed our suspicions and know the source of the problem we have a simple fix, strip all of those characters:

$error =~ s/\r//g;

And now parsing is working as expected! Yay.

Write a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.