I find Python extremely useful in many scenarios. Some of the work I do involves a lot of manual effort and consumes a lot of time, so I slowly started experimenting with Python here and there to automate a few of the things I work with.
One case where Python is very useful for me is parsing large text files. We often need to parse a huge file to pull out data, or parse a log file to create reports. In all these cases Python comes in very handy. I will share a small snippet below to show how powerful Python is with its regular expression module.
We need to identify the pattern we want to track down and create a regular expression for it. We can write a template-style snippet in which only the regular expression needs to change, so the code can be reused.
# import statements
import os
import re
import sys


def main(argv):
    """Hold the base logic and validation.

    This function parses the given file to pick out the specified pattern.
    """
    if len(argv) == 1:
        path = argv[0]
        if os.path.isfile(path):
            with open(path, 'r') as fh:
                data = fh.read()
            # Capture the file name between "[error] (105)" and " -Permission denied".
            # "." does not match newlines by default, so each match stays on one line.
            matches = re.findall(r'\[error\] \(105\)(.*?) -Permission denied', data)
            print(matches)
        else:
            print('provide a valid file')
    else:
        usage()


def usage():
    """Print the usage of the script when the input is missing or not as expected."""
    print('Usage: python crawler.py path')


# boilerplate template - it invokes the main function
if __name__ == "__main__":
    main(sys.argv[1:])
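The script above can be pushed a little further toward the reusable template idea. The following is only a sketch, not part of the original script: find_matches and PERMISSION_DENIED are names I made up for illustration. It takes the pattern as a parameter and walks the input one line at a time, which should help when the file is too large to read in one go.

```python
import re


def find_matches(lines, pattern):
    """Yield the first captured group of every match of `pattern` in `lines`.

    `lines` can be any iterable of strings, including an open file object,
    so a huge log is read one line at a time instead of all at once.
    """
    regex = re.compile(pattern)
    for line in lines:
        for match in regex.finditer(line):
            yield match.group(1)


# The pattern from the script above; swap it to track down something else.
PERMISSION_DENIED = r'\[error\] \(105\)(.*?) -Permission denied'

# Two lines copied from the sample log, used here instead of a real file.
sample = [
    '[Sun Mar 7 17:21:44 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed\n',
    '[Sun Mar 7 17:23:53 2004] [error] (105)sample.txt -Permission denied\n',
]
print(list(find_matches(sample, PERMISSION_DENIED)))  # prints ['sample.txt']
```

With a real log, the list of strings would be replaced by a file object opened with open(path), and the pattern must contain at least one capturing group for match.group(1) to work.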
Below is the sample file content I used as input:
[Sun Mar 7 16:05:49 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 16:45:56 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 17:13:50 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 17:21:44 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 17:23:53 2004] [error] (105)sample.txt -Permission denied
[Sun Mar 7 17:27:37 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 17:31:39 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 17:58:00 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:00:09 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:10:09 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:19:01 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:42:29 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:52:30 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:58:52 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 19:03:58 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 19:08:55 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 19:22:11 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 19:31:25 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 17:23:53 2004] [error] (105)template.txt -Permission denied
[Sun Mar 7 18:42:29 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:52:30 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:58:52 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 19:03:58 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 19:08:55 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 19:22:11 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 19:31:25 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 17:23:53 2004] [error] (105)example.txt -Permission denied
[Sun Mar 7 18:00:09 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:10:09 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:19:01 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:42:29 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 18:52:30 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed
Below is the output I got:
['sample.txt', 'template.txt', 'example.txt']
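The same approach can feed a small report. As a rough sketch, assuming every line follows the "[timestamp] [level] message" layout of the sample above, named groups can split a line into its parts and a Counter can tally how often each level appears. The names LINE and count_levels are hypothetical, chosen only for this example.

```python
import re
from collections import Counter

# Assumed line layout, based on the sample log: "[timestamp] [level] message"
LINE = re.compile(r'^\[(?P<timestamp>[^\]]+)\] \[(?P<level>\w+)\] (?P<message>.*)$')


def count_levels(lines):
    """Return a Counter mapping each log level to the number of lines seen."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.rstrip('\n'))
        if match:  # lines that do not fit the assumed layout are skipped
            counts[match.group('level')] += 1
    return counts


# Three lines copied from the sample log.
sample = [
    '[Sun Mar 7 17:21:44 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed',
    '[Sun Mar 7 17:23:53 2004] [error] (105)sample.txt -Permission denied',
    '[Sun Mar 7 17:27:37 2004] [info] (104)Connection reset by peer: client stopped connection before send body completed',
]
for level, count in sorted(count_levels(sample).items()):
    print('%s: %d' % (level, count))
# prints:
# error: 1
# info: 2
```

The timestamp and message groups are not used in this tally, but they are there if the report needs to group by hour or by error text.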
Links for reading more about Python regular expressions:
https://developers.google.com/edu/python/regular-expressions
https://docs.python.org/2/library/re.html
Thanks for reading
Cheers!