Before we start, I have to tell you that this is the continuation of an older tutorial, which you can find here.
As you know, YARA is a tool aimed at helping malware researchers identify and classify malware samples.
The YARA tool helps you create descriptions of malware families based on textual or binary patterns.
These patterns, called rules, consist of a set of strings and a boolean expression which determines the rule's logic.
The YARA tool can be found on the official website.
First, you need to install the Python version of YARA (the yara-python module).
I used the yara-python-3.7.0.win-amd64-py3.5 version.
You need to use the Python 3.5.0 version, from here.
Let’s test the Yara python module:
python.exe
Python 3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import yara
>>> from yara import *
>>> dir(yara)
['CALLBACK_ABORT', 'CALLBACK_ALL', 'CALLBACK_CONTINUE', 'CALLBACK_MATCHES', 'CALLBACK_NON_MATCHES', 'Error', 'SyntaxError', 'TimeoutError', 'WarningError', 'YARA_VERSION', 'YARA_VERSION_HEX', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '__version__', 'compile', 'load']
>>> print(YARA_VERSION)
3.7.0
>>> print(YARA_VERSION_HEX)
198400
You can see that the Yara python module works well.
Let's make a rule and test it with a PDF file. This rule will tell us if the PDF comes with links.
The rule is a single file named detectpdflinks, saved at this path:
C:\BackUP\Tools\Python35\detectpdflinks
The source code of this rule is:
rule pdf_with_links
{
    meta:
        author = "Catalin George Festila"
        last_updated = "2017"
        category = "informational"
        confidence = "high"
        description = "A PDF that contains a link or external content"
    strings:
        $pdf_test = {25 50 44 46}
        $link_anchor_tags = "<a " ascii wide nocase
        $the_uri = /\(http.+\)/ ascii wide nocase
    condition:
        $pdf_test at 0 and (#link_anchor_tags == 1 or (#the_uri > 0 and #the_uri < 3))
}
Now we will write the Python script that uses this Yara rule against one PDF file:
python.exe
Python 3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import yara
>>>
>>> def mycallback(data):
...     print(data)
...     return yara.CALLBACK_CONTINUE
...
>>> rules = yara.compile("C:\\BackUP\\Tools\\Python35\\detectpdflinks")
>>> matches = rules.match('C:\\BackUP\\PDF\\lua.pdf', callback=mycallback)
{'matches': True, 'meta': {'confidence': 'high', 'last_updated': '2017', 'author': 'Catalin George Festila', 'category': 'informational', 'description': 'A PDF that contains a link or external content'}, 'rule': 'pdf_with_links', 'strings': [(0, '$pdf_test', b'%PDF'), (583970, '$the_uri', b'(http://www.cs.usfca.edu/galles)')], 'namespace': 'default', 'tags': []}
>>>
You can see the matched URI reported under the string named $the_uri.