Skraypar: Pattern parsing with Iterators and Look Aheads

You’ll often be told not to parse HTML with RegEx – but what if you’re a rebel?

WHY YOU SHOULDN’T PARSE HTML WITH REGEX

Clicky.

WHY YOU COULD PARSE WITH REGEX

Parsing static templates with RegEx is fairly simple. The basic approach is to match the line that holds the data you want, then either add capture groups to the RegEx or get your hands dirty and clean the data out of that abhorrent line of HTML by hand.
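To make the capture group idea concrete, here is a minimal sketch in Python (not PHP, and not code from Jikan). The HTML line, the pattern and the function name are all made up for illustration.

```python
import re

# Hypothetical line of HTML from a static template (not from any real page).
line = '<span class="score">8.75</span>'

# The capture group (the parentheses) pulls out just the value between the tags.
SCORE_PATTERN = re.compile(r'<span class="score">([\d.]+)</span>')


def extract_score(html_line):
    """Return the score as a float, or None if the line does not match."""
    match = SCORE_PATTERN.search(html_line)
    return float(match.group(1)) if match else None


score = extract_score(line)
```

This only holds up while the template stays the same; if the markup of that line changes, the pattern has to change with it.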


I built a successful RESTful service, Jikan.moe, using nothing but RegEx. It didn’t require any extra dependencies or libraries, and speed wasn’t a concern either since the parsing was pretty quick.


 

 

What am I going on about?

Enter: Skraypar

Terrible name aside, I built it to simplify the repetitive tasks involved in parsing HTML with RegEx: pattern matching, loops, and so on.

Skraypar is an abstract PHP class that parses using pattern matching, Iterators and Look Aheads.

The parsing tasks split into two:

  • (Inception) Pattern matching and a callback on the matched line: Iterators
  • Additional pattern matching and callbacks within Iterators, for data at a dynamic location in the HTML: Look Aheads

 

Think of it as one Iterator matching a table, another Iterator matching the rows, and the Look Aheads parsing the cells.

This is a pretty abstract and experimental project, and I won’t blame you if you think I’ve gone mad. But heck, finding new ways to do things is one thing I like to do.

 

How does it work?

1 – File Loading

Skraypar uses Guzzle as a dependency to fetch the HTML; if it’s a local file, it simply loads it. The file is loaded into an array, with each line at its own index.
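As a rough illustration of the loading step, here is a Python sketch that uses the standard library's urllib where Skraypar uses Guzzle. The function name load_lines and the URL check are my assumptions, not Skraypar's actual code.

```python
from urllib.request import urlopen


def load_lines(source):
    """Load a URL or a local file path into a list with one entry per line."""
    if source.startswith(("http://", "https://")):
        # Remote source: fetch the body over HTTP.
        with urlopen(source) as response:
            text = response.read().decode("utf-8", errors="replace")
    else:
        # Local source: just read the file.
        with open(source, encoding="utf-8") as handle:
            text = handle.read()
    # Each line of the document becomes its own index in the list.
    return text.splitlines()
```

Everything after this step works on that list of lines, which is why later steps talk about offsets and incrementing a count.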

1B – Accepting & Rejecting

Fetching from the interwebs means you get to tell Skraypar which HTTP responses to allow and which ones to throw an exception at. By default, 200 (OK) and 303 (See Other) are accepted HTTP responses.
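The accept/reject idea boils down to a whitelist check. A minimal Python sketch, where the exception name, the function name and the accepted parameter are hypothetical and only the default codes (200 and 303) come from the text above:

```python
class UnacceptedStatus(Exception):
    """Raised when the response status is not in the accepted set."""


# Defaults as described above: 200 and 303.
DEFAULT_ACCEPTED = frozenset({200, 303})


def check_status(status, accepted=DEFAULT_ACCEPTED):
    """Return the status if it is accepted, otherwise raise an exception."""
    if status not in accepted:
        raise UnacceptedStatus(f"HTTP {status} is not an accepted response")
    return status
```

Passing a different set, for example check_status(404, accepted={200, 404}), is how a caller would allow extra responses in this sketch.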

2 – Rules

When you extend Skraypar with your own class, you have to define a method named loadRules, which adds the rules for Skraypar to remember when parsing.


Rules are patterns paired with callback functions for that pattern match. Each rule is checked against every line of the file, and once a rule matches and its callback executes, that particular rule is disabled.
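Here is a small Python sketch of how rules like this could behave. The Rule class and run_rules function are hypothetical names for illustration and not Skraypar's real API; the only behaviour taken from the text is "check every line, fire the callback on a match, then disable the rule".

```python
import re


class Rule:
    """A pattern paired with a callback; disabled after its first match."""

    def __init__(self, pattern, callback):
        self.pattern = re.compile(pattern)
        self.callback = callback
        self.enabled = True


def run_rules(lines, rules):
    """Check every enabled rule against every line, in order."""
    for index, line in enumerate(lines):
        for rule in rules:
            if not rule.enabled:
                continue
            match = rule.pattern.search(line)
            if match:
                rule.callback(match, index)
                rule.enabled = False  # a rule only ever fires once


result = {}
lines = ["<html>", "<title>First</title>", "<title>Second</title>"]
rules = [
    Rule(r"<title>(.*?)</title>",
         lambda match, index: result.setdefault("titles", []).append(match.group(1))),
]
run_rules(lines, rules)
# result is now {"titles": ["First"]}; the second <title> line is ignored
# because the rule was disabled after its first match.
```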


3 – Iterators

Iterators are used inside Rule callbacks. You set a breakpoint pattern and a callback, and the Iterator loops over each line, executing pattern matches or Look Aheads, until the breakpoint pattern is reached.

If the breakpoint pattern is never found, the Iterator keeps incrementing past the end of the array of lines, and Skraypar throws an exception saying the parser failed, pointing to an unset offset.

There can be Iterators within Iterators.
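A Python sketch of the breakpoint idea, including one Iterator nested inside another. The iterate function, the ParserError exception and the convention that a callback may return the next offset are all assumptions made for this example, not Skraypar's actual design.

```python
import re


class ParserError(Exception):
    """Raised when the breakpoint pattern is never reached."""


def iterate(lines, start, breakpoint_pattern, callback):
    """Run callback(lines, index) on each line until the breakpoint matches.

    The callback may return an index to jump to; otherwise the loop moves
    on by one line. Returns the index of the breakpoint line.
    """
    index = start
    while True:
        if index >= len(lines):
            raise ParserError("breakpoint pattern was never matched")
        if re.search(breakpoint_pattern, lines[index]):
            return index
        jump = callback(lines, index)
        index = jump if jump is not None else index + 1


lines = ["<table>", "<tr>", "<td>A</td>", "</tr>",
         "<tr>", "<td>B</td>", "</tr>", "</table>"]
rows = []


def on_table_line(lines, index):
    """Outer callback: start an inner iterator for every row."""
    if "<tr>" not in lines[index]:
        return None
    cells = []

    def on_row_line(lines, i):
        match = re.search(r"<td>(.*?)</td>", lines[i])
        if match:
            cells.append(match.group(1))

    row_end = iterate(lines, index + 1, r"</tr>", on_row_line)
    rows.append(cells)
    return row_end + 1  # continue after the closing </tr>


table_end = iterate(lines, 1, r"</table>", on_table_line)
# rows is now [["A"], ["B"]] and table_end is 7
```

The outer call stops at the table's closing tag, while each inner call stops at the row's closing tag and hands the next offset back to the outer loop.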

4 – Look Aheads

Look Aheads are used inside Iterators. Usually, once a line matches a pattern, you could simply access the data on the next line by incrementing the iterator count by 1. But in some cases the data isn’t on the next line and sits, say, 2 lines further down. The location of the data is dynamic, so a Look Ahead searches the following lines for a pattern matching that data and parses it with a callback function.
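A minimal Python sketch of a Look Ahead. The look_ahead function, its max_lines limit and the sample lines are assumptions for illustration; the text above does not say how far Skraypar actually searches.

```python
import re


def look_ahead(lines, index, pattern, callback, max_lines=5):
    """Search the lines after `index` for `pattern`.

    Runs the callback on the first match and returns its result, or
    returns None if nothing matches within `max_lines` lines.
    """
    for offset in range(1, max_lines + 1):
        position = index + offset
        if position >= len(lines):
            break
        match = re.search(pattern, lines[position])
        if match:
            return callback(match)
    return None


# The value is 2 lines below the label here, but could just as well be 1.
lines = ['<span class="label">Episodes:</span>', '', '  24', '</div>']
episodes = look_ahead(lines, 0, r"^\s*(\d+)\s*$", lambda m: int(m.group(1)))
# episodes is 24
```

The caller no longer needs to know the exact offset of the data, only what the data looks like.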

5 – References

Everything is passed, controlled and set by reference within the Iterator callables. You can pass a reference to the Iterator itself into its own callable to set responses, use the Look Ahead method of the Iterator class, or manually set the iterator count property to an offset.
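To show what passing the Iterator into its own callable buys you, here is a simplified Python sketch. Python hands the object itself to the callback, which plays the role of the PHP reference here. The LineIterator class and its count and response attributes are hypothetical stand-ins, and this version leaves out breakpoints to stay short.

```python
class LineIterator:
    """Simplified iterator that passes itself to its callback."""

    def __init__(self, lines):
        self.lines = lines
        self.count = 0      # current line offset; callbacks may change it
        self.response = {}  # parsed values collected by callbacks

    def run(self, callback):
        while self.count < len(self.lines):
            before = self.count
            callback(self)  # the iterator hands itself to the callable
            if self.count == before:
                self.count += 1  # advance unless the callback moved the offset


def collect(iterator):
    """Store the line after each 'Label:' line and skip past both."""
    line = iterator.lines[iterator.count]
    if line.endswith(":") and iterator.count + 1 < len(iterator.lines):
        iterator.response[line[:-1].lower()] = iterator.lines[iterator.count + 1]
        iterator.count += 2  # manually set the offset


iterator = LineIterator(["Title:", "Cowboy Bebop", "Type:", "TV"])
iterator.run(collect)
# iterator.response is now {"title": "Cowboy Bebop", "type": "TV"}
```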


That’s pretty much it. This project is in development and will be used as a dependency for the next major Jikan release. It’s not limited to Jikan; it can be used on any website or file.

 

No documentation is available at the moment.



Playing with Browser Extensions

So I had to test out the usability of my REST API, and what better way than developing an app that uses it? Of course, being limited in app development skills, I turned to something easier that I, at least, have some skills in: browser addons/extensions.

One thing I learned was that developing a browser addon is mostly the same as making a web application, except you need a manifest file and the browser has to package it into a proper extension. That’s pretty much it.

Enter: Anime Info

The name is so generic because I had thought of releasing it into the wild market, free to download and use, so that it would be easily searchable and get SEO points. But it’s been about a month: Opera’s market takes almost forever to review an addon, Chrome and Firefox have you make an initial payment before you can put addons/apps on their markets, and I don’t have a card available at the moment, so ¯\_(ツ)_/¯.


I don’t plan on updating the addon. It was merely for searching up Anime and viewing it, so I could get the gist of browser addon development and see how my REST API worked with it. The conclusion was pretty nice.

The whole project is open-source and available on Github: https://github.com/irfan-dahir/anime-info

Designed with Material Design in mind and developed with speed in mind, it’s probably the only addon out there for its purpose. (No kidding, I couldn’t find anything remotely close.) Sure, it could be updated to let users log in, update their Anime and Manga lists, and view and search Manga, but again, nah.

DAY 5 – ‘Comments’ | DECEMBER WEB DESIGN CHALLENGE

I actually lost the challenge. Totally forgot to make a design on the 28th. This should’ve been Day 6, but ah well. In this case, what I’ll simply do is push it a day extra, so the challenge would end on the 3rd of Jan instead. I will continue the challenge.

So, this is a post/comment based component. Something you’d see on any social media. It does look a little similar to Facebook but that’s just your imagination. 😋

Demo | Download | In The Making Timelapse (Coming)

I can confidently say that with this one I’ve managed to combine the use of FlexBox and Grid and go up a level! 🧙‍♂️ It’s amazing how they both can work together.

Specification

DAY 4 – ‘Plans’ | DECEMBER WEB DESIGN CHALLENGE

I struck off the 24 hour limit as I’ve been insanely busy during the past 2 days. But I managed to complete this design before morning (6.30am) so I won’t be considering it a “next day” yet for the sake of not ruining my own challenge. 🤔

Demo | Download | In The Making Timelapse

 

This design, again, was done as fast as I could with little planning, as I woke up at around 4.00am and began at 4.30~. So it took me about an hour and a half to complete this one.

I used FontAwesome 5 for the icons and picked out a gradient color from uigradients.com.

 

Specification

Colors used are:

  • Primary Gradient (#ff9966)
  • Secondary Gradient (#ff5e62)
  • White n’ Black

As with the last 2 designs, Open Sans was used for the typeface.

DAY 3 – ‘Me’ | DECEMBER WEB DESIGN CHALLENGE

So this design was done in the messiest way possible. I screwed up a lot given I was unavailable the whole day to work on anything, then sleepily designed something as simple as I could within 2 hours.

 

Demo | Download | In The Making Timelapse

 

Given I almost dozed off while coding, I messed up the file system and lost all the source files at the end. Thankfully, I had a tab open in the browser, so I saved the page directly from there and then, as messily as I could, pushed it to the repo.

There’s nothing fancy, just a static design without any responsive or feedback details given the amount of time I had available.

 

I got the display picture from uifaces.com, it belongs to @AdhamDannaway.

 

There’s a little effect where the background is black and it fades in once the page has loaded completely. Obviously something broke since I made a static save from the browser and it’s now not working. Don’t really want to touch it either.

 

Specification

Colors used are simply black and white.

There is no responsiveness given the amount of time I had, but it shouldn’t do badly on devices since it’s grid-made and sticks to the center.

DAY 2 – ‘Display’ | DECEMBER WEB DESIGN CHALLENGE

Product modals aren’t really my strong point. This was a pretty good time to change that.

A clean, “pastel-feel”, “blue-ish” look with slight touches of Material Design brings light to “Display”, the refreshingly sleek product modal.

 

With today’s design, I managed to get a tighter grip on CSS Grids & some Flexbox. Indeed, it really makes things much easier once you’ve managed to comprehend the gist of it.

 

Demo | Download | In The Making Timelapse (Youtube)

 

The Apple smartwatch mockups were taken from Freepik. The licenses for the assets used are available here.

Specification

Colors used:

  • Good ol’ white (#fff)
  • Background (#94B4DD)
  • Primary Color (#779CF1)
  • Shades of Black

Font: Open Sans

 

And that concludes Day 2.

Day 1 – ‘Authenticate’ | December Web Design Challenge

And we start off Day 1 with a login design component: a rich yet simple page focused on a quick login, with content available to its left.

I first had it use a pastel blue color, which looked completely off on mobile devices and made the white text hard to read. I then switched to a darker shade of that color, which looks nice not only on desktop but on mobile devices as well.

 

Demo | Download | In The Making Timelapse (Youtube)

 

The logo and the pattern at the bottom are taken from freepik.com. The licenses for the assets used are available here.

 

Specification

There are 3 colors used:

  • White (#fff)
  • Pastel Turquoise (#8ECCC8)
  • Nearly Black (#4C4C4C)

Font: Montserrat

 

It took more than the average time to make this one thanks to the learning curve of CSS Grids. I’ve got the hang of the column and row components so far, since nothing too complex was done.

 

This concludes Day 1.