How I Dumped My Entire Careem Rides Data


I use Careem. A lot. Until earlier this year, I used it most for booking rides to and from college. It’s one of the most affordable modes of transport in Pakistan right now.

For those who don’t know what Careem is, it’s a ride hailing service like Uber.

As a fan of data, I really wanted to know how much I’d spent, how far I’d traveled, and so on, over the period I’ve used Careem. Unfortunately, Careem doesn’t offer a public API. There is something along those lines, but you have to contact them for access, and I haven’t received a reply yet. 😦

With my experience in building a successful REST API service, I decided to reverse engineer their web dashboard for users. This dashboard allows you to book rides online and shows you your ride history.

[Screenshot: Careem’s web dashboard]

My initial thought was web scraping, but that would have required a scraper that authenticated through their web form and bypassed their CSRF protection and that hidden Google reCAPTCHA. If I could do that by passing around headers and sessions, I could have developed an unofficial API, but ain’t nobody got time fo dat. What I really wanted was my rides data.

One way to do so is to go to the rides page, intensely click “Load More” until enough rides have loaded, then “select all” and click “Export as CSV”.

[Screenshot: the rides page with the “Export as CSV” option]

This method wasn’t turning out to be efficient; I needed something much faster. And I found something really interesting that output a lot more data than what they show on the frontend.

Careem’s frontend, like any other, gets its data from their server via GET or POST requests. I opened the handy Chrome Dev Tools and monitored the Network tab for any requests being made as I clicked the “Show More Rides” button, since it updates the DOM (it adds more data to the page without reloading).

This drew my attention to the following request:

[Screenshot: the request in Chrome Dev Tools]

This request fetches the most recent 10 rides. This is why when you check your most recent rides on the dashboard or the Careem App, it only shows 10 at a time.

Let’s look at some useful parameters being thrown in there.

The useful ones are start and limit: start is the offset of the first ride to return, and limit is how many rides to fetch. There are a few other request props, such as serviceAreaId and key, but neither was of any use and both could be removed from the request.

Careem makes this request via POST, which means you shouldn’t be able to view it by entering the URL in your browser’s address bar (which sends a GET request) like I did (screenshot below).

[Screenshot: the request opened directly in the browser]

But it works anyways, so why not ¯\_(ツ)_/¯

Anywho, as you can see, there are 10 items in the data array: my 10 most recent rides. I don’t know what’s up with the other properties like results and success, or why they’re null when this request is obviously successful. Possibly leftover development code.

So I went from requesting 10 rides to requesting 100 rides in a single request. My GET request was: https://app.careem.com/getAllAccessibleCompletedTrips.json?start=0&limit=100

[Screenshot: the response for 100 rides]

Oh, and in case you’re not aware, you have to be logged in to view this URL.

For some reason, though, this sometimes returned an error when given a large range (e.g. 200 rides), responding with the following:

[Screenshot: the error response]
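start is an offset and limit is a page size, so pulling the whole history is ordinary offset pagination. Below is a minimal Python sketch under that assumption. It only builds the URLs; fetching them still needs your logged-in browser session, which isn't shown. The page size is something you choose to stay below the size that triggered the error above, and page_urls is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint from the post; it only responds when you're logged in.
BASE_URL = "https://app.careem.com/getAllAccessibleCompletedTrips.json"


def page_urls(total_rides, page_size=100):
    """Build one URL per page, moving the start offset forward by page_size."""
    urls = []
    for start in range(0, total_rides, page_size):
        query = urlencode({"start": start, "limit": page_size})
        urls.append(f"{BASE_URL}?{query}")
    return urls


for url in page_urls(380):
    print(url)
```

With 380 rides and a page size of 100, this yields four URLs (start=0, 100, 200 and 300). A bigger page means fewer requests, but risks the error above.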

 

Downloading my data

Since I only had 380 rides, I only needed to make 2 requests and save the responses. The data is in JSON, a proper structured format, which made the work so much easier. In both files, all my rides were in an array property called data, so I just had to write a simple script to merge them into one.

[Screenshot: the collapsed JSON data structure]

This is the COLLAPSED data structure (not all the info is visible, since most of the arrays are collapsed). Pretty convenient.
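The merge step is just concatenating the data arrays. This minimal Python sketch assumes each saved response is a JSON object with a top-level data array, as described above. It is not the script I actually used.

```python
import json


def merge_responses(responses):
    """Concatenate the `data` arrays of several parsed responses, in order."""
    merged = []
    for response in responses:
        merged.extend(response["data"])
    return {"data": merged}


def merge_files(paths, out_path):
    """Load each saved response, merge them and write a single JSON file."""
    responses = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            responses.append(json.load(f))
    merged = merge_responses(responses)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merged, f)
    return len(merged["data"])  # number of rides written
```

A call such as merge_files(["rides-1.json", "rides-2.json"], "rides.json") writes the combined file and returns the ride count. The file names are hypothetical.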

 

What can I do with this data?

Now I have this huge dataset telling me anything and everything about each ride. This includes trip pricing breakdowns, dropoff and pickup metadata and coordinates, total price, distance covered, waiting time, en route time, when the driver arrived, how long the driver had to wait, when we reached our destination, whether I (the client) or the driver is verified, whether my ride was waived, client data, driver data, car data, car type and A WHOLE LOT more.

Introducing Careem Analyzer

composer require irfan/careem-rides-analysis

I spent the next few hours developing a small PHP parser to read the important parts of the available data. It’s open source and available here. Visit that link to read more about what kind of data you can access for each ride.

And now I was finally able to produce a PoC: I called in the library and looped through my rides to total up how much I had used Careem.

Rs. 40,731 and 2502.82 Km.

Wow, that’s a lot… less. Compared to other direct modes of transport such as Rickshaws, that is: an estimated 45% in savings over taking a Rickshaw to college. Possibly a lot more; this is just a very basic calculation.
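The totals boil down to two sums and a division. This minimal Python sketch is not the PHP library's code. The price and distance keys are hypothetical stand-ins, since the post doesn't list the real field names.

```python
def summarize(rides, price_key="price", distance_key="distance"):
    """Return (total spent, total km, average price per km) for a list of rides."""
    total_spent = sum(ride[price_key] for ride in rides)
    total_km = sum(ride[distance_key] for ride in rides)
    per_km = total_spent / total_km if total_km else 0.0
    return total_spent, total_km, per_km


# The totals reported in this post: Rs. 40,731 over 2502.8235 Km.
print(round(40731 / 2502.8235, 3))  # 16.274
```

Rs. 40,731 over 2502.8235 Km works out to about Rs. 16.274 per Km. That matches the average price per km in the output further down.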

So let’s go deeper. I’ve made an example.php file which uses the parser library to analyze the data and even create a CSV with every driver’s info.

Here’s the output for my rides:

Total Rides: 380
Total Spent: Rs. 40,731
Total Distance: 2502.8235 Km
Average price per km: 16.274 Per Km
Traveled in: GO, Bike, Go Mini, GO+ car types

Waived Rides: 17
Avg. In Journey Wait Time: 4.98 min
Avg. Initial Wait Time: 1.97 min
Total In Journey Wait Time: 1886 min
Total Initial Wait Time: 746 min

—BREAKDOWN—
Car Type: “GO”
Rides: 217 ride(s)
Total Spent: Rs. 25742.9
Avg Price/Ride: Rs. 118.63 /Ride
Avg Price/Km: Rs. 17.67 /Km
Avg. Distance/Ride: 6.71 Km/Ride
Avg. Duration/Ride: 15.33 Min/Ride
Total Distance: 1457 Km
Total Duration: 55.45 Hours

First Ride: Sunday Feb 5, 2017 at 9.38am

Car Type: “Bike”
Rides: 64 ride(s)
Total Spent: Rs. 2175.73
Avg. Price/Ride: Rs. 34 /Ride
Avg. Price/Km: Rs. 5.35 /Km
Avg. Distance/Ride: 6.35 Km/Ride
Avg. Duration/Ride: 14.98 Min/Ride
Total Distance: 406 Km
Total Duration: 15.98 Hours

First Ride: Monday Mar 19, 2018 at 8.34am

Car Type: “Go Mini”
Rides: 1 ride(s)
Total Spent: Rs. 87.93
Avg. Price/Ride: Rs. 87.93 /Ride
Avg. Price/Km: Rs. 27.36 /Km
Avg. Distance/Ride: 3.21 Km/Ride
Avg. Duration/Ride: 13.77 Min/Ride
Total Distance: 3 Km
Total Duration: 0.23 Hours

First Ride: Sunday Aug 12, 2018 at 8.13pm

Car Type: “GO+”
Rides: 98 ride(s)
Total Spent: Rs. 12726.96
Avg. Price/Ride: Rs. 129.87 /Ride
Avg. Price/Km: Rs. 20 /Km
Avg. Distance/Ride: 6.49 Km/Ride
Avg. Duration/Ride: 15.06 Min/Ride
Total Distance: 636 Km
Total Duration: 24.59 Hours

First Ride: Friday Mar 24, 2017 at 8.34am
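Each block in the breakdown applies the same arithmetic to one car type. This minimal Python sketch groups the rides first and then computes per-type figures. It is not the example.php code. The carType, price and distance keys are hypothetical, and the sample rides are made up rather than taken from my data.

```python
from collections import defaultdict


def breakdown(rides, type_key="carType", price_key="price", distance_key="distance"):
    """Group rides by car type and compute per-type counts, totals and averages."""
    groups = defaultdict(list)
    for ride in rides:
        groups[ride[type_key]].append(ride)

    report = {}
    for car_type, group in groups.items():
        spent = sum(r[price_key] for r in group)
        km = sum(r[distance_key] for r in group)
        report[car_type] = {
            "rides": len(group),
            "spent": spent,
            "avg_price_per_ride": spent / len(group),
            "avg_price_per_km": spent / km if km else 0.0,
            "total_km": km,
        }
    return report


sample = [
    {"carType": "GO", "price": 120, "distance": 6},
    {"carType": "GO", "price": 80, "distance": 4},
    {"carType": "Bike", "price": 34, "distance": 6},
]
print(breakdown(sample)["GO"])
```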

 

And here are some cool graphs from the CSV dump of driver info.

[Chart: Color of the cars that I rode in]

[Chart: Make of the cars]

[Chart: Model of the cars]

[Chart: Car build years]

Do note, the data is messy and I’m not bothering to tidy it up because this is only a PoC and I’m already bored of this project.

Is this a security risk for Careem?

With what happened earlier this year in mind, I doubt this is a security risk, as it’s not a hack. I don’t have access to anyone else’s ride data, only my own.

I did have a slight concern about the data available for each driver, though. Although it’s displayed on the frontend as well, in this bulk amount it could be very useful for marketing or something.

I feel that Careem possibly puts too much driver information in the hands of the client. It’s understandable why the client might need it, such as when they lose something in the car or want to report the driver. But the data goes all the way back to your first ride, and that’s something to think about.

¯\_(ツ)_/¯

 

That’s all for this post and project.


Jikan Update – October 2018

Stats for JikanREST have stayed pretty much the same over the past month, in the range of 2-3 million requests per week. I don’t think that’ll be dying down anytime soon.

[Screenshot: JikanREST request stats]

This is a rather minor update, so I’ll keep to the point.

JikanPHP v2.0.0 Stable

After months of being stuck in RC, we’ve plowed through the field of bugs and finally released v2.0.0 Stable a few days back. Rejoice!

v2.1.0 Stable is nearly ready for release as well; it features new constant updates, bug fixes, and User List Parsing.

JikanREST v3.1 Release

JikanREST runs on the dev-master branch rather than a published release, so it’s always up to date with the latest additions to the parser.

I’ve added a few new features to the REST service.

1. Nullable Date Range Props

A terrible problem with MyAnimeList is that the date ranges it returns for Anime & Manga are usually submitted by community users, and sometimes they don’t pass validation checks while MAL Editors/Mods are going through them. This results in very inconsistent date formats, which sometimes break the parser.

So, following a discussion with contributors, we introduced Nullable Dates, which look like this:

[Screenshot: a Nullable Dates example]

Quite convenient, if you ask me.
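The idea behind Nullable Dates is to return null instead of breaking when a date doesn't parse. This is a minimal Python sketch of that idea, not Jikan's PHP code. It assumes ranges look like "Apr 3, 1998 to Apr 24, 1999", and parse_date and parse_range are hypothetical names.

```python
from datetime import date, datetime
from typing import Optional


def parse_date(text: str, fmt: str = "%b %d, %Y") -> Optional[date]:
    """Return a date, or None when the string doesn't fit the expected format."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None


def parse_range(text: str) -> dict:
    """Split 'X to Y' into {'from', 'to'}, where either side may be None."""
    start, _, end = text.partition(" to ")
    return {"from": parse_date(start), "to": parse_date(end) if end else None}


print(parse_range("Apr 3, 1998 to Apr 24, 1999"))
print(parse_range("Apr 1998 to ?"))  # {'from': None, 'to': None}
```

A malformed side becomes None and the other side still parses, so one bad submission no longer breaks the whole response.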

 

2. User Lists

A much awaited feature! MAL has an internal API which returns data in JSON. The problem is that it doesn’t support CORS, so one would have to rely on a CORS proxy instead, which can slow down the response time. Thus, with a bit of standardization of the schema, I’m introducing User Lists to Jikan.

Usage Info

3. Season Archive

MAL has a season archive page which lists the years and seasons it currently has in its database. This feature was complete when JikanREST 3.0 was released but wasn’t included. It has now been added to the REST service.

Usage Info

Going Forward

There are still quite a lot of features in the Roadmap that I wish to add. Unfortunately, I’ll be getting busy now due to IRL stuff, but I’ll try to keep updating Jikan consistently. PRs are very welcome!

Jikan Roadmap

Jikan Update – September 2018

This update marks a new milestone for Jikan. I am super excited to announce quite a few things.

JikanPHP v2

JikanPHP is the backbone of the Jikan REST service: it does all the parsing. With the help of some amazing contributors, we’ve rewritten JikanPHP, making it more robust, professional and standard. V2 promises PSR-2 compliance.

More on usage in the newly written documentation, which is powered by MkDocs (Material theme) and provides a fresh look and better UX over the old documentation.

Lots of new stuff was added, lots of bugs were squashed, and lots of effort was put into standard, quality code. There’s just too much to list here! Read more in the changelog.

Jikan REST Service

JikanREST has broken through an amazing number of milestones. We’ve been getting over 3 million requests weekly! That’s no joke; behold the stats below.

[Screenshot: JikanREST request stats]

That’s on average ~12,000 requests hourly, ~280,000 requests daily.

Service Improvements

Matsujo Hibiki, Jikan’s hosting sponsor, has equipped the server with dual load balancers for fetching fresh requests and a master server that handles the cache. This means ZERO rate limiting from MyAnimeList and a 100% uptime service!

Jikan REST v3.0

I’m extremely excited to announce JikanREST v3! Let’s go over a few of the many improvements.

Open Source

The entire codebase is open source and anyone can set up their own JikanREST instance!

No more Daily Limits!

That’s right! The daily limit of 5,000 requests per IP has been removed!

In return, a much better throttling middleware has been introduced.

  • Cached requests are NOT throttled/counted against your limit
  • Clients can make up to 30 requests / minute
  • Clients can make up to 2 concurrent requests / second

That’s the limit over at api.jikan.moe. Obviously, if you’re hosting the instance, you can reconfigure these values, even remove them.
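Limits like these are commonly enforced with a sliding window per client. This minimal Python sketch is not JikanREST's middleware. It reads "2 concurrent requests / second" as at most 2 counted requests in any 1-second window, which is an assumption. It takes a clock function so it can be tested, and Throttle is a hypothetical name.

```python
import time
from collections import defaultdict, deque


class Throttle:
    """Per client: at most `per_minute` counted requests in any 60 s window and
    `per_second` in any 1 s window. Cached hits are never counted."""

    def __init__(self, per_minute=30, per_second=2, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_second = per_second
        self.clock = clock
        self.history = defaultdict(deque)  # client -> timestamps of counted requests

    def allow(self, client, cached=False):
        if cached:
            return True  # cached responses don't count against the limit
        now = self.clock()
        stamps = self.history[client]
        while stamps and now - stamps[0] >= 60:
            stamps.popleft()  # forget requests older than a minute
        last_second = sum(1 for t in stamps if now - t < 1)
        if len(stamps) >= self.per_minute or last_second >= self.per_second:
            return False
        stamps.append(now)
        return True
```

Cached hits return before they are recorded, so they never enter the window.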

JikanREST v2.2 “Extended Requests” Deprecation Notice

Jikan no longer utilizes ‘extended requests’ due to performance issues; they now stand as separate endpoints in JikanREST v3. Therefore, “Extended Requests” will be deprecated on January 1st, 2019.

Developers are encouraged to start using api.jikan.moe/v3 ASAP.

That’s all for now folks.

Oh, we have a discord community set up! Come say hello!

Starting Affordable Professional Web Design Services

So I’ve finally gotten around (free from college and so) to starting my own Fiverr Gig (View Here).

The point is that I’ve gotten pretty good at designing websites using modern technology, and I believe that having a personalized website shouldn’t be so difficult for anyone. Furthermore, I think this is a good opportunity to test my customer service (lol) and client-handling skills.

Let’s see how this rolls. ¯\_(ツ)_/¯

Skraypar: Pattern parsing with Iterators and Look Aheads

You’ll often be told not to parse HTML with RegEx – but what if you’re a rebel?

WHY YOU SHOULDN’T PARSE HTML WITH REGEX

Clicky.

WHY YOU COULD PARSE WITH REGEX

Parsing static templates with RegEx is pretty easy. The basic course of action is to match a line against what you want, then either add capture groups to the RegEx or get your hands dirty and polish the data out of that abhorrent line of HTML.


I made a successful RESTful service, Jikan.moe, using nothing but RegEx. It didn’t require any extra dependencies, libraries, yadda yadda. Nor was speed a concern, since parsing was pretty quick.
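Here is what the two routes look like on a single line. This is a minimal Python sketch, and the HTML line is a made-up example in the style of a static template.

```python
import re

# A made-up line from a static template.
line = '<div><span class="dark_text">Episodes:</span> 26</div>'

# Route 1: a capture group pulls the value out directly.
match = re.search(r'<span class="dark_text">Episodes:</span>\s*(\d+)', line)
episodes = int(match.group(1)) if match else None

# Route 2: strip the tags, then polish what's left of the line.
text = re.sub(r"<[^>]+>", "", line).strip()  # "Episodes: 26"
episodes_by_hand = int(text.split(":", 1)[1])

print(episodes, episodes_by_hand)  # 26 26
```

This only holds up because the template is static. If the markup around the value moves, both patterns need updating.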


 

 

What am I going on about?

Enter: Skraypar

With a terrible choice of a name, I set out to simplify the repetitive tasks of parsing HTML with RegEx, which consist of pattern matching, loops, and so on.

Skraypar is an abstract PHP class which parses by pattern matching, Iterators and Look Aheads.

The parsing tasks are split into 2:

  • (Inception) Pattern matching & callbacks on the matched line – Iterators
  • Additional pattern matching and callbacks within Iterators for dynamically located HTML – Look Aheads

Think of it as one Iterator matching a table, another Iterator matching the rows, and Look Aheads parsing the cells.

This is a pretty abstract and experimental project; I won’t blame you if you think I’ve gone mad. But heck – finding new ways to do things is one thing I like to do.

How does it work?

1 – File Loading

Skraypar uses Guzzle as a dependency to fetch the HTML; if it’s a local file, it simply loads it. The file is loaded into an array, one line per index.

1B – Accepting & Rejecting

Fetching from the interwebs means you get to tell Skraypar which HTTP responses to allow and which ones to throw an exception at. By default, 200 (OK) and 303 (See Other) are accepted HTTP responses.
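Steps 1 and 1B amount to two things: check a fetched page's status against an accepted set, then split the body into one line per index. This minimal Python sketch shows that flow and is not Skraypar's PHP code. The HTTP fetch itself (Guzzle's job in Skraypar) is left out, so the functions take a status code and body you already have. All names are hypothetical.

```python
from pathlib import Path

# The defaults the post lists: 200 (OK) and 303 (See Other).
DEFAULT_ACCEPTED = frozenset({200, 303})


class ResponseRejected(Exception):
    """Raised when a fetched page's HTTP status isn't in the accepted set."""


def to_lines(text):
    """Split a document so that each list index holds one line."""
    return text.splitlines()


def lines_from_response(status, body, accepted=DEFAULT_ACCEPTED):
    """Accept or reject a fetched response, then split its body into lines."""
    if status not in accepted:
        raise ResponseRejected(f"HTTP {status} is not an accepted response")
    return to_lines(body)


def lines_from_file(path):
    """Local files are simply read; there's no status code to check."""
    return to_lines(Path(path).read_text(encoding="utf-8"))


print(lines_from_response(200, "<table>\n<tr><td>1</td></tr>\n</table>"))
```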

2 – Rules

When you extend Skraypar, you have to implement a method named loadRules, which adds the rules for Skraypar to remember when parsing.


Rules are patterns paired with callback functions that run when the pattern matches. Every line is checked against them, and once a rule matches and its callback executes, that particular rule is disabled.
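This minimal Python sketch of the rule loop loosely mirrors the description above rather than Skraypar's actual API. A subclass registers (pattern, callback) pairs in load_rules. Every line is checked against every enabled rule, and a rule switches itself off after its first match. RuleParser, add_rule, TitleParser and parse_title are hypothetical names.

```python
import re


class RuleParser:
    """Subclasses register (pattern, callback) rules in load_rules(). Every line is
    checked against every enabled rule; a rule is disabled after its first match."""

    def __init__(self, lines):
        self.lines = lines
        self.rules = []  # each entry: [compiled pattern, callback, enabled flag]
        self.result = {}

    def add_rule(self, pattern, callback):
        self.rules.append([re.compile(pattern), callback, True])

    def load_rules(self):
        raise NotImplementedError("subclasses must register their rules")

    def parse(self):
        self.rules = []
        self.load_rules()
        for index, line in enumerate(self.lines):
            for rule in self.rules:
                pattern, callback, enabled = rule
                if enabled and pattern.search(line):
                    callback(index)
                    rule[2] = False  # this rule won't fire again
        return self.result


class TitleParser(RuleParser):
    def load_rules(self):
        self.add_rule(r"<h1>", self.parse_title)

    def parse_title(self, index):
        self.result["title"] = re.sub(r"<[^>]+>", "", self.lines[index]).strip()


print(TitleParser(["<h1>Cowboy Bebop</h1>", "<h1>Ignored</h1>"]).parse())
```

The second heading is ignored because the title rule disabled itself after the first match.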


3 – Iterators

Iterators are used inside Rule callbacks. Given a breakpoint pattern and a callback, the Iterator loops over each line, executing pattern matches or Look Aheads, until the breakpoint pattern is reached.

If the breakpoint pattern is never found, the Iterator keeps incrementing until it points to an unset offset in the array of lines from the file, and Skraypar throws an exception saying the parser failed.

There can be Iterators within Iterators.
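This is a minimal Python sketch of an Iterator as described: it walks forward from an offset, hands each line to a callback, and stops at the breakpoint pattern. Rather than relying on an unset-offset error, it checks the bounds and raises explicitly. The callback receives the iterator itself, so it could start another LineIterator for nesting. LineIterator and ParserFailed are hypothetical names, not Skraypar's.

```python
import re


class ParserFailed(Exception):
    """Raised when an Iterator runs out of lines before its breakpoint."""


class LineIterator:
    """Walk lines from `start`, calling `callback` on each, until `stop_pattern` matches."""

    def __init__(self, lines, start):
        self.lines = lines
        self.count = start  # current offset; callbacks may move it

    def run(self, stop_pattern, callback):
        stop = re.compile(stop_pattern)
        while True:
            if self.count >= len(self.lines):
                # Ran past the last line: the breakpoint never showed up.
                raise ParserFailed(f"breakpoint {stop_pattern!r} not found")
            line = self.lines[self.count]
            if stop.search(line):
                return self.count  # offset of the breakpoint line
            callback(self, line)
            self.count += 1


lines = ["<table>", "<tr><td>A</td></tr>", "<tr><td>B</td></tr>", "</table>"]
cells = []


def collect(iterator, line):
    """Callback: pull the cell text out of a row line."""
    match = re.search(r"<td>(.*?)</td>", line)
    if match:
        cells.append(match.group(1))


end = LineIterator(lines, start=1).run(r"</table>", collect)
print(cells, end)  # ['A', 'B'] 3
```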

4 – Look Aheads

Look Aheads are used inside Iterators. Usually, after a pattern matches a line, one could simply access data on the next line by incrementing the iterator count by 1. But in some cases, the data may not be on the next line but 2 lines down. Since the data’s location is dynamic, a Look Ahead searches ahead for a pattern matching that data and parses it with a callback.
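This is a minimal Python sketch of a Look Ahead. Instead of assuming the data sits exactly one line down, it scans a few lines ahead for the data's own pattern. The look_ahead name and the five-line search limit are hypothetical choices, not Skraypar's.

```python
import re


def look_ahead(lines, start, pattern, callback, limit=5):
    """Scan up to `limit` lines after `start` for `pattern` and hand the first
    match to `callback`. Returns the callback's result, or None if nothing matched."""
    regex = re.compile(pattern)
    for index in range(start + 1, min(start + 1 + limit, len(lines))):
        match = regex.search(lines[index])
        if match:
            return callback(match)
    return None


lines = ["<span>Score:</span>", '<div class="ad"></div>', "<b>8.78</b>"]
# The score sits two lines below its label here, not one.
score = look_ahead(lines, 0, r"<b>([\d.]+)</b>", lambda m: float(m.group(1)))
print(score)  # 8.78
```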

5 – References

Everything is passed, controlled and set by reference within the Iterator callables. You can pass a reference to the Iterator into its own callable to set responses, use the Iterator class’s Look Ahead method, or manually set the iterator count property to an offset.


That’s pretty much it. This project is in development and will be used as a dependency for the next major Jikan release. It’s not limited to Jikan, though; it can be used on any website or file.

No documentation is available at the moment.


Links

Jikan News & Updates – Mid-2018

Okay, this news is almost a month old. Here goes.


We’re already 5 months into 2018, and I already have exciting news regarding Jikan. Back in January, I wrote a post laying out Jikan’s roadmap for the year, announcing 4 more features to be done this year. I’ve completed 3 of them, with User Related scraping to be done by the release of REST 2.3.

Over the past year, Jikan has gained huge traction, both in clients and in development. Here are the highlights of the past 6 months.

Jikan REST 2.2

With the release of REST 2.2 came many new features.

  1. More extended data for Anime and Manga (with the exception of reviews & recommendations – for now)
  2. Anime/Manga/People/Characters Search! This comes with advanced search filters and pagination support.
  3. Top Anime and Manga with advanced filters
  4. Season – To list the Anime airing this season and for other years/seasons.
  5. Schedule – Anime scheduling for the week for this season
  6. Meta – Experimental requests for getting usage stats for Jikan and most requested links by daily, weekly & monthly periods.

 

And some service changes.

  1. Jikan has moved domain to Jikan.moe. The previous (Jikan.me) domain has been discontinued.
  2. Jikan REST API is now being hosted in Tokyo (closer to MyAnimeList’s Tokyo server) by an awesome dude called Hibiki.

 

100% Jikan Open Source

That’s right. The entirety of Jikan has been open-sourced under MIT License. This includes the website, docs and REST API service.

This not only adds flexibility, but also makes the code easier to manage and deploy. Gone are the days of patches having to wait till the next REST version. Now the RESTful service is updated as soon as a new JikanPHP version is out. This, of course, will vary for major feature releases, as I have to set up the controllers on the REST service.

Usage Stats

This is the Meta feature I mentioned.

 

It works by logging requests in Redis and incrementing the respective counters for each request. Here are some interesting usage links.

You can read more about its usage.
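The counting side of the Meta feature can be sketched as one counter per period and request path. In this minimal Python sketch, collections.Counter stands in for Redis; with Redis, each increment would be an INCR on the same key. The key layout and the UsageLog name are hypothetical, not Jikan's actual schema.

```python
from collections import Counter
from datetime import date


class UsageLog:
    """Counts requests per period and path. A Counter stands in for Redis here;
    with Redis, each increment below would be an INCR on the same key."""

    def __init__(self):
        self.counters = Counter()

    def record(self, path, day):
        iso_year, iso_week, _ = day.isocalendar()
        for key in (
            f"requests:daily:{day.isoformat()}:{path}",
            f"requests:weekly:{iso_year}-W{iso_week:02d}:{path}",
            f"requests:monthly:{day:%Y-%m}:{path}",
        ):
            self.counters[key] += 1

    def most_requested(self, prefix, n=3):
        """Top-n paths under one period prefix, most requested first."""
        hits = [(key[len(prefix):], count)
                for key, count in self.counters.items() if key.startswith(prefix)]
        return sorted(hits, key=lambda kv: kv[1], reverse=True)[:n]


log = UsageLog()
for path in ["/anime/1", "/anime/1", "/manga/2"]:
    log.record(path, date(2018, 5, 1))
print(log.most_requested("requests:daily:2018-05-01:"))
```

Keeping the period in the key lets the daily, weekly and monthly "most requested" lists be read off independently.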

 

Late 2018 Roadmap (REST 2.3)

So here are a few things that will definitely be completed before the end of 2018, perhaps in the upcoming months.

  • Top Characters/People
  • Anime/Manga Extended Data – Reviews & Recommendations
  • User Data – Profile, Watch History, Friends

 

Early 2019

This assumes MyAnimeList’s new API hasn’t been publicly released by then and people haven’t started ditching Jikan.

  • JikanPHP (Core) – Rewrite. This will introduce JikanPHP 2.X.
    • Separation of the parser as an abstraction class for Requests & RegEx parsing
    • Faster Parsing – Rework Extended Requests.
  • Jikan REST 3.0 – Given the crazy amount of requests we’ve been getting, the main problem is rate limiting from MyAnimeList, since we’re making all these requests from one server, i.e. one IP address.
    • Rework Redis Database data caching
    • API Keys. Note: This won’t replace free, unmonitored GET requests. The current limit of 5,000 will be lowered down to encourage app/project developers to get an API key that will support higher rate limits.
    • Rework Extended Requests as separate API calls. This is a bottleneck right now, as an extended request makes 2 requests instead of one in order to merge the data into a single response for you.
  • Relational data – Expand to other sites (maybe)

Playing with Browser Extensions

So I had to test out the usability of my REST API, and what better way than developing an app that uses it? Of course, being limited in app development skills, I turned to something easier that I at least have some skills in: Browser Addons/Extensions.

Now, one thing I learned was that developing a browser addon is mostly the same as making a web application, except that you need a manifest file and have to have your browser pack it into a proper extension. That’s pretty much it.

Enter: Anime Info

The reason it has such a generic name is that I had thought of releasing it into the wild market, free to download and use, so the name would be easily searchable and get SEO points. But it’s been about a month: Opera’s market takes almost forever to review an addon, Chrome and Firefox have you make an initial payment to start putting addons/apps on their markets, and I don’t have a card available at the moment, so ¯\_(ツ)_/¯.

[Screenshots of the Anime Info addon]

I don’t plan on updating the addon; it was merely for searching up and viewing Anime, just so I could get the gist of browser addon development and see how my REST API worked with it. The conclusion was pretty nice.

The whole project is open-source and available on Github: https://github.com/irfan-dahir/anime-info

Designed with Material Design in mind and developed with speed in mind, it’s probably the only addon out there for its purpose. (No kidding, I couldn’t find anything remotely close.) Sure, it could be updated to allow users to log in, update their Anime and Manga lists, and even view and search Manga, but again, nah.