I use Careem. A lot. My heaviest use, up until earlier this year, was booking rides to and from my college. It’s one of the most affordable modes of transport in Pakistan right now.
For those who don’t know what Careem is, it’s a ride hailing service like Uber.
As a fan of data, I really wanted to know how much I’ve spent, how much distance I’ve covered, etc., over the period I’ve used Careem. Unfortunately, there’s no public API made available by Careem. There is something of the sort, but you have to contact them for access and I haven’t got a reply yet. 😦
With my experience in building a successful REST API service, I decided to reverse engineer their website dashboard for users. This dashboard allows you to book rides online and shows you your ride history.
My initial thought was web scraping, but that required me to create a scraper that authenticated through their web form and bypassed their CSRF protection and that hidden Google reCAPTCHA. If I could do that by passing around headers and sessions, it would’ve allowed me to develop an unofficial API, but ain’t nobody got time fo dat. What I really wanted was my rides data.
One way to do so is to go to the rides page and intensely click on “Load More” until you get a good amount to “select all” and click “Export as CSV”.
This method wasn’t exactly efficient; I needed something much faster. And I found something really interesting that output a lot more data than what they’d want to show on the frontend.
Careem’s frontend, like any other, gets its data from their server via GET or POST requests. I popped open the handy Chrome DevTools and monitored the network tab for any requests being made as I clicked on the “Show More Rides” button, since it updates the DOM (updates the page with more data without reloading).
And this brought my attention to the following request:
This request fetches the most recent 10 rides. This is why when you check your most recent rides on the dashboard or the Careem App, it only shows 10 at a time.
Let’s look at some useful parameters being thrown in there. The two that matter are a starting-offset parameter and limit: the former is the starting offset of rides, and limit is how many I’d want to fetch. There are a few other request props, key among them, which were of no use and could be removed from the request.
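To make the offset-and-limit idea concrete, here is a minimal Python sketch that works out which parameter pairs would be needed to page through a given number of rides. It is only an illustration: the parameter name "offset" is a placeholder I’m assuming (only limit is named in this post), the offset is assumed to start at zero, and no real Careem URL is used.

```python
from urllib.parse import urlencode


def page_params(total_rides, page_size=10, offset_name="offset"):
    """Return one query-parameter dict per request needed to cover total_rides.

    offset_name is a hypothetical parameter name; swap in whatever the
    real request actually uses for its starting offset.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    pages = []
    for start in range(0, total_rides, page_size):
        # Each page starts where the previous one ended.
        pages.append({offset_name: start, "limit": page_size})
    return pages


def query_strings(total_rides, page_size=10, offset_name="offset"):
    """Same pages as page_params, encoded as URL query strings."""
    return [urlencode(p) for p in page_params(total_rides, page_size, offset_name)]


if __name__ == "__main__":
    # 25 rides at 10 per page needs three requests.
    for qs in query_strings(25, page_size=10):
        print(qs)
```

Bumping page_size is all it takes to ask for more rides per request, as long as the server is willing to answer.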
Careem makes this request via POST which means you shouldn’t be able to view it via your browser URL like I did (screenshot below).
But it works anyways, so why not ¯\_(ツ)_/¯
Anywho, as you can see, there are 10 items in the data array. These are my 10 most recent rides. I don’t know what’s up with the other properties, like success. I don’t know why they’re null when this request is obviously successful. Possibly leftover development code.
So I successfully went from requesting 10 rides to requesting 100 rides’ worth of data in a single request. My GET request would be this:
Oh also, I do need to mention if you’re not aware, you have to be logged in to view this URL.
And for some reason this sometimes returned an error and didn’t work if you gave it a large range, e.g. 200 rides. It would respond with the following:
Downloading my data
I only needed to make 2 requests and download the content, since I only had 380 rides. The data is in a structured format called JSON, which makes the work so much easier for me. All my rides were in a property called data, which was an array, in both files, and I just had to write a simple script to merge them both into one.
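A merge script along those lines could look like the following Python sketch. It is not the script I actually used; the file names in the usage comment are made up, and the only thing taken from the responses is that the rides live in an array under the data property.

```python
import json
from pathlib import Path


def merge_ride_files(paths, key="data"):
    """Concatenate the ride arrays found under `key` in each JSON file."""
    merged = []
    for path in paths:
        payload = json.loads(Path(path).read_text(encoding="utf-8"))
        # A missing or null array is treated as "no rides in this file".
        merged.extend(payload.get(key) or [])
    return merged


def save_rides(rides, out_path, key="data"):
    """Write the merged rides back out in the same {"data": [...]} shape."""
    Path(out_path).write_text(json.dumps({key: rides}, indent=2), encoding="utf-8")


# Usage, with hypothetical file names:
#   rides = merge_ride_files(["rides-1.json", "rides-2.json"])
#   save_rides(rides, "all-rides.json")
```

The order of the input files is preserved, so passing them newest-first keeps the merged list newest-first as well. Nothing here removes duplicates, so the two downloads should not overlap.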
This is the COLLAPSED data structure (not all the info is visible, as you can see most of the arrays are collapsed). Pretty convenient.
What can I do with this data?
Now I have this huge dataset telling me anything and everything about each ride. This includes trip pricing breakdowns, dropoff and pickup metadata and coordinates, total price, distance covered, waiting time, in route time, when the driver arrived, how long the driver had to wait, when we reached our destination, whether I (the client) or the driver is verified, whether my ride was waived, client data, driver data, car data, car type and A WHOLE LOT more.
Introducing Careem Analyzer
composer require irfan/careem-rides-analysis
I spent the next few hours developing a small PHP parser to read the important parts of data available, which is open source and available here. Visit that link to read more about what kind of data you can access for each ride.
And now I was finally able to produce a PoC: I called in the library and looped through my rides to sum up how much I had used Careem.
Rs. 40,731 and 2502.82 Km.
Wow that’s a lot… less. Compared to other direct modes of transport such as Rickshaws, that is. An estimated 45% in savings compared to taking a Rickshaw for my college transport. Possibly a lot more, this is just a very basic calculation.
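The summing loop itself is nothing fancy. Here is a rough Python equivalent, separate from the PHP parser; the field names total_price and distance_km are placeholders I’m assuming for illustration, not the real keys in Careem’s JSON.

```python
def summarize(rides, price_field="total_price", distance_field="distance_km"):
    """Sum spend and distance over a list of ride dicts.

    price_field and distance_field are hypothetical key names; replace
    them with whatever the downloaded data actually uses.
    """
    total_spent = sum(float(r.get(price_field) or 0) for r in rides)
    total_km = sum(float(r.get(distance_field) or 0) for r in rides)
    # Avoid dividing by zero when there are no rides or no distance recorded.
    per_km = total_spent / total_km if total_km else 0.0
    return {
        "rides": len(rides),
        "spent": total_spent,
        "km": total_km,
        "per_km": per_km,
    }


if __name__ == "__main__":
    sample = [
        {"total_price": 100.0, "distance_km": 5.0},
        {"total_price": 50.0, "distance_km": 2.5},
    ]
    print(summarize(sample))
```

Plugging my own totals into the same division, Rs. 40,731 over 2502.82 Km works out to roughly Rs. 16.27 per km.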
So let’s go deeper. I’ve made an example.php file which uses the parser library to analyze the data and even create a CSV with every driver’s info.
Here’s the output for my rides:
Total Rides: 380
Total Spent: Rs. 40,731
Total Distance: 2502.8235 Km
Average price per km: 16.274 Per Km
Traveled in: GO, Bike, Go Mini, GO+ car types
Waived Rides: 17
Avg. In Journey Wait Time: 4.98 min
Avg. Initial Wait Time: 1.97 min
Total In Journey Wait Time: 1886 min
Total Initial Wait Time: 746 min
Car Type: “GO”
Rides: 217 ride(s)
Total Spent: Rs. 25742.9
Avg Price/Ride: Rs. 118.63 /Ride
Avg Price/Km: Rs. 17.67 /Km
Avg. Distance/Ride: 6.71 Km/Ride
Avg. Duration/Ride: 15.33 Min/Ride
Total Distance: 1457 Km
Total Duration: 55.45 Hours
First Ride: Sunday Feb 5, 2017 at 9.38am
Car Type: “Bike”
Rides: 64 ride(s)
Total Spent: Rs. 2175.73
Avg. Price/Ride: Rs. 34 /Ride
Avg. Price/Km: Rs. 5.35 /Km
Avg. Distance/Ride: 6.35 Km/Ride
Avg. Duration/Ride: 14.98 Min/Ride
Total Distance: 406 Km
Total Duration: 15.98 Hours
First Ride: Monday Mar 19, 2018 at 8.34am
Car Type: “Go Mini”
Rides: 1 ride(s)
Total Spent: Rs. 87.93
Avg. Price/Ride: Rs. 87.93 /Ride
Avg. Price/Km: Rs. 27.36 /Km
Avg. Distance/Ride: 3.21 Km/Ride
Avg. Duration/Ride: 13.77 Min/Ride
Total Distance: 3 Km
Total Duration: 0.23 Hours
First Ride: Sunday Aug 12, 2018 at 8.13pm
Car Type: “GO+”
Rides: 98 ride(s)
Total Spent: Rs. 12726.96
Avg. Price/Ride: Rs. 129.87 /Ride
Avg. Price/Km: Rs. 20 /Km
Avg. Distance/Ride: 6.49 Km/Ride
Avg. Duration/Ride: 15.06 Min/Ride
Total Distance: 636 Km
Total Duration: 24.59 Hours
First Ride: Friday Mar 24, 2017 at 8.34am
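For anyone curious how a per-car-type breakdown like this can be put together, here is a small Python sketch of the grouping step. It is not how example.php does it; the keys car_type, total_price and distance_km are assumed names used only for this illustration.

```python
from collections import defaultdict


def stats_by_car_type(rides, type_field="car_type",
                      price_field="total_price", distance_field="distance_km"):
    """Group rides by car type and compute a few per-group averages.

    All three field names are hypothetical placeholders.
    """
    groups = defaultdict(list)
    for ride in rides:
        groups[ride.get(type_field, "unknown")].append(ride)

    report = {}
    for car_type, items in groups.items():
        spent = sum(float(r.get(price_field) or 0) for r in items)
        km = sum(float(r.get(distance_field) or 0) for r in items)
        report[car_type] = {
            "rides": len(items),
            "spent": round(spent, 2),
            "avg_price_per_ride": round(spent / len(items), 2),
            # No distance recorded means no meaningful per-km price.
            "avg_price_per_km": round(spent / km, 2) if km else None,
            "avg_km_per_ride": round(km / len(items), 2),
        }
    return report


if __name__ == "__main__":
    sample = [
        {"car_type": "GO", "total_price": 100, "distance_km": 5},
        {"car_type": "GO", "total_price": 140, "distance_km": 7},
        {"car_type": "Bike", "total_price": 30, "distance_km": 6},
    ]
    for car_type, stats in stats_by_car_type(sample).items():
        print(car_type, stats)
```

The averages in the output above follow the same arithmetic: for “GO”, Rs. 25742.9 over 217 rides comes to about Rs. 118.63 per ride.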
And here are some cool graphs from the CSV dump for driver info.
Do note, the data is messy and I’m not bothering to tidy up because this is PoC only and I’m bored of this project already.
Is this a security risk for Careem?
With what happened earlier this year in mind, I doubt this is a security risk as it’s not a hack. I don’t have any other person’s ride data available, only mine and mine alone.
But there is a slight concern I had with the data that was available for each driver. Although the data is displayed on the frontend as well, in this bulk amount it could be very useful for marketing or something.
Nevertheless, I feel that Careem possibly puts too much driver information in the hands of the client. It’s understandable why it might be needed by the client, such as when they lose something in the car or want to report the driver. But the data ranges all the way back to your initial ride. And that’s something to think about.
That’s all for this post and project.