DRW: Getting back into the groove
Some Metaplex, paying to scrape weed prices, and Google Cloud disappointment
I took two weeks off. I’m back.
Metaplex
Initially, I wanted to make a line-by-line breakdown of Metaplex’s official developer guide. Instead, I followed the Solana blog’s instructions to create my own Metaplex storefront hosted with GitHub Pages.
Up until this weekend, I was using an icon taken from Flaticon as Cluutch’s logo. In order to have an original piece to post on my storefront, I created a new logo using GIMP. I minted a single NFT for the logo, and it is up for auction now. Oh, and the logo is a black PNG with a transparent background. That’s why you don’t see anything in the preview.
I also cleaned the EXIF data from the logo before putting it up for auction. There did not really seem to be much sensitive information, but better safe than sorry.
Paying to scrape weed prices
We’ve taken things as far as they can go with play data. If we’re going to start building Solana apps that actually consume weed data from Cluutch, the underlying data needs to be improved. Web scraping is not an activity I want to spend time on so I gotta pay.
The requirements:
Scrape weedmaps.com as frequently as possible
Select all flower products
Extract price, merchant name, product name, etc.
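To make the requirements above concrete, here is a rough sketch of the record I would expect per product and how the flower filter might look. The class name, field names, and the `category` values are all my own placeholders, not anything a scraping vendor has committed to.

```python
from dataclasses import dataclass


@dataclass
class ScrapedProduct:
    # Hypothetical fields; the real output columns depend on the vendor.
    product_name: str
    merchant_name: str
    category: str
    price_usd: float


def select_flower(products):
    """Keep only records whose category is 'flower' (case-insensitive)."""
    return [p for p in products if p.category.lower() == "flower"]


# Made-up sample records, for illustration only.
sample = [
    ScrapedProduct("Blue Dream 3.5g", "Example Dispensary", "Flower", 35.0),
    ScrapedProduct("Mint Gummies", "Example Dispensary", "Edible", 20.0),
]
flower = select_flower(sample)
```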
Here are the options I considered:
xbyte.io [No proposal] - They were very responsive to my initial email. But the spelling was off and phrasing was weird. I didn’t give them a shot and never replied.
Scrape Hero [$2,000 / year] - The first company to provide a quote. They were only willing to deliver one output per week of a sample of the website.
Zyte [$3,000 / year] - Daily crawl of full site.
Apify [?] - I honestly can’t remember if I got a quote from them. If I did, it was expensive. And their platform is not as full service as other options.
Octoparse [No proposal] - Like Apify, Octoparse would require me to set up some tooling to do the scrape.
Scrape Labs [No proposal] - I met with a sales rep for an initial call but never got a follow-up.
Toptal [$5,000+] - Toptal has a minimum engagement fee of $5,000. The scraping needed for this project does not require any advanced techniques that would justify this spend.
I have chosen to go with Zyte. A contract has been signed and I’m waiting for the first sample dump. Hopefully by next week, Cluutch will have weed prices that are truly usable.
Paying too much for Google Cloud
I was very pleased with myself when I set up a no-code ETL pipeline on Google Cloud using Cloud Data Fusion. However, I did not read the fine print and was on track to spend over $1,000 per month on the service. That’s the cost of an entire Kubernetes cluster, so it didn’t occur to me it could be that high. Even if I reduce to the developer plan, the service will still cost $250 per month.
The pipeline is dead simple and only needs to:
Read data CSVs from Google Cloud Storage
Transform fields (e.g. parsing dates)
Write to a table in BigQuery
Run every time a CSV is added to GCS
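The read and transform steps above can be sketched with nothing but the Python standard library. This is a minimal sketch, not the pipeline I actually run: the column names, the `MM/DD/YYYY` date format, and the sample row are assumptions, and the GCS download and BigQuery write are left out entirely.

```python
import csv
import io
from datetime import datetime

# Assumed date format; the real CSVs may use something else.
DATE_FORMAT = "%m/%d/%Y"


def transform_row(row):
    """Turn one raw CSV row (all strings) into typed, cleaned fields."""
    return {
        "product_name": row["product_name"].strip(),
        "merchant_name": row["merchant_name"].strip(),
        # Strip the currency symbol and thousands separators before parsing.
        "price_usd": float(row["price"].replace("$", "").replace(",", "")),
        # ISO dates (YYYY-MM-DD) are unambiguous and easy to load as a DATE column.
        "scraped_on": datetime.strptime(row["scraped_on"], DATE_FORMAT).date().isoformat(),
    }


def transform_csv(text):
    """Parse CSV text with a header row and transform every record."""
    reader = csv.DictReader(io.StringIO(text))
    return [transform_row(row) for row in reader]


# Made-up sample standing in for a file read from GCS.
SAMPLE = (
    "product_name,merchant_name,price,scraped_on\n"
    "Blue Dream 3.5g,Example Dispensary,$35.00,11/07/2021\n"
)
rows = transform_csv(SAMPLE)
```

The resulting list of dicts is the shape a loader would need. Writing it to BigQuery and triggering on new files would sit around this function.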
It would probably take me a day to write a naive script that I re-run manually on my computer, or something similar on Cloud Functions. This should take the cost down to less than $5 / month (GCS + BigQuery costs). But my plan is to defer this problem by creating a new Google Cloud account. This comes with $300 in credit.
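For a rough sense of how far that credit goes, here is the back-of-the-envelope math using only the figures above. The $1,000 and $5 numbers are bounds ("over" and "less than"), so those two results are limits rather than estimates.

```python
CREDIT_USD = 300

# Monthly costs taken from the figures quoted in this post.
MONTHLY_COST_USD = {
    "current setup (at least)": 1000,
    "developer plan": 250,
    "script + GCS + BigQuery (at most)": 5,
}

# Months of runway = credit divided by monthly cost.
months_of_credit = {
    plan: CREDIT_USD / cost for plan, cost in MONTHLY_COST_USD.items()
}

for plan, months in months_of_credit.items():
    print(f"{plan}: credit covers about {months:.1f} months")
```

On the developer plan, the credit covers a bit more than one month.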
I will spend one month using Cloud Data Fusion, then reassess.