GCP Instance with NVIDIA Tesla T4

Context Many interesting projects now require a modern GPU (or M1, but I’m not desperate enough to downgrade from Linux to OSX). Below are notes on how to spin up a VM instance with GPU in GCP and run a basic PyTorch workload. I chose Watermark-Removal-Pytorch. Cost/Performance After studying the available GPU configurations as well as [VM instance pricing](https://cloud.google.com/compute/vm-instance-pricing), I determined that the most affordable Accelerator optimized configuration (~$250/month) is the N1 + nvidia-tesla-t4 (I operated in the europe-west1 region).

Read more...

Next Thoughts

In 2022, I launched www.4ks.io, a recipe editing and forking website. I used React and ViteJS v2 for the front-end and was really impressed with ViteJS's performance, particularly its quick Time to Interactive (TTI) metrics. Recently, I began exploring NextJS, initially with version 13 and then upgrading to Next 14, to integrate server-side rendering (SSR) for better SEO and to get better i18n tooling. The performance with NextJS was on par with ViteJS, but I ran into some challenges, such as duplicate API calls during SSR.

Read more...

Dev K8s Options

Minikube, KinD (Kubernetes in Docker), and k3d (K3s in Docker) are all tools for running Kubernetes clusters locally, primarily for development purposes. My personal experience with all three has been very positive. For the last couple of years I’ve been operating on Fedora Linux and have been keeping up with the latest releases. Originally I used minikube, but switched over to k3d when DNS issues prevented minikube from reaching Docker Hub.

Read more...

Cloud Load Balancer Cost Savings

Context In an effort to learn more about the Google Cloud Platform, I built and deployed a website using Cloud Run functions, hosted behind a GCP Load Balancer. The performance was great. Even without conducting any performance or benchmark tests, I observed that the website was very responsive globally, as relayed by a family member in Singapore. Unfortunately, the costs were not as favorable. Running a single load balancer cost about $25 per month.

Read more...

Gorm UUID Many to Many

Hasham Ali’s How to Use UUID Key Type with Gorm article was terrific for suggesting how to use a UUID as the ID in gorm. It took a little more fiddling to be able to use the keys in a many-to-many relationship. In the end, it worked by explicitly defining the join table and the foreign key constraints. Sample code is below. import ( "time" "github.com/google/uuid" "gorm.io/gorm" ) // BaseAttributes contains common columns for all tables.

Read more...

Go Runtime Frames

Both the go-kit/log and rs/zerolog loggers provide a Caller method that returns the caller of the function that called it. This is useful for logging the function name in the log output. This functionality is immensely useful and roused my curiosity as to how it is implemented. zerolog logger caller example import "github.com/rs/zerolog" import "github.com/rs/zerolog/log" func main() { log.Logger = log.With().Caller().Logger() // <-- log.Debug().Str("foo", "bar").Msg("This will be logged with a caller") } go-kit logger caller example

Read more...

History AI - Image Duplicates and Distribution

Scraping Results This table shows some metadata about the images scraped.

| Prefix | Size (GB) | Images  | Distinct Images | Duplicate Images | Duplicate Images % |
|--------|-----------|---------|-----------------|------------------|--------------------|
| A      | 12        | 71077   | 48126           | 22951            | 32.3%              |
| B      | 456       | 1672477 | 1667500         | 4977             | 0.3%               |
| C      | 48        | 290248  | 278891          | 11357            | 3.9%               |
| D      | 29        | 122001  | 121977          | 24               | 0.0%               |
| E      | 29        | 212701  | 209391          | 3310             | 1.6%               |
| F      | 5         | 40301   | 40301           | 0                | 0.0%               |
| G      | 0.04      | 216     | 215             | 1                | 0.5%               |
| Total  | 579       | 2409021 | 2366401         | 42620            | 0                  |

The scraping process resulted in 2.
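
The duplicate counts and percentages in the table follow directly from the image and distinct-image columns. As a minimal sketch (the names `COUNTS`, `duplicate_stats` and `summarize` are made up for illustration and are not from the original post), the figures can be recomputed like this:

```python
# (images, distinct images) per prefix, copied from the table above.
COUNTS = {
    "A": (71077, 48126),
    "B": (1672477, 1667500),
    "C": (290248, 278891),
    "D": (122001, 121977),
    "E": (212701, 209391),
    "F": (40301, 40301),
    "G": (216, 215),
}


def duplicate_stats(images, distinct):
    """Return (duplicate count, duplicate share in percent, one decimal)."""
    duplicates = images - distinct
    percent = round(100 * duplicates / images, 1) if images else 0.0
    return duplicates, percent


def summarize(counts):
    """Compute per-prefix stats plus a 'Total' row over all prefixes."""
    rows = {prefix: duplicate_stats(*pair) for prefix, pair in counts.items()}
    total_images = sum(images for images, _ in counts.values())
    total_distinct = sum(distinct for _, distinct in counts.values())
    rows["Total"] = duplicate_stats(total_images, total_distinct)
    return rows


if __name__ == "__main__":
    for prefix, (dups, pct) in summarize(COUNTS).items():
        print(f"{prefix:<6}{dups:>8}{pct:>7.1f}%")
```

Applied to the totals, the same formula gives roughly 1.8% duplicates across all prefixes.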

Read more...

CockroachDB Local

update 2024-01-11 While not central to this article, the use of the CRC32 hash in the code below is worth a note. Since writing this article I learned that CRC32, particularly the CRC32C variant used by Google Cloud Storage (GCS), is optimized for error detection, not as a unique identifier for data. With only 32 bits of output, any two different inputs have a 1 in 4.3 billion chance of colliding, which is far higher than with cryptographic hashes. To overcome this limitation, SHA-256 is recommended.
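
To make the difference concrete, here is a small sketch using only the Python standard library. Note the assumptions: `zlib.crc32` is the standard CRC-32, not the CRC32C variant that GCS uses (the output size, and therefore the collision argument, is the same), and `collision_probability` is a hypothetical helper that applies the usual birthday-bound approximation rather than anything from the original article.

```python
import hashlib
import math
import zlib


def crc32_hex(data: bytes) -> str:
    """32-bit checksum as 8 hex characters (standard CRC-32, not CRC32C)."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"


def sha256_hex(data: bytes) -> str:
    """256-bit cryptographic digest as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


def collision_probability(items: int, bits: int) -> float:
    """Birthday-bound approximation of the chance that at least two of
    `items` uniformly distributed hashes of width `bits` collide."""
    space = 2 ** bits
    exponent = items * (items - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    payload = b"example object contents"
    print("crc32 :", crc32_hex(payload))
    print("sha256:", sha256_hex(payload))
    for n in (10_000, 100_000, 1_000_000):
        print(n, collision_probability(n, 32), collision_probability(n, 256))
```

The takeaway is that the 1 in 4.3 billion figure applies to a single pair of inputs; across a large set of objects the chance of at least one 32-bit collision grows quickly, while it stays negligible for a 256-bit digest.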

Read more...

History AI - Part IV: Computer Vision

I have 2,000,000 images that all contain a watermark pattern. This post will explore options for removing the watermarks in order to improve the quality of the OCR operations to follow. 1) Skipping Watermark Removal The cheapest option in terms of time and resources is to skip watermark removal altogether. This can be done by filtering out the known watermark text from the OCR results. This is the best short-term solution, as it is relatively easy to implement and does not require any additional software.
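
Filtering known watermark text out of OCR output could look roughly like the sketch below. It is illustrative only: the function name `strip_watermark_text` and the phrase "sample archive" are assumptions, since the excerpt does not give the actual watermark text, and real OCR output may need fuzzier matching than this.

```python
import re
from typing import Iterable


def strip_watermark_text(ocr_text: str, watermark_phrases: Iterable[str]) -> str:
    """Remove known watermark phrases from OCR output.

    Matching is case-insensitive and tolerates any whitespace (including
    line breaks) between the words of a phrase, because OCR engines often
    split a watermark across lines.
    """
    cleaned = ocr_text
    for phrase in watermark_phrases:
        words = [re.escape(word) for word in phrase.split()]
        if not words:
            continue  # skip empty phrases, which would match everywhere
        pattern = re.compile(r"\s+".join(words), flags=re.IGNORECASE)
        cleaned = pattern.sub(" ", cleaned)
    # Collapse the gaps left behind; this also flattens original line breaks.
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    raw = "Born 1821 SAMPLE\nARCHIVE in Lviv"
    print(strip_watermark_text(raw, ["sample archive"]))
```

This keeps the original images untouched and only post-processes text, which is why it needs no additional software beyond the OCR step itself.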

Read more...

History AI - Part III: Scraping

Target The target site is completely free and public. While the site’s performance is sufficient, it unfortunately isn’t well maintained: the SSL cert is expired. Luckily, the sought-after information is available directly via REST calls. No HTML parsing necessary. Process The scraping process was performed on a Cloud Compute, Regular Performance, $5/month VM on Vultr.com. The attached 120GB block storage was quickly expanded to 500GB, which increased the cost from $3.

Read more...