Katana VentraIP

Welcome to Wiki.gravy.cc

Picture what would happen if VentraIP built Wikipedia

What is this site?

In short, wiki.gravy.cc is a Katana site generator. What do I mean? Everything you see here was generated using Katana, but not by hand: it took two developers 20+ hours to set up the scripts, web server and automation required to generate a Katana equivalent of any Wikipedia page you can visit.

It's not perfect, but read on for the details.




Total Pages Generated: 119547

Total Size On Disk: 8.4G

The Internal Process

I'll try to be quick so you have time to look around. Below is an outline of what goes into the beautiful pages you are about to see.

Wikipedia Crawler

We built a script that takes a starting page on Wikipedia, downloads it and kicks off the generation process. At the same time, it scrapes every link on the page and queues each one for the next iteration of the loop. This is primitive, but it should eventually cover most pages. I have not limited the disk space this project can use, so this could get bad. Luckily, there is a limit to how fast it can run.
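For the curious, here is a minimal sketch of that loop in Python. It is not the real script: the names `LinkCollector`, `extract_links`, `crawl` and `fetch_html` are made up for illustration, and the `max_pages` cap is an assumption added so the sketch cannot run forever.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag that points at another wiki article."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Keep article links only; skip namespaced pages such as "File:" or "Help:".
        if href.startswith("/wiki/") and ":" not in href:
            self.links.append(href.split("#")[0])


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def crawl(start, fetch_html, max_pages=100):
    """Breadth-first crawl: process a page, then queue every link found on it."""
    queue = deque([start])
    seen = {start}  # Everything ever queued, so no page is processed twice.
    visited = []
    while queue and len(visited) < max_pages:
        path = queue.popleft()
        html = fetch_html(path)  # Hypothetical downloader supplied by the caller.
        visited.append(path)
        for link in extract_links(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The downloader is passed in as `fetch_html` so the sketch runs without touching the network; a real run would plug in something that fetches the page from Wikipedia.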

Matchers

From the Wikipedia page, we have to match patterns that we can use in Katana, such as the primary header and first paragraph, or an h3 followed by paragraphs. This sounds easy but is anything but. If you want to know why, the code is available on GitHub.
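To give a feel for the simplest case, the sketch below groups each heading with the paragraphs that follow it. It is only an illustration under our own assumptions: `BlockParser` and `match_sections` are hypothetical names, and the real matchers deal with far messier markup than this.

```python
from html.parser import HTMLParser


class BlockParser(HTMLParser):
    """Flattens a page into (tag, text) pairs for headings and paragraphs."""

    BLOCK_TAGS = {"h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._tag = None   # Block tag currently being read, if any.
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._tag = tag
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> is collected as well.
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = "".join(self._text).strip()
            if text:
                self.blocks.append((tag, text))
            self._tag = None


def match_sections(html):
    """Group each heading with the paragraphs that follow it."""
    parser = BlockParser()
    parser.feed(html)
    sections = []
    for tag, text in parser.blocks:
        if tag != "p":
            sections.append({"level": tag, "heading": text, "paragraphs": []})
        elif sections:
            # Paragraphs before the first heading are ignored in this sketch.
            sections[-1]["paragraphs"].append(text)
    return sections
```

Each dictionary that comes back is roughly one "pattern": an h1 with its first paragraph, or an h3 with the paragraphs under it.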

Generators

Once we have matched a section, we make the relevant API request to Katana to generate it. This is the same as what you would do manually in the browser, but wholly automated.

Transformers

Now here is where the fun happens. When we made the original request, we actually included some placeholder text. Something like $_$_$DEEZ_NUTS__content--1$_$_$.

What is this? It's something we can match against when we get the built Katana page back. Because the token appears in the returned result, we can replace it with the content we actually want.

This was necessary because Katana doesn't accept custom links in all text fields, and what is Wikipedia without all the links?
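The trick can be sketched in a few lines of Python. The token format below is modelled on the example above but uses a made-up `PLACEHOLDER` name, and `make_token`, `tokenize` and `restore` are hypothetical helpers, not the functions from our code.

```python
import re

# Matches tokens of the form $_$_$PLACEHOLDER__content--<number>$_$_$ (assumed format).
TOKEN = re.compile(r"\$_\$_\$PLACEHOLDER__content--(\d+)\$_\$_\$")


def make_token(index):
    return f"$_$_$PLACEHOLDER__content--{index}$_$_$"


def tokenize(fragments):
    """Swap each HTML fragment for a token; return the tokens and the lookup table."""
    table = {i: frag for i, frag in enumerate(fragments, start=1)}
    tokens = [make_token(i) for i in table]
    return tokens, table


def restore(built_html, table):
    """Replace every token in the built page with the fragment it stands for."""

    def swap(match):
        index = int(match.group(1))
        # Leave unknown tokens untouched so they are easy to spot later.
        return table.get(index, match.group(0))

    return TOKEN.sub(swap, built_html)
```

The tokens go into the text fields of the request as plain text, and `restore` runs on whatever HTML comes back, which is how links and other markup end up in fields that would not accept them directly.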

Upload

Now that we have an HTML file generated by Katana, with a couple of links and HTML elements injected, we send it off to my home server, where I am running a web server. I made a couple of API endpoints so we can send the file with a password and specify other meta information on the server.
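As a rough idea of what such an upload could look like, the sketch below builds a password-protected POST request with Python's standard library. Everything specific in it is an assumption for illustration: the `/upload` path, the `X-Upload-Password` header, the JSON field names and the `build_upload_request` helper are not the real endpoints.

```python
import json
import urllib.request


def build_upload_request(base_url, password, page_path, html):
    """Build (but do not send) a POST request carrying one generated page."""
    # Hypothetical payload: the page's path as meta information plus its HTML.
    body = json.dumps({"path": page_path, "html": html}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/upload",  # Assumed endpoint path.
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Upload-Password": password,  # Assumed header name.
        },
    )
```

Passing the result to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can be read and run without a server.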

Serving the files

To serve the generated files, we wrote a simple web server application that also handles fallback routing. It checks whether a wiki page exists yet and, if not, generates it for you while showing a redirect page. If you see that page, this is happening in real time. That's right: building your page, from the latest wiki page, in under 5 seconds!
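The fallback idea, stripped of the actual web server and the redirect page, looks roughly like the sketch below. The names `resolve_page` and `generate`, and the one-file-per-title layout on disk, are assumptions for illustration rather than how our server is really organised.

```python
from pathlib import Path


def resolve_page(root, title, generate):
    """Return (status, html): serve from disk if present, else build on demand."""
    # Keep only the final path component so a title cannot point outside root.
    safe_name = Path(title).name
    if not safe_name:
        return 404, "Not found"
    page = Path(root) / f"{safe_name}.html"
    if page.is_file():
        return 200, page.read_text(encoding="utf-8")
    # Fallback: ask the (hypothetical) generator to build the page right now.
    html = generate(safe_name)
    if html is None:
        return 404, "Not found"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(html, encoding="utf-8")  # Cache it for the next visitor.
    return 200, html
```

The first request for a page pays the generation cost; every later request is a plain file read.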

Cleanup

When the site is finished generating and has been put on the server, we need to clean up. We take this chance to reset the Katana template to a basic template for the next iteration. Unfortunately this means any site on Katana is reset to a base state, so you will never see a useful Katana site in VIPControl.

To help with some internal things, we also made several Katana services so that we can do things concurrently, for example generating pages on request from users while the scraping script is running.

The poor souls in charge

I'm not kidding, we did nothing else for 3 days. If we don't win this competition, RIP my hard drives.


Ian Hogers

Did like at least 80% of the automation work

Lleyton Morris

Knows a thing or two about web servers

Testimonials

This website is literally a masterpiece. These guys should win the site builder competition... or at least get a pay rise!

The Primary Developer

Developer

I clicked on random links on this site for hours... and I didn't find a single PriceyBot quote! This is heaven.

The Other Developer

Developer

When I wanted to go to sleep, they told me I had to stare at more random pages otherwise they'd pour water on my head while I sleep. This site is amazing and I am saying this of my own free will!

The Girlfriends

Girlfriend

Frequently Frequented Pages

While testing, there were a few pages that we used as a baseline. We are confident these at least somewhat work as intended, but ultimately we expect at least a bit of every page to work as intended. We also encourage you to replace wiki.gravy.cc with en.wikipedia.org in the address bar, compare the two, and see if you can work out how the pages are being converted.

This one was the one that kept us up till 1am on Sunday night. It's not like your typical page, so we needed to make sure this was done before we could submit this.

So look, I know this is random, but this guy has more than 20 sections in his bio... And VIPSites only allows 20 FAQs at a time, so we had to generate two FAQ sections and combine them to get the full 22... This easily took the longest of any page, despite it being somewhat niche.
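The splitting itself is the easy part, and can be sketched in Python like this. The `chunk` helper and the `FAQ_LIMIT` name are ours for illustration; only the limit of 20 comes from VIPSites.

```python
FAQ_LIMIT = 20  # Maximum number of FAQs VIPSites accepts in one section.


def chunk(items, size=FAQ_LIMIT):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

With 22 sections this gives one batch of 20 and one of 2, each generated as its own FAQ section and then stitched together in the final page.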

Rabbits are cute! They were open for 100% of the process. Thank the cute bunnies!

Taylor Swift... There is no doubt that this page has the most content on it... It's really just a lot... You'll notice just how much we have left to do when you compare it with the original.

It's worth noting that you can go to any page. While we did write a script that is probably crawling Wikipedia as you read this, if a page doesn't exist yet, it'll be generated in real time when you try to access any valid Wikipedia page.