.NET Fiddle learns Karate: strike first, strike hard, no mercy!
Secrets of .NET and best 80s movies
In this edition of the .NET Fiddle Newsletter, we will parse the HTML page with the latest DVD/VOD movie releases into structured and readable data.
This newsletter was brought to you by Karate Kid (1984)
This movie introduced the magic powers of Karate to American boys (and girls).
The next paragraph has spoilers:
“The Karate Kid” is about Daniel “Daniel-san” LaRusso, who moves from New Jersey to a California town and right away begins tormenting a local boy, John “Johnny” Lawrence, and his friends. First, he tries to steal Johnny’s longtime girlfriend, Ali. Then he befriends Karate Master Mr. Miyagi, who helps him learn a dark and deadly form of Karate. After Daniel pulls one too many ill-advised pranks, Johnny decides to stop Daniel by challenging him to a fight in the All Valley Karate Tournament. Unfortunately, Daniel uses an illegal “Crane-kick” move to win the Tournament. Johnny is badly affected by the loss and starts drinking heavily.
And that’s where Netflix’s “Cobra Kai” series picks up. The third season is now available on Netflix and it is awesome. Happy New Year!!
Parsing The Internets
There are many APIs that make it easy to retrieve and operate on structured data. But some data is still only available on web pages. To get that data you have to write an HTML parser to … ummm... parse the HTML into structured data. The best tool for that is HtmlAgilityPack.
Here is the fiddle that parses the latest IMDb Blu-ray releases:
https://dotnetfiddle.net/PqTY1A
The NuGet package is available here:
https://www.nuget.org/packages/HtmlAgilityPack/
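If you just want the flavor of the parsing step without opening the fiddle, here is a minimal sketch. It is not the code from the fiddle above: the HTML snippet, the "title" class name, and the ReleaseParser class are all made up for illustration, and a real page will need its own XPath.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet: HtmlAgilityPack

public static class ReleaseParser
{
    // Returns the link text inside every <div class='title'>.
    // The markup shape is an assumption for this sketch, not IMDb's real layout.
    public static List<string> ParseTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var titles = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='title']/a");
        if (nodes == null)
        {
            return titles;
        }

        foreach (var node in nodes)
        {
            // Decode entities such as &amp; and trim surrounding whitespace.
            titles.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }

        return titles;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical page fragment standing in for a downloaded page.
        const string html = @"<html><body>
            <div class='title'><a href='/m/1'>The Karate Kid</a></div>
            <div class='title'><a href='/m/2'>Rocky</a></div>
        </body></html>";

        foreach (var title in ReleaseParser.ParseTitles(html))
        {
            Console.WriteLine(title);
        }
    }
}
```

The null check is the part that bites people first: an XPath that matches nothing gives you null, so guard it before looping.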
Enter the PuppeteerSharp
HtmlAgilityPack works great, but only when your page is statically rendered. Most pages these days execute a lot of JavaScript on page load, especially if the page is part of a single-page application (SPA), in which case the whole page is rendered using JS. WebClient.DownloadString only gets the original page HTML before any JS is executed. So to get the final rendered HTML, you need to use an actual browser engine to render the page.
This is where PuppeteerSharp makes its big entrance with “You’re The Best Around” playing in the background.
GitHub repository: https://github.com/hardkoded/puppeteer-sharp
Under the hood, this package downloads a version of Chrome and then “puppeteers” it. The library has methods to open browsers, create new pages, get page HTML, and even fill text boxes and click buttons.
PuppeteerSharp has everything you need to render dynamic HTML, which then can be parsed using HtmlAgilityPack.
Unfortunately, I can’t demo PuppeteerSharp using .NET Fiddle just yet, since it requires the installation of Chrome. But here is a little console app with a common pattern for using PuppeteerSharp:
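A minimal sketch of that pattern, assuming a recent PuppeteerSharp version where BrowserFetcher.DownloadAsync() takes no arguments (older versions expect a revision argument). The Renderer class name and the example.com URL are placeholders, not part of the library.

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp; // NuGet: PuppeteerSharp

public static class Renderer
{
    // Opens the URL in headless Chrome and returns the HTML after JS has run.
    public static async Task<string> RenderAsync(string url)
    {
        // Downloads a compatible browser build if one is not already cached.
        await new BrowserFetcher().DownloadAsync();

        var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        try
        {
            var page = await browser.NewPageAsync();

            // Wait until the network has gone quiet so scripts get a chance to render.
            await page.GoToAsync(url, WaitUntilNavigation.Networkidle0);

            // The serialized DOM as it stands now, not the original page source.
            return await page.GetContentAsync();
        }
        finally
        {
            // Always shut the browser process down, even if navigation fails.
            await browser.CloseAsync();
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Placeholder URL; swap in the page you actually want to render.
        string html = await Renderer.RenderAsync("https://example.com");
        Console.WriteLine($"Rendered {html.Length} characters of HTML.");
    }
}
```

The string that comes back can go straight into HtmlDocument.LoadHtml, so the HtmlAgilityPack side of your parser does not have to change.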
Did you know that:
“Karate Kid” was directed by John G. Avildsen who also directed a little-known small-budget movie called “Rocky”.
Most recent posts:
.NET Fiddle searches for best Star Wars, but instead finds this...
Check out the new C# 9.0 features in .NET 5.
.NET Fiddle discovers that the only winning move is not to play
Use ML.NET to create a movie recommender
If you want to unsubscribe, click on the unsubscribe link at the bottom of this email, but remember to always keep your hands up, because there will be No Mercy.