Freak Posted June 20, 2019 Share Posted June 20, 2019 As part of larger project, I found myself wanting to create a database of as many DnD monsters as possible. So I set out to write a program that would take them off of public wiki pages so that it could do all of the work for me. The wiki I used was dandwiki and specifically their monster list for 5e. The database isn't particularly useful on its own since it would be impractical to try to read these database items yourself, and like I said, it is part of a larger project. But it could still be useful if you're playing without internet and needed a reference, and it's got a lot of cool regex in it to look at and learn from. Please note that they do have a "terms and conditions for non-human visitors" page here which my program is fully compliant with. I even put a 3 second interval in between page requests so as to be less bothersome. Additionally any database created as a result of running my program will be licensed under the GNU Free Documentation License v1.3 which is available here. Now that the legal nonsense mumbo jumbo is out of the way, the program works by requesting the page for their monster list, then goes through all hyperlinks within that page and requests the pages for any of them that have "(5e_Creature)" in the URL, then uses a ton of regular expressions to find all of the relevant data. It then places that data into this configuration in a .csv file: Note that, by the nature of how this program was written and its source, many of the resulting database entries are likely to be improperly formatted or missing elements. There are simply too many entries for me to manually check if they've been properly processed, and many of the wiki entries have inconsistent formatting. That said, I've written the program to be as flexible as possible and fixed many issues while writing the program, so hopefully even inconsistencies that I'm unaware of should be properly processed. Here is the code: (the indentation messed up a tiny bit) Spoiler ##!/usr/bin/perl -w use strict; use HTML::Entities; use LWP::UserAgent; my($ua, $response, $mainContents, $thisLink, $thisContent); $ua = LWP::UserAgent->new( protocols_allowed => ['http', 'https'], timeout => 10, agent => "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0", #Necessary otherwise 403 forbidden. ); $response = $ua->get('https://www.dandwiki.com/wiki/5e_Monsters'); if($response->is_success){ $mainContents = $response->decoded_content; }else{ die $response->status_line; } my $count = 0; while($mainContents =~ /<a href=\"(.*?)\"/g){ $thisLink = $1; if($thisLink =~ /\(5e_Creature\)/){ sleep(3); $response = $ua->get('https://www.dandwiki.com'.$thisLink); if($response->is_success){ $thisContent = $response->decoded_content; if($thisLink =~ /wiki\/(.*?)_\(5e_Creature\)/){ my @abilities = (); my @features = (); my @actions = (); my @legendary = (); my @reactions = (); open(DATA, ">$1.csv"); #----------------------------------------------------------------------------------------------------------------------- if($thisContent =~/<p><i>(.*?)<\/i>/){print DATA "\"".rmHTML($1)."\",";}else{print DATA ",";} #Size/Alignment if($thisContent =~/Armor Class<\/b>(.*?)(\d+)(\s?|\S?)/){print DATA "\"".$2."\",";}else{print DATA ",";} #AC if($thisContent =~/Hit Points<\/a><\/b>(.*?)(\d+)(\s?|\S?)/){print DATA "\"".$2."\",";}else{print DATA ",";} #HP if($thisContent =~/Speed<\/a><\/b>(.*?)(\n|<)/){print DATA "\"".rmCh($1)."\",\n";}else{print DATA ",\n";} #Speed #----------------------------------------------------------------------------------------------------------------------- while($thisContent =~/\n<td>(.*?)\((.*?)\)/g){print DATA "\"".$1."\",";} #Ability scores print DATA "\n"; #----------------------------------------------------------------------------------------------------------------------- if($thisContent =~/Strength(.*?)<\/a> \+( \d+|\d+)/){print DATA "\"".$2."\",";} #Str save elsif($thisContent =~/Saving Throws<\/b>(.*?)(Str|Strength)(\s*?)\+(\s*?)(\d+)/){print DATA "\"".$5."\",";} else{print DATA ",";} if($thisContent =~/Dexterity(.*?)<\/a> \+( \d+|\d+)/){print DATA "\"".$2."\",";} #Dex save elsif($thisContent =~/Saving Throws<\/b>(.*?)(Dex|Dexterity)(\s*?)\+(\s*?)(\d+)/){print DATA "\"".$5."\",";} else{print DATA ",";} if($thisContent =~/Constitution(.*?)<\/a> \+( \d+|\d+)/){print DATA "\"".$2."\",";} #Con save elsif($thisContent =~/Saving Throws<\/b>(.*?)(Con|Constitution)(\s*?)\+(\s*?)(\d+)/){print DATA "\"".$5."\",";} else{print DATA ",";} if($thisContent =~/Intelligence(.*?)<\/a> \+( \d+|\d+)/){print DATA "\"".$2."\",";} #Int save elsif($thisContent =~/Saving Throws<\/b>(.*?)(Int|Intelligence)(\s*?)\+(\s*?)(\d+)/){print DATA "\"".$5."\",";} else{print DATA ",";} if($thisContent =~/Wisdom(.*?)<\/a> \+( \d+|\d+)/){print DATA "\"".$2."\",";} #Wis save elsif($thisContent =~/Saving Throws<\/b>(.*?)(Wis|Wisdom)(\s*?)\+(\s*?)(\d+)/){print DATA "\"".$5."\",";} else{print DATA ",";} if($thisContent =~/Charisma(.*?)<\/a> \+( \d+|\d+)/){print DATA "\"".$2."\",\n";} #Cha save elsif($thisContent =~/Saving Throws<\/b>(.*?)(Cha|Charisma)(\s*?)\+(\s*?)(\d+)/){print DATA "\"".$5."\",\n";} else{print DATA ",\n";} #----------------------------------------------------------------------------------------------------------------------- if($thisContent =~/>(\s*?)Skills(.*?)((\s*?)|(.?))<b>/){print DATA "\"".rmHTML($2)."\",\n";}else{print DATA ",\n";} #Skills #----------------------------------------------------------------------------------------------------------------------- if($thisContent =~/Vulnerabilities(\s*?)<\/a><\/b>(.*?)<br/){print DATA "\"".rmCh($2)."\",\n";}else{print DATA ",\n";} #Damage Vulnerabilities #----------------------------------------------------------------------------------------------------------------------- if($thisContent =~/Resistances(\s*?)<\/a><\/b>(.*?)<br/){print DATA "\"".rmCh($2)."\",\n";}else{print DATA ",\n";} #Damage Resistances #----------------------------------------------------------------------------------------------------------------------- if($thisContent =~/(Dmg |Damage )Imm(.*?)<\/b>(.*?)<br/){print DATA "\"".rmCh($3)."\",\n";}else{print DATA ",\n";} #Damage Immunities #----------------------------------------------------------------------------------------------------------------------- if($thisContent =~/(Con |Condition )Imm(.*?)<\/b>(.*?)<br/){print DATA "\"".rmHTML($3)."\",\n";}else{print DATA ",\n";} #Condition Immunities #----------------------------------------------------------------------------------------------------------------------- if($thisContent =~/Senses(.*?)<\/b>(.*?)(\s*?)<br/){print DATA "\"".rmHTML($2)."\",\n";}else{print DATA ",\n";} #Senses #----------------------------------------------------------------------------------------------------------------------- if($thisContent =~/Languages(.*?)<\/b>(.*?)(\s*?)<br/){print DATA "\"".rmCh($2)."\",\n"}else{print DATA ",\n";} #Languages #----------------------------------------------------------------------------------------------------------------------- if($thisContent =~/Challenge(\s*?)<\/b>(.*?)(\s*?)<br/){print DATA "\"".rmCh($2)."\",\n"}else{print DATA ",\n";} #Challenge #----------------------------------------------------------------------------------------------------------------------- my $temp = ""; #Grab all Abilities/Features while($thisContent =~/<p><i><b>((.|\s)*?)<\/p>/g){#Good ones $temp = $1; chomp($temp); push(@abilities, $temp); } $temp = ""; while(shorten($thisContent) =~/ACTION((.|\s)*?)\Q$temp\E((.|\s)*?)<b>(.+)(\s*)(<b>|<\/p>)((.|\s)+)/g){#Stupid ones $temp = $5; if(not grep(/\Q$temp\E/, @abilities)){ push(@abilities, $temp); } } #----------------------------------------------------------------------------------------------------------------------- foreach my $i (@abilities){ #Sort all Abilities/Features if($thisContent =~/<\/p>(\s*)<\/td>(\s*)<td((.|\s)+)\Q$i\E/){ #Then it's flavor text beneath a picture. Don't include. #Not convinced that this will always work. }elsif($thisContent =~/LEGENDARY ACTIONS((.|\s)+)\Q$i\E/){ push(@legendary, rmHTML($i)); }elsif($thisContent =~/REACTIONS((.|\s)+)\Q$i\E/){ push(@reactions, rmHTML($i)); }elsif($thisContent =~/ACTIONS((.|\s)+)\Q$i\E/){ push(@actions, rmHTML($i)); }else{ push(@features, rmHTML($i)); } } #----------------------------------------------------------------------------------------------------------------------- foreach my $i (@features){print DATA "\"".$i."\",";} #Features print DATA "\n"; #----------------------------------------------------------------------------------------------------------------------- foreach my $i (@actions){print DATA "\"".$i."\",";} #Actions print DATA "\n"; #----------------------------------------------------------------------------------------------------------------------- foreach my $i (@reactions){print DATA "\"".$i."\",";} #Reactions print DATA "\n"; #----------------------------------------------------------------------------------------------------------------------- foreach my $i (@legendary){print DATA "\"".$i."\",";} #Legendary Actions print DATA "\n"; #----------------------------------------------------------------------------------------------------------------------- close(DATA); } }else{ die $response->status_line; } } } sub rmHTML{ my $newString = $_[0]; $newString =~s/<(.*?)>//g; $newString =~s/<(.*)//g; $newString =~s/(.*)>//g; return rmCh($newString); } sub rmCh{ my $newString = $_[0]; $newString =~s/,/;/g; $newString =~s/"/'/g; $newString =~s/[^[:ascii:]]+//g; return decode_entities($newString); } sub shorten{ my $content = $_[0]; $content =~s/((.|\s)+)Challenge(\s*?)<\/b>(.*?)(\s*?)<br//; $content =~s/<li ((.|\s)+)//g; $content =~s/<p><i><b>((.|\s)*?)<\/p>//g; $content =~s/<(.*?)div(.*?)>//g; $content =~s/<(.*?)table(.*?)>//g; $content =~s/<(.*?)span(.*?)>//g; return $content; } And here is a link to the download for the database so that you don't have to run this and bother the website owners: https://megaupload.nz/D7WdZbvfn4/Monsters_rar 1 Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now