Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
commit 1f92f43 (0 parents)
Showing 11 changed files with 1,025 additions and 0 deletions.
@@ -0,0 +1,163 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def retrieve_from_web(url, user_agent, fname):\n",
"    request = urllib.request.Request(url, headers={'User-Agent': user_agent})\n",
"    with urllib.request.urlopen(request) as response:\n",
"        html = response.read()\n",
"    fname = '/home/ashutosh/Desktop/WebCrawler/HTML/' + fname\n",
"    with open(fname, 'wb') as fp:  # closes the file even if the write fails\n",
"        fp.write(html)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def read_html():\n",
"    with open('/home/ashutosh/Desktop/WebCrawler/HTML/medium_html', 'r', encoding='utf-8') as fp:  # assumes the cached page is UTF-8\n",
"        buff = fp.read()\n",
"    return buff"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"\n",
"#from urllib.request import urlopen\n",
"import urllib.request\n",
"from bs4 import BeautifulSoup\n",
"user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'\n",
"url = 'https://medium.freecodecamp.org/'\n",
"#retrieve_from_web(url, user_agent, 'medium_html')\n",
"buff = read_html()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<ol>\n",
"<li><a href = https://medium.freecodecamp.org/the-mobile-app-launch-checklist-how-to-ship-apps-like-a-boss-84a20f5d8a45?source=collection_home---6------0--------------------->The Mobile App Launch Checklist — How to Ship Apps Like a Boss</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/how-to-master-async-await-with-this-real-world-example-19107e7558ad?source=collection_home---6------1--------------------->How To Master Async/Await With This Real World Example</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/here-are-some-super-secret-vs-code-hacks-to-boost-your-productivity-20d30197ac76?source=collection_home---6------2--------------------->Here are some super secret VS Code hacks to boost your productivity</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/removing-javascripts-this-keyword-makes-it-a-better-language-here-s-why-db28060cc086?source=collection_home---6------3--------------------->Removing JavaScript’s “this” keyword makes it a better language. Here’s why.</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/a-chaotic-mind-leads-to-chaotic-code-e7d6962777c0?source=collection_home---6------4--------------------->A chaotic mind leads to chaotic code</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/i-know-nothing-but-it-is-okay-6c0d9a4fe09f?source=collection_home---6------5--------------------->I know nothing, but it is okay</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/which-programming-language-should-you-learn-next-487d077baa32?source=collection_home---6------6--------------------->Which Programming Language Should You Learn Next?</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/how-to-create-a-discord-bot-under-15-minutes-fb2fd0083844?source=collection_home---6------7--------------------->How to create a Discord bot under 15 minutes</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/how-to-go-from-scratch-to-create-react-app-on-windows-a8a24687d595?source=collection_home---6------8--------------------->How to go from scratch to Create-React-App on Windows</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/how-i-built-an-async-form-validation-library-in-100-lines-of-code-with-react-hooks-81dbff6c4a04?source=collection_home---6------9--------------------->How I built an async form validation library in ~100 lines of code with React Hooks</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/introducing-abs-a-programming-language-for-shell-scripting-dfbd737d621?source=collection_home---6------10--------------------->Introducing ABS, a programming language for shell scripting</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/how-to-write-a-better-cv-the-web-developer-edition-6d27f37d4e67?source=collection_home---6------11--------------------->How to write a better CV— the Web Developer edition</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/the-react-handbook-b71c27b0a795?source=collection_home---6------12--------------------->The React Handbook</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/simple-site-hosting-with-amazon-s3-and-https-5e78017f482a?source=collection_home---6------13--------------------->Simple site hosting with Amazon S3 and HTTPS</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/how-to-host-a-static-website-with-s3-cloudfront-and-route53-7cbb11d4aeea?source=collection_home---6------14--------------------->How to Host a Static Website with S3, CloudFront and Route53</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/how-to-publish-an-application-in-the-play-store-8ddcc6dc3587?source=collection_home---6------15--------------------->How to Publish An Application In The Play Store</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/the-strategy-pattern-explained-using-java-bc30542204e0?source=collection_home---6------16--------------------->The Strategy Pattern explained using Java</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/how-to-calculate-binary-tree-height-with-the-recursive-method-aafc461f2201?source=collection_home---6------17--------------------->How to calculate Binary Tree height with the recursive method</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/i-landed-an-internship-at-facebook-here-are-some-tips-i-learned-b83685cde27?source=collection_home---6------18--------------------->I landed an internship at Facebook. Here are some tips I learned.</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/essential-gems-for-rails-applications-75fed43d2798?source=collection_home---6------19--------------------->Essential Gems for Rails Applications</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/securing-managing-secrets-using-google-cloud-kms-3fe08c69f499?source=collection_home---6------20--------------------->How to secure and manage secrets using Google Cloud KMS</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/how-to-pass-oracles-java-certifications-a-practical-guide-for-developers-e9b607ba6173?source=collection_home---6------21--------------------->How to Pass Oracle’s Java Certifications — a Practical Guide for Developers</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/master-the-art-of-looping-in-javascript-with-these-incredible-tricks-a5da1aa1d6c5?source=collection_home---6------22--------------------->Master the art of looping in JavaScript with these incredible tricks</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/the-art-of-asking-questions-84c01c9987a4?source=collection_home---6------23--------------------->The art of asking questions</a></li>\n",
"<li><a href = https://medium.freecodecamp.org/the-definitive-guide-to-contributing-to-open-source-900d5f9f2282?source=collection_home---6------24--------------------->The Definitive Guide to Contributing to Open Source</a></li>\n"
]
},
{
"data": {
"text/plain": [
"5748"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import time\n",
"soup = BeautifulSoup(buff, \"html.parser\")\n",
"#print(soup.prettify())\n",
"all_news = soup.find_all('a')\n",
"#print(all_news[0])\n",
"#print(type(all_news))\n",
"#html_links = \"<n_links>\"\n",
"html_links = \"<ol>\"\n",
"for news in all_news:  # keep only anchors that wrap an <h3> headline\n",
"    head = news.find('h3')\n",
"    if head:\n",
"        #<a href=\"https://www.w3schools.com/html/\">Visit our HTML tutorial</a> \n",
"        lnks = \"<li><a href = {0}>{1}</a></li>\".format(news.get('href'), head.text)\n",
"        html_links = html_links + \"\\n\" + lnks\n",
"        #print((news.get('href')))\n",
"        #print(type(head))\n",
"        #print(head.attrs)\n",
"        #print(head.text)\n",
"print(html_links)\n",
"html_links = html_links + \"</ol>\"\n",
"fname = '/home/ashutosh/Desktop/WebCrawler/result/' + time.strftime(\"%y-%m-%d\") + \".html\"\n",
"fp = open(fname, 'w')\n",
"fp.write(html_links)\n",
"#print(type(par))\n",
"#print(par)\n",
"#print((all_news[0].parent.name))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
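The first two cells implement a fetch-once, read-many pattern: `retrieve_from_web()` downloads the front page with a browser-like User-Agent and saves the raw bytes, and `read_html()` rereads that copy so the parsing cell can be rerun offline. The sketch below folds both steps into one standard-library helper. The names `build_request` and `fetch_cached`, the `cache_dir` parameter, and the `refresh` flag are hypothetical; `cache_dir` stands in for the hard-coded `/home/ashutosh/Desktop/WebCrawler/HTML/` directory.

```python
import urllib.request
from pathlib import Path


def build_request(url, user_agent):
    # Send a browser-like User-Agent; some sites reject urllib's default one
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_cached(url, user_agent, cache_dir, name, refresh=False):
    """Return the page as bytes, downloading it only when no cached copy exists.

    Hypothetical helper: cache_dir/name replaces the notebook's hard-coded path.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name
    if refresh or not path.exists():
        # Network access happens only on a cache miss (or when refresh=True)
        with urllib.request.urlopen(build_request(url, user_agent)) as response:
            data = response.read()
        path.write_bytes(data)  # keep the raw bytes; decoding is the caller's job
    return path.read_bytes()
```

In the notebook's terms, and assuming the page is UTF-8, the two helpers would become `buff = fetch_cached(url, user_agent, 'HTML', 'medium_html').decode('utf-8')`. Keeping bytes on disk means the encoding is chosen once, at read time, instead of depending on the platform's default text encoding.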
@@ -0,0 +1,6 @@
{
"cells": [],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,62 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"#for handler in logging.root.handlers[:]:\n",
"#    logging.root.removeHandler(handler)\n",
"\n",
"logging.basicConfig(filename=\"wb.log\", format='%(asctime)s-%(levelname)s - %(message)s', level=logging.INFO, filemode='w')\n",
"log = logging.getLogger(__name__)\n",
"#log.setLevel(20)\n",
"log.info(\"logging output\")"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
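`logging.basicConfig()` only configures the root logger if it has no handlers yet; otherwise the call does nothing. A second run of the cell in the same kernel therefore keeps the first file handler, and `filemode='w'` no longer truncates the log. That is the situation the commented-out handler-removal loop in the first cell guards against. Below is a sketch of that reset as a reusable function, using the notebook's format string and write mode; `configure_file_logging` is a hypothetical name. On Python 3.8 and later, `basicConfig(..., force=True)` performs the same reset, but this kernel reports Python 3.5.2, where the manual loop is the portable option.

```python
import logging


def configure_file_logging(filename, level=logging.INFO):
    """Point the root logger at `filename`, discarding handlers set up earlier.

    Hypothetical helper mirroring the notebook's basicConfig() call.
    """
    root = logging.getLogger()
    # basicConfig() is a no-op while the root logger has handlers,
    # so remove (and close) any existing ones first
    for handler in root.handlers[:]:
        root.removeHandler(handler)
        handler.close()
    logging.basicConfig(
        filename=filename,
        filemode="w",  # truncate the log on every call, as in the notebook
        format="%(asctime)s-%(levelname)s - %(message)s",
        level=level,
    )
    return logging.getLogger(__name__)
```

Calling it again with a different filename closes the previous file handler before the new one is opened, so rerunning the cell always starts a fresh log.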
@@ -0,0 +1 @@
sjsjj
Large diffs are not rendered by default.
@@ -0,0 +1 @@
root - INFO - s
@@ -0,0 +1,26 @@
<ol>
<li><a href = https://medium.freecodecamp.org/the-mobile-app-launch-checklist-how-to-ship-apps-like-a-boss-84a20f5d8a45?source=collection_home---6------0--------------------- style="color: black">The Mobile App Launch Checklist — How to Ship Apps Like a Boss</a></li>
<li><a href = https://medium.freecodecamp.org/how-to-master-async-await-with-this-real-world-example-19107e7558ad?source=collection_home---6------1--------------------->How To Master Async/Await With This Real World Example</a></li>
<li><a href = https://medium.freecodecamp.org/here-are-some-super-secret-vs-code-hacks-to-boost-your-productivity-20d30197ac76?source=collection_home---6------2--------------------->Here are some super secret VS Code hacks to boost your productivity</a></li>
<li><a href = https://medium.freecodecamp.org/removing-javascripts-this-keyword-makes-it-a-better-language-here-s-why-db28060cc086?source=collection_home---6------3--------------------->Removing JavaScript’s “this” keyword makes it a better language. Here’s why.</a></li>
<li><a href = https://medium.freecodecamp.org/a-chaotic-mind-leads-to-chaotic-code-e7d6962777c0?source=collection_home---6------4--------------------->A chaotic mind leads to chaotic code</a></li>
<li><a href = https://medium.freecodecamp.org/i-know-nothing-but-it-is-okay-6c0d9a4fe09f?source=collection_home---6------5--------------------->I know nothing, but it is okay</a></li>
<li><a href = https://medium.freecodecamp.org/which-programming-language-should-you-learn-next-487d077baa32?source=collection_home---6------6--------------------->Which Programming Language Should You Learn Next?</a></li>
<li><a href = https://medium.freecodecamp.org/how-to-create-a-discord-bot-under-15-minutes-fb2fd0083844?source=collection_home---6------7--------------------->How to create a Discord bot under 15 minutes</a></li>
<li><a href = https://medium.freecodecamp.org/how-to-go-from-scratch-to-create-react-app-on-windows-a8a24687d595?source=collection_home---6------8--------------------->How to go from scratch to Create-React-App on Windows</a></li>
<li><a href = https://medium.freecodecamp.org/how-i-built-an-async-form-validation-library-in-100-lines-of-code-with-react-hooks-81dbff6c4a04?source=collection_home---6------9--------------------->How I built an async form validation library in ~100 lines of code with React Hooks</a></li>
<li><a href = https://medium.freecodecamp.org/introducing-abs-a-programming-language-for-shell-scripting-dfbd737d621?source=collection_home---6------10--------------------->Introducing ABS, a programming language for shell scripting</a></li>
<li><a href = https://medium.freecodecamp.org/how-to-write-a-better-cv-the-web-developer-edition-6d27f37d4e67?source=collection_home---6------11--------------------->How to write a better CV— the Web Developer edition</a></li>
<li><a href = https://medium.freecodecamp.org/the-react-handbook-b71c27b0a795?source=collection_home---6------12--------------------->The React Handbook</a></li>
<li><a href = https://medium.freecodecamp.org/simple-site-hosting-with-amazon-s3-and-https-5e78017f482a?source=collection_home---6------13--------------------->Simple site hosting with Amazon S3 and HTTPS</a></li>
<li><a href = https://medium.freecodecamp.org/how-to-host-a-static-website-with-s3-cloudfront-and-route53-7cbb11d4aeea?source=collection_home---6------14--------------------->How to Host a Static Website with S3, CloudFront and Route53</a></li>
<li><a href = https://medium.freecodecamp.org/how-to-publish-an-application-in-the-play-store-8ddcc6dc3587?source=collection_home---6------15--------------------->How to Publish An Application In The Play Store</a></li>
<li><a href = https://medium.freecodecamp.org/the-strategy-pattern-explained-using-java-bc30542204e0?source=collection_home---6------16--------------------->The Strategy Pattern explained using Java</a></li>
<li><a href = https://medium.freecodecamp.org/how-to-calculate-binary-tree-height-with-the-recursive-method-aafc461f2201?source=collection_home---6------17--------------------->How to calculate Binary Tree height with the recursive method</a></li>
<li><a href = https://medium.freecodecamp.org/i-landed-an-internship-at-facebook-here-are-some-tips-i-learned-b83685cde27?source=collection_home---6------18--------------------->I landed an internship at Facebook. Here are some tips I learned.</a></li>
<li><a href = https://medium.freecodecamp.org/essential-gems-for-rails-applications-75fed43d2798?source=collection_home---6------19--------------------->Essential Gems for Rails Applications</a></li>
<li><a href = https://medium.freecodecamp.org/securing-managing-secrets-using-google-cloud-kms-3fe08c69f499?source=collection_home---6------20--------------------->How to secure and manage secrets using Google Cloud KMS</a></li>
<li><a href = https://medium.freecodecamp.org/how-to-pass-oracles-java-certifications-a-practical-guide-for-developers-e9b607ba6173?source=collection_home---6------21--------------------->How to Pass Oracle’s Java Certifications — a Practical Guide for Developers</a></li>
<li><a href = https://medium.freecodecamp.org/master-the-art-of-looping-in-javascript-with-these-incredible-tricks-a5da1aa1d6c5?source=collection_home---6------22--------------------->Master the art of looping in JavaScript with these incredible tricks</a></li>
<li><a href = https://medium.freecodecamp.org/the-art-of-asking-questions-84c01c9987a4?source=collection_home---6------23--------------------->The art of asking questions</a></li>
<li><a href = https://medium.freecodecamp.org/the-definitive-guide-to-contributing-to-open-source-900d5f9f2282?source=collection_home---6------24--------------------->The Definitive Guide to Contributing to Open Source</a></li></ol>
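The result file keeps every `<a>` that wraps an `<h3>` headline and emits one `<li>` per match. Two details of the generated markup are fragile. The `href` values are unquoted even though the URLs contain `=`, which HTML only tolerates through parser error recovery. The headline text is also inserted without escaping, so a title containing `<` or `&` may be misparsed. The sketch below rebuilds the same list with quoted, escaped values and a `%y-%m-%d` date-stamped filename. It swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so it runs without extra packages. `HeadlineLinkParser`, `build_link_list`, and `result_filename` are hypothetical names, the `result/` prefix stands in for the notebook's output directory, and nested anchors are not handled.

```python
import html
import time
from html.parser import HTMLParser


class HeadlineLinkParser(HTMLParser):
    """Collect (href, headline) pairs for <a> elements that contain an <h3>.

    Stdlib stand-in for soup.find_all('a') + news.find('h3').
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive already decoded
        self.links = []
        self._href = None     # href of the <a> currently open, if any
        self._in_h3 = False
        self._h3_text = None  # text of the first <h3> inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._h3_text = None
        elif tag == "h3" and self._href is not None and self._h3_text is None:
            self._in_h3 = True
            self._h3_text = ""

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "a" and self._href is not None:
            if self._h3_text is not None:
                self.links.append((self._href, self._h3_text.strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_h3:
            self._h3_text += data  # includes text of nested tags such as <b>


def build_link_list(page_html):
    parser = HeadlineLinkParser()
    parser.feed(page_html)
    parser.close()
    # Quote the attribute and escape both values so any URL or title is safe
    items = [
        '<li><a href="{0}">{1}</a></li>'.format(html.escape(href), html.escape(title))
        for href, title in parser.links
    ]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"


def result_filename(prefix="result/"):
    # Hypothetical output location; "%y-%m-%d" yields names like "19-03-05.html"
    return prefix + time.strftime("%y-%m-%d") + ".html"
```

Writing the list with `with open(result_filename(), 'w', encoding='utf-8') as fp: fp.write(build_link_list(buff))` also closes the file, which the notebook's bare `fp = open(...)` leaves to garbage collection; this assumes a `result/` directory already exists.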