bas24/contentscraper

A simple scraper for the content of a specific element on a web page.

Install:
go get github.com/bas24/contentscraper

Example usage:

package main

import (
	"fmt"

	cs "github.com/bas24/contentscraper"
)

func main() {
	// Get the content of all <p> tags
	// from this HTML sample:
	// <div itemprop="articleBody" class="article">
	//  <p>Text</p>
	//  <p>to</p>
	//  <p>scrape.</p>
	// </div>
	// Output: "Text to scrape."

	// Placeholder address: replace it with the page you want to scrape.
	url := "https://example.com/article"

	// Minimum number of characters, including whitespace,
	// that a <p> tag must contain to be scraped.
	// To get all content, pass 0.
	// (The sample paragraphs above are shorter than 10 characters,
	// so reproducing that output presumably needs 0.)
	minLength := 10

	txt, err := cs.Scrape(url, "div", "p", minLength, "itemprop", "articleBody")
	// Or, more simply, call cs.Scrape(url, "div", "p", minLength)
	// if you don't want to specify the tag's attributes
	// or the tags you scrape have none,
	// like <div><p>...</p><p>...</p></div>

	if err != nil {
		fmt.Println(err)
		return // txt is not meaningful when scraping failed
	}

	fmt.Println(txt)
}
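
The effect of minLength described in the comments above can be illustrated with a small standalone sketch. It is not the library's implementation: joinParagraphs is a hypothetical helper, and it assumes that paragraphs are kept when their character count is at least minLength and that kept paragraphs are joined with single spaces, as the sample output suggests.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// joinParagraphs is a hypothetical helper, not part of contentscraper.
// Assumption: a paragraph is kept when it has at least minLength
// characters (whitespace included); kept paragraphs are joined with
// single spaces.
func joinParagraphs(paragraphs []string, minLength int) string {
	kept := make([]string, 0, len(paragraphs))
	for _, p := range paragraphs {
		// Count characters (runes), not bytes, so non-ASCII text is measured fairly.
		if utf8.RuneCountInString(p) >= minLength {
			kept = append(kept, p)
		}
	}
	return strings.Join(kept, " ")
}

func main() {
	sample := []string{"Text", "to", "scrape."}
	fmt.Printf("%q\n", joinParagraphs(sample, 0))  // prints "Text to scrape."
	fmt.Printf("%q\n", joinParagraphs(sample, 10)) // prints ""
}
```

Under these assumptions, a threshold of 10 drops every paragraph in the sample, because the longest one has only 7 characters.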

"Better not very nice code than no code!"

