Google Search from Java Program Example

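For some time I had been looking for a way to search Google from a Java program. I was surprised to find that Google once had a web search API, but it was deprecated long ago and there is now no standard way to do this. A Google search is basically an HTTP GET request in which the query parameters are part of the URL, and we have already seen different options for sending such a request, such as Java HttpURLConnection or Apache HttpClient (see "Apache HttpClient Example to Send GET/POST HTTP Request"). The harder problem is parsing the HTML response and extracting useful information from it. That is why I chose [jsoup](/community/tutorials/jsoup-java-html-parser), an open source HTML parser that can also fetch the HTML from a given URL. Below is a simple program that fetches Google search results from a Java program and then parses the response to pick out the search results.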
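Before the full program, here is a minimal sketch of only the "query parameters are part of the URL" step. The class name `SearchUrlBuilder` and the method `build` are hypothetical names used for illustration; the `q` and `num` parameters are the ones the program in this article sends. The sketch assumes the search term should be percent-encoded with `URLEncoder`, so that spaces and characters such as `&` cannot break the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the GET URL for a search request.
class SearchUrlBuilder {

    static final String GOOGLE_SEARCH_URL = "https://www.google.com/search";

    // Returns the search URL with the term encoded for use in a query string.
    static String build(String searchTerm, int num) {
        try {
            String encodedTerm = URLEncoder.encode(searchTerm, "UTF-8");
            return GOOGLE_SEARCH_URL + "?q=" + encodedTerm + "&num=" + num;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("java jsoup", 10));
        // prints https://www.google.com/search?q=java+jsoup&num=10
        System.out.println(build("c++ & java", 5));
        // prints https://www.google.com/search?q=c%2B%2B+%26+java&num=5
    }
}
```

Without the encoding step, a term such as `c++ & java` would be split at the `&` and the server would see a different query than the one typed.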

    package com.journaldev.jsoup;

    import java.io.IOException;
    import java.net.URLEncoder;
    import java.util.Scanner;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class GoogleSearchJava {

        public static final String GOOGLE_SEARCH_URL = "https://www.google.com/search";

        public static void main(String[] args) throws IOException {
            // Take the search term and the number of results from the console
            Scanner scanner = new Scanner(System.in);
            System.out.println("Please enter the search term.");
            String searchTerm = scanner.nextLine();
            System.out.println("Please enter the number of results. Example: 5 10 20");
            int num = scanner.nextInt();
            scanner.close();

            // Encode the search term so spaces and special characters stay valid in the URL
            String searchURL = GOOGLE_SEARCH_URL + "?q=" + URLEncoder.encode(searchTerm, "UTF-8") + "&num=" + num;
            // Without a proper User-Agent, we will get a 403 error
            Document doc = Jsoup.connect(searchURL).userAgent("Mozilla/5.0").get();

            // The line below prints the HTML data; save it to a file and open it in a browser to compare
            // System.out.println(doc.html());

            // If Google changes the results HTML, e.g. <h3 class="r"> to <h3 class="r1">,
            // the selector below must be changed accordingly
            Elements results = doc.select("h3.r > a");

            for (Element result : results) {
                // Judging by the sample output, href looks like "/url?q=<target>&...";
                // start index 6 keeps the "=" in front of the target URL
                String linkHref = result.attr("href");
                String linkText = result.text();
                System.out.println("Text::" + linkText + ", URL::" + linkHref.substring(6, linkHref.indexOf("&")));
            }
        }

    }
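The `substring(6, linkHref.indexOf("&"))` call in the loop keeps a leading `=` in front of each URL, and it would throw an exception for an `href` that contains no `&`. The sketch below is one possible way to tighten that step. It assumes, based on the program's output rather than on any documented contract, that each `href` has the form `/url?q=<target>&...`. The class name `ResultLinkParser`, the method `extractTargetUrl`, and the sample `sa=U` parameter are hypothetical and used only for illustration.

```java
// Hypothetical helper: pulls the target URL out of a result link's href.
class ResultLinkParser {

    // Assumed redirect prefix; it is 7 characters long, so the target starts at index 7
    static final String PREFIX = "/url?q=";

    static String extractTargetUrl(String href) {
        if (href == null || !href.startsWith(PREFIX)) {
            // Not in the assumed redirect form: return it unchanged
            return href;
        }
        int start = PREFIX.length();
        int end = href.indexOf('&', start);
        // If there are no further parameters, take everything after the prefix
        return (end == -1) ? href.substring(start) : href.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(extractTargetUrl("/url?q=https://www.journaldev.com/&sa=U"));
        // prints https://www.journaldev.com/
        System.out.println(extractTargetUrl("/url?q=https://www.journaldev.com/tutorials"));
        // prints https://www.journaldev.com/tutorials
    }
}
```

In the loop, `extractTargetUrl(result.attr("href"))` could stand in for the `substring` expression. The helper does not decode percent-escapes such as `%2B`, so those would still appear as-is.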

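Below is a sample output from the above program. I saved the HTML data to a file and opened it in a browser to confirm the output, and it is what we wanted. Compare the output with the image below.

[Image: Google Search API Java, Java Google Search, Google Search Java program]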

    Please enter the search term.
    journaldev
    Please enter the number of results. Example: 5 10 20
    20
    Text::JournalDev, URL::=https://www.journaldev.com/
    Text::Java Interview Questions, URL::=https://www.journaldev.com/java-interview-questions
    Text::Java design patterns, URL::=https://www.journaldev.com/tag/java-design-patterns
    Text::Tutorials, URL::=https://www.journaldev.com/tutorials
    Text::Java servlet, URL::=https://www.journaldev.com/tag/java-servlet
    Text::Spring Framework Tutorial ..., URL::=https://www.journaldev.com/2888/spring-tutorial-spring-core-tutorial
    Text::Java Design Patterns PDF ..., URL::=https://www.journaldev.com/6308/java-design-patterns-pdf-ebook-free-download-130-pages
    Text::Pankaj Kumar (@JournalDev) | Twitter, URL::=https://twitter.com/journaldev
    Text::JournalDev | Facebook, URL::=https://www.facebook.com/JournalDev
    Text::JournalDev - Chrome Web Store - Google, URL::=https://chrome.google.com/webstore/detail/journaldev/ckdhakodkbphniaehlpackbmhbgfmekf
    Text::Debian -- Details of package libsystemd-journal-dev in wheezy, URL::=https://packages.debian.org/wheezy/libsystemd-journal-dev
    Text::Debian -- Details of package libsystemd-journal-dev in wheezy ..., URL::=https://packages.debian.org/wheezy-backports/libsystemd-journal-dev
    Text::Debian -- Details of package libsystemd-journal-dev in sid, URL::=https://packages.debian.org/sid/libsystemd-journal-dev
    Text::Debian -- Details of package libsystemd-journal-dev in jessie, URL::=https://packages.debian.org/jessie/libsystemd-journal-dev
    Text::Ubuntu  Details of package libsystemd-journal-dev in trusty, URL::=https://packages.ubuntu.com/trusty/libsystemd-journal-dev
    Text::libsystemd-journal-dev : Utopic (14.10) : Ubuntu - Launchpad, URL::=https://launchpad.net/ubuntu/utopic/%2Bpackage/libsystemd-journal-dev
    Text::Debian -- Details of package libghc-libsystemd-journal-dev in jessie, URL::=https://packages.debian.org/jessie/libghc-libsystemd-journal-dev
    Text::Advertise on JournalDev | BuySellAds, URL::=https://buysellads.com/buy/detail/231824
    Text::JournalDev | LinkedIn, URL::=https://www.linkedin.com/groups/JournalDev-6748558
    Text::How to install libsystemd-journal-dev package in Ubuntu Trusty, URL::=https://www.howtoinstall.co/en/ubuntu/trusty/main/libsystemd-journal-dev/
    Text::[global] auth supported = cephx ms bind ipv6 = true [mon] mon data ..., URL::=https://zooi.widodh.nl/ceph/ceph.conf
    Text::UbuntuUpdates - Package "libsystemd-journal-dev" (trusty 14.04), URL::=https://www.ubuntuupdates.org/libsystemd-journal-dev
    Text::[Journal]Dev'err - Cursus Honorum - Enjin, URL::=https://cursushonorum.enjin.com/holonet/m/23958869/viewthread/13220130-journaldeverr/post/last

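That's all for Google search from a Java program. Use it cautiously, because if Google sees unusual traffic coming from your computer, chances are it will block you.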
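One simple way to stay cautious is to space requests out. The sketch below is only an illustration under stated assumptions: the class `RequestThrottle` and its methods are hypothetical, and the one-second interval is an arbitrary value chosen for the demo, not a limit published by Google.

```java
// Hypothetical helper: enforces a minimum pause between consecutive requests.
class RequestThrottle {

    private final long minIntervalMillis;
    private boolean hasRequested = false;
    private long lastRequestAtMillis;

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Milliseconds the caller should still wait at time nowMillis
    long millisToWait(long nowMillis) {
        if (!hasRequested) {
            return 0; // the first request never waits
        }
        long elapsed = nowMillis - lastRequestAtMillis;
        return Math.max(0, minIntervalMillis - elapsed);
    }

    // Records that a request was sent at time nowMillis
    void markRequest(long nowMillis) {
        hasRequested = true;
        lastRequestAtMillis = nowMillis;
    }

    // Sleeps until the minimum interval has passed, then records the request
    void awaitTurn() throws InterruptedException {
        long wait = millisToWait(System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
        markRequest(System.currentTimeMillis());
    }

    public static void main(String[] args) throws InterruptedException {
        RequestThrottle throttle = new RequestThrottle(1000); // assumed interval
        for (int i = 1; i <= 3; i++) {
            throttle.awaitTurn();
            System.out.println("Request " + i + " may be sent now");
        }
    }
}
```

Calling `awaitTurn()` right before each `Jsoup.connect(...).get()` call would keep repeated searches at least the chosen interval apart. It does not guarantee that requests will not be blocked.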
