
20/06/10, 17:01:55
Forum member
Registered: May 2010
Posts: 306
Smartphone model: Samsung Galaxy S8
Carrier: Movistar
How do I get data from a website?
Hi,
I'm getting desperate looking for information on how to fetch data from a website and process it. I want to build an app that sends a request to a website, retrieves the data and processes it, just as a user on a PC would select values from drop-down lists and then submit the query.
Hopefully someone can help and point me to what I should read up on.
Regards.
EDIT: I found this, but I don't know how to adapt it to query other websites...
Code:
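For the "select values from the combos and submit the query" part, one possible starting point is that a simple GET form usually ends up as name=value pairs appended to the URL. The sketch below only builds that query string with the JDK. The class name FormQuery, the field names province and year, and the example.com address are made up for illustration; the real names would have to be read from the target page's form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns the values a user would pick in a form
// into an application/x-www-form-urlencoded query string.
final class FormQuery {

    // Joins the fields as name=value pairs separated by '&', encoding each part.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Assumed field names; LinkedHashMap keeps them in insertion order.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("province", "Las Palmas");
        fields.put("year", "2010");
        // Placeholder address, not a real service.
        System.out.println("http://example.com/search?" + encode(fields));
        // prints http://example.com/search?province=Las+Palmas&year=2010
    }
}
```

The resulting URL can then be opened with URL.openConnection(), as the snippet in the EDIT does. Forms that submit with POST send the same encoded string in the request body instead.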
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;

// Note: Option, processInfoNode and processDateNode belong to the original author's code and are not shown in this snippet.
public class OptionScraper {
    // example XPath queries in the form of strings - will be used later
    private static final String NAME_XPATH = "//div[@class='yfi_quote']/div[@class='hd']/h2";
    private static final String TIME_XPATH = "//table[@id='time_table']/tbody/tr/td[@class='yfnc_tabledata1']";
    private static final String PRICE_XPATH = "//table[@id='price_table']//tr//span";

    // a method that helps me retrieve the stock option's data based off the name (i.e. GOUAA is one of Google's stock options)
    public static Option getOptionFromName(String name) throws IOException, XPatherException {
        // the object that collects the scraped values; it was never declared in the original snippet
        // (assumption: Option has a no-argument constructor)
        Option o = new Option();
        // the URL whose HTML I want to retrieve and parse
        String option_url = "http://finance.yahoo.com/q?s=" + name.toUpperCase();
        // this is where the HtmlCleaner comes in, I initialize it here
        HtmlCleaner cleaner = new HtmlCleaner();
        CleanerProperties props = cleaner.getProperties();
        props.setAllowHtmlInsideAttributes(true);
        props.setAllowMultiWordAttributes(true);
        props.setRecognizeUnicodeChars(true);
        props.setOmitComments(true);
        // open a connection to the desired URL
        URL url = new URL(option_url);
        URLConnection conn = url.openConnection();
        // use the cleaner to "clean" the HTML and return it as a TagNode object
        TagNode node = cleaner.clean(new InputStreamReader(conn.getInputStream()));
        // once the HTML is cleaned, you can run your XPath expressions on the node; each call returns an
        // array of matches (typed as Object, cast to TagNode below)
        Object[] info_nodes = node.evaluateXPath(NAME_XPATH);
        Object[] time_nodes = node.evaluateXPath(TIME_XPATH);
        Object[] price_nodes = node.evaluateXPath(PRICE_XPATH);
        // a simple check to make sure that my XPath was correct and that at least one node was returned
        if (info_nodes.length > 0) {
            // cast to a TagNode
            TagNode info_node = (TagNode) info_nodes[0];
            // how to retrieve the contents as a string
            String info = info_node.getChildren().iterator().next().toString().trim();
            // some method that processes the string of information (in my case, this was the stock quote, etc)
            processInfoNode(o, info);
        }
        if (time_nodes.length > 0) {
            TagNode time_node = (TagNode) time_nodes[0];
            String date = time_node.getChildren().iterator().next().toString().trim();
            // date returned in 15-Jan-10 format, so this is some method I wrote to just parse that string into the format that I use
            processDateNode(o, date);
        }
        if (price_nodes.length > 0) {
            TagNode price_node = (TagNode) price_nodes[0];
            double price = Double.parseDouble(price_node.getChildren().iterator().next().toString().trim());
            o.setPremium(price);
        }
        return o;
    }
}
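On adapting this to other websites: in the snippet above the site-specific parts appear to be the URL and the XPath strings, so one way to reuse it is to keep the extraction generic and pass the XPath in. The sketch below shows that idea using only the JDK's own XPath support on a small, hard-coded, well-formed snippet. The class name XPathDemo, the markup and the expressions are invented for illustration. Real pages are rarely well-formed XML, which is why a cleaning step such as HtmlCleaner would still be needed before this kind of query.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates any XPath against a well-formed document.
final class XPathDemo {

    // Returns the trimmed text of the first node matching the expression,
    // or an empty string when nothing matches.
    static String firstText(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc).trim();
        } catch (Exception ex) {
            // Malformed markup or an invalid expression ends up here.
            throw new IllegalArgumentException("Could not evaluate " + xpath, ex);
        }
    }

    public static void main(String[] args) {
        // Made-up markup standing in for an already cleaned page.
        String page = "<html><body>"
                + "<div class='quote'><h2>Example Corp</h2></div>"
                + "<table id='price_table'><tr><td><span>12.50</span></td></tr></table>"
                + "</body></html>";
        // Only these expressions would change from one site to another.
        System.out.println(firstText(page, "//div[@class='quote']/h2"));
        System.out.println(Double.parseDouble(firstText(page, "//table[@id='price_table']//tr//span")));
    }
}
```

To find the right expressions for a different page, inspect its HTML source and look for stable id or class attributes near the values you want.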
Last edited by systemx2 on 21/06/10 at 12:06:48.