HTCMania - [ QUESTION ] How do I get data from a website?

systemx2 · #1 20/06/10, 17:01:55

Hi,

I'm getting desperate looking for information on how to get data from a website and process it. The idea is to write an app that sends a request to a website and then gets the data back and processes it. It should behave like any user at a PC who picks values from the page's combo boxes and then submits the query.
Could someone help me out and point me to what I should read up on?
Regards.
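Picking values in a page's combo boxes and pressing the submit button normally just makes the browser send an HTTP request in which each `<select>`'s `name` attribute is paired with the chosen option's `value`. A minimal JDK-only sketch of reproducing that, assuming a plain HTML form with no JavaScript or session cookies involved. The URL `http://example.com/search`, the field names `province` and `type`, and the class name `FormQuery` are hypothetical; take the real `action`, `method` and field names from the target page's `<form>` source.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormQuery {

    // Encodes the selected values as "name1=value1&name2=value2",
    // the same application/x-www-form-urlencoded format a browser sends.
    public static String encodeForm(Map<String, String> fields) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    // For method="get" forms: the encoded parameters go after '?' in the URL.
    public static String buildGetUrl(String action, Map<String, String> fields) throws UnsupportedEncodingException {
        return action + "?" + encodeForm(fields);
    }

    // For method="post" forms: the same encoded string is sent as the request body,
    // and the response HTML is returned as a String.
    public static String post(String action, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) URI.create(action).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        StringBuilder html = new StringBuilder();
        // UTF-8 is an assumption; use the charset the page actually declares
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("province", "M\u00e1laga"); // hypothetical <select name="province">
        fields.put("type", "all");             // hypothetical <select name="type">
        // prints http://example.com/search?province=M%C3%A1laga&type=all
        System.out.println(buildGetUrl("http://example.com/search", fields));
    }
}
```

For a GET form, the URL from `buildGetUrl` can be opened exactly like the Yahoo URL in the quoted code; for a POST form, `post` returns the result page's HTML, which still has to be parsed afterwards.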

EDIT: I found this, but I don't know how to adapt it to query other websites...

Code:

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;

// Requires the HtmlCleaner library (org.htmlcleaner).
// Option, processInfoNode and processDateNode are my own code and are not shown here.
public class OptionScraper {

    // example XPath queries as strings; they are evaluated against the cleaned page below
    private static final String NAME_XPATH = "//div[@class='yfi_quote']/div[@class='hd']/h2";

    private static final String TIME_XPATH = "//table[@id='time_table']/tbody/tr/td[@class='yfnc_tabledata1']";

    private static final String PRICE_XPATH = "//table[@id='price_table']//tr//span";

    // retrieves a stock option's data by its symbol (e.g. GOUAA is one of Google's stock options)
    public static Option getOptionFromName(String name) throws IOException, XPatherException {

        // the URL whose HTML I want to retrieve and parse
        String option_url = "http://finance.yahoo.com/q?s=" + name.toUpperCase();

        // the object that gets filled in and returned (it was never declared in the
        // original snippet; a no-arg Option constructor is assumed here)
        Option o = new Option();

        // configure HtmlCleaner, which turns messy real-world HTML into a well-formed tree
        HtmlCleaner cleaner = new HtmlCleaner();
        CleanerProperties props = cleaner.getProperties();
        props.setAllowHtmlInsideAttributes(true);
        props.setAllowMultiWordAttributes(true);
        props.setRecognizeUnicodeChars(true);
        props.setOmitComments(true);

        // open a connection to the desired URL
        URL url = new URL(option_url);
        URLConnection conn = url.openConnection();

        // "clean" the HTML and get the root TagNode; the stream is closed afterwards
        // (UTF-8 is an assumption; use the charset the page actually declares)
        TagNode node;
        try (Reader reader = new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)) {
            node = cleaner.clean(reader);
        }

        // run the XPath expressions on the cleaned tree; each call returns an Object[]
        // of matching nodes, which are cast to TagNode below
        Object[] info_nodes = node.evaluateXPath(NAME_XPATH);
        Object[] time_nodes = node.evaluateXPath(TIME_XPATH);
        Object[] price_nodes = node.evaluateXPath(PRICE_XPATH);

        // only use a result if the XPath actually matched at least one node
        if (info_nodes.length > 0) {
            TagNode info_node = (TagNode) info_nodes[0];
            // the element's text content as a string
            String info = info_node.getText().toString().trim();

            // processes the string of information (in my case, the stock quote, etc.)
            processInfoNode(o, info);
        }

        if (time_nodes.length > 0) {
            TagNode time_node = (TagNode) time_nodes[0];
            String date = time_node.getText().toString().trim();

            // the date comes back in 15-Jan-10 format; this method converts it to the format I use
            processDateNode(o, date);
        }

        if (price_nodes.length > 0) {
            TagNode price_node = (TagNode) price_nodes[0];
            double price = Double.parseDouble(price_node.getText().toString().trim());
            o.setPremium(price);
        }

        return o;
    }
}
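On adapting this to other sites: the only Yahoo-specific parts of the code above are the URL and the three XPath strings; fetching, cleaning and evaluating work the same way for any page. A minimal sketch of that separation, swapping HtmlCleaner for the JDK's own `javax.xml.xpath` so it runs with no extra libraries. That swap only works on well-formed markup without a default `xmlns` namespace; the JDK parser rejects typical tag-soup HTML, which is exactly what HtmlCleaner's `clean()` step is for. The class name `GenericScraper`, the sample markup and the expressions are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class GenericScraper {

    // Evaluates each named XPath expression against a well-formed XHTML string and
    // returns the trimmed text of the first matching node, or null if nothing matched.
    public static Map<String, String> extract(String xhtml, Map<String, String> xpaths) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : xpaths.entrySet()) {
            Node n = (Node) xpath.evaluate(e.getValue(), doc, XPathConstants.NODE);
            result.put(e.getKey(), n == null ? null : n.getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // hypothetical page fragment; a real site's markup will differ
        String page = "<html><body>"
                + "<h2 class=\"title\"> Example Corp </h2>"
                + "<table id=\"price_table\"><tr><td><span>12.50</span></td></tr></table>"
                + "</body></html>";

        // adapting to another site means changing these expressions (and the URL)
        Map<String, String> xpaths = new LinkedHashMap<>();
        xpaths.put("name", "//h2[@class='title']");
        xpaths.put("price", "//table[@id='price_table']//span");

        Map<String, String> data = extract(page, xpaths);
        System.out.println(data); // prints {name=Example Corp, price=12.50}
    }
}
```

To find the right expressions for a new site, look at its HTML source (or the browser's element inspector) and write paths that reach the elements holding the data you want. Keep in mind that a browser's inspector may show elements, such as `tbody`, that are not in the raw HTML.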