Dealing with specific lines when parsing XML docs in Java
I have a huge XML file from Wiktionary that I need to parse for a class project. I only need to extract data from a set of 200 lines, starting at line 395,000. How would I go about scanning only that small number of lines? Is there some sort of built-in property for line numbers?
If line boundaries are significant in your data then it's not true XML. Accept it for what it is, a line-oriented file, and start by processing it using line-oriented text tools. Use these to extract the XML (if you can), and then pass this XML to an XML parser.
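A minimal Java sketch of that approach follows. It assumes the 200 lines hold complete sibling elements (for example whole <page> elements). It reads the file line by line, keeps only the requested range, then wraps the result in a made-up <fragment> root element so that a DOM parser will accept several top-level elements. The class name, the method names and the <fragment> wrapper are hypothetical. If the slice cuts an element in half, parsing will fail and the range has to be widened.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LineSlice {
    // Returns lines firstLine .. firstLine + count - 1 (1-based line numbers).
    static List<String> readLines(Reader in, int firstLine, int count) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (lineNo < firstLine) {
                    continue; // still before the wanted range
                }
                if (lineNo >= firstLine + count) {
                    break; // range fully read; ignore the rest of the file
                }
                result.add(line);
            }
        }
        return result;
    }

    // Wraps the lines in a hypothetical <fragment> root so the DOM parser
    // accepts multiple sibling elements; throws if the lines split an element.
    static Document parseFragment(List<String> lines) throws Exception {
        String xml = "<fragment>" + String.join("\n", lines) + "</fragment>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

For the real dump you might call readLines(Files.newBufferedReader(Paths.get("wiktionary.xml"), StandardCharsets.UTF_8), 395_000, 200) and pass the result to parseFragment; the file name there is a placeholder.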
There is no built-in line-number property on XML elements or DOM nodes.
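While the elements themselves carry no line number, the JDK's streaming StAX parser (javax.xml.stream) does report the approximate position of the current parse event through XMLStreamReader.getLocation().getLineNumber(). The sketch below, whose class and method names are hypothetical, collects the names of start tags reported inside a line range and stops parsing once the reported line passes the end of the range. The Location javadoc describes these numbers as approximate and allows -1 when unknown, so treat the range boundaries as fuzzy.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxLines {
    // Collects local names of start tags whose reported line lies in [firstLine, lastLine].
    static List<String> startTagsInRange(Reader in, int firstLine, int lastLine)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                // Approximate line of the current event; -1 if the parser cannot tell.
                int line = reader.getLocation().getLineNumber();
                if (line > lastLine) {
                    break; // past the range: stop instead of parsing the rest of the file
                }
                if (event == XMLStreamConstants.START_ELEMENT && line >= firstLine) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }
}
```

The parser still has to scan everything before line 395,000, but it does so as a stream without building a tree, and it never reads past the end of the range.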
If you want to look at all of the data from line 395,000 through 395,199 programmatically, you can do so by counting newline characters.
Each line in the file ends with a newline ("\n"), so you could count 394,999 of them and then read the data until you have seen 200 more.
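Here is a minimal Java sketch of the newline-counting idea. It reads characters one at a time through a BufferedReader, skips everything until firstLine - 1 newlines have gone by, and stops as soon as the last wanted line has ended. The class and method names are hypothetical. With Windows line endings the '\r' characters are kept in the output, since only '\n' is counted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class NewlineCounter {
    // Returns the text of lines firstLine .. firstLine + count - 1 (1-based),
    // located purely by counting '\n' characters.
    static String sliceByNewlines(Reader in, long firstLine, long count) throws IOException {
        long skipNewlines = firstLine - 1; // e.g. 394,999 newlines precede line 395,000
        long newlinesSeen = 0;
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(in)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (newlinesSeen >= skipNewlines) {
                    out.append((char) c); // inside the wanted range
                }
                if (c == '\n') {
                    newlinesSeen++;
                    if (newlinesSeen == skipNewlines + count) {
                        break; // the last wanted line has just ended
                    }
                }
            }
        }
        return out.toString();
    }
}
```

For the Wiktionary dump, a call such as sliceByNewlines(Files.newBufferedReader(Paths.get("wiktionary.xml"), StandardCharsets.UTF_8), 395_000, 200) would return the 200 lines as one string (the file name is a placeholder), which can then be handed to an XML parser.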