...making Linux just a little more fun!
J. Bakshi [j.bakshi at unlimitedmail.org]
Dear all,
Like catdoc (to read .doc), is there any command to read .odt from the command line? I did a lot of googling but did not find any command like catdoc.
On the other hand, I have found that .odt is actually stored in zip format. So I ran unzip on an .odt and it successfully extracted a lot of files, including "content.xml", which actually holds the content.
Is there any tool which can extract the plain text from .xml?
Please suggest.
The content.xml looks like
<office:document-content office:version="1.2">
  <office:scripts/>
  <office:font-face-decls>
    <style:font-face style:name="Times New Roman" svg:font-family="'Times New Roman'" style:font-family-generic="roman" style:font-pitch="variable"/>
    <style:font-face style:name="Arial" svg:font-family="Arial" style:font-family-generic="swiss" style:font-pitch="variable"/>
    <style:font-face style:name="Arial1" svg:font-family="Arial" style:font-family-generic="system" style:font-pitch="variable"/>
  </office:font-face-decls>
  <office:automatic-styles/>
  <office:body>
    <office:text>
      <text:sequence-decls>
        <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
      </text:sequence-decls>
      <text:p text:style-name="Standard">This is a test </text:p>
    </office:text>
  </office:body>
</office:document-content>
Note the content
<text:p text:style-name="Standard">This is a test </text:p>
Kapil Hari Paranjape [kapil at imsc.res.in]
Hello,
On Tue, 09 Jun 2009, J. Bakshi wrote:
> Like catdoc ( to read .doc) is there any command to read .odt from
> command line ? did a lot googling but not found any such command like
> catdoc.
Try unoconv.
Kapil.
--
J. Bakshi [j.bakshi at unlimitedmail.org]
On Tue, 9 Jun 2009 21:45:42 +0530 Kapil Hari Paranjape <kapil@imsc.res.in> wrote:
> Hello,
>
> On Tue, 09 Jun 2009, J. Bakshi wrote:
> > Like catdoc ( to read .doc) is there any command to read .odt from
> > command line ? did a lot googling but not found any such command
> > like catdoc.
>
> Try unoconv.
Thanks, but my requirement is a little different. There is an indexer running on a remote server; the site is powered by TYPO3. catdoc and its associated command-line tools are already installed there to read .doc, .xls, .pdf etc. and do the indexing. I need one more tool which can also read .odt. unoconv needs OpenOffice itself. My requirement could also be met by any CLI tool which simply extracts the plain text from .xml, as I have already found that .odt saves its content in content.xml.
Francis Daly [francis at daoine.org]
On Tue, Jun 09, 2009 at 09:11:52PM +0530, J. Bakshi wrote:
Hi there,
> Like catdoc ( to read .doc) is there any command to read .odt from
> command line ? did a lot googling but not found any such command like
> catdoc.
I use the "o3read" suite wrapped in a script which does, essentially,
unzip -p "$input" content.xml | o3totxt
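For comparison, the first half of that pipeline (the part `unzip -p` does) can be sketched in Python with only the standard library. This is a rough illustration and not part of o3read; the function name `read_content_xml` is made up for the example.

```python
import zipfile


def read_content_xml(odt_path):
    """Return the raw bytes of content.xml from an .odt file.

    An .odt file is a zip archive, so the member can be read directly
    without unpacking anything to disk, much like `unzip -p`.
    """
    with zipfile.ZipFile(odt_path) as archive:
        return archive.read("content.xml")
```

The bytes that come back still need something like o3totxt (or any XML-aware tool) to turn them into plain text.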
When I google for the words
read .odt from command line
the fifth link included a pointer to https://stosberg.net/odt2txt/, and claimed that a Debian package is available.
apt-get install odt2txt
gives me a new command which preserves more formatting than o3totxt did.
I may well switch to using that instead now.
I only tested these on older .odt files. Perhaps new ones work less well.
> Is there any tool which can extract the plain text from .xml ?
I usually use "xmlstarlet" for this; occasionally the input needs to be processed to help xmlstarlet understand it.
I use that for a limited RSS reader, as well as for a Gmail notifier.
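If xmlstarlet is not available on the server, the same kind of extraction can be sketched with Python's standard library. This is only a minimal sketch: the function name `content_xml_to_text` is invented for the example, it assumes the XML declares the usual OpenDocument text namespace (real content.xml files do; the sample pasted earlier in the thread has the declarations stripped), and it ignores tabs, line breaks, and paragraphs nested inside other paragraphs.

```python
import xml.etree.ElementTree as ET

# Namespace URI used for text elements in OpenDocument files.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = ("{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS)


def content_xml_to_text(xml_data):
    """Return the text of every <text:p> and <text:h> element, one per line."""
    root = ET.fromstring(xml_data)
    lines = []
    for element in root.iter():
        if element.tag in PARAGRAPH_TAGS:
            # itertext() also collects text held in children such as <text:span>.
            lines.append("".join(element.itertext()))
    return "\n".join(lines)
```

Feeding it the contents of content.xml should give one line of output per paragraph or heading.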
f
-- Francis Daly francis@daoine.org
J. Bakshi [j.bakshi at unlimitedmail.org]
On Tue, 9 Jun 2009 18:09:15 +0100 Francis Daly <francis@daoine.org> wrote:
> On Tue, Jun 09, 2009 at 09:11:52PM +0530, J. Bakshi wrote:
>
> Hi there,
Hello,
Thanks for your solution. Meanwhile I have found something great!
abiword --to=txt myfile.odt
and this will create myfile.txt. The next step is just "cat myfile.txt".
with best wishes
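For an indexer, the two steps above (convert, then cat) could be wrapped in a small helper. The following is only a sketch under assumptions: the function names are hypothetical, the output name follows the myfile.odt to myfile.txt behaviour described in the message, and UTF-8 output is assumed rather than confirmed.

```python
import os
import subprocess


def abiword_command(odt_path):
    """Build the command line quoted in the thread: abiword --to=txt FILE."""
    return ["abiword", "--to=txt", odt_path]


def expected_text_path(odt_path):
    """Path of the .txt file the thread says abiword creates (myfile.odt -> myfile.txt)."""
    return os.path.splitext(odt_path)[0] + ".txt"


def odt_to_text(odt_path):
    """Convert with abiword, then return the converted text (the `cat` step)."""
    subprocess.run(abiword_command(odt_path), check=True)
    # Assumption: the converted file is UTF-8 encoded.
    with open(expected_text_path(odt_path), encoding="utf-8") as handle:
        return handle.read()
```

Calling `odt_to_text("myfile.odt")` requires abiword to be installed on the machine doing the indexing.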