[Octave-bug-tracker] [bug #51707] textscan seems to skip chunks of text
From: HJW
Subject: [Octave-bug-tracker] [bug #51707] textscan seems to skip chunks of text
Date: Wed, 9 Aug 2017 14:14:57 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
URL:
<http://savannah.gnu.org/bugs/?51707>
Summary: textscan seems to skip chunks of text
Project: GNU Octave
Submitted by: thrynae
Submitted on: Wed 09 Aug 2017 06:14:56 PM UTC
Category: Octave Function
Severity: 3 - Normal
Priority: 5 - Normal
Item Group: Matlab Compatibility
Status: None
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: 4.2.0
Operating System: Microsoft Windows
_______________________________________________________
Details:
(Context: I'm downloading web pages and merging specific content.)
When I open the downloaded file in Notepad++, all the text is there, but when
I read it with textscan, some text is missing. The behavior is reproducible (the
same text is missing each time). It does not occur in Matlab (tested on
R2017a and R2012b). My OS is 64-bit Windows 10, and my Octave version is
4.2.0.
As far as I can tell, this has not been reported yet. Other files with longer
lines (77k on a single line) do not fail, and I can't find any systematic cause.
The problem occurs in multiple files.
MWE:
filename='HB_SNG3.html';
if ~exist(filename,'file')
    %download file
    url='http://web.archive.org/web/20170807165834/https://www.bible.com/nl/bible/75/SNG.3.htb';
    urlwrite(url,filename);
end
%load file
fid=fopen(filename,'rt','n');
data=textscan(fid,'%s','Delimiter','\n');
fclose(fid);
%convert file to a single long string
data=data{1};
data(:,2)={' '};   %put a single space after every line
data=data';
data=data(:)';     %interleave: line, space, line, space, ...
data=cell2mat(data);
%remove the parts of the webpage that are not relevant for my goal.
pattern='<div class="book bk';
idx=strfind(data,pattern);
data=data(idx(1):end);
pattern='</div><div class="version-copyright"';
idx=strfind(data,pattern);
if isempty(idx),error('this is possibly a bug'),end
data=data(1:(idx(end)-1));
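
To check whether the text is lost by textscan or elsewhere, the file can also be read as raw text and split by hand, then compared with the textscan result. The following is only a sketch and not part of the original report. It writes a small stand-in file (the name `tmp` and its three lines are made up for illustration) so that it runs anywhere; to test the real case, point `tmp` at HB_SNG3.html instead. Small differences between the two counts are to be expected, since textscan may trim whitespace or drop empty lines. A large gap would point at skipped text.

```octave
% Stand-in test file with LF line endings (hypothetical content).
tmp = [tempname() '.txt'];
fid = fopen(tmp, 'wb');
fwrite(fid, sprintf('first line\nsecond line\nthird line\n'));
fclose(fid);

% Reference read: whole file as one string, split manually on newlines.
raw = fileread(tmp);
raw = strrep(raw, sprintf('\r\n'), sprintf('\n'));  % normalise CRLF to LF
lines_raw = strsplit(raw, sprintf('\n'))';
lines_raw(cellfun(@isempty, lines_raw)) = [];       % drop empty lines

% Read under test: same textscan call as in the MWE.
fid = fopen(tmp, 'rt', 'n');
lines_ts = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
lines_ts = lines_ts{1};

% Compare line and character counts of the two reads.
fprintf('manual split: %d lines, %d chars\n', ...
        numel(lines_raw), sum(cellfun(@numel, lines_raw)));
fprintf('textscan:     %d lines, %d chars\n', ...
        numel(lines_ts), sum(cellfun(@numel, lines_ts)));

delete(tmp);
```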
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?51707>